**Install all the required Python libraries for this project:**

- `pika` for RabbitMQ communication
- `google-api-python-client` for fetching YouTube data
- `plotly` and `nbformat` for interactive visualization in notebooks
- `pandas` for the correlation matrix at the end


In [None]:
!pip install pika plotly google-api-python-client pandas
!pip install --upgrade nbformat


**Set up your credentials and list the YouTube videos to analyze:**

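- Supply your YouTube Data API key and CloudAMQP connection URL through environment variables instead of pasting them into the notebook; a key saved in a shared notebook is readable by anyone who opens it. The variable names `YOUTUBE_API_KEY` and `CLOUDAMQP_URL` used below are only a suggestion.
- Specify the YouTube video links for which you want to fetch statistics.


In [2]:
import os

# Read secrets from the environment; never hard-code them in a shared notebook.
api_key = os.environ["YOUTUBE_API_KEY"]
amqp_url = os.environ["CLOUDAMQP_URL"]

video_urls = [
    "https://youtu.be/b_qCoZtwJyo?si=E1--BWV6BeJyfogi",      # iPadOS 26 features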
    "https://www.youtube.com/watch?v=XqZsoesa55w",           # Baby Shark, comments turned off
    "https://www.youtube.com/watch?v=kJQP7kiw5Fk",           # Despacito
    "https://www.youtube.com/watch?v=JGwWNGJdvx8"            # Shape of You
]


**Define a utility function to extract video IDs from YouTube links:**

- This handles both long and short-form YouTube URLs.
- We'll need these IDs to fetch video stats via the API.


In [9]:
import re

def get_video_id(url):
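    """Extract the 11-character video ID from a standard or short YouTube URL."""
    # Takes the first run of 11 ID characters that follows "v=" or a "/".
    match = re.search(r"(?:v=|/)([0-9A-Za-z_-]{11})", url)
    return match.group(1) if match else None

# Keep only URLs that yield an ID; the walrus operator (Python 3.8+) avoids parsing each URL twice.
video_ids = [vid for url in video_urls if (vid := get_video_id(url))]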
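
The regular expression above accepts any 11-character run after a slash, so a non-video link such as a channel URL can yield a bogus ID. As a hedged alternative, the sketch below parses the URL with the standard library and checks the host and path first. The function name `get_video_id_strict`, the list of accepted hosts, and the `embed`/`shorts`/`live` path prefixes are assumptions for illustration, not an exhaustive list of YouTube URL shapes.

```python
import re
from urllib.parse import urlparse, parse_qs

ID_PATTERN = re.compile(r"^[0-9A-Za-z_-]{11}$")

def get_video_id_strict(url):
    """Return the video ID for watch, short, embed, or shorts URLs, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com", "music.youtube.com"):
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter.
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("embed", "shorts", "live"):
                candidate = parts[1]

    if candidate and ID_PATTERN.match(candidate):
        return candidate
    return None

examples = [
    "https://youtu.be/b_qCoZtwJyo?si=E1--BWV6BeJyfogi",
    "https://www.youtube.com/watch?v=kJQP7kiw5Fk",
    "https://www.youtube.com/channel/UC_placeholder_channel_id",  # placeholder, not a video
]
ids = [get_video_id_strict(u) for u in examples]
print(ids)  # ['b_qCoZtwJyo', 'kJQP7kiw5Fk', None]
```

The channel-style link returns `None` here, whereas a pattern that only looks for 11 ID characters after a slash would return the first 11 characters of the channel segment.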


**Producer code:**

- Connects to both the YouTube Data API and RabbitMQ.
- For each video, fetches the current stats (views, likes, comments); the view count is stored under the `viewers` key.
- Publishes each video's stats as one JSON message to the RabbitMQ queue.
- Runs once for a "snapshot" as written; wrap the fetch in a loop to poll repeatedly for near-real-time data.


In [10]:
from googleapiclient.discovery import build
import pika
import json
import time

youtube = build("youtube", "v3", developerKey=api_key)
params = pika.URLParameters(amqp_url)
connection = pika.BlockingConnection(params)
channel = connection.channel()
queue_name = "youtube_stats"
channel.queue_declare(queue=queue_name)

# One-time fetch for each video (for repeated polling, add a loop here)
for video_id in video_ids:
    res = youtube.videos().list(part="statistics,snippet", id=video_id).execute()
    if not res.get("items"):
        print(f"Video {video_id} not found or cannot access.")
        continue
    stats = res["items"][0]["statistics"]
    snippet = res["items"][0]["snippet"]
    message = {
        "video_id": video_id,
        "title": snippet.get("title"),
        "viewers": int(stats.get("viewCount", 0)),
        "likes": int(stats.get("likeCount", 0)),
        "comments": int(stats.get("commentCount", 0)),
        "timestamp": time.time(),
    }
    channel.basic_publish(
        exchange='',
        routing_key=queue_name,
        body=json.dumps(message)
    )
    print(f"Sent: {message}")

connection.close()


Sent: {'video_id': 'b_qCoZtwJyo', 'title': 'iPadOS 26 Features || who needs a MacBook now🥸? || in Telugu', 'viewers': 129318, 'likes': 5590, 'comments': 210, 'timestamp': 1758888964.2588065}
Sent: {'video_id': 'XqZsoesa55w', 'title': 'Baby Shark Dance | #babyshark Most Viewed Video | Animal Songs | PINKFONG Songs for Children', 'viewers': 16281028259, 'likes': 45914193, 'comments': 0, 'timestamp': 1758888964.44761}
Sent: {'video_id': 'kJQP7kiw5Fk', 'title': 'Luis Fonsi - Despacito ft. Daddy Yankee', 'viewers': 8822837493, 'likes': 55284554, 'comments': 4321778, 'timestamp': 1758888964.6451387}
Sent: {'video_id': 'JGwWNGJdvx8', 'title': 'Ed Sheeran - Shape of You (Official Music Video)', 'viewers': 6569043037, 'likes': 34826518, 'comments': 1212890, 'timestamp': 1758888964.8264928}
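
The producer above runs once. For repeated polling, one possible shape is a small loop that takes the fetch and publish steps as callables, which keeps the loop testable without a network connection. Everything named here (`poll_stats`, `fetch_stats`, `publish`, `rounds`, `interval_seconds`, and the `fake_fetch` stand-in) is hypothetical and not part of the notebook.

```python
import json
import time

def poll_stats(video_ids, fetch_stats, publish, rounds=3,
               interval_seconds=60.0, sleep=time.sleep):
    """Fetch and publish stats for every video `rounds` times, pausing between rounds."""
    sent = 0
    for round_index in range(rounds):
        for video_id in video_ids:
            message = fetch_stats(video_id)
            if message is None:
                continue  # video missing or inaccessible; skip it
            publish(json.dumps(message))
            sent += 1
        # Sleep between rounds, but not after the last one.
        if round_index < rounds - 1:
            sleep(interval_seconds)
    return sent

# Stand-ins so the sketch runs offline; real code would call the API and RabbitMQ.
def fake_fetch(video_id):
    return {"video_id": video_id, "viewers": 0, "timestamp": time.time()}

outbox = []
count = poll_stats(["b_qCoZtwJyo", "kJQP7kiw5Fk"], fake_fetch, outbox.append,
                   rounds=2, interval_seconds=0)
print(count, len(outbox))  # 4 4
```

In the notebook, `fetch_stats` would wrap the `youtube.videos().list(...)` call and build the message dictionary, and `publish` could be `lambda body: channel.basic_publish(exchange='', routing_key=queue_name, body=body)`. Each round consumes API quota, so the interval should be chosen with that in mind.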


**Consumer code:**

- Listens to messages from the RabbitMQ queue.
- Stores incoming data in the `received_stats` list for later analysis.
- Runs until you interrupt the kernel as written; it can be adapted to exit after collecting a fixed number of messages.


In [11]:
import pika
import json

received_stats = []

def callback(ch, method, properties, body):
    msg = json.loads(body)
    received_stats.append(msg)
    print(f"Received: {msg}")

params = pika.URLParameters(amqp_url)
connection = pika.BlockingConnection(params)
channel = connection.channel()
queue_name = "youtube_stats"
channel.queue_declare(queue=queue_name)
channel.basic_consume(queue=queue_name, on_message_callback=callback, auto_ack=True)

print("Waiting for messages...")
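try:
    channel.start_consuming()
except KeyboardInterrupt:
    # Interrupting the kernel ends the consume loop cleanly.
    channel.stop_consuming()
finally:
    # Close the connection however the loop ended.
    if connection.is_open:
        connection.close()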


Waiting for messages...
Received: {'video_id': 'b_qCoZtwJyo', 'title': 'iPadOS 26 Features || who needs a MacBook now🥸? || in Telugu', 'viewers': 129318, 'likes': 5590, 'comments': 210, 'timestamp': 1758888743.0916035}
Received: {'video_id': 'XqZsoesa55w', 'title': 'Baby Shark Dance | #babyshark Most Viewed Video | Animal Songs | PINKFONG Songs for Children', 'viewers': 16281018461, 'likes': 45914164, 'comments': 0, 'timestamp': 1758888743.2734973}
Received: {'video_id': 'kJQP7kiw5Fk', 'title': 'Luis Fonsi - Despacito ft. Daddy Yankee', 'viewers': 8822837492, 'likes': 55284541, 'comments': 4321774, 'timestamp': 1758888743.4767542}
Received: {'video_id': 'JGwWNGJdvx8', 'title': 'Ed Sheeran - Shape of You (Official Music Video)', 'viewers': 6569040775, 'likes': 34826502, 'comments': 1212890, 'timestamp': 1758888743.68681}
Received: {'video_id': 'b_qCoZtwJyo', 'title': 'iPadOS 26 Features || who needs a MacBook now🥸? || in Telugu', 'viewers': 129318, 'likes': 5590, 'comments': 210, 
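
To make the consumer exit after a fixed number of messages, one option is a callback that calls `stop_consuming()` on the channel once enough messages have arrived, which causes `start_consuming()` to return. The sketch below is a minimal illustration: `make_bounded_callback` and `max_messages` are hypothetical names, and `FakeChannel` is a stand-in used only so the example runs without a broker.

```python
import json

def make_bounded_callback(store, max_messages):
    """Build a pika-style callback that stops consuming after `max_messages`."""
    def callback(ch, method, properties, body):
        store.append(json.loads(body))
        if len(store) >= max_messages:
            ch.stop_consuming()  # makes the blocking start_consuming() call return
    return callback

class FakeChannel:
    """Hypothetical stand-in for a pika channel, used only to exercise the callback."""
    def __init__(self):
        self.stopped = False

    def stop_consuming(self):
        self.stopped = True

collected = []
fake = FakeChannel()
on_message = make_bounded_callback(collected, max_messages=2)
for raw in (b'{"video_id": "a"}', b'{"video_id": "b"}'):
    on_message(fake, None, None, raw)
print(len(collected), fake.stopped)  # 2 True
```

With a real connection, the returned function would be passed as `on_message_callback` to `channel.basic_consume(...)` in place of `callback`, with `received_stats` as the store.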

**Visualization: Views over time**

- Plots the view count over time as one line per video, labeled by a trimmed video title.
- All videos share a single set of axes; likes and comments are present in the messages but are not plotted in this cell.
- A horizontal legend above the plot shows which trace belongs to which video.


In [12]:
import plotly.graph_objects as go
from collections import defaultdict

def trim_title(title, num_words=3):
    """Trim title to the first few words for clear labeling."""
    if not title:
        return "Unknown"
    return " ".join(title.split()[:num_words])

# Collect time series data per video
video_stats = defaultdict(lambda: {"timestamps": [], "viewers": []})
for stat in received_stats:
    video = trim_title(stat.get("title", stat.get("video_id", "Unknown")))
    video_stats[video]["timestamps"].append(stat["timestamp"])
    video_stats[video]["viewers"].append(stat["viewers"])

fig = go.Figure()
for vid, stats in video_stats.items():
    fig.add_trace(go.Scatter(
        x=stats["timestamps"],
        y=stats["viewers"],
        mode='lines+markers',
        name=vid
    ))

fig.update_layout(
    title="YouTube Video Views Over Time",
    xaxis_title="Timestamp",
    yaxis_title="Views",
    height=600,
    width=1000,
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.08,
        xanchor="center",
        x=0.5
    ),
    font=dict(size=16)
)
fig.show()
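
To show views, likes, and comments in separate panels, one possible approach is `plotly.subplots.make_subplots` with one row per metric. The sketch below is an assumption about how that could look, not the notebook's own code: it groups by `video_id` rather than trimmed title to avoid label collisions, converts the Unix timestamps to UTC datetimes, and imports plotly inside the plotting function so the grouping step can run on its own. The names `group_series`, `build_subplots`, and `METRICS` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

METRICS = ("viewers", "likes", "comments")

def group_series(stats):
    """Group messages by video_id into time-ordered series, one list per metric."""
    series = defaultdict(lambda: {"time": [], **{m: [] for m in METRICS}})
    for stat in sorted(stats, key=lambda s: s["timestamp"]):
        entry = series[stat["video_id"]]
        entry["time"].append(datetime.fromtimestamp(stat["timestamp"], tz=timezone.utc))
        for metric in METRICS:
            entry[metric].append(stat[metric])
    return dict(series)

def build_subplots(series):
    """Return a figure with one row per metric and one trace per video."""
    # Imported here so group_series() works even where plotly is not installed.
    from plotly.subplots import make_subplots
    import plotly.graph_objects as go

    fig = make_subplots(rows=len(METRICS), cols=1, shared_xaxes=True,
                        subplot_titles=[m.capitalize() for m in METRICS])
    for row, metric in enumerate(METRICS, start=1):
        for video_id, data in series.items():
            fig.add_trace(
                go.Scatter(x=data["time"], y=data[metric], mode="lines+markers",
                           name=video_id, legendgroup=video_id,
                           showlegend=(row == 1)),  # one legend entry per video
                row=row, col=1,
            )
    fig.update_layout(height=900, title="Views, likes, and comments over time")
    return fig

sample = [
    {"video_id": "a", "timestamp": 20.0, "viewers": 11, "likes": 2, "comments": 1},
    {"video_id": "a", "timestamp": 10.0, "viewers": 10, "likes": 1, "comments": 1},
]
series = group_series(sample)
print(series["a"]["viewers"])  # [10, 11]
```

In the notebook this would be used as `build_subplots(group_series(received_stats)).show()`.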


In [None]:
import plotly.express as px

summary = {}
for stat in received_stats:
    video = trim_title(stat.get("title", stat.get("video_id", "Unknown")))
    summary[video] = stat

videos = list(summary.keys())
total_viewers = [summary[vid]["viewers"] for vid in videos]
total_likes = [summary[vid]["likes"] for vid in videos]
total_comments = [summary[vid]["comments"] for vid in videos]

fig_total = px.bar(
    x=videos,
    y=total_viewers,
    labels={'x': 'Video', 'y': 'Total Views'},
    title="Total Views by Video",
    text=total_viewers
)
fig_total.show()

fig_likes = px.bar(
    x=videos,
    y=total_likes,
    labels={'x': 'Video', 'y': 'Total Likes'},
    title="Total Likes by Video",
    text=total_likes
)
fig_likes.show()

fig_comments = px.bar(
    x=videos,
    y=total_comments,
    labels={'x': 'Video', 'y': 'Total Comments'},
    title="Total Comments by Video",
    text=total_comments
)
fig_comments.show()


**Pie chart: Proportion of total views**

- Shows the share of total views for each video in the dataset.
- Uses only the first three words of the video title for labeling, for clarity.


In [None]:
import plotly.express as px

def trim_title(title, num_words=3):
    if not title:
        return "Unknown"
    return ' '.join(title.split()[:num_words])

summary = {}
for stat in received_stats:
    video = trim_title(stat.get("title", stat.get("video_id", "Unknown")))
    summary[video] = stat

videos = list(summary.keys())
total_viewers = [summary[vid]["viewers"] for vid in videos]

fig_pie = px.pie(
    names=videos,
    values=total_viewers,
    title="Proportion of Total Views by Video (First Three Words Only)"
)
fig_pie.show()


**Scatter plot: Likes vs. Views**

- Each point corresponds to a video, positioned by its total likes (y) and total views (x).
- Offers insight into how engagement (likes) compares with popularity (views).


In [73]:
total_likes = [summary[vid]["likes"] for vid in videos]

fig_scatter = px.scatter(
    x=total_viewers,
    y=total_likes,
    text=videos,
    labels={'x': 'Total Views', 'y': 'Total Likes'},
    title="Likes vs. Views (Engagement vs. Popularity)"
)
fig_scatter.update_traces(textposition='top center')
fig_scatter.show()
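
Because the view counts differ by several orders of magnitude, a likes-per-view ratio can make the comparison easier to read than raw positions on the scatter plot. The sketch below computes that ratio for two of the snapshots shown in the producer output; `like_rate` and `sample_summary` are hypothetical names, and the dictionary mirrors the shape of `summary` with only the fields needed here.

```python
def like_rate(stat):
    """Likes per view for one message; None when the view count is zero."""
    views = stat["viewers"]
    return stat["likes"] / views if views else None

# Values copied from the producer output above, keyed by trimmed title.
sample_summary = {
    "iPadOS 26 Features": {"viewers": 129318, "likes": 5590},
    "Luis Fonsi -": {"viewers": 8822837493, "likes": 55284554},
}

rates = {title: like_rate(stat) for title, stat in sample_summary.items()}
ranked = sorted(
    ((title, rate) for title, rate in rates.items() if rate is not None),
    key=lambda item: item[1],
    reverse=True,
)
for title, rate in ranked:
    print(f"{title}: {rate:.2%}")
```

For these two snapshots the smaller video has the higher ratio, roughly 4.3% against 0.6%, which the raw counts alone do not make obvious.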


**Correlation heatmap: Relationships between views, likes, and comments**

- Shows how strongly each metric is linearly related to the others.
- Uses a larger figure and bigger fonts so the axis labels and cell annotations do not overlap.
- Only generated if enough datapoints exist (more than 5 received stats).


In [75]:
import pandas as pd
import plotly.figure_factory as ff

if len(received_stats) > 5:
    df = pd.DataFrame(received_stats)
    corr_matrix = df[["viewers", "likes", "comments"]].corr()
    fig_corr = ff.create_annotated_heatmap(
        z=corr_matrix.values,
        x=corr_matrix.columns.tolist(),
        y=corr_matrix.columns.tolist(),
        annotation_text=[[f"{v:.2f}" for v in row] for row in corr_matrix.values],
        colorscale='Viridis',
        font_colors=['white'],
    )
    fig_corr.update_layout(
        title="Correlation Matrix (Views, Likes, Comments)",
        title_font_size=28,
        width=750,
        height=750,
        margin=dict(l=120, r=60, t=120, b=60),
        xaxis=dict(tickfont=dict(size=20)),
        yaxis=dict(tickfont=dict(size=20))
    )
    for ann in fig_corr.layout.annotations:
        ann.font.size = 20
    fig_corr.show()
else:
    print("Not enough data points for a meaningful correlation heatmap.")
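
`DataFrame.corr()` computes the Pearson correlation coefficient by default. As a rough illustration of what each cell of the heatmap represents, the sketch below computes the same coefficient in plain Python for the views and likes columns, using the four snapshots from the producer output. The helper name `pearson` is hypothetical; it returns NaN when either series is constant, since the coefficient is undefined in that case.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)

# Views and likes from the four "Sent:" lines in the producer output.
views = [129318, 16281028259, 8822837493, 6569043037]
likes = [5590, 45914193, 55284554, 34826518]
r = pearson(views, likes)
print(round(r, 2))
```

With only four distinct videos, a coefficient like this mostly reflects differences between videos rather than how the metrics move together over time.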
