# 🎬 Day 1: Real-Time Global Streaming Trends - EDA with Live Data

## 📚 Learning Objectives
Today we'll master data analysis using **real-time entertainment data from the TMDB API**. By the end of this notebook, you'll be able to:

- **Fetch live data** from APIs using Python requests
- **Load and inspect** streaming data using pandas
- **Explore data structure** with `.head()`, `.info()`, `.describe()`
- **Handle missing data** and understand data quality
- **Create visualizations** with matplotlib and seaborn
- **Discover insights** about current global entertainment trends
- **Ask data-driven questions** and find answers through analysis

## 🎯 Real-World Application
We're analyzing **what the world is watching RIGHT NOW** using live TMDB API data. This helps:
- **Content creators** understand current trending genres and themes
- **Streaming platforms** optimize their recommendation algorithms
- **Investors** identify successful entertainment companies
- **Marketers** time their campaigns around popular content

---

## 🔧 1. Setting Up Our Environment

First, let's import the essential libraries for data analysis and API integration.

In [3]:
# Essential libraries for data analysis
import pandas as pd                 # Data manipulation and analysis
import numpy as np                  # Numerical computing
import matplotlib.pyplot as plt     # Basic plotting
import seaborn as sns              # Statistical visualization
import requests                    # API calls
import json                        # JSON data handling
from datetime import datetime      # Date/time operations
import warnings

# Configure plotting
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully!")
print(f"📅 Analysis date: {datetime.now().strftime('%Y-%m-%d %H:%M')}")
print("🎬 Ready to fetch REAL-TIME streaming data!")

✅ Libraries imported successfully!
📅 Analysis date: 2025-08-10 10:41
🎬 Ready to fetch REAL-TIME streaming data!


## 🌐 2. Real-Time Data Collection with TMDB API

Now let's create functions to fetch **live trending data** from The Movie Database (TMDB) API. This will give us actual current trending content!

In [4]:
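# TMDB API configuration
import os

# Read the key from an environment variable instead of hard-coding a secret in the notebook
API_KEY = os.environ.get("TMDB_API_KEY", "")  # export TMDB_API_KEY=<your key> before starting Jupyter
BASE_URL = "https://api.themoviedb.org/3"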

# Genre mapping (TMDB uses numeric genre IDs)
GENRE_MAP = {
    28: "Action", 12: "Adventure", 16: "Animation", 35: "Comedy", 80: "Crime",
    99: "Documentary", 18: "Drama", 10751: "Family", 14: "Fantasy", 36: "History",
    27: "Horror", 10402: "Music", 9648: "Mystery", 10749: "Romance", 878: "Science Fiction",
    10770: "TV Movie", 53: "Thriller", 10752: "War", 37: "Western",
    10759: "Action & Adventure", 10762: "Kids", 10763: "News", 10764: "Reality",
    10765: "Sci-Fi & Fantasy", 10766: "Soap", 10767: "Talk", 10768: "War & Politics"
}

def fetch_trending_content(content_type="all", time_window="day"):
    """
    Fetch real-time trending content from TMDB API.
    
    Parameters:
    - content_type: 'all', 'movie', or 'tv'
    - time_window: 'day' or 'week'
    
    Returns:
    - list of result dicts from the API's 'results' field (empty list on error)
    """
    url = f"{BASE_URL}/trending/{content_type}/{time_window}?api_key={API_KEY}"
    
    try:
        print(f"📡 Fetching trending {content_type} for {time_window}...")
        response = requests.get(url, timeout=10)  # don't hang forever on a stalled connection
        response.raise_for_status()  # Raise an exception for bad status codes
        
        data = response.json()
        results = data.get('results', [])
        
        print(f"✅ Successfully fetched {len(results)} trending titles!")
        return results
        
    except requests.exceptions.RequestException as e:
        print(f"❌ Error fetching data: {e}")
        return []

def process_streaming_data(raw_data):
    """
    Process raw TMDB API data into a clean pandas DataFrame.
    """
    processed_data = []
    
    for item in raw_data:
        # Determine if it's a movie or TV show
        content_type = "Movie" if "title" in item else "TV Show"
        title = item.get("title") or item.get("name", "Unknown")
        
        # Convert genre IDs to genre names
        genre_ids = item.get("genre_ids", [])
        genres = [GENRE_MAP.get(gid, f"Unknown_{gid}") for gid in genre_ids]
        genre_string = ",".join(genres) if genres else "Unknown"
        
        # Get release date
        release_date = item.get("release_date") or item.get("first_air_date") or "Unknown"  # empty strings also become "Unknown"
        
        processed_item = {
            "title": title,
            "content_type": content_type,
            "vote_average": item.get("vote_average", 0),
            "popularity": item.get("popularity", 0),
            "vote_count": item.get("vote_count", 0),
            "genre_ids": genre_string,  # genre *names*, comma-separated (column name kept for later cells)
            "release_date": release_date,
            "overview": item.get("overview", ""),
            "original_language": item.get("original_language", "Unknown"),
            "adult": item.get("adult", False),
            "tmdb_id": item.get("id", 0)
        }
        processed_data.append(processed_item)
    
    return pd.DataFrame(processed_data)

# Test API connection
print("🧪 Testing TMDB API connection...")
test_response = requests.get(f"{BASE_URL}/configuration?api_key={API_KEY}", timeout=10)
if test_response.status_code == 200:
    print("✅ TMDB API connection successful!")
    print("🎬 Ready to fetch real-time data!")
else:
    print(f"❌ API connection failed: {test_response.status_code}")

🧪 Testing TMDB API connection...
✅ TMDB API connection successful!
🎬 Ready to fetch real-time data!
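`fetch_trending_content` assembles its URL with an f-string. As a minimal sketch, assuming only the standard library (the helper name `build_trending_url` is hypothetical, not part of the notebook or of TMDB), the same trending URL can be built with `urllib.parse.urlencode`, which percent-encodes query values and lets the function reject invalid `content_type`/`time_window` arguments before any network call is made:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.themoviedb.org/3"
VALID_TYPES = {"all", "movie", "tv"}      # values listed in fetch_trending_content's docstring
VALID_WINDOWS = {"day", "week"}

def build_trending_url(content_type: str, time_window: str, api_key: str) -> str:
    """Hypothetical helper: return the trending URL after validating the arguments."""
    if content_type not in VALID_TYPES:
        raise ValueError(f"content_type must be one of {sorted(VALID_TYPES)}")
    if time_window not in VALID_WINDOWS:
        raise ValueError(f"time_window must be one of {sorted(VALID_WINDOWS)}")
    query = urlencode({"api_key": api_key})  # percent-encodes the key if it has special characters
    return f"{BASE_URL}/trending/{content_type}/{time_window}?{query}"

print(build_trending_url("movie", "day", "YOUR_KEY"))
```

Validating first means a typo such as `"movies"` fails immediately with a clear message instead of surfacing as an HTTP error from the API.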


## 📡 3. Fetching Live Streaming Data

Now let's fetch **actual current trending content** from TMDB! This is what's trending RIGHT NOW globally.

In [5]:
# Fetch current trending content
print("🌍 FETCHING REAL-TIME GLOBAL STREAMING TRENDS")
print("=" * 55)

# Get trending movies and TV shows separately for better analysis
trending_movies = fetch_trending_content("movie", "day")
trending_tv = fetch_trending_content("tv", "day")

# Combine and process the data
all_trending = trending_movies + trending_tv
df = process_streaming_data(all_trending)

print(f"\n📊 LIVE DATA SUMMARY:")
print(f"   Total trending titles: {len(df)}")
print(f"   Movies: {len(df[df['content_type'] == 'Movie'])}")
print(f"   TV Shows: {len(df[df['content_type'] == 'TV Show'])}")
print(f"   Data timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

print("\n🎉 SUCCESS! You're now analyzing REAL live streaming data!")

🌍 FETCHING REAL-TIME GLOBAL STREAMING TRENDS
📡 Fetching trending movie for day...
✅ Successfully fetched 20 trending titles!
📡 Fetching trending tv for day...
✅ Successfully fetched 20 trending titles!

📊 LIVE DATA SUMMARY:
   Total trending titles: 40
   Movies: 20
   TV Shows: 20
   Data timestamp: 2025-08-10 10:41:53

🎉 SUCCESS! You're now analyzing REAL live streaming data!
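`process_streaming_data` relies on conventions visible in its code: movie results carry `title`/`release_date`, TV results carry `name`/`first_air_date`, and genres arrive as numeric IDs. The sketch below applies the same rules to two hand-written records (made-up sample dicts, not real API output) so the mapping can be checked offline. `normalize_item` is a hypothetical name, `GENRE_MAP` is trimmed to the IDs used, and one small difference is deliberate: an empty date string also becomes `"Unknown"`.

```python
# Trimmed copy of the notebook's GENRE_MAP, enough for the sample records
GENRE_MAP = {28: "Action", 12: "Adventure", 18: "Drama", 10759: "Action & Adventure"}

def normalize_item(item: dict) -> dict:
    """Hypothetical helper mirroring the per-item logic of process_streaming_data."""
    content_type = "Movie" if "title" in item else "TV Show"
    title = item.get("title") or item.get("name") or "Unknown"
    # Unmapped IDs are kept visible as Unknown_<id>, as in the notebook
    genres = [GENRE_MAP.get(gid, f"Unknown_{gid}") for gid in item.get("genre_ids", [])]
    release = item.get("release_date") or item.get("first_air_date") or "Unknown"
    return {
        "title": title,
        "content_type": content_type,
        "genre_ids": ",".join(genres) if genres else "Unknown",
        "release_date": release,
    }

# Made-up sample records shaped like TMDB trending results
movie = {"title": "Sample Film", "genre_ids": [28, 12], "release_date": "2025-07-01"}
show = {"name": "Sample Series", "genre_ids": [10759, 99999], "first_air_date": ""}

print(normalize_item(movie))
print(normalize_item(show))
```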


## 👀 4. First Look at LIVE Data

Let's explore the **actual current trending content** using pandas fundamentals!

In [6]:
# 📋 Basic dataset information
print("🎬 LIVE GLOBAL STREAMING TRENDS ANALYSIS")
print("=" * 50)
print(f"📊 Dataset shape: {df.shape[0]} rows × {df.shape[1]} columns")
print(f"📅 Live data timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"🌍 Source: TMDB API (real-time data)")
print("\n🔥 First 5 rows (trending movies in API order; movies were added before TV shows):")
display(df.head())

🎬 LIVE GLOBAL STREAMING TRENDS ANALYSIS
📊 Dataset shape: 40 rows × 11 columns
📅 Live data timestamp: 2025-08-10 10:41:55
🌍 Source: TMDB API (real-time data)

🔥 First 5 rows (trending movies in API order; movies were added before TV shows):


| | title | content_type | vote_average | popularity | vote_count | genre_ids | release_date | overview | original_language | adult | tmdb_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jurassic World Rebirth | Movie | 6.41 | 1236.3748 | 1283 | Science Fiction,Adventure,Action | 2025-07-01 | Five years after the events of Jurassic World ... | en | False | 1234821 |
| 1 | Weapons | Movie | 7.8 | 208.1814 | 143 | Horror,Mystery | 2025-08-06 | When all but one child from the same class mys... | en | False | 1078605 |
| 2 | The Pickup | Movie | 6.75 | 477.8545 | 150 | Action,Comedy,Crime | 2025-07-27 | A routine cash pickup takes a wild turn when m... | en | False | 1106289 |
| 3 | How to Train Your Dragon | Movie | 8.026 | 396.9524 | 1513 | Fantasy,Family,Action,Adventure | 2025-06-06 | On the rugged isle of Berk, where Vikings and ... | en | False | 1087192 |
| 4 | 28 Years Later | Movie | 6.881 | 266.1237 | 1018 | Horror,Thriller,Science Fiction | 2025-06-18 | Twenty-eight years since the rage virus escape... | en | False | 1100988 |
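Because `df` concatenates the movie results before the TV results, `df.head()` shows the first movies in API order, not the most popular titles overall. A sketch on a small invented frame (titles and scores are made up) contrasting positional `head` with ranked `DataFrame.nlargest`:

```python
import pandas as pd

# Invented sample: two movies followed by two TV shows, like the notebook's concat order
toy = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Show C", "Show D"],
    "content_type": ["Movie", "Movie", "TV Show", "TV Show"],
    "popularity": [120.0, 45.0, 300.0, 80.0],
})

first_two = toy.head(2)                    # positional: first rows as stored
top_two = toy.nlargest(2, "popularity")    # ranked: highest popularity first
print(first_two["title"].tolist())
print(top_two["title"].tolist())
```

In the notebook, `df.nlargest(5, 'popularity')` would give the five most popular titles across both movies and TV.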


In [8]:
print("All current trending titles (live data)")
print("=" * 50)
print(f"Data timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Total titles: {len(df)}")
print("\nPrimary (first-listed) genre of each trending title:")
# display(df)  # uncomment to show the full table instead
print("=" * 50)
for _, row in df.iterrows():
    # genre_ids holds comma-separated genre names; the first one is treated as primary
    primary_genre = row['genre_ids'].split(',')[0] if row['genre_ids'] else 'Unknown'
    print("primary genre:", primary_genre)

All current trending titles (live data)
Data timestamp: 2025-08-10 10:45:37
Total titles: 40

Primary (first-listed) genre of each trending title:
primary genre: Science Fiction
primary genre: Horror
primary genre: Action
primary genre: Fantasy
primary genre: Horror
primary genre: Animation
primary genre: Action
primary genre: Science Fiction
primary genre: Science Fiction
primary genre: Action
primary genre: Action
primary genre: Animation
primary genre: Science Fiction
primary genre: Science Fiction
primary genre: Documentary
primary genre: Comedy
primary genre: Fantasy
primary genre: Horror
primary genre: Action
primary genre: Horror
primary genre: Sci-Fi & Fantasy
primary genre: Drama
primary genre: Action & Adventure
primary genre: Crime
primary genre: Action & Adventure
primary genre: Drama
primary genre: Animation
primary genre: Animation
primary genre: Action & Adventure
primary genre: Drama
primary genre: Animation
primary genre: Sci-Fi & Fantasy
primary genre: Action & Adventure
primar
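The loop above walks the rows with `iterrows`. The same "first-listed genre" can be derived in one vectorized step with the `.str` accessor; a sketch on invented genre strings:

```python
import pandas as pd

genres = pd.Series(["Science Fiction,Adventure,Action", "Horror,Mystery", "Unknown"])
primary = genres.str.split(",").str[0]   # first element of each split list
print(primary.tolist())
print(primary.value_counts())
```

Applied to the notebook, `df['genre_ids'].str.split(',').str[0]` would produce the whole primary-genre column at once, ready for `value_counts()`.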

In [5]:
# 🔍 Dataset structure and data types
print("📋 LIVE DATASET STRUCTURE ANALYSIS")
print("=" * 40)
print("\n💾 Memory usage and data types:")
df.info()

print("\n📏 Column details:")
for i, col in enumerate(df.columns, 1):
    non_null = df[col].count()
    print(f"{i:2d}. {col:<20}: {non_null}/{len(df)} non-null values")

📋 LIVE DATASET STRUCTURE ANALYSIS

💾 Memory usage and data types:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 11 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   title              40 non-null     object 
 1   content_type       40 non-null     object 
 2   vote_average       40 non-null     float64
 3   popularity         40 non-null     float64
 4   vote_count         40 non-null     int64  
 5   genre_ids          40 non-null     object 
 6   release_date       40 non-null     object 
 7   overview           40 non-null     object 
 8   original_language  40 non-null     object 
 9   adult              40 non-null     bool   
 10  tmdb_id            40 non-null     int64  
dtypes: bool(1), float64(2), int64(2), object(6)
memory usage: 3.3+ KB

📏 Column details:
 1. title               : 40/40 non-null values
 2. content_type        : 40/40 non-null values
 3. vote_average     

In [6]:
# 📊 Statistical summary of current trending data
print("📈 LIVE DATA STATISTICAL SUMMARY")
print("=" * 40)
print("\n🔢 Current numerical trends:")
display(df.describe())

print("\n🎭 Content type distribution (live):")
content_dist = df['content_type'].value_counts()
for content_type, count in content_dist.items():
    percentage = (count / len(df)) * 100
    print(f"   {content_type}: {count} titles ({percentage:.1f}%)")

📈 LIVE DATA STATISTICAL SUMMARY

🔢 Current numerical trends:


| | vote_average | popularity | vote_count | tmdb_id |
|---|---|---|---|---|
| count | 40.0 | 40.0 | 40.0 | 40.0 |
| mean | 7.649325 | 210.060645 | 2252.975 | 543593.2 |
| std | 0.99978 | 362.730996 | 4788.249734 | 486464.7 |
| min | 4.467 | 16.1332 | 6.0 | 1396.0 |
| 25% | 6.9935 | 43.61895 | 112.0 | 109944.8 |
| 50% | 7.777 | 98.14645 | 740.5 | 278371.5 |
| 75% | 8.371 | 194.48215 | 1537.5 | 1064002.0 |
| max | 9.667 | 1951.6717 | 25343.0 | 1513598.0 |



🎭 Content type distribution (live):
   Movie: 20 titles (50.0%)
   TV Show: 20 titles (50.0%)
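The percentage loop divides each count by `len(df)`. pandas can return the shares directly with `value_counts(normalize=True)`; a sketch on an invented column:

```python
import pandas as pd

content_type = pd.Series(["Movie", "TV Show", "Movie", "Movie"])
shares = content_type.value_counts(normalize=True) * 100   # percentages that sum to 100
for label, pct in shares.items():
    print(f"   {label}: {pct:.1f}%")
```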


### 🔍 Key Observations from LIVE Data

**What's Trending RIGHT NOW:**
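- **Real-time snapshot**: This data reflects what is trending on TMDB today, a proxy for audience attention rather than a direct count of viewers
- **Global trends**: Our requests pass no country filter, so the lists give a rough worldwide view of interest among TMDB users
- **Current popularity**: `popularity` is TMDB's own activity-based score, useful for ranking titles against each other rather than as an absolute audience size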
- **Fresh content**: Mix of new releases and enduring favorites

**Questions We'll Answer:**
1. What genres are dominating today's trends?
2. Are movies or TV shows more popular right now?
3. What's the relationship between ratings and current popularity?
4. Which languages/regions are trending globally?

## 🔍 5. Live Data Quality Assessment

Let's check the quality of our real-time data and handle any issues.

In [None]:
# 🔍 Missing data analysis for live data
print("🕳️ LIVE DATA QUALITY ANALYSIS")
print("=" * 40)

missing_data = df.isnull().sum()
missing_percentage = (missing_data / len(df)) * 100

missing_summary = pd.DataFrame({
    'Missing Count': missing_data,
    'Missing %': missing_percentage.round(2)
})

print("\n📊 Missing data by column:")
missing_cols = missing_summary[missing_summary['Missing Count'] > 0]
if len(missing_cols) > 0:
    display(missing_cols)
    print(f"⚠️ Found {missing_summary['Missing Count'].sum()} missing values total")
else:
    print("✅ Excellent! No missing data in live dataset.")

# Check for empty or unknown values
print("\n🔍 Data completeness check:")
unknown_genres = len(df[df['genre_ids'].str.contains('Unknown', na=False)])
unknown_dates = len(df[df['release_date'] == 'Unknown'])
print(f"   Unknown genres: {unknown_genres} titles")
print(f"   Unknown release dates: {unknown_dates} titles")
print(f"   Zero ratings: {len(df[df['vote_average'] == 0])} titles")
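`isnull()` only detects real missing values (`NaN`/`None`/`pd.NA`), but `process_streaming_data` fills gaps with placeholders such as `"Unknown"`, `0` and `""`, so the first check can report "no missing data" while the completeness check still finds gaps. One option, sketched on invented rows, is to turn the string placeholders into `pd.NA` before counting:

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "release_date": ["2025-07-01", "Unknown", ""],
    "genre_ids": ["Drama", "Unknown", "Comedy"],
})

print(toy.isnull().sum().sum())   # placeholders are not counted as missing

# Treat the placeholder strings as missing values, then count again
cleaned = toy.replace(["Unknown", ""], pd.NA)
print(cleaned.isnull().sum())
```

Numeric placeholders such as a `vote_average` of `0` need a separate, deliberate decision, since 0 can also be a genuine value.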

In [None]:
# 🔧 Data preprocessing and enhancement
print("🔧 LIVE DATA PREPROCESSING")
print("=" * 35)

# Convert release_date to datetime (handle unknown dates)
df['release_date_clean'] = pd.to_datetime(df['release_date'], errors='coerce')
valid_dates = df['release_date_clean'].notna().sum()
print(f"✅ Converted {valid_dates}/{len(df)} release dates to datetime format")

# Create additional useful columns
df['release_year'] = df['release_date_clean'].dt.year
df['content_age_days'] = (datetime.now() - df['release_date_clean']).dt.days
df['is_recent'] = df['content_age_days'] <= 365  # Released in last year
df['popularity_tier'] = pd.qcut(df['popularity'], q=3, labels=['Low', 'Medium', 'High'])  # terciles: popularity is heavily right-skewed
df['rating_tier'] = pd.cut(df['vote_average'], bins=[0, 6, 8, 10], labels=['Average', 'Good', 'Excellent'],
                           include_lowest=True)  # keep 0-rated titles in the lowest tier

print("✅ Added enhanced columns:")
print("   • release_year: Year of release")
print("   • content_age_days: Age in days")
print("   • is_recent: Released in last 365 days")
print("   • popularity_tier: Low/Medium/High popularity (terciles)")
print("   • rating_tier: Average/Good/Excellent ratings")

# Show current data types
print("\n📋 Enhanced data structure:")
for col, dtype in df.dtypes.items():
    print(f"  {col:<20}: {dtype}")
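Binning a right-skewed score with `pd.cut(..., bins=3)` creates three equal-width intervals, so a single outlier can push almost every title into the lowest bin (in the recorded summary above, the maximum popularity is about 1952 while the median is about 98). A sketch on invented values comparing it with `pd.qcut`, which uses quantile (equal-count) bins:

```python
import pandas as pd

# Invented, right-skewed popularity scores: one outlier dominates the range
popularity = pd.Series([10, 20, 30, 40, 50, 1000])
labels = ["Low", "Medium", "High"]

equal_width = pd.cut(popularity, bins=3, labels=labels)   # same-width intervals
equal_count = pd.qcut(popularity, q=3, labels=labels)     # terciles

print(equal_width.value_counts().sort_index())   # Low 5, Medium 0, High 1
print(equal_count.value_counts().sort_index())   # 2 titles per tier
```

Quantile bins describe relative standing within today's list; equal-width bins only make sense when the score's scale is meaningful and not dominated by outliers.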

## 📊 6. Live Trending Analysis

Now let's analyze what's **actually trending right now** using pandas operations!

In [None]:
# 🔥 Current trending analysis
print("🔥 WHAT'S TRENDING RIGHT NOW")
print("=" * 35)

# Top trending by popularity
top_trending = df.nlargest(10, 'popularity')[['title', 'content_type', 'popularity', 'vote_average']]
print("\n🏆 Top 10 most popular titles TODAY:")
for i, (idx, row) in enumerate(top_trending.iterrows(), 1):
    print(f"  {i:2d}. {row['title']} ({row['content_type']})")
    print(f"       Popularity: {row['popularity']:.1f} | Rating: {row['vote_average']:.1f}/10")

# Rating analysis
print(f"\n⭐ CURRENT RATING LANDSCAPE:")
print(f"   Average rating: {df['vote_average'].mean():.2f}/10")
print(f"   Highest rated: {df['vote_average'].max():.1f}/10")
print(f"   Most votes: {df['vote_count'].max():,} votes")

# Content type performance
content_stats = df.groupby('content_type').agg({
    'popularity': 'mean',
    'vote_average': 'mean',
    'vote_count': 'mean'
}).round(2)

print("\n📺 Movies vs TV Shows (current trends):")
display(content_stats)
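The `agg` dict above produces columns named after their source columns. Named aggregation lets each output column say what it holds and adds a title count in the same call; a sketch on invented numbers:

```python
import pandas as pd

toy = pd.DataFrame({
    "content_type": ["Movie", "Movie", "TV Show", "TV Show"],
    "popularity": [100.0, 300.0, 50.0, 70.0],
    "vote_average": [6.0, 8.0, 7.5, 8.5],
})

summary = toy.groupby("content_type").agg(
    titles=("popularity", "count"),            # non-null rows per group
    avg_popularity=("popularity", "mean"),
    avg_rating=("vote_average", "mean"),
).round(2)
print(summary)
```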

In [None]:
# 🎭 Current genre trends
print("🎭 TODAY'S GENRE TRENDS")
print("=" * 30)

# Extract all genres from current trending content
all_current_genres = []
for genres in df['genre_ids']:
    if pd.notna(genres) and genres != 'Unknown':
        genre_list = [g.strip() for g in genres.split(',')]
        all_current_genres.extend(genre_list)

# Count current genre popularity
current_genre_counts = pd.Series(all_current_genres).value_counts()

print("\n🎬 Most trending genres TODAY:")
for i, (genre, count) in enumerate(current_genre_counts.head(10).items(), 1):
    percentage = (count / len(df)) * 100
    print(f"   {i:2d}. {genre:<15}: {count:2d} titles ({percentage:.1f}%)")

print(f"\n📊 Total unique genres trending: {len(current_genre_counts)}")
print(f"💡 Top 3 genres: {', '.join(current_genre_counts.head(3).index.tolist())}")
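The loop above splits each comma-separated genre string and extends a Python list. The same counts can be produced by chaining `str.split`, `explode`, and `value_counts`; a sketch on invented values:

```python
import pandas as pd

genre_strings = pd.Series(["Action,Comedy", "Drama", "Action,Drama,Crime", "Unknown"])

genre_counts = (
    genre_strings[genre_strings != "Unknown"]   # skip the placeholder, as the loop does
    .str.split(",")                            # list of genres per title
    .explode()                                 # one row per (title, genre) pair
    .str.strip()
    .value_counts()
)
print(genre_counts)
```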

In [None]:
# 🌍 Global language trends
print("🌍 GLOBAL LANGUAGE TRENDS")
print("=" * 30)

language_trends = df['original_language'].value_counts().head(10)
print("\n🗣️ Most popular languages in trending content:")
for i, (lang, count) in enumerate(language_trends.items(), 1):
    percentage = (count / len(df)) * 100
    print(f"   {i:2d}. {lang.upper():<4}: {count:2d} titles ({percentage:.1f}%)")

# Recent vs older content
recent_count = df['is_recent'].sum()
print(f"\n📅 CONTENT RECENCY:")
print(f"   Recent (last year): {recent_count} titles ({recent_count/len(df)*100:.1f}%)")
print(f"   Older content: {len(df) - recent_count} titles ({(len(df) - recent_count)/len(df)*100:.1f}%)")
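`is_recent` is computed as `content_age_days <= 365`. For titles whose release date could not be parsed, `content_age_days` is `NaN`, the comparison returns `False`, and those titles are silently counted as "older content". A sketch on invented ages showing the effect and one way to keep a separate bucket (the labels are hypothetical):

```python
import numpy as np
import pandas as pd

content_age_days = pd.Series([30.0, 900.0, np.nan])   # invented; NaN = unparseable date

is_recent = content_age_days <= 365
print(is_recent.tolist())   # NaN compares as False

# First matching condition wins; anything else falls through to the default
recency = np.select(
    [content_age_days.isna(), is_recent],
    ["Unknown date", "Recent"],
    default="Older",
)
print(recency.tolist())
```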

## 📈 7. Live Data Visualization

Let's create compelling visualizations of **current trending patterns**!

In [None]:
# 📊 Current trending content visualization
fig, axes = plt.subplots(2, 2, figsize=(18, 12))
fig.suptitle('LIVE GLOBAL STREAMING TRENDS ANALYSIS', fontsize=16, fontweight='bold')  # emoji glyphs are often missing from matplotlib's default font

# 1. Popularity distribution
axes[0,0].hist(df['popularity'], bins=15, color='lightcoral', alpha=0.7, edgecolor='black')
axes[0,0].axvline(df['popularity'].mean(), color='red', linestyle='--', 
                 label=f'Mean: {df["popularity"].mean():.1f}')
axes[0,0].set_title('Current Popularity Distribution')
axes[0,0].set_xlabel('Popularity Score')
axes[0,0].set_ylabel('Number of Titles')
axes[0,0].legend()
axes[0,0].grid(True, alpha=0.3)

# 2. Rating vs Popularity scatter
colors = ['red' if ct == 'Movie' else 'blue' for ct in df['content_type']]
axes[0,1].scatter(df['vote_average'], df['popularity'], c=colors, alpha=0.6, s=60)
axes[0,1].set_title('Rating vs Popularity (Live Data)')
axes[0,1].set_xlabel('Vote Average (Rating)')
axes[0,1].set_ylabel('Popularity Score')
axes[0,1].grid(True, alpha=0.3)

# Add legend for scatter plot
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor='red', label='Movies'),
                  Patch(facecolor='blue', label='TV Shows')]
axes[0,1].legend(handles=legend_elements)

# 3. Content type comparison
content_stats = df.groupby('content_type')[['popularity', 'vote_average']].mean()
x = np.arange(len(content_stats.index))
width = 0.35

axes[1,0].bar(x - width/2, content_stats['popularity'], width, 
              label='Avg Popularity', color='lightgreen', alpha=0.7)
axes[1,0].bar(x + width/2, content_stats['vote_average']*100, width, 
              label='Avg Rating (×100)', color='lightblue', alpha=0.7)
axes[1,0].set_title('Movies vs TV Shows Performance')
axes[1,0].set_xlabel('Content Type')
axes[1,0].set_ylabel('Score')
axes[1,0].set_xticks(x)
axes[1,0].set_xticklabels(content_stats.index)
axes[1,0].legend()
axes[1,0].grid(True, alpha=0.3)

# 4. Top genres
top_genres_plot = current_genre_counts.head(8)
axes[1,1].barh(range(len(top_genres_plot)), top_genres_plot.values, color='gold', alpha=0.7)
axes[1,1].set_title('Top Trending Genres Today')
axes[1,1].set_xlabel('Number of Titles')
axes[1,1].set_yticks(range(len(top_genres_plot)))
axes[1,1].set_yticklabels(top_genres_plot.index)
axes[1,1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("📊 Live Data Insights:")
print(f"   • Currently trending: {len(df)} titles globally")
print(f"   • Top genre today: {current_genre_counts.index[0]}")
print(f"   • Average popularity: {df['popularity'].mean():.1f}")
print(f"   • Average rating: {df['vote_average'].mean():.2f}/10")
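Popularity is strongly right-skewed (in the recorded summary, the mean of about 210 is roughly twice the median of about 98), so a linear 15-bin histogram crowds most titles into the first bar. A sketch, using NumPy only and invented scores, of log-spaced bin edges:

```python
import numpy as np

popularity = np.array([12.0, 35.0, 80.0, 150.0, 400.0, 2000.0])   # invented scores

# 16 edges -> 15 bins, evenly spaced on a log scale between min and max
edges = np.geomspace(popularity.min(), popularity.max(), num=16)
counts, _ = np.histogram(popularity, bins=edges)
print(np.round(edges, 1))
print(counts)
```

In the notebook, such edges could be passed as `bins=edges` to `axes[0,0].hist(df['popularity'], ...)`, followed by `axes[0,0].set_xscale('log')` so the bars appear evenly wide.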

In [None]:
# 🔗 Live data correlation analysis
print("🔗 LIVE DATA CORRELATION ANALYSIS")
print("=" * 40)

# Select numerical columns for correlation
numerical_cols = ['vote_average', 'popularity', 'vote_count', 'content_age_days']
correlation_data = df[numerical_cols].dropna()

# Calculate correlation matrix
correlation_matrix = correlation_data.corr()

# Create correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, 
            annot=True, 
            cmap='RdYlBu_r', 
            center=0,
            square=True,
            fmt='.3f',
            cbar_kws={'shrink': 0.8})

plt.title('Live Entertainment Data Correlations', fontsize=14, fontweight='bold')  # emoji glyphs are often missing from matplotlib's default font
plt.tight_layout()
plt.show()

# Highlight key correlations
rating_pop_corr = df['vote_average'].corr(df['popularity'])
votes_pop_corr = df['vote_count'].corr(df['popularity'])

print("\n🔍 Key correlations in current trends:")
print(f"   Rating ↔ Popularity: {rating_pop_corr:.3f}")
print(f"   Vote Count ↔ Popularity: {votes_pop_corr:.3f}")

if abs(rating_pop_corr) < 0.3:
    print("   💡 Insight: High ratings don't guarantee trending status!")
else:
    print("   💡 Insight: Quality and popularity are linked in current trends")
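`Series.corr` defaults to Pearson correlation, which a single very popular outlier can dominate. Spearman rank correlation is less sensitive to such outliers; it equals the Pearson correlation of the ranks, so it can be sketched with `rank()` alone (the numbers below are invented):

```python
import pandas as pd

rating = pd.Series([6.1, 6.8, 7.2, 7.9, 8.4])
popularity = pd.Series([40.0, 55.0, 70.0, 90.0, 1900.0])  # one extreme outlier

pearson = rating.corr(popularity)                   # linear association
spearman = rating.rank().corr(popularity.rank())    # Pearson on ranks = Spearman
print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Here the relationship is perfectly monotonic, so the rank correlation is 1 even though the outlier pulls the Pearson value down. `DataFrame.corr` also accepts a `method='spearman'` argument for the heatmap.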

## 🎯 8. Live Data Business Insights

Let's extract actionable insights from **today's actual trending data**!

In [None]:
# 🏆 Current success patterns
print("🏆 LIVE SUCCESS PATTERN ANALYSIS")
print("=" * 45)

# Define success based on current data
high_pop_threshold = df['popularity'].quantile(0.75)
high_rating_threshold = df['vote_average'].quantile(0.75)

# Categorize current trending content
df['current_success'] = 'Standard Trending'
df.loc[(df['popularity'] >= high_pop_threshold) & 
       (df['vote_average'] >= high_rating_threshold), 'current_success'] = 'Viral Hit'
df.loc[(df['vote_average'] >= high_rating_threshold) & 
       (df['popularity'] < high_pop_threshold), 'current_success'] = 'Critical Darling'
df.loc[(df['popularity'] >= high_pop_threshold) & 
       (df['vote_average'] < high_rating_threshold), 'current_success'] = 'Popular Buzz'

success_dist = df['current_success'].value_counts()
print("📊 Current trending success categories:")
for category, count in success_dist.items():
    percentage = (count / len(df)) * 100
    print(f"   {category:<18}: {count:2d} titles ({percentage:.1f}%)")

# Show viral hits
viral_hits = df[df['current_success'] == 'Viral Hit']
if len(viral_hits) > 0:
    print(f"\n🌟 Current Viral Hits ({len(viral_hits)} titles):")
    for idx, row in viral_hits.iterrows():
        print(f"   • {row['title']} ({row['content_type']})")
        print(f"     Rating: {row['vote_average']}/10 | Popularity: {row['popularity']:.1f}")

# Language success patterns
print(f"\n🌍 Global reach analysis:")
non_english = len(df[df['original_language'] != 'en'])
print(f"   Non-English content: {non_english}/{len(df)} ({non_english/len(df)*100:.1f}%)")
if non_english > 0:
    print("   💡 Global content is successfully trending!")
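The four categories above are assigned by successive `.loc` writes, so the final label depends on statement order. `np.select` states the same rules in one place and picks the first matching condition. A sketch with invented values and invented fixed thresholds (the notebook uses 75th-percentile thresholds instead):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "popularity":   [900.0, 50.0, 800.0, 40.0],
    "vote_average": [8.5, 8.7, 6.0, 6.5],
})
high_pop = toy["popularity"] >= 500          # invented threshold
high_rating = toy["vote_average"] >= 8.0     # invented threshold

toy["current_success"] = np.select(
    [high_pop & high_rating, high_rating & ~high_pop, high_pop & ~high_rating],
    ["Viral Hit", "Critical Darling", "Popular Buzz"],
    default="Standard Trending",
)
print(toy)
```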

In [None]:
# 💼 Market opportunities from live data
print("💼 CURRENT MARKET OPPORTUNITIES")
print("=" * 40)

# Genre performance analysis
genre_performance = []
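for genre in current_genre_counts.head(8).index:
    # Exact match on the comma-separated list; str.contains would also match "Action" inside "Action & Adventure"
    genre_content = df[df['genre_ids'].str.split(',').apply(lambda gs: genre in gs)]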
    if len(genre_content) > 0:
        avg_rating = genre_content['vote_average'].mean()
        avg_popularity = genre_content['popularity'].mean()
        count = len(genre_content)
        
        genre_performance.append({
            'genre': genre,
            'count': count,
            'avg_rating': avg_rating,
            'avg_popularity': avg_popularity
        })

genre_df = pd.DataFrame(genre_performance)
genre_df = genre_df.sort_values('avg_popularity', ascending=False)

print("🎭 Genre performance in current trends:")
print("-" * 55)
print(f"{'Genre':<15} {'Count':<6} {'Avg Rating':<12} {'Avg Popularity':<15}")
print("-" * 55)

for _, row in genre_df.iterrows():
    print(f"{row['genre']:<15} {row['count']:<6} {row['avg_rating']:<12.2f} {row['avg_popularity']:<15.1f}")

# Strategic recommendations
print("\n🎯 STRATEGIC RECOMMENDATIONS (based on live data):")
print("-" * 60)

top_genre = genre_df.iloc[0]
print(f"1. 🏆 Focus on {top_genre['genre']} content")
print(f"   Currently most successful with {top_genre['avg_popularity']:.1f} avg popularity")

movies_avg = df[df['content_type'] == 'Movie']['popularity'].mean()
tv_avg = df[df['content_type'] == 'TV Show']['popularity'].mean()

if movies_avg > tv_avg:
    print(f"2. 🎬 Movies are trending higher today ({movies_avg:.1f} vs {tv_avg:.1f})")
    print(f"   Consider prioritizing movie content")
else:
    print(f"2. 📺 TV Shows are trending higher today ({tv_avg:.1f} vs {movies_avg:.1f})")
    print(f"   Consider prioritizing series content")

recent_trending = df[df['is_recent'] == True]
if len(recent_trending) > len(df) / 2:
    print(f"3. 🆕 Fresh content dominates ({len(recent_trending)}/{len(df)} recent titles)")
    print(f"   Focus on new releases for trending success")
else:
    print(f"3. 📚 Classic content is strong ({len(df) - len(recent_trending)}/{len(df)} older titles)")
    print(f"   Evergreen content still has trending power")
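`str.contains` performs a substring match (a regular-expression match by default), so searching for `"Action"` also hits titles tagged only `"Action & Adventure"`, and `"Fantasy"` hits `"Sci-Fi & Fantasy"`. A sketch on invented genre strings comparing substring matching with exact matching on the split list:

```python
import pandas as pd

genre_strings = pd.Series(["Action & Adventure,Drama", "Action,Comedy", "Sci-Fi & Fantasy"])

substring_hit = genre_strings.str.contains("Action", regex=False)
exact_hit = genre_strings.str.split(",").apply(lambda genres: "Action" in genres)

print(substring_hit.tolist())
print(exact_hit.tolist())
```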

## 📋 9. Live Data Key Findings

Let's summarize the most important discoveries from **today's actual trending data**!

In [None]:
# 📋 Generate live data summary
print("🎬 LIVE GLOBAL STREAMING TRENDS - KEY FINDINGS")
print("=" * 55)
print(f"📅 Live analysis timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"📊 Dataset: {len(df)} currently trending titles worldwide")
print(f"🌍 Source: TMDB API (real-time data)")
print("\n" + "=" * 55)

print("\n🏆 TODAY'S TOP FINDINGS:")
print("-" * 25)

# Finding 1: Current content mix
movie_count = len(df[df['content_type'] == 'Movie'])
tv_count = len(df[df['content_type'] == 'TV Show'])
print(f"1. 📺 Live Content Mix: {movie_count} movies, {tv_count} TV shows")
if movie_count > tv_count:
    print(f"   Movies dominate today's trends")
elif tv_count > movie_count:
    print(f"   TV shows dominate today's trends")
else:
    print(f"   Even split in today's trending content")

# Finding 2: Quality vs popularity today
today_correlation = df['vote_average'].corr(df['popularity'])
print(f"\n2. 🔗 Today's Success Pattern: Rating vs Popularity = {today_correlation:.3f}")
if abs(today_correlation) < 0.3:
    print(f"   Today: Quality doesn't guarantee viral success!")
else:
    print(f"   Today: Quality and virality are linked")

# Finding 3: Current genre dominance
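top_3_current = current_genre_counts.head(3)
# Count distinct titles carrying at least one top-3 genre (summing the counts would double-count multi-genre titles)
titles_with_top3 = df['genre_ids'].str.split(',').apply(lambda gs: any(g in top_3_current.index for g in gs)).sum()
print(f"\n3. 🎭 Today's Genre Leaders: {', '.join(top_3_current.index)}")
print(f"   {titles_with_top3}/{len(df)} trending titles today carry at least one of these genres")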

# Finding 4: Global reach
english_count = len(df[df['original_language'] == 'en'])
global_count = len(df) - english_count
print(f"\n4. 🌍 Global Diversity: {global_count}/{len(df)} non-English titles trending")
print(f"   International content: {global_count/len(df)*100:.1f}% of trends")

# Finding 5: Today's champions
most_popular_now = df.loc[df['popularity'].idxmax()]
highest_rated_now = df.loc[df['vote_average'].idxmax()]
print(f"\n5. 🌟 Today's Champions:")
print(f"   Most Popular NOW: {most_popular_now['title']} (Pop: {most_popular_now['popularity']:.0f})")
print(f"   Highest Rated NOW: {highest_rated_now['title']} ({highest_rated_now['vote_average']}/10)")

print("\n" + "=" * 55)
print("💡 LIVE DATA BUSINESS INSIGHTS:")
print("-" * 35)
print(f"• Today's winning genre: {current_genre_counts.index[0]}")
print(f"• Current average rating: {df['vote_average'].mean():.2f}/10")
print(f"• Global content is {global_count/len(df)*100:.1f}% of trends")
print(f"• Fresh vs classic content split: {len(recent_trending)}/{len(df)} are recent")
print(f"• Real-time data reveals actual audience preferences")

print("\n🎯 NEXT STEPS:")
print("-" * 15)
print("Tomorrow: Clean this live data and engineer features")
print("Day 3: Build popularity prediction model with real data")
print("Week 2: Advanced recommendation algorithms")
print("Week 7: Deploy live 'What Should I Watch?' web app")

print("\n✅ Day 1 Complete: LIVE Entertainment Data EDA Mastery Achieved! 🎉")
print("🚀 You analyzed REAL trending data - this is actual industry-level analysis!")
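Adding up per-genre counts tallies a title once for every leading genre it carries, so a title tagged both Action and Adventure is counted twice. A sketch on invented data contrasting the summed counts with the number of distinct titles that carry at least one leading genre:

```python
import pandas as pd

genre_strings = pd.Series(["Action,Adventure", "Action,Drama", "Comedy", "Adventure"])
counts = genre_strings.str.split(",").explode().value_counts()
top = counts.head(2).index                      # the two most frequent genres

summed = int(counts.head(2).sum())              # per-genre tallies added together
distinct = int(
    genre_strings.str.split(",").apply(lambda gs: any(g in top for g in gs)).sum()
)
print(summed, distinct)
```

The distinct count can never exceed the number of titles, which makes it the safer figure for statements like "X of N trending titles".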

## 🎓 Learning Reflection & Next Steps

### 📚 What You Mastered Today with LIVE Data

**API Integration Skills:**
- ✅ **Real-time data fetching** from TMDB API
- ✅ **Error handling** for API calls
- ✅ **Data transformation** from JSON to pandas
- ✅ **Professional data processing** workflows

**Enhanced Pandas Skills:**
- ✅ **Live data exploration** with `.head()`, `.info()`, `.describe()`
- ✅ **Real-time groupby** operations for current trends
- ✅ **Dynamic filtering** based on live popularity/ratings
- ✅ **Current pattern recognition** in entertainment data

**Professional Analysis:**
- ✅ **Industry-standard data sources** (TMDB API)
- ✅ **Real-time market analysis** capabilities
- ✅ **Live business insights** generation
- ✅ **Current trend visualization** techniques

### 🎯 Tomorrow's Challenge: Live Data Cleaning & Feature Engineering

**Day 2 Preview with Real Data:**
- Handle missing values in live streaming datasets
- Engineer features from real API responses
- Process text data (genres, overviews) from actual content
- Prepare live data for machine learning models

### 💼 Professional Impact

**What makes this analysis industry-ready:**
- ✅ **Real data source**: Live responses from the public TMDB API rather than a static sample file
- ✅ **Live insights**: Analysis reflects current market conditions
- ✅ **Scalable approach**: Code works for any day's trending data
- ✅ **Business relevance**: Insights directly applicable to content strategy

### 🔗 Real-World Applications

Your live data skills apply directly to:
- **Streaming platforms**: Real-time recommendation system updates
- **Content studios**: Live market trend monitoring
- **Marketing agencies**: Current campaign optimization
- **Investment firms**: Real-time entertainment market analysis
- **Any data role**: Live API integration and analysis

---

**🎉 Congratulations! You've completed Day 1 with REAL live streaming data!**

*You didn't just learn pandas - you built an industry-standard live data analysis pipeline!*

**Tomorrow we'll clean and engineer features from this live data to build powerful ML models!**