# Visibility Table Analysis: Comprehensive Data Exploration

## Overview
This notebook provides a detailed analysis of the `Visibility_Table.csv` dataset, which records the top-ranked Google search results for a set of topics on several dates. The analysis looks for patterns in media visibility, ranking behavior, geographic concentration, and outlet categories.

## Research Questions
1. **Algorithmic Patterns**: How does Google's algorithm prioritize different media outlets?
2. **Geographic Distribution**: Which countries dominate search results and why?
3. **Media Category Analysis**: What types of media outlets (Mainstream, Independent, Foreign) rank highest?
4. **Topic-Specific Patterns**: Do certain topics favor specific outlet types or countries?
5. **Temporal Trends**: How do rankings change over time for similar topics?
6. **Sponsored vs Organic**: What's the distribution of sponsored content?
7. **Ranking Dynamics**: What factors correlate with higher rankings?
8. **Media Diversity**: How diverse are the top 10 results for each topic?

## Dataset Structure
- **Date**: Search date
- **Platform**: Search platform (all Google in this dataset)
- **Topic**: Search query/topic
- **Rank**: Position in search results (1-10)
- **Media Outlet**: Name of the media organization
- **Category**: Type of media outlet
- **Sponsored Ad?**: Whether the result is sponsored
- **Country**: Country of the media outlet
- **Notes / Observations**: Qualitative observations about the ranking
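
Before any analysis, the column list above can be checked against a loaded frame. The following is a minimal sketch: the helper name `check_schema` and the toy rows are illustrative assumptions, not part of the dataset or of this notebook's pipeline.

```python
import pandas as pd

# Column names as listed in the Dataset Structure section
EXPECTED_COLUMNS = [
    "Date", "Platform", "Topic", "Rank", "Media Outlet",
    "Category", "Sponsored Ad?", "Country", "Notes / Observations",
]

def check_schema(frame: pd.DataFrame) -> dict:
    """Report missing columns and out-of-range ranks (hypothetical helper)."""
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    bad_ranks = 0
    if "Rank" in frame.columns:
        ranks = pd.to_numeric(frame["Rank"], errors="coerce")
        # Ranks are documented as 1-10; anything else (or non-numeric) is flagged
        bad_ranks = int((~ranks.between(1, 10)).sum())
    return {"missing_columns": missing, "bad_rank_rows": bad_ranks}

# Toy rows for illustration only (not taken from Visibility_Table.csv)
toy = pd.DataFrame({
    "Date": ["01/05/2025", "01/05/2025"],
    "Topic": ["example topic", "example topic"],
    "Rank": [1, 11],
    "Media Outlet": ["Outlet A", "Outlet B"],
})
report = check_schema(toy)
print(report)
```

On the real data the call would be `check_schema(df)` right after loading; an empty `missing_columns` list and zero `bad_rank_rows` suggest the file matches the structure described here.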


In [None]:
# Import all necessary libraries for comprehensive analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set visualization style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Configure display options for better data exploration
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 100)

print("Libraries imported successfully!")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"Matplotlib version: {plt.matplotlib.__version__}")
print(f"Seaborn version: {sns.__version__}")


In [None]:
# Load the dataset
df = pd.read_csv('../data/Visibility_Table.csv')

# Initial data inspection
print("=" * 80)
print("DATASET OVERVIEW")
print("=" * 80)
print(f"Total rows: {len(df):,}")
print(f"Total columns: {len(df.columns)}")
print(f"\nColumn names: {list(df.columns)}")
print(f"\nData types:\n{df.dtypes}")
print(f"\nFirst few rows:")
df.head(10)


In [None]:
# Comprehensive data quality check
print("=" * 80)
print("DATA QUALITY ASSESSMENT")
print("=" * 80)

# Check for missing values
print("\n1. MISSING VALUES:")
missing = df.isnull().sum()
missing_pct = (missing / len(df)) * 100
missing_df = pd.DataFrame({
    'Missing Count': missing,
    'Missing Percentage': missing_pct
})
print(missing_df[missing_df['Missing Count'] > 0])

# Check for duplicates
print(f"\n2. DUPLICATE ROWS: {df.duplicated().sum()}")

# Check data types and potential issues
print(f"\n3. DATA TYPE ISSUES:")
print(f"   Date column type: {df['Date'].dtype}")
print(f"   Rank column type: {df['Rank'].dtype}")
print(f"   Sponsored Ad column unique values: {df['Sponsored Ad?'].unique()}")

# Basic statistics
print(f"\n4. BASIC STATISTICS:")
print(f"   Unique dates: {df['Date'].nunique()}")
print(f"   Unique topics: {df['Topic'].nunique()}")
print(f"   Unique media outlets: {df['Media Outlet'].nunique()}")
print(f"   Unique countries: {df['Country'].nunique()}")
print(f"   Unique categories: {df['Category'].nunique()}")
print(f"   Rank range: {df['Rank'].min()} to {df['Rank'].max()}")


In [None]:
# Data cleaning and preprocessing
print("=" * 80)
print("DATA CLEANING & PREPROCESSING")
print("=" * 80)

# Convert Date to datetime
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y', errors='coerce')

# Clean up text columns (remove extra spaces, handle special cases)
df['Media Outlet'] = df['Media Outlet'].str.strip()
df['Category'] = df['Category'].str.strip()
df['Country'] = df['Country'].str.strip()
df['Topic'] = df['Topic'].str.strip()

# Handle N/A values in Country column
df['Country'] = df['Country'].replace('N/A', np.nan)

# Standardize Sponsored Ad column
df['Sponsored Ad?'] = df['Sponsored Ad?'].str.strip()

# Extract year, month, day for temporal analysis
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['DayOfWeek'] = df['Date'].dt.day_name()

# Create a topic identifier for easier analysis
df['Topic_ID'] = df.groupby('Topic').ngroup()

print("Data cleaning completed!")
print(f"\nDate range: {df['Date'].min()} to {df['Date'].max()}")
print(f"Total unique topics analyzed: {df['Topic'].nunique()}")


## Section 1: Temporal Analysis

**Question**: How does visibility change over time? Are there patterns in when topics are searched?

**Key Insights to Discover**:
- Distribution of searches across dates
- Day-of-week patterns
- Topic frequency over time
- Recency effects on rankings
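
One way to look at how visibility changes over time is to follow each outlet's rank for a single topic across dates. The sketch below assumes the same topic string recurs on more than one date, which may or may not hold in this dataset; `rank_trajectory` and the toy rows are hypothetical names and values used only for illustration.

```python
import pandas as pd

def rank_trajectory(frame: pd.DataFrame, topic: str) -> pd.DataFrame:
    """Dates as rows, outlets as columns, best (lowest) rank as values."""
    subset = frame[frame["Topic"] == topic]
    return subset.pivot_table(
        index="Date", columns="Media Outlet", values="Rank", aggfunc="min"
    ).sort_index()

# Toy rows for illustration only
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2025-01-01", "2025-01-01", "2025-01-08", "2025-01-08"]),
    "Topic": ["example topic"] * 4,
    "Media Outlet": ["Outlet A", "Outlet B", "Outlet A", "Outlet B"],
    "Rank": [1, 2, 3, 1],
})
trajectory = rank_trajectory(toy, "example topic")
# Positive change = the outlet moved down the page between snapshots
rank_change = trajectory.diff().dropna(how="all")
print(trajectory)
print(rank_change)
```

Outlets that are absent on a given date show up as `NaN` in the pivot, so gaps in coverage remain visible rather than being silently filled.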


In [None]:
# Temporal Analysis: Date Distribution
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Temporal Distribution Analysis', fontsize=16, fontweight='bold')

# 1. Searches by Date
date_counts = df.groupby('Date').size().sort_index()
axes[0, 0].plot(date_counts.index, date_counts.values, marker='o', linewidth=2, markersize=8)
axes[0, 0].set_title('Number of Search Results by Date', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Date')
axes[0, 0].set_ylabel('Number of Results')
axes[0, 0].tick_params(axis='x', rotation=45)
axes[0, 0].grid(True, alpha=0.3)

# 2. Day of Week Distribution
dow_counts = df['DayOfWeek'].value_counts().reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
axes[0, 1].bar(range(len(dow_counts)), dow_counts.values, color=sns.color_palette("husl", len(dow_counts)))
axes[0, 1].set_title('Search Results by Day of Week', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Day of Week')
axes[0, 1].set_ylabel('Number of Results')
axes[0, 1].set_xticks(range(len(dow_counts)))
axes[0, 1].set_xticklabels(dow_counts.index, rotation=45)
axes[0, 1].grid(True, alpha=0.3, axis='y')

# 3. Topics per Date
topics_per_date = df.groupby('Date')['Topic'].nunique().sort_index()
axes[1, 0].bar(range(len(topics_per_date)), topics_per_date.values, color='steelblue')
axes[1, 0].set_title('Unique Topics per Date', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Date Index')
axes[1, 0].set_ylabel('Number of Unique Topics')
axes[1, 0].tick_params(axis='x', rotation=45)
axes[1, 0].grid(True, alpha=0.3, axis='y')

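# 4. Cumulative Topics Over Time
# Count each topic once, on the date it first appears, so that topics
# repeated on later dates are not double-counted
cumulative_topics = df.groupby('Topic')['Date'].min().value_counts().sort_index().cumsum()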
axes[1, 1].plot(cumulative_topics.index, cumulative_topics.values, marker='o', linewidth=2, color='green', markersize=6)
axes[1, 1].set_title('Cumulative Unique Topics Over Time', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Date')
axes[1, 1].set_ylabel('Cumulative Unique Topics')
axes[1, 1].tick_params(axis='x', rotation=45)
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print summary statistics
print("=" * 80)
print("TEMPORAL ANALYSIS SUMMARY")
print("=" * 80)
print(f"Date range: {df['Date'].min().strftime('%Y-%m-%d')} to {df['Date'].max().strftime('%Y-%m-%d')}")
print(f"Total days covered: {(df['Date'].max() - df['Date'].min()).days + 1}")
print(f"Average results per day: {len(df) / df['Date'].nunique():.2f}")
print(f"Average unique topics per day: {df.groupby('Date')['Topic'].nunique().mean():.2f}")
print(f"\nDay of Week Distribution:")
print(dow_counts)


In [None]:
# Geographic Analysis
fig, axes = plt.subplots(2, 2, figsize=(18, 14))
fig.suptitle('Geographic Distribution Analysis', fontsize=16, fontweight='bold')

# 1. Country Distribution (Top 15)
country_counts = df['Country'].value_counts().head(15)
axes[0, 0].barh(range(len(country_counts)), country_counts.values, color=sns.color_palette("viridis", len(country_counts)))
axes[0, 0].set_yticks(range(len(country_counts)))
axes[0, 0].set_yticklabels(country_counts.index)
axes[0, 0].set_title('Top 15 Countries by Number of Appearances', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Number of Appearances')
axes[0, 0].invert_yaxis()
axes[0, 0].grid(True, alpha=0.3, axis='x')

# Add value labels
for i, v in enumerate(country_counts.values):
    axes[0, 0].text(v + 1, i, str(v), va='center', fontweight='bold')

# 2. Country Distribution by Rank Position
country_rank = df.groupby(['Country', 'Rank']).size().unstack(fill_value=0)
top_countries = df['Country'].value_counts().head(10).index
country_rank_top = country_rank.loc[top_countries]

axes[0, 1].bar(range(len(country_rank_top)), country_rank_top.sum(axis=1), 
               color='steelblue', alpha=0.7, label='Total')
axes[0, 1].set_title('Top 10 Countries: Total Appearances', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Country')
axes[0, 1].set_ylabel('Total Appearances')
axes[0, 1].set_xticks(range(len(country_rank_top)))
axes[0, 1].set_xticklabels(country_rank_top.index, rotation=45, ha='right')
axes[0, 1].grid(True, alpha=0.3, axis='y')

# 3. Average Rank by Country (Top 15)
avg_rank_by_country = df.groupby('Country')['Rank'].mean().sort_values().head(15)
axes[1, 0].barh(range(len(avg_rank_by_country)), avg_rank_by_country.values, 
                color=sns.color_palette("plasma", len(avg_rank_by_country)))
axes[1, 0].set_yticks(range(len(avg_rank_by_country)))
axes[1, 0].set_yticklabels(avg_rank_by_country.index)
axes[1, 0].set_title('Average Rank by Country (Lower is Better)', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Average Rank')
axes[1, 0].invert_yaxis()
axes[1, 0].grid(True, alpha=0.3, axis='x')

# 4. Country Distribution in Top 3 vs Bottom 3 Ranks
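top3_countries = df[df['Rank'] <= 3]['Country'].value_counts().head(10)
# Keep the full bottom-3 counts: truncating to 10 here would show a false 0
# for any top-3 country that falls outside the bottom-3 top 10
bottom3_countries = df[df['Rank'] >= 8]['Country'].value_counts()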

x = np.arange(len(top3_countries))
width = 0.35
axes[1, 1].bar(x - width/2, top3_countries.values, width, label='Top 3 Ranks', alpha=0.8)
axes[1, 1].bar(x + width/2, bottom3_countries.reindex(top3_countries.index, fill_value=0).values, 
              width, label='Bottom 3 Ranks', alpha=0.8)
axes[1, 1].set_title('Top 10 Countries: Top 3 vs Bottom 3 Ranks', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Country')
axes[1, 1].set_ylabel('Number of Appearances')
axes[1, 1].set_xticks(x)
axes[1, 1].set_xticklabels(top3_countries.index, rotation=45, ha='right')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

# Detailed statistics
print("=" * 80)
print("GEOGRAPHIC ANALYSIS SUMMARY")
print("=" * 80)
print(f"Total unique countries: {df['Country'].nunique()}")
print(f"\nTop 10 Countries by Appearances:")
print(country_counts.head(10))
print(f"\nCountry with most appearances: {country_counts.index[0]} ({country_counts.iloc[0]} appearances)")
print(f"Percentage of all results: {(country_counts.iloc[0] / len(df)) * 100:.2f}%")
print(f"\nUS vs Non-US Distribution:")
us_count = (df['Country'] == 'US').sum()
non_us_count = len(df) - us_count - df['Country'].isna().sum()
print(f"  US: {us_count} ({us_count/len(df)*100:.2f}%)")
print(f"  Non-US: {non_us_count} ({non_us_count/len(df)*100:.2f}%)")
print(f"  N/A: {df['Country'].isna().sum()} ({df['Country'].isna().sum()/len(df)*100:.2f}%)")


In [None]:
# Media Category Analysis
fig, axes = plt.subplots(2, 2, figsize=(18, 14))
fig.suptitle('Media Category Analysis', fontsize=16, fontweight='bold')

# 1. Category Distribution
category_counts = df['Category'].value_counts()
axes[0, 0].barh(range(len(category_counts)), category_counts.values, 
                color=sns.color_palette("Set2", len(category_counts)))
axes[0, 0].set_yticks(range(len(category_counts)))
axes[0, 0].set_yticklabels(category_counts.index)
axes[0, 0].set_title('Media Category Distribution', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Number of Appearances')
axes[0, 0].invert_yaxis()
axes[0, 0].grid(True, alpha=0.3, axis='x')

# Add value labels
for i, v in enumerate(category_counts.values):
    axes[0, 0].text(v + 1, i, str(v), va='center', fontweight='bold')

# 2. Average Rank by Category
avg_rank_by_category = df.groupby('Category')['Rank'].mean().sort_values()
axes[0, 1].barh(range(len(avg_rank_by_category)), avg_rank_by_category.values,
                color=sns.color_palette("coolwarm", len(avg_rank_by_category)))
axes[0, 1].set_yticks(range(len(avg_rank_by_category)))
axes[0, 1].set_yticklabels(avg_rank_by_category.index)
axes[0, 1].set_title('Average Rank by Category (Lower is Better)', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Average Rank')
axes[0, 1].invert_yaxis()
axes[0, 1].grid(True, alpha=0.3, axis='x')

# 3. Category Distribution by Rank Position
category_rank_dist = df.groupby(['Category', 'Rank']).size().unstack(fill_value=0)
top_categories = df['Category'].value_counts().head(8).index
category_rank_top = category_rank_dist.loc[top_categories]

# Create stacked bar chart
category_rank_top.plot(kind='bar', stacked=True, ax=axes[1, 0], 
                       colormap='tab20', width=0.8)
axes[1, 0].set_title('Rank Distribution by Top Categories', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Category')
axes[1, 0].set_ylabel('Number of Appearances')
axes[1, 0].tick_params(axis='x', rotation=45)
axes[1, 0].legend(title='Rank', bbox_to_anchor=(1.05, 1), loc='upper left')
axes[1, 0].grid(True, alpha=0.3, axis='y')

# 4. Mainstream vs Independent vs Foreign Analysis
# Categorize into broader groups
def categorize_media(cat):
    cat_lower = str(cat).lower()
    if 'mainstream' in cat_lower and 'foreign' not in cat_lower:
        return 'Mainstream (US/UK)'
    elif 'foreign' in cat_lower and 'mainstream' in cat_lower:
        return 'Foreign Mainstream'
    elif 'independent' in cat_lower:
        return 'Independent'
    elif 'public' in cat_lower or 'service' in cat_lower:
        return 'Public Media'
    elif 'state' in cat_lower or 'affiliated' in cat_lower:
        return 'State-Affiliated'
    elif 'aggregated' in cat_lower:
        return 'Aggregated'
    else:
        return 'Other'

df['Media_Type'] = df['Category'].apply(categorize_media)
media_type_counts = df['Media_Type'].value_counts()

axes[1, 1].pie(media_type_counts.values, labels=media_type_counts.index, autopct='%1.1f%%',
               startangle=90, colors=sns.color_palette("Set3", len(media_type_counts)))
axes[1, 1].set_title('Media Type Distribution', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()

# Detailed statistics
print("=" * 80)
print("MEDIA CATEGORY ANALYSIS SUMMARY")
print("=" * 80)
print(f"Total unique categories: {df['Category'].nunique()}")
print(f"\nCategory Distribution:")
print(category_counts)
print(f"\nAverage Rank by Category:")
print(avg_rank_by_category)
print(f"\nMedia Type Distribution:")
print(media_type_counts)


In [None]:
# Media Outlet Analysis
fig, axes = plt.subplots(2, 2, figsize=(20, 14))
fig.suptitle('Media Outlet Analysis', fontsize=16, fontweight='bold')

# 1. Top 20 Media Outlets by Appearances
outlet_counts = df['Media Outlet'].value_counts().head(20)
axes[0, 0].barh(range(len(outlet_counts)), outlet_counts.values,
                color=sns.color_palette("mako", len(outlet_counts)))
axes[0, 0].set_yticks(range(len(outlet_counts)))
axes[0, 0].set_yticklabels(outlet_counts.index)
axes[0, 0].set_title('Top 20 Media Outlets by Total Appearances', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Number of Appearances')
axes[0, 0].invert_yaxis()
axes[0, 0].grid(True, alpha=0.3, axis='x')

# Add value labels
for i, v in enumerate(outlet_counts.values):
    axes[0, 0].text(v + 0.5, i, str(v), va='center', fontweight='bold')

# 2. Average Rank for Top 20 Outlets
top_outlets = df['Media Outlet'].value_counts().head(20).index
avg_rank_outlets = df[df['Media Outlet'].isin(top_outlets)].groupby('Media Outlet')['Rank'].mean().sort_values()
axes[0, 1].barh(range(len(avg_rank_outlets)), avg_rank_outlets.values,
                color=sns.color_palette("rocket", len(avg_rank_outlets)))
axes[0, 1].set_yticks(range(len(avg_rank_outlets)))
axes[0, 1].set_yticklabels(avg_rank_outlets.index)
axes[0, 1].set_title('Average Rank for Top 20 Outlets (Lower is Better)', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Average Rank')
axes[0, 1].invert_yaxis()
axes[0, 1].grid(True, alpha=0.3, axis='x')

# 3. Outlets with Most #1 Rankings
rank1_outlets = df[df['Rank'] == 1]['Media Outlet'].value_counts().head(15)
axes[1, 0].bar(range(len(rank1_outlets)), rank1_outlets.values,
                color=sns.color_palette("viridis", len(rank1_outlets)))
axes[1, 0].set_title('Top 15 Outlets with Most #1 Rankings', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Media Outlet')
axes[1, 0].set_ylabel('Number of #1 Rankings')
axes[1, 0].set_xticks(range(len(rank1_outlets)))
axes[1, 0].set_xticklabels(rank1_outlets.index, rotation=45, ha='right')
axes[1, 0].grid(True, alpha=0.3, axis='y')

# Add value labels
for i, v in enumerate(rank1_outlets.values):
    axes[1, 0].text(i, v + 0.1, str(v), ha='center', fontweight='bold')

# 4. Market Concentration: Top 10 Outlets Share
top10_share = (outlet_counts.head(10).sum() / len(df)) * 100
top20_share = (outlet_counts.head(20).sum() / len(df)) * 100
total_outlets = df['Media Outlet'].nunique()

concentration_data = {
    'Top 10 Outlets': top10_share,
    'Top 20 Outlets': top20_share,
    'Remaining Outlets': 100 - top20_share
}

axes[1, 1].bar(concentration_data.keys(), concentration_data.values(),
               color=['#FF6B6B', '#4ECDC4', '#95E1D3'])
axes[1, 1].set_title('Market Concentration: Share of Total Appearances', fontsize=12, fontweight='bold')
axes[1, 1].set_ylabel('Percentage (%)')
axes[1, 1].grid(True, alpha=0.3, axis='y')

# Add value labels
for k, v in concentration_data.items():
    axes[1, 1].text(k, v + 1, f'{v:.1f}%', ha='center', fontweight='bold')

plt.tight_layout()
plt.show()

# Detailed statistics
print("=" * 80)
print("MEDIA OUTLET ANALYSIS SUMMARY")
print("=" * 80)
print(f"Total unique media outlets: {total_outlets}")
print(f"\nTop 10 Outlets by Appearances:")
print(outlet_counts.head(10))
print(f"\nTop 10 outlets account for {top10_share:.2f}% of all appearances")
print(f"Top 20 outlets account for {top20_share:.2f}% of all appearances")
print(f"\nOutlet with most appearances: {outlet_counts.index[0]} ({outlet_counts.iloc[0]} appearances)")
print(f"\nTop 10 Outlets with Most #1 Rankings:")
print(rank1_outlets.head(10))


## Section 5: Topic Analysis

**Question**: Which topics generate the most search results? Are there patterns in how different topics are covered?

**Key Insights to Discover**:
- Most searched topics
- Topic diversity
- Topic-specific media preferences
- Topic-country relationships
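
Topic-specific media preferences can be expressed as within-topic shares rather than raw counts, which keeps topics with different numbers of results comparable. This is a minimal sketch on toy rows (the topic and category values are placeholders, not values from the dataset), using `pd.crosstab` with `normalize="index"`.

```python
import pandas as pd

# Toy rows for illustration only
toy = pd.DataFrame({
    "Topic": ["topic one"] * 4 + ["topic two"] * 4,
    "Category": ["Mainstream", "Mainstream", "Independent", "Mainstream",
                 "Independent", "Independent", "Mainstream", "Independent"],
})

# Row-normalised crosstab: each row sums to 1, so cells are within-topic shares
category_share = pd.crosstab(toy["Topic"], toy["Category"], normalize="index")
print(category_share.round(2))
```

Swapping `"Category"` for `"Country"` gives the corresponding topic-country shares.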


In [None]:
# Topic Analysis
fig, axes = plt.subplots(2, 2, figsize=(20, 14))
fig.suptitle('Topic Analysis', fontsize=16, fontweight='bold')

# 1. Top 20 Topics by Number of Results
topic_counts = df['Topic'].value_counts().head(20)
axes[0, 0].barh(range(len(topic_counts)), topic_counts.values,
                color=sns.color_palette("flare", len(topic_counts)))
axes[0, 0].set_yticks(range(len(topic_counts)))
axes[0, 0].set_yticklabels(topic_counts.index, fontsize=9)
axes[0, 0].set_title('Top 20 Topics by Number of Search Results', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Number of Results')
axes[0, 0].invert_yaxis()
axes[0, 0].grid(True, alpha=0.3, axis='x')

# Add value labels
for i, v in enumerate(topic_counts.values):
    axes[0, 0].text(v + 0.2, i, str(v), va='center', fontweight='bold', fontsize=8)

# 2. Topic Length Distribution
df['Topic_Length'] = df['Topic'].str.len()
axes[0, 1].hist(df['Topic_Length'], bins=30, color='steelblue', alpha=0.7, edgecolor='black')
axes[0, 1].set_title('Distribution of Topic Length (Characters)', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Number of Characters')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].axvline(df['Topic_Length'].mean(), color='red', linestyle='--', 
                   linewidth=2, label=f'Mean: {df["Topic_Length"].mean():.1f}')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3, axis='y')

# 3. Average Rank by Topic (for topics with 10+ results)
# Named aggregation avoids aggregating the grouping column itself
topic_avg_rank = df.groupby('Topic')['Rank'].agg(Rank='mean', Count='size')
topic_avg_rank_filtered = topic_avg_rank[topic_avg_rank['Count'] >= 10].sort_values('Rank').head(15)

axes[1, 0].barh(range(len(topic_avg_rank_filtered)), topic_avg_rank_filtered['Rank'].values,
                color=sns.color_palette("coolwarm", len(topic_avg_rank_filtered)))
axes[1, 0].set_yticks(range(len(topic_avg_rank_filtered)))
axes[1, 0].set_yticklabels(topic_avg_rank_filtered.index, fontsize=9)
axes[1, 0].set_title('Average Rank by Topic (Topics with 10+ Results, Lower is Better)', 
                     fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Average Rank')
axes[1, 0].invert_yaxis()
axes[1, 0].grid(True, alpha=0.3, axis='x')

# 4. Topic Diversity: Number of Unique Outlets per Topic
topic_diversity = df.groupby('Topic')['Media Outlet'].nunique().sort_values(ascending=False).head(15)
axes[1, 1].bar(range(len(topic_diversity)), topic_diversity.values,
                color=sns.color_palette("mako", len(topic_diversity)))
axes[1, 1].set_title('Topic Diversity: Unique Outlets per Topic (Top 15)', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Topic')
axes[1, 1].set_ylabel('Number of Unique Outlets')
axes[1, 1].set_xticks(range(len(topic_diversity)))
axes[1, 1].set_xticklabels(topic_diversity.index, rotation=45, ha='right', fontsize=8)
axes[1, 1].grid(True, alpha=0.3, axis='y')

# Add value labels
for i, v in enumerate(topic_diversity.values):
    axes[1, 1].text(i, v + 0.2, str(v), ha='center', fontweight='bold', fontsize=8)

plt.tight_layout()
plt.show()

# Detailed statistics
print("=" * 80)
print("TOPIC ANALYSIS SUMMARY")
print("=" * 80)
print(f"Total unique topics: {df['Topic'].nunique()}")
print(f"Average results per topic: {len(df) / df['Topic'].nunique():.2f}")
print(f"\nTop 10 Topics by Appearances:")
print(topic_counts.head(10))
print(f"\nTopic Length Statistics:")
print(f"  Mean: {df['Topic_Length'].mean():.2f} characters")
print(f"  Median: {df['Topic_Length'].median():.2f} characters")
print(f"  Min: {df['Topic_Length'].min()} characters")
print(f"  Max: {df['Topic_Length'].max()} characters")
print(f"\nMost Diverse Topics (Most Unique Outlets):")
print(topic_diversity.head(10))


## Section 6: Ranking Dynamics Analysis

**Question**: How do rankings work? What factors influence position in search results?

**Key Insights to Discover**:
- Rank distribution patterns
- Rank stability across topics
- Factors correlating with higher ranks
- Rank-country relationships
- Rank-category relationships
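
To check whether rank differs by category beyond what the plots suggest, a rank-based test such as Kruskal-Wallis is one option. The sketch below uses toy rows and `scipy.stats.kruskal`. It is an assumption that this test suits the data: results from the same search page are not independent (each rank is used once per page), so any p-value should be read as indicative only.

```python
import pandas as pd
from scipy import stats

# Toy rows for illustration only
toy = pd.DataFrame({
    "Category": ["Mainstream"] * 8 + ["Independent"] * 8,
    "Rank": [1, 2, 1, 2, 3, 1, 2, 3, 8, 9, 10, 9, 8, 10, 9, 8],
})

# One array of ranks per category
samples = [group["Rank"].to_numpy() for _, group in toy.groupby("Category")]
h_stat, p_value = stats.kruskal(*samples)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

The same pattern works for `Country` or any other grouping column, provided each group has more than a handful of rows.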


In [None]:
# Ranking Dynamics Analysis
fig, axes = plt.subplots(2, 2, figsize=(18, 14))
fig.suptitle('Ranking Dynamics Analysis', fontsize=16, fontweight='bold')

# 1. Rank Distribution
rank_dist = df['Rank'].value_counts().sort_index()
axes[0, 0].bar(rank_dist.index, rank_dist.values, color=sns.color_palette("viridis", len(rank_dist)))
axes[0, 0].set_title('Distribution of Ranks', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Rank Position')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].set_xticks(range(1, 11))
axes[0, 0].grid(True, alpha=0.3, axis='y')

# Add value labels
for rank, count in rank_dist.items():
    axes[0, 0].text(rank, count + 1, str(count), ha='center', fontweight='bold')

# 2. Average Rank by Country (Top 15)
country_avg_rank = df.groupby('Country')['Rank'].mean().sort_values().head(15)
axes[0, 1].barh(range(len(country_avg_rank)), country_avg_rank.values,
                color=sns.color_palette("plasma", len(country_avg_rank)))
axes[0, 1].set_yticks(range(len(country_avg_rank)))
axes[0, 1].set_yticklabels(country_avg_rank.index)
axes[0, 1].set_title('Average Rank by Country (Top 15, Lower is Better)', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Average Rank')
axes[0, 1].invert_yaxis()
axes[0, 1].grid(True, alpha=0.3, axis='x')

# 3. Rank Distribution by Category
top_categories = df['Category'].value_counts().head(6).index

# Create box plot (one array of ranks per category)
category_rank_pivot = [df.loc[df['Category'] == cat, 'Rank'].dropna().values for cat in top_categories]
bp = axes[1, 0].boxplot(category_rank_pivot, patch_artist=True)
# Label ticks explicitly: the boxplot `labels` keyword was renamed in newer Matplotlib releases
axes[1, 0].set_xticklabels(top_categories)
for patch, color in zip(bp['boxes'], sns.color_palette("Set2", len(top_categories))):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)

axes[1, 0].set_title('Rank Distribution by Category (Top 6)', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Category')
axes[1, 0].set_ylabel('Rank')
axes[1, 0].tick_params(axis='x', rotation=45)
axes[1, 0].invert_yaxis()  # Lower rank (1) is better, so invert
axes[1, 0].grid(True, alpha=0.3, axis='y')

# 4. Top 3 vs Bottom 3 Rank Analysis
top3_data = df[df['Rank'] <= 3]
bottom3_data = df[df['Rank'] >= 8]

top3_countries = top3_data['Country'].value_counts().head(10)
# Keep the full bottom-3 counts so the reindex below does not report a false 0
bottom3_countries = bottom3_data['Country'].value_counts()

x = np.arange(len(top3_countries))
width = 0.35
axes[1, 1].bar(x - width/2, top3_countries.values, width, 
               label='Top 3 Ranks (1-3)', alpha=0.8, color='#2ecc71')
axes[1, 1].bar(x + width/2, bottom3_countries.reindex(top3_countries.index, fill_value=0).values, 
               width, label='Bottom 3 Ranks (8-10)', alpha=0.8, color='#e74c3c')
axes[1, 1].set_title('Country Distribution: Top 3 vs Bottom 3 Ranks', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Country')
axes[1, 1].set_ylabel('Number of Appearances')
axes[1, 1].set_xticks(x)
axes[1, 1].set_xticklabels(top3_countries.index, rotation=45, ha='right')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

# Detailed statistics
print("=" * 80)
print("RANKING DYNAMICS ANALYSIS SUMMARY")
print("=" * 80)
print(f"Rank Distribution:")
print(rank_dist)
print(f"\nAverage Rank: {df['Rank'].mean():.2f}")
print(f"Median Rank: {df['Rank'].median():.2f}")
print(f"\nRank 1 Appearances: {rank_dist[1]} ({(rank_dist[1]/len(df)*100):.2f}%)")
print(f"Rank 10 Appearances: {rank_dist[10]} ({(rank_dist[10]/len(df)*100):.2f}%)")
print(f"\nTop 10 Countries by Average Rank (Lower is Better):")
print(country_avg_rank.head(10))


## Section 7: Cross-Dimensional Analysis

**Question**: How do different dimensions interact? Country-Category, Topic-Country, etc.

**Key Insights to Discover**:
- Country-Category relationships
- Topic-Country patterns
- Category-Rank interactions
- Complex multi-dimensional patterns
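
A heatmap shows where two dimensions co-occur, but not how strong the association is. One common summary is a chi-square test of independence on the contingency table, together with Cramér's V as an effect size. The sketch below uses toy rows with a deliberately perfect association; the values are placeholders, not dataset results.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Toy rows for illustration only
toy = pd.DataFrame({
    "Country": ["US"] * 4 + ["UK"] * 4,
    "Category": ["Mainstream"] * 4 + ["Public Media"] * 4,
})

table = pd.crosstab(toy["Country"], toy["Category"])
# correction=False disables the Yates continuity correction that SciPy
# applies to 2x2 tables by default, so Cramér's V stays on its 0-1 scale
chi2, p_value, dof, expected = chi2_contingency(table, correction=False)

n = table.to_numpy().sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"chi2 = {chi2:.2f}, dof = {dof}, Cramér's V = {cramers_v:.2f}")
```

With many sparse country-category cells the chi-square approximation becomes unreliable, so restricting the table to the most frequent countries and categories is a reasonable precaution.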


In [None]:
# Cross-Dimensional Analysis
fig, axes = plt.subplots(2, 2, figsize=(20, 16))
fig.suptitle('Cross-Dimensional Analysis', fontsize=16, fontweight='bold')

# 1. Country-Category Heatmap (Top 10 Countries, Top 8 Categories)
top_countries = df['Country'].value_counts().head(10).index
top_categories = df['Category'].value_counts().head(8).index

country_category = df[df['Country'].isin(top_countries) & df['Category'].isin(top_categories)]
heatmap_data = country_category.groupby(['Country', 'Category']).size().unstack(fill_value=0)

sns.heatmap(heatmap_data, annot=True, fmt='d', cmap='YlOrRd', ax=axes[0, 0], 
            cbar_kws={'label': 'Count'}, linewidths=0.5)
axes[0, 0].set_title('Country-Category Heatmap (Top 10 Countries, Top 8 Categories)', 
                     fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Category')
axes[0, 0].set_ylabel('Country')
axes[0, 0].tick_params(axis='x', rotation=45)

# 2. Topic-Country Analysis (Top Topics, Top Countries)
top_topics = df['Topic'].value_counts().head(10).index
topic_country = df[df['Topic'].isin(top_topics) & df['Country'].isin(top_countries)]
topic_country_pivot = topic_country.groupby(['Topic', 'Country']).size().unstack(fill_value=0)

sns.heatmap(topic_country_pivot, annot=True, fmt='d', cmap='Blues', ax=axes[0, 1],
            cbar_kws={'label': 'Count'}, linewidths=0.5)
axes[0, 1].set_title('Topic-Country Heatmap (Top 10 Topics, Top 10 Countries)', 
                     fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Country')
axes[0, 1].set_ylabel('Topic')
axes[0, 1].tick_params(axis='x', rotation=45)
axes[0, 1].tick_params(axis='y', rotation=0, labelsize=8)

# 3. Category-Rank Distribution Heatmap (counts per rank, not averages)
category_rank_avg = df.groupby(['Category', 'Rank']).size().unstack(fill_value=0)
# Normalize by category to show distribution
category_rank_pct = category_rank_avg.div(category_rank_avg.sum(axis=1), axis=0) * 100

top_categories_all = df['Category'].value_counts().head(10).index
category_rank_pct_top = category_rank_pct.loc[top_categories_all]

sns.heatmap(category_rank_pct_top, annot=True, fmt='.1f', cmap='RdYlGn_r', ax=axes[1, 0],
            cbar_kws={'label': 'Percentage (%)'}, linewidths=0.5)
axes[1, 0].set_title('Category-Rank Distribution (Top 10 Categories, % of Category)', 
                     fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Rank')
axes[1, 0].set_ylabel('Category')
axes[1, 0].tick_params(axis='y', rotation=0)

# 4. Media Type vs Rank Analysis
media_type_rank = df.groupby(['Media_Type', 'Rank']).size().unstack(fill_value=0)
media_type_rank_pct = media_type_rank.div(media_type_rank.sum(axis=1), axis=0) * 100

sns.heatmap(media_type_rank_pct, annot=True, fmt='.1f', cmap='coolwarm', ax=axes[1, 1],
            cbar_kws={'label': 'Percentage (%)'}, linewidths=0.5)
axes[1, 1].set_title('Media Type-Rank Distribution (% of Media Type)', 
                     fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Rank')
axes[1, 1].set_ylabel('Media Type')
axes[1, 1].tick_params(axis='y', rotation=0)

plt.tight_layout()
plt.show()

# Detailed statistics
print("=" * 80)
print("CROSS-DIMENSIONAL ANALYSIS SUMMARY")
print("=" * 80)
print("\n1. Country-Category Relationships:")
print("   Most common combinations:")
country_cat_combo = df.groupby(['Country', 'Category']).size().sort_values(ascending=False).head(10)
print(country_cat_combo)
print("\n2. Topic-Country Patterns:")
print("   Topics with most country diversity:")
topic_country_diversity = df.groupby('Topic')['Country'].nunique().sort_values(ascending=False).head(10)
print(topic_country_diversity)


## Section 8: Sponsored Content Analysis

**Question**: What is the role of sponsored content in search results? How prevalent is advertising?

**Key Insights to Discover**:
- Sponsored vs organic content ratio
- Sponsored content by rank
- Sponsored content by category
- Sponsored content trends
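
Sponsored share by rank or by category reduces to a grouped mean of a boolean flag. The sketch below assumes sponsored rows are marked with the string `Yes` (possibly with stray whitespace or different casing); `sponsored_share` is a hypothetical helper and the toy rows are for illustration only.

```python
import pandas as pd

def sponsored_share(frame: pd.DataFrame, by: str) -> pd.Series:
    """Fraction of rows flagged as sponsored within each group of `by`."""
    # Normalise the text flag to a boolean before averaging
    flag = frame["Sponsored Ad?"].astype(str).str.strip().str.lower().eq("yes")
    return flag.groupby(frame[by]).mean().sort_values(ascending=False)

# Toy rows for illustration only
toy = pd.DataFrame({
    "Rank": [1, 1, 2, 2],
    "Sponsored Ad?": ["Yes", " yes ", "No", "Yes"],
})
share_by_rank = sponsored_share(toy, "Rank")
print(share_by_rank)
```

Calling `sponsored_share(df, "Category")` or `sponsored_share(df, "Date")` would cover the by-category and trend views listed above.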


In [None]:
# Sponsored Content Analysis
print("=" * 80)
print("SPONSORED CONTENT ANALYSIS")
print("=" * 80)

sponsored_counts = df['Sponsored Ad?'].value_counts()
print(f"\nSponsored Ad Distribution:")
print(sponsored_counts)
print(f"\nPercentage Breakdown:")
for val, count in sponsored_counts.items():
    pct = (count / len(df)) * 100
    print(f"  {val}: {count} ({pct:.2f}%)")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
fig.suptitle('Sponsored Content Analysis', fontsize=16, fontweight='bold')

# 1. Sponsored vs Non-Sponsored Pie Chart
# Map colors by label so 'Yes' is always red and 'No' always green,
# regardless of which value is more frequent
sponsored_colors = {'Yes': '#e74c3c', 'No': '#2ecc71'}
axes[0].pie(sponsored_counts.values, labels=sponsored_counts.index, autopct='%1.1f%%',
            startangle=90, colors=[sponsored_colors.get(v, '#95a5a6') for v in sponsored_counts.index])
axes[0].set_title('Sponsored vs Non-Sponsored Content', fontsize=12, fontweight='bold')

# 2. Sponsored Content by Rank
sponsored_by_rank = df.groupby(['Rank', 'Sponsored Ad?']).size().unstack(fill_value=0)
# Guarantee both columns exist so neither bar call raises a KeyError
sponsored_by_rank = sponsored_by_rank.reindex(columns=['Yes', 'No'], fill_value=0)
axes[1].bar(sponsored_by_rank.index, sponsored_by_rank['Yes'],
            color='#e74c3c', alpha=0.7, label='Sponsored')
axes[1].bar(sponsored_by_rank.index, sponsored_by_rank['No'],
            color='#2ecc71', alpha=0.7, label='Non-Sponsored', bottom=sponsored_by_rank['Yes'])
axes[1].set_title('Sponsored Content Distribution by Rank', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Rank')
axes[1].set_ylabel('Number of Results')
axes[1].set_xticks(range(1, 11))
axes[1].legend()
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

# Additional insights
if 'Yes' in df['Sponsored Ad?'].values:
    print(f"\nSponsored content appears in ranks: {sorted(df[df['Sponsored Ad?'] == 'Yes']['Rank'].unique())}")
    print(f"Average rank of sponsored content: {df[df['Sponsored Ad?'] == 'Yes']['Rank'].mean():.2f}")
else:
    print("\nNo sponsored content found in the dataset.")
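
The "Sponsored content trends" item from this section's goals is not covered by the cell above. One way to approach it is to compute the sponsored share per search date. This is a minimal sketch on hypothetical toy data (the dates and values in `demo` are invented); it assumes `Date` has already been parsed to datetime and that sponsored rows are marked `'Yes'`.

```python
import pandas as pd

# Hypothetical toy data that reuses the dataset's column names
demo = pd.DataFrame({
    'Date': pd.to_datetime(['2024-01-01', '2024-01-01', '2024-01-02',
                            '2024-01-02', '2024-01-02', '2024-01-03']),
    'Sponsored Ad?': ['Yes', 'No', 'No', 'No', 'No', 'Yes'],
})

# The mean of a boolean flag is the fraction of True values per group
is_sponsored = demo['Sponsored Ad?'].eq('Yes')
trend = is_sponsored.groupby(demo['Date']).mean().mul(100).rename('sponsored_pct')
print(trend)
```

With few rows per date the daily percentages are noisy, so resampling to weeks or months may give a more readable trend.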


## Section 9: Advanced Statistical Analysis

**Question**: What statistical patterns and correlations exist in the data?

**Key Insights to Discover**:
- Correlation analysis
- Statistical significance tests
- Outlier detection
- Predictive patterns
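
For two categorical variables, one option among the significance tests listed above is a chi-square test of independence. The sketch below uses hypothetical toy data (the `demo` frame and the `Top3` column are invented for illustration) to ask whether landing in the top three ranks is associated with outlet category. Rows taken from the same results page are not independent of each other, so a p-value from this test is indicative at best.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data: 20 rows per category
demo = pd.DataFrame({
    'Category': ['Mainstream'] * 20 + ['Independent'] * 20,
    'Rank': ([1, 2, 3] * 4 + [4, 5, 6, 7, 8, 9, 10, 4]
             + [4, 5, 6, 7, 8, 9, 10] * 2 + [1, 2, 3] + [8, 9, 10]),
})

# Collapse rank into two buckets so every cell has a reasonable count
demo['Top3'] = demo['Rank'].le(3).map({True: 'Top 3', False: 'Rank 4-10'})
table = pd.crosstab(demo['Category'], demo['Top3'])

chi2_stat, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2_stat:.2f}, dof = {dof}, p = {p_value:.4f}")
print(f"Smallest expected cell count: {expected.min():.1f}")
```

A common rule of thumb is to trust the test only when every expected cell count is at least 5, which is why the smallest one is printed alongside the result.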


In [None]:
# Advanced Statistical Analysis
from scipy import stats
from scipy.stats import chi2_contingency

print("=" * 80)
print("ADVANCED STATISTICAL ANALYSIS")
print("=" * 80)

# 1. Rank Distribution Statistical Test
print("\n1. RANK DISTRIBUTION ANALYSIS:")
print(f"   Mean Rank: {df['Rank'].mean():.2f}")
print(f"   Median Rank: {df['Rank'].median():.2f}")
print(f"   Standard Deviation: {df['Rank'].std():.2f}")
print(f"   Skewness: {df['Rank'].skew():.2f}")
print(f"   Kurtosis: {df['Rank'].kurtosis():.2f}")

# Test whether results are spread evenly across ranks 1-10 (null hypothesis: uniform)
# Reindex so ranks that never occur count as 0 and observed/expected stay aligned
observed = df['Rank'].value_counts().reindex(range(1, 11), fill_value=0).values
expected = np.full(10, observed.sum() / 10)
chi2, p_value = stats.chisquare(observed, expected)
print("\n   Chi-square test for uniform distribution:")
print(f"   Chi-square statistic: {chi2:.2f}")
print(f"   p-value: {p_value:.6f}")
if p_value < 0.05:
    print("   Result: Ranks are NOT uniformly distributed (p < 0.05)")
else:
    print("   Result: No evidence against a uniform distribution (p >= 0.05)")

# 2. Country-Rank Relationship (descriptive group statistics, not a correlation coefficient)
print("\n2. COUNTRY-RANK RELATIONSHIP:")
top_countries_stats = df[df['Country'].isin(top_countries)].groupby('Country')['Rank'].agg(['mean', 'std', 'count'])
print("   Average Rank by Top Countries:")
print(top_countries_stats.sort_values('mean'))

# 3. Category-Rank ANOVA Test (treats the ordinal rank as numeric, so read the result as approximate)
print("\n3. CATEGORY-RANK RELATIONSHIP:")
top_categories_list = df['Category'].value_counts().head(5).index.tolist()
category_groups = [df[df['Category'] == cat]['Rank'].values for cat in top_categories_list]
f_stat, p_value_anova = stats.f_oneway(*category_groups)
print(f"   ANOVA Test (Top 5 Categories):")
print(f"   F-statistic: {f_stat:.2f}")
print(f"   p-value: {p_value_anova:.6f}")
if p_value_anova < 0.05:
    print("   Result: Significant difference in ranks across categories (p < 0.05)")
else:
    print("   Result: No significant difference in ranks across categories (p >= 0.05)")

# 4. Topic Length vs Rank Correlation
print("\n4. TOPIC LENGTH vs RANK CORRELATION:")
correlation = df['Topic_Length'].corr(df['Rank'])
print(f"   Pearson Correlation: {correlation:.4f}")
if abs(correlation) > 0.3:
    strength = "strong" if abs(correlation) > 0.7 else "moderate"
    direction = "positive" if correlation > 0 else "negative"
    print(f"   Interpretation: {strength} {direction} correlation")
else:
    print(f"   Interpretation: weak correlation")

# 5. Outlier Detection in Ranks
print("\n5. OUTLIER DETECTION:")
Q1 = df['Rank'].quantile(0.25)
Q3 = df['Rank'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(df['Rank'] < lower_bound) | (df['Rank'] > upper_bound)]
print(f"   Q1: {Q1}, Q3: {Q3}, IQR: {IQR}")
print(f"   Outliers (using IQR method): {len(outliers)} rows")
print(f"   Note: Rank is bounded 1-10, so outliers are less meaningful here")

# Visualization of statistical patterns
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('Advanced Statistical Analysis Visualizations', fontsize=16, fontweight='bold')

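# 1. Rank Distribution with Normal Overlay
# Centre one unit-width bin on each integer rank (edges at 0.5, 1.5, ..., 10.5)
axes[0, 0].hist(df['Rank'], bins=np.arange(0.5, 11.5, 1), density=True, alpha=0.7, color='steelblue', edgecolor='black')
x_norm = np.linspace(0.5, 10.5, 100)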
y_norm = stats.norm.pdf(x_norm, df['Rank'].mean(), df['Rank'].std())
axes[0, 0].plot(x_norm, y_norm, 'r-', linewidth=2, label='Normal Distribution')
axes[0, 0].set_title('Rank Distribution vs Normal Distribution', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Rank')
axes[0, 0].set_ylabel('Density')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# 2. Q-Q Plot for Rank Distribution
stats.probplot(df['Rank'], dist="norm", plot=axes[0, 1])
axes[0, 1].set_title('Q-Q Plot: Rank Distribution', fontsize=12, fontweight='bold')
axes[0, 1].grid(True, alpha=0.3)

# 3. Topic Length vs Rank Scatter
axes[1, 0].scatter(df['Topic_Length'], df['Rank'], alpha=0.5, s=20, color='steelblue')
z = np.polyfit(df['Topic_Length'], df['Rank'], 1)
p = np.poly1d(z)
axes[1, 0].plot(df['Topic_Length'], p(df['Topic_Length']), "r--", linewidth=2, 
                label=f'Trend line (r={correlation:.3f})')
axes[1, 0].set_title('Topic Length vs Rank', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Topic Length (Characters)')
axes[1, 0].set_ylabel('Rank')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# 4. Category Rank Box Plot (Top 8)
top_8_categories = df['Category'].value_counts().head(8).index
category_rank_data = df[df['Category'].isin(top_8_categories)]
category_rank_pivot = [category_rank_data[category_rank_data['Category'] == cat]['Rank'].values 
                       for cat in top_8_categories]
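bp = axes[1, 1].boxplot(category_rank_pivot, patch_artist=True)
# Set tick labels separately: the boxplot `labels` keyword is deprecated in newer Matplotlib releases
axes[1, 1].set_xticklabels(top_8_categories)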
for patch, color in zip(bp['boxes'], sns.color_palette("Set2", len(top_8_categories))):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)
axes[1, 1].set_title('Rank Distribution by Category (Top 8)', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Category')
axes[1, 1].set_ylabel('Rank')
axes[1, 1].tick_params(axis='x', rotation=45)
axes[1, 1].invert_yaxis()
axes[1, 1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()
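
Because rank is an ordinal, bounded variable, a rank-based test can serve as a cross-check on the ANOVA above. The sketch below runs a Kruskal-Wallis H-test on hypothetical toy data (the `demo` frame is invented for illustration); it is offered as an alternative to compare against, not as the analysis the notebook performs.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data: six results per category
demo = pd.DataFrame({
    'Category': ['Mainstream'] * 6 + ['Independent'] * 6 + ['Foreign'] * 6,
    'Rank': [1, 2, 2, 3, 4, 5,
             4, 5, 6, 7, 8, 9,
             6, 7, 8, 9, 10, 10],
})

# One array of ranks per category, passed as separate samples
groups = [g['Rank'].to_numpy() for _, g in demo.groupby('Category')]
h_stat, p_kw = stats.kruskal(*groups)

print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_kw:.4f}")
```

If ANOVA and Kruskal-Wallis disagree on real data, the rank-based result is usually the safer one to report for a 1-10 position variable.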


## Section 10: Key Findings and Insights Summary

**Comprehensive Summary of All Discoveries**

This section consolidates all major findings from the analysis.


In [None]:
# Comprehensive Findings Summary
print("=" * 80)
print("COMPREHENSIVE FINDINGS AND INSIGHTS SUMMARY")
print("=" * 80)

print("\n" + "=" * 80)
print("1. DATASET OVERVIEW")
print("=" * 80)
print(f"   • Total Records: {len(df):,}")
print(f"   • Date Range: {df['Date'].min().strftime('%Y-%m-%d')} to {df['Date'].max().strftime('%Y-%m-%d')}")
print(f"   • Unique Topics: {df['Topic'].nunique()}")
print(f"   • Unique Media Outlets: {df['Media Outlet'].nunique()}")
print(f"   • Unique Countries: {df['Country'].nunique()}")
print(f"   • Unique Categories: {df['Category'].nunique()}")

print("\n" + "=" * 80)
print("2. GEOGRAPHIC DOMINANCE")
print("=" * 80)
top_country = df['Country'].value_counts().index[0]
top_country_pct = (df['Country'].value_counts().iloc[0] / len(df)) * 100
print(f"   • Dominant Country: {top_country} ({top_country_pct:.2f}% of all results)")
us_pct = (df['Country'] == 'US').sum() / len(df) * 100
print(f"   • US Media: {us_pct:.2f}% of all results")
print(f"   • Top 5 Countries account for {(df['Country'].value_counts().head(5).sum() / len(df) * 100):.2f}% of results")

print("\n" + "=" * 80)
print("3. MEDIA CONCENTRATION")
print("=" * 80)
top_outlet = df['Media Outlet'].value_counts().index[0]
top_outlet_count = df['Media Outlet'].value_counts().iloc[0]
top10_outlets_share = (df['Media Outlet'].value_counts().head(10).sum() / len(df)) * 100
print(f"   • Most Visible Outlet: {top_outlet} ({top_outlet_count} appearances)")
print(f"   • Top 10 Outlets: {top10_outlets_share:.2f}% of all appearances")
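# Quantify concentration instead of hard-coding a label
outlet_cum_share = df['Media Outlet'].value_counts(normalize=True).cumsum()
outlets_for_half = int((outlet_cum_share < 0.5).sum() + 1)
print(f"   • Market Concentration: {outlets_for_half} of {df['Media Outlet'].nunique()} outlets account for at least half of all appearances")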

print("\n" + "=" * 80)
print("4. CATEGORY ANALYSIS")
print("=" * 80)
top_category = df['Category'].value_counts().index[0]
top_category_pct = (df['Category'].value_counts().iloc[0] / len(df)) * 100
print(f"   • Dominant Category: {top_category} ({top_category_pct:.2f}% of results)")
mainstream_pct = (df['Category'].str.contains('Mainstream', case=False, na=False).sum() / len(df)) * 100
print(f"   • Mainstream Media: {mainstream_pct:.2f}% of results")
independent_pct = (df['Category'].str.contains('Independent', case=False, na=False).sum() / len(df)) * 100
print(f"   • Independent Media: {independent_pct:.2f}% of results")

print("\n" + "=" * 80)
print("5. RANKING PATTERNS")
print("=" * 80)
print(f"   • Average Rank: {df['Rank'].mean():.2f}")
print(f"   • Rank 1 Appearances: {(df['Rank'] == 1).sum()} ({(df['Rank'] == 1).sum()/len(df)*100:.2f}%)")
print(f"   • Rank Distribution: {dict(df['Rank'].value_counts().sort_index())}")

print("\n" + "=" * 80)
print("6. TOPIC DIVERSITY")
print("=" * 80)
avg_results_per_topic = len(df) / df['Topic'].nunique()
print(f"   • Average Results per Topic: {avg_results_per_topic:.2f}")
# The dataset records result rows, not search volume, so report the topic with the most rows
largest_topic = df['Topic'].value_counts().index[0]
print(f"   • Topic with Most Results: '{largest_topic}' ({df['Topic'].value_counts().iloc[0]} results)")
avg_outlets_per_topic = df.groupby('Topic')['Media Outlet'].nunique().mean()
print(f"   • Average Unique Outlets per Topic: {avg_outlets_per_topic:.2f}")

print("\n" + "=" * 80)
print("7. SPONSORED CONTENT")
print("=" * 80)
sponsored_pct = (df['Sponsored Ad?'] == 'Yes').sum() / len(df) * 100
print(f"   • Sponsored Content: {sponsored_pct:.2f}% of all results")
if (df['Sponsored Ad?'] == 'Yes').sum() > 0:
    avg_sponsored_rank = df[df['Sponsored Ad?'] == 'Yes']['Rank'].mean()
    print(f"   • Average Rank of Sponsored Content: {avg_sponsored_rank:.2f}")

print("\n" + "=" * 80)
print("8. KEY INSIGHTS")
print("=" * 80)
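# Derive the headline statements from the figures computed above rather than hard-coding them
topics_with_10_pct = (df['Topic'].value_counts() == 10).mean() * 100
print(f"   • Geographic Concentration: {top_country} alone accounts for {top_country_pct:.2f}% of results")
print(f"   • Media Concentration: Top 10 outlets hold {top10_outlets_share:.2f}% of all appearances")
print(f"   • Category Balance: Mainstream {mainstream_pct:.2f}% vs Independent {independent_pct:.2f}% of results")
print(f"   • Rank Distribution: Mean rank {df['Rank'].mean():.2f}; see the chi-square test in Section 9 for uniformity")
print(f"   • Topic Coverage: {topics_with_10_pct:.2f}% of topics have exactly 10 results (full page coverage)")
print(f"   • Diversity: {avg_outlets_per_topic:.2f} unique outlets per topic on average")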

print("\n" + "=" * 80)
print("9. RECOMMENDATIONS FOR FURTHER ANALYSIS")
print("=" * 80)
print("   • Analyze Notes/Observations column for qualitative insights")
print("   • Time-series analysis of specific topics")
print("   • Sentiment analysis of media outlet names")
print("   • Network analysis of outlet-topic relationships")
print("   • Predictive modeling for rank prediction")
print("   • Comparative analysis with other search engines")

print("\n" + "=" * 80)
print("ANALYSIS COMPLETE")
print("=" * 80)
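
The summary describes media concentration through the share held by the top outlets. A single-number alternative is the Herfindahl-Hirschman Index (HHI), the sum of squared appearance shares. The sketch below is a minimal illustration on hypothetical toy data; the `hhi` helper and the outlet names are invented and not part of the original notebook.

```python
import pandas as pd

def hhi(values: pd.Series) -> float:
    """Herfindahl-Hirschman Index: sum of squared shares.

    Ranges from 1/n (n values with equal shares) to 1.0 (a single value).
    """
    shares = values.value_counts(normalize=True)
    return float((shares ** 2).sum())

# Hypothetical toy data: 10 appearances spread over four outlets
demo = pd.DataFrame({
    'Media Outlet': ['Outlet A'] * 5 + ['Outlet B'] * 3 + ['Outlet C', 'Outlet D'],
})

concentration = hhi(demo['Media Outlet'])
effective_outlets = 1 / concentration  # number of equal-share outlets giving the same HHI

print(f"HHI: {concentration:.2f}")
print(f"Effective number of outlets: {effective_outlets:.2f}")
```

Applied per topic with `groupby('Topic')`, the same helper would give a comparable diversity score for each results page.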
