# Netflix Global Pricing Analysis (2025)

This analysis examines how Netflix subscription prices and content libraries vary across 128 countries worldwide. The dataset captures pricing for all subscription tiers, including the recently introduced ad-supported plan, alongside each country's total content library size.

## Research Questions

1. **How much do subscription prices vary between countries?**
2. **Which countries provide the most content for each dollar spent?**
3. **How does the ad-supported tier compare to traditional plans?**
4. **What pricing patterns emerge across different world regions?**
5. **Is there a relationship between library size and subscription cost?**

## Setup and Data Loading

We begin by importing the necessary libraries and loading the dataset. The data includes subscription pricing across four tiers (Basic with Ads, Basic, Standard, and Premium) and the total number of titles available in each country.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pycountry_convert as pc
import os
%matplotlib inline

# Create output directory for visualizations
os.makedirs('docs/images', exist_ok=True)

# Configure visualization style
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['axes.formatter.use_locale'] = True

In [None]:
# Load the dataset
df = pd.read_csv('netflix-2025.csv')
print(f"Dataset contains {len(df):,} countries")
df.head()

## Data Preparation

The original dataset provides "cost per title" metrics, which show how much subscribers pay per piece of content. We convert these values back to monthly subscription costs by multiplying the cost per title by the total library size and rounding to the nearest cent. The result is the monthly price for each tier that customers see when signing up.

In [None]:
# Clean column names
df.columns = df.columns.str.strip()

# Remove empty columns
df = df.drop(columns=[col for col in df.columns if 'Unnamed' in col or col == 'X.1'], errors='ignore')

# Calculate monthly subscription costs from cost-per-title data
df['Cost - Basic with Ads ($)'] = (df['Cost per Title - Basic with Ads ($)'] * df['Total Library Size']).round(2)
df['Cost - Basic ($)'] = (df['Cost per Title - Basic ($)'] * df['Total Library Size']).round(2)
df['Cost - Standard ($)'] = (df['Cost per Title - Standard ($)'] * df['Total Library Size']).round(2)
df['Cost - Premium ($)'] = (df['Cost per Title - Premium ($)'] * df['Total Library Size']).round(2)

print("Calculated monthly subscription costs for all tiers")
df[['Country', 'Total Library Size', 'Cost - Basic ($)', 'Cost - Standard ($)', 'Cost - Premium ($)']].head()
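
If the per-title figures in the CSV are themselves rounded (an assumption, not something the dataset documents), the reconstructed prices can drift by a few cents from the real ones. The sketch below illustrates this with made-up numbers and a hypothetical `known_price` reference column that is not part of the dataset.

```python
import pandas as pd

# Toy rows with made-up values; `known_price` is a hypothetical reference column.
toy = pd.DataFrame({
    'Country': ['A', 'B'],
    'Total Library Size': [6000, 8000],
    'Cost per Title - Premium ($)': [0.0026, 0.0020],  # assumed to be rounded already
    'known_price': [15.49, 15.99],
})

# Same reconstruction as in the cell above: per-title cost times library size.
toy['Cost - Premium ($)'] = (
    toy['Cost per Title - Premium ($)'] * toy['Total Library Size']
).round(2)

# Absolute gap between the reconstructed price and the reference price.
toy['abs_error'] = (toy['Cost - Premium ($)'] - toy['known_price']).abs()
print(toy[['Country', 'Cost - Premium ($)', 'known_price', 'abs_error']])
```

A larger library amplifies any rounding in the per-title figure, so the reconstructed prices are best read as approximations.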

## Regional Classification

To identify geographical pricing patterns, we classify each country by its continental region. This grouping allows us to compare average pricing and content availability across Europe, Asia, Africa, North America, South America, and Oceania.

In [None]:
def country_to_continent(country_name):
    """Map country names to their continental region."""
    # Handle special cases not recognized by standard libraries
    special_cases = {
        'Côte d\'Ivoire': 'Africa',
        'Democratic Republic of the Congo': 'Africa',
        'Trinidad and Tobago': 'North America',
        'Antigua & Barbuda': 'North America',
        'St. Lucia': 'North America',
        'Turks & Caicos Islands': 'North America',
        'Bosnia & Herzegovina': 'Europe',
        'Palestine': 'Asia',
        'Guernsey': 'Europe',
        'French Guiana': 'South America',
        'French Polynesia': 'Oceania'
    }
    
    if country_name in special_cases:
        return special_cases[country_name]
    
    try:
        country_alpha2 = pc.country_name_to_country_alpha2(country_name)
        country_continent_code = pc.country_alpha2_to_continent_code(country_alpha2)
        return pc.convert_continent_code_to_continent_name(country_continent_code)
    except Exception:  # unrecognized names fall through to 'Unknown'
        return 'Unknown'

# Add regional classification
df['Region'] = df['Country'].apply(country_to_continent)

print("Countries per region:")
print(df['Region'].value_counts().to_frame('Count'))
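
Because unrecognized names are silently labeled `'Unknown'`, it can help to list them before running any regional analysis. The following is a minimal sketch on a made-up frame; the country names and regions are illustrative only.

```python
import pandas as pd

# Toy frame with made-up rows standing in for the classified dataset.
toy = pd.DataFrame({
    'Country': ['France', 'Atlantis', 'Japan'],
    'Region': ['Europe', 'Unknown', 'Asia'],
})

# Countries the mapper could not place; candidates for the special-case table.
unmapped = toy.loc[toy['Region'] == 'Unknown', 'Country'].tolist()
print(f"{len(unmapped)} unmapped: {unmapped}")
```

Any name that shows up here would need either a spelling fix or a new entry in `special_cases`.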

## Content Library Analysis

### Distribution of Library Sizes

Netflix does not offer the same content catalog in every country. Licensing restrictions, local regulations, and content production agreements result in substantial variation in library sizes. This histogram shows the distribution of content availability across all 128 countries.

**What to look for:** The spread indicates how much content availability varies globally. A wide distribution suggests significant differences in the user experience depending on location.

In [None]:
# Calculate library size statistics
print("Library Size Statistics:")
print(f"Average: {df['Total Library Size'].mean():,.0f} titles")
print(f"Median: {df['Total Library Size'].median():,.0f} titles")
print(f"Range: {df['Total Library Size'].min():,.0f} to {df['Total Library Size'].max():,.0f} titles")
print(f"Standard Deviation: {df['Total Library Size'].std():,.0f} titles\n")

# Create distribution plot
fig, ax = plt.subplots(figsize=(12, 6))
sns.histplot(data=df, x='Total Library Size', kde=True, bins=30, color='#E50914')
plt.title('Distribution of Netflix Library Sizes Across Countries', fontsize=16, fontweight='bold')
plt.xlabel('Total Titles Available', fontsize=12)
plt.ylabel('Number of Countries', fontsize=12)

# Format x-axis with commas
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{int(x):,}'))

plt.tight_layout()
plt.savefig('docs/images/library_distribution.png', dpi=150, bbox_inches='tight')
plt.show()
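
The histogram gives a visual sense of spread; two summary numbers can complement it. As a sketch, the interquartile range and the coefficient of variation are computed below on made-up library sizes (the variable names are illustrative, not taken from the notebook).

```python
import pandas as pd

# Made-up library sizes for five hypothetical countries.
sizes = pd.Series([5200, 6100, 6400, 7000, 9500], name='Total Library Size')

# Interquartile range: width of the middle 50% of countries.
q1, q3 = sizes.quantile([0.25, 0.75])
iqr = q3 - q1

# Coefficient of variation: standard deviation relative to the mean.
cv = sizes.std() / sizes.mean()

print(f"IQR: {iqr:,.0f} titles")
print(f"Coefficient of variation: {cv:.2f}")
```

The interquartile range is less sensitive than the standard deviation to a single country with an unusually large or small catalog.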

### Countries with Largest and Smallest Libraries

These comparisons show which countries have access to the most comprehensive Netflix catalogs and which have the most limited selections. The gap between the largest and smallest libraries, printed by the cell below, runs to several thousand titles, a substantial disparity in available content.

In [None]:
# Identify extremes
max_lib = df.loc[df['Total Library Size'].idxmax()]
min_lib = df.loc[df['Total Library Size'].idxmin()]

print(f"Largest library: {max_lib['Country']} with {max_lib['Total Library Size']:,.0f} titles")
print(f"Smallest library: {min_lib['Country']} with {min_lib['Total Library Size']:,.0f} titles")
print(f"Difference: {max_lib['Total Library Size'] - min_lib['Total Library Size']:,.0f} titles\n")

# Create comparison chart
fig, axes = plt.subplots(1, 2, figsize=(16, 8))

# Top 15 countries
top_lib = df.nlargest(15, 'Total Library Size')
sns.barplot(data=top_lib, y='Country', x='Total Library Size', ax=axes[0], palette='Reds_r')
axes[0].set_title('Countries with Largest Content Libraries', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Number of Titles', fontsize=11)
axes[0].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{int(x):,}'))

# Bottom 15 countries
bot_lib = df.nsmallest(15, 'Total Library Size')
sns.barplot(data=bot_lib, y='Country', x='Total Library Size', ax=axes[1], palette='Blues_r')
axes[1].set_title('Countries with Smallest Content Libraries', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Number of Titles', fontsize=11)
axes[1].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{int(x):,}'))

plt.tight_layout()
plt.savefig('docs/images/library_top_bottom.png', dpi=150, bbox_inches='tight')
plt.show()

## Subscription Cost Analysis

### Price Distribution Across Tiers

Netflix offers four subscription tiers with different features. The Basic with Ads tier includes commercial interruptions, the Basic tier removes ads but limits video quality, the Standard tier adds HD streaming and multiple screens, and the Premium tier provides 4K quality and the most simultaneous streams.

**Note:** Not all countries offer the ad-supported tier. This newer option is available in select markets.

In [None]:
# Calculate cost statistics for each tier
cost_cols = ['Cost - Basic with Ads ($)', 'Cost - Basic ($)', 'Cost - Standard ($)', 'Cost - Premium ($)']

print("Monthly Subscription Cost Statistics (USD):\n")
for col in cost_cols:
    valid_data = df[col].dropna()
    if len(valid_data) > 0:
        tier_name = col.replace('Cost - ', '').replace(' ($)', '')
        print(f"{tier_name}:")
        print(f"  Average: ${valid_data.mean():.2f}")
        print(f"  Range: ${valid_data.min():.2f} to ${valid_data.max():.2f}")
        print(f"  Available in: {len(valid_data)} countries\n")

In [None]:
# Visualize cost distributions
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
colors = ['#564d4d', '#831010', '#E50914', '#ff4d4d']
tier_names = ['Basic with Ads', 'Basic', 'Standard', 'Premium']

for i, (col, color, name) in enumerate(zip(cost_cols, colors, tier_names)):
    ax = axes[i//2, i%2]
    data = df[col].dropna()
    
    sns.histplot(data=data, kde=True, ax=ax, color=color, bins=20)
    ax.set_title(f'{name} Tier Price Distribution', fontsize=12, fontweight='bold')
    ax.set_xlabel('Monthly Cost (USD)', fontsize=10)
    ax.set_ylabel('Number of Countries', fontsize=10)
    
    # Add mean line
    mean_val = data.mean()
    ax.axvline(mean_val, color='black', linestyle='--', linewidth=1.5, label=f'Average: ${mean_val:.2f}')
    ax.legend(fontsize=9)

plt.suptitle('Netflix Subscription Pricing by Tier (2025)', fontsize=16, fontweight='bold', y=1.00)
plt.tight_layout()
plt.savefig('docs/images/cost_distribution.png', dpi=150, bbox_inches='tight')
plt.show()

### Most and Least Expensive Markets

Premium tier pricing varies substantially across countries. Subscribers in high-income nations in Europe and North America typically pay higher fees, while prices in developing economies in Asia, Africa, and South America are generally lower. This pricing strategy reflects Netflix's approach to market-specific affordability.

In [None]:
# Create cost comparison
fig, axes = plt.subplots(1, 2, figsize=(16, 8))

# Most expensive countries
top_cost = df.nlargest(15, 'Cost - Premium ($)')
sns.barplot(data=top_cost, y='Country', x='Cost - Premium ($)', ax=axes[0], palette='Reds_r')
axes[0].set_title('Most Expensive Premium Subscriptions', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Monthly Cost (USD)', fontsize=11)
axes[0].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:.2f}'))

# Least expensive countries
bot_cost = df[df['Cost - Premium ($)'] > 0].nsmallest(15, 'Cost - Premium ($)')
sns.barplot(data=bot_cost, y='Country', x='Cost - Premium ($)', ax=axes[1], palette='Greens_r')
axes[1].set_title('Least Expensive Premium Subscriptions', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Monthly Cost (USD)', fontsize=11)
axes[1].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:.2f}'))

plt.tight_layout()
plt.savefig('docs/images/cost_top_bottom.png', dpi=150, bbox_inches='tight')
plt.show()

# Show price ratio
max_price = df['Cost - Premium ($)'].max()
min_price = df[df['Cost - Premium ($)'] > 0]['Cost - Premium ($)'].min()
print(f"\nPremium tier price variation: {max_price / min_price:.1f}x difference between most and least expensive markets")

## Value Analysis: Cost Per Title

### Which Countries Offer the Best Value?

The most useful measure of value combines both pricing and content availability. Cost per title divides the monthly subscription fee by the number of titles available, showing how much subscribers effectively pay for each piece of content they can access.

A lower cost per title indicates better value, whether through lower subscription prices, larger content libraries, or both. Countries with high subscription costs but limited content catalogs provide the worst value by this metric.
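
As a worked illustration of the metric, the sketch below compares two hypothetical markets with made-up prices and library sizes.

```python
import pandas as pd

# Two hypothetical markets; all numbers are made up for illustration.
toy = pd.DataFrame({
    'Country': ['High price, small library', 'Low price, large library'],
    'Cost - Premium ($)': [20.00, 8.00],
    'Total Library Size': [5000, 8000],
})

# Cost per title = monthly fee divided by number of titles.
toy['Cost per Title ($)'] = toy['Cost - Premium ($)'] / toy['Total Library Size']

# The market with the lowest cost per title offers the best value.
best = toy.loc[toy['Cost per Title ($)'].idxmin(), 'Country']
print(toy[['Country', 'Cost per Title ($)']])
print(f"Best value: {best}")
```

The first market works out to $0.0040 per title and the second to $0.0010, so the second is four times better value by this measure.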

In [None]:
# Calculate value statistics
avg_cost_per_title = df['Cost per Title - Premium ($)'].mean()
print(f"Average cost per title (Premium): ${avg_cost_per_title:.4f}")
print(f"This means subscribers pay an average of ${avg_cost_per_title:.4f} per title they can access.\n")

# Create value comparison
fig, axes = plt.subplots(1, 2, figsize=(16, 8))

# Best value countries
best_value = df[df['Cost per Title - Premium ($)'] > 0].nsmallest(15, 'Cost per Title - Premium ($)')
sns.barplot(data=best_value, y='Country', x='Cost per Title - Premium ($)', ax=axes[0], palette='Greens_r')
axes[0].set_title('Best Value: Lowest Cost Per Title', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Cost Per Title (USD)', fontsize=11)
axes[0].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:.4f}'))

# Worst value countries
worst_value = df.nlargest(15, 'Cost per Title - Premium ($)')
sns.barplot(data=worst_value, y='Country', x='Cost per Title - Premium ($)', ax=axes[1], palette='Reds_r')
axes[1].set_title('Worst Value: Highest Cost Per Title', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Cost Per Title (USD)', fontsize=11)
axes[1].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:.4f}'))

plt.tight_layout()
plt.savefig('docs/images/value_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

## Ad-Supported Tier Analysis

### Savings from Choosing the Ads Plan

Netflix introduced an ad-supported tier in 2022 to offer a lower-cost alternative. This plan shows commercials during content but provides access to most of the same catalog as the Basic plan. The ad-supported option is currently available in select markets.

This analysis compares the ad-supported tier to the standard Basic plan, showing the percentage discount subscribers receive in exchange for viewing advertisements.

In [None]:
# Identify countries with ads tier
ads_df = df[df['Cost - Basic with Ads ($)'].notna() & (df['Cost - Basic with Ads ($)'] > 0)].copy()
print(f"Ad-supported tier available in {len(ads_df)} out of {len(df)} countries ({len(ads_df)/len(df)*100:.1f}%)\n")

# Calculate savings
ads_df['Ads Savings ($)'] = ads_df['Cost - Basic ($)'] - ads_df['Cost - Basic with Ads ($)']
ads_df['Ads Savings (%)'] = ((ads_df['Ads Savings ($)'] / ads_df['Cost - Basic ($)']) * 100).round(1)

print(f"Average savings with ads tier: ${ads_df['Ads Savings ($)'].mean():.2f} ({ads_df['Ads Savings (%)'].mean():.1f}%)")
print(f"Savings range: ${ads_df['Ads Savings ($)'].min():.2f} to ${ads_df['Ads Savings ($)'].max():.2f}\n")

# Create visualization
fig, ax = plt.subplots(figsize=(12, 8))
ads_sorted = ads_df.sort_values('Ads Savings (%)', ascending=True)

# Color bars based on savings level
colors = ['#E50914' if x > 40 else '#831010' for x in ads_sorted['Ads Savings (%)']]

sns.barplot(data=ads_sorted, y='Country', x='Ads Savings (%)', palette=colors)
plt.title('Discount for Choosing Ad-Supported Tier vs Basic', fontsize=16, fontweight='bold')
plt.xlabel('Percentage Savings (%)', fontsize=12)
plt.ylabel('')

# Add percentage formatting
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x:.0f}%'))

plt.tight_layout()
plt.savefig('docs/images/ads_savings.png', dpi=150, bbox_inches='tight')
plt.show()
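
The comparison above uses Basic as the baseline. The same calculation could be extended to the Standard and Premium tiers, as in this sketch on made-up prices; the `Discount vs ...` column names are illustrative, not part of the dataset.

```python
import pandas as pd

# Made-up prices; country B has no ad-supported tier.
toy = pd.DataFrame({
    'Country': ['A', 'B'],
    'Cost - Basic with Ads ($)': [7.00, None],
    'Cost - Basic ($)': [10.00, 4.00],
    'Cost - Standard ($)': [14.00, 6.00],
    'Cost - Premium ($)': [20.00, 8.00],
})

# Keep only markets where the ad-supported tier exists.
ads = toy.dropna(subset=['Cost - Basic with Ads ($)']).copy()

# Percentage saved by choosing the ads plan over each ad-free tier.
for tier in ['Basic', 'Standard', 'Premium']:
    full_price = ads[f'Cost - {tier} ($)']
    ads[f'Discount vs {tier} (%)'] = (
        (1 - ads['Cost - Basic with Ads ($)'] / full_price) * 100
    ).round(1)

print(ads.filter(like='Discount').to_string(index=False))
```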

## Regional Patterns

### Comparing Netflix Across Continents

Regional analysis reveals distinct patterns in how Netflix prices its service and distributes content across the world. European markets generally pay higher subscription fees, while Asian, African, and South American countries often have lower prices but comparable content libraries.

These patterns reflect differences in purchasing power, market competition, and Netflix's strategic priorities in each region.

In [None]:
# Calculate regional statistics
region_stats = df.groupby('Region').agg({
    'Total Library Size': 'mean',
    'Cost - Basic ($)': 'mean',
    'Cost - Standard ($)': 'mean',
    'Cost - Premium ($)': 'mean',
    'Cost per Title - Premium ($)': 'mean',
    'Country': 'count'
}).round(2)

region_stats.columns = ['Avg Library Size', 'Avg Basic ($)', 'Avg Standard ($)', 'Avg Premium ($)', 'Avg Cost/Title', 'Countries']
region_stats = region_stats.sort_values('Countries', ascending=False)

print("Regional Summary Statistics:\n")
print(region_stats.to_string())
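
Regional means can be pulled around by one unusually expensive or cheap market. One option is to report medians alongside them. The sketch below uses pandas named aggregation on made-up rows; the output column names are illustrative.

```python
import pandas as pd

# Made-up rows for two regions.
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D'],
    'Region': ['Europe', 'Europe', 'Asia', 'Asia'],
    'Total Library Size': [7000, 8000, 6000, 6500],
    'Cost - Premium ($)': [20.0, 18.0, 6.0, 10.0],
})

# Named aggregation sets the output column names directly,
# so no separate renaming step is needed.
region_summary = toy.groupby('Region').agg(
    countries=('Country', 'count'),
    avg_library=('Total Library Size', 'mean'),
    median_premium=('Cost - Premium ($)', 'median'),
)
print(region_summary)
```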

In [None]:
# Create regional comparison visualizations
fig, axes = plt.subplots(2, 2, figsize=(14, 12))

# Regional coverage
region_counts = df['Region'].value_counts()
axes[0, 0].pie(region_counts.values, labels=region_counts.index, autopct='%1.0f%%', 
               colors=sns.color_palette('Reds_r', len(region_counts)), startangle=90)
axes[0, 0].set_title('Netflix Country Coverage by Region', fontsize=12, fontweight='bold')

# Average library size by region
region_lib = df.groupby('Region')['Total Library Size'].mean().sort_values(ascending=True)
sns.barplot(x=region_lib.values, y=region_lib.index, ax=axes[0, 1], palette='Reds_r')
axes[0, 1].set_title('Average Library Size by Region', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Average Number of Titles', fontsize=10)
axes[0, 1].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{int(x):,}'))

# Average premium cost by region
region_cost = df.groupby('Region')['Cost - Premium ($)'].mean().sort_values(ascending=True)
sns.barplot(x=region_cost.values, y=region_cost.index, ax=axes[1, 0], palette='Blues_r')
axes[1, 0].set_title('Average Premium Cost by Region', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Average Monthly Cost (USD)', fontsize=10)
axes[1, 0].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:.2f}'))

# Average cost per title by region
region_value = df.groupby('Region')['Cost per Title - Premium ($)'].mean().sort_values(ascending=True)
sns.barplot(x=region_value.values, y=region_value.index, ax=axes[1, 1], palette='Greens_r')
axes[1, 1].set_title('Average Cost Per Title by Region', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Cost Per Title (USD)', fontsize=10)
axes[1, 1].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:.4f}'))

plt.suptitle('Regional Comparison of Netflix Subscriptions', fontsize=16, fontweight='bold', y=1.00)
plt.tight_layout()
plt.savefig('docs/images/regional_analysis.png', dpi=150, bbox_inches='tight')
plt.show()

## Relationship Between Library Size and Cost

### Does More Content Mean Higher Prices?

This scatter plot examines whether subscribers in countries with larger content libraries pay higher subscription fees. Each point represents one country, colored by region. The dashed trend line shows the overall relationship between library size and premium subscription cost.

A positive correlation would suggest that Netflix charges more in markets where it offers more content. A weak or negative correlation would indicate that pricing depends more on local economic factors than content volume.

In [None]:
# Create scatter plot with regional colors
fig, ax = plt.subplots(figsize=(14, 8))

regions = df['Region'].unique()
colors = dict(zip(regions, sns.color_palette('husl', len(regions))))

for region in regions:
    region_data = df[df['Region'] == region]
    ax.scatter(region_data['Total Library Size'], region_data['Cost - Premium ($)'], 
               label=region, alpha=0.7, s=80, c=[colors[region]])

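# Add trend line (rows with missing values are dropped, since NaNs would break the fit)
fit_data = df[['Total Library Size', 'Cost - Premium ($)']].dropna().sort_values('Total Library Size')
z = np.polyfit(fit_data['Total Library Size'], fit_data['Cost - Premium ($)'], 1)
trend = np.poly1d(z)
ax.plot(fit_data['Total Library Size'], trend(fit_data['Total Library Size']),
        'r--', alpha=0.8, linewidth=2, label='Overall Trend')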

# Format axes
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{int(x):,}'))
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:.2f}'))

ax.set_xlabel('Total Library Size (Number of Titles)', fontsize=12)
ax.set_ylabel('Premium Subscription Cost (USD)', fontsize=12)
ax.set_title('Relationship Between Content Library Size and Subscription Cost', fontsize=16, fontweight='bold')
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=10)

plt.tight_layout()
plt.savefig('docs/images/correlation_scatter.png', dpi=150, bbox_inches='tight')
plt.show()

# Calculate and display correlation
corr = df['Total Library Size'].corr(df['Cost - Premium ($)'])
print(f"\nCorrelation coefficient: {corr:.3f}")

if abs(corr) < 0.3:
    strength = "weak"
elif abs(corr) < 0.7:
    strength = "moderate"
else:
    strength = "strong"

direction = "positive" if corr > 0 else "negative"
print(f"This indicates a {strength} {direction} relationship between library size and subscription cost.")
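
The coefficient above is a Pearson correlation, which measures linear association and is sensitive to outliers. As a robustness check, a Spearman rank correlation can be computed by correlating the ranks of both columns. The sketch below uses made-up values with one deliberately extreme price.

```python
import pandas as pd

# Made-up values; the last price is an intentional outlier.
toy = pd.DataFrame({
    'Total Library Size': [5000, 6000, 7000, 8000, 9000],
    'Cost - Premium ($)': [5.0, 9.0, 8.0, 12.0, 40.0],
})

# Pearson: linear association on the raw values.
pearson = toy['Total Library Size'].corr(toy['Cost - Premium ($)'])

# Spearman: Pearson correlation applied to the ranks of each column.
spearman = toy['Total Library Size'].rank().corr(toy['Cost - Premium ($)'].rank())

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

If the two coefficients differ substantially on the real data, the relationship is likely monotonic but not linear, or driven by a few extreme markets.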

## Price Tier Spread Analysis

### How Much More Does Premium Cost Than Basic?

Netflix encourages subscribers to upgrade from Basic to Premium by offering additional features like higher video quality and more simultaneous streams. The price difference between these tiers varies significantly by country.

Some countries show a small spread between Basic and Premium pricing (around $2-3), while others show spreads exceeding $10. These differences reveal how Netflix adjusts its pricing strategy to match local market conditions and willingness to pay for premium features.

In [None]:
# Calculate tier spread metrics
df['Basic to Premium Spread ($)'] = df['Cost - Premium ($)'] - df['Cost - Basic ($)']
df['Premium Markup (%)'] = ((df['Cost - Premium ($)'] / df['Cost - Basic ($)'] - 1) * 100).round(1)

print(f"Average price difference (Basic to Premium): ${df['Basic to Premium Spread ($)'].mean():.2f}")
print(f"Average premium markup: {df['Premium Markup (%)'].mean():.1f}%\n")

# Create spread comparison
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Largest spreads
top_spread = df.nlargest(15, 'Basic to Premium Spread ($)')
sns.barplot(data=top_spread, y='Country', x='Basic to Premium Spread ($)', ax=axes[0], palette='Reds_r')
axes[0].set_title('Largest Price Gap Between Basic and Premium', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Price Difference (USD)', fontsize=11)
axes[0].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:.2f}'))

# Smallest spreads
bot_spread = df[df['Basic to Premium Spread ($)'] > 0].nsmallest(15, 'Basic to Premium Spread ($)')
sns.barplot(data=bot_spread, y='Country', x='Basic to Premium Spread ($)', ax=axes[1], palette='Greens_r')
axes[1].set_title('Smallest Price Gap Between Basic and Premium', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Price Difference (USD)', fontsize=11)
axes[1].xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:.2f}'))

plt.tight_layout()
plt.savefig('docs/images/price_spread.png', dpi=150, bbox_inches='tight')
plt.show()
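
The markup calculation divides by the Basic price, so a zero or missing Basic price would produce infinite or undefined values and distort the average. A possible guard, sketched here on made-up prices, treats a zero Basic price as missing before dividing.

```python
import numpy as np
import pandas as pd

# Made-up prices; B has a zero Basic price and C has none recorded.
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C'],
    'Cost - Basic ($)': [10.0, 0.0, np.nan],
    'Cost - Premium ($)': [20.0, 8.0, 12.0],
})

# Treat a zero Basic price as missing so the division cannot yield inf.
basic = toy['Cost - Basic ($)'].replace(0, np.nan)
toy['Premium Markup (%)'] = ((toy['Cost - Premium ($)'] / basic - 1) * 100).round(1)

# mean() skips NaN by default, so only valid markups are averaged.
mean_markup = toy['Premium Markup (%)'].mean()
print(toy[['Country', 'Premium Markup (%)']])
print(f"Average markup over valid rows: {mean_markup:.1f}%")
```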

## Export Data for Dashboard

This section generates summary statistics and exports data files for use in the interactive web dashboard. The exported files include key findings and the processed dataset.

In [None]:
# Compile key summary statistics
summary = {
    'total_countries': len(df),
    'avg_library_size': int(df['Total Library Size'].mean()),
    'max_library_country': df.loc[df['Total Library Size'].idxmax(), 'Country'],
    'max_library_size': int(df['Total Library Size'].max()),
    'min_library_country': df.loc[df['Total Library Size'].idxmin(), 'Country'],
    'min_library_size': int(df['Total Library Size'].min()),
    'avg_premium_cost': round(df['Cost - Premium ($)'].mean(), 2),
    'most_expensive_country': df.loc[df['Cost - Premium ($)'].idxmax(), 'Country'],
    'most_expensive_cost': round(df['Cost - Premium ($)'].max(), 2),
    'cheapest_country': df.loc[df[df['Cost - Premium ($)'] > 0]['Cost - Premium ($)'].idxmin(), 'Country'],
    'cheapest_cost': round(df[df['Cost - Premium ($)'] > 0]['Cost - Premium ($)'].min(), 2),
    'best_value_country': df.loc[df[df['Cost per Title - Premium ($)'] > 0]['Cost per Title - Premium ($)'].idxmin(), 'Country'],
    'countries_with_ads': len(ads_df)
}

print("Key Findings Summary:\n")
for key, value in summary.items():
    formatted_key = key.replace('_', ' ').title()
    print(f"{formatted_key}: {value}")

# Export summary to JSON
import json
with open('docs/summary.json', 'w') as f:
    json.dump(summary, f, indent=2)

# Export processed dataset
export_cols = ['Country', 'Region', 'Total Library Size', 'Cost - Basic with Ads ($)', 
               'Cost - Basic ($)', 'Cost - Standard ($)', 'Cost - Premium ($)',
               'Cost per Title - Premium ($)']
df[export_cols].to_json('docs/data.json', orient='records', indent=2)

print("\nData files exported successfully to docs/ directory.")
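
The summary dictionary wraps most values in `int()` or `round()` before export because `json` cannot serialize every NumPy scalar type. An alternative, sketched below, passes a fallback converter through the `default` parameter of `json.dump` and then reloads the file to confirm it round-trips. The `to_builtin` helper and the sample values are hypothetical.

```python
import json
import os
import tempfile

import numpy as np


def to_builtin(value):
    """Convert NumPy scalars to plain Python types (hypothetical helper)."""
    if isinstance(value, np.generic):
        return value.item()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# Made-up summary values using NumPy scalar types.
summary = {'total_countries': np.int64(3), 'avg_premium_cost': np.float64(12.5)}

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'summary.json')
    with open(path, 'w') as f:
        json.dump(summary, f, indent=2, default=to_builtin)
    with open(path) as f:
        reloaded = json.load(f)

print(reloaded)
```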

## Summary of Findings

### Global Coverage and Market Presence

This dataset covers 128 countries across six continents, with the largest share of markets located in Europe. Africa has the fewest Netflix-available countries in this dataset. This distribution reflects Netflix's expansion strategy and the regulatory or infrastructure challenges in different regions.

### Content Library Variation

Content availability varies substantially between countries, with library sizes ranging from approximately 5,000 to nearly 10,000 titles. European countries tend to have larger libraries, while some markets in Africa and South America have smaller catalogs. This variation stems from licensing agreements, local content regulations, and regional production investments.

### Pricing Disparities

Premium subscription costs show dramatic variation, with the most expensive markets charging approximately 6-7 times more than the least expensive ones. High-income countries in Western Europe and North America pay the highest prices, while developing markets in Asia and Africa have significantly lower subscription fees. This pricing reflects Netflix's localization strategy based on purchasing power and market conditions.

### Value for Money

When measuring value through cost per title, several patterns emerge. Some countries combine lower prices with substantial content libraries, providing exceptional value. Conversely, certain markets charge high subscription fees while offering smaller catalogs. The best value markets are typically found in developing economies where Netflix prices aggressively to build market share.

### Ad-Supported Tier

The ad-supported tier is available in approximately 15 markets and offers savings of roughly 30-60% compared to the Basic plan. This option provides Netflix with a competitive response to other streaming services and creates an entry-level price point for cost-conscious subscribers.

### Regional Patterns

Clear regional patterns exist in Netflix's pricing and content strategy:

- **Europe**: Highest average subscription costs, large content libraries, moderate value per dollar
- **Asia**: Variable pricing with many low-cost markets, good value for money in developing economies
- **North America**: High subscription costs but also large libraries, moderate value
- **South America**: Lower average prices, moderate library sizes, good relative value
- **Africa**: Limited country availability, lower prices, moderate libraries
- **Oceania**: High prices similar to developed markets, large content offerings

### Subscription Tier Pricing

The price gap between Basic and Premium tiers varies significantly by country. Some markets show relatively flat pricing across tiers (small spreads), while others implement steep increases for premium features. This suggests Netflix adjusts its tier structure based on local willingness to pay for features like 4K streaming and multiple simultaneous screens.

### Library Size and Cost Relationship

The correlation analysis reveals a weak relationship between content library size and subscription cost. This indicates that Netflix bases pricing primarily on local economic conditions and competitive dynamics rather than the volume of content available. Countries pay different amounts for similar library sizes, and some high-priced markets do not necessarily have the largest catalogs.