# Video Game Sales - Exploratory Data Analysis

This notebook performs exploratory data analysis on video game sales data.

## Dataset Information

The dataset contains the following columns:
- **Name**: Title of the game
- **Platform**: Gaming platform
- **Year_of_Release**: Year the game was released
- **Genre**: Game genre
- **Publisher**: Company that published the game
- **NA_Sales**: Sales in North America (millions)
- **EU_Sales**: Sales in Europe (millions)
- **JP_Sales**: Sales in Japan (millions)
- **Other_Sales**: Sales in other regions (millions)
- **Global_Sales**: Total worldwide sales (millions)
- **Critic_Score**: Aggregate score by Metacritic staff
- **Critic_Count**: Number of critics
- **User_Score**: Score by Metacritic subscribers
- **User_Count**: Number of user ratings
- **Developer**: Game developer
- **Rating**: ESRB rating

## 1. Setup and Data Loading

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import os

# Add src directory to path (the relative path assumes the notebook's working directory sits next to src/)
sys.path.append('../src')

# Import custom utilities
from data_utils import (
    load_video_game_data,
    get_data_info,
    clean_video_game_data,
    filter_by_year,
    get_top_games_by_sales,
    get_sales_by_platform,
    get_sales_by_genre
)

from visualization import (
    plot_sales_trend_by_year,
    plot_top_platforms,
    plot_top_genres,
    plot_regional_sales_comparison,
    plot_sales_distribution,
    plot_score_vs_sales,
    plot_correlation_heatmap,
    plot_top_publishers
)

# Configure display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

# Set plot style
sns.set_style('whitegrid')
%matplotlib inline

In [None]:
# Load the data
# Uncomment the two lines below and set data_path to the location of your CSV file
# data_path = '../data/raw/video_game_sales.csv'
# df = load_video_game_data(data_path)

# No data is loaded by default; the cells below stay commented out until df exists
print("Please load your video game sales data file")
print("Example: df = load_video_game_data('../data/raw/video_game_sales.csv')")
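
The helper `load_video_game_data` lives in `src/data_utils.py` and is not shown in this notebook. As a rough sketch of what such a loader might do, assuming it is a thin wrapper around `pandas.read_csv`, the example below reads a tiny inline CSV and fails early when an expected column is missing. The name `load_csv_checked`, the `REQUIRED_COLUMNS` list, and the sample rows are all hypothetical and for illustration only.

```python
import io
import pandas as pd

# Hypothetical subset of the columns listed under "Dataset Information"
REQUIRED_COLUMNS = ["Name", "Platform", "Year_of_Release", "Genre", "Global_Sales"]

def load_csv_checked(source):
    """Read a CSV and fail early if an expected column is missing."""
    df = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df

# Made-up rows for illustration only, not real sales figures
sample_csv = io.StringIO(
    "Name,Platform,Year_of_Release,Genre,Global_Sales\n"
    "Game A,PlatX,2001,Action,1.5\n"
    "Game B,PlatY,,Sports,0.7\n"
)
sample_df = load_csv_checked(sample_csv)
print(sample_df.dtypes)
```

Note that a single empty `Year_of_Release` cell is enough for pandas to read the whole column as floats, which is one reason a cleaning step usually follows.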

## 2. Initial Data Exploration

In [None]:
# Display basic information about the dataset
# get_data_info(df)

In [None]:
# Display first few rows
# df.head(10)

## 3. Data Cleaning

In [None]:
# Clean the data
# df_clean = clean_video_game_data(df)
# print("Data cleaned successfully")
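
What `clean_video_game_data` actually does is defined in `src/data_utils.py` and is not visible here. The sketch below shows cleaning steps that are commonly needed for a table with these columns, under two assumptions that should be verified against the real file: that `User_Score` may contain non-numeric placeholders such as `"tbd"`, and that rows without a name or release year are not wanted. The function `clean_sketch` and the sample rows are hypothetical.

```python
import pandas as pd

def clean_sketch(df):
    """Illustrative cleaning steps; not the project's clean_video_game_data."""
    out = df.copy()
    # Non-numeric placeholders (e.g. "tbd") become NaN instead of raising
    out["User_Score"] = pd.to_numeric(out["User_Score"], errors="coerce")
    # Rows without a title or release year are dropped
    out = out.dropna(subset=["Name", "Year_of_Release"]).copy()
    # With the NaN years gone, the column can be stored as integers
    out["Year_of_Release"] = out["Year_of_Release"].astype(int)
    return out.reset_index(drop=True)

# Made-up rows for illustration only
raw = pd.DataFrame({
    "Name": ["Game A", "Game B", None, "Game D"],
    "Year_of_Release": [2001.0, 2003.0, 2005.0, None],
    "User_Score": ["8.1", "tbd", "6.0", "7.5"],
})
cleaned = clean_sketch(raw)
print(cleaned)
```

Coercing to NaN keeps the row available for sales analysis while excluding it from score-based statistics, which is usually preferable to dropping it.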

## 4. Sales Analysis

### 4.1 Top Games by Global Sales

In [None]:
# Get top 20 games by global sales
# top_games = get_top_games_by_sales(df_clean, n=20)
# top_games[['Name', 'Platform', 'Year_of_Release', 'Genre', 'Publisher', 'Global_Sales']]
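
A ranking like this depends on what one row represents. If the data holds one row per title and platform, which the `Platform` column suggests but which should be checked, then a multi-platform release is split across several rows. The sketch below contrasts a row-level ranking with a title-level ranking using plain pandas; it does not claim to reproduce `get_top_games_by_sales`, and the sample rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only
games = pd.DataFrame({
    "Name": ["Game A", "Game A", "Game B", "Game C"],
    "Platform": ["PlatX", "PlatY", "PlatX", "PlatY"],
    "Global_Sales": [1.5, 1.0, 2.0, 0.3],
})

# Row-level ranking: each (title, platform) pair competes separately
top_rows = games.nlargest(2, "Global_Sales")

# Title-level ranking: sum across platforms first, then rank
top_titles = games.groupby("Name")["Global_Sales"].sum().nlargest(2)

print(top_rows[["Name", "Platform", "Global_Sales"]])
print(top_titles)
```

In this toy data "Game B" leads the row-level ranking, while "Game A" leads once its two platform rows are combined.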

### 4.2 Sales by Platform

In [None]:
# Analyze sales by platform
# platform_sales = get_sales_by_platform(df_clean)
# print(platform_sales.head(10))

# Visualize top platforms
# plot_top_platforms(df_clean, n=15)
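
For readers without access to `src/data_utils.py`, a per-platform summary can be approximated with a pandas `groupby` and named aggregation. This is a minimal sketch, not the implementation of `get_sales_by_platform`; the output column names `total_sales` and `titles` and the sample rows are assumptions made for the example.

```python
import pandas as pd

# Made-up rows for illustration only
games = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D", "Game E"],
    "Platform": ["PlatX", "PlatX", "PlatY", "PlatY", "PlatZ"],
    "Global_Sales": [1.5, 2.0, 1.0, 0.3, 4.0],
})

platform_summary = (
    games.groupby("Platform")
    .agg(total_sales=("Global_Sales", "sum"), titles=("Name", "count"))
    .sort_values("total_sales", ascending=False)
)
print(platform_summary)
```

Reporting the title count next to the total helps separate platforms with many modest sellers from platforms carried by a few large releases.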

### 4.3 Sales by Genre

In [None]:
# Analyze sales by genre
# genre_sales = get_sales_by_genre(df_clean)
# print(genre_sales)

# Visualize top genres
# plot_top_genres(df_clean, n=12)

### 4.4 Regional Sales Comparison

In [None]:
# Compare sales across regions
# plot_regional_sales_comparison(df_clean)
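
The numbers behind a regional comparison can be sketched directly from the four regional columns. The example below computes each region's share of total sales and checks that the regional columns add up to `Global_Sales` within a small tolerance. The tolerance of 0.02 is an arbitrary assumption to allow for rounding, and the rows are made up; how `plot_regional_sales_comparison` prepares its data is not shown in this notebook.

```python
import pandas as pd

region_cols = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Made-up rows for illustration only
games = pd.DataFrame({
    "NA_Sales": [1.0, 0.5],
    "EU_Sales": [0.5, 0.25],
    "JP_Sales": [0.25, 0.0],
    "Other_Sales": [0.25, 0.25],
    "Global_Sales": [2.0, 1.0],
})

# Share of total sales contributed by each region
totals = games[region_cols].sum()
shares = totals / totals.sum()
print(shares)

# Consistency check: regional columns should add up to the global figure
gap = (games[region_cols].sum(axis=1) - games["Global_Sales"]).abs()
inconsistent_rows = games[gap > 0.02]
print(f"Rows where regions and global total disagree: {len(inconsistent_rows)}")
```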

### 4.5 Sales Trends Over Time

In [None]:
# Plot sales trends by year
# plot_sales_trend_by_year(df_clean, sales_column='Global_Sales')
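
A yearly trend reduces to a sum of sales per release year. The sketch below shows that aggregation in plain pandas, as an assumption about the kind of data `plot_sales_trend_by_year` might draw rather than a copy of it. One detail worth knowing is that `groupby` skips rows whose key is missing by default, so titles without a release year silently drop out of the trend. The rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only; the last one has no release year
games = pd.DataFrame({
    "Year_of_Release": [2001, 2001, 2002, None],
    "Global_Sales": [1.0, 0.5, 2.0, 3.0],
})

# Rows with a missing year are excluded by groupby's default dropna=True
yearly = games.groupby("Year_of_Release")["Global_Sales"].sum().sort_index()
print(yearly)

# Sales that cannot be placed on the time axis
unplaced = games.loc[games["Year_of_Release"].isna(), "Global_Sales"].sum()
print(f"Sales with unknown release year: {unplaced}")
```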

## 5. Publisher Analysis

In [None]:
# Analyze top publishers
# plot_top_publishers(df_clean, n=15)

## 6. Score Analysis

### 6.1 Critic Score vs Sales

In [None]:
# Analyze relationship between critic scores and sales
# plot_score_vs_sales(df_clean, score_column='Critic_Score', sales_column='Global_Sales')

### 6.2 User Score vs Sales

In [None]:
# Analyze relationship between user scores and sales
# plot_score_vs_sales(df_clean, score_column='User_Score', sales_column='Global_Sales')
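
A scatter plot is easier to interpret alongside a number. The sketch below computes two correlation coefficients between a score column and sales: Pearson, which measures linear association, and a rank-based (Spearman) coefficient obtained by correlating the ranks, which is less sensitive to a few very large sellers. This is an illustrative addition, not part of `plot_score_vs_sales`, and the rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only; the last one has no critic score
games = pd.DataFrame({
    "Critic_Score": [60, 70, 80, 90, None],
    "Global_Sales": [0.2, 0.5, 0.4, 3.0, 1.0],
})

# Keep only rows where both values are present
pairs = games.dropna(subset=["Critic_Score", "Global_Sales"])

# Pearson correlation on the raw values
pearson = pairs["Critic_Score"].corr(pairs["Global_Sales"])

# Spearman correlation computed as Pearson correlation of the ranks
spearman = pairs["Critic_Score"].rank().corr(pairs["Global_Sales"].rank())

print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

Neither coefficient says anything about causation; a high value only indicates that scores and sales tend to move together in the sample.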

## 7. Correlation Analysis

In [None]:
# Generate correlation heatmap
# plot_correlation_heatmap(df_clean)
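
A correlation heatmap is a picture of a correlation matrix, so it can help to inspect the matrix itself. The sketch below restricts the frame to numeric columns and calls `DataFrame.corr`; whether `plot_correlation_heatmap` selects columns the same way is an assumption. The rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only
games = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Critic_Score": [60, 70, 80, 90],
    "User_Score": [6.0, 7.5, 7.0, 9.0],
    "Global_Sales": [0.2, 0.5, 0.4, 3.0],
})

# Text columns such as Name cannot be correlated, so keep numeric ones only
numeric = games.select_dtypes(include="number")
corr_matrix = numeric.corr()
print(corr_matrix.round(2))
```

If `User_Score` is still stored as text at this point, it will be missing from the matrix, which is a quick way to spot a column that was not converted during cleaning.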

## 8. Rating Analysis

In [None]:
# Analyze sales by ESRB rating
# if 'Rating' in df_clean.columns:
#     rating_sales = df_clean.groupby('Rating')['Global_Sales'].agg(['sum', 'mean', 'count']).sort_values('sum', ascending=False)
#     print(rating_sales)
#
#     # Visualize
#     plt.figure(figsize=(10, 6))
#     rating_sales['sum'].plot(kind='bar', color='steelblue')
#     plt.xlabel('Rating')
#     plt.ylabel('Total Global Sales (millions)')
#     plt.title('Total Sales by ESRB Rating')
#     plt.xticks(rotation=45)
#     plt.tight_layout()
#     plt.show()

## 9. Summary and Insights

Once the data is loaded and the cells above are run, the analysis should address the following points:

1. **Sales Trends**: Identify how video game sales have evolved over time
2. **Popular Platforms**: Determine which gaming platforms have the highest sales
3. **Top Genres**: Understand which game genres are most successful
4. **Regional Preferences**: Analyze regional differences in gaming preferences
5. **Score Impact**: Evaluate how critic and user scores correlate with sales
6. **Publisher Success**: Identify the most successful publishers in the industry
7. **Rating Distribution**: Understand the relationship between ESRB ratings and sales