# <div style="text-align: center; background-color:pink; font-family:Georgia, serif; color: black; padding: 20px;line-height: 1;border-radius:5px; border: 2px solid black;">Video Game Sales Analysis: Global Trends and Regional Insights</div>


# Import Libraries

In [15]:
import pandas as pd
import numpy as np

# About Dataset
* This dataset contains video game sales data in 16,598 rows. Each row corresponds to one video game release on a specific platform, so the same title can appear several times (the `Name` column has 11,493 unique values).
* The dataset includes the following columns:

    * Rank: A unique integer that ranks the games, apparently by global sales (rank 1, Wii Sports, has the highest Global_Sales).
    * Name: The name of the video game.
    * Platform: The gaming platform on which the video game was released (e.g., PS4, Xbox One, PC).
    * Year: The year the game was released. This column has some missing values, with 16,327 non-null entries out of 16,598.
    * Genre: The genre of the video game (e.g., Action, Sports, RPG).
    * Publisher: The company that published the video game. There are a few missing values, with 16,540 non-null entries.
    * NA_Sales: Sales in North America, measured in millions of units.
    * EU_Sales: Sales in Europe, measured in millions of units.
    * JP_Sales: Sales in Japan, measured in millions of units.
    * Other_Sales: Sales in regions other than North America, Europe, and Japan, measured in millions of units.
    * Global_Sales: The total global sales of the video game, measured in millions of units.
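
The column descriptions imply that `Global_Sales` should roughly equal the sum of the four regional columns. A minimal sketch to check this on the five rows copied from the `data.head()` output of this notebook; because every figure is rounded to two decimals, the 0.02 tolerance is an assumption chosen to absorb rounding differences.

```python
import pandas as pd

# Five rows copied from the data.head() output of this notebook
sample = pd.DataFrame({
    'Name': ['Wii Sports', 'Super Mario Bros.', 'Mario Kart Wii',
             'Wii Sports Resort', 'Pokemon Red/Pokemon Blue'],
    'NA_Sales': [41.49, 29.08, 15.85, 15.75, 11.27],
    'EU_Sales': [29.02, 3.58, 12.88, 11.01, 8.89],
    'JP_Sales': [3.77, 6.81, 3.79, 3.28, 10.22],
    'Other_Sales': [8.46, 0.77, 3.31, 2.96, 1.00],
    'Global_Sales': [82.74, 40.24, 35.82, 33.00, 31.37],
})

regions = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']
# Difference between the reported global figure and the sum of the regional figures
sample['Region_Sum_Diff'] = (sample['Global_Sales'] - sample[regions].sum(axis=1)).round(2)

# Assumed tolerance: up to 0.02 million units of rounding error per row
TOLERANCE = 0.02
consistent = sample['Region_Sum_Diff'].abs() <= TOLERANCE
print(sample[['Name', 'Region_Sum_Diff']])
print('All rows consistent:', consistent.all())
```

On these rows the differences are at most 0.01 million units, which is consistent with `Global_Sales` being a rounded total of the regional columns.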


# Import data

In [16]:
data=pd.read_csv('Online_game.csv')

In [17]:
data

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16593 | 16596 | Woody Woodpecker in Crazy Castle 5 | GBA | 2002.0 | Platform | Kemco | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16594 | 16597 | Men in Black II: Alien Escape | GC | 2003.0 | Shooter | Infogrames | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16595 | 16598 | SCORE International Baja 1000: The Official Game | PS2 | 2008.0 | Racing | Activision | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16596 | 16599 | Know How 2 | DS | 2010.0 | Puzzle | 7G//AMES | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 |


In [18]:
data.head()

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.0 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.0 | 31.37 |


In [19]:
data.tail()

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 16593 | 16596 | Woody Woodpecker in Crazy Castle 5 | GBA | 2002.0 | Platform | Kemco | 0.01 | 0.0 | 0.0 | 0.0 | 0.01 |
| 16594 | 16597 | Men in Black II: Alien Escape | GC | 2003.0 | Shooter | Infogrames | 0.01 | 0.0 | 0.0 | 0.0 | 0.01 |
| 16595 | 16598 | SCORE International Baja 1000: The Official Game | PS2 | 2008.0 | Racing | Activision | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 |
| 16596 | 16599 | Know How 2 | DS | 2010.0 | Puzzle | 7G//AMES | 0.0 | 0.01 | 0.0 | 0.0 | 0.01 |
| 16597 | 16600 | Spirits & Spells | GBA | 2003.0 | Platform | Wanadoo | 0.01 | 0.0 | 0.0 | 0.0 | 0.01 |


In [20]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16598 non-null  int64  
 1   Name          16598 non-null  object 
 2   Platform      16598 non-null  object 
 3   Year          16327 non-null  float64
 4   Genre         16598 non-null  object 
 5   Publisher     16540 non-null  object 
 6   NA_Sales      16598 non-null  float64
 7   EU_Sales      16598 non-null  float64
 8   JP_Sales      16598 non-null  float64
 9   Other_Sales   16598 non-null  float64
 10  Global_Sales  16598 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB


# Key Observations:
* The dataset is mostly complete, but there are missing values in the Year and Publisher columns.
* Sales data is provided for multiple regions, allowing for regional analysis and comparison of video game popularity.
* The dataset includes both categorical data (e.g., Name, Platform, Genre, Publisher) and numerical data (e.g., Sales figures).
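
The split between categorical and numerical columns can be made explicit with `DataFrame.select_dtypes`. A minimal sketch on a hypothetical two-row frame built from the first rows of this dataset, standing in for `data`; treating every non-numeric column as categorical is an assumption that fits this dataset.

```python
import pandas as pd

# Two-row stand-in for `data`, with the same column types as the notebook's DataFrame
df = pd.DataFrame({
    'Rank': [1, 2],
    'Name': ['Wii Sports', 'Super Mario Bros.'],
    'Platform': ['Wii', 'NES'],
    'Year': [2006.0, 1985.0],   # float64, as in the real column (which has missing years)
    'Genre': ['Sports', 'Platform'],
    'Publisher': ['Nintendo', 'Nintendo'],
    'Global_Sales': [82.74, 40.24],
})

# Numerical columns: integer and float dtypes
numerical_cols = df.select_dtypes(include='number').columns.tolist()
# Everything that is not numeric is treated as categorical here
categorical_cols = df.select_dtypes(exclude='number').columns.tolist()
print('Numerical:', numerical_cols)
print('Categorical:', categorical_cols)
```

Using `exclude='number'` rather than `include='object'` also catches text columns if a newer pandas version loads them with a dedicated string dtype.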

In [21]:
data.describe()

| | Rank | Year | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|
| count | 16598.0 | 16327.0 | 16598.0 | 16598.0 | 16598.0 | 16598.0 | 16598.0 |
| mean | 8300.605254 | 2006.406443 | 0.264667 | 0.146652 | 0.077782 | 0.048063 | 0.537441 |
| std | 4791.853933 | 5.828981 | 0.816683 | 0.505351 | 0.309291 | 0.188588 | 1.555028 |
| min | 1.0 | 1980.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 |
| 25% | 4151.25 | 2003.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.06 |
| 50% | 8300.5 | 2007.0 | 0.08 | 0.02 | 0.0 | 0.01 | 0.17 |
| 75% | 12449.75 | 2010.0 | 0.24 | 0.11 | 0.04 | 0.04 | 0.47 |
| max | 16600.0 | 2020.0 | 41.49 | 29.02 | 10.22 | 10.57 | 82.74 |


In [22]:
data.describe(include='all')

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 16598.0 | 16598 | 16598 | 16327.0 | 16598 | 16540 | 16598.0 | 16598.0 | 16598.0 | 16598.0 | 16598.0 |
| unique | | 11493 | 31 | | 12 | 578 | | | | | |
| top | | Need for Speed: Most Wanted | DS | | Action | Electronic Arts | | | | | |
| freq | | 12 | 2163 | | 3316 | 1351 | | | | | |
| mean | 8300.605254 | | | 2006.406443 | | | 0.264667 | 0.146652 | 0.077782 | 0.048063 | 0.537441 |
| std | 4791.853933 | | | 5.828981 | | | 0.816683 | 0.505351 | 0.309291 | 0.188588 | 1.555028 |
| min | 1.0 | | | 1980.0 | | | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 |
| 25% | 4151.25 | | | 2003.0 | | | 0.0 | 0.0 | 0.0 | 0.0 | 0.06 |
| 50% | 8300.5 | | | 2007.0 | | | 0.08 | 0.02 | 0.0 | 0.01 | 0.17 |
| 75% | 12449.75 | | | 2010.0 | | | 0.24 | 0.11 | 0.04 | 0.04 | 0.47 |


In [23]:
data.describe(include='object')

| | Name | Platform | Genre | Publisher |
|---|---|---|---|---|
| count | 16598 | 16598 | 16598 | 16540 |
| unique | 11493 | 31 | 12 | 578 |
| top | Need for Speed: Most Wanted | DS | Action | Electronic Arts |
| freq | 12 | 2163 | 3316 | 1351 |


# Checking Null values

In [24]:
data.isnull().sum()

Rank              0
Name              0
Platform          0
Year            271
Genre             0
Publisher        58
NA_Sales          0
EU_Sales          0
JP_Sales          0
Other_Sales       0
Global_Sales      0
dtype: int64

* The data has missing values in two columns: Year (271 rows) and Publisher (58 rows). All other columns are complete.
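
Raw counts are easier to judge as a share of all rows: on the full dataset, 271 of 16,598 rows (about 1.63%) lack a Year and 58 (about 0.35%) lack a Publisher. A minimal sketch that reports both the count and the percentage per column, on a hypothetical four-row frame standing in for `data`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `data`: one missing Year and one missing Publisher
df = pd.DataFrame({
    'Name': ['Game A', 'Game B', 'Game C', 'Game D'],
    'Year': [2006.0, np.nan, 2008.0, 2009.0],
    'Publisher': ['Nintendo', 'Nintendo', None, 'Activision'],
})

# isnull().mean() is the fraction of missing rows per column
missing = pd.DataFrame({
    'missing_count': df.isnull().sum(),
    'missing_pct': (df.isnull().mean() * 100).round(2),
})
print(missing)
```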

# Identifying the number of unique values

In [11]:
data.nunique()

Rank            16598
Name            11493
Platform           31
Year               39
Genre              12
Publisher         578
NA_Sales          409
EU_Sales          305
JP_Sales          244
Other_Sales       157
Global_Sales      623
dtype: int64

# Checking for duplicate rows

In [12]:
data.duplicated().sum()

0

In [13]:
data[data.duplicated()]

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|


* There are no duplicate rows.
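
Because `Rank` is unique for every row (16,598 unique values in the `nunique()` output), `data.duplicated()` can never flag a full-row duplicate. A stricter check ignores `Rank` and looks for repeated `Name`/`Platform`/`Year` combinations; treating those three columns as the key is an assumption. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows: the last two describe the same release but carry different ranks
df = pd.DataFrame({
    'Rank': [1, 2, 3],
    'Name': ['Game A', 'Game B', 'Game B'],
    'Platform': ['Wii', 'PS2', 'PS2'],
    'Year': [2006, 2008, 2008],
})

# Full-row check: Rank differs on every row, so nothing is flagged
print('Full-row duplicates:', df.duplicated().sum())

key = ['Name', 'Platform', 'Year']
# keep=False marks every member of a duplicated group, not only the later copies
dupes = df[df.duplicated(subset=key, keep=False)]
print(dupes)
```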

In [None]:
data.info()

# How many unique video game titles are there in the dataset?

In [None]:
data['Name'].unique()

In [None]:
data['Name'].nunique()

* The dataset contains 11,493 unique video game titles.
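
The gap between 16,598 rows and 11,493 unique names comes from titles that appear in more than one row; in the `describe()` output, `Need for Speed: Most Wanted` appears 12 times. A minimal sketch that counts how many distinct platforms each title was released on, using hypothetical rows standing in for `data`:

```python
import pandas as pd

# Hypothetical rows: one title on three platforms, another on one
df = pd.DataFrame({
    'Name': ['Game A', 'Game A', 'Game A', 'Game B'],
    'Platform': ['PS2', 'X360', 'PC', 'DS'],
    'Global_Sales': [2.0, 1.5, 0.5, 0.8],
})

# Number of distinct platforms per title, most widely released first
platforms_per_title = df.groupby('Name')['Platform'].nunique().sort_values(ascending=False)
print(platforms_per_title)
print('Rows:', len(df), '| unique titles:', df['Name'].nunique())
```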

# Which gaming platform has the most titles?

In [None]:
data.groupby('Platform')['Name'].count().sort_values(ascending=False).head()

In [None]:
data['Platform'].value_counts().idxmax()

* The DS platform has the most titles (2,163).

# What is the most common genre of video games?

In [None]:
data['Genre'].value_counts().sort_values(ascending=False)

In [None]:
data['Genre'].value_counts().sort_values(ascending=False).idxmax()

* Action is the most common genre (3,316 titles).

# How many missing values are there in the Year and Publisher columns?

In [None]:
print(f"Missing values in Year: {data['Year'].isnull().sum()}")
print(f"Missing values in Publisher: {data['Publisher'].isnull().sum()}")

# How can you fill the missing values in the Year column?

In [None]:
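# Assign the result back: fillna(inplace=True) on a column selection is chained assignment,
# which recent pandas versions warn about and which no longer updates `data` under Copy-on-Write
data['Year'] = data['Year'].fillna(data['Year'].median())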
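
Filling every missing year with the overall median (2007 in the `describe()` output) places all 271 undated games in a single year, which inflates that year in later per-year aggregations. One alternative, assuming a title released on several platforms came out in roughly the same year, fills from the median year of the same `Name` first and falls back to the overall median. A minimal sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: 'Game A' lacks its year on one platform, 'Game C' has no known year at all
df = pd.DataFrame({
    'Name': ['Game A', 'Game A', 'Game B', 'Game C'],
    'Platform': ['PS2', 'GC', 'DS', 'PC'],
    'Year': [2004.0, np.nan, 2010.0, np.nan],
})

# Median year of the same title across its rows (NaN when the title has no known year)
same_title_year = df.groupby('Name')['Year'].transform('median')
# Fill from the same title first, then fall back to the overall median of the original column
df['Year'] = df['Year'].fillna(same_title_year).fillna(df['Year'].median())
print(df)
```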

# How can you deal with the missing Publisher values?

In [None]:
data['Publisher'] = data['Publisher'].fillna('Unknown')

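# Convert Year to integer

In [14]:
data['Year'] = data['Year'].astype('int64')

IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer

* This cell ran before the `fillna` cells above were executed (they are still marked `In [None]`), so `Year` still contained NaN values and the cast failed. Running it after the missing years have been filled converts the column to `int64` without this error.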
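
If missing years should stay missing rather than be imputed, pandas' nullable integer dtype `Int64` (capital I) can hold whole numbers alongside `<NA>`, so the cast succeeds without filling. A minimal sketch on a hypothetical Series shaped like the original `Year` column:

```python
import numpy as np
import pandas as pd

# Hypothetical float column with a gap, like Year before filling
years = pd.Series([2006.0, np.nan, 1985.0], name='Year')

# astype('int64') would raise here; the nullable 'Int64' dtype keeps the gap as <NA>
years_int = years.astype('Int64')
print(years_int)
print(years_int.dtype)
```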

In [None]:
data.isnull().sum()

# Sales Analysis

# Which video game has the highest global sales?

In [None]:
data.groupby('Name')['Global_Sales'].max().sort_values(ascending=False)

In [None]:
data.groupby('Name')['Global_Sales'].max().sort_values(ascending=False).idxmax()

In [None]:
data.loc[data['Global_Sales'].idxmax()]

* Wii Sports has the highest global sales (82.74 million units).
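
To see a leaderboard rather than a single winner, `DataFrame.nlargest` returns the top rows by a column regardless of the row order. A minimal sketch on the five rows from the `data.head()` output, deliberately shuffled:

```python
import pandas as pd

# Rows copied from this notebook's data.head() output, in shuffled order
df = pd.DataFrame({
    'Name': ['Super Mario Bros.', 'Wii Sports', 'Wii Sports Resort',
             'Mario Kart Wii', 'Pokemon Red/Pokemon Blue'],
    'Platform': ['NES', 'Wii', 'Wii', 'Wii', 'GB'],
    'Global_Sales': [40.24, 82.74, 33.00, 35.82, 31.37],
})

# Top 3 rows by global sales
top3 = df.nlargest(3, 'Global_Sales')
print(top3[['Name', 'Platform', 'Global_Sales']])
```

On the full data, `data.nlargest(10, 'Global_Sales')` would give the ten best-selling releases in one step.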

# How do sales compare across different regions (NA, EU, JP, Other)?

In [None]:
data[['NA_Sales','EU_Sales','JP_Sales','Other_Sales']].mean()

In [None]:
data.groupby('Year')[['NA_Sales','EU_Sales','JP_Sales','Other_Sales']].mean()
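
Means describe typical per-game sales; each region's share of all units sold is a complementary view, obtained by summing each regional column and dividing by the grand total. A minimal sketch on the five `data.head()` rows only, so the percentages illustrate the calculation rather than describe the full dataset:

```python
import pandas as pd

regions = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']
# Regional sales (millions of units) of the five rows shown in data.head()
df = pd.DataFrame({
    'NA_Sales': [41.49, 29.08, 15.85, 15.75, 11.27],
    'EU_Sales': [29.02, 3.58, 12.88, 11.01, 8.89],
    'JP_Sales': [3.77, 6.81, 3.79, 3.28, 10.22],
    'Other_Sales': [8.46, 0.77, 3.31, 2.96, 1.00],
})

region_totals = df[regions].sum()
# Percentage of all units sold that came from each region
region_share = (region_totals / region_totals.sum() * 100).round(1)
print(region_share)
```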

# What is the average sales per genre?

In [None]:
data.groupby(['Genre'])['Global_Sales'].mean()

# How have global sales trended over the years?

In [None]:
data.groupby('Year')['Global_Sales'].mean()

In [None]:
data.groupby('Year')['Global_Sales'].sum()
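
Yearly totals also depend on how many games were released each year, so it can help to see both in one table. A minimal sketch using named aggregation on hypothetical rows standing in for `data`; note that if missing years were filled with the median, that year also gains all the filled titles.

```python
import pandas as pd

# Hypothetical rows standing in for `data`
df = pd.DataFrame({
    'Name': ['Game A', 'Game B', 'Game C', 'Game D'],
    'Year': [2006, 2006, 2007, 2008],
    'Global_Sales': [2.0, 1.0, 3.0, 0.5],
})

# One row per year: number of titles, total sales, and average sales per title
yearly = df.groupby('Year').agg(
    titles=('Name', 'count'),
    total_sales=('Global_Sales', 'sum'),
    mean_sales=('Global_Sales', 'mean'),
)
print(yearly)
```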

# How can you filter the dataset to show only video games released after 2010?

In [None]:
data_2010=data[data['Year']>2010].reset_index(drop=True)
data_2010

# How can you sort the dataset by Global_Sales in descending order?

In [None]:
sort_gb=data.sort_values(by='Global_Sales',ascending=False).reset_index(drop=True)
sort_gb

# How can you filter and sort the dataset to find the top 10 Action games by global sales?

In [None]:
action_data=data[data['Genre'].isin(['Action'])]
action_data.sort_values(by='Global_Sales',ascending=False).head(10).reset_index(drop=True)

# How can you filter and sort the dataset to find the top 10 Adventure games by global sales?

In [None]:
adventure_data=data[data['Genre'].isin(['Adventure'])]
adventure_data.sort_values(by='Global_Sales',ascending=False).head(10).reset_index(drop=True)
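
Rather than filtering one genre at a time, sorting once by sales and taking `groupby('Genre').head(n)` returns the top n games of every genre together. A minimal sketch with n = 2 on hypothetical rows standing in for `data`:

```python
import pandas as pd

# Hypothetical rows standing in for `data`
df = pd.DataFrame({
    'Name': ['A1', 'A2', 'A3', 'V1', 'V2'],
    'Genre': ['Action', 'Action', 'Action', 'Adventure', 'Adventure'],
    'Global_Sales': [1.0, 5.0, 3.0, 0.4, 2.2],
})

n = 2
# Sort by sales first so that head(n) within each genre keeps the best sellers
top_per_genre = (
    df.sort_values('Global_Sales', ascending=False)
      .groupby('Genre')
      .head(n)
      .sort_values(['Genre', 'Global_Sales'], ascending=[True, False])
      .reset_index(drop=True)
)
print(top_per_genre)
```

With `n = 10`, the same chain on `data` covers the Action and Adventure questions above, plus every other genre.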

In [None]:
data['Genre'].unique()

# What are the total global sales per genre?

In [None]:
data.groupby('Genre')['Global_Sales'].sum()

# Which publisher has the highest total sales in North America?

In [None]:
data.groupby('Publisher')['NA_Sales'].sum().sort_values(ascending=False)

In [None]:
data.groupby('Publisher')['NA_Sales'].sum().idxmax()

# Which publisher has the highest total sales in Europe?

In [None]:
data.groupby('Publisher')['EU_Sales'].sum().idxmax()

# Which publisher has the highest total sales in Japan?

In [None]:
data.groupby('Publisher')['JP_Sales'].sum().idxmax()

# Which publisher has the highest total sales in regions other than North America, Europe, and Japan?

In [None]:
data.groupby('Publisher')['Other_Sales'].sum().idxmax()

### Best Publishers in Different Regions:

* Europe, Japan, and North America:
  * Nintendo is the top-performing publisher across these regions. It has the highest total sales in Europe, Japan, and North America, showcasing its dominance in these key gaming markets.

* Other Regions (excluding North America, Europe, and Japan):
  * Electronic Arts (EA) has the highest total sales in regions outside North America, Europe, and Japan, making it the leading publisher in those markets.
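
The four per-region `idxmax()` calls above can be combined: summing all regional columns per publisher and calling `DataFrame.idxmax()` returns the top publisher for every column at once. A minimal sketch on a hypothetical frame; the publishers and sales values are invented for illustration.

```python
import pandas as pd

regions = ['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales']
# Hypothetical sales figures (millions of units), not taken from the dataset
df = pd.DataFrame({
    'Publisher': ['Pub X', 'Pub X', 'Pub Y', 'Pub Y'],
    'NA_Sales':    [3.0, 2.0, 1.0, 1.0],
    'EU_Sales':    [1.0, 1.0, 2.5, 0.5],
    'JP_Sales':    [2.0, 0.0, 0.1, 0.1],
    'Other_Sales': [0.2, 0.1, 0.6, 0.3],
})

# Total sales per publisher for every region
totals = df.groupby('Publisher')[regions].sum()
# For each column, the index label (publisher) holding the largest total
top_publisher_by_region = totals.idxmax()
print(top_publisher_by_region)
```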



# What is the average sales per platform?

In [None]:
data.groupby('Platform')['Global_Sales'].mean()

# How can you pivot the dataset to show the total sales by genre and platform?

In [None]:
data.pivot_table(index='Genre',columns='Platform',values='Global_Sales',aggfunc='sum')

In [None]:
data.pivot_table(columns='Genre',index='Platform',values='Global_Sales',aggfunc='sum')
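
Most genre and platform combinations have no games, so the pivot table above is full of NaN. `pivot_table` accepts `fill_value` to show those cells as 0 and `margins=True` to add row and column totals labelled `All`. A minimal sketch on hypothetical rows standing in for `data`:

```python
import pandas as pd

# Hypothetical rows standing in for `data`
df = pd.DataFrame({
    'Genre': ['Action', 'Action', 'Sports', 'Puzzle'],
    'Platform': ['PS2', 'Wii', 'Wii', 'DS'],
    'Global_Sales': [2.0, 1.0, 4.0, 0.5],
})

# fill_value=0 shows empty genre/platform combinations as 0 instead of NaN;
# margins=True adds an 'All' row and column with the totals
table = df.pivot_table(index='Genre', columns='Platform', values='Global_Sales',
                       aggfunc='sum', fill_value=0, margins=True)
print(table)
```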

# What is the median sales value per year for each platform?

In [None]:
data.groupby(['Year','Platform'])['Global_Sales'].median()

In [None]:
data.pivot_table(index='Platform',columns='Year',values='Global_Sales',aggfunc='median')

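# What is each publisher's market share of global sales?

In [None]:
# Total global sales per publisher
publisher_sales = data.groupby('Publisher')['Global_Sales'].sum()
# Market share (%) = publisher's total / total global sales of all publishers
market_share = (publisher_sales / publisher_sales.sum() * 100).sort_values(ascending=False)
market_share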

# What is the correlation between North American and European sales?

In [None]:
correlation_na_eu = data['NA_Sales'].corr(data['EU_Sales'])
correlation_na_eu
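
`Series.corr` computes the Pearson correlation by default, which is sensitive to outliers, and these sales figures are heavily skewed (in `describe()`, median NA sales is 0.08 against a maximum of 41.49). A rank-based Spearman correlation is a common complement. A minimal sketch on the five `data.head()` rows, so the values illustrate the API rather than the full dataset:

```python
import pandas as pd

# NA and EU sales of the five rows shown in data.head()
df = pd.DataFrame({
    'NA_Sales': [41.49, 29.08, 15.85, 15.75, 11.27],
    'EU_Sales': [29.02, 3.58, 12.88, 11.01, 8.89],
})

# Pearson uses the raw values; Spearman uses their ranks, so a few huge hits weigh less
pearson = df.corr(method='pearson').loc['NA_Sales', 'EU_Sales']
spearman = df.corr(method='spearman').loc['NA_Sales', 'EU_Sales']
print(f'Pearson:  {pearson:.3f}')
print(f'Spearman: {spearman:.3f}')
```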

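# How can you create a new column that categorizes games into high, medium, and low sales based on their global sales figures?

In [None]:
# Bins are closed on the right (millions of units): (0, 1] -> Low, (1, 5] -> Medium, (5, max] -> High.
# The minimum Global_Sales is 0.01, so every game falls into one of the bins.
data['Sales_Category'] = pd.cut(data['Global_Sales'],
                                bins=[0, 1, 5, data['Global_Sales'].max()],
                                labels=['Low', 'Medium', 'High'])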
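
`pd.cut` intervals are closed on the right by default, so a game with exactly 1.0 million units lands in `Low` and one with exactly 5.0 million in `Medium`. A minimal sketch on hypothetical values that sit on and just past the bin edges; using `np.inf` as the upper edge instead of the column maximum is an alternative choice, so that a value larger than the current maximum would still be labelled `High` rather than NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical global sales values (millions of units), including the bin edges
sales = pd.Series([0.01, 1.0, 1.01, 5.0, 5.01, 82.74])

# Right-closed bins: (0, 1] -> Low, (1, 5] -> Medium, (5, inf] -> High
categories = pd.cut(sales, bins=[0, 1, 5, np.inf], labels=['Low', 'Medium', 'High'])
print(pd.DataFrame({'Global_Sales': sales, 'Sales_Category': categories}))
print(categories.value_counts().sort_index())
```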


In [None]:
data.head()