# <div style="text-align: center; background-color:pink; font-family:Georgia, serif; color: black; padding: 20px;line-height: 1;border-radius:5px; border: 2px solid black;">Video Game Sales Analysis: Global Trends and Regional Insights</div>


# Import Libraries

In [1]:
import pandas as pd
import numpy as np

# About Dataset
* This dataset contains video game sales data: 16,598 records covering 11,493 unique titles.
* Each row corresponds to one game on one platform (a title released on several platforms appears once per platform), and the dataset includes the following columns:

    * Rank: A unique integer that ranks each record by global sales (rank 1 is the best seller).
    * Name: The name of the video game.
    * Platform: The gaming platform on which the video game was released (e.g., PS4, Xbox One, PC).
    * Year: The year the game was released. This column has some missing values, with 16,327 non-null entries out of 16,598.
    * Genre: The genre of the video game (e.g., Action, Sports, RPG).
    * Publisher: The company that published the video game. There are a few missing values, with 16,540 non-null entries.
    * NA_Sales: Sales in North America, measured in millions of units.
    * EU_Sales: Sales in Europe, measured in millions of units.
    * JP_Sales: Sales in Japan, measured in millions of units.
    * Other_Sales: Sales in regions other than North America, Europe, and Japan, measured in millions of units.
    * Global_Sales: The total global sales of the video game, measured in millions of units.


# Import data

In [2]:
data=pd.read_csv('Online_game.csv')

In [3]:
data

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37
...,...,...,...,...,...,...,...,...,...,...,...
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.00,0.00,0.00,0.01
16594,16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.00,0.00,0.00,0.01
16595,16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.00,0.00,0.00,0.00,0.01
16596,16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.00,0.01,0.00,0.00,0.01


In [4]:
data.head()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


In [5]:
data.tail()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
16593,16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Platform,Kemco,0.01,0.0,0.0,0.0,0.01
16594,16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.0,0.0,0.0,0.01
16595,16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.0,0.0,0.0,0.0,0.01
16596,16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.0,0.01,0.0,0.0,0.01
16597,16600,Spirits & Spells,GBA,2003.0,Platform,Wanadoo,0.01,0.0,0.0,0.0,0.01


In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16598 non-null  int64  
 1   Name          16598 non-null  object 
 2   Platform      16598 non-null  object 
 3   Year          16327 non-null  float64
 4   Genre         16598 non-null  object 
 5   Publisher     16540 non-null  object 
 6   NA_Sales      16598 non-null  float64
 7   EU_Sales      16598 non-null  float64
 8   JP_Sales      16598 non-null  float64
 9   Other_Sales   16598 non-null  float64
 10  Global_Sales  16598 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB


# Key Observations:
* The dataset is mostly complete, but there are missing values in the Year and Publisher columns.
* Sales data is provided for multiple regions, allowing for regional analysis and comparison of video game popularity.
* The dataset includes both categorical data (e.g., Name, Platform, Genre, Publisher) and numerical data (e.g., Sales figures).

In [7]:
data.describe()

Unnamed: 0,Rank,Year,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
count,16598.0,16327.0,16598.0,16598.0,16598.0,16598.0,16598.0
mean,8300.605254,2006.406443,0.264667,0.146652,0.077782,0.048063,0.537441
std,4791.853933,5.828981,0.816683,0.505351,0.309291,0.188588,1.555028
min,1.0,1980.0,0.0,0.0,0.0,0.0,0.01
25%,4151.25,2003.0,0.0,0.0,0.0,0.0,0.06
50%,8300.5,2007.0,0.08,0.02,0.0,0.01,0.17
75%,12449.75,2010.0,0.24,0.11,0.04,0.04,0.47
max,16600.0,2020.0,41.49,29.02,10.22,10.57,82.74


In [8]:
data.describe(include='all')

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
count,16598.0,16598,16598,16327.0,16598,16540,16598.0,16598.0,16598.0,16598.0,16598.0
unique,,11493,31,,12,578,,,,,
top,,Need for Speed: Most Wanted,DS,,Action,Electronic Arts,,,,,
freq,,12,2163,,3316,1351,,,,,
mean,8300.605254,,,2006.406443,,,0.264667,0.146652,0.077782,0.048063,0.537441
std,4791.853933,,,5.828981,,,0.816683,0.505351,0.309291,0.188588,1.555028
min,1.0,,,1980.0,,,0.0,0.0,0.0,0.0,0.01
25%,4151.25,,,2003.0,,,0.0,0.0,0.0,0.0,0.06
50%,8300.5,,,2007.0,,,0.08,0.02,0.0,0.01,0.17
75%,12449.75,,,2010.0,,,0.24,0.11,0.04,0.04,0.47


In [9]:
data.describe(include='object')

Unnamed: 0,Name,Platform,Genre,Publisher
count,16598,16598,16598,16540
unique,11493,31,12,578
top,Need for Speed: Most Wanted,DS,Action,Electronic Arts
freq,12,2163,3316,1351


# Checking Null values

In [10]:
data.isnull().sum()

Rank              0
Name              0
Platform          0
Year            271
Genre             0
Publisher        58
NA_Sales          0
EU_Sales          0
JP_Sales          0
Other_Sales       0
Global_Sales      0
dtype: int64

* The data has null values in two columns: Year (271) and Publisher (58).
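To judge how much these gaps matter, it can help to view them as a share of all rows. In the full dataset the counts above work out to roughly 1.63% for Year (271 of 16,598) and 0.35% for Publisher (58 of 16,598). The sketch below shows one way to build such a summary; the `sample` frame is a hypothetical stand-in, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in rows with gaps in Year and Publisher (not the real dataset).
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [2006.0, np.nan, 2008.0, 2010.0],
    "Publisher": ["Nintendo", "Activision", None, None],
})

# Count of nulls and their percentage of all rows, per column.
missing = pd.DataFrame({
    "missing": sample.isnull().sum(),
    "percent": (sample.isnull().mean() * 100).round(2),
})

# Keep only the columns that actually contain gaps.
missing = missing[missing["missing"] > 0]
print(missing)
```

Running the same lines on `data` instead of `sample` should reproduce the counts reported above.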

# Identifying the number of unique values

In [11]:
data.nunique()

Rank            16598
Name            11493
Platform           31
Year               39
Genre              12
Publisher         578
NA_Sales          409
EU_Sales          305
JP_Sales          244
Other_Sales       157
Global_Sales      623
dtype: int64

# Checking duplicate values in data

In [12]:
data.duplicated().sum()

0

In [13]:
data[data.duplicated()]

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales


* There are no duplicate rows.

In [14]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16598 non-null  int64  
 1   Name          16598 non-null  object 
 2   Platform      16598 non-null  object 
 3   Year          16327 non-null  float64
 4   Genre         16598 non-null  object 
 5   Publisher     16540 non-null  object 
 6   NA_Sales      16598 non-null  float64
 7   EU_Sales      16598 non-null  float64
 8   JP_Sales      16598 non-null  float64
 9   Other_Sales   16598 non-null  float64
 10  Global_Sales  16598 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB


# How many unique video game titles are there in the dataset?

In [15]:
data['Name'].unique()

array(['Wii Sports', 'Super Mario Bros.', 'Mario Kart Wii', ...,
       'Plushees', 'Woody Woodpecker in Crazy Castle 5', 'Know How 2'],
      dtype=object)

In [16]:
data['Name'].nunique()

11493

* The dataset contains 11,493 unique video game titles.

# Which gaming platform has the most titles?

In [17]:
data.groupby('Platform')['Name'].count().sort_values(ascending=False).head()

Platform
DS      2163
PS2     2161
PS3     1329
Wii     1325
X360    1265
Name: Name, dtype: int64

In [18]:
data['Platform'].value_counts().idxmax()

'DS'

* The DS platform has the most titles (2,163), narrowly ahead of the PS2 (2,161).

# What is the most common genre of video games?

In [19]:
data['Genre'].value_counts().sort_values(ascending=False)

Action          3316
Sports          2346
Misc            1739
Role-Playing    1488
Shooter         1310
Adventure       1286
Racing          1249
Platform         886
Simulation       867
Fighting         848
Strategy         681
Puzzle           582
Name: Genre, dtype: int64

In [20]:
data['Genre'].value_counts().idxmax()

'Action'

* Action is the most common genre, with 3,316 entries.

# How many missing values are there in the Year and Publisher columns?

In [21]:
print('Missing values in Year :', data['Year'].isnull().sum())
print('Missing values in Publisher :', data['Publisher'].isnull().sum())

Missing values in Year : 271
Missing values in Publisher : 58


# How can you fill the missing values in the Year column?

In [22]:
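# Assign the result back; calling fillna(..., inplace=True) on a selected column is deprecated in recent pandas.
data['Year'] = data['Year'].fillna(data['Year'].median())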
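A single global median (2007 for this dataset, per the `describe()` output) assigns the same year to every gap, regardless of when the game's platform was on the market. One possible alternative, shown here only as a sketch on hypothetical rows, is to fill each gap with the median year of its own platform and fall back to the global median when a platform has no known years at all.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one NES game and one PS4 game are missing their year.
sample = pd.DataFrame({
    "Platform": ["NES", "NES", "NES", "PS4", "PS4", "PS4"],
    "Year": [1985.0, 1987.0, np.nan, 2014.0, np.nan, 2016.0],
})

# Median release year within each platform, aligned to the original rows.
platform_median = sample.groupby("Platform")["Year"].transform("median")

# Fill from the platform median first, then from the global median as a fallback.
filled = sample["Year"].fillna(platform_median).fillna(sample["Year"].median())
sample["Year"] = filled.astype("int64")
print(sample)
```

Whether this is worth the extra step depends on the analysis; with under 2% of years missing, either approach changes few rows.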

# How can you deal with the missing Publisher values?

In [23]:
# Assign the result back rather than filling a selected column in place.
data['Publisher'] = data['Publisher'].fillna('Unknown')

# Convert year to integer

In [24]:
data['Year']=data['Year'].astype('int64')

In [25]:
data.isnull().sum()

Rank            0
Name            0
Platform        0
Year            0
Genre           0
Publisher       0
NA_Sales        0
EU_Sales        0
JP_Sales        0
Other_Sales     0
Global_Sales    0
dtype: int64

# Sales Analysis

# Which video game has the highest global sales?

In [26]:
data.groupby('Name')['Global_Sales'].max().sort_values(ascending=False)

Name
Wii Sports                                        82.74
Super Mario Bros.                                 40.24
Mario Kart Wii                                    35.82
Wii Sports Resort                                 33.00
Pokemon Red/Pokemon Blue                          31.37
                                                  ...  
Muv-Luv Alternative                                0.01
Bullet Soul: Tama Tamashii                         0.01
Miyako: Awayuki no Utage                           0.01
Space Hulk                                         0.01
Horse Life 4: My Horse, My Friend, My Champion     0.01
Name: Global_Sales, Length: 11493, dtype: float64

In [27]:
data.groupby('Name')['Global_Sales'].max().idxmax()

'Wii Sports'

In [28]:
data.loc[data['Global_Sales'].idxmax()]

Rank                     1
Name            Wii Sports
Platform               Wii
Year                  2006
Genre               Sports
Publisher         Nintendo
NA_Sales             41.49
EU_Sales             29.02
JP_Sales              3.77
Other_Sales           8.46
Global_Sales         82.74
Name: 0, dtype: object

* Wii Sports has the highest global sales

# How do sales compare across different regions (NA, EU, JP, Other)?

In [29]:
data[['NA_Sales','EU_Sales','JP_Sales','Other_Sales']].mean()

NA_Sales       0.264667
EU_Sales       0.146652
JP_Sales       0.077782
Other_Sales    0.048063
dtype: float64

In [30]:
data.groupby('Year')[['NA_Sales','EU_Sales','JP_Sales','Other_Sales']].mean()

Year,NA_Sales,EU_Sales,JP_Sales,Other_Sales
1980,1.176667,0.074444,0.0,0.013333
1981,0.726087,0.042609,0.0,0.006957
1982,0.747778,0.045833,0.0,0.008611
1983,0.456471,0.047059,0.476471,0.008235
1984,2.377143,0.15,1.019286,0.05
1985,2.409286,0.338571,1.04,0.065714
1986,0.595238,0.135238,0.943333,0.091905
1987,0.52875,0.088125,0.726875,0.0125
1988,1.591333,0.439333,1.050667,0.066
1989,2.655882,0.496471,1.08,0.088235
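Per-title means are easier to compare when expressed as each region's share of the combined total. The overall means printed above imply that North America accounts for roughly half of the four-region total. The following sketch computes such shares on three rows copied from the top of the dataset; on the full data you would use `data` in place of `sample`.

```python
import pandas as pd

regions = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Three rows taken from the head of the dataset (Wii Sports, Super Mario Bros., Mario Kart Wii).
sample = pd.DataFrame(
    [[41.49, 29.02, 3.77, 8.46],
     [29.08, 3.58, 6.81, 0.77],
     [15.85, 12.88, 3.79, 3.31]],
    columns=regions,
)

# Total units per region, then each region's percentage of the combined total.
totals = sample[regions].sum()
share = (totals / totals.sum() * 100).round(1)
print(share)
```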


# What is the average sales per genre?

In [54]:
data.groupby(['Genre'])['Global_Sales'].mean().sort_values(ascending=False)

Genre
Platform        0.938341
Shooter         0.791885
Role-Playing    0.623233
Racing          0.586101
Sports          0.567319
Fighting        0.529375
Action          0.528100
Misc            0.465762
Simulation      0.452364
Puzzle          0.420876
Strategy        0.257151
Adventure       0.185879
Name: Global_Sales, dtype: float64

# How have global sales trended over the years?

In [56]:
data.groupby('Year')['Global_Sales'].mean()

Year
1980    1.264444
1981    0.777609
1982    0.801667
1983    0.987647
1984    3.597143
1985    3.852857
1986    1.765238
1987    1.358750
1988    3.148000
1989    4.320588
1990    3.086875
1991    0.786098
1992    1.771163
1993    0.766333
1994    0.654298
1995    0.402329
1996    0.757224
1997    0.695433
1998    0.676702
1999    0.743402
2000    0.577536
2001    0.687697
2002    0.477105
2003    0.461742
2004    0.549554
2005    0.488778
2006    0.516905
2007    0.482831
2008    0.475420
2009    0.466317
2010    0.476926
2011    0.453020
2012    0.553333
2013    0.674194
2014    0.579124
2015    0.430684
2016    0.206192
2017    0.016667
2020    0.290000
Name: Global_Sales, dtype: float64

In [33]:
data.groupby('Year')['Global_Sales'].sum()

Year
1980     11.38
1981     35.77
1982     28.86
1983     16.79
1984     50.36
1985     53.94
1986     37.07
1987     21.74
1988     47.22
1989     73.45
1990     49.39
1991     32.23
1992     76.16
1993     45.98
1994     79.17
1995     88.11
1996    199.15
1997    200.98
1998    256.47
1999    251.27
2000    201.56
2001    331.47
2002    395.52
2003    357.85
2004    419.31
2005    459.94
2006    521.04
2007    711.21
2008    678.90
2009    667.30
2010    600.45
2011    515.99
2012    363.54
2013    368.11
2014    337.05
2015    264.44
2016     70.93
2017      0.05
2020      0.29
Name: Global_Sales, dtype: float64
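One caveat when reading these yearly figures: the missing years were filled with the median earlier, so every row with an imputed year is counted in the median year's bucket (2007 here), which may inflate that year. A possible safeguard, sketched below on hypothetical rows, is to record which years were imputed before filling and to exclude those rows from the trend. The column name `Year_Imputed` is an illustrative assumption, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: two games have no recorded year.
sample = pd.DataFrame({
    "Year": [2006.0, 2007.0, np.nan, 2008.0, np.nan],
    "Global_Sales": [82.74, 1.0, 5.0, 35.82, 3.0],
})

# Flag the gaps before filling them (Year_Imputed is a made-up helper column).
sample["Year_Imputed"] = sample["Year"].isnull()
sample["Year"] = sample["Year"].fillna(sample["Year"].median()).astype("int64")

# Yearly totals with and without the imputed rows.
with_imputed = sample.groupby("Year")["Global_Sales"].sum()
known_only = sample[~sample["Year_Imputed"]].groupby("Year")["Global_Sales"].sum()

print(with_imputed)
print(known_only)
```

The last years in the output above (2017 and 2020) contain very few titles, so they are better read as incomplete records than as a collapse in sales.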

# Video games released after 2010

In [34]:
data_2010=data[data['Year']>2010].reset_index(drop=True)
data_2010

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,17,Grand Theft Auto V,PS3,2013,Action,Take-Two Interactive,7.01,9.27,0.97,4.14,21.40
1,24,Grand Theft Auto V,X360,2013,Action,Take-Two Interactive,9.63,5.31,0.06,1.38,16.38
2,30,Call of Duty: Modern Warfare 3,X360,2011,Shooter,Activision,9.03,4.28,0.13,1.32,14.76
3,33,Pokemon X/Pokemon Y,3DS,2013,Role-Playing,Nintendo,5.17,4.05,4.34,0.79,14.35
4,34,Call of Duty: Black Ops 3,PS4,2015,Shooter,Activision,5.77,5.81,0.35,2.31,14.24
...,...,...,...,...,...,...,...,...,...,...,...
3881,16579,Rugby Challenge 3,XOne,2016,Sports,Alternative Software,0.00,0.01,0.00,0.00,0.01
3882,16581,Outdoors Unleashed: Africa 3D,3DS,2011,Sports,Mastiff,0.01,0.00,0.00,0.00,0.01
3883,16584,Fit & Fun,Wii,2011,Sports,Unknown,0.00,0.01,0.00,0.00,0.01
3884,16588,Breach,PC,2011,Shooter,Destineer,0.01,0.00,0.00,0.00,0.01


# Global_Sales in descending order

In [35]:
sort_gb=data.sort_values(by='Global_Sales',ascending=False).reset_index(drop=True)
sort_gb

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009,Sports,Nintendo,15.75,11.01,3.28,2.96,33.00
4,5,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,Nintendo,11.27,8.89,10.22,1.00,31.37
...,...,...,...,...,...,...,...,...,...,...,...
16593,16189,BattleForge,PC,2009,Strategy,Electronic Arts,0.00,0.01,0.00,0.00,0.01
16594,16190,Jewel Quest II,PC,2007,Puzzle,Avanquest,0.00,0.01,0.00,0.00,0.01
16595,16191,Toro to Morimori,PS3,2009,Misc,Sony Computer Entertainment,0.00,0.00,0.01,0.00,0.01
16596,16192,Sonic & All-Stars Racing Transformed,PC,2013,Racing,Sega,0.00,0.01,0.00,0.00,0.01


# What are the top 10 games in the Action genre by global sales?

In [36]:
action_data=data[data['Genre'].isin(['Action'])]
action_data.sort_values(by='Global_Sales',ascending=False).head(10).reset_index(drop=True)

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,17,Grand Theft Auto V,PS3,2013,Action,Take-Two Interactive,7.01,9.27,0.97,4.14,21.4
1,18,Grand Theft Auto: San Andreas,PS2,2004,Action,Take-Two Interactive,9.43,0.4,0.41,10.57,20.81
2,24,Grand Theft Auto V,X360,2013,Action,Take-Two Interactive,9.63,5.31,0.06,1.38,16.38
3,25,Grand Theft Auto: Vice City,PS2,2002,Action,Take-Two Interactive,8.41,5.49,0.47,1.78,16.15
4,39,Grand Theft Auto III,PS2,2001,Action,Take-Two Interactive,6.99,4.51,0.3,1.3,13.1
5,45,Grand Theft Auto V,PS4,2014,Action,Take-Two Interactive,3.8,5.81,0.36,2.02,11.98
6,46,Pokemon HeartGold/Pokemon SoulSilver,DS,2009,Action,Nintendo,4.4,2.77,3.96,0.77,11.9
7,52,Grand Theft Auto IV,X360,2008,Action,Take-Two Interactive,6.76,3.1,0.14,1.03,11.02
8,57,Grand Theft Auto IV,PS3,2008,Action,Take-Two Interactive,4.76,3.76,0.44,1.62,10.57
9,83,FIFA Soccer 13,PS3,2012,Action,Electronic Arts,1.06,5.05,0.13,2.01,8.24


# What are the top 10 games in the Adventure genre by global sales?

In [37]:
Adventure_data=data[data['Genre'].isin(['Adventure'])]
Adventure_data.sort_values(by='Global_Sales',ascending=False).head(10).reset_index(drop=True)

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,51,Super Mario Land 2: 6 Golden Coins,GB,1992,Adventure,Nintendo,6.16,2.04,2.69,0.29,11.18
1,159,Assassin's Creed,X360,2007,Adventure,Ubisoft,3.28,1.65,0.07,0.55,5.55
2,219,Assassin's Creed,PS3,2007,Adventure,Ubisoft,1.91,2.0,0.09,0.83,4.83
3,252,Zelda II: The Adventure of Link,NES,1987,Adventure,Nintendo,2.19,0.5,1.61,0.08,4.38
4,401,Rugrats: Search For Reptar,PS,1998,Adventure,THQ,1.63,1.53,0.0,0.18,3.34
5,418,L.A. Noire,PS3,2011,Adventure,Take-Two Interactive,1.27,1.33,0.12,0.51,3.23
6,435,Club Penguin: Elite Penguin Force,DS,2008,Adventure,Disney Interactive Studios,1.88,0.98,0.0,0.3,3.16
7,448,Heavy Rain,PS3,2010,Adventure,Sony Computer Entertainment,1.29,1.27,0.06,0.5,3.12
8,522,Myst,PC,1994,Adventure,Red Orb,0.02,2.79,0.0,0.0,2.81
9,550,L.A. Noire,X360,2011,Adventure,Take-Two Interactive,1.52,0.94,0.02,0.24,2.72
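The two queries above filter one genre at a time. If the top sellers of every genre are needed, sorting once and taking the first rows of each group avoids repeating the filter. This is a sketch on six rows copied from the tables above, using the top 2 per genre to keep the output short; replace `2` with `10` and `sample` with `data` for the full result.

```python
import pandas as pd

# Six rows copied from the Action and Adventure tables above.
sample = pd.DataFrame({
    "Name": ["Grand Theft Auto V", "Grand Theft Auto: San Andreas", "Grand Theft Auto V",
             "Assassin's Creed", "Assassin's Creed", "Super Mario Land 2: 6 Golden Coins"],
    "Platform": ["PS3", "PS2", "X360", "X360", "PS3", "GB"],
    "Genre": ["Action", "Action", "Action", "Adventure", "Adventure", "Adventure"],
    "Global_Sales": [21.40, 20.81, 16.38, 5.55, 4.83, 11.18],
})

# Sort by sales once, then keep the first 2 rows of each genre.
top_per_genre = (
    sample.sort_values("Global_Sales", ascending=False)
          .groupby("Genre")
          .head(2)
          .reset_index(drop=True)
)
print(top_per_genre)
```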


In [38]:
data['Genre'].unique()

array(['Sports', 'Platform', 'Racing', 'Role-Playing', 'Puzzle', 'Misc',
       'Shooter', 'Simulation', 'Action', 'Fighting', 'Adventure',
       'Strategy'], dtype=object)

# What are the total global sales per genre?

In [57]:
data.groupby('Genre')['Global_Sales'].sum().sort_values(ascending=False)

Genre
Action          1751.18
Sports          1330.93
Shooter         1037.37
Role-Playing     927.37
Platform         831.37
Misc             809.96
Racing           732.04
Fighting         448.91
Simulation       392.20
Puzzle           244.95
Adventure        239.04
Strategy         175.12
Name: Global_Sales, dtype: float64

# Which publisher has the highest total sales in North America?

In [40]:
data.groupby('Publisher')['NA_Sales'].sum().sort_values(ascending=False)

Publisher
Nintendo                       816.87
Electronic Arts                595.07
Activision                     429.70
Sony Computer Entertainment    265.22
Ubisoft                        253.43
                                ...  
Graphsim Entertainment           0.00
Revolution (Japan)               0.00
Revolution Software              0.00
Grand Prix Games                 0.00
Seventh Chord                    0.00
Name: NA_Sales, Length: 578, dtype: float64

In [41]:
data.groupby('Publisher')['NA_Sales'].sum().idxmax()

'Nintendo'

# Which publisher has the highest total sales in Europe?

In [42]:
data.groupby('Publisher')['EU_Sales'].sum().idxmax()

'Nintendo'

# Which publisher has the best-selling single title in Europe?

In [60]:
data.groupby('Publisher')['EU_Sales'].max().idxmax()

'Nintendo'

# Which publisher has the highest total sales in Japan?

In [43]:
data.groupby('Publisher')['JP_Sales'].sum().idxmax()

'Nintendo'

# Which publisher has the highest total global sales?

In [61]:
data.groupby('Publisher')['Global_Sales'].sum().idxmax()

'Nintendo'

# Which publisher has the highest total sales in regions other than North America, Europe, and Japan?

In [44]:
data.groupby('Publisher')['Other_Sales'].sum().idxmax()

'Electronic Arts'

### Best Publishers in Different Regions:

* Europe, Japan, and North America:
  * Nintendo has the highest total sales in each of these regions (816.87 million units in North America alone) and also leads in total global sales.

* Other Regions (excluding North America, Europe, and Japan):
  * Electronic Arts (EA) has the highest total sales in the markets outside North America, Europe, and Japan.
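The per-region queries above can also be collapsed into a single pass: sum every regional column per publisher, then ask for the row label of the maximum in each column. The sketch below uses four rows copied from earlier outputs, so its winners reflect only those rows and differ from the full-dataset results.

```python
import pandas as pd

regions = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Four rows from earlier outputs: Wii Sports, Super Mario Bros., FIFA Soccer 13, GTA: San Andreas.
sample = pd.DataFrame({
    "Publisher": ["Nintendo", "Nintendo", "Electronic Arts", "Take-Two Interactive"],
    "NA_Sales": [41.49, 29.08, 1.06, 9.43],
    "EU_Sales": [29.02, 3.58, 5.05, 0.40],
    "JP_Sales": [3.77, 6.81, 0.13, 0.41],
    "Other_Sales": [8.46, 0.77, 2.01, 10.57],
})

# Total sales per publisher for every region at once.
by_publisher = sample.groupby("Publisher")[regions].sum()

# For each region (column), the publisher (row label) with the largest total.
top_publisher = by_publisher.idxmax()
print(top_publisher)
```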



# What is the average sales per platform?

In [45]:
data.groupby('Platform')['Global_Sales'].mean()

Platform
2600    0.729925
3DO     0.033333
3DS     0.486169
DC      0.307115
DS      0.380254
GB      2.606633
GBA     0.387470
GC      0.358561
GEN     1.050370
GG      0.040000
N64     0.686144
NES     2.561939
NG      0.120000
PC      0.269604
PCFX    0.030000
PS      0.610920
PS2     0.581046
PS3     0.720722
PS4     0.827679
PSP     0.244254
PSV     0.149952
SAT     0.194162
SCD     0.311667
SNES    0.837029
TG16    0.080000
WS      0.236667
Wii     0.699404
WiiU    0.572448
X360    0.774672
XB      0.313422
XOne    0.662254
Name: Global_Sales, dtype: float64

# What are the total sales by genre and platform?

In [46]:
data.pivot_table(index='Genre',columns='Platform',values='Global_Sales',aggfunc='sum')

Genre,2600,3DO,3DS,DC,DS,GB,GBA,GC,GEN,GG,...,SAT,SCD,SNES,TG16,WS,Wii,WiiU,X360,XB,XOne
Action,29.34,,57.02,1.26,115.56,7.92,55.76,37.84,2.74,,...,0.65,,10.08,,,118.58,19.35,242.67,49.28,33.79
Adventure,1.7,0.06,4.81,2.5,47.29,17.16,14.68,5.93,0.19,,...,4.16,,1.5,0.14,,18.43,0.17,15.23,3.06,2.51
Fighting,1.24,,10.46,1.83,7.2,,4.21,18.43,5.9,,...,8.52,,26.95,,,23.86,6.36,37.64,13.55,2.31
Misc,3.58,,10.48,,137.76,13.35,36.25,16.73,0.03,,...,1.2,0.1,5.02,,,221.06,12.23,91.96,9.58,6.86
Platform,13.27,,32.23,2.54,77.45,54.91,78.3,28.66,15.45,0.04,...,0.76,1.5,65.65,,,90.74,21.24,11.39,9.66,0.81
Puzzle,14.68,0.02,5.57,,84.29,47.47,12.92,4.7,,,...,1.0,,6.38,,,15.67,1.33,0.85,0.42,
Racing,2.91,,14.49,2.65,38.64,4.55,18.8,21.89,0.26,,...,2.4,0.07,13.49,,,61.28,7.77,65.99,31.49,8.8
Role-Playing,,,75.74,0.68,126.85,88.24,64.21,13.15,0.27,,...,3.76,0.06,36.43,,1.22,14.06,2.47,71.98,13.51,9.48
Shooter,26.48,,1.29,0.33,8.2,1.2,3.6,13.63,0.13,,...,3.98,,6.07,0.02,,28.77,6.17,278.55,63.55,51.61
Simulation,0.45,0.02,27.08,0.52,132.03,3.55,5.91,8.59,,,...,1.13,,5.63,,,36.97,0.21,14.45,7.11,0.54


In [47]:
data.pivot_table(columns='Genre',index='Platform',values='Global_Sales',aggfunc='sum')

Platform,Action,Adventure,Fighting,Misc,Platform,Puzzle,Racing,Role-Playing,Shooter,Simulation,Sports,Strategy
2600,29.34,1.7,1.24,3.58,13.27,14.68,2.91,,26.48,0.45,3.43,
3DO,,0.06,,,,0.02,,,,0.02,,
3DS,57.02,4.81,10.46,10.48,32.23,5.57,14.49,75.74,1.29,27.08,6.2,2.09
DC,1.26,2.5,1.83,,2.54,,2.65,0.68,0.33,0.52,3.66,
DS,115.56,47.29,7.2,137.76,77.45,84.29,38.64,126.85,8.2,132.03,31.83,15.39
GB,7.92,17.16,,13.35,54.91,47.47,4.55,88.24,1.2,3.55,9.05,8.05
GBA,55.76,14.68,4.21,36.25,78.3,12.92,18.8,64.21,3.6,5.91,16.41,7.45
GC,37.84,5.93,18.43,16.73,28.66,4.7,21.89,13.15,13.63,8.59,25.49,4.32
GEN,2.74,0.19,5.9,0.03,15.45,,0.26,0.27,0.13,,3.2,0.19
GG,,,,,0.04,,,,,,,


# What is the median sales value per year for each platform?

In [48]:
data.groupby(['Year','Platform'])['Global_Sales'].median()

Year  Platform
1980  2600        0.770
1981  2600        0.465
1982  2600        0.540
1983  2600        0.460
      NES         1.635
                  ...  
2016  X360        0.065
      XOne        0.065
2017  PS4         0.030
      PSV         0.010
2020  DS          0.290
Name: Global_Sales, Length: 247, dtype: float64

In [49]:
data.pivot_table(index='Platform',columns='Year',values='Global_Sales',aggfunc='median')

Platform,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,...,2009,2010,2011,2012,2013,2014,2015,2016,2017,2020
2600,0.77,0.465,0.54,0.46,0.27,0.45,0.33,0.355,0.375,0.31,...,,,,,,,,,,
3DO,,,,,,,,,,,...,,,,,,,,,,
3DS,,,,,,,,,,,...,,,0.145,0.19,0.1,0.09,0.085,0.07,,
DC,,,,,,,,,,,...,,,,,,,,,,
DS,,,,,,0.02,,,,,...,0.11,0.11,0.09,0.03,0.15,0.02,,,,0.29
GB,,,,,,,,,1.43,1.965,...,,,,,,,,,,
GBA,,,,,,,,,,,...,,,,,,,,,,
GC,,,,,,,,,,,...,,,,,,,,,,
GEN,,,,,,,,,,,...,,,,,,,,,,
GG,,,,,,,,,,,...,,,,,,,,,,


# What is the market share by publisher?

In [50]:
data.groupby('Publisher')['Global_Sales'].sum()

Publisher
10TACLE Studios                  0.11
1C Company                       0.10
20th Century Fox Video Games     1.94
2D Boy                           0.04
3DO                             10.12
                                ...  
id Software                      0.03
imageepoch Inc.                  0.04
inXile Entertainment             0.10
mixi, Inc                        0.86
responDESIGN                     0.13
Name: Global_Sales, Length: 578, dtype: float64
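The totals above are absolute sales, not shares. To express them as market share, each publisher's total can be divided by the grand total. This is a minimal sketch on four rows copied from earlier outputs; the percentages it prints apply only to those rows.

```python
import pandas as pd

# Four rows from earlier outputs (Wii Sports, Super Mario Bros.,
# Call of Duty: Modern Warfare 3, FIFA Soccer 13).
sample = pd.DataFrame({
    "Publisher": ["Nintendo", "Nintendo", "Activision", "Electronic Arts"],
    "Global_Sales": [82.74, 40.24, 14.76, 8.24],
})

# Total global sales per publisher.
publisher_sales = sample.groupby("Publisher")["Global_Sales"].sum()

# Each publisher's percentage of all global sales, largest first.
market_share = (publisher_sales / publisher_sales.sum() * 100).sort_values(ascending=False)
print(market_share.round(2))
```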

# What is the correlation between North American and European sales?

In [51]:
correlation_na_eu = data['NA_Sales'].corr(data['EU_Sales'])
correlation_na_eu

0.7677267483702562
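The same question can be asked for every pair of regions at once with `DataFrame.corr()`, which returns a matrix of pairwise Pearson correlations. The sketch below runs it on the five rows shown by `data.head()`, so its coefficients are illustrative and will not match the full-dataset value above.

```python
import pandas as pd

regions = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# The five rows shown by data.head() earlier in the notebook.
sample = pd.DataFrame(
    [[41.49, 29.02, 3.77, 8.46],
     [29.08, 3.58, 6.81, 0.77],
     [15.85, 12.88, 3.79, 3.31],
     [15.75, 11.01, 3.28, 2.96],
     [11.27, 8.89, 10.22, 1.00]],
    columns=regions,
)

# Pairwise Pearson correlation between all regional sales columns.
corr_matrix = sample[regions].corr()
print(corr_matrix.round(3))
```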

# Categorize games into high, medium, and low sales based on their global sales figures

In [52]:
data['Sales_Category'] = pd.cut(data['Global_Sales'], bins=[0, 1, 5, data['Global_Sales'].max()], labels=['Low', 'Medium', 'High'])


In [53]:
data.head()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Sales_Category
0,1,Wii Sports,Wii,2006,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74,High
1,2,Super Mario Bros.,NES,1985,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24,High
2,3,Mario Kart Wii,Wii,2008,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82,High
3,4,Wii Sports Resort,Wii,2009,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0,High
4,5,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37,High
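It is worth knowing how `pd.cut` treats the bin edges used here. By default the intervals are closed on the right, so a value of exactly 1 falls in Low and exactly 5 falls in Medium, while a value equal to the lowest edge (0) falls outside every bin and becomes NaN. That does not affect this dataset, whose minimum global sales value is 0.01, but the sketch below illustrates the behavior on hypothetical values along with the `include_lowest` option.

```python
import pandas as pd

# Hypothetical sales values chosen to sit on or near the bin edges.
sales = pd.Series([0.0, 0.01, 1.0, 1.01, 5.0, 82.74])
bins = [0, 1, 5, sales.max()]
labels = ["Low", "Medium", "High"]

# Default: intervals are (0, 1], (1, 5], (5, max]; a value of exactly 0 is left unassigned.
cats = pd.cut(sales, bins=bins, labels=labels)

# include_lowest=True makes the first interval also contain the lowest edge.
cats_incl = pd.cut(sales, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({"sales": sales, "default": cats, "include_lowest": cats_incl}))
```

A follow-up `data['Sales_Category'].value_counts()` would show how many games land in each category.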
