Before analyzing a dataset, a pre-analysis is very important, sometimes even more important than the EDA itself.

A dataset can yield an enormous amount of information. Some of it is valuable, some is not.

In the pre-analysis, I try to answer the following questions:
* What valuable information can this dataset give us?
* What questions will people ask when they first see this dataset?
* What insights can we gain by combining various variables?
* What tools or figures will be just good enough to answer the good questions?

**My pre-analysis led to the following items of interest:**

1. Video Game Market Scale
2. Market's preference in terms of different games

**To address these items, I explored the dataset in the following aspects:**

1. Video Game Market Scale
   - [Game Amount by Year](#section1)
   - [Game Amount by Platform by Year](#section2)
   - [Game Amount by Genre by Year](#section3)
   - [Game Amount by Publisher by Year](#section4)
   - [Publisher Amount by Year](#section5)
   - [Global and Regional Sales by Year](#section6)
   - [Sales Analysis by Region](#section7)

2. Market's preference in terms of different games
   - [The most popular genre, regional and global, based on sales](#section8)
   - [The most popular publisher, regional and global, based on sales](#section9)
   - [The most popular platform, regional and global, based on sales](#section10)
   - [Publisher's preference in terms of Genre](#section11)

Please have a look at this EDA practice; any comments are welcome.

**Chapter 1: Pre-processing of the dataset**

In [None]:
#Data analysis & data wrangling
import numpy as np
import pandas as pd
import missingno as mn
from collections import Counter

#Visualization
import matplotlib.pyplot as plt
import matplotlib.style as style
from matplotlib.colors import ListedColormap
from matplotlib import cm
import seaborn as sns
%matplotlib inline

#Plotly libraries
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.colors import n_colors
from plotly.subplots import make_subplots
from plotly.offline import init_notebook_mode, iplot
from plotly import tools
from IPython.display import display, HTML
init_notebook_mode(connected=True)

import warnings
warnings.filterwarnings('ignore')

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
original_df = pd.read_csv('/kaggle/input/videogamesales/vgsales.csv')
original_df.head()

In [None]:
original_df.describe()

In [None]:
original_df.info()

In [None]:
# Number and percentage of missing values per column
Null = pd.DataFrame({'null values': original_df.isna().sum(),
                     'Percentage null values': original_df.isna().sum() / original_df.shape[0] * 100})
Null
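The percentage column in a missing-value summary should be the null count divided by the number of rows, then multiplied by 100. As a minimal sketch, the calculation can be wrapped in a small helper; `missing_summary` is a hypothetical name (not a pandas function) and the toy frame is made up for illustration.

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the number and percentage of missing values per column."""
    null_counts = df.isna().sum()
    return pd.DataFrame({
        'null values': null_counts,
        # Divide by the row count first, then scale to a percentage
        'Percentage null values': null_counts / len(df) * 100,
    })

# Toy data: 4 rows, one missing Year and two missing Publishers
toy = pd.DataFrame({
    'Year': [2006.0, np.nan, 2008.0, 2010.0],
    'Publisher': ['Nintendo', None, 'EA', None],
})
print(missing_summary(toy))
```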

In [None]:
mean_year = int(original_df['Year'].mean())
mean_year

In [None]:
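# Fill missing years with the mean year. Note that this places every game with an
# unknown year in the same year, which inflates that year's count in later charts.
original_df['Year'] = original_df['Year'].fillna(mean_year)
# Drop the remaining rows with missing values (e.g. a missing Publisher)
clean_df = original_df.dropna(axis = 0)
clean_df.info()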

In [None]:
is_2020 = clean_df['Year'] == 2020
clean_df = clean_df[~is_2020]  # drop games released in 2020, since the dataset was last updated before 2020

In [None]:
clean_df.head()

In [None]:
mn.bar(clean_df)

**Chapter 2: Global view of the dataset**

In [None]:
#for col in clean_df.columns:
    #print(f'{col}: \n {clean_df[col].unique()} \n')

***Chapter 3: Video Game Market Scale***

**Chapter 3.1: Game Amount per Year**
<a id="section1"></a>

In [None]:
# Convert the release year from float to integer
clean_df = clean_df.astype({'Year': int})

clean_df.head()

In [None]:
Amount_per_year = clean_df['Year'].value_counts().reset_index()
Amount_per_year.columns = ['Year', 'Amount_per_year']
Amount_per_year

In [None]:
px.bar(Amount_per_year, x = 'Year', y = 'Amount_per_year', title = 'Amount of Game per Year')

***Conclusion:***
* The number of video game releases grows rapidly after 1995.
* Releases peak around 2008, at roughly 1,400 games per year.
* After 2011, the number of releases drops sharply, by about 40% compared with 2011.
* Releases also dip slightly in 2003 and 2004. The reasons for these two declines would be interesting to investigate.
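A statement such as "about 40% fewer releases than in 2011" can be checked directly from the per-year counts. The sketch below assumes a Series of release counts indexed by year (for example `clean_df['Year'].value_counts()`); the helper name `pct_change_between` and the toy counts are hypothetical.

```python
import pandas as pd

def pct_change_between(counts: pd.Series, base_year: int, target_year: int) -> float:
    """Percentage change in releases from base_year to target_year."""
    base = counts.loc[base_year]
    target = counts.loc[target_year]
    return (target - base) / base * 100

# Made-up release counts per year, for illustration only
toy_counts = pd.Series({2010: 1200, 2011: 1100, 2012: 660})
change = pct_change_between(toy_counts, 2011, 2012)
print(f'{change:.1f}%')
```

A negative result means a decline relative to the base year.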

**Chapter 3.2: Game Amount by Platform by Year**
<a id="section2"></a>

In [None]:
Amount_per_platform = clean_df['Platform'].value_counts().reset_index()
Amount_per_platform.columns = ['Platform', 'Amount_of_publish']
Amount_per_platform

In [None]:
Amount_by_platform = clean_df['Platform'].groupby(clean_df['Year']).value_counts().reset_index('Year')
Amount_by_platform.columns = ['Year', 'Amount']
Amount_by_platform.reset_index(inplace = True)
Amount_by_platform
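An equivalent, and arguably more explicit, way to build this long-format table is to group by both columns and count rows with `size()`. The sketch below uses a made-up toy frame in place of `clean_df`; it should give the same counts as the `value_counts()` approach, with the columns ordered `Year`, `Platform`, `Amount`.

```python
import pandas as pd

# Toy rows standing in for clean_df (values are made up)
toy = pd.DataFrame({
    'Year': [2008, 2008, 2008, 2009],
    'Platform': ['Wii', 'Wii', 'DS', 'DS'],
})

# One row per (Year, Platform) pair with the number of games in that pair;
# naming the count column explicitly avoids depending on how a given pandas
# version names the output of value_counts()
amount_by_platform = (
    toy.groupby(['Year', 'Platform'])
       .size()
       .reset_index(name='Amount')
)
print(amount_by_platform)
```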

In [None]:
list(Amount_by_platform.columns)

In [None]:
px.bar(Amount_by_platform, x = 'Year', y = 'Amount',color = 'Platform' ,
      title = 'Games Amount by Platform by Year')

In [None]:
px.line(Amount_by_platform, x = 'Year', y = 'Amount',color = 'Platform',
       title = 'Games Amount by Platform by Year')

***Conclusion:***

* Before 2000, most games were released on the PS, while the Wii had the most games (492) in 2008.
* The PlayStation series (PS and PS2) appears to be the most popular platform family before 2006, while Nintendo's portable DS and home console Wii are the most popular after 2006.
* In terms of game releases, Microsoft's Xbox series is not as competitive as Sony's and Nintendo's platforms.

**Chapter 3.3: Games Amount by Genre by Year**
<a id="section3"></a>

In [None]:
Amount_by_genre = clean_df.groupby('Year')['Genre'].value_counts().reset_index('Year')
Amount_by_genre.columns = ['Year','Amount']
Amount_by_genre.reset_index(inplace=True)
Amount_by_genre


In [None]:
# The 10 genres with the most games; value_counts() is sorted by frequency, so the
# first 10 index labels are the top 10 (this works in both pandas 1.x and 2.x)
Top_genre_10 = clean_df['Genre'].value_counts().index[:10].tolist()

Amount_by_genre_top = Amount_by_genre.set_index('Genre')
Amount_by_genre_top

In [None]:
Amount_by_genre_top = Amount_by_genre_top.loc[Top_genre_10, :]
Amount_by_genre_top.reset_index(inplace = True)
Amount_by_genre_top
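Selecting the top-N categories can also be done without relying on the name of the column that `reset_index()` creates (it is `index` in older pandas and the original column name in pandas 2.x). A minimal sketch, with a hypothetical helper `keep_top_n` and made-up toy data:

```python
import pandas as pd

def keep_top_n(df: pd.DataFrame, col: str, n: int) -> pd.DataFrame:
    """Keep only rows whose `col` value is among the n most frequent values."""
    # value_counts() sorts by frequency, so the first n index labels are the top n
    top_values = df[col].value_counts().index[:n]
    return df[df[col].isin(top_values)]

toy = pd.DataFrame({'Genre': ['Action', 'Action', 'Sports', 'Sports', 'Sports', 'Puzzle']})
top2 = keep_top_n(toy, 'Genre', 2)
print(top2['Genre'].unique())
```

For example, `keep_top_n(clean_df, 'Genre', 10)` would keep only the games from the 10 most common genres.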

In [None]:
px.line(Amount_by_genre_top, x = 'Year', y = 'Amount',color = 'Genre',
       title = 'Games Amount by Genre by Year')

***Conclusion:***

* In the early stage (before 2002), Sports is the genre with the most releases. From 2003 onward, Action takes the lead.
* In 2003 and 2004, releases fall for most genres, which may point to a weak market in those two years.
* Misc releases grow rapidly after 2004, which suggests that game genres became more diverse.
* After 2011, although total releases decline, Action releases remain high, so Action games make up the largest share of new releases.

**Chapter 3.4: Games Amount by Publisher by Year**
<a id="section4"></a>

In [None]:
Amount_by_pub = clean_df.groupby('Year')['Publisher'].value_counts().reset_index('Year')
Amount_by_pub.columns = ['Year','Amount']
Amount_by_pub.reset_index(inplace=True)
Amount_by_pub

In [None]:
# The 10 publishers with the most games (works in both pandas 1.x and 2.x)
Top_pub_10 = clean_df['Publisher'].value_counts().index[:10].tolist()

Amount_by_pub_top = Amount_by_pub.set_index('Publisher')
Amount_by_pub_top

In [None]:
Amount_by_pub_top = Amount_by_pub_top.loc[Top_pub_10, :]
Amount_by_pub_top.reset_index(inplace = True)
Amount_by_pub_top

In [None]:
px.line(Amount_by_pub_top, x = 'Year', y = 'Amount',color = 'Publisher',
       title = 'Games Amount by Publisher by Year')

***Conclusion:***
* EA releases the most games between 2000 and 2008, while Activision quickly rises to the top in 2009.

**Chapter 3.5: Publisher Amount by Year**
<a id="section5"></a>

In [None]:
# Number of distinct publishers releasing at least one game in each year
Publisher_Year = clean_df.groupby('Year')['Publisher'].nunique().reset_index('Year')
Publisher_Year

In [None]:
px.line(Publisher_Year, x = 'Year', y = 'Publisher',
       title = 'Publisher Amount by Year')
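When counting publishers per year, `count()` returns the number of non-null rows (that is, games) in each year, whereas `nunique()` returns the number of distinct publishers. The toy example below, with made-up data, shows the difference:

```python
import pandas as pd

toy = pd.DataFrame({
    'Year': [2000, 2000, 2000, 2001],
    'Publisher': ['EA', 'EA', 'Ubisoft', 'EA'],
})

games_per_year = toy.groupby('Year')['Publisher'].count()         # rows (games) per year
publishers_per_year = toy.groupby('Year')['Publisher'].nunique()  # distinct publishers per year
print(pd.DataFrame({'games': games_per_year, 'publishers': publishers_per_year}))
```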

***Conclusion:***

* As the video game (VG) market grows, more publishers appear after 2000, while established players such as EA, Activision and Ubisoft stay active throughout and seem hard to beat.

**Chapter 3.6: Global and Regional Sales by Year**
<a id="section6"></a>

In [None]:
Sales_Year = clean_df.groupby('Year')[['NA_Sales','EU_Sales','JP_Sales', 'Other_Sales', 'Global_Sales']].sum().reset_index('Year')
#Sales_Year['Global_Sales'] = 0
Sales_Year

In [None]:
fig = go.Figure()

fig.add_trace(go.Bar(x = Sales_Year['Year'], 
                     y = Sales_Year['NA_Sales'],
                     name = 'Sales in North America',
                     marker = {'color':Sales_Year['NA_Sales'], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Year['Year'], 
                     y = Sales_Year['EU_Sales'],
                     name = 'Sales in Europe',
                     marker = {'color':Sales_Year['EU_Sales'], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Year['Year'], 
                     y = Sales_Year['JP_Sales'],
                     name = 'Sales in Japan',
                     marker = {'color':Sales_Year['JP_Sales'], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Year['Year'], 
                     y = Sales_Year['Other_Sales'],
                     name = 'Sales in Other parts',
                     marker = {'color':Sales_Year['Other_Sales'], 'colorscale':'Viridis' }))


fig.add_trace(go.Bar(x = Sales_Year['Year'], 
                     y = Sales_Year['NA_Sales'],
                     name = 'Sales in North America',
                     marker = {'color':'cornsilk' }))

fig.add_trace(go.Bar(x = Sales_Year['Year'], 
                     y = Sales_Year['EU_Sales'],
                     name = 'Sales in Europe',
                     marker = {'color':'burlywood' }))

fig.add_trace(go.Bar(x = Sales_Year['Year'], 
                     y = Sales_Year['JP_Sales'],
                     name = 'Sales in Japan',
                     marker = {'color':'coral' }))

fig.add_trace(go.Bar(x = Sales_Year['Year'], 
                     y = Sales_Year['Other_Sales'],
                     name = 'Sales in Other parts',
                     marker = {'color':'khaki' }))


                               
fig.update_layout(updatemenus = 
                 [dict(type = 'buttons',
                     direction = 'right',
                     active = 0,
                     x = 1,
                     y = 1.2,
                     buttons = list([
                         dict(label = 'North America',
                              method = 'update',
                              args = [{'visible': [True,False,False,False,False,False,False,False]},
                                     {'title':'Sales in North America'}]),
                         dict(label = 'Europe',
                              method = 'update',
                              args = [{'visible': [False,True,False,False,False,False,False,False]},
                                     {'title':'Sales in Europe'}]),
                         dict(label = 'Japan',
                              method = 'update',
                              args = [{'visible': [False,False,True,False,False,False,False,False]},
                                     {'title':'Sales in Japan'}]),
                         dict(label = 'Other',
                              method = 'update',
                              args = [{'visible': [False,False,False,True,False,False,False,False]},
                                     {'title':'Sales in Other Parts'}]),
                         dict(label = 'Global',
                              method = 'update',
                              args = [{'visible': [False,False,False,False,True,True,True,True]},
                                     {'title':'Global Sales'}])
                                   ])
                     )
                 ])



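# Show only the first trace initially, so the default view matches the active 'North America'
# button instead of stacking all eight traces (which would double-count the sales)
fig.update_traces(visible = False)
fig.data[0].visible = True
fig.update_layout(title_text = 'Global and Regional Sales by Year', barmode = 'stack')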
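The `visible` lists in the cell above are written out by hand, which becomes error-prone as the number of traces grows. As a sketch, a small helper can generate them from a mapping of button label to trace indices and title. `visibility_buttons` is a hypothetical name; the dicts it returns follow the same `label`/`method`/`args` layout used above.

```python
def visibility_buttons(groups, n_traces):
    """Build Plotly 'update' buttons from a mapping label -> (trace indices, title).

    Each button shows only the listed traces and hides all others.
    """
    buttons = []
    for label, (indices, title) in groups.items():
        visible = [i in indices for i in range(n_traces)]
        buttons.append(dict(label=label,
                            method='update',
                            args=[{'visible': visible}, {'title': title}]))
    return buttons

groups = {
    'North America': ([0], 'Sales in North America'),
    'Europe': ([1], 'Sales in Europe'),
    'Japan': ([2], 'Sales in Japan'),
    'Other': ([3], 'Sales in Other Parts'),
    'Global': ([4, 5, 6, 7], 'Global Sales'),
}
buttons = visibility_buttons(groups, n_traces=8)

# Usage (assuming `fig` holds the 8 traces built above):
# fig.update_layout(updatemenus=[dict(type='buttons', direction='right',
#                                     active=0, x=1, y=1.2, buttons=buttons)])
```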

***Conclusion:***
* Global sales follow much the same trend as the number of releases.
* North America, Europe and Japan are the three largest game markets in the world.
* Looking at the history of VG, sales started in North America and Europe and then expanded to Japan around 1985. Sales in other parts of the world only start after 1996. One interesting hypothesis is that the development of VG follows that of the PC-related industry.
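Rather than reading each region's entry year off the chart, it could be computed from the yearly totals (a frame shaped like `Sales_Year`, with a `Year` column and one sales column per region). A minimal sketch with a hypothetical helper and made-up numbers; a single small sale can move the "first year", so a threshold above zero may be more robust in practice.

```python
import pandas as pd

def first_year_with_sales(sales_by_year: pd.DataFrame, regions) -> pd.Series:
    """For each region column, return the first Year with positive total sales."""
    result = {}
    for region in regions:
        years = sales_by_year.loc[sales_by_year[region] > 0, 'Year']
        result[region] = years.min() if not years.empty else None
    return pd.Series(result)

# Made-up yearly totals, for illustration only
toy = pd.DataFrame({
    'Year': [1980, 1983, 1985, 1996],
    'NA_Sales': [10.0, 5.0, 20.0, 80.0],
    'JP_Sales': [0.0, 0.0, 3.0, 40.0],
    'Other_Sales': [0.0, 0.0, 0.0, 2.0],
})
print(first_year_with_sales(toy, ['NA_Sales', 'JP_Sales', 'Other_Sales']))
```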

**Chapter 3.7: Sales Analysis per Region**
<a id="section7"></a>

In [None]:
NA_Sales_all = clean_df['NA_Sales'].sum()
EU_Sales_all = clean_df['EU_Sales'].sum()
JP_Sales_all = clean_df['JP_Sales'].sum()
Other_Sales_all = clean_df['Other_Sales'].sum()

In [None]:
All_Sales = pd.Series([NA_Sales_all, EU_Sales_all, JP_Sales_all, Other_Sales_all], index = ['NA','EU', 'JP', 'Other'])
All_Sales = pd.DataFrame(All_Sales).reset_index()
All_Sales.columns = ['Region','Sales']
All_Sales

In [None]:
px.pie(All_Sales, values = 'Sales', names = 'Region')

***Conclusion:***
* North America is clearly the biggest VG market, accounting for almost 50% of sales.
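The percentages in the pie chart can also be computed directly, which makes a statement like "almost 50%" easy to check. A sketch with made-up regional totals:

```python
import pandas as pd

# Made-up regional totals (millions of units), for illustration only
region_sales = pd.Series({'NA': 4.0, 'EU': 2.5, 'JP': 1.5, 'Other': 1.0})

# Each region's share of the total, in percent
shares = region_sales / region_sales.sum() * 100
print(shares.round(1))
```

With the notebook's own variables, `All_Sales.set_index('Region')['Sales']` would take the place of the toy Series.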

***Chapter 4: Market's Preference***

**Chapter 4.1: The most popular Genre, regional and global, based on sales**
<a id="section8"></a>

In [None]:
clean_df.head()

In [None]:
Sales_Genre = clean_df.groupby('Genre')[['NA_Sales','EU_Sales','JP_Sales', 'Other_Sales', 'Global_Sales']].sum().reset_index('Genre')
Sales_Genre = Sales_Genre.sort_values('Global_Sales', ascending = False)
#Sales_Genre['Global_Sales'] = 0
Sales_Genre

In [None]:
Sales_Genre_NA_TOP = Sales_Genre.sort_values('NA_Sales', ascending = False)
Sales_Genre_EU_TOP = Sales_Genre.sort_values('EU_Sales', ascending = False)
Sales_Genre_JP_TOP = Sales_Genre.sort_values('JP_Sales', ascending = False)
Sales_Genre_Other_TOP = Sales_Genre.sort_values('Other_Sales', ascending = False)

In [None]:
fig = go.Figure()

fig.add_trace(go.Bar(x = Sales_Genre_NA_TOP['Genre'], 
                     y = Sales_Genre_NA_TOP['NA_Sales'],
                     name = 'Most Popular Genre in North America',
                     marker = {'color':Sales_Genre_NA_TOP['NA_Sales'], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Genre_EU_TOP['Genre'], 
                     y = Sales_Genre_EU_TOP['EU_Sales'],
                     name = 'Most Popular Genre in Europe',
                     marker = {'color':Sales_Genre_EU_TOP['EU_Sales'], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Genre_JP_TOP['Genre'], 
                     y = Sales_Genre_JP_TOP['JP_Sales'],
                     name = 'Most Popular Genre in Japan',
                     marker = {'color':Sales_Genre_JP_TOP['JP_Sales'], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Genre_Other_TOP['Genre'], 
                     y = Sales_Genre_Other_TOP['Other_Sales'],
                     name = 'Most Popular Genre in Other Region',
                     marker = {'color':Sales_Genre_Other_TOP['Other_Sales'], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Genre['Genre'], 
                     y = Sales_Genre['NA_Sales'],
                     name = 'Genre Sales in North America',
                     marker = {'color':'coral' }))

fig.add_trace(go.Bar(x = Sales_Genre['Genre'], 
                     y = Sales_Genre['EU_Sales'],
                     name = 'Genre Sales in Europe',
                     marker = {'color':'khaki' }))

fig.add_trace(go.Bar(x = Sales_Genre['Genre'], 
                     y = Sales_Genre['JP_Sales'],
                     name = 'Genre Sales in Japan',
                     marker = {'color':'cyan' }))

fig.add_trace(go.Bar(x = Sales_Genre['Genre'], 
                     y = Sales_Genre['Other_Sales'],
                     name = 'Genre Sales in Other Region',
                     marker = {'color':'forestgreen' }))
                               
fig.update_layout(updatemenus = 
                 [dict(type = 'buttons',
                     direction = 'right',
                     active = 0,
                     x = 1,
                     y = 1.2,
                     buttons = list([
                         dict(label = 'North America',
                              method = 'update',
                              args = [{'visible': [True,False,False,False,False,False,False,False]},
                                     {'title':'Sales in North America'}]),
                         dict(label = 'Europe',
                              method = 'update',
                              args = [{'visible': [False,True,False,False,False,False,False,False]},
                                     {'title':'Sales in Europe'}]),
                         dict(label = 'Japan',
                              method = 'update',
                              args = [{'visible': [False,False,True,False,False,False,False,False]},
                                     {'title':'Sales in Japan'}]),
                         dict(label = 'Other',
                              method = 'update',
                              args = [{'visible': [False,False,False,True,False,False,False,False]},
                                     {'title':'Sales in Other Parts'}]),
                         dict(label = 'Global',
                              method = 'update',
                              args = [{'visible': [False,False,False,False,True,True,True,True]},
                                     {'title':'Global Sales'}])
                                   ])
                     )
                 ])



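# Show only the first trace initially, so the default view matches the active 'North America'
# button instead of stacking all eight traces (which would double-count the sales)
fig.update_traces(visible = False)
fig.data[0].visible = True
fig.update_layout(title_text = 'Most Popular Genre by Sales', barmode = 'stack')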

***Conclusion:***

* Action, Sports and Shooter are the most popular genres globally, as well as in North America, Europe and other parts of the world.
* Interestingly, Japanese players prefer Role-Playing games most: in Japan, Role-Playing sales are more than twice those of the second-ranked genre, Action.
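A claim like "more than twice the second-ranked genre" boils down to the ratio between the two largest values in a region. A minimal sketch, with a hypothetical helper `top_two_ratio` and made-up sales figures:

```python
import pandas as pd

def top_two_ratio(sales: pd.Series) -> tuple:
    """Return (top label, second label, top / second) for a sales Series."""
    ranked = sales.sort_values(ascending=False)
    return ranked.index[0], ranked.index[1], ranked.iloc[0] / ranked.iloc[1]

# Made-up Japanese sales by genre, for illustration only
jp_sales = pd.Series({'Role-Playing': 300.0, 'Action': 120.0, 'Sports': 100.0})
top, second, ratio = top_two_ratio(jp_sales)
print(top, second, round(ratio, 2))
```

Applied to the notebook's data, something like `top_two_ratio(Sales_Genre.set_index('Genre')['JP_Sales'])` would return the top two Japanese genres and their ratio.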

**Chapter 4.2: The most popular Publisher, regional and global, based on sales**
<a id="section9"></a>

In [None]:
clean_df.head()

In [None]:
Sales_Pub = clean_df.groupby('Publisher')[['NA_Sales','EU_Sales','JP_Sales', 'Other_Sales', 'Global_Sales']].sum().reset_index('Publisher')
Sales_Pub = Sales_Pub.sort_values('Global_Sales', ascending = False)
#Sales_Pub['Global_Sales'] = 0
Sales_Pub

In [None]:
Sales_Pub_NA_TOP = Sales_Pub.sort_values('NA_Sales', ascending = False)
Sales_Pub_EU_TOP = Sales_Pub.sort_values('EU_Sales', ascending = False)
Sales_Pub_JP_TOP = Sales_Pub.sort_values('JP_Sales', ascending = False)
Sales_Pub_Other_TOP = Sales_Pub.sort_values('Other_Sales', ascending = False)
Sales_Pub_Global_TOP = Sales_Pub.sort_values('Global_Sales', ascending = False)

In [None]:
Sales_Pub_Global_10 = Sales_Pub_Global_TOP['Publisher'][:10].tolist()
Sales_Pub_Global_10

Sales_Pub_Global_TOP[Sales_Pub_Global_TOP.Publisher.isin(Sales_Pub_Global_10)]

In [None]:
#fig = go.Figure()
#fig.add_trace(go.Bar(x = Sales_Pub_NA_TOP[Sales_Pub_NA_TOP['Publisher'].isin(Sales_Pub_Global_10)]['Publisher'], 
                     #y = Sales_Pub_NA_TOP[Sales_Pub_NA_TOP['Publisher'].isin(Sales_Pub_Global_10)]['NA_Sales'],
                     #name = 'Top 10 Most Popular Publisher in North America',
                     #marker = {'color':'coral'}))

In [None]:
fig = go.Figure()

fig.add_trace(go.Bar(x = Sales_Pub_NA_TOP['Publisher'][:10], 
                     y = Sales_Pub_NA_TOP['NA_Sales'][:10],
                     name = 'Top 10 Most Popular Publisher in North America',
                     marker = {'color':Sales_Pub_NA_TOP['NA_Sales'][:10], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Pub_EU_TOP['Publisher'][:10], 
                     y = Sales_Pub_EU_TOP['EU_Sales'][:10],
                     name = 'Top 10 Most Popular Publisher in Europe',
                     marker = {'color':Sales_Pub_EU_TOP['EU_Sales'][:10], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Pub_JP_TOP['Publisher'][:10], 
                     y = Sales_Pub_JP_TOP['JP_Sales'][:10],
                     name = 'Top 10 Most Popular Publisher in Japan',
                     marker = {'color':Sales_Pub_JP_TOP['JP_Sales'][:10], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Pub_Other_TOP['Publisher'][:10], 
                     y = Sales_Pub_Other_TOP['Other_Sales'][:10],
                     name = 'Top 10 Most Popular Publisher in Other Region',
                     marker = {'color':Sales_Pub_Other_TOP['Other_Sales'][:10], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Pub_Global_TOP['Publisher'][:10], 
                     y = Sales_Pub_Global_TOP['Global_Sales'][:10],
                     name = 'Top 10 Most Popular Publisher Globally',
                     marker = {'color':Sales_Pub_Global_TOP['Global_Sales'][:10], 'colorscale':'Viridis' }))



fig.add_trace(go.Bar(x = Sales_Pub_NA_TOP[Sales_Pub_NA_TOP['Publisher'].isin(Sales_Pub_Global_10)]['Publisher'], 
                     y = Sales_Pub_NA_TOP[Sales_Pub_NA_TOP['Publisher'].isin(Sales_Pub_Global_10)]['NA_Sales'],
                     name = 'Top 10 Most Popular Publisher in North America',
                     marker = {'color':'coral'}))

fig.add_trace(go.Bar(x = Sales_Pub_EU_TOP[Sales_Pub_EU_TOP['Publisher'].isin(Sales_Pub_Global_10)]['Publisher'], 
                     y = Sales_Pub_EU_TOP[Sales_Pub_EU_TOP['Publisher'].isin(Sales_Pub_Global_10)]['EU_Sales'],
                     name = 'Top 10 Most Popular Publisher in Europe',
                     marker = {'color':'khaki'}))

fig.add_trace(go.Bar(x = Sales_Pub_JP_TOP[Sales_Pub_JP_TOP['Publisher'].isin(Sales_Pub_Global_10)]['Publisher'], 
                     y = Sales_Pub_JP_TOP[Sales_Pub_JP_TOP['Publisher'].isin(Sales_Pub_Global_10)]['JP_Sales'],
                     name = 'Top 10 Most Popular Publisher in Japan',
                     marker = {'color':'cyan'}))

fig.add_trace(go.Bar(x = Sales_Pub_Other_TOP[Sales_Pub_Other_TOP['Publisher'].isin(Sales_Pub_Global_10)]['Publisher'], 
                     y = Sales_Pub_Other_TOP[Sales_Pub_Other_TOP['Publisher'].isin(Sales_Pub_Global_10)]['Other_Sales'],
                     name = 'Top 10 Most Popular Publisher in Other Regions',
                     marker = {'color':'forestgreen'}))


fig.update_layout(updatemenus = 
                 [dict(type = 'buttons',
                     direction = 'right',
                     active = 0,
                     x = 1,
                     y = 1.2,
                     buttons = list([
                         dict(label = 'Top 10 North America',
                              method = 'update',
                              args = [{'visible': [True,False,False,False,False,False,False,False,False]},
                                     {'title':'Sales in North America'}]),
                         dict(label = 'Top 10 Europe',
                              method = 'update',
                              args = [{'visible': [False,True,False,False,False,False,False,False,False]},
                                     {'title':'Sales in Europe'}]),
                         dict(label = 'Top 10 Japan',
                              method = 'update',
                              args = [{'visible': [False,False,True,False,False,False,False,False,False]},
                                     {'title':'Sales in Japan'}]),
                         dict(label = 'Top 10 Other regions',
                              method = 'update',
                              args = [{'visible': [False,False,False,True,False,False,False,False,False]},
                                     {'title':'Sales in Other Parts'}]),
                         dict(label = 'Top 10 Globally',
                              method = 'update',
                              args = [{'visible': [False,False,False,False,True,False,False,False,False]},
                                     {'title':'Global Sales'}]),
                         dict(label = 'Top 10 Globally - Regional Sections',
                              method = 'update',
                              args = [{'visible': [False,False,False,False,False,True,True,True,True]},
                                     {'title':'Global Sales - Regional Sections'}])
                                   ])
                     )
                 ])



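# Show only the first trace initially, so the default view matches the active 'Top 10 North America'
# button instead of stacking all nine traces
fig.update_traces(visible = False)
fig.data[0].visible = True
fig.update_layout(title_text = 'Most Popular Publishers by Sales', barmode = 'stack')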

***Conclusion:***

* Nintendo has the highest global sales, although it released fewer games than EA and Activision, which rank second and third.
* Established players such as EA, Activision, Sony and Ubisoft dominate the North American and European markets, while the Japanese market favors Japanese publishers. Nintendo is the exception: it successfully broke into the Western market and succeeded globally.
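To quantify "highest sales with fewer games", one could compare the number of releases and the total sales per publisher and derive an average sales-per-game figure. A minimal sketch using pandas named aggregation on made-up rows; on the real data, `clean_df` would take the place of `toy`.

```python
import pandas as pd

# Toy rows standing in for clean_df; values are made up
toy = pd.DataFrame({
    'Publisher': ['Nintendo', 'Nintendo', 'EA', 'EA', 'EA', 'EA'],
    'Global_Sales': [30.0, 10.0, 2.0, 3.0, 1.0, 2.0],
})

# Number of games and total sales per publisher
per_publisher = toy.groupby('Publisher').agg(
    games=('Global_Sales', 'count'),
    total_sales=('Global_Sales', 'sum'),
)
per_publisher['sales_per_game'] = per_publisher['total_sales'] / per_publisher['games']
print(per_publisher.sort_values('sales_per_game', ascending=False))
```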

**Chapter 4.3: The most popular Platform, regional and global, based on sales**
<a id="section10"></a>

In [None]:
clean_df.head()

In [None]:
Sales_Plat = clean_df.groupby('Platform')[['NA_Sales','EU_Sales','JP_Sales', 'Other_Sales', 'Global_Sales']].sum().reset_index('Platform')
Sales_Plat = Sales_Plat.sort_values('Global_Sales', ascending = False)
#Sales_Plat['Global_Sales'] = 0
Sales_Plat

In [None]:
Sales_Plat_NA_TOP = Sales_Plat.sort_values('NA_Sales', ascending = False)
Sales_Plat_EU_TOP = Sales_Plat.sort_values('EU_Sales', ascending = False)
Sales_Plat_JP_TOP = Sales_Plat.sort_values('JP_Sales', ascending = False)
Sales_Plat_Other_TOP = Sales_Plat.sort_values('Other_Sales', ascending = False)

In [None]:
fig = go.Figure()

fig.add_trace(go.Bar(x = Sales_Plat_NA_TOP['Platform'], 
                     y = Sales_Plat_NA_TOP['NA_Sales'],
                     name = 'Most Popular Platform in North America',
                     marker = {'color':Sales_Plat_NA_TOP['NA_Sales'], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Plat_EU_TOP['Platform'], 
                     y = Sales_Plat_EU_TOP['EU_Sales'],
                     name = 'Most Popular Platform in Europe',
                     marker = {'color':Sales_Plat_EU_TOP['EU_Sales'], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Plat_JP_TOP['Platform'], 
                     y = Sales_Plat_JP_TOP['JP_Sales'],
                     name = 'Most Popular Platform in Japan',
                     marker = {'color':Sales_Plat_JP_TOP['JP_Sales'], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Plat_Other_TOP['Platform'], 
                     y = Sales_Plat_Other_TOP['Other_Sales'],
                     name = 'Most Popular Platform in Other Region',
                     marker = {'color':Sales_Plat_Other_TOP['Other_Sales'], 'colorscale':'Viridis' }))

fig.add_trace(go.Bar(x = Sales_Plat['Platform'], 
                     y = Sales_Plat['NA_Sales'],
                     name = 'Platform Sales in North America',
                     marker = {'color':'coral' }))

fig.add_trace(go.Bar(x = Sales_Plat['Platform'], 
                     y = Sales_Plat['EU_Sales'],
                     name = 'Platform Sales in Europe',
                     marker = {'color':'khaki' }))

fig.add_trace(go.Bar(x = Sales_Plat['Platform'], 
                     y = Sales_Plat['JP_Sales'],
                     name = 'Platform Sales in Japan',
                     marker = {'color':'cyan' }))

fig.add_trace(go.Bar(x = Sales_Plat['Platform'], 
                     y = Sales_Plat['Other_Sales'],
                     name = 'Platform Sales in Other Region',
                     marker = {'color':'forestgreen' }))
                               
fig.update_layout(updatemenus = 
                 [dict(type = 'buttons',
                     direction = 'right',
                     active = 0,
                     x = 1,
                     y = 1.2,
                     buttons = list([
                         dict(label = 'North America',
                              method = 'update',
                              args = [{'visible': [True,False,False,False,False,False,False,False]},
                                     {'title':'Platform Sales in North America'}]),
                         dict(label = 'Europe',
                              method = 'update',
                              args = [{'visible': [False,True,False,False,False,False,False,False]},
                                     {'title':'Platform Sales in Europe'}]),
                         dict(label = 'Japan',
                              method = 'update',
                              args = [{'visible': [False,False,True,False,False,False,False,False]},
                                     {'title':'Platform Sales in Japan'}]),
                         dict(label = 'Other',
                              method = 'update',
                              args = [{'visible': [False,False,False,True,False,False,False,False]},
                                     {'title':'Platform Sales in Other Parts'}]),
                         dict(label = 'Global',
                              method = 'update',
                              args = [{'visible': [False,False,False,False,True,True,True,True]},
                                     {'title':'Platform Global Sales'}])
                                   ])
                     )
                 ])



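# Show only the first trace initially, so the default view matches the active 'North America'
# button instead of stacking all eight traces (which would double-count the sales)
fig.update_traces(visible = False)
fig.data[0].visible = True
fig.update_layout(title_text = 'Most Popular Platform by Sales', barmode = 'stack')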

***Conclusion:***
* Globally, games on PS2, PS3 and Xbox360 sell the most. However, the situation differs slightly by region.
* Xbox360 is the top platform in North America, while the PS series is the most popular platform family globally.
* The most popular platforms are all stationary, computer-like consoles, except the DS in Japan, which is a portable device. This may be related to the Japanese preference for Role-Playing games, which focus more on the story than on machine performance.

**Chapter 4.4: Publisher's preference in terms of Genre**
<a id="section11"></a>

In [None]:
clean_df.head()

In [None]:
Genre_Publisher = clean_df['Publisher'].groupby(clean_df['Genre']).value_counts().reset_index('Genre')
Genre_Publisher.columns = ['Genre', 'Amount']
Genre_Publisher.reset_index(inplace = True)
#Genre_Publisher.sort_values(by = 'Amount', ascending = False, inplace = True)
Genre_Publisher

In [None]:
Sales_Pub_Global_20 = Sales_Pub_Global_TOP['Publisher'][:20].tolist()
Sales_Pub_Global_20

In [None]:
Genre_Publisher_20 = Genre_Publisher[Genre_Publisher['Publisher'].isin(Sales_Pub_Global_20)]

Genre_Publisher_top = Genre_Publisher_20.groupby('Publisher')['Amount'].sum().reset_index().sort_values('Amount', ascending = False)

Publisher_top = Genre_Publisher_top['Publisher'].tolist()

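# Order rows by each publisher's total number of games (largest first); .loc keeps the numeric dtype
Genre_Publisher_20 = Genre_Publisher_20.set_index('Publisher').loc[Publisher_top]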
Genre_Publisher_20

In [None]:
px.bar(Genre_Publisher_20,
       x = Genre_Publisher_20.index,
       y = 'Amount',
       color = 'Genre',
       title = 'Preference of Publisher on Genre')

***Conclusion:***
* For these top publishers, Action, Sports and Shooter are the genres they publish most.
* EA publishes mostly Sports games, in line with the genre's popularity, and the same holds for Action games at Activision.
* Most Role-Playing games come from Japanese publishers, with Namco Bandai releasing the most.
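A wide Publisher-by-Genre table makes each publisher's dominant genre easy to read off. The sketch below uses `pd.crosstab` and `idxmax` on made-up rows; with the notebook's data, `clean_df['Publisher']` and `clean_df['Genre']` would take the place of the toy columns. Ties are resolved in favor of the first genre in column order.

```python
import pandas as pd

toy = pd.DataFrame({
    'Publisher': ['EA', 'EA', 'EA', 'Activision', 'Activision', 'Activision', 'Namco Bandai'],
    'Genre': ['Sports', 'Sports', 'Action', 'Action', 'Action', 'Shooter', 'Role-Playing'],
})

# Rows: publishers, columns: genres, cells: number of games
counts = pd.crosstab(toy['Publisher'], toy['Genre'])
# Genre with the most games for each publisher
favorite_genre = counts.idxmax(axis=1)
print(favorite_genre)
```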

***Conclusion after EDA:***

* This EDA shows that the number of VG releases increases rapidly after 1995 and peaks in 2008. There is a dip between 2003 and 2004, and after 2011 the number of releases drops heavily. The reasons behind these two drops are worth exploring, since the number of releases likely reflects changes in the VG industry.
* North America, Europe and Japan are the biggest VG markets. The EDA shows that VG sales spread from North America to Europe, then to Japan, and then to the rest of the world. This might follow the path of PC development.
* In the early stage (before 2002), Sports games are very popular, but Action games take over afterwards. Even during the declining years (2011-2017), Action games remain very popular compared with other genres. The reason for this also deserves investigation. (Perhaps improving platform performance gives game designers more possibilities.)
* EA, Activision and Ubisoft have been giants throughout the history of the VG industry in terms of both sales and number of games. However, the Japanese publisher Nintendo tops the sales ranking with fewer games. The EDA shows that Nintendo successfully broke into the Western market; how Nintendo succeeded there would be an interesting topic for further study.
* Gamers in North America and Europe prefer Sports and Action games, while Japanese gamers prefer Role-Playing games. This may also be related to platform preferences: portable devices are more popular in Japan, while stationary consoles are more popular in North America and Europe. One possible reason is that Role-Playing games require less machine performance than Action and Sports games.

**This is the end of my first EDA.**

**I have realized that EDA combines *data analysis* and *data interpretation*, and both are very important.**

**Please leave your comments below. Many thanks in advance.**