# Business problems:
1. What are the most popular **game**, **genre**, **publisher**, and **platform** of all time?
2. What are the most popular **games** in **each region** (i.e. North America, Europe, and Japan)?
3. What is the trend of video games' global sales of all time?

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

This dataset contains a list of video games with sales greater than 100,000 copies. It was generated by a scrape of vgchartz.com. Fields include:
* Rank - Ranking of overall sales
* Name - The game's name
* Platform - Platform of the game's release (e.g. PC, PS4)
* Year - Year of the game's release
* Genre - Genre of the game
* Publisher - Publisher of the game
* NA_Sales - Sales in North America (in millions)
* EU_Sales - Sales in Europe (in millions)
* JP_Sales - Sales in Japan (in millions)
* Other_Sales - Sales in the rest of the world (in millions)
* Global_Sales - Total worldwide sales (in millions)
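
The field list suggests that `Global_Sales` should be roughly the sum of the four regional columns. A minimal sketch of a sanity check follows, assuming that relationship holds up to rounding. The helper name `sales_mismatches` and the tolerance `tol=0.02` are illustrative choices, not part of the dataset.

```python
import pandas as pd

REGIONAL_COLUMNS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

def sales_mismatches(frame, tol=0.02):
    """Return rows whose regional sales differ from Global_Sales by more than tol.

    Hypothetical helper: `tol` (in millions) is an assumed allowance for rounding.
    """
    regional_total = frame[REGIONAL_COLUMNS].sum(axis=1)
    gap = (regional_total - frame["Global_Sales"]).abs()
    return frame[gap > tol]
```

Once the data is loaded, `sales_mismatches(df)` would list any rows worth inspecting before the totals are trusted.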

In [None]:
df = pd.read_csv('/kaggle/input/videogamesales/vgsales.csv')
df.head()

In [None]:
print(df['Platform'].value_counts().head(5))
print(df['Platform'].nunique())
print(df['Publisher'].value_counts().head(5))
print(df['Publisher'].nunique())
print(df['Genre'].value_counts().head(5))
print(df['Genre'].nunique())

In [None]:
df.describe()

In [None]:
df.hist(figsize=(15,15))
plt.show()
print(df.shape)
print(df.columns)

In [None]:
df.info()

In [None]:
df.isnull().sum()

In [None]:
df = df.dropna()

In [None]:
df.info()

In [None]:
df.head(10)

In [None]:
import seaborn as sns
plt.figure(figsize=(10,10))
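# numeric_only=True skips text columns (Name, Platform, Genre, Publisher),
# which would otherwise raise an error in pandas 2.x
sns.heatmap(df.corr(numeric_only=True), annot=True)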
plt.show()

In [None]:
x = df.groupby('Genre')['Global_Sales'].sum().sort_values(ascending=False).head(5)
plt.figure(figsize=(15,15))
plt.bar(x.index,x)
plt.show()

In [None]:
df['Year'] = df['Year'].astype(int)

In [None]:
df.columns

# What are the most popular **game**, **genre**, **publisher**, and **platform** of all time?

In [None]:
regionals = ['NA_Sales', 'EU_Sales', 'JP_Sales']
aspects = ['Platform', 'Genre', 'Publisher']
for i in regionals:
    for j in aspects:
        k = df.groupby(j)[i].sum().sort_values(ascending=False).head(1)
        display(k)

In North America:
* Platform with highest sales is X360
* Genre with highest sales is Action
* Publisher with highest sales is Nintendo

In Europe:
* Platform with highest sales is PS3
* Genre with highest sales is Action
* Publisher with highest sales is Nintendo

In Japan:
* Platform with highest sales is DS
* Genre with highest sales is Role-Playing (RPG)
* Publisher with highest sales is Nintendo

Nintendo is the top-selling publisher in North America, Europe, and Japan. In North America and Europe, **Action** games are the best sellers, whereas in Japan it is **Role-Playing (RPG)**. Lastly, the three regions differ in their top-selling **platform**.
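
The nested loop above displays nine separate one-row results, which are easy to mix up. As a sketch, the same grouping can be collected into one labeled table. The function name `top_by_region` and the output column names are hypothetical; the logic is the same `groupby(...).sum()` used above.

```python
import pandas as pd

def top_by_region(frame,
                  regions=("NA_Sales", "EU_Sales", "JP_Sales"),
                  aspects=("Platform", "Genre", "Publisher")):
    """Return one row per (region, aspect) with the top-selling category."""
    rows = []
    for region in regions:
        for aspect in aspects:
            # Total regional sales for each category of this aspect
            totals = frame.groupby(aspect)[region].sum()
            rows.append({
                "region": region,
                "aspect": aspect,
                "top": totals.idxmax(),          # category with the largest total
                "sales_millions": totals.max(),  # that category's total
            })
    return pd.DataFrame(rows)
```

Calling `top_by_region(df)` in the notebook would give a single nine-row table that can be read against the bullet lists above.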

# What are the most popular **games** in **each region** (i.e. North America, Europe, and Japan)?

In [None]:
for i in regionals:
    display(df.sort_values(by=[i], ascending=False).head(3))

* The most popular games in North America are Wii Sports, Super Mario Bros., and Duck Hunt
* The most popular games in Europe are Wii Sports, Mario Kart Wii, and Wii Sports Resort
* The most popular games in Japan are Pokemon Red/Pokemon Blue, Pokemon Gold/Pokemon Silver, and Super Mario Bros.
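
Sorting rows ranks individual platform releases, so a title sold on several platforms appears as several rows and its sales are split. A minimal sketch of an alternative, assuming the goal is to rank titles rather than releases, sums each title's sales first. The helper name `top_titles` is hypothetical.

```python
import pandas as pd

def top_titles(frame, region, n=3):
    """Sum a region's sales per title across platforms, then keep the n largest."""
    return frame.groupby("Name")[region].sum().nlargest(n)
```

For example, `top_titles(df, "NA_Sales")` would return the three best-selling titles in North America with their combined sales in millions. The result may differ from the row-level ranking above for multi-platform games.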

# What is the trend of video games' global sales of all time?

In [None]:
y = df.groupby('Year')['Global_Sales'].sum()
plt.figure(figsize=(15,10))
plt.bar(y.index,y)
plt.xlabel('Year')
plt.ylabel('Global Sales')
plt.show()
# Global video game sales peaked in 2008-2009
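
The peak can also be read from the data instead of the chart. A minimal sketch, reusing the same yearly grouping as the plot; the function name `peak_year` is hypothetical.

```python
import pandas as pd

def peak_year(frame):
    """Return (year, total) for the year with the highest summed Global_Sales."""
    yearly = frame.groupby("Year")["Global_Sales"].sum()
    return int(yearly.idxmax()), float(yearly.max())
```

Running `peak_year(df)` in the notebook gives the single highest year and its total in millions, which can be checked against the bar chart.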