# <p style="color:black; text-decoration:underline; font-family:serif; text-align:center; font-size:30px;">Video Game Sales</p>

**Analyze sales data from more than 16,500 games.**

Description

> This dataset contains a list of video games with sales greater than 100,000 copies. It was generated by a scrape of vgchartz.com.

The fields are:

* Name - The game's name

* Platform - Platform of the game's release (e.g. PC, PS4, etc.)

* Year - Year of the game's release

* Genre - Genre of the game

* Publisher - Publisher of the game

* NA_Sales - Sales in North America (in millions)

* EU_Sales - Sales in Europe (in millions)

* JP_Sales - Sales in Japan (in millions)

* Other_Sales - Sales in the rest of the world (in millions)

* Global_Sales - Total worldwide sales (in millions).

> The script to scrape the data is available at https://github.com/GregorUT/vgchartzScrape.
It is based on BeautifulSoup using Python.
There are 16,598 records. 2 records were dropped due to incomplete information.

# <p style="color:orange; font-size:24px;">1. Libraries used</p>

In [None]:
# Analysis and plotting libraries (preinstalled in the kaggle/python Docker image:
# https://github.com/kaggle/docker-python)

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

## <p style="color:orange; font-size:24px;">1.1 Import Dataset</p>

In [None]:
import os

# List the input files available in the Kaggle environment
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

Data_vg = pd.read_csv("../input/videogamesales/vgsales.csv")
Data_vg

**Rows with missing values (NaN), such as an invalid *Year*, are removed, along with 4 records released after 2016 (three from 2017 and one from 2020).**

In [None]:
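Data_vg.dropna(inplace=True)                # drop rows with any missing value (e.g. Year)
Data_vg.drop(columns="Rank", inplace=True)  # Rank is the original global-sales ranking
# Keep releases up to 2016 and store Year as a whole number instead of a float
Data_vg = Data_vg[Data_vg["Year"] < 2017].astype({"Year": int})
Data_vg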
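
For reuse, the cleaning steps above can be wrapped in one function. This is a minimal sketch assuming the `vgsales.csv` column names; `clean_sales`, its `last_year` parameter, and the toy rows are hypothetical, chosen only to show which rows survive.

```python
import numpy as np
import pandas as pd


def clean_sales(raw: pd.DataFrame, last_year: int = 2016) -> pd.DataFrame:
    """Return a copy of `raw` without incomplete rows, `Rank`, or late releases."""
    df = raw.dropna()                    # drop rows with any missing value
    df = df.drop(columns="Rank")         # Rank is the original sales ranking
    df = df[df["Year"] <= last_year]     # keep releases up to last_year
    return df.astype({"Year": int})      # whole years instead of floats like 2006.0


# Toy rows (hypothetical, not taken from vgsales.csv)
raw = pd.DataFrame({
    "Rank": [1, 2, 3, 4],
    "Name": ["A", "B", "C", "D"],
    "Year": [2006.0, np.nan, 2017.0, 2010.0],
    "Global_Sales": [5.0, 1.0, 0.5, 2.0],
})
clean = clean_sales(raw)
print(clean)  # rows A (2006) and D (2010) remain
```

Returning a new frame instead of using `inplace=True` keeps the raw data available for comparison.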

## <p style="color:orange; font-size:24px;">1.2 Correlation in Data</p>

In [None]:
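# Correlate only the numeric columns (Name, Platform, Genre and Publisher are text)
matrix = Data_vg.corr(numeric_only=True)
plt.figure(figsize=(8,6))
# Plot the correlation matrix as an annotated heat map
g = sns.heatmap(matrix, annot=True, cmap="YlGn_r")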

In [None]:
# Summary
Data_vg.describe()

In [None]:
sns.pairplot(Data_vg);

As the pair plot shows:
> NA sales are almost linearly related to Global Sales.

> EU sales are more scattered, while JP sales follow a curve.
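
These pair-plot impressions can be checked numerically. A minimal sketch, assuming the `vgsales.csv` column names: the hypothetical helper `region_correlations` ranks each regional column by its Pearson correlation with `Global_Sales`. Passing `method="spearman"` instead measures monotonic but possibly curved relationships, which may suit the JP pattern better. The toy numbers are made up and are not results from the dataset.

```python
import pandas as pd

REGIONS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]


def region_correlations(df: pd.DataFrame, method: str = "pearson") -> pd.Series:
    """Correlation of each regional sales column with Global_Sales, highest first."""
    return df[REGIONS].corrwith(df["Global_Sales"], method=method).sort_values(ascending=False)


# Made-up numbers: Global_Sales is exactly twice NA_Sales here
toy = pd.DataFrame({
    "NA_Sales": [1.0, 2.0, 3.0, 4.0, 5.0],
    "EU_Sales": [0.5, 0.1, 0.9, 0.2, 0.8],
    "JP_Sales": [0.0, 0.0, 0.1, 0.5, 1.0],
    "Other_Sales": [0.1, 0.2, 0.1, 0.3, 0.2],
    "Global_Sales": [2.0, 4.0, 6.0, 8.0, 10.0],
})
print(region_correlations(toy))              # NA_Sales comes first (correlation ~1.0)
print(region_correlations(toy, "spearman"))  # rank-based variant
```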

# <p style="color:orange; font-size:24px;">2. Quantitative analysis</p>

## **Yearly Sales of Video Games**

In [None]:
# Total sales per release year, for each region and worldwide
df = Data_vg.groupby(by='Year').sum(numeric_only=True)
df.plot.line(figsize=(10,10), grid=True)
plt.ylabel("Sales (millions of copies)");

**Yearly sales grow substantially from the 1980s to the late 2000s, which shows the rise in interest in video games; they decline in the years after that.**
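
Selecting the sales columns explicitly before summing keeps text columns such as `Name` out of the aggregation, and `idxmax` then picks the peak year. A sketch assuming the `vgsales.csv` column names; `yearly_totals` and the toy rows are hypothetical.

```python
import pandas as pd

SALES_COLS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"]


def yearly_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Sum each sales column per release year, in chronological order."""
    return df.groupby("Year")[SALES_COLS].sum().sort_index()


# Hypothetical rows; the Name column is ignored by the aggregation
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [2000, 2000, 2001, 2002],
    "NA_Sales": [1, 1, 3, 1],
    "EU_Sales": [1, 0, 1, 0],
    "JP_Sales": [0, 1, 1, 0],
    "Other_Sales": [0, 0, 0, 0],
    "Global_Sales": [2, 2, 5, 1],
})
totals = yearly_totals(toy)
peak_year = totals["Global_Sales"].idxmax()
print(totals)
print("Peak year:", peak_year)  # Peak year: 2001
```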

## **Distribution of games by genre**

In [None]:
df = pd.DataFrame(Data_vg['Genre'].value_counts(normalize=True))
plot = df.plot.pie(subplots=True, autopct='%1.1f%%', figsize=(10, 10))

**Action is the most common genre, with the largest share of titles in the dataset.**
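
The pie above counts titles, not copies sold, so popularity is measured by the number of releases. A hedged sketch (hypothetical helper `genre_shares`, made-up rows) puts each genre's share of titles next to its share of `Global_Sales`, since the two can rank genres differently.

```python
import pandas as pd


def genre_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Share of titles vs. share of global sales per genre, sorted by sales share."""
    by_titles = df["Genre"].value_counts(normalize=True)
    by_sales = df.groupby("Genre")["Global_Sales"].sum()
    by_sales = by_sales / by_sales.sum()
    shares = pd.DataFrame({"share_of_titles": by_titles, "share_of_sales": by_sales})
    return shares.sort_values("share_of_sales", ascending=False)


# Made-up rows: Action has the most titles, Sports the most copies sold
toy = pd.DataFrame({
    "Genre": ["Action", "Action", "Action", "Sports", "Shooter"],
    "Global_Sales": [0.5, 0.5, 1.0, 4.0, 2.0],
})
print(genre_shares(toy))
```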

# <p style="color:orange; font-size:24px;">3. Qualitative analysis</p> 

## **All-rounder platforms with releases in every genre**

In [None]:
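# Number of distinct genres released on each platform
df1 = pd.DataFrame(Data_vg.groupby('Platform')['Genre'].nunique())
df1.sort_values(by=['Genre'], inplace=True)
# The data has 12 genres, so > 11 keeps platforms with games in every genre
df1[df1["Genre"] > 11]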
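
The threshold `> 11` works because the cleaned data has 12 genres. A sketch that derives the genre count from the data instead of hard-coding it; `all_rounder_platforms` and the toy rows are hypothetical.

```python
import pandas as pd


def all_rounder_platforms(df: pd.DataFrame) -> list:
    """Platforms with at least one game in every genre present in `df`."""
    n_genres = df["Genre"].nunique()
    genres_per_platform = df.groupby("Platform")["Genre"].nunique()
    return sorted(genres_per_platform[genres_per_platform == n_genres].index)


# Hypothetical rows with 3 genres in total
toy = pd.DataFrame({
    "Platform": ["PC", "PC", "PC", "Wii", "Wii", "GB"],
    "Genre": ["Action", "Sports", "Puzzle", "Action", "Sports", "Puzzle"],
})
print(all_rounder_platforms(toy))  # ['PC']
```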

## **Most frequent game titles**

In [None]:
df = pd.DataFrame(Data_vg["Name"].value_counts().head(5))
df
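
A title usually appears once per platform it was released on, so `value_counts()` on `Name` mostly counts multi-platform releases; a second entry with the same name on the same platform would also be counted. A sketch (hypothetical `platforms_per_title`, toy rows) counts distinct platforms instead.

```python
import pandas as pd


def platforms_per_title(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Titles released on the most distinct platforms."""
    return df.groupby("Name")["Platform"].nunique().nlargest(n)


# Hypothetical rows: "Y" has two entries on the same platform
toy = pd.DataFrame({
    "Name": ["X", "X", "X", "Y", "Y", "Z"],
    "Platform": ["PS2", "PC", "Wii", "PS2", "PS2", "PC"],
})
print(toy["Name"].value_counts())   # counts rows: X 3, Y 2, Z 1
print(platforms_per_title(toy))     # counts platforms: X 3, Y 1, Z 1
```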

## <p style="color:orange; font-size:24px;">4. Game launches per year</p>

In [None]:
title = {'family': 'serif',
        'color':  'darkblue',
        'weight': 'normal',
        'size': 16,
        }
sub_head = {'family': 'monospace',
        'color':  'darkblue',
        'size': 16,
        'weight': 'demibold',
        }

In [None]:
# Games per release year in chronological order; keep the counts for the last 15 years
dt = Data_vg['Year'].value_counts().sort_index().tail(15).tolist()

In [None]:
plt.figure(figsize=(16,8))
sns.countplot(x="Year", data=Data_vg)
plt.xlim([21.5,31.5])  # show bars 22-31 of the chronologically ordered years
plt.xlabel("Year of release")
# Write each bar's count just above it
li = 21.9
for i in range(10):
    plt.text(li, dt[i], dt[i])
    li += 1
plt.title("10 Most frequent launch years", fontdict=title)
plt.ylabel("No. of games launched");

In [None]:
# Leave out 1981, then take the counts for the 10 earliest remaining years
dft = Data_vg[Data_vg["Year"] != 1981]
dt = dft['Year'].value_counts().sort_index().head(10).tolist()

In [None]:
plt.figure(figsize=(16,8))
sns.countplot(x="Year", data=dft)
plt.xlim([-0.5,9.5])
plt.ylim([0,50])
# Write each bar's count, in brackets, just above it
li = -0.1
for i in range(10):
    plt.text(li, dt[i], '[' + str(dt[i]) + ']')
    li += 1
plt.xlabel("Year of release")
plt.title("10 Least frequent launch years", fontdict=sub_head)
plt.ylabel("No. of games launched");
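
Both plots select their years through hard-coded `xlim` ranges, which depend on how many distinct years the data contains. A sketch that picks the years directly with `nlargest`/`nsmallest`; `launch_year_extremes` and the toy years are hypothetical. The two returned Series could then be drawn with `Series.plot.bar()`.

```python
import pandas as pd


def launch_year_extremes(years: pd.Series, n: int = 10):
    """Return (most frequent, least frequent) release years with their counts.

    Ties at the cut-off are resolved by pandas' default keep="first".
    """
    counts = years.value_counts()
    most = counts.nlargest(n).sort_index()
    least = counts.nsmallest(n).sort_index()
    return most, least


# Made-up release years
years = pd.Series([2000] * 5 + [2001] * 3 + [2002] + [2003] * 4, name="Year")
most, least = launch_year_extremes(years, n=2)
print(most)   # 2000 -> 5, 2003 -> 4
print(least)  # 2001 -> 3, 2002 -> 1
```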

# <p style="color:orange; font-size:24px;">5. Global Sales</p>

In [None]:
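# Histogram of per-game sales, restricted to titles under 1.5 million copies
Data_vg[Data_vg["Global_Sales"] < 1.5].hist(column="Global_Sales", bins=20, color='orange')
plt.xlabel("Global sales per game (millions of copies, below 1.5)")
plt.ylabel("Number of games")
plt.title("Worldwide Sales", fontdict=title);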
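
Restricting the histogram to titles below 1.5 million copies hides the long tail of a few best-sellers. Comparing the mean with the median is a quick check of how skewed the full distribution is; a sketch with a hypothetical helper `sales_summary` and made-up values:

```python
import pandas as pd


def sales_summary(df: pd.DataFrame, threshold: float = 1.5) -> dict:
    """Share of games below `threshold` (millions) plus mean and median sales."""
    sales = df["Global_Sales"]
    return {
        "share_below": (sales < threshold).mean(),
        "median": sales.median(),
        "mean": sales.mean(),
    }


# Made-up sales figures with one best-seller
toy = pd.DataFrame({"Global_Sales": [0.1, 0.2, 0.2, 0.5, 9.0]})
print(sales_summary(toy))  # share_below 0.8, median 0.2, mean about 2.0
```

A mean well above the median points to a right-skewed distribution.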

## Top 10 publishers globally (gross sales)

In [None]:
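# Total global sales per publisher, keeping the 10 largest
df1 = pd.DataFrame(Data_vg.groupby('Publisher')['Global_Sales'].sum())
df1.sort_values(by=['Global_Sales'], inplace=True)
df1 = df1.tail(10)
# Percentages are shares of the top-10 total, not of the whole market
plot = df1.plot.pie(y='Global_Sales', autopct='%1.1f%%', figsize=(10, 10))
plt.title("Market share among the top 10 publishers", fontdict=title);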
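
The pie normalizes over the top 10 publishers only. To place them in the context of the whole market, a sketch assuming the `vgsales.csv` columns; `top_publisher_share` and the toy rows are hypothetical.

```python
import pandas as pd


def top_publisher_share(df: pd.DataFrame, n: int = 10):
    """Each top-n publisher's share of all global sales, and their combined share."""
    by_publisher = df.groupby("Publisher")["Global_Sales"].sum()
    total = by_publisher.sum()
    top = by_publisher.nlargest(n)
    return top / total, top.sum() / total


# Hypothetical rows
toy = pd.DataFrame({
    "Publisher": ["A", "A", "B", "C"],
    "Global_Sales": [3.0, 1.0, 4.0, 2.0],
})
shares, combined = top_publisher_share(toy, n=2)
print(shares)    # A 0.4, B 0.4
print(combined)  # 0.8
```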

In [None]:
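# Ten best-selling games, with their sales broken down by region
regions = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]
df = Data_vg.nlargest(10, "Global_Sales")[["Name"] + regions]
ax = df.plot.bar(x="Name", stacked=True, rot=85)
plt.ylabel("Sales (millions of copies)")
plt.title("Top 10 Games globally", fontdict=title);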

# <p style="color:orange; font-size:24px;">6. Best-selling games by region</p>

In [None]:
# Top 5 titles per region (sales summed over every platform a title was released on)
regions = {"NA_Sales": "North America", "EU_Sales": "Europe", "JP_Sales": "Japan"}
for col, region in regions.items():
    df1 = pd.DataFrame(Data_vg.groupby('Name')[col].sum())
    df1.sort_values(by=[col], inplace=True)
    df1 = df1.tail(5)
    df1.plot.pie(y=col, autopct='%1.1f%%', figsize=(6, 6))
    plt.title(f"Best selling games in {region}", fontdict=title)
plt.show()
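
The pies show shares among each region's top five only. The same ranking as a table keeps the absolute figures (millions of copies). A sketch; `top_titles_by_region` and the toy rows are hypothetical.

```python
import pandas as pd

REGION_LABELS = {"NA_Sales": "North America", "EU_Sales": "Europe", "JP_Sales": "Japan"}


def top_titles_by_region(df: pd.DataFrame, n: int = 5) -> dict:
    """For each region, the n titles with the highest sales summed over platforms."""
    totals = df.groupby("Name")[list(REGION_LABELS)].sum()
    return {label: totals[col].nlargest(n) for col, label in REGION_LABELS.items()}


# Hypothetical rows; title "P" was released on two platforms
toy = pd.DataFrame({
    "Name": ["P", "P", "Q", "R"],
    "NA_Sales": [1.0, 2.0, 4.0, 0.0],
    "EU_Sales": [0.0, 1.0, 1.0, 5.0],
    "JP_Sales": [3.0, 0.0, 0.0, 1.0],
})
for region, top in top_titles_by_region(toy, n=1).items():
    print(region, top.index[0], top.iloc[0])
```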
