# Analysis of Video Game Sales Since 1980

**First, let's take a look at the first five rows of the data.**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

vg_sales = pd.read_csv('../input/vgsales.csv')
print(vg_sales.head())

In [None]:
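# Years excluded from every yearly aggregate below
years = [2016, 2017, 2020]
# numeric_only=True keeps non-numeric columns out of the sum and mean
total_sales_group = vg_sales.groupby(['Year']).sum(numeric_only=True).drop(years)
average_sales_group = vg_sales.groupby(['Year']).mean(numeric_only=True).drop(years)
# Zeros become NaN so that count() tallies only games with recorded sales in each region
count_sales_group = vg_sales.replace(0, np.nan).groupby(['Year']).count().drop(years)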

In [None]:
def lineplot(df, title='Sales by Year', ylabel='Sales', legendsize=10, legendloc='upper left'):
    """Plot each region's yearly values twice: linear y-axis first, then log y-axis."""
    year = df.index.values
    regions = {
        'NA': df.NA_Sales,
        'EU': df.EU_Sales,
        'JP': df.JP_Sales,
        'OTHER': df.Other_Sales,
    }

    # The count table is plotted without the worldwide line; every other table includes it.
    if df is not count_sales_group:
        regions['WORLD WIDE'] = df.Global_Sales

    # Draw the same figure on both scales instead of repeating the plotting code.
    for scale, suffix in [('linear', ''), ('log', ' (Log)')]:
        for label, values in regions.items():
            plt.plot(year, values, label=label)

        plt.yscale(scale)
        plt.ylabel(ylabel)
        plt.xlabel('Year')
        plt.title(title + suffix)
        plt.legend(loc=legendloc, prop={'size': legendsize})
        plt.show()
        plt.clf()

**I am interested in seeing how sales overall have done since 1980. I have included both a linear scale and a logarithmic scale.**

In [None]:
lineplot(total_sales_group, title = 'Sales by Year', ylabel ='Sales (In Millions)', legendsize = 8)

**In the top graph, we can see that sales rose steadily until they peaked in 2008. That was around the start of the recession, which appears to have affected the video game industry as severely as many other industries.**

**In the bottom graph, we get a good look at how sales compared across regions. North America has led video game sales for most of the period. Japan appears to have briefly surpassed North America between 1992 and 1996. I am curious which games were so popular in Japan during those years, so we'll take a quick look at that.**

**Below are the top 20 games whose sales in Japan outperformed their sales in North America during this period.**

In [None]:
# DataFrame.sort() no longer exists in pandas; sort_values() replaces it
mask = (vg_sales.Year >= 1992) & (vg_sales.Year <= 1996) & (vg_sales.JP_Sales > vg_sales.NA_Sales)
japan1992_1996 = vg_sales.loc[mask, ['Name', 'JP_Sales', 'NA_Sales']].sort_values(by='JP_Sales', ascending=False)
print(japan1992_1996.head(20))
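
**The claim that Japan was briefly ahead can also be checked without reading it off the plot, by comparing yearly totals directly. The sketch below uses a small made-up table (`toy`, not the real dataset) with the same `Year`, `NA_Sales` and `JP_Sales` column names, and lists the years in which Japan's total is larger.**

```python
import pandas as pd

# Hypothetical toy rows standing in for vgsales.csv (sales in millions).
toy = pd.DataFrame({
    'Year':     [1991, 1991, 1992, 1992, 1993],
    'NA_Sales': [2.0,  1.0,  0.5,  0.4,  1.5],
    'JP_Sales': [1.0,  0.5,  1.2,  0.3,  1.0],
})

# Sum each region's sales per year, then keep the years where Japan is ahead.
yearly = toy.groupby('Year')[['NA_Sales', 'JP_Sales']].sum()
jp_lead_years = yearly.index[yearly.JP_Sales > yearly.NA_Sales].tolist()
print(jp_lead_years)
```

**Running the same two lines on `vg_sales` instead of `toy` should show which years the real data puts Japan in front.**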

In [None]:
lineplot(average_sales_group, title = 'Average Revenue per Game per Year', ylabel ='Sales (In Millions)', legendsize = 8, legendloc = 'upper right')

**Another important trend in the graphs above is that average revenue per game decreased dramatically after the late 1980s. This could possibly have been due to a decrease in the cost of producing this technology, now that it had been around for some time.**

**So, if overall sales increased while revenue per game decreased, it must mean that more games were being produced, which is something we'll look into next.**
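
**The reasoning above rests on a simple identity: for each year, total sales equal average sales per game times the number of games. Here is a minimal sketch on made-up numbers (the `toy` table is hypothetical) in which the total rises, the average falls, and the count therefore has to rise.**

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two games in 1985, four games in 2005 (sales in millions).
toy = pd.DataFrame({
    'Year':         [1985, 1985, 2005, 2005, 2005, 2005],
    'Global_Sales': [4.0,  2.0,  3.0,  2.0,  2.0,  1.0],
})

# Total, average and number of games per year.
by_year = toy.groupby('Year')['Global_Sales'].agg(['sum', 'mean', 'count'])

# The identity: sum == mean * count for every year.
assert np.allclose(by_year['sum'], by_year['mean'] * by_year['count'])
print(by_year)
```

**Note that the identity uses the plain number of rows per year. The regional counts plotted next treat zero sales as missing, so they answer a slightly different question: how many games sold at all in each region.**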

In [None]:
lineplot(count_sales_group, title = 'Of All Games Produced Did Every Region Sell Every Game?\n\nGames for Sale by Region by Year', ylabel ='Count', legendsize = 8, legendloc = 'upper left')

## Final Thoughts

**From our graphs above, we can see that the industry was booming from the late 1990s onward. It seems as though the industry had become very efficient in production and was able to produce more volume, which resulted in higher revenue.**

**Japan's revenue decline, as we saw earlier, started in the late 1990s. As we can see from our graph, it seems that many games being introduced in other parts of the world were not being introduced in Japan, which could have contributed to the big revenue losses.**
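
**One way to put a number on this is the share of each year's titles that record any sales in Japan. The sketch below assumes that a zero in `JP_Sales` means the game was not sold there, which the data alone cannot confirm, and it runs on a made-up `toy` table rather than the real file.**

```python
import pandas as pd

# Hypothetical toy rows with the same column names as the dataset.
toy = pd.DataFrame({
    'Year':     [1995, 1995, 2010, 2010, 2010, 2010],
    'NA_Sales': [1.0,  0.5,  0.8,  0.6,  0.3,  0.2],
    'JP_Sales': [0.7,  0.2,  0.1,  0.0,  0.0,  0.0],
})

# Fraction of each year's titles with non-zero sales in Japan.
# Assumption: a zero means the game was not released (or did not sell) there.
jp_share = toy.JP_Sales.gt(0).groupby(toy.Year).mean()
print(jp_share)
```

**A falling share over time on the real data would be consistent with the explanation above, though it would not prove it.**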

### Fun Fact:
#### - What are the top 20 highest-grossing games since 1980?

In [None]:
# DataFrame.sort() no longer exists in pandas; sort_values() replaces it
Top_games = vg_sales[['Name', 'Year', 'Global_Sales']].sort_values(by='Global_Sales', ascending=False)
print(Top_games.head(20))