# Visualising video game sales

### Objective

To visualize the video game sales throughout the data set in order to recognise the achievements and failures of publishing companies' released content.

This is a visualization project to gather findings throughout the data set on video game sales. While completing the kernel, I noticed that some figures in the data set do not match the records on other websites that I cross-checked, so findings from the data set may not be completely accurate. This kernel is simply for my own curiosity (and a good exercise to build Python skills for data analysis).

#### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
vg_data = pd.read_csv("../input/videogamesales/vgsales.csv")

In [None]:
vg_data.head()

#### Cleaning the data

In [None]:
# Any record with NA in Year column will be set to the average value within Year column
vg_data.Year = vg_data.Year.fillna(vg_data.Year.mean())

In [None]:
# Change Year column to integer data type
vg_data['Year'] = vg_data['Year'].astype(int)
# Check column data types
vg_data.dtypes

In [None]:
# Sum of null values within data set and to see what column they belong to.
vg_data.isnull().sum()

I will leave in the records with null values. The missing values do not invalidate the rest of each record, so I don't think it's necessary to drop all 58 records that contain a null value.

No more cleansing necessary.
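
One side effect of filling missing years with the column mean is that every imputed record lands on the same year, which can inflate that year in later per-year counts. The sketch below illustrates this on a made-up frame (the `toy` data and the `Year_imputed` column are assumptions for illustration, not part of the data set) and shows one possible alternative: keeping the gaps in a nullable integer column.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for vg_data (values are made up for illustration)
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Year": [1999.0, 2008.0, 2009.0, np.nan, np.nan],
})

# Flag imputed rows before filling, so they can be excluded from per-year counts later
toy["Year_imputed"] = toy["Year"].isna()

# Mean imputation as in the cleaning cell: every missing row gets the same year
mean_filled = toy["Year"].fillna(toy["Year"].mean()).astype(int)

# Alternative: keep missing years as <NA> in a nullable integer column
nullable_year = toy["Year"].astype("Int64")

print(mean_filled.value_counts().sort_index())
print(nullable_year.isna().sum())
```

With the flag in place, something like `toy.loc[~toy["Year_imputed"]]` restricts a yearly chart to records whose year was actually recorded.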

## Visualizing Nintendo

In [None]:
# Gathering Nintendo records
nintendo = vg_data.loc[vg_data['Publisher'] == 'Nintendo']
nintendo.head()

### Nintendo platform visualization

In [None]:
# Visualising value counts of Nintendo consoles
nintendo.Platform.value_counts().plot(kind='pie', figsize=(15, 7), autopct='%.2f')

plt.xlabel('Console')
plt.ylabel('Count of games in dataset')
plt.title('Popularity of Nintendo consoles by games in data')
plt.show()

Apart from console-exclusive games such as Mario Kart and Super Smash Bros., video games are often released on multiple consoles. For example, Grand Theft Auto games are on PlayStation 2 as well as Xbox 360, and James Bond games are on PlayStation as well as Nintendo consoles.


In [None]:
# Total game sales per console (numeric columns only, so text columns are not concatenated)
nintendo.groupby('Platform').sum(numeric_only=True)

In [None]:
nintendo.drop(columns=['Rank', 'Year']).groupby('Platform').sum(numeric_only=True).plot(kind='bar', figsize=(15, 7))

plt.title('Nintendo total game sales per console')
plt.xlabel('Console')
plt.ylabel('Sales in millions')
plt.show()

The Nintendo Wii has been the most popular games console. The least popular platforms have been the GameCube and the Wii U. This could be down to the timing of the release dates. When the GameCube came out, the PlayStation 2 had already been out for about a year and was still getting far more game releases. The PlayStation 2 could also play DVDs and was backward compatible with PlayStation 1 games. The GameCube, on the other hand, used smaller discs and therefore couldn't give consumers what the PlayStation offered.

Nintendo must make up for this defeat in innovation to get back into the gaming consciousness. Even today, when people think of gaming, in tournaments or otherwise, Nintendo content is not the first that comes to mind or gets played. Battle royale games, which are mostly played on PC, PlayStation or Xbox, have taken over the gaming community since Fortnite's release.

### Game genre with Nintendo

In [None]:
nintendo.Genre.value_counts().plot(kind='pie', figsize=(15, 7), autopct='%.2f')

plt.title('Popularity of game genres with Nintendo')
plt.xlabel('Genre')
plt.ylabel('Amount of video games')
plt.show()

It seems that the most popular games sold on Nintendo consoles are Nintendo's own games, e.g. Super Mario Bros. and Mario Kart. This is a good strategy for Nintendo, as it must generate more revenue than they would have if these Platform games did not exist. Without these games, Nintendo could fall behind competitors, as Sony and Microsoft have managed to turn video game consoles into media centres: consoles can now run Netflix, YouTube and other applications. Nintendo must do the same if it is to keep up, or come up with a new strategy.

In [None]:
nin_rpy = nintendo.loc[nintendo['Genre']=='Role-Playing'].drop(columns=['Year', 'Publisher', 'Rank'])
nin_plt = nintendo.loc[nintendo['Genre']=='Platform'].drop(columns=['Year', 'Publisher', 'Rank'])
nin_ftg = nintendo.loc[nintendo['Genre']=='Fighting'].drop(columns=['Year', 'Publisher', 'Rank'])
nin_sht = nintendo.loc[nintendo['Genre']=='Shooter'].drop(columns=['Year', 'Publisher', 'Rank'])
nin_sim = nintendo.loc[nintendo['Genre']=='Simulation'].drop(columns=['Year', 'Publisher', 'Rank'])
nin_str = nintendo.loc[nintendo['Genre']=='Strategy'].drop(columns=['Year', 'Publisher', 'Rank'])
nin_adv = nintendo.loc[nintendo['Genre']=='Adventure'].drop(columns=['Year', 'Publisher', 'Rank'])
nin_rac = nintendo.loc[nintendo['Genre']=='Racing'].drop(columns=['Year', 'Publisher', 'Rank'])
nin_spt = nintendo.loc[nintendo['Genre']=='Sports'].drop(columns=['Year', 'Publisher', 'Rank'])
nin_puz = nintendo.loc[nintendo['Genre']=='Puzzle'].drop(columns=['Year', 'Publisher', 'Rank'])
nin_act = nintendo.loc[nintendo['Genre']=='Action'].drop(columns=['Year', 'Publisher', 'Rank'])
nin_msc = nintendo.loc[nintendo['Genre']=='Misc'].drop(columns=['Year', 'Publisher', 'Rank'])

nin_genre_list = [nin_rpy, nin_plt, nin_ftg, nin_sht, nin_sim, nin_str, nin_adv, nin_rac, nin_spt, nin_puz, nin_act, nin_msc]

In [None]:
# One frame per genre, in the same order as the labels in nin_gnr (12 of each)
nin_genre_list = [nin_rpy, nin_plt, nin_ftg, nin_sht, nin_sim, nin_str, nin_adv, nin_rac, nin_spt, nin_puz, nin_act, nin_msc]
nin_gnr = ["Role-play", "Platform", "Fighting", "Shooting", "Simulation", "Strategy", "Adventure", "Racing", "Sports", "Puzzle", "Action", "Misc"]

# Plot for genre sales per Nintendo console
fig, axes = plt.subplots(nrows=4, ncols=3, figsize=(15,20))

for df, ax, i in zip(nin_genre_list, axes.ravel(), nin_gnr):
    df.groupby('Platform').sum(numeric_only=True).plot(kind='bar', ax=ax)
    ax.set_title(f"Plot for Nintendo {i} genre sales per console")

# Adjust subplot spacing once, after all subplots are drawn
plt.tight_layout()

Sports and puzzle games on the Nintendo Wii appear to have been a success, with global sales of over 160 million (the data set records sales in millions).

Fighting and adventure games have been the least successful. Fighting games reached just under 14 million on the Wii (the most popular console), and adventure games sold just over 16 million globally, with the Game Boy being the most popular console for this genre.
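
Keeping twelve genre frames and twelve labels in two hand-written lists makes it easy for a plot title to drift away from its data. A possible alternative, sketched here on a made-up frame (the `toy` rows are assumptions, not real sales figures), is to let a single `groupby('Genre')` supply both the label and the data.

```python
import pandas as pd

# Toy stand-in for the `nintendo` frame (made-up values)
toy = pd.DataFrame({
    "Platform": ["Wii", "Wii", "DS", "GB", "Wii"],
    "Genre": ["Sports", "Puzzle", "Puzzle", "Adventure", "Sports"],
    "NA_Sales": [10.0, 2.0, 3.0, 1.0, 5.0],
    "Global_Sales": [20.0, 4.0, 6.0, 2.0, 9.0],
})

# One table per genre: the label and the data come from the same groupby key
sales_by_genre = {
    genre: frame.groupby("Platform").sum(numeric_only=True)
    for genre, frame in toy.groupby("Genre")
}

for genre, table in sales_by_genre.items():
    print(genre)
    print(table)
```

Each table in `sales_by_genre` can then be passed to `.plot(kind='bar', ax=ax)` with the dictionary key as the subplot title.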

### Nintendo game sales

In [None]:
# Top 20 Nintendo game sales
nintendo.drop(columns=['Year', 'Rank'])[:20].set_index('Name').plot(kind='bar', figsize=(15, 7))

plt.title('Nintendo top 20 video game sales data')
plt.xlabel('Video Game')
plt.ylabel('Sales in millions')
plt.show()
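
Slicing the first 20 rows gives the top sellers only while the frame is still ordered by sales. A small sketch of a more explicit selection with `nlargest`, assuming the `Global_Sales` column from the data set; the helper name `top_games` and the toy rows are hypothetical.

```python
import pandas as pd

# Toy stand-in for vg_data (made-up values, deliberately not sorted by sales)
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Publisher": ["Nintendo", "Other", "Nintendo", "Nintendo"],
    "Global_Sales": [5.0, 50.0, 30.0, 12.0],
})

def top_games(data, publisher, n=20):
    """Return the n best-selling games of one publisher, highest first."""
    subset = data.loc[data["Publisher"] == publisher]
    return subset.nlargest(n, "Global_Sales").set_index("Name")

top_two = top_games(toy, "Nintendo", n=2)
print(top_two)
```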

## Visualizing Sony (Playstation)

In [None]:
vg_data.head()

In [None]:
# Gathering Sony records
sony = vg_data.loc[vg_data['Publisher'] == 'Sony Computer Entertainment']
sony.head()

### Sony console sales

In [None]:
# Visualising value counts of Sony consoles
sony.Platform.value_counts().plot(kind='pie', figsize=(15, 7), autopct='%.2f')

plt.xlabel('Console')
plt.ylabel('Count of games in dataset')
plt.title('Popularity of Sony consoles by game release')
plt.show()

In [None]:
# Total game sales per console (numeric columns only, so text columns are not concatenated)
sony.groupby('Platform').sum(numeric_only=True)

In [None]:
sony.drop(columns=['Year', 'Rank']).groupby('Platform').sum(numeric_only=True).plot(kind='bar', figsize=(15, 7))

plt.title('Playstation total game sales per console')
plt.xlabel('Console')
plt.ylabel('Sales in millions')
plt.show()
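
The Nintendo and Sony sections run the same filter-then-group steps. A minimal sketch of a reusable helper follows, assuming the sales columns all end in `_Sales` as they do in this data set; `platform_totals` and the toy rows are hypothetical names and values used only for illustration.

```python
import pandas as pd

# Toy stand-in for vg_data (made-up values)
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Publisher": [
        "Sony Computer Entertainment",
        "Sony Computer Entertainment",
        "Nintendo",
        "Sony Computer Entertainment",
    ],
    "Platform": ["PS2", "PS2", "Wii", "PS3"],
    "NA_Sales": [1.0, 2.0, 5.0, 0.5],
    "Global_Sales": [2.0, 3.0, 9.0, 1.5],
})

def platform_totals(data, publisher):
    """Sum every *_Sales column per platform for a single publisher."""
    sales_cols = [col for col in data.columns if col.endswith("_Sales")]
    subset = data.loc[data["Publisher"] == publisher]
    return subset.groupby("Platform")[sales_cols].sum()

sony_totals = platform_totals(toy, "Sony Computer Entertainment")
print(sony_totals)
```

Selecting the sales columns by name avoids having to drop `Rank` and `Year` before summing.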

### Game Genre with Sony

In [None]:
pd.set_option('display.max_columns', 12)
vg_data.head()

In [None]:
sony = vg_data.loc[vg_data['Publisher'] == 'Sony Computer Entertainment']
sony.Genre.value_counts().plot(kind='pie', figsize=(15, 7), autopct='%.2f')

plt.title('Popularity of game genres with Playstation')
plt.xlabel('Genre')
plt.ylabel('Amount of video games')
plt.show()

In [None]:
sony_spt = sony.loc[sony['Genre']=='Sports'].drop(columns=['Year', 'Publisher', 'Rank'])
sony_msc = sony.loc[sony['Genre']=='Misc'].drop(columns=['Year', 'Publisher', 'Rank'])
sony_puz = sony.loc[sony['Genre']=='Puzzle'].drop(columns=['Year', 'Publisher', 'Rank'])
sony_sim = sony.loc[sony['Genre']=='Simulation'].drop(columns=['Year', 'Publisher', 'Rank'])
sony_sgy = sony.loc[sony['Genre']=='Strategy'].drop(columns=['Year', 'Publisher', 'Rank'])
sony_ftg = sony.loc[sony['Genre']=='Fighting'].drop(columns=['Year', 'Publisher', 'Rank'])
sony_adv = sony.loc[sony['Genre']=='Adventure'].drop(columns=['Year', 'Publisher', 'Rank'])
sony_rpy = sony.loc[sony['Genre']=='Role-Playing'].drop(columns=['Year', 'Publisher', 'Rank'])
sony_sht = sony.loc[sony['Genre']=='Shooter'].drop(columns=['Year', 'Publisher', 'Rank'])
sony_rac = sony.loc[sony['Genre']=='Racing'].drop(columns=['Year', 'Publisher', 'Rank'])
sony_plt = sony.loc[sony['Genre']=='Platform'].drop(columns=['Year', 'Publisher', 'Rank'])
sony_act = sony.loc[sony['Genre']=='Action'].drop(columns=['Year', 'Publisher', 'Rank'])

sony_genre_list = [sony_spt, sony_msc, sony_puz, sony_sim, sony_sgy, sony_ftg, sony_adv, sony_rpy, sony_sht, sony_rac, sony_plt, sony_act]

In [None]:
sony_genre_list = [sony_spt, sony_msc, sony_puz, sony_sim, sony_sgy, sony_ftg, sony_adv, sony_rpy, sony_sht, sony_rac, sony_plt, sony_act]
sony_gnr = ["Sports", "Misc", "Puzzles", "Simulation", "Strategy", "Fighting", "Adventure", "Role-playing", "Shooting", "Racing", "Platform", "Action"]

# Plot for genre sales per Sony console
fig, axes = plt.subplots(nrows=4, ncols=3, figsize=(15,20))

for df, ax, i in zip(sony_genre_list, axes.ravel(), sony_gnr):
    df.groupby('Platform').sum(numeric_only=True).plot(kind='bar', ax=ax)
    ax.set_title(f"Plot for Sony {i} genre sales per console")

# Adjust subplot spacing once, after all subplots are drawn
plt.tight_layout()

### Sony Playstation game sales

In [None]:
# Top 20 PlayStation game sales
sony.drop(columns=['Year', 'Rank'])[:20].set_index('Name').plot(kind='bar', figsize=(15, 7))

plt.title('Sony Playstation top 20 video game sales data')
plt.xlabel('Video Game')
plt.ylabel('Sales in millions')
plt.show()

Let's take a look at Gran Turismo 4. It doesn't seem to have a great deal of sales in Europe.

In [None]:
sony.loc[sony['Name'] == 'Gran Turismo 4']

"*Gran Turismo 4 received a "Double Platinum" sales award from the Entertainment and Leisure Software Publishers Association (ELSPA), indicating sales of at least 600,000 copies in the United Kingdom.*

*By March 2016, Gran Turismo 4 had shipped 1.27 million copies in Japan, 3.47 million in North America, 6.83 million in Europe, and 180,000 in Asia for a total of 11.76 million copies. It is the third highest-selling game in the Gran Turismo franchise, ahead of Gran Turismo, but behind Gran Turismo 5 and Gran Turismo 3: A-Spec.*"

Reference [here](http://en.wikipedia.org/wiki/Gran_Turismo_4#Sales)

It's always good to double check information.
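
A cross-check like this can be made a little more systematic by putting the data set row next to the externally reported figures. In the sketch below, `reported` holds the Wikipedia shipment figures quoted above (in millions), while `dataset_row` uses placeholder values, not the real data set figures; the 25% threshold is also an arbitrary assumption.

```python
import pandas as pd

# Figures quoted above from Wikipedia, in millions of copies shipped
reported = pd.Series({"NA_Sales": 3.47, "EU_Sales": 6.83, "JP_Sales": 1.27})

# Placeholder values standing in for the data set row (NOT the real figures)
dataset_row = pd.Series({"NA_Sales": 3.0, "EU_Sales": 0.5, "JP_Sales": 1.0})

comparison = pd.DataFrame({"dataset": dataset_row, "reported": reported})
comparison["difference"] = comparison["reported"] - comparison["dataset"]
comparison["relative_gap"] = comparison["difference"] / comparison["reported"]

# Flag regions where the data set is more than 25% away from the reported figure
flagged = comparison.index[comparison["relative_gap"].abs() > 0.25].tolist()

print(comparison.round(2))
print(flagged)
```

In the notebook, the placeholder could be replaced with the real row, for example `sony.loc[sony['Name'] == 'Gran Turismo 4', list(reported.index)].iloc[0]`.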

## Yearly visualization

In [None]:
# Frequency of year values in data
vg_data.Year.value_counts().sort_index()

In [None]:
vg_data.Year.value_counts().sort_index().plot(kind='bar', figsize=(15, 7), grid=True, color=['orange', 'blue', 'cyan', 'green', 'yellow'])

plt.title('Dataset value counts of games per year')
plt.ylabel('Value Counts')
plt.xlabel('Year')
plt.show()

Across the data set, the years with the most game releases appear to be 2008 and 2009. From 2011 the counts drop sharply, most likely because the data set is incomplete for later years rather than because fewer games were developed. Plenty of games were still being developed in 2017, and the gaming industry has kept growing in recent years with the rise of esports.
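
The peak years can also be read off programmatically instead of from the chart. A small sketch on a made-up `Year` series (the values are assumptions for illustration only):

```python
import pandas as pd

# Toy stand-in for vg_data.Year (made-up values)
years = pd.Series([2007, 2008, 2008, 2008, 2009, 2009, 2011], name="Year")

# Number of records per year, in chronological order
counts = years.value_counts().sort_index()

# The two years with the most records
peak_years = counts.nlargest(2).index.tolist()

print(counts)
print(peak_years)
```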

## Gaming in old millennium and new millennium

In [None]:
all_years = vg_data.set_index('Year').sort_values(by='Year', ascending=True)
all_years.index = pd.to_datetime(all_years.index, format='%Y')
all_years.head()

In [None]:
import dateutil.parser

### Most popular consoles before 2000

In [None]:
b4_2000 = all_years[all_years.index < dateutil.parser.parse("2000-01-01")]

In [None]:
b4_2000.Platform.value_counts().plot(kind='bar', figsize=(15, 7))

plt.title('Most popular gaming consoles from 1980 to 1999')
plt.xlabel('Console')
plt.ylabel('Popularity by record count')
plt.show()

The first PlayStation grew out of an abandoned collaboration between Sony and Nintendo, and primarily competed with the Nintendo 64 and Sega Saturn (ref: https://en.wikipedia.org/wiki/PlayStation_(console)).

So why was Playstation more popular? Or did it just have more games to sell? 
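
One way to start separating "more popular" from "more games to sell" is to put the record count next to total and average sales per platform. This is only a sketch on made-up numbers (the `toy` frame is an assumption), using the `Platform` and `Global_Sales` column names from the data set.

```python
import pandas as pd

# Toy stand-in for b4_2000 (made-up values)
toy = pd.DataFrame({
    "Platform": ["PS", "PS", "PS", "N64", "N64", "SAT"],
    "Global_Sales": [1.0, 2.0, 3.0, 4.0, 6.0, 0.5],
})

# Record count, total sales and average sales per game for each platform
summary = (
    toy.groupby("Platform")["Global_Sales"]
    .agg(games="count", total_sales="sum", sales_per_game="mean")
    .sort_values("games", ascending=False)
)

print(summary)
```

A platform that leads on `games` but not on `sales_per_game` would suggest that its high record count reflects a larger catalogue rather than stronger sales per title.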

### Most popular consoles after 2000

In [None]:
# >= keeps the year 2000 itself, whose index value is exactly 2000-01-01
aft_2000 = all_years[all_years.index >= dateutil.parser.parse("2000-01-01")]

In [None]:
aft_2000.Platform.value_counts().plot(kind='bar', figsize=(15, 7))

plt.title('Most popular gaming consoles from 2000 to 2020')
plt.xlabel('Console')
plt.ylabel('Popularity by record count')
plt.show()
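
The before/after split in this section can also be made directly on the integer `Year` column. One thing to watch with a datetime index built from years is that every record of the year 2000 sits exactly on 2000-01-01, so a strict `>` comparison against that date leaves the whole year out. The sketch below uses a made-up frame (an assumption for illustration) to show both variants.

```python
import pandas as pd

# Toy stand-in for vg_data (made-up values)
toy = pd.DataFrame({"Name": list("ABCD"), "Year": [1999, 2000, 2000, 2005]})

# Split on the integer year: the two parts cover every record exactly once
before_2000 = toy[toy["Year"] < 2000]
from_2000 = toy[toy["Year"] >= 2000]

# Same data with a datetime version of the year, compared strictly
as_dates = pd.to_datetime(toy["Year"].astype(str), format="%Y")
strictly_after = toy[as_dates > pd.Timestamp("2000-01-01")]

print(len(before_2000), len(from_2000), len(strictly_after))
```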

PlayStation 2 had games that revolutionized gaming. For example, Grand Theft Auto III popularized 3D open-world games and was a big hit. Rockstar Games (owned by Take-Two Interactive) made gaming cool for adults as well as kids. This was huge for gaming.

### Sales from 1980 to 1999

In [None]:
b4_2000.drop(columns=['Rank']).groupby('Publisher').sum(numeric_only=True).sort_values(by='Global_Sales', ascending=False)[:20]

In [None]:
b4_2000.drop(columns=['Rank']).groupby('Publisher').sum(numeric_only=True).sort_values(by='Global_Sales', ascending=False)[:20].plot(kind='bar', figsize=(15, 7))

plt.title('Publishing company sales in total, between 1980 and 1999')
plt.xlabel('Publisher')
plt.ylabel('Sales in millions')
plt.show()

### Sales from 2000 to 2020

In [None]:
aft_2000.head()

In [None]:
aft_2000.drop(columns=['Rank']).groupby('Publisher').sum(numeric_only=True).sort_values(by='Global_Sales', ascending=False)[:20]

In [None]:
aft_2000.drop(columns=['Rank']).groupby('Publisher').sum(numeric_only=True).sort_values(by='Global_Sales', ascending=False)[:20].plot(kind='bar', figsize=(15, 7))

plt.title('Publishing company sales in total, between 2000 and 2020')
plt.xlabel('Publisher')
plt.ylabel('Sales in millions')
plt.show()
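
The two era tables can also be built in one pass and placed side by side, which makes it easier to see how a publisher's totals changed across the millennium. A minimal sketch, assuming an integer `Year` column; the `Era` column and the toy rows are hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for vg_data (made-up values)
toy = pd.DataFrame({
    "Publisher": ["Nintendo", "Nintendo", "Sony", "Sony", "EA"],
    "Year": [1990, 2006, 1998, 2004, 2010],
    "Global_Sales": [40.0, 80.0, 10.0, 20.0, 15.0],
})

# Label each record with its era, then total global sales per publisher and era
toy["Era"] = np.where(toy["Year"] < 2000, "1980-1999", "2000-2020")
era_totals = (
    toy.groupby(["Publisher", "Era"])["Global_Sales"]
    .sum()
    .unstack(fill_value=0.0)
)

print(era_totals)
```

Publishers with no records in an era show up with 0.0 rather than disappearing from the table.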

## Regional representation

### North America vs Europe sales

In [None]:
vg_data.head()

In [None]:
# Top 20 sales in NA compared to EU by publisher
na_eu = vg_data.groupby(['Publisher'])[['NA_Sales', 'EU_Sales']].sum().sort_values(by=['NA_Sales'], ascending=True)[-20:]
na_eu

In [None]:
na_eu.plot(kind='bar', figsize=(15, 7))

plt.xlabel('Publisher')
plt.ylabel('Sales in millions')
plt.title('North America compared to Europe sales in data')
plt.show()

### Japan vs Other

In [None]:
vg_data.head()

In [None]:
# Top 20 sales in JAPAN compared to OTHER by publisher
jp_ot = vg_data.groupby(['Publisher'])[['JP_Sales', 'Other_Sales']].sum().sort_values(by=['JP_Sales'], ascending=True)[-20:]
jp_ot

In [None]:
jp_ot.plot(kind='bar', figsize=(15, 7))

plt.xlabel('Publisher')
plt.ylabel('Sales in millions')
plt.title('Japan compared to Other sales in data')
plt.show()
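
Absolute totals favour the largest publishers, so it can help to look at each region's share of a publisher's sales as well. A sketch under the assumption that the four regional columns add up to the publisher's total; the toy rows are made up for illustration.

```python
import pandas as pd

regions = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Toy stand-in for vg_data (made-up values)
toy = pd.DataFrame({
    "Publisher": ["A", "A", "B"],
    "NA_Sales": [4.0, 2.0, 1.0],
    "EU_Sales": [2.0, 1.0, 1.0],
    "JP_Sales": [1.0, 0.0, 2.0],
    "Other_Sales": [0.0, 0.0, 0.0],
})

# Regional totals per publisher, then each region as a fraction of the row total
totals = toy.groupby("Publisher")[regions].sum()
shares = totals.div(totals.sum(axis=1), axis=0)

print(shares.round(2))
```

Plotting `shares` with `kind='bar', stacked=True` would show the regional mix per publisher on a common 0 to 1 scale.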