# Video Game Sales Analysis

---

We're going to explore the video game sales dataset with pandas and visualize it with seaborn.
 
**LET'S MANIPULATE IT!**

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns # visualization

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

---

-I want to configure my graphics options:

In [None]:
sns.set(style="whitegrid")
sns.set_palette("husl")

In [None]:
# Let's start by loading the data, using the Rank column as the index
video_game = pd.read_csv("/kaggle/input/videogamesales/vgsales.csv", index_col="Rank")

---

In [None]:
# We should explore what we have in our data.
"""
Here is some info about the data's column names:
-Rank - Ranking of overall sales
-Name - The game's name
-Platform - Platform of the game's release (e.g. PC, PS4, etc.)
-Year - Year of the game's release
-Genre - Genre of the game
-Publisher - Publisher of the game
-NA_Sales - Sales in North America (in millions)
-EU_Sales - Sales in Europe (in millions)
-JP_Sales - Sales in Japan (in millions)
-Other_Sales - Sales in the rest of the world (in millions)
-Global_Sales - Total worldwide sales (in millions)
"""

-The info table gives us the data types and non-null counts. We can see that our data has four regional sales columns, plus Global_Sales as the total.

In [None]:
video_game.info()

-Looks like we have some NaN values. We'll drop the rows that contain them, so every remaining game has complete information.

In [None]:
# Dropping nan values
video_game.dropna(how="any",inplace = True)
video_game.info()
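
-Before dropping anything, it can help to see which columns actually hold the missing values. Here is a minimal sketch on a made-up toy frame (the names and values are assumptions for illustration, not rows from vgsales.csv):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are invented)
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Year": [2006.0, np.nan, 2010.0, 2015.0],
    "Publisher": ["Pub X", "Pub Y", None, "Pub Z"],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# dropna(how="any") keeps only the rows with no missing value at all
cleaned = toy.dropna(how="any")
print(len(toy) - len(cleaned), "rows dropped")
```

On the real data the same two calls would be `video_game.isna().sum()` and `video_game.dropna(how="any")`.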

-Let's see how many rows and columns we have now.

In [None]:
video_game.shape 

In [None]:
# I want to see the dataset's column names
video_game.columns

-Let's see the first 10 observations of our data.

In [None]:
video_game.head(10)

-And the last 10 observations...

In [None]:
video_game.tail(10)

-We can see that a lot of games have a Global_Sales value of 0.01. But how many? Let's find out.

In [None]:
ZeroZeroOne=video_game[video_game["Global_Sales"]==0.01]
counted_zerozeroone = ZeroZeroOne["Global_Sales"].count()
print("Yes, there are {} games that have 0.01 million sales in Global_Sales column.".format(counted_zerozeroone))
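
-A side note on comparing floats with `==`: it works here because the values are read straight from the CSV, but if the column had been computed (summed, averaged, and so on), a tolerance-based comparison would be safer. A small sketch with invented values, where one entry carries a tiny bit of rounding noise on purpose:

```python
import numpy as np
import pandas as pd

# Invented values; the third one is 0.01 plus a tiny rounding error
toy = pd.DataFrame({"Global_Sales": [0.01, 0.02, 0.01 + 1e-12, 1.5]})

# Exact comparison misses the noisy value
exact_count = (toy["Global_Sales"] == 0.01).sum()

# np.isclose compares within a small tolerance instead
close_count = np.isclose(toy["Global_Sales"], 0.01).sum()

print(exact_count, close_count)
```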

-How many platforms does this data have? 

In [None]:
# Count the distinct values in the Platform column
counted_platforms = video_game["Platform"].nunique()
print("There are {} gaming platforms in this video game data".format(counted_platforms))

---

# JAPAN Sales

-To analyze Japan's sales, let's first look at Japan's total sales.

In [None]:
japan_sales= video_game["JP_Sales"]
sum_of_japan_sales=japan_sales.sum()
print(sum_of_japan_sales)
print("Sum of sales made in Japan is {:.2f} (million)".format(sum_of_japan_sales))

-Rankings are ordered by the Global_Sales column, but I want to look from another perspective: the same games ranked by their sales in Japan.


In [None]:
jp_sales=video_game.sort_values("JP_Sales",ascending=False)
jp_sales= jp_sales.reset_index()

-Now I want to drop the games that didn't sell in the Japanese game market.

In [None]:
jp_sales = jp_sales[~(jp_sales["JP_Sales"] <= 0)]  

-I want to select the games from the last 5 years (2015 onward).

In [None]:
jp_latest = jp_sales[jp_sales["Year"]>=2015]
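# Sorted by year, from the latest to the oldest (2015)
jp_latest = jp_latest.sort_values("Year",ascending=False)
# drop=True discards the old positional index instead of adding it as an extra "index" column
jp_latest = jp_latest.reset_index(drop=True)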
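
-The filter, sort and reset steps above could also be wrapped in a small helper, so the same logic can be reused for any regional sales column. This is only a sketch: `latest_by_region` is a hypothetical name that is not part of the notebook, and the toy frame is made up.

```python
import pandas as pd

def latest_by_region(df, sales_col, since=2015):
    """Games with positive sales in `sales_col`, released in `since` or later, newest first.

    Hypothetical helper for illustration only.
    """
    sold = df[df[sales_col] > 0]
    recent = sold[sold["Year"] >= since]
    return recent.sort_values("Year", ascending=False).reset_index(drop=True)

# Invented rows, just to exercise the helper
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [2014, 2015, 2016, 2016],
    "JP_Sales": [1.0, 0.0, 0.3, 0.2],
    "EU_Sales": [0.5, 0.4, 0.0, 0.1],
})

jp_latest_toy = latest_by_region(toy, "JP_Sales")
eu_latest_toy = latest_by_region(toy, "EU_Sales")
print(jp_latest_toy)
print(eu_latest_toy)
```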

-I'm curious which platform gets the most releases in the Japanese game industry. Let's see.

In [None]:
ax=sns.countplot(y="Platform",data=jp_latest)
ax.set(xlabel="Count",ylabel="Platforms",title="Latest games in Japan by Platform")

 Can we say that the latest games are mostly made for Sony consoles, and that most of them are for handhelds?
 But what about sales?

In [None]:
# barplot aggregates with the mean by default, so each bar is the average JP_Sales per game
ax=sns.barplot(x="JP_Sales",y="Platform",data=jp_latest,ci=None)
ax.set(xlabel="Mean sales per game in Japan (millions)",ylabel="Platforms",title="Top Selling Platforms from last 5 years")

 Ah, here we can see that Nintendo is the best-selling brand in Japan, at least on average per game! (In the last 5 years, of course)
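
-One thing to keep in mind: `sns.barplot` aggregates with the mean by default, so a chart like the one above shows average sales per game, not platform totals. A platform with many small releases can have a low mean but a high total. A sketch with invented numbers:

```python
import pandas as pd

# Invented numbers: PSV has more titles, 3DS has fewer but bigger ones
toy = pd.DataFrame({
    "Platform": ["PSV", "PSV", "PSV", "PSV", "PSV", "3DS", "3DS"],
    "JP_Sales": [0.1, 0.1, 0.2, 0.2, 0.3, 0.5, 0.3],
})

# Title count, mean per game, and total per platform side by side
per_platform = toy.groupby("Platform")["JP_Sales"].agg(["count", "mean", "sum"])
print(per_platform)

best_by_mean = per_platform["mean"].idxmax()
best_by_sum = per_platform["sum"].idxmax()
print(best_by_mean, best_by_sum)
```

If totals are what you want to plot, `barplot` has an `estimator` parameter that controls the aggregation.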


-Which platforms have sold the most games in Japan?


In [None]:
jp_plats=jp_sales.groupby("Platform")["JP_Sales"].sum()
jp_plats_sorted=jp_plats.sort_values(ascending=False)

In [None]:
jp_plats_sorted.head(3)

In [None]:
jp_heads=jp_plats_sorted.head(5)
jp_heads=jp_heads.reset_index()
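
-As an alternative to sorting and then taking `head`, pandas' `nlargest` returns the top entries of a Series in one step. A minimal sketch on made-up numbers (not the real totals):

```python
import pandas as pd

# Invented per-game sales, several platforms
toy = pd.DataFrame({
    "Platform": ["DS", "DS", "PS", "PS2", "3DS", "SNES", "Wii"],
    "JP_Sales": [3.0, 2.0, 4.0, 3.5, 2.5, 1.0, 0.5],
})

# Total per platform, then keep the three largest totals
top3 = toy.groupby("Platform")["JP_Sales"].sum().nlargest(3).reset_index()
print(top3)
```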

In [None]:
ax=sns.barplot(y="Platform",x="JP_Sales",data=jp_heads)
ax.set(xlabel="Video Game Sales in Japan (millions)",ylabel="Platforms",title="Best Selling Gaming Platforms in Japan")

 We can see that the Nintendo DS is the top seller for video games, followed by Sony's PlayStation and PlayStation 2.
 Actually, I really love the PSOne, PS2 and Nintendo DS. This result made me happy :)


**I'll do more for Japan. But not now.**

---

# EUROPE Sales

-What about Europe? Let's have a look!

In [None]:
eu_sales=video_game.sort_values("EU_Sales",ascending=False)
eu_sales=eu_sales.reset_index()
eu_sales = eu_sales[~(eu_sales["EU_Sales"] <= 0)] 

In [None]:
# I want to select the games from the last 5 years
eu_latest = eu_sales[eu_sales["Year"]>=2015]
# Sorted by year, from the latest to the oldest
eu_latest = eu_latest.sort_values("Year",ascending=False)
# drop=True avoids adding the old positional index as an extra column
eu_latest = eu_latest.reset_index(drop=True)

In [None]:
ax=sns.countplot(y="Platform",data=eu_latest)
ax.set(xlabel="Count",ylabel="Platforms",title="Latest games in Europe by Platform")

 Oh, some values are different in Europe. In Europe, most of the games are made for PlayStation 4, but in Japan it was PlayStation Vita.

-But what about sales? Let's look.

In [None]:
# barplot aggregates with the mean by default, so each bar is the average EU_Sales per game
ax=sns.barplot(x="EU_Sales",y="Platform",data=eu_latest,ci=None)
ax.set(xlabel="Mean sales per game in Europe (millions)",ylabel="Platforms",title="Top Selling Platforms from last 5 years")

 Oh, I think I can draw this conclusion (but I'm not sure): there are a lot of PS4 users in Europe, so the most-used platform made the most sales.

-And someone still plays Wii :D

-Which platforms have sold the most games in Europe?


In [None]:
eu_plats=eu_sales.groupby("Platform")["EU_Sales"].sum()
eu_plats_sorted=eu_plats.sort_values(ascending=False)

In [None]:
# Only the last expression of a cell is displayed, so print this one explicitly
print(eu_plats_sorted.head(3))
eu_heads=eu_plats_sorted.head(5)
eu_heads=eu_heads.reset_index()

In [None]:
ax=sns.barplot(y="Platform",x="EU_Sales",data=eu_heads)
ax.set(xlabel="Video Game Sales in Europe (millions)",ylabel="Platforms",title="Best Selling Gaming Platforms in Europe")

 The all-time winner is PlayStation 3, followed by PlayStation 2. And we can see that third place goes to Microsoft.


**I'll do more for Europe. But not now.**

---

# NORTH AMERICA Sales

-And now we should look at what's happening in North America.

In [None]:
na_sales=video_game.sort_values("NA_Sales",ascending=False)
na_sales=na_sales.reset_index()
na_sales = na_sales[~(na_sales["NA_Sales"] <= 0)] 

In [None]:
# I want to select the games from the last 5 years
na_latest = na_sales[na_sales["Year"]>=2015]
# Sorted by year, from the latest to the oldest
na_latest = na_latest.sort_values("Year",ascending=False)
# drop=True avoids adding the old positional index as an extra column
na_latest = na_latest.reset_index(drop=True)

In [None]:
ax=sns.countplot(y="Platform",data=na_latest)
ax.set(xlabel="Count",ylabel="Platforms",title="Latest games in North America by Platform")

 This looks the same as Europe's situation.

-But what about sales? Let's look.

In [None]:
# barplot aggregates with the mean by default, so each bar is the average NA_Sales per game
ax=sns.barplot(x="NA_Sales",y="Platform",data=na_latest,ci=None)
ax.set(xlabel="Mean sales per game in North America (millions)",ylabel="Platforms",title="Top Selling Platforms from last 5 years")

 Oh, this actually shocked me. I didn't expect a result like this.
 We can say that North America mostly chose to play Xbox One games in the last 5 years.

 And I can clearly see that there are a lot of Nintendo users in North America. (But we can't get the actual number of users from this dataset.)

 That explains why most of the Zelda cosplayers are from the USA (lol)

-Which platforms have sold the most games in North America?

In [None]:
na_plats=na_sales.groupby("Platform")["NA_Sales"].sum()
na_plats_sorted=na_plats.sort_values(ascending=False)

In [None]:
# Only the last expression of a cell is displayed, so print this one explicitly
print(na_plats_sorted.head(3))
na_heads=na_plats_sorted.head(5)
na_heads=na_heads.reset_index()

In [None]:
ax=sns.barplot(y="Platform",x="NA_Sales",data=na_heads)
ax.set(xlabel="Video Game Sales in North America (millions)",ylabel="Platforms",title="Best Selling Gaming Platforms in North America")

 Our thesis is getting stronger with that result. Xbox won the race in North America despite the Red Ring of Death.


---

-We can get some summary statistics with pandas' describe function. 

In [None]:
video_game.describe()

-I want to see the sum of the global sales.

In [None]:
global_sales=video_game["Global_Sales"]
sum_of_global_sales = global_sales.sum()
print(sum_of_global_sales)
print("Sum of sales made in the entire world is {:.2f} (millions)".format(sum_of_global_sales))
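
-Since Global_Sales is described as the worldwide total, a quick sanity check is to add up the four regional columns and compare. The sketch below uses invented rows. I'm assuming a small tolerance is needed on the real data because the published figures are rounded, which I haven't verified here.

```python
import numpy as np
import pandas as pd

# Invented rows with the same column names as the dataset
toy = pd.DataFrame({
    "NA_Sales": [1.0, 0.5],
    "EU_Sales": [0.7, 0.2],
    "JP_Sales": [0.3, 0.0],
    "Other_Sales": [0.1, 0.05],
    "Global_Sales": [2.1, 0.75],
})

regional_cols = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Row-wise regional total compared to the reported global figure
regional_total = toy[regional_cols].sum(axis=1)
matches = np.isclose(regional_total, toy["Global_Sales"], atol=0.02)
print(matches)

# Share of each region in the worldwide total
share = toy[regional_cols].sum() / toy["Global_Sales"].sum()
print(share)
```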

---

# DARK SOULS

In [None]:
dark_souls=video_game[video_game["Name"].isin(["Dark Souls","Dark Souls II","Dark Souls III","Bloodborne"])]
dark_souls=dark_souls.reset_index()
dark_souls=dark_souls.sort_values(["Name","Rank"],ascending=True)

-To see the Dark Souls games' rankings by platform:

In [None]:
print(dark_souls[["Rank","Name","Platform"]])

-Let's visualize the Souls games' rankings.

In [None]:
ax=sns.scatterplot(x="Rank",y="Platform",hue="Name",data=dark_souls)
ax.set(xlabel="Rankings",ylabel="Platforms",title="Souls Games' rankings by Platform")

# GTA V

In [None]:
gta=video_game[video_game["Name"]=="Grand Theft Auto V"]
gta=gta[gta["Platform"]=="PS3"]
gtaps3=gta["Global_Sales"]

In [None]:
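# Take the single matching value explicitly; calling float() on a whole Series is deprecated in recent pandas
gtaps3=float(gtaps3.iloc[0])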
more_than_gta=video_game[video_game["Global_Sales"]>gtaps3]
print(more_than_gta[["Name","Publisher"]])

 We can see that Nintendo beat GTA V! (lol) (Nintendo fanboy included in that comment)

---

In [None]:
new=video_game[video_game["Year"]==2020]
print(new[["Name","Publisher","Global_Sales"]])

 We have just one game from 2020 in this dataset.
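
-To see how the releases are spread over the years, instead of checking one year at a time, `value_counts` on the Year column gives the whole picture. A sketch with made-up years:

```python
import pandas as pd

# Invented years, stored as floats like a column that once contained NaN
toy = pd.DataFrame({"Year": [2015.0, 2016.0, 2016.0, 2020.0, 2015.0, 2016.0]})

# Number of games per year, ordered chronologically
games_per_year = toy["Year"].value_counts().sort_index()
print(games_per_year)
```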

**I'll do the analysis for Others and Global later.**

---

# Thank you for reading my analysis.