# Data Analysis with Pandas
## Video Game Sales
### Yousef Jariry
#### 25/10/2021

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('./vgsales.csv')
df

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16593 | 16596 | Woody Woodpecker in Crazy Castle 5 | GBA | 2002.0 | Platform | Kemco | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16594 | 16597 | Men in Black II: Alien Escape | GC | 2003.0 | Shooter | Infogrames | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16595 | 16598 | SCORE International Baja 1000: The Official Game | PS2 | 2008.0 | Racing | Activision | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16596 | 16599 | Know How 2 | DS | 2010.0 | Puzzle | 7G//AMES | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 |
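The `Year` values print as `2006.0` rather than `2006`. In pandas, an integer column read from CSV is stored as `float64` when it contains missing values, so this most likely means some rows have no year. A minimal sketch on a small hypothetical sample (not the real file) shows how to count missing values per column and switch to pandas' nullable `Int64` dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample; one game has no recorded year
sample = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Hypothetical Game"],
    "Year": [2006, 1985, np.nan],
})

# A single NaN forces the whole column to float64, hence "2006.0"
print(sample["Year"].dtype)  # float64

# Count missing values in every column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Nullable integer dtype: whole-number years, missing ones shown as <NA>
sample["Year"] = sample["Year"].astype("Int64")
print(sample["Year"].dtype)  # Int64
```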


## 1. Which company is the most common video game publisher?

In [3]:
# mode() returns the most frequent value(s) of a column
df_most_common = df['Publisher'].mode()
df_most_common

0    Electronic Arts
dtype: object
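`mode()` always returns a Series, because several values can tie for most frequent; here only one publisher came back. A small sketch with a hypothetical, deliberately tied column shows what a tie looks like and uses `value_counts()` to see the underlying frequencies:

```python
import pandas as pd

# Hypothetical publisher column in which two publishers tie for first place
publishers = pd.Series(["Sega", "Nintendo", "Sega", "Nintendo", "Konami"])

# mode() returns every tied value, not just one
modes = publishers.mode()
print(modes.tolist())

# value_counts() shows how often each value occurs, most frequent first
counts = publishers.value_counts()
print(counts)

# Guard against ties before treating the first mode as "the" answer
if len(modes) > 1:
    print(f"{len(modes)} values tie for most common")
```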

## 2. What’s the most common platform?

In [4]:
# mode() returns the most frequent value(s) of a column
most_common_platform = df['Platform'].mode()
most_common_platform

0    DS
dtype: object
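`mode()` names the most common platform but not how dominant it is. Passing `normalize=True` to `value_counts()` returns each value's share of the rows instead of raw counts. A sketch on a hypothetical sample:

```python
import pandas as pd

# Hypothetical platform column, not the real dataset
platforms = pd.Series(["DS", "PS2", "DS", "Wii", "DS", "PS2"])

# Fraction of rows per platform, largest first; the values sum to 1.0
shares = platforms.value_counts(normalize=True)

top_platform = shares.index[0]
print(f"{top_platform}: {shares.iloc[0]:.1%} of all games")
```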

## 3. What about the most common genre?

In [5]:
# mode() returns the most frequent value(s) of a column
most_common_genre = df['Genre'].mode()
most_common_genre

0    Action
dtype: object
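Questions 1 to 3 each call `mode()` on a single column. `DataFrame.mode()` can answer them in one call: it returns one column of modes per input column, and row 0 holds the first (here the only) mode of each. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows, not the real dataset
games = pd.DataFrame({
    "Publisher": ["EA", "EA", "Nintendo", "Ubisoft"],
    "Platform": ["DS", "PS2", "DS", "DS"],
    "Genre": ["Action", "Sports", "Action", "Action"],
})

# One mode per column; row 0 holds the first mode of each column
most_common = games[["Publisher", "Platform", "Genre"]].mode().iloc[0]
print(most_common)
```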

## 4. What are the top 20 highest-grossing games?

In [6]:
# https://stackoverflow.com/questions/39066260/get-first-and-second-highest-values-in-pandas-columns
# nlargest(n, column) returns the n rows with the largest values in that column, largest first
top_twenty_highest_grossing_games = df[['Name', 'Global_Sales']].nlargest(20, 'Global_Sales')
top_twenty_highest_grossing_games

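| | Name | Global_Sales |
|---|---|---|
| 0 | Wii Sports | 82.74 |
| 1 | Super Mario Bros. | 40.24 |
| 2 | Mario Kart Wii | 35.82 |
| 3 | Wii Sports Resort | 33.0 |
| 4 | Pokemon Red/Pokemon Blue | 31.37 |
| 5 | Tetris | 30.26 |
| 6 | New Super Mario Bros. | 30.01 |
| 7 | Wii Play | 29.02 |
| 8 | New Super Mario Bros. Wii | 28.62 |
| 9 | Duck Hunt | 28.31 |

(Only the first 10 of the 20 returned rows are shown.)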
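According to the pandas documentation, `nlargest(n, column)` is equivalent to sorting in descending order and taking the first `n` rows, but more performant; it also controls ties through its `keep` parameter. A sketch on hypothetical sales figures:

```python
import pandas as pd

# Hypothetical sales figures with a tie at the top
sales = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Global_Sales": [5.0, 9.0, 7.0, 9.0, 1.0],
})

# The three rows with the largest Global_Sales
top3 = sales.nlargest(3, "Global_Sales")

# Same values via a full descending sort
top3_sorted = sales.sort_values("Global_Sales", ascending=False).head(3)

# keep="all" returns every row tied with the n-th largest value,
# so asking for 1 row yields both games with 9.0
top1_with_ties = sales.nlargest(1, "Global_Sales", keep="all")
print(top1_with_ties)
```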


## 5. For North American video game sales, what’s the median?

In [7]:
na_median_sales = df['NA_Sales'].median()
na_median_sales

0.08
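With an even number of values, `median()` returns the mean of the two middle values after sorting, so the median need not be a value that actually occurs in the column. Filtering on equality with the median, as the `In [8]` cell does, can then return no rows. A small illustration:

```python
import pandas as pd

# Even-length Series: the median averages the two middle values
even = pd.Series([1, 2, 3, 4])
print(even.median())  # 2.5, which does not occur in the Series

# An exact-match filter on such a median finds nothing
matches = even[even == even.median()]
print(len(matches))  # 0

# Odd-length Series: the median is the middle value itself
odd = pd.Series([1, 2, 3])
print(odd.median())  # 2.0
```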

### A. Provide a secondary output showing ten games surrounding the median sales value

In [8]:
# First ten games (in file order) whose NA_Sales equal the median exactly
med = df[df['NA_Sales'] == na_median_sales].head(10)
med

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 446 | 447 | Dragon Warrior IV | NES | 1990.0 | Role-Playing | Enix Corporation | 0.08 | 0.0 | 3.03 | 0.01 | 3.12 |
| 497 | 498 | World Soccer Winning Eleven 7 International | PS2 | 2003.0 | Sports | Konami Digital Entertainment | 0.08 | 1.24 | 1.13 | 0.45 | 2.9 |
| 1617 | 1619 | Farming Simulator 2015 | PC | 2014.0 | Simulation | Focus Home Interactive | 0.08 | 1.02 | 0.0 | 0.13 | 1.23 |
| 1926 | 1928 | Pro Evolution Soccer 2008 | X360 | 2007.0 | Sports | Konami Digital Entertainment | 0.08 | 0.9 | 0.04 | 0.05 | 1.07 |
| 2067 | 2069 | Winning Eleven: Pro Evolution Soccer 2007 (All... | X360 | 2006.0 | Sports | Konami Digital Entertainment | 0.08 | 0.9 | 0.02 | 0.0 | 1.0 |
| 2373 | 2375 | Phantasy Star Portable 2 | PSP | 2009.0 | Role-Playing | Sega | 0.08 | 0.11 | 0.62 | 0.06 | 0.88 |
| 2579 | 2581 | The Sims 2: Castaway | PSP | 2007.0 | Simulation | Electronic Arts | 0.08 | 0.46 | 0.0 | 0.25 | 0.8 |
| 3186 | 3188 | SingStar Queen | PS2 | 2009.0 | Misc | Sony Computer Entertainment | 0.08 | 0.12 | 0.0 | 0.44 | 0.63 |
| 3503 | 3505 | Top Spin 3 | PS3 | 2008.0 | Action | Take-Two Interactive | 0.08 | 0.37 | 0.0 | 0.12 | 0.57 |
| 3703 | 3705 | Sonic & All-Stars Racing Transformed | PS3 | 2012.0 | Racing | Sega | 0.08 | 0.33 | 0.01 | 0.11 | 0.54 |
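The cell above lists the first ten games whose NA sales equal the median exactly, rather than games on both sides of it. One way to read "surrounding" literally is to sort by `NA_Sales` and take a window of ten rows centred on the middle position. A sketch using a hypothetical helper, `games_around_median`, and made-up data:

```python
import pandas as pd


def games_around_median(frame: pd.DataFrame, column: str, n: int = 10) -> pd.DataFrame:
    """Hypothetical helper: the n rows centred on the middle of `column` once sorted."""
    # Stable sort so rows with equal values keep their original relative order
    ordered = frame.sort_values(column, kind="stable")
    middle = len(ordered) // 2
    start = max(middle - n // 2, 0)
    return ordered.iloc[start:start + n]


# Hypothetical data: 20 games with NA sales 19, 18, ..., 0 (unsorted on purpose)
demo = pd.DataFrame({
    "Name": [f"Game {i}" for i in range(20)],
    "NA_Sales": list(range(19, -1, -1)),
})

window = games_around_median(demo, "NA_Sales")
print(window)
```

The original index labels are kept, so each row in the window can still be traced back to its position in the full table.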


### B. Assume that games with the same median value are sorted in descending order (by global sales, as in the dataset's Rank order)

In [9]:
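# Keep only the name and NA sales columns (med already holds ten rows)
df_med = med[['Name', 'NA_Sales']]
# Name of the first of these ten median sellers
ten_median_na_seller_names = df_med.iloc[0]['Name']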
df_med

| | Name | NA_Sales |
|---|---|---|
| 446 | Dragon Warrior IV | 0.08 |
| 497 | World Soccer Winning Eleven 7 International | 0.08 |
| 1617 | Farming Simulator 2015 | 0.08 |
| 1926 | Pro Evolution Soccer 2008 | 0.08 |
| 2067 | Winning Eleven: Pro Evolution Soccer 2007 (All... | 0.08 |
| 2373 | Phantasy Star Portable 2 | 0.08 |
| 2579 | The Sims 2: Castaway | 0.08 |
| 3186 | SingStar Queen | 0.08 |
| 3503 | Top Spin 3 | 0.08 |
| 3703 | Sonic & All-Stars Racing Transformed | 0.08 |
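Rather than relying on the file's row order, the tie order can be made explicit by sorting the median-valued games by `Global_Sales` in descending order. A sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical games that all share the same NA_Sales value
tied = pd.DataFrame({
    "Name": ["Game X", "Game Y", "Game Z"],
    "NA_Sales": [0.08, 0.08, 0.08],
    "Global_Sales": [0.54, 3.12, 1.07],
})

# Descending by global sales, so the best seller among the tied games comes first
ordered = tied.sort_values("Global_Sales", ascending=False)
first_name = ordered.iloc[0]["Name"]
print(first_name)  # Game Y
```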


## 6. For the top-selling game of all time, how many standard deviations above/below the mean are its sales for North America?

In [10]:
# The data is sorted by Rank, so the first row is the top-selling game (Wii Sports).
# z-score = (value - mean) / standard deviation
na_z_score = (df['NA_Sales'].head(1) - df['NA_Sales'].mean()) / df['NA_Sales'].std()
na_z_score

0    50.478988
Name: NA_Sales, dtype: float64
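Two details of this z-score, z = (x - mean) / std, are worth making explicit: the top seller can be located with `idxmax()` instead of by row position, and pandas' `Series.std()` uses the sample standard deviation (`ddof=1`) by default. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample, deliberately not sorted by sales
frame = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "NA_Sales": [2.0, 5.0, 1.0, 4.0, 3.0],
    "Global_Sales": [3.0, 9.0, 1.5, 6.0, 4.0],
})

# Pick the top seller by Global_Sales instead of relying on row order
top = frame.loc[frame["Global_Sales"].idxmax()]

na = frame["NA_Sales"]
# Series.std() defaults to ddof=1, the sample standard deviation
z_sample = (top["NA_Sales"] - na.mean()) / na.std()
# ddof=0 gives the population standard deviation instead
z_population = (top["NA_Sales"] - na.mean()) / na.std(ddof=0)
print(top["Name"], round(z_sample, 3), round(z_population, 3))
```

On a column as large as the real one, the two choices of `ddof` differ only negligibly; on tiny samples like this one the gap is visible.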

## 7. The Nintendo Wii seems to have outdone itself with games. How does its average number of sales compare with all of the other platforms?

In [11]:
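# Mean global sales of Wii games
df_wii = df.loc[df['Platform'] == 'Wii', 'Global_Sales'].mean()
# Mean global sales across all platforms (Wii included)
all_Platform = df['Global_Sales'].mean()
# (overall mean, Wii mean)
all_ = (all_Platform, df_wii)
all_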

(0.5374406555006628, 0.6994037735849057)
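The cell above compares the Wii with the overall mean, which still includes the Wii's own games. To compare it with the other platforms, one can compute per-platform means with `groupby`, or the mean over non-Wii rows with a boolean mask. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows, not the real dataset
games = pd.DataFrame({
    "Platform": ["Wii", "Wii", "DS", "DS", "PC"],
    "Global_Sales": [3.0, 1.0, 0.5, 1.5, 1.0],
})

# Mean global sales per platform, highest first
per_platform = (
    games.groupby("Platform")["Global_Sales"].mean().sort_values(ascending=False)
)

# Wii games vs. all non-Wii games
is_wii = games["Platform"] == "Wii"
wii_mean = games.loc[is_wii, "Global_Sales"].mean()
others_mean = games.loc[~is_wii, "Global_Sales"].mean()
print(f"Wii: {wii_mean:.2f}, other platforms: {others_mean:.2f}")
```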

## 8.a. How many sports games are there?

In [12]:
# Number of rows whose Genre is 'Sports'
sports_count = len(df[df['Genre'] == 'Sports'])
sports_count

2346
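`DataFrame.count()` counts non-missing values per column, so a count read from a single column equals the number of matching rows only when that column has no missing values. `len()` or a summed boolean mask counts rows directly. A sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; one sports game has no recorded year
games = pd.DataFrame({
    "Genre": ["Sports", "Action", "Sports", "Sports"],
    "Year": [2006, 2008, np.nan, 2010],
})

is_sports = games["Genre"] == "Sports"

# count() skips NaN, so the per-column results can differ
per_column = games[is_sports].count()
print(per_column)  # Genre -> 3, Year -> 2

# Row counts that do not depend on any particular column
rows_by_len = len(games[is_sports])
rows_by_mask = int(is_sports.sum())
```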

## 8.b. How many games are there for the PC platform?

In [13]:
# Number of rows whose Platform is 'PC'
pc_count = len(df[df['Platform'] == 'PC'])
pc_count

960
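Conditions can be combined with `&` (and) or `|` (or); each comparison needs its own parentheses because these operators bind more tightly than `==`. For example, counting sports games on PC, on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows, not the real dataset
games = pd.DataFrame({
    "Platform": ["PC", "PC", "DS", "PC"],
    "Genre": ["Sports", "Action", "Sports", "Sports"],
})

# Parentheses are required: & and | bind more tightly than ==
pc_sports = games[(games["Platform"] == "PC") & (games["Genre"] == "Sports")]
pc_or_sports = games[(games["Platform"] == "PC") | (games["Genre"] == "Sports")]

print(len(pc_sports), len(pc_or_sports))  # 2 4
```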

## 8.c. What are the count, mean, std, min, and max for global sales?

In [14]:
df['Global_Sales'].describe()

count    16598.000000
mean         0.537441
std          1.555028
min          0.010000
25%          0.060000
50%          0.170000
75%          0.470000
max         82.740000
Name: Global_Sales, dtype: float64
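`describe()` above covers `Global_Sales` only and also reports quartiles. If "all sales" is read as every regional column, `DataFrame.agg` with a list of function names returns exactly the requested statistics for each column. A sketch with hypothetical figures:

```python
import pandas as pd

# Hypothetical regional sales, not the real dataset
sales = pd.DataFrame({
    "NA_Sales": [1.0, 2.0, 3.0],
    "EU_Sales": [0.0, 1.0, 2.0],
})

# One row per statistic, one column per sales column
stats = sales.agg(["count", "mean", "std", "min", "max"])
print(stats)
```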

In [15]:
def test():

    def assert_equal(actual, expected):
        assert actual == expected, f"Expected {expected} but got {actual}"

    # mode() returns a Series, so compare its first (and here only) value
    assert_equal(df_most_common.iloc[0], 'Electronic Arts')
    assert_equal(most_common_platform.iloc[0], 'DS')
    assert_equal(most_common_genre.iloc[0], 'Action')
    assert_equal(top_twenty_highest_grossing_games.iloc[0]['Name'], 'Wii Sports')
    assert_equal(top_twenty_highest_grossing_games.iloc[19]['Name'], 'Brain Age: Train Your Brain in Minutes a Day')
    assert_equal(na_median_sales, 0.08)
    assert_equal(ten_median_na_seller_names, 'Dragon Warrior IV')

    print("Success!!!")

test()

Success!!!
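The test compares `na_median_sales` with `0.08` using `==`, which passes here but is fragile for floats produced by arithmetic. A tolerance-based check built on `math.isclose` is a common alternative; `assert_close` below is a hypothetical helper name:

```python
import math


def assert_close(actual, expected, rel_tol=1e-9, abs_tol=1e-12):
    """Hypothetical helper: assert two floats are equal within a tolerance."""
    assert math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol), (
        f"Expected {expected} but got {actual}"
    )


# 0.1 + 0.2 is not exactly 0.3 in binary floating point ...
print(0.1 + 0.2 == 0.3)  # False
# ... but it is within tolerance
assert_close(0.1 + 0.2, 0.3)
```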
