# Video Game Sales

**Data**: Analyze sales data (in millions of units) for more than 16,500 games.
[Video Game Data](https://www.kaggle.com/gregorut/videogamesales)

**Author**: Nawal Ahmad

**Date**: 10/Aug/2021


In [238]:
import pandas as pd

In [239]:
df = pd.read_csv('./vgsales.csv')
df.head()

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.0 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.0 | 31.37 |


## Which company is the most common video game publisher?


In [240]:
most_common_publisher = df['Publisher'].value_counts().idxmax()
most_common_publisher

'Electronic Arts'
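`value_counts()` counts each distinct label and sorts the counts in descending order. `idxmax()` then returns the label with the largest count. Here is a minimal sketch on a small hypothetical sample of publisher names rather than the real dataset:

```python
import pandas as pd

# Hypothetical toy sample; the notebook itself reads vgsales.csv
publishers = pd.Series(
    ["Electronic Arts", "Nintendo", "Electronic Arts", "Activision", "Electronic Arts", "Nintendo"]
)

counts = publishers.value_counts()  # label -> count, largest count first
most_common = counts.idxmax()       # label that has the largest count

print(counts)
print(most_common)
```

Because `value_counts()` already sorts descending by default, `counts.index[0]` gives the same label.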

## What’s the most common platform?


In [241]:
most_common_platform = df['Platform'].value_counts().idxmax()
most_common_platform

'DS'
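`idxmax()` reports a single label even when two labels share the top count. If ties matter, `Series.mode()` returns every most-frequent value. A sketch on a hypothetical sample in which two platforms are tied:

```python
import pandas as pd

# Hypothetical sample in which DS and PS2 appear equally often
platforms = pd.Series(["DS", "PS2", "DS", "Wii", "PS2"])

# mode() returns all values that share the highest frequency
modes = platforms.mode().tolist()
print(modes)

# idxmax() on the counts picks just one of the tied labels
single = platforms.value_counts().idxmax()
print(single)
```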

## What about the most common genre?


In [242]:
most_common_genre = df['Genre'].value_counts().idxmax()
most_common_genre


'Action'
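The most common genre alone does not show how dominant it is. Passing `normalize=True` to `value_counts()` reports each genre's share of all rows instead of a raw count. This sketch uses a hypothetical sample, and the proportions in the real data will differ:

```python
import pandas as pd

# Hypothetical genre sample; real proportions will differ
genres = pd.Series(
    ["Action", "Sports", "Action", "Misc", "Action", "Sports", "Role-Playing", "Action"]
)

# normalize=True turns counts into fractions of the total (they sum to 1)
shares = genres.value_counts(normalize=True)
print(shares)
print(shares.idxmax(), shares.max())
```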

## What are the 20 best-selling games worldwide (by `Global_Sales`)?


In [243]:
top_twenty_highest_grossing_games = df.sort_values('Global_Sales', ascending=False).head(20)
top_twenty_highest_grossing_games

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.0 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.0 | 31.37 |
| 5 | 6 | Tetris | GB | 1989.0 | Puzzle | Nintendo | 23.2 | 2.26 | 4.22 | 0.58 | 30.26 |
| 6 | 7 | New Super Mario Bros. | DS | 2006.0 | Platform | Nintendo | 11.38 | 9.23 | 6.5 | 2.9 | 30.01 |
| 7 | 8 | Wii Play | Wii | 2006.0 | Misc | Nintendo | 14.03 | 9.2 | 2.93 | 2.85 | 29.02 |
| 8 | 9 | New Super Mario Bros. Wii | Wii | 2009.0 | Platform | Nintendo | 14.59 | 7.06 | 4.7 | 2.26 | 28.62 |
| 9 | 10 | Duck Hunt | NES | 1984.0 | Shooter | Nintendo | 26.93 | 0.63 | 0.28 | 0.47 | 28.31 |
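`DataFrame.nlargest(n, columns)` is a shorter way to say "sort descending and take the first n". Here is a sketch using a few rows copied from the table above:

```python
import pandas as pd

# A few rows copied from the table above (sales in millions of units)
games = pd.DataFrame({
    "Name": ["Tetris", "Wii Sports", "Duck Hunt", "Mario Kart Wii", "Wii Play"],
    "Global_Sales": [30.26, 82.74, 28.31, 35.82, 29.02],
})

# Equivalent to games.sort_values('Global_Sales', ascending=False).head(3)
top_three = games.nlargest(3, "Global_Sales")
print(top_three["Name"].tolist())  # ['Wii Sports', 'Mario Kart Wii', 'Tetris']
```

If values are tied at the cut-off, `nlargest` keeps the first occurrence by default (`keep='first'`).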


## For North American video game sales, what’s the median?


In [244]:
na_median_sales = df['NA_Sales'].median()
na_median_sales

0.08
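When there is an even number of values, `median()` averages the two middle values after sorting. If those two values are equal, the median is exactly that shared value. That is consistent with many games having NA sales of exactly 0.08. A small sketch with hypothetical figures:

```python
import pandas as pd

# Hypothetical NA sales figures (millions of units), deliberately unsorted
na_sales = pd.Series([0.5, 0.08, 0.0, 0.08, 1.2, 0.01])

# Sorted: 0.0, 0.01, 0.08, 0.08, 0.5, 1.2 -> the middle pair is (0.08, 0.08)
median = na_sales.median()
print(median)  # 0.08
```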

### Ten games whose North American sales equal the median:

In [245]:
ten_median_na_seller_names = df[df['NA_Sales'] == na_median_sales].head(10)
ten_median_na_seller_names

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 446 | 447 | Dragon Warrior IV | NES | 1990.0 | Role-Playing | Enix Corporation | 0.08 | 0.0 | 3.03 | 0.01 | 3.12 |
| 497 | 498 | World Soccer Winning Eleven 7 International | PS2 | 2003.0 | Sports | Konami Digital Entertainment | 0.08 | 1.24 | 1.13 | 0.45 | 2.9 |
| 1617 | 1619 | Farming Simulator 2015 | PC | 2014.0 | Simulation | Focus Home Interactive | 0.08 | 1.02 | 0.0 | 0.13 | 1.23 |
| 1926 | 1928 | Pro Evolution Soccer 2008 | X360 | 2007.0 | Sports | Konami Digital Entertainment | 0.08 | 0.9 | 0.04 | 0.05 | 1.07 |
| 2067 | 2069 | Winning Eleven: Pro Evolution Soccer 2007 (All... | X360 | 2006.0 | Sports | Konami Digital Entertainment | 0.08 | 0.9 | 0.02 | 0.0 | 1.0 |
| 2373 | 2375 | Phantasy Star Portable 2 | PSP | 2009.0 | Role-Playing | Sega | 0.08 | 0.11 | 0.62 | 0.06 | 0.88 |
| 2579 | 2581 | The Sims 2: Castaway | PSP | 2007.0 | Simulation | Electronic Arts | 0.08 | 0.46 | 0.0 | 0.25 | 0.8 |
| 3186 | 3188 | SingStar Queen | PS2 | 2009.0 | Misc | Sony Computer Entertainment | 0.08 | 0.12 | 0.0 | 0.44 | 0.63 |
| 3503 | 3505 | Top Spin 3 | PS3 | 2008.0 | Action | Take-Two Interactive | 0.08 | 0.37 | 0.0 | 0.12 | 0.57 |
| 3703 | 3705 | Sonic & All-Stars Racing Transformed | PS3 | 2012.0 | Racing | Sega | 0.08 | 0.33 | 0.01 | 0.11 | 0.54 |
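Filtering for `NA_Sales == na_median_sales` returns games that sit exactly at the median. Another reading of "surrounding the median" is a window of rows centred on the median position after the data is sorted by `NA_Sales`. The helper `rows_around_median` below is hypothetical, and it is shown on toy data:

```python
import pandas as pd

def rows_around_median(frame: pd.DataFrame, column: str, k: int = 10) -> pd.DataFrame:
    """Return k rows centred on the median position of `column` (hypothetical helper)."""
    # mergesort is stable, so rows with tied values keep their original order
    ordered = frame.sort_values(column, kind="mergesort").reset_index(drop=True)
    middle = len(ordered) // 2
    start = max(middle - k // 2, 0)
    return ordered.iloc[start:start + k]

# Hypothetical toy data: 12 games with NA sales 0.00, 0.01, ..., 0.11
toy = pd.DataFrame({
    "Name": [f"Game {i}" for i in range(12)],
    "NA_Sales": [i / 100 for i in range(12)],
})

window = rows_around_median(toy, "NA_Sales", k=4)
print(window["Name"].tolist())  # ['Game 4', 'Game 5', 'Game 6', 'Game 7']
```

On the real data, `rows_around_median(df, 'NA_Sales')` would return ten rows around the middle position. Because so many games share the value 0.08, those rows would probably all be tied at the median anyway.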


### Assume that games with the same median value are sorted in descending order of `Rank`:


In [246]:
median_des = df[df['NA_Sales'] == na_median_sales].sort_values('Rank', ascending=False)
median_des

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 11492 | 11494 | Ultimate Shooting Collection | Wii | 2008.0 | Shooter | Milestone | 0.08 | 0.00 | 0.00 | 0.00 | 0.08 |
| 11455 | 11457 | The Hidden | 3DS | NaN | Adventure | Unknown | 0.08 | 0.00 | 0.00 | 0.00 | 0.08 |
| 11432 | 11434 | DanceDanceRevolution | X360 | 2011.0 | Simulation | Konami Digital Entertainment | 0.08 | 0.00 | 0.00 | 0.01 | 0.08 |
| 11431 | 11433 | Little League World Series Baseball: Double Play | DS | 2010.0 | Sports | Activision | 0.08 | 0.00 | 0.00 | 0.01 | 0.08 |
| 11403 | 11405 | My English Coach: Para Hispanoparlantes | DS | 2009.0 | Misc | Ubisoft | 0.08 | 0.00 | 0.00 | 0.01 | 0.08 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2067 | 2069 | Winning Eleven: Pro Evolution Soccer 2007 (All... | X360 | 2006.0 | Sports | Konami Digital Entertainment | 0.08 | 0.90 | 0.02 | 0.00 | 1.00 |
| 1926 | 1928 | Pro Evolution Soccer 2008 | X360 | 2007.0 | Sports | Konami Digital Entertainment | 0.08 | 0.90 | 0.04 | 0.05 | 1.07 |
| 1617 | 1619 | Farming Simulator 2015 | PC | 2014.0 | Simulation | Focus Home Interactive | 0.08 | 1.02 | 0.00 | 0.13 | 1.23 |
| 497 | 498 | World Soccer Winning Eleven 7 International | PS2 | 2003.0 | Sports | Konami Digital Entertainment | 0.08 | 1.24 | 1.13 | 0.45 | 2.90 |
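The tie-breaking assumption in the heading above can also be written as a single two-key sort: sort by `NA_Sales` first, then break ties by `Rank` in descending order. Here is a sketch on a few rows copied from the median tables above:

```python
import pandas as pd

# Rows copied from the median tables above; all share NA_Sales == 0.08
tied = pd.DataFrame({
    "Rank": [498, 11494, 1928, 3188],
    "Name": [
        "World Soccer Winning Eleven 7 International",
        "Ultimate Shooting Collection",
        "Pro Evolution Soccer 2008",
        "SingStar Queen",
    ],
    "NA_Sales": [0.08, 0.08, 0.08, 0.08],
})

# Primary key NA_Sales ascending, tie-breaker Rank descending
ordered = tied.sort_values(["NA_Sales", "Rank"], ascending=[True, False])
print(ordered["Rank"].tolist())  # [11494, 3188, 1928, 498]
```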


## For the top-selling game of all time, how many standard deviations above/below the mean are its sales for North America?


In [247]:
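standard_deviations = df['NA_Sales'].std()  # sample standard deviation (ddof=1)
mean = df['NA_Sales'].mean()
top_seller_na_sales = df.loc[df['Rank'] == 1, 'NA_Sales'].iloc[0]  # Rank 1 is Wii Sports
na_sales_std_devs_from_mean = (top_seller_na_sales - mean) / standard_deviations  # z-score
standard_deviations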

0.8166830292988796
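To turn the mean and standard deviation into the number the question asks for, subtract the mean from the top seller's North American sales and divide by the standard deviation: z = (x - mean) / std. The helper `std_devs_from_mean` below is hypothetical and is shown on toy values. Note that pandas `Series.std()` computes the sample standard deviation (`ddof=1`).

```python
import pandas as pd

def std_devs_from_mean(values: pd.Series, x: float) -> float:
    """How many sample standard deviations x lies above (+) or below (-) the mean."""
    return (x - values.mean()) / values.std()  # Series.std() uses ddof=1

# Hypothetical figures; in the notebook you would pass df['NA_Sales'] instead
sales = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
result = std_devs_from_mean(sales, 5.0)
print(result)
```

In the notebook, `std_devs_from_mean(df['NA_Sales'], df.loc[df['Rank'] == 1, 'NA_Sales'].iloc[0])` would give the answer for Wii Sports.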

## The Nintendo Wii seems to have outdone itself with games. How does its average number of sales compare with all of the other platforms?
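One way to answer this question is sketched below. It assumes that "average number of sales" means the mean `Global_Sales` per game. The sketch compares the Wii's mean with the pooled mean of every other platform, and also lists a per-platform mean with `groupby`. The toy rows are copied from the tables above, so the numbers are not representative of the full dataset:

```python
import pandas as pd

# A few rows copied from the tables above (Global_Sales in millions of units)
games = pd.DataFrame({
    "Platform": ["Wii", "NES", "Wii", "GB", "DS", "Wii"],
    "Global_Sales": [82.74, 40.24, 35.82, 31.37, 30.01, 0.08],
})

is_wii = games["Platform"] == "Wii"
wii_mean = games.loc[is_wii, "Global_Sales"].mean()       # mean over Wii games
others_mean = games.loc[~is_wii, "Global_Sales"].mean()   # pooled mean over all other games

# Per-platform means, highest first, to see where the Wii ranks
per_platform = games.groupby("Platform")["Global_Sales"].mean().sort_values(ascending=False)

print(round(wii_mean, 2), round(others_mean, 2))
print(per_platform)
```

The pooled comparison answers "Wii versus everyone else". The `groupby` view shows whether the Wii beats each other platform individually, which is a different question.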


## Tests:

In [252]:
def test():
    def assert_equal(actual, expected):
        assert actual == expected, f"Expected {expected} but got {actual}"

    assert_equal(most_common_publisher, "Electronic Arts")
    assert_equal(most_common_platform, "DS")
    assert_equal(most_common_genre, "Action")
    assert_equal(top_twenty_highest_grossing_games.iloc[0].Name, 'Wii Sports')
    assert_equal(top_twenty_highest_grossing_games.iloc[19].Name, 'Brain Age: Train Your Brain in Minutes a Day')
    assert_equal(na_median_sales, 0.08)
    assert_equal(ten_median_na_seller_names.iloc[6].Name, 'The Sims 2: Castaway')
    print("Success!!!")

test()

Success!!!
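`assert_equal` uses exact `==`. That is correct for the string labels, and it worked for the median here, but exact equality on floats is fragile in general. Below is a sketch of a tolerance-based variant that uses `math.isclose`. The name `assert_close` is hypothetical:

```python
import math

def assert_close(actual: float, expected: float, rel_tol: float = 1e-9) -> None:
    """Fail with a readable message unless actual is within rel_tol of expected."""
    assert math.isclose(actual, expected, rel_tol=rel_tol), f"Expected {expected} but got {actual}"

# 0.1 + 0.2 is not exactly 0.3 in binary floating point, but the two are close
assert_close(0.1 + 0.2, 0.3)
print("close enough")
```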
