# Data Analysis with Pandas
## Video Game Sales
### Logan Jones | Jan 19, 2021


In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('vgsales.csv')
df.head()

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


### 1. Which company is the most common video game publisher?

In [3]:
most_common_publisher = df['Publisher'].mode().item()
most_common_publisher

'Electronic Arts'
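A caveat on the `mode().item()` pattern: `mode()` returns every tied value, so `.item()` raises a `ValueError` when two values share the top count. The sketch below uses a small made-up Series (not data from `vgsales.csv`) to show the tie case and one possible tie-break. The variable names are illustrative only.

```python
import pandas as pd

# Toy data (hypothetical) with a tie between "A" and "B".
publishers = pd.Series(["A", "B", "A", "B", "C"])

# mode() returns all tied values in sorted order; here that is two rows,
# so calling .item() on it would raise ValueError.
modes = publishers.mode()

# value_counts() exposes the counts behind the mode.
counts = publishers.value_counts()

# Taking the first mode is one explicit (if arbitrary) way to break the tie.
first_mode = modes.iloc[0]

print(modes.tolist(), first_mode)
```

On the real data the mode is unique, so `.item()` works as written in the cell above.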

### 2. What’s the most common platform?

In [4]:
most_common_platform = df['Platform'].mode().item()
most_common_platform

'DS'

### 3. What about the most common genre?

In [5]:
most_common_genre = df['Genre'].mode().item()
most_common_genre

'Action'

### 4. What are the top 20 highest grossing games?

In [6]:

top_twenty_highest_grossing_games = df.sort_values(by="Global_Sales", ascending=False).head(20)
top_twenty_highest_grossing_games

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37
5,6,Tetris,GB,1989.0,Puzzle,Nintendo,23.2,2.26,4.22,0.58,30.26
6,7,New Super Mario Bros.,DS,2006.0,Platform,Nintendo,11.38,9.23,6.5,2.9,30.01
7,8,Wii Play,Wii,2006.0,Misc,Nintendo,14.03,9.2,2.93,2.85,29.02
8,9,New Super Mario Bros. Wii,Wii,2009.0,Platform,Nintendo,14.59,7.06,4.7,2.26,28.62
9,10,Duck Hunt,NES,1984.0,Shooter,Nintendo,26.93,0.63,0.28,0.47,28.31
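As an aside, `DataFrame.nlargest` is a common alternative to sorting and then calling `head`. The sketch below compares the two on a small made-up frame (the names and sales figures are hypothetical, and there are no ties in the sales column).

```python
import pandas as pd

# Toy data (hypothetical values, not taken from vgsales.csv).
toy = pd.DataFrame({
    "Name": ["Alpha", "Bravo", "Charlie", "Delta"],
    "Global_Sales": [1.5, 9.0, 4.2, 7.3],
})

# Sort-then-head, the same approach as the cell above.
by_sort = toy.sort_values(by="Global_Sales", ascending=False).head(2)

# nlargest picks the top rows without sorting the whole frame.
by_nlargest = toy.nlargest(2, "Global_Sales")

print(by_nlargest["Name"].tolist())
```

Both approaches return the same rows here. When the sales column contains ties, the order of tied rows may differ between them.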


### 5. For North American video game sales, what’s the median?

In [7]:
na_median_sales = df['NA_Sales'].median()
na_median_sales

0.08

#### 5a. Provide a secondary output showing ten games surrounding the median sales output.  
> Assume that games with the same median value are sorted in descending order.

In [8]:
na_median_games = df[df['NA_Sales'] == na_median_sales]
# Midpoint of the rows whose NA sales equal the median.
na_median_mid = round(len(na_median_games) / 2)
# Take five rows on either side of that midpoint.
ten_median_na_sellers = na_median_games.iloc[na_median_mid - 5:na_median_mid + 5]
ten_median_na_seller_names = list(ten_median_na_sellers.Name.values)
ten_median_na_sellers


Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
9934,9936,Turok: Evolution,GBA,2002.0,Shooter,Acclaim Entertainment,0.08,0.03,0.0,0.0,0.12
9957,9959,Deadpool,XOne,2015.0,Action,Activision,0.08,0.03,0.0,0.01,0.12
9996,9998,GT Advance 2: Rally Racing,GBA,2001.0,Racing,THQ,0.08,0.03,0.0,0.0,0.11
10000,10002,A Witch's Tale,DS,2009.0,Role-Playing,Nippon Ichi Software,0.08,0.0,0.03,0.01,0.11
10012,10014,Nickelodeon Dance,X360,2011.0,Misc,Take-Two Interactive,0.08,0.02,0.0,0.01,0.11
10020,10022,Phantasy Star Collection,GBA,2002.0,Role-Playing,Atari,0.08,0.03,0.0,0.0,0.11
10022,10024,LEGO Knights' Kingdom,GBA,2004.0,Action,THQ,0.08,0.03,0.0,0.0,0.11
10024,10026,Family Game Night 4: The Game Show,PS3,2011.0,Misc,Electronic Arts,0.08,0.02,0.0,0.01,0.11
10026,10028,NBA Jam 2002,GBA,2002.0,Sports,Acclaim Entertainment,0.08,0.03,0.0,0.0,0.11
10029,10031,Tony Hawk's Pro Skater 5,XOne,2015.0,Sports,Activision,0.08,0.02,0.0,0.01,0.11
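The slicing logic above can be wrapped in a small helper. The sketch below is a hypothetical `rows_around_median` function, not part of the original notebook. It assumes that at least one row equals the median (which fails when the median is the average of two different middle values) and uses integer division instead of `round`. The data is made up.

```python
import pandas as pd

def rows_around_median(frame, column, window=10):
    """Return up to `window` rows centred on the middle of the rows equal to the median."""
    median = frame[column].median()
    at_median = frame[frame[column] == median]
    mid = len(at_median) // 2
    # Clamp at zero so a small group does not produce a negative start index.
    start = max(mid - window // 2, 0)
    return at_median.iloc[start:start + window]

# Toy data (hypothetical), already sorted in descending order of sales.
toy = pd.DataFrame({
    "Name": [f"game{i}" for i in range(9)],
    "NA_Sales": [0.5, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1, 0.0, 0.0],
})

middle_two = rows_around_median(toy, "NA_Sales", window=2)
print(middle_two["Name"].tolist())
```

With nine rows the median is the fifth-smallest value, 0.1, which five rows share. The helper returns the two in the middle of that group.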


### 6. For the top-selling game of all time, how many standard deviations above/below the mean are its sales for North America?

In [9]:
# Row 0 is the top-selling game because vgsales.csv is ordered by Rank.
top_game_na_sales = df.iloc[0].NA_Sales
top_to_mean = top_game_na_sales - df["NA_Sales"].mean()
std_away = top_to_mean / df["NA_Sales"].std()
std_away

50.47898767479108
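The calculation above is a z-score: the distance from the mean divided by the standard deviation. Note that pandas' `Series.std()` defaults to the sample standard deviation (`ddof=1`), so a population z-score (`ddof=0`) would come out slightly larger. The sketch below shows both on a small made-up Series. The numbers are hypothetical.

```python
import pandas as pd

# Toy sales figures (hypothetical); the last value is the outlier.
sales = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])

mean = sales.mean()                  # 4.0
sample_std = sales.std()             # ddof=1, the pandas default
population_std = sales.std(ddof=0)   # divides by n instead of n - 1

top = sales.iloc[-1]
z_sample = (top - mean) / sample_std          # about 1.70
z_population = (top - mean) / population_std  # about 1.90

print(round(z_sample, 2), round(z_population, 2))
```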

### 7. The Nintendo Wii seems to have outdone itself with games. How do its average global sales per game compare with those of the other platforms?

In [10]:
# Mean global sales per platform; rank 1.0 is the highest mean.
platform_mean_sales = df.groupby("Platform")["Global_Sales"].mean()
wii_rank = platform_mean_sales.rank(ascending=False)["Wii"]
wii_rank

9.0
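To make the group-then-rank step easier to follow, the sketch below runs it on a small made-up frame and puts the means and ranks side by side. The platforms are real names but the sales figures are hypothetical, and `summary` is an illustrative variable rather than something from the notebook.

```python
import pandas as pd

# Toy data (hypothetical sales figures).
toy = pd.DataFrame({
    "Platform": ["Wii", "Wii", "GB", "GB", "DS", "DS"],
    "Global_Sales": [10.0, 2.0, 9.0, 7.0, 1.0, 3.0],
})

# Mean sales per platform: Wii 6.0, GB 8.0, DS 2.0.
mean_sales = toy.groupby("Platform")["Global_Sales"].mean()

# rank() works on the values, so sorting first does not change the result.
ranks = mean_sales.rank(ascending=False)

# Side-by-side view, best platform first.
summary = pd.DataFrame({"mean_sales": mean_sales, "rank": ranks}).sort_values("rank")
print(summary)
```

A rank of 1.0 marks the platform with the highest mean, so a rank of 9.0 on the real data places the Wii ninth among all platforms by mean global sales.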

### 8. Come up with 3 more questions that can be answered with this data set.

#### 8a. What is the 10th best selling video game?

In [11]:
tenth_ranked = df.loc[df["Rank"] == 10]["Name"].item()
tenth_ranked

'Duck Hunt'

#### 8b. What year was World of Warcraft released?

In [12]:
wow_release = df.loc[df["Name"] == "World of Warcraft"]["Year"].item()
wow_release

2004.0

#### 8c. How many games has Activision released?

In [13]:
activision_count = (df.Publisher == "Activision").sum()
activision_count

975

## Tests

In [14]:
def test():

    def assert_equal(actual, expected):
        assert actual == expected, f"Expected {expected} but got {actual}"

    assert_equal(most_common_publisher, 'Electronic Arts')
    assert_equal(most_common_platform, 'DS')
    assert_equal(most_common_genre, 'Action')
    assert_equal(top_twenty_highest_grossing_games.iloc[0].Name, 'Wii Sports')
    assert_equal(top_twenty_highest_grossing_games.iloc[19].Name, 'Brain Age: Train Your Brain in Minutes a Day')
    assert_equal(na_median_sales, 0.08)
    assert_equal(ten_median_na_seller_names, [
        'Turok: Evolution', 'Deadpool', 'GT Advance 2: Rally Racing',
        "A Witch's Tale", 'Nickelodeon Dance', 'Phantasy Star Collection',
        "LEGO Knights' Kingdom", 'Family Game Night 4: The Game Show',
        'NBA Jam 2002', "Tony Hawk's Pro Skater 5",
    ])
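    # Compare the float with a small tolerance rather than exact equality.
    assert abs(std_away - 50.47898767479108) < 1e-9, f"Expected 50.47898767479108 but got {std_away}"
    assert_equal(wii_rank, 9)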

    print("Success!!!")

test()

Success!!!
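Exact `==` checks on floating-point results can fail after harmless changes, such as a different pandas or NumPy version. The sketch below shows a hypothetical `assert_close` helper built on the standard library's `math.isclose`. It is not part of the notebook's test function, and the default tolerance is an assumption.

```python
import math

def assert_close(actual, expected, rel_tol=1e-9):
    """Assert that two floats agree within a relative tolerance."""
    assert math.isclose(actual, expected, rel_tol=rel_tol), (
        f"Expected {expected} but got {actual}"
    )

# 0.1 + 0.2 is not exactly 0.3 in binary floating point,
# but it is well within the tolerance.
exactly_equal = (0.1 + 0.2 == 0.3)
assert_close(0.1 + 0.2, 0.3)
print(exactly_equal)
```

A check such as `assert_close(std_away, 50.47898767479108)` would then tolerate differences in the last few digits.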
