# Assignment 2 - Make a Python Notebook to Analyze CORGIS Data

### Author 
Mark Andal

### Purpose
The purpose of this assignment is to practice using Python to parse, process, and analyze data of our choosing from the CORGIS database.

I chose to find:
1. The game with the highest average playtime across all playstyles in each year
2. The highest-reviewed games (review score >= 90) on each console

### Data Used
[Video Games Data](https://corgis-edu.github.io/corgis/python/video_games/)

Description (taken from the CORGIS website):
> Originally collected by Dr. Joe Cox, this dataset has information about the sales and playtime of over a thousand video games released between 2004 and 2010. The playtime information was collected from crowd-sourced data on “How Long to Beat”.



## Step 1 - Load the data

According to the CORGIS usage documentation, the lines below load the data, which is returned as a list of dictionaries, one per video game.

```python
import video_games

# note: despite its name, video_game_dict is a list of dictionaries (one per game)
video_game_dict = video_games.get_video_game()
```

## Step 2 - Import pprint to print data in a cleaner format
Printing one example record shows the keys that each game's data structure contains.

```python
from pprint import pprint as prettyprint

# print the first game's data structure in the list returned by CORGIS
prettyprint(video_game_dict[0])
```

```
{'Features': {'Handheld?': True,
              'Max Players': 1,
              'Multiplatform?': True,
              'Online?': True},
 'Length': {'All PlayStyles': {'Average': 22.716666666666665,
                               'Leisure': 31.9,
                               'Median': 24.483333333333334,
                               'Polled': 57,
                               'Rushed': 14.3},
            'Completionists': {'Average': 29.766666666666666,
                               'Leisure': 35.03333333333333,
                               'Median': 30.0,
                               'Polled': 20,
                               'Rushed': 22.016666666666666},
            'Main + Extras': {'Average': 24.916666666666668,
                              'Leisure': 29.966666666666665,
                              'Median': 25.0,
                              'Polled': 16,
                              'Rushed': 18.333333333333332},
            'Main Story': {'Average': 14.3333333333
```
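Because each record is a nested dictionary, it can help to list every key path before writing code against it. The sketch below walks a record recursively and yields each path to a leaf value. The helper `key_paths` and the `sample_record` are hypothetical: the sample is a trimmed, hand-copied subset of the fields printed above, used only so the example runs without the `video_games` module.

```python
# Hypothetical sample record: a trimmed copy of fields printed above,
# used only so this sketch runs without the CORGIS module.
sample_record = {
    'Features': {'Handheld?': True, 'Max Players': 1},
    'Length': {'All PlayStyles': {'Average': 22.716666666666665, 'Polled': 57}},
}


def key_paths(record, prefix=()):
    """Yield every key path in a nested dictionary as a tuple of keys."""
    for key, value in record.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # descend into nested dictionaries
            yield from key_paths(value, path)
        else:
            # leaf value: report the full path that leads to it
            yield path


for path in key_paths(sample_record):
    print(' -> '.join(path))
```

On the real data, calling `key_paths(video_game_dict[0])` would list paths such as `Length -> All PlayStyles -> Average`, which is the path used in the next step.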

## Step 3 - Create a function to find the game with the maximum average playtime across all playstyles
`find_max_avg_all_playstyles`

Input: 
1. `video_game_data` - a list of game dictionaries

Outputs: 
1. `max_avg_title` - a string noting the title of the game
2. `max_avg` - a float noting the average play length


```python
def find_max_avg_all_playstyles(video_game_data):
    # build a dictionary with title as key and average length as value
    # note: games that share a title (e.g. multiplatform releases) overwrite each other here
    title_avg_dict = {}

    # loop through each video game's data structure
    for game in video_game_data:
        avg_length = game['Length']['All PlayStyles']['Average']
        title = game['Title']

        # fill dictionary with title and average length of each video game
        title_avg_dict[title] = avg_length

    # find the key of the max value (average length)
    max_avg_title = max(title_avg_dict, key=title_avg_dict.get)
    # find the max value
    max_avg = title_avg_dict[max_avg_title]
    return max_avg_title, max_avg

# print results
title, max_avg = find_max_avg_all_playstyles(video_game_dict)
print(title)
print(max_avg)
```

```
Monster Hunter Freedom
279.73333333333335
```
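Since `title_avg_dict` is keyed by title, a game released on several consoles under the same title keeps only the average from the last record seen. A minimal alternative sketch applies `max` with a `key` function directly to the list of records, so every record is compared on its own. The function name `find_max_avg_direct` and the three records are hypothetical stand-ins shaped like the CORGIS entries.

```python
def find_max_avg_direct(video_game_data):
    """Return (title, average) for the record with the largest
    'All PlayStyles' average, comparing every record individually."""
    best = max(video_game_data,
               key=lambda game: game['Length']['All PlayStyles']['Average'])
    return best['Title'], best['Length']['All PlayStyles']['Average']


# Hypothetical records (not real data): the same title appears on two consoles.
games = [
    {'Title': 'Game A', 'Length': {'All PlayStyles': {'Average': 40.0}}},
    {'Title': 'Game B', 'Length': {'All PlayStyles': {'Average': 30.0}}},
    {'Title': 'Game A', 'Length': {'All PlayStyles': {'Average': 10.0}}},
]

print(find_max_avg_direct(games))  # ('Game A', 40.0)
```

With the title-keyed dictionary, `'Game A'` would be stored as `10.0` and `'Game B'` would be reported instead. Like the dictionary version, `max` returns the first record when two averages tie.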


## Step 4 - Create a function that groups the games by release year
`separate_by_year`

Input:
1. `video_game_data` - a list of game dictionaries

Output:
1. `year_dictionary` - a dictionary with each release year as a key and the list of games released that year as its value

```python
# separate the list of video games by release year
def separate_by_year(video_game_data):
    year_dictionary = {}

    # loop through each video game's data structure
    for game in video_game_data:
        release_year = game['Release']['Year']

        # if key (year) does not exist already
        if release_year not in year_dictionary:
            # start a new list for the year containing this game's data structure
            year_dictionary[release_year] = [game]
        else:
            # if it does exist, append the game data structure to the list
            year_dictionary[release_year].append(game)

    return year_dictionary

year_dictionary = separate_by_year(video_game_dict)
```
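The "create the list if the key is missing, otherwise append" pattern above is what `collections.defaultdict(list)` handles automatically. A minimal sketch, assuming the same `game['Release']['Year']` key; the function name and the records are hypothetical.

```python
from collections import defaultdict


def separate_by_year_defaultdict(video_game_data):
    """Group game records into {release year: [records]}."""
    # defaultdict(list) creates an empty list the first time a year is seen
    year_dictionary = defaultdict(list)
    for game in video_game_data:
        year_dictionary[game['Release']['Year']].append(game)
    # convert back to a plain dict so it prints and behaves like the original
    return dict(year_dictionary)


# Hypothetical records with only the keys this function reads.
games = [
    {'Title': 'Game A', 'Release': {'Year': 2005}},
    {'Title': 'Game B', 'Release': {'Year': 2006}},
    {'Title': 'Game C', 'Release': {'Year': 2005}},
]

grouped = separate_by_year_defaultdict(games)
print({year: [g['Title'] for g in group] for year, group in grouped.items()})
# {2005: ['Game A', 'Game C'], 2006: ['Game B']}
```

Both versions keep games in their original order within each year, and years appear in the order they are first encountered.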


## Step 5 - Combine the two functions to find the game with the maximum average playtime per year
`find_max_per_year`

Input:
1. `separate_year_dict` - the video game data grouped by year (the output of `separate_by_year`)

Output:
1. `year_title_avg_dict` - a dictionary with the year as key and a `[title, max_average]` list as value

```python
def find_max_per_year(separate_year_dict):
    year_title_avg_dict = {}

    # loop through each year in the dictionary passed in as the argument
    for year in separate_year_dict:
        # call the find_max_avg_all_playstyles function on that year's list of games
        title, max_avg = find_max_avg_all_playstyles(separate_year_dict[year])

        # add the returned title and max avg in a list as the value to the year key
        year_title_avg_dict[year] = [title, max_avg]

    return year_title_avg_dict

# print results
year_title_avg_dict = find_max_per_year(year_dictionary)
print(year_title_avg_dict)
```

```
{2004: ['Metal Gear Ac!d', 25.383333333333333], 2005: ['Animal Crossing: Wild World', 168.96666666666667], 2006: ['Monster Hunter Freedom', 279.73333333333335], 2007: ['Monster Hunter Freedom 2', 136.01666666666668], 2008: ['Animal Crossing: City Folk', 191.25]}
```
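Combining `separate_by_year` and `find_max_avg_all_playstyles` builds an intermediate dictionary of every game per year. The same per-year result can also be computed in a single pass by remembering only the best game seen so far for each year. This is a sketch of that alternative technique, not the approach used above; the function name and records are hypothetical.

```python
def find_max_per_year_one_pass(video_game_data):
    """Return {year: [title, max_average]} in a single pass over the records."""
    best = {}
    for game in video_game_data:
        year = game['Release']['Year']
        avg = game['Length']['All PlayStyles']['Average']
        # keep this game if it is the first one seen for the year,
        # or if its average beats the current best for that year
        if year not in best or avg > best[year][1]:
            best[year] = [game['Title'], avg]
    return best


# Hypothetical records with only the keys this function reads.
games = [
    {'Title': 'Game A', 'Release': {'Year': 2005},
     'Length': {'All PlayStyles': {'Average': 12.5}}},
    {'Title': 'Game B', 'Release': {'Year': 2005},
     'Length': {'All PlayStyles': {'Average': 30.0}}},
    {'Title': 'Game C', 'Release': {'Year': 2006},
     'Length': {'All PlayStyles': {'Average': 8.0}}},
]

print(find_max_per_year_one_pass(games))
# {2005: ['Game B', 30.0], 2006: ['Game C', 8.0]}
```

The strict `>` comparison keeps the first game when two averages tie, matching how `max` breaks ties.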


## Step 6 - Create a function to collect all games with a review score of at least 90 for each console
`get_90s_per_console`

Input:
1. `video_game_data` - a list of game dictionaries

Output:
1. `console_dictionary` - a dictionary with the console as key and a list of `[title, score]` pairs for games with a review score >= 90 as the value

```python
def get_90s_per_console(video_game_data):
    console_dictionary = {}

    # loop through each game data structure
    for game in video_game_data:
        score = game['Metrics']['Review Score']
        console = game['Release']['Console']
        title = game['Title']

        # if the review score >= 90
        if score >= 90:
            # if key (console) does not exist already
            if console not in console_dictionary:
                # start a new list containing this [title, score] pair
                console_dictionary[console] = [[title, score]]
            else:
                # if it does exist, append the [title, score] pair to the console's list
                console_dictionary[console].append([title, score])
    return console_dictionary

# print results
console_dictionary = get_90s_per_console(video_game_dict)
print(console_dictionary)
```

```
{'Nintendo DS': [['Mario Kart DS', 91], ['Advance Wars: Dual Strike', 90], ['The Legend of Zelda: Phantom Hourglass', 90], ['Chrono Trigger', 92]], 'X360': [['Gears of War', 94], ['The Elder Scrolls IV: Oblivion', 94], ["Tom Clancy's Ghost Recon: Advanced Warfighter", 90], ['Halo 3', 94], ['Call of Duty 4: Modern Warfare', 94], ['Forza Motorsport 2', 90], ['Guitar Hero II', 92], ['Rock Band', 92], ['Mass Effect', 91], ['BioShock', 96], ['The Orange Box', 96], ['Grand Theft Auto IV', 98], ['Gears of War 2', 93], ['Fallout 3', 93], ['Rock Band 2', 92]], 'Nintendo Wii': [['The Legend of Zelda: Twilight Princess', 95], ['Super Mario Galaxy', 97], ['Resident Evil 4', 91], ['Metroid Prime 3: Corruption', 90], ['Super Smash Bros.: Brawl', 93], ['Rock Band', 92], ['?kami', 90]], 'PlayStation 3': [['Call of Duty 4: Modern Warfare', 94], ['The Elder Scrolls IV: Oblivion', 93], ['Rock Band', 92], ['Grand Theft Auto IV', 98], ['Metal Gear Solid 4: Guns of the Patriots', 94], ['LittleBigPlanet', 95], ['Fal
```
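A possible variant of `get_90s_per_console` takes the score cutoff as a parameter and sorts each console's games from highest to lowest score, which makes the "highest-reviewed" games easier to read off. This is a sketch under stated assumptions: the name `get_high_scores_per_console`, the `threshold` parameter, and the records below are hypothetical, and it assumes every record has a numeric `'Review Score'`.

```python
from collections import defaultdict


def get_high_scores_per_console(video_game_data, threshold=90):
    """Return {console: [[title, score], ...]} for games scoring >= threshold,
    with each console's list sorted from highest to lowest score."""
    console_dictionary = defaultdict(list)
    for game in video_game_data:
        score = game['Metrics']['Review Score']
        if score >= threshold:
            console = game['Release']['Console']
            console_dictionary[console].append([game['Title'], score])
    # sort each console's games by score, highest first
    # (sorted() stays stable with reverse=True, so ties keep their original order)
    return {console: sorted(pairs, key=lambda pair: pair[1], reverse=True)
            for console, pairs in console_dictionary.items()}


# Hypothetical records with only the keys this function reads.
games = [
    {'Title': 'Game A', 'Metrics': {'Review Score': 91}, 'Release': {'Console': 'Nintendo DS'}},
    {'Title': 'Game B', 'Metrics': {'Review Score': 85}, 'Release': {'Console': 'Nintendo DS'}},
    {'Title': 'Game C', 'Metrics': {'Review Score': 96}, 'Release': {'Console': 'Nintendo DS'}},
    {'Title': 'Game D', 'Metrics': {'Review Score': 90}, 'Release': {'Console': 'X360'}},
]

print(get_high_scores_per_console(games))
# {'Nintendo DS': [['Game C', 96], ['Game A', 91]], 'X360': [['Game D', 90]]}
```

Passing a different cutoff, e.g. `get_high_scores_per_console(games, threshold=95)`, keeps only `'Game C'` in this sample.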