# Elementary Statistical Analysis and Visualization of Video Game Data

### In this notebook I aim to gain a better understanding of visual analysis in Python by cutting my teeth on video game data scraped from the web. First I will scrape data from an online video games database. I will then munge and parse the data for use, and finally visualize several aspects of it using Python code in order to draw a variety of conclusions. While the conclusions themselves are not the overall goal of this project, I hope to come away with at least a few worthwhile ideas about games I may want to put on my "to-play" list.

## Section 0: Resources and Acknowledgements
### I will be consulting a number of websites and textbooks for this project. I will list them below as I use them, both as credit where it's due and to give anyone who may read this a good list of resources should they embark on a similar learning endeavor:

1: As always when I need a git refresher: Pro Git by Scott Chacon and Ben Straub - available at https://git-scm.com/book/en/v2

2: Twitch IGDB API Documentation - available at https://api-docs.igdb.com/#about

  2a: Twitch IGDB API Python Documentation - available at https://github.com/twitchtv/igdb-api-python


## Section 1: Scraping the Data

### I will be using the IGDB.com games database to scrape data for these analyses. The documentation for the IGDB API can be found in Section 0.

In [16]:
# Typically I would list all of my imports in one place, but as this is largely a learning exercise I will import packages in the sections where they become relevant.
import numpy as np 
import pandas as pd 
from igdb.wrapper import IGDBWrapper
import requests
import json


# Read credentials from auth.json in the project directory. This is user info and is therefore kept private; auth.json is listed in .gitignore.
with open('auth.json') as f:
    user_info = json.load(f)
user_info['grant_type'] = 'client_credentials'


# Request an app access token from Twitch and fail loudly on an HTTP error
r = requests.post('https://id.twitch.tv/oauth2/token', params=user_info)
r.raise_for_status()
token_info = r.json()
access_token = token_info['access_token']
expires_in = token_info['expires_in']  # token lifetime in seconds
wrapper = IGDBWrapper(user_info['client_id'], access_token)
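
The cell above stores `expires_in` but does not use it yet. As a minimal sketch of how it could be used, the helpers below turn that value into an expiry time and check whether a token is close to expiring. The names `token_expiry` and `token_is_stale` and the 60-second safety margin are my own assumptions for illustration, not part of the Twitch or IGDB libraries.

```python
from datetime import datetime, timedelta, timezone


def token_expiry(expires_in, issued_at=None):
    """Return the UTC time at which a token issued at `issued_at` expires.

    `expires_in` is the token lifetime in seconds, as returned by the token endpoint.
    """
    issued_at = issued_at or datetime.now(timezone.utc)
    return issued_at + timedelta(seconds=expires_in)


def token_is_stale(expiry, margin_seconds=60, now=None):
    """Return True once `now` is within `margin_seconds` of the expiry time."""
    now = now or datetime.now(timezone.utc)
    return now >= expiry - timedelta(seconds=margin_seconds)
```

In this notebook that might look like `expiry = token_expiry(expires_in)` right after the token request, followed by a `token_is_stale(expiry)` check before a long series of queries.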

In [28]:
# scrape platform data


platforms = wrapper.api_request(
    'platforms',
    'fields *; limit 500;'

)

platforms = platforms.decode("utf-8")
platforms_json = json.loads(platforms)

platforms_df = pd.DataFrame(platforms_json)

print(platforms_df)





      id                alternative_name  category  created_at  \
0    158  Commodore Dynamic Total Vision       6.0  1510012800   
1    339              Kids Computer Pico       1.0  1595808000   
2      8                             PS2       1.0  1297555200   
3     39                             NaN       4.0  1317686400   
4     94                             NaN       6.0  1414195200   
..   ...                             ...       ...         ...   
166   27                             NaN       6.0  1300147200   
167  309                             NaN       5.0  1590710400   
168   59                             NaN       1.0  1372204800   
169  372                             NaN       3.0  1606521600   
170  373                            ZX81       6.0  1607024002   

                 name  platform_logo            slug  updated_at  \
0      Commodore CDTV          292.0  commodore-cdtv  1522972800   
1           Sega Pico            NaN       sega-pico  1595808000   
2  

In [30]:
genres = wrapper.api_request(
    'genres',
    'fields *; limit 100;'
)

genres_json = genres.decode("utf-8")
genres_df = pd.DataFrame(json.loads(genres_json))

print(genres_df, genres_df.shape)

    id  created_at                        name                       slug  \
0    4  1297555200                    Fighting                   fighting   
1    5  1297555200                     Shooter                    shooter   
2    7  1297555200                       Music                      music   
3    8  1297555200                    Platform                   platform   
4    9  1297555200                      Puzzle                     puzzle   
5   10  1297555200                      Racing                     racing   
6   11  1297555200    Real Time Strategy (RTS)     real-time-strategy-rts   
7   12  1297555200          Role-playing (RPG)           role-playing-rpg   
8   13  1297555200                   Simulator                  simulator   
9   14  1297555200                       Sport                      sport   
10  15  1297555200                    Strategy                   strategy   
11  16  1297641600   Turn-based strategy (TBS)    turn-based-strategy-tbs   
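
The `created_at` and `updated_at` columns in the outputs above appear to be Unix timestamps in seconds. That reading is an assumption based on the size of the values, so it is worth checking against the IGDB documentation. A minimal sketch of converting them to readable dates, using a hypothetical helper `unix_to_utc`:

```python
from datetime import datetime, timezone


def unix_to_utc(seconds):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Values copied from the created_at column printed above
for ts in (1297555200, 1297641600):
    print(ts, unix_to_utc(ts).date())
```

On the DataFrame itself, `pd.to_datetime(genres_df['created_at'], unit='s')` should perform the same kind of conversion for the whole column at once.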

In [40]:
# test IGDB query limits

games = wrapper.api_request(
    'games',
    'fields *; limit 500;'
)

games_json = games.decode("utf-8")
games_df = pd.DataFrame(json.loads(games_json))

print(games_df.shape)

(500, 47)
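
The request above returns 500 rows, which matches the `limit` in the query. To collect more than one page, one option is to repeat the request with an increasing `offset`. The sketch below assumes the IGDB query language accepts `offset` alongside `limit`, and that a short pause between requests is enough to stay under the API's rate limit. Both points should be checked against the API documentation in Section 0. The function `fetch_all` and its parameters are hypothetical names of my own.

```python
import json
import time


def fetch_all(api_request, endpoint, fields="*", page_size=500, pause=0.25, max_pages=None):
    """Collect rows from an endpoint by paging through it with limit/offset.

    `api_request` is any callable shaped like `wrapper.api_request(endpoint, query)`
    that returns the JSON response as bytes.
    """
    rows = []
    page = 0
    while max_pages is None or page < max_pages:
        query = f"fields {fields}; limit {page_size}; offset {page * page_size};"
        batch = json.loads(api_request(endpoint, query).decode("utf-8"))
        rows.extend(batch)
        if len(batch) < page_size:  # a short page means nothing is left to fetch
            break
        page += 1
        time.sleep(pause)  # assumed pause to be gentle on the API
    return rows
```

With the wrapper from Section 1 this could be called as `games_df = pd.DataFrame(fetch_all(wrapper.api_request, 'games', max_pages=3))`, where `max_pages` caps the number of requests while experimenting.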
