# Videogame Consoles VS. fighting : which consoles have the best games ?

## Introduction

The aim of this project is to develop KPIs to measure main videogame console performances from 2000 in order to provide new insights over the videogame console game scene & market.

The project relies on the so-called "metacritics" score, an aggregation of all grades given by major press & media outlets over videogame, and how it might differ from the "playerscore", aggregation of grades given by players on the same games.
The main aim of the project has been to colect data over a large panel of consoles, in order to compare the games playable on each console, and then to be able to measure which consoles have the greatest amount of good games, both in terms of Metascore (good critics from the media) and Playerscore (positive critics from the player).

The metrics used will mix Raw Sale figures & Critics scores from the media & players.

## Original Hypothesis

Due to their great audience, positive coverage, and important sales, PlayStation & Nintendo have a positive image among the players, and should perform very well on the various selected criteria.
Is PlayStation the One True Gaming platform ? :-)

## Data sources

The project has developed data research over various sources : public API, web scraping and merging from various available CSVs.

The original idea was to scrap data from [RAWG](https://rawg.io/) , a popular public API centralising informations over 360K games, in order to create an ad hoc datasets over the research consoles.
To measure sales performance of all consoles, the popular website [VGchartz.com](http://www.vgchartz.com/analysis/platform_totals/) provides an updated table over the sales of all major videogame consoles since the 1970s. That table has been scraped and transformed into an exploitable Pandas dataframe.
Finally, in order to get information of Metacritics score, several CSVs providing information over the scores have been selected and merged together.

## 1 - Database creation

#### 1.1 - RAWG API calls

The process to retrieve information from the RAWG API i is rather straightforward : creating Get requests with the needed argument on the console #ID, with several arguments depending on the desired results (sort results, number of page etc.)
The [RAWG API documentation](https://api.rawg.io/docs/) gives information over the type of requests required.

Each API call is limited to 40 results, therefore a for loop of calls, to retrieve 40 answers at each call, was created.
The calls url centralize required information, the platform ID & the decreasing sorting order, in terms of rating
(example of API call url = f'https://api.rawg.io/api/games?platforms=105&page={i}&page_size=40&ordering=-rating)

The data pipeline, which might have been authomatized through a function to call, was nonetheless made manually, due to the risk of API calls crashing, and the various parameters to change at each API calls.

It has been a great exercise to practice my API call skills and improve my knowledge over the RAWG API overall, but the results was limited for my research scope, due to the lack of MetaScore (out of 24K games scraped, I ended with only 3K of Metascore, when I expected at least 10K to make a proper dataset).

Here below an example of API call on the GameCube console

In [3]:
#List of23 256 games over 10 consols = 581 calls of 40 answers each

#platform ID - ID#1 = XboxOne - 'games_count': 3063
#platform ID - ID#7 = Switch - 'games_count': 3092
#platform ID - ID#10 = Wii U - 'games_count': 1272
#platform ID - ID#11 = Wii 'games_count': 2301
#platform ID - ID#80 = Xbox_old - 'games_count': 630
#platform ID - ID#14 = Xbox 360 - 'games_count': 2489
#platform ID - ID#15 = PS2 - games_count': 1737,
#platform ID - ID#16 = PS3 - 'games_count': 3567
#platform ID - ID#18 = PS4 'games_count': 4486
#platform ID - ID#105 = GameCube 'games_count': 619


In [None]:
#ACQUISTION
# Importing data from API
list_of_gamecube = []
offset = 0
#the range changes depending on the number of games available for the console, divided by 40 (the number of answer by call).
# The GameCube having 619 games registered, divided by 40, makes 16 calls to make
for i in range(16):
    url = f'https://api.rawg.io/api/games?platforms=105&page={i}&page_size=40&ordering=-rating'
    limit = 100
    response = requests.get(url)
    p = {
    "limit" : limit,
    "offset" : offset}
    result = response.json()
    list_of_gamecube.append(result)

In [None]:
# Concatenating all GameCube games API calls in one general GameCube dataframe

In [None]:
frame = []
for i in range(1,len(list_of_gamecube)):
    pd_gc_temp = pd.DataFrame(list_of_gamecube[i]['results'])
    frame.append(pd_gc_temp)
pd_gamecube = pd.concat(frame, sort=False)

In [None]:
#Keeping only specific interesting columns for the study

In [None]:
pd_gamecube = pd_gamecube[['id', 'slug', 'name', 'released', 'rating', 'rating_top', 'ratings_count', 'metacritic', 'suggestions_count', 'genres']]

In [None]:
#Keeping only 1 genre per column

In [None]:
pd_gamecube['genres'] = pd_gamecube['genres'].apply(lambda x: x[0]["name"] if x else x)

In [None]:
pd_gamecube.to_csv('pd_gamecube.csv')

#### 1.2 - Game sales scraping

Importing data from VGChartz through pd_read_html method & cleaning the data to only keep 14 selected consoles.
With minor data cleaning, pd_read_html allows to have a,n exploitable dataset to measure sales performance from all major consoles.

This prototype version of the study features only sales figures for consoles in total ; a future enhanced version of this market study should feature more sales KPIs : 
- the evolution of sales figures over time (in order to produce a chronological chart of sales over time). V1 produced, but the sales number are spread out over various websites, with great incoherence in the datasets, and require important data cleaning. A first version of datasets have been developed for this study.
- sales figures for each game, in order to measure, for each game, how its sales performed in comparison to its critics.

In [5]:
url_total_consoles = 'http://www.vgchartz.com/platforms/'

df_total_consoles=pd.read_html(url_total_consoles, header=0)[0]

df_total_consoles.head(50)

Unnamed: 0,Pos,Platform,Hardware,Software,Tie Ratio,Games
0,1,PlayStation 2 (PS2),157.68,1661.95,10.54,3549
1,2,Xbox 360 (X360),85.8,1008.03,11.75,3678
2,3,PlayStation 3 (PS3),87.41,974.81,11.15,3316
3,4,Wii (Wii),101.64,965.78,9.5,2809
4,5,PlayStation (PS),102.5,962.01,9.39,2680
5,6,Nintendo DS (DS),154.9,844.74,5.45,4009
6,7,PlayStation 4 (PS4),107.14,595.77,5.56,1049
7,8,Nintendo Entertainment System (NES),61.91,501.48,8.1,1093
8,9,Game Boy (GB),118.69,501.11,4.22,1608
9,10,Super Nintendo Entertainment System (SNES),49.1,379.06,7.72,1207


In [2]:
df_studied_consoles_total_sales = df_total_consoles[(df_total_consoles['Platform'] == 'PlayStation (PS)') |
                    (df_total_consoles['Platform'] == 'PlayStation 2 (PS2)') | 
                    (df_total_consoles['Platform'] == 'PlayStation 3 (PS3)')| 
                    (df_total_consoles['Platform'] == 'PlayStation 4 (PS4)')|
                    (df_total_consoles['Platform'] == 'PlayStation (PSP)') |
                    (df_total_consoles['Platform'] == 'PlayStation Vita (PSV)') |
                    (df_total_consoles['Platform'] == 'Dreamcast (DC)') |
                    (df_total_consoles['Platform'] == 'Nintendo DS (DS)')|                                          
                    (df_total_consoles['Platform'] == 'Nintendo 64 (N64)')|                                        
                    (df_total_consoles['Platform'] == 'GameCube (GC)')|
                    (df_total_consoles['Platform'] == 'Wii (Wii)') | 
                    (df_total_consoles['Platform'] == 'Wii U (WiiU)') |
                    (df_total_consoles['Platform'] == 'Nintendo Switch (NS)') |
                    (df_total_consoles['Platform'] == 'Xbox (XB)') |
                    (df_total_consoles['Platform'] == 'Xbox 360 (X360)')|
                    (df_total_consoles['Platform'] == 'Xbox One (XOne)')]

In [3]:
df_studied_consoles_total_sales.reset_index(inplace=True)
df_studied_consoles_total_sales.drop(columns='index', inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


In [4]:
df_studied_consoles_total_sales

Unnamed: 0,Pos,Platform,Hardware,Software,Tie Ratio,Games
0,1,PlayStation 2 (PS2),157.68,1661.95,10.54,3549
1,2,Xbox 360 (X360),85.8,1008.03,11.75,3678
2,3,PlayStation 3 (PS3),87.41,974.81,11.15,3316
3,4,Wii (Wii),101.64,965.78,9.5,2809
4,5,PlayStation (PS),102.5,962.01,9.39,2680
5,6,Nintendo DS (DS),154.9,844.74,5.45,4009
6,7,PlayStation 4 (PS4),107.14,595.77,5.56,1049
7,15,Xbox (XB),24.65,271.46,11.01,978
8,16,Xbox One (XOne),46.6,269.06,5.77,625
9,17,Nintendo 64 (N64),32.93,225.16,6.84,395


## 2 - Analysis

In [2]:
# import requests
# import json
# import pandas as pd
# # from pandas.io.json import json_normalize
# import matplotlib.pyplot as plt
# import seaborn as sns
# import numpy as np
# df_metacritics_raw = pd.read_csv(r'D:\IronHack\IronHack_Classes\Week 5\Video_game_project\JV_DataSets\MetaCritics\metacritics_raw.csv')

The study, in part developed over Tableau, illustrates over various KPIs the performance of the 4 major brands (Sony, Microsoft, Nintendo & Sega) from 2000 onwards.

### 2.1 Consoles Sales Figures - PlayStation & Nintendo clear winners ?

First part focuses on the sales figures for all 14 studied consoles, divided by brand.
PlayStation is a clear winner in that domain, due to positive performance for PS2 (all-time best sold console), and PS3 & PS4 (#3 & #4 of the ranking).
Nintendo is close second in terms of sales, due mainly to the extremely high-performance of its nomad DS consol ; Wii ranks #5, but other Nintendo's consoles sold much less than the competition.

Overall, XboX has rather deceiving sales figures, X360 being its best-seller, but ranking at the #7 rank on the overall consoles sales figures.

fig. 1 : number of units sold in millions per consoles

<img src=https://raw.githubusercontent.com/Binardino/Project-Week-5-Your-Own-Project/master/your-project/Charts_PNG/Console_War_BarChart.png>

The second sales figure below calculates the ratio of games sold per console, highlighting that, even if XboX sold less consoles than the competition, XboX consoles have overall positive ratio of games per consoles (almost 12 games in average per Xbox 360 sold), which highlights that players did buy games on the platform.

fig. 2 : ratio of games sold per consoles

<img src=https://raw.githubusercontent.com/Binardino/Project-Week-5-Your-Own-Project/master/your-project/Charts_PNG/Ratio_game_console_BarChart.png>

### 2.2 Consols per Critics Score

### Difference between average MetaScore & PlayerScore

The following scatter plot illustrates difference of average metascore & playerscore by console.
The goal of the scatterplot is to highlight highest differences between those two scores.

Two key learnings :
- Overall, Players are being harsher and more critical than the press over most of consoles.
- No clear winners in that category.


Main consoles :
- Dreamcast has the best MetaScore, but rather smashed by the player critics
- PlayStation is having good reviews in average, with no clear difference between Meta & Playerscores (if not better reviews from players for PS2 & PS Vita)
- Nintendo is the console with, in average, the fewer difference between the Meta & Playerscore


When comparing averange Metascore VS. Player ratings, one may affirm that players are in general harsher than the press on the game quality.
Metascore average per console is oscillating between 6.5 & 8 depending on the console (DreamCast having the higher Metascore, and Nintendo's Wii surprisingly the worst one), whereas playerscore may greatly differ negatively in comparison.

<img src=https://raw.githubusercontent.com/Binardino/Project-Week-5-Your-Own-Project/master/your-project/Charts_PNG/scores_scatter.png>

### 2.3 Top 1000 Games evaluation

The final part of the study focuses on the top 1000 games, in terms of metascore, in order to evaluate which consoles have generated the greatest amount of positively received games. 
When looking at the distribution from 2000 onwards, one may assert that :
- Sony has been rather constant over the years
- Nintendo had its highlights between 2001 & 2005, before a pause in 2006, and a renew during the Wii period during 2006-2011, before decreasing again, and increasing anew from  2017 onwards with the Switch
- Xbox had important peaks at its debut in 2001, and had another positive period during 2007-2010

Fig. 1 : number of games in the top 1000 MetaScore per year per consoles  

<img src=https://raw.githubusercontent.com/Binardino/Project-Week-5-Your-Own-Project/master/your-project/Charts_PNG/Evolution_Number_Records_Year_Brand.png>

When looking at the top 1000 games, in terms of metascore, several genres are more present than other :
- Action adventure is the most recurring game
- Sports, Shooter, Role Playing & Platfrom are the following genres

<img src=https://raw.githubusercontent.com/Binardino/Project-Week-5-Your-Own-Project/master/your-project/Charts_PNG/Main_Game_genre_Top1000.png>

### Brand genre per Brand

When looking at the game genre divided by brand, one may observe that each brand has different strength and genre focus.

- PlayStation is the platform having the most top 1000 games overall.
- Xbox comes second with most games belonging to Sports & Shooter category.
- Nintendo has built its reputation on Action Adevnture & Platform games.

<img src=https://raw.githubusercontent.com/Binardino/Project-Week-5-Your-Own-Project/master/your-project/Charts_PNG/Brand_Main_genre_Top1000.png>

## 3 - Challenges & learnings

I learnt quite a lot in terms of API calls management, data cleaning through Pandas and creation of custom charts with MatplotLib & Seaborn.
My main challenge was to 

Even most of my pre-work (API data & complex scatterplot creations) have not been conserved in my finale version of my work, due to its lack of pertinence and relevance, which made me pivot towards more straightforward presentation elements, I learnt a great deal on the way and 
Due to the constraint of the exercise (3 minute time presentation), I had to focus the creation and the analyze on easy to present insights and charts.

I presented more detailed analysis in this paper, notably with more in-depth charts.