# Video Games Review: Analysis
In this project, I explore [Metacritic's video game reviews](https://www.metacritic.com/browse/games/score/metascore/all/all/filtered?page=0) for games released between 1995 and 2021.

Please see the [README](https://github.com/henrylin03/video-games) for more information.


## Setup
In this section, I import the necessary libraries, set up an in-memory SQLite database, and load the input `.csv` files so that they can be manipulated with both `sqlite` and `pandas`:

In [10]:
# import necessary libraries
import os
import pandas as pd
from sqlalchemy import create_engine

# create an in-memory SQLite database ("sqlite://" with no file path)
engine = create_engine("sqlite://", echo=False)

# import CSV as DataFrames
INPUT_PATH = "./input"
meta_df = pd.read_csv(os.path.join(INPUT_PATH, "meta.csv"))
user_df = pd.read_csv(os.path.join(INPUT_PATH, "user.csv"))


In [12]:
meta_df.to_sql("meta", engine, if_exists="replace", index=False)
pd.read_sql_query("SELECT * FROM meta LIMIT 5", engine)


| | meta_score | meta_rank | name | platform | release_date | summary |
|---|---|---|---|---|---|---|
| 0 | 99 | 1.0 | The Legend of Zelda: Ocarina of Time | Platform: Nintendo 64 | November 23, 1998 | As a young boy, Link is tricked by Ganondorf, ... |
| 1 | 98 | 2.0 | Tony Hawk's Pro Skater 2 | Platform: PlayStation | September 20, 2000 | As most major publishers' development efforts ... |
| 2 | 98 | 3.0 | Grand Theft Auto IV | Platform: PlayStation 3 | April 29, 2008 | [Metacritic's 2008 PS3 Game of the Year; Also ... |
| 3 | 98 | 4.0 | SoulCalibur | Platform: Dreamcast | September 8, 1999 | This is a tale of souls and swords, transcendi... |
| 4 | 98 | 5.0 | Grand Theft Auto IV | Platform: Xbox 360 | April 29, 2008 | [Metacritic's 2008 Xbox 360 Game of the Year; ... |


In [11]:
user_df.to_sql("user", engine, if_exists="replace", index=False)
pd.read_sql_query("SELECT * FROM user LIMIT 5", engine)


| | user_score | user_rank | name | platform | release_date | summary |
|---|---|---|---|---|---|---|
| 0 | 9.7 | 1.0 | Ghost Trick: Phantom Detective | Platform: DS | January 11, 2011 | Ghost Trick is a story of mystery and intrigue... |
| 1 | 9.7 | 2.0 | Z.H.P. Unlosing Ranger vs Darkdeath Evilman | Platform: PSP | October 25, 2010 | Known as ZettaiHero Keikakuin Japan, Z.H.P. is... |
| 2 | 9.6 | 3.0 | Superliminal | Platform: Xbox One | July 7, 2020 | Perception is reality. In this mind-bending fi... |
| 3 | 9.6 | 4.0 | Superliminal | Platform: Switch | July 7, 2020 | Perception is reality. Escape from a mind-bend... |
| 4 | 9.6 | 5.0 | Crystar | Platform: PlayStation 4 | August 27, 2019 | For when I weep, then I am strong. Battle thro... |


## Cleaning


### Inspecting

Inspecting the number of rows:

In [3]:
pd.read_sql_query("""SELECT COUNT(*) row_count
                     FROM vg""", engine)

| | row_count |
|---|---|
| 0 | 18800 |


Inspecting the first 10 rows:

In [4]:
pd.read_sql_query("""SELECT *
                     FROM vg
                     LIMIT 10""", engine)

| | index | name | platform | release_date | summary | meta_score | user_review |
|---|---|---|---|---|---|---|---|
| 0 | 0 | The Legend of Zelda: Ocarina of Time | Nintendo 64 | November 23, 1998 | As a young boy, Link is tricked by Ganondorf, ... | 99 | 9.1 |
| 1 | 1 | Tony Hawk's Pro Skater 2 | PlayStation | September 20, 2000 | As most major publishers' development efforts ... | 98 | 7.4 |
| 2 | 2 | Grand Theft Auto IV | PlayStation 3 | April 29, 2008 | [Metacritic's 2008 PS3 Game of the Year; Also ... | 98 | 7.7 |
| 3 | 3 | SoulCalibur | Dreamcast | September 8, 1999 | This is a tale of souls and swords, transcendi... | 98 | 8.4 |
| 4 | 4 | Grand Theft Auto IV | Xbox 360 | April 29, 2008 | [Metacritic's 2008 Xbox 360 Game of the Year; ... | 98 | 7.9 |
| 5 | 5 | Super Mario Galaxy | Wii | November 12, 2007 | [Metacritic's 2007 Wii Game of the Year] The u... | 97 | 9.1 |
| 6 | 6 | Super Mario Galaxy 2 | Wii | May 23, 2010 | Super Mario Galaxy 2, the sequel to the galaxy... | 97 | 9.1 |
| 7 | 7 | Red Dead Redemption 2 | Xbox One | October 26, 2018 | Developed by the creators of Grand Theft Auto ... | 97 | 8.0 |
| 8 | 8 | Grand Theft Auto V | Xbox One | November 18, 2014 | Grand Theft Auto 5 melds storytelling and game... | 97 | 7.9 |
| 9 | 9 | Grand Theft Auto V | PlayStation 3 | September 17, 2013 | Los Santos is a vast, sun-soaked metropolis fu... | 97 | 8.3 |


Inspecting the `DISTINCT` values of each column:

In [5]:
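for c in vg_df.columns:
    # quote the column name so a keyword-like name (e.g. "index") cannot break the query
    distinct_values = pd.read_sql_query(f"""SELECT DISTINCT "{c}"
                                            FROM vg
                                            ORDER BY 1""", engine)
    print(distinct_values)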

                                 name
0                              #DRIVE
1                              #IDARB
2                     #KILLALLZOMBIES
3                       'Splosion Man
4                            .detuned
...                               ...
12249                            rain
12250     theHunter: Call of the Wild
12251                    uDraw Studio
12252  void tRrLM(); //Void Terrarium
12253                             xXx

[12254 rows x 1 columns]
             platform
0                 3DS
1                  DS
2           Dreamcast
3    Game Boy Advance
4            GameCube
5         Nintendo 64
6                  PC
7                 PSP
8         PlayStation
9       PlayStation 2
10      PlayStation 3
11      PlayStation 4
12      PlayStation 5
13   PlayStation Vita
14             Stadia
15             Switch
16                Wii
17              Wii U
18               Xbox
19           Xbox 360
20           Xbox One
21      Xbox Series X

From the output above:
1. A game released on several `platform`s appears once per platform: its `name` is repeated, and each row has its own `meta_score` and `user_review`.
2. `release_date` should be converted to a format that SQLite's `STRFTIME` can parse (`YYYY-MM-DD`).
3. `platform` has a leading space that should be removed.
4. `user_review` appears to use `'tbd'` as a placeholder for a missing (`NULL`) value.
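The four issues above can be handled in a single pass with `pandas`. This is an illustrative sketch under assumptions, not the notebook's own code: the rows are hypothetical, and it converts `'tbd'` to `NaN` in pandas, whereas the notebook keeps `'tbd'` in the table and filters it out in SQL.

```python
import pandas as pd

# Hypothetical rows shaped like the raw `vg` table (not real data)
raw = pd.DataFrame({
    "name": ["Game A", "Game A", "Game B"],
    "platform": [" PC", " Switch", " PC"],
    "release_date": ["April 29, 2008", "April 29, 2008", "July 7, 2020"],
    "meta_score": [90, 80, 70],
    "user_review": ["8.5", "tbd", "7.0"],
})

clean = raw.copy()

# 1. Multi-platform games repeat their name, one row per platform
rows_per_game = clean.groupby("name").size()

# 2. Parse "Month D, YYYY" strings into datetimes
clean["release_date_formatted"] = pd.to_datetime(clean["release_date"], format="%B %d, %Y")

# 3. Strip the leading space from platform names
clean["platform"] = clean["platform"].str.lstrip()

# 4. Treat 'tbd' as missing (NaN), then convert the remaining strings to floats
clean["user_review"] = clean["user_review"].mask(clean["user_review"] == "tbd").astype(float)

print(rows_per_game)
print(clean)
```

Converting `'tbd'` up front would let later queries average `user_review` without a `WHERE user_review != 'tbd'` filter (pandas writes `NaN` as `NULL`, which `AVG` skips), at the cost of no longer storing the raw value.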

### Missing values

Counting the missing values in each column:

In [6]:
pd.read_sql_query("""SELECT SUM(CASE WHEN name IS NULL THEN 1 ELSE 0 END) name_miss,
                         SUM(CASE WHEN platform IS NULL THEN 1 ELSE 0 END) platform_miss, 
                         SUM(CASE WHEN release_date IS NULL THEN 1 ELSE 0 END) rdate_miss,
                         SUM(CASE WHEN summary IS NULL THEN 1 ELSE 0 END) summ_miss,
                         SUM(CASE WHEN meta_score IS NULL THEN 1 ELSE 0 END) mscore_miss,
                         SUM(CASE WHEN user_review = 'tbd' OR user_review IS NULL THEN 1 ELSE 0 END) ureview_miss
                     FROM vg""", engine)

| | name_miss | platform_miss | rdate_miss | summ_miss | mscore_miss | ureview_miss |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 114 | 0 | 1365 |
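For comparison, the same counts can be sketched in `pandas`, again with hypothetical rows. `isna()` only sees genuine nulls, so `'tbd'` has to be counted separately, just as the SQL `CASE` expression does.

```python
import pandas as pd

# Hypothetical rows: one missing summary, one 'tbd' and one NULL user_review
sample = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C"],
    "summary": ["A story ...", None, "Another story ..."],
    "user_review": ["8.5", "tbd", None],
})

# isna() counts genuine nulls (None/NaN) in every column
missing = sample.isna().sum()

# 'tbd' is an ordinary string, so add it to the user_review count explicitly
missing["user_review"] += (sample["user_review"] == "tbd").sum()

print(missing)
```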


Selecting rows with missing values:

In [7]:
pd.read_sql_query("""SELECT *
                     FROM vg
                     WHERE summary IS NULL
                         OR user_review = 'tbd'
                         OR user_review IS NULL""", engine) 

| | index | name | platform | release_date | summary | meta_score | user_review |
|---|---|---|---|---|---|---|---|
| 0 | 679 | Synth Riders | PlayStation 4 | August 10, 2021 | Synth Riders is your freestyle dancing VR rhyt... | 89 | tbd |
| 1 | 833 | Injustice 2: Legendary Edition | PlayStation 4 | March 27, 2018 | | 88 | 7.6 |
| 2 | 963 | Tiger Woods PGA Tour 2005 | GameCube | September 20, 2004 | Challenge professional golfer Tiger Woods to c... | 88 | tbd |
| 3 | 1277 | NASCAR 2005: Chase for the Cup | Xbox | August 31, 2004 | Do you have what it takes to be a top NASCAR d... | 86 | tbd |
| 4 | 1472 | Moto Racer Advance | Game Boy Advance | December 5, 2002 | | 86 | tbd |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1454 | 18594 | Air Conflicts: Aces of World War II | PSP | April 14, 2009 | Air Conflicts is an arcade flight simulator ga... | 36 | tbd |
| 1455 | 18639 | King of Clubs | Wii | August 4, 2008 | Never the same game twice, this absorbing and ... | 35 | tbd |
| 1456 | 18700 | Jenga World Tour | DS | November 13, 2007 | Jenga is based on the world famous wooden bloc... | 32 | tbd |
| 1457 | 18715 | Dream Chronicles | PlayStation 3 | November 23, 2010 | Unlock the secrets of the beautiful and myster... | 31 | tbd |


Missing values are concentrated in `user_review`. A value of `'tbd'` indicates that a game has fewer than 4 user reviews [(source)](https://www.metacritic.com/faq#item13), which would be difficult to impute.

Missing values in `summary` are also difficult to impute. 
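Because `user_review` is stored as text that mixes numbers with `'tbd'`, it matters how SQLite treats it in arithmetic: a string that does not look like a number, such as `'tbd'`, is treated as 0. So `AVG(1.0 * user_review)` counts every `'tbd'` as a zero unless those rows are filtered out (as the `WHERE user_review != 'tbd'` clauses in the queries below do) or converted to `NULL` with `NULLIF`, which `AVG` ignores. A minimal sketch using Python's built-in `sqlite3` module and hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (name TEXT, user_review TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [("Game A", "8.0"), ("Game A", "tbd"), ("Game B", "6.0")],  # hypothetical rows
)

# 1.0 * 'tbd' evaluates to 0.0, so the 'tbd' row drags Game A's average down
naive = dict(conn.execute(
    "SELECT name, AVG(1.0 * user_review) FROM reviews GROUP BY name"))

# NULLIF turns 'tbd' into NULL, and AVG skips NULLs
safe = dict(conn.execute(
    "SELECT name, AVG(1.0 * NULLIF(user_review, 'tbd')) FROM reviews GROUP BY name"))

print(naive)
print(safe)
```

Here Game A averages 4.0 with the naive query and 8.0 once `'tbd'` is excluded; Game B, which has no `'tbd'`, is 6.0 either way.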

### Converting `release_date` to a `STRFTIME`-compatible format

In [8]:
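# parse "Month D, YYYY" strings (e.g. "April 29, 2008") into datetimes; to_sql writes these as text starting with YYYY-MM-DD
vg_df["release_date_formatted"] = pd.to_datetime(vg_df["release_date"], format="%B %d, %Y")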
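To see why the conversion is needed: SQLite's `STRFTIME` only understands its own time-string formats (such as `YYYY-MM-DD`, optionally followed by a time) and returns `NULL` for text like `April 29, 2008`. The sketch below checks this with Python's built-in `sqlite3` module rather than the notebook's SQLAlchemy engine; the sample dates are illustrative, and parsing `%B` assumes the default C/English locale.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")


def sqlite_year(date_text):
    """Evaluate SQLite's STRFTIME('%Y', ...) on a string; None means SQLite could not parse it."""
    return conn.execute("SELECT STRFTIME('%Y', ?)", (date_text,)).fetchone()[0]


raw = "April 29, 2008"  # format of the original release_date column
iso = datetime.strptime(raw, "%B %d, %Y").strftime("%Y-%m-%d")

year_from_raw = sqlite_year(raw)   # not a recognised SQLite time string
year_from_iso = sqlite_year(iso)   # YYYY-MM-DD is understood
year_from_datetime_text = sqlite_year("2008-04-29 00:00:00.000000")  # date plus time also works

print(iso, year_from_raw, year_from_iso, year_from_datetime_text)
```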

### Removing the leading space from `platform`

In [9]:
vg_df["platform"] = vg_df["platform"].str.lstrip()

print(vg_df["platform"].unique())

['Nintendo 64' 'PlayStation' 'PlayStation 3' 'Dreamcast' 'Xbox 360' 'Wii'
 'Xbox One' 'PC' 'Switch' 'PlayStation 2' 'PlayStation 4' 'GameCube'
 'Xbox' 'Wii U' 'Game Boy Advance' '3DS' 'Xbox Series X' 'DS'
 'PlayStation Vita' 'PlayStation 5' 'PSP' 'Stadia']


In [10]:
vg_df.to_sql("vg", engine, if_exists="replace")

## Insights

### Games

#### What are the 10 most popular games?

Because each row represents a game on one platform, we need to `GROUP BY` the game name and average its scores across platforms.
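The per-game averaging can also be sketched in `pandas` for comparison. The rows are hypothetical; `groupby` plus `mean` plays the role of `GROUP BY name` with `AVG(meta_score)`, and the sort mirrors `ORDER BY 3 DESC, 1`.

```python
import pandas as pd

# Hypothetical rows: one game released on two platforms, another on one
rows = pd.DataFrame({
    "name": ["Game A", "Game A", "Game B"],
    "platform": ["PC", "Switch", "PC"],
    "meta_score": [90, 81, 70],
})

# One row per game, averaging its Metascore across platforms;
# sort by average (descending), then by name to break ties
per_game = (rows.groupby("name", as_index=False)["meta_score"].mean()
                .sort_values(["meta_score", "name"], ascending=[False, True]))
print(per_game)
```

One caveat of grouping on `name` alone, in SQL or in pandas: distinct games that share a title are merged into a single average.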

##### By Metascore
Metascore is a weighted average of review scores from critics selected by Metacritic [(source)](https://www.metacritic.com/about-metascores).

In [11]:
pd.read_sql_query("""SELECT name,
                         summary,
                         AVG(1.0 * meta_score) meta_score_avg
                     FROM vg
                     GROUP BY 1
                     ORDER BY 3 DESC, 1
                     LIMIT 10""", engine) 

| | name | summary | meta_score_avg |
|---|---|---|---|
| 0 | The Legend of Zelda: Ocarina of Time | As a young boy, Link is tricked by Ganondorf, ... | 99.0 |
| 1 | Metroid Prime | Samus returns in a new mission to unravel the ... | 97.0 |
| 2 | NFL 2K1 | In the end, NFL 2K1 is a deeper, more refined ... | 97.0 |
| 3 | Super Mario Galaxy | [Metacritic's 2007 Wii Game of the Year] The u... | 97.0 |
| 4 | Super Mario Galaxy 2 | Super Mario Galaxy 2, the sequel to the galaxy... | 97.0 |
| 5 | Super Mario Odyssey | New Evolution of Mario Sandbox-Style Gameplay.... | 97.0 |
| 6 | The House in Fata Morgana - Dreams of the Reve... | A gothic suspense tale set in a cursed mansion... | 97.0 |
| 7 | Grand Theft Auto V | Grand Theft Auto 5 melds storytelling and game... | 96.8 |
| 8 | The Legend of Zelda: Breath of the Wild | Forget everything you know about The Legend of... | 96.5 |
| 9 | Gran Turismo | Welcome to the most advanced racing game ever ... | 96.0 |


##### By user reviews

In [12]:
pd.read_sql_query("""SELECT name,
                         summary,
                         ROUND(AVG(1.0 * user_review), 1) user_review_avg,
                         AVG(1.0 * meta_score) meta_score_avg
                     FROM vg
                     WHERE user_review != 'tbd'
                     GROUP BY 1
                     ORDER BY 3 DESC, 4 DESC, 1
                     LIMIT 10""", engine) 

| | name | summary | user_review_avg | meta_score_avg |
|---|---|---|---|---|
| 0 | Ghost Trick: Phantom Detective | Ghost Trick is a story of mystery and intrigue... | 9.7 | 83.0 |
| 1 | Z.H.P. Unlosing Ranger vs Darkdeath Evilman | Known as ZettaiHero Keikakuin Japan, Z.H.P. is... | 9.7 | 81.0 |
| 2 | GrimGrimoire | Lillet Blan was very excited. Her heart had be... | 9.7 | 79.0 |
| 3 | Tengami | Set in Japan of ancient dark fairy tales, Teng... | 9.7 | 70.0 |
| 4 | Metal Torrent | [DSiWare] Prepare for a high level of intensit... | 9.7 | 62.0 |
| 5 | Diaries of a Spaceport Janitor | Diaries of a Spaceport Janitor is an anti-adve... | 9.6 | 69.0 |
| 6 | Crystar | For when I weep, then I am strong. Battle thro... | 9.6 | 67.0 |
| 7 | The Witcher 3: Wild Hunt | With the Empire attacking the Kingdoms of the ... | 9.3 | 92.0 |
| 8 | Astro's Playroom | Astro and his crew lead you on a magical intro... | 9.3 | 83.0 |
| 9 | The Last of Us | Twenty years after a pandemic radically transf... | 9.2 | 95.0 |


#### What are the 10 least popular games?

##### By Metascore

In [13]:
pd.read_sql_query("""SELECT name,
                         summary,
                         AVG(1.0 * meta_score) meta_score_avg,
                         -- NULLIF makes 'tbd' NULL so AVG skips it instead of counting it as 0
                         ROUND(AVG(1.0 * NULLIF(user_review, 'tbd')), 2) user_review_avg
                     FROM vg
                     GROUP BY 1
                     ORDER BY 3, 4, 1
                     LIMIT 10""", engine) 

| | name | summary | meta_score_avg | user_review_avg |
|---|---|---|---|---|
| 0 | Infestation: Survivor Stories (The War Z) | (Formerly known as "The War Z") It has been 5 ... | 20.0 | 1.7 |
| 1 | Afro Samurai 2: Revenge of Kuma Volume One | Head out on a journey of redemption, driven by... | 21.0 | 2.9 |
| 2 | Fast & Furious: Showdown | Fast & Furious: Showdown takes some of the fra... | 22.0 | 1.3 |
| 3 | Drake of the 99 Dragons | Drake is out for revenge in a supernatural Hon... | 22.0 | 1.7 |
| 4 | Leisure Suit Larry: Box Office Bust | The Leisure Suit Larry: Box Office Bust video ... | 22.5 | 2.25 |
| 5 | Fighter Within | Unleash your inner fighter to beat your friend... | 23.0 | 2.8 |
| 6 | FlatOut 3: Chaos & Destruction | FlatOut 3: Chaos & Destruction brings a new di... | 23.0 | 3.0 |
| 7 | Homie Rollerz | Homie Rollerz is a fast-paced, mayhem-laden ka... | 23.0 | 3.0 |
| 8 | Charlie's Angels | Join Natalie, Dylan, and Alex for an intense a... | 23.0 | 4.3 |
| 9 | Pulse Racer | Pulse Racer takes you to a future where racers... | 24.0 | 2.2 |


##### By user review

In [14]:
pd.read_sql_query("""SELECT name,
                         summary,
                         ROUND(AVG(user_review), 2) user_review_avg
                     FROM vg
                     WHERE user_review != 'tbd'
                     GROUP BY 1
                     ORDER BY 3, 1
                     LIMIT 10""", engine) 

| | name | summary | user_review_avg |
|---|---|---|---|
| 0 | Madden NFL 21 | Innovative new gameplay mechanics in Madden NF... | 0.35 |
| 1 | Madden NFL 22 | There will be more detailed staff management a... | 0.55 |
| 2 | Warcraft III: Reforged | A Classic Favorite, Reforged. Warcraft III: Re... | 0.6 |
| 3 | FIFA 20: Legacy Edition | Enjoy more control over the Decisive Moments t... | 0.7 |
| 4 | The Sims 4: Star Wars - Journey to Batuu | Your Sims are definitely not at home anymore. ... | 1.0 |
| 5 | When Ski Lifts Go Wrong | | 1.0 |
| 6 | FIFA 21 | On the street and in the stadium, FIFA 21 has ... | 1.07 |
| 7 | Call of Duty: Modern Warfare 3 - Defiance | A sequel to the $1 billion-grossing shooter Mo... | 1.2 |
| 8 | FIFA 20 | Enjoy more control over the Decisive Moments t... | 1.3 |
| 9 | Fast & Furious: Showdown | Fast & Furious: Showdown takes some of the fra... | 1.3 |


#### What is the top game for each platform?

##### By Metascore

In [15]:
pd.read_sql_query("""SELECT platform, 
                         name,
                         MAX(meta_score) meta_score_max, 
                         user_review
                     FROM vg
                     GROUP BY 1""", engine)

| | platform | name | meta_score_max | user_review |
|---|---|---|---|---|
| 0 | 3DS | The Legend of Zelda: Ocarina of Time 3D | 94 | 9.0 |
| 1 | DS | Grand Theft Auto: Chinatown Wars | 93 | 8.3 |
| 2 | Dreamcast | SoulCalibur | 98 | 8.4 |
| 3 | Game Boy Advance | The Legend of Zelda: A Link to the Past | 95 | 9.0 |
| 4 | GameCube | Metroid Prime | 97 | 8.6 |
| 5 | Nintendo 64 | The Legend of Zelda: Ocarina of Time | 99 | 9.1 |
| 6 | PC | Disco Elysium: The Final Cut | 97 | 8.3 |
| 7 | PSP | God of War: Chains of Olympus | 91 | 8.6 |
| 8 | PlayStation | Tony Hawk's Pro Skater 2 | 98 | 7.4 |
| 9 | PlayStation 2 | Tony Hawk's Pro Skater 3 | 97 | 7.5 |


##### By user review

In [16]:
pd.read_sql_query("""SELECT platform, 
                         name, 
                         MAX(1.0 * user_review) user_review_max,
                         meta_score
                     FROM vg
                     WHERE user_review != 'tbd'
                     GROUP BY 1""", engine)

| | platform | name | user_review_max | meta_score |
|---|---|---|---|---|
| 0 | 3DS | The Legend of Zelda: Ocarina of Time 3D | 9.0 | 94 |
| 1 | DS | Ghost Trick: Phantom Detective | 9.7 | 83 |
| 2 | Dreamcast | Resident Evil 2 | 9.2 | 77 |
| 3 | Game Boy Advance | Metroid Fusion | 9.1 | 92 |
| 4 | GameCube | Resident Evil 4 | 9.2 | 96 |
| 5 | Nintendo 64 | The Legend of Zelda: Ocarina of Time | 9.1 | 99 |
| 6 | PC | Diaries of a Spaceport Janitor | 9.6 | 69 |
| 7 | PSP | Z.H.P. Unlosing Ranger vs Darkdeath Evilman | 9.7 | 81 |
| 8 | PlayStation | Metal Gear Solid | 9.2 | 94 |
| 9 | PlayStation 2 | GrimGrimoire | 9.7 | 79 |
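Both queries above select `name` next to `MAX(...)` without grouping by it. SQLite accepts this: when a query contains a single `MAX()` or `MIN()` aggregate, the other ("bare") columns take their values from a row that holds that maximum or minimum, whereas many other databases reject such a query. A more portable equivalent, sketched here in `pandas` with hypothetical rows, uses `idxmax`:

```python
import pandas as pd

# Hypothetical rows: two platforms, two games each
rows = pd.DataFrame({
    "platform": ["PC", "PC", "Switch", "Switch"],
    "name": ["Game A", "Game B", "Game C", "Game D"],
    "meta_score": [88, 93, 91, 75],
})

# idxmax gives the row label of the highest score within each platform
top_idx = rows.groupby("platform")["meta_score"].idxmax()
top_per_platform = rows.loc[top_idx, ["platform", "name", "meta_score"]].reset_index(drop=True)
print(top_per_platform)
```

On ties, `idxmax` returns the first matching row. For user reviews, `'tbd'` would need to be removed and the column converted to numbers first.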


### Over the years

#### Which years had the most popular games released?

By both Metascore and user reviews, 1995 had the most popular games. However, only one game in the dataset was released in 1995, so its average rests on a single data point and says little about the year as a whole.

Although the ordering differs slightly, the top 5 years by both measures are all in the 1990s (**NB**: the data does not include games from 1990-1994 inclusive).

Games released in 2019-2021, on the other hand, rank far higher by Metascore than by user review: 2021, 2019 and 2020 all place in the top 10 years by Metascore, but none of them appears in the top 10 by user review.
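The 1995 result shows how a very small group can top a ranking of averages. One common guard, not applied in the queries below, is a minimum group size. A sketch in `pandas` with hypothetical rows and an arbitrary threshold `MIN_GAMES`:

```python
import pandas as pd

# Hypothetical rows: a single 1995 game outscores every other year on average
rows = pd.DataFrame({
    "release_date_formatted": pd.to_datetime(
        ["1995-06-01", "1996-03-01", "1996-09-01", "2020-01-15", "2020-05-20"]),
    "meta_score": [90, 84, 88, 70, 76],
})

MIN_GAMES = 2  # hypothetical threshold; the analysis applies none

release_year = rows["release_date_formatted"].dt.year.rename("release_year")
by_year = (rows.groupby(release_year)["meta_score"]
               .agg(meta_score_avg="mean", meta_score_count="count"))

# Drop years with too few games before ranking them
reliable = (by_year[by_year["meta_score_count"] >= MIN_GAMES]
                .sort_values("meta_score_avg", ascending=False))
print(reliable)
```

In SQL, the same filter would be a `HAVING COUNT(meta_score) >= 2` clause after `GROUP BY`.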

##### By Metascore

In [17]:
pd.read_sql_query("""SELECT STRFTIME('%Y', release_date_formatted) release_year,
                         ROUND(AVG(1.0 * meta_score), 2) meta_score_avg,
                         COUNT(meta_score) meta_score_count
                     FROM vg
                     GROUP BY 1
                     ORDER BY 2 DESC""", engine)

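| | release_year | meta_score_avg | meta_score_count |
|---|---|---|---|
| 0 | 1995 | 86.0 | 1 |
| 1 | 1996 | 85.45 | 20 |
| 2 | 1997 | 85.11 | 28 |
| 3 | 1999 | 83.57 | 53 |
| 4 | 1998 | 82.47 | 45 |
| 5 | 2021 | 73.68 | 745 |
| 6 | 2019 | 72.84 | 1011 |
| 7 | 2017 | 72.41 | 1053 |
| 8 | 2020 | 72.33 | 1082 |
| 9 | 2000 | 72.16 | 354 |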


##### By user reviews

In [18]:
pd.read_sql_query("""SELECT STRFTIME('%Y', release_date_formatted) release_year,
                         ROUND(AVG(1.0 * user_review), 2) user_review_avg,
                         COUNT(user_review) user_review_count
                     FROM vg
                     WHERE user_review != 'tbd'
                     GROUP BY 1
                     ORDER BY 2 DESC""", engine)

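| | release_year | user_review_avg | user_review_count |
|---|---|---|---|
| 0 | 1995 | 8.6 | 1 |
| 1 | 1999 | 8.47 | 52 |
| 2 | 1997 | 8.46 | 28 |
| 3 | 1998 | 8.41 | 43 |
| 4 | 1996 | 8.35 | 20 |
| 5 | 2000 | 7.59 | 309 |
| 6 | 2003 | 7.55 | 728 |
| 7 | 2001 | 7.54 | 485 |
| 8 | 2004 | 7.49 | 667 |
| 9 | 2002 | 7.41 | 666 |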


### Platforms

#### Which platforms had the most popular games?

As with release years, critics rate the newest platforms (Xbox Series X and PlayStation 5) far more highly than users do: they rank 2nd and 3rd by average Metascore but do not appear in the top 10 by average user review.

Nintendo 64 ranks first by both average Metascore and average user review. The Dreamcast and the original PlayStation also reach the top 5 on both metrics.
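Metascores run from 0 to 100 and user reviews from 0 to 10, so comparisons like the one above rest on ranks rather than raw averages. That comparison can be made explicit by ranking each platform on both metrics and taking the difference. A sketch with hypothetical platforms and averages:

```python
import pandas as pd

# Hypothetical per-platform averages: Metascore is out of 100, user review out of 10
platform_avgs = pd.DataFrame(
    {"meta_score_avg": [78.0, 76.0, 74.0, 70.0],
     "user_review_avg": [8.0, 6.5, 7.8, 7.0]},
    index=["Console W", "Console X", "Console Y", "Console Z"],
)

# Rank 1 = best on each metric, which puts both scales on the same footing
ranks = platform_avgs.rank(ascending=False).astype(int)

# Positive gap: critics rank the platform higher than users do
ranks["rank_gap"] = ranks["user_review_avg"] - ranks["meta_score_avg"]
print(ranks.sort_values("rank_gap", ascending=False))
```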

##### By Metascore

In [19]:
pd.read_sql_query("""SELECT platform,
                         ROUND(AVG(1.0 * meta_score), 2) meta_score_avg,
                         COUNT(meta_score) meta_score_count
                     FROM vg
                     GROUP BY 1
                     ORDER BY 2 DESC""", engine)

| | platform | meta_score_avg | meta_score_count |
|---|---|---|---|
| 0 | Nintendo 64 | 78.44 | 71 |
| 1 | Xbox Series X | 75.99 | 77 |
| 2 | PlayStation 5 | 75.43 | 124 |
| 3 | Dreamcast | 74.07 | 125 |
| 4 | PlayStation | 73.34 | 187 |
| 5 | Switch | 72.52 | 1399 |
| 6 | Wii U | 72.42 | 184 |
| 7 | Xbox One | 72.4 | 1179 |
| 8 | PC | 71.8 | 4864 |
| 9 | PlayStation Vita | 71.06 | 257 |


##### By user review

In [20]:
pd.read_sql_query("""SELECT platform,
                         ROUND(AVG(1.0 * user_review), 2) user_review_avg,
                         COUNT(user_review) user_review_count
                     FROM vg
                     WHERE user_review != 'tbd'
                     GROUP BY 1
                     ORDER BY 2 DESC""", engine)

| | platform | user_review_avg | user_review_count |
|---|---|---|---|
| 0 | Nintendo 64 | 7.95 | 71 |
| 1 | Dreamcast | 7.87 | 119 |
| 2 | PlayStation | 7.72 | 166 |
| 3 | PlayStation 2 | 7.53 | 1311 |
| 4 | Game Boy Advance | 7.47 | 349 |
| 5 | GameCube | 7.43 | 413 |
| 6 | Stadia | 7.38 | 5 |
| 7 | Wii U | 7.31 | 181 |
| 8 | PlayStation Vita | 7.27 | 251 |
| 9 | PSP | 7.27 | 464 |


#### Do *home* video game consoles have more popular games than *handheld*? Do *PCs* have more popular games than *home* video game consoles?

By Metascore, handheld consoles have the lowest-rated games of all four categories, while PC games edge out home consoles.

By user review, by contrast, games on handheld consoles are rated highest, and PC games score slightly lower than games on home consoles.
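The `CASE` expression that assigns platform types is repeated in both queries below. A minimal sketch of the same mapping written once as a Python function and applied with `pandas` to hypothetical rows; as in the `CASE`, every platform not listed (including Stadia) falls through to `'Home'`:

```python
import pandas as pd

# Same handheld list as the CASE expression in the queries
HANDHELD = {"3DS", "DS", "Game Boy Advance", "PSP", "PlayStation Vita"}


def platform_type(platform: str) -> str:
    """Classify a platform like the CASE expression; anything unlisted is 'Home'."""
    if platform in HANDHELD:
        return "Handheld"
    if platform == "Switch":
        return "Hybrid"
    if platform == "PC":
        return "PC"
    return "Home"


# Hypothetical rows (user reviews already numeric, 'tbd' removed)
rows = pd.DataFrame({
    "platform": ["DS", "Switch", "PC", "Nintendo 64", "PSP"],
    "user_review": [8.0, 7.0, 6.0, 9.0, 5.0],
})

by_type = (rows.assign(platform_type=rows["platform"].map(platform_type))
               .groupby("platform_type")["user_review"]
               .agg(user_review_avg="mean", user_review_count="count")
               .sort_values("user_review_avg", ascending=False))
print(by_type)
```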

##### By Metascore

In [21]:
pd.read_sql_query("""SELECT 
                         CASE
                             WHEN platform IN ('3DS', 'DS', 'Game Boy Advance', 'PSP', 'PlayStation Vita')
                                 THEN 'Handheld'
                             WHEN platform = 'Switch'
                                 THEN 'Hybrid'
                             WHEN platform = 'PC' 
                                 THEN 'PC'
                             ELSE 'Home'
                             END platform_type,
                         ROUND(AVG(1.0 * meta_score), 2) meta_score_avg,
                         COUNT(meta_score) meta_score_count
                     FROM vg
                     GROUP BY 1
                     ORDER BY 2 DESC""", engine)

| | platform_type | meta_score_avg | meta_score_count |
|---|---|---|---|
| 0 | Hybrid | 72.52 | 1399 |
| 1 | PC | 71.8 | 4864 |
| 2 | Home | 70.35 | 10214 |
| 3 | Handheld | 68.44 | 2323 |


##### By user review

In [22]:
pd.read_sql_query("""SELECT 
                         CASE
                             WHEN platform IN ('3DS', 'DS', 'Game Boy Advance', 'PSP', 'PlayStation Vita') THEN 'Handheld'
                             WHEN platform = 'Switch' THEN 'Hybrid'
                             WHEN platform = 'PC' THEN 'PC'
                             ELSE 'Home'
                             END platform_type,
                         ROUND(AVG(1.0 * user_review), 2) user_review_avg,
                         COUNT(user_review) user_review_count
                     FROM vg
                     WHERE user_review != 'tbd'
                     GROUP BY 1
                     ORDER BY 2 DESC""", engine)

| | platform_type | user_review_avg | user_review_count |
|---|---|---|---|
| 0 | Handheld | 7.23 | 2041 |
| 1 | Hybrid | 7.17 | 1216 |
| 2 | Home | 6.95 | 9518 |
| 3 | PC | 6.92 | 4660 |
