# Metacritic Video Games Reviews: Analysis

In this project, I explore [Metacritic's video game reviews](https://www.metacritic.com/browse/games/score/metascore/all/all/filtered?page=0) for games released between 1995 and 2021.

Please see the [README](https://github.com/henrylin03/video-games) for more information.

### Data Source

Scraped and uploaded by Deep Contractor on [Kaggle](https://www.kaggle.com/deepcontractor/top-video-games-19952021-metacritic).


## Setup

In this section, I import the necessary libraries, set up the SQLite database, and load the input `.csv` file so that it can be manipulated with both SQLite and `pandas`:


In [19]:
# import necessary libraries
import pandas as pd
from sqlalchemy import create_engine

# creating SQLite database
engine = create_engine("sqlite://", echo=False)

# import CSV as DataFrames
print("Loading input file...", end=" ", flush=True)
vg_df = pd.read_csv("input/all_games.csv")
vg_df.to_sql("vg", engine, if_exists="replace")

# inspect table
if vg_df.empty:
    raise Exception("Input file CSV is empty.")
print("✅ Done")
display(vg_df.head())


Loading input file... ✅ Done


Unnamed: 0,name,platform,release_date,summary,meta_score,user_review
0,The Legend of Zelda: Ocarina of Time,Nintendo 64,"November 23, 1998","As a young boy, Link is tricked by Ganondorf, ...",99,9.1
1,Tony Hawk's Pro Skater 2,PlayStation,"September 20, 2000",As most major publishers' development efforts ...,98,7.4
2,Grand Theft Auto IV,PlayStation 3,"April 29, 2008",[Metacritic's 2008 PS3 Game of the Year; Also ...,98,7.7
3,SoulCalibur,Dreamcast,"September 8, 1999","This is a tale of souls and swords, transcendi...",98,8.4
4,Grand Theft Auto IV,Xbox 360,"April 29, 2008",[Metacritic's 2008 Xbox 360 Game of the Year; ...,98,7.9
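
The first `to_sql` call in the cell above omits `index=False`, while the later cells pass it. As a minimal sketch of the difference, the snippet below writes a small made-up DataFrame to an in-memory SQLite database twice. It uses the standard library's `sqlite3` connection instead of the SQLAlchemy engine, which `pandas` also accepts; the table names and rows are illustrative assumptions, not part of the real input.

```python
import sqlite3

import pandas as pd

# Illustrative rows only; not taken from the real input file.
demo_df = pd.DataFrame(
    {
        "name": ["Game A", "Game B"],
        "platform": ["PC", "Switch"],
        "meta_score": [90, 85],
    }
)

# ":memory:" creates a throwaway in-memory SQLite database.
conn = sqlite3.connect(":memory:")

# Default behaviour: the DataFrame index is written as an extra "index" column.
demo_df.to_sql("demo_with_index", conn, if_exists="replace")
with_index = pd.read_sql_query("SELECT * FROM demo_with_index", conn)

# index=False keeps the DataFrame index out of the SQL table.
demo_df.to_sql("demo_no_index", conn, if_exists="replace", index=False)
without_index = pd.read_sql_query("SELECT * FROM demo_no_index", conn)

conn.close()

print(list(with_index.columns))
print(list(without_index.columns))
```

Passing `index=False` consistently avoids carrying a redundant index column into later `SELECT *` queries.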


## Cleaning

Before drawing insights from the data, the following cleaning steps should be completed:

1. `release_date` column should be reformatted to datetime.
2. Remove all leading and trailing spaces from the table.

For completeness, the following were identified, but no actions will be taken:

1. A single game may have multiple entries if it was released on multiple platforms. These are not true "duplicates" and require no further action (NFA). Note that this assumption is incomplete: the duplicate check below also finds repeated `name`-`platform` pairs.
2. Some `user_review` entries are `tbd`. This appears when a game has [fewer than 4 reviews](https://www.metacritic.com/faq#item13).
3. Some game summaries (`summary` column) have missing values. These are difficult to impute and are ignored.
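
Although the `tbd` placeholder is left in the table here, it helps to see how it could be handled when working in `pandas`. The sketch below is one possible approach, not the notebook's method: it assumes a small made-up Series and uses `pd.to_numeric` with `errors="coerce"` to turn non-numeric entries into `NaN`.

```python
import pandas as pd

# Illustrative values only; "tbd" mimics the placeholder described above.
scores = pd.Series(["9.1", "tbd", "7.4", None], name="user_review")

# errors="coerce" turns anything non-numeric (here "tbd" and None) into NaN.
numeric_scores = pd.to_numeric(scores, errors="coerce")

# NaN values are skipped by default in aggregations such as mean().
missing_count = int(numeric_scores.isna().sum())
mean_score = numeric_scores.mean()

print(missing_count)
print(mean_score)
```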


### Reformat `release_date` column

SQLite does not support month names ([StackOverflow](https://stackoverflow.com/questions/1181123/date-formatting-from-sqlite-query)) such as those found in the `release_date` column, so we use the `pandas` [`to_datetime()`](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) function instead:


In [20]:
vg_df.release_date = pd.to_datetime(vg_df.release_date, format="%B %d, %Y")

# update `vg` SQL table and inspect
vg_df.to_sql("vg", engine, if_exists="replace", index=False)
pd.read_sql_query("SELECT * FROM vg LIMIT 3", engine)


Unnamed: 0,name,platform,release_date,summary,meta_score,user_review
0,The Legend of Zelda: Ocarina of Time,Nintendo 64,1998-11-23 00:00:00.000000,"As a young boy, Link is tricked by Ganondorf, ...",99,9.1
1,Tony Hawk's Pro Skater 2,PlayStation,2000-09-20 00:00:00.000000,As most major publishers' development efforts ...,98,7.4
2,Grand Theft Auto IV,PlayStation 3,2008-04-29 00:00:00.000000,[Metacritic's 2008 PS3 Game of the Year; Also ...,98,7.7
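
Once the dates are stored as ISO-style text, as in the output above, SQLite's date functions can read them. The following sketch checks this on a single value using the standard library's `sqlite3` module (an assumption made to keep the example self-contained; the notebook itself uses a SQLAlchemy engine).

```python
import sqlite3

import pandas as pd

# Parse a month-name date using the same format string as the cell above.
parsed = pd.to_datetime(pd.Series(["November 23, 1998"]), format="%B %d, %Y")

# Render it the way it appears in the SQL table output.
stored_text = parsed.dt.strftime("%Y-%m-%d %H:%M:%S.%f").iloc[0]

# SQLite's STRFTIME accepts this text and can extract the year from it.
conn = sqlite3.connect(":memory:")
year = conn.execute("SELECT STRFTIME('%Y', ?)", (stored_text,)).fetchone()[0]
conn.close()

print(stored_text)
print(year)
```

Note that `STRFTIME` returns the year as text, so it may need a cast if it is used in numeric comparisons.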


### Remove leading and trailing spaces

SQLite's [`TRIM()`](https://www.sqlite.org/lang_corefunc.html#trim) function operates on a single column at a time ([StackOverflow](https://stackoverflow.com/questions/41816797/sqlite-trim-same-character-multiple-columns)), so for efficiency I use Python's `.strip()` method to remove leading and trailing whitespace across the whole DataFrame.

Because `.strip()` is a string method, we apply it only to cells that are strings to avoid an `AttributeError`.


In [21]:
# strip() removes leading/trailing whitespace (spaces, tabs, newlines);
# non-string cells are returned unchanged
vg_df = vg_df.applymap(lambda x: x.strip() if isinstance(x, str) else x)

# update `vg` SQL table and inspect
vg_df.to_sql("vg", engine, if_exists="replace", index=False)
vg_df.platform.unique()


array(['Nintendo 64', 'PlayStation', 'PlayStation 3', 'Dreamcast',
       'Xbox 360', 'Wii', 'Xbox One', 'PC', 'Switch', 'PlayStation 2',
       'PlayStation 4', 'GameCube', 'Xbox', 'Wii U', 'Game Boy Advance',
       '3DS', 'Xbox Series X', 'DS', 'PlayStation Vita', 'PlayStation 5',
       'PSP', 'Stadia'], dtype=object)
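
A short note on the string methods involved in this step. Python's `str.replace` matches its first argument literally rather than as a regular expression, and `str.strip()` with no arguments already removes tabs as well as spaces. The sketch below demonstrates both on made-up values, and applies the strip column by column with `Series.map` as one alternative to a frame-wide call.

```python
import pandas as pd

# str.replace treats its first argument literally, so a regex-style
# pattern such as r"^\t+" never matches a real tab character.
raw = "\t  Xbox 360 \n"
unchanged = raw.replace(r"^\t+", "")

# str.strip() with no arguments removes tabs, spaces and newlines
# from both ends of the string.
stripped = raw.strip()

# Illustrative frame; only string cells are stripped, numbers pass through.
demo_df = pd.DataFrame({"platform": ["\tPC ", " Wii"], "meta_score": [90, 85]})
for col in demo_df.columns:
    demo_df[col] = demo_df[col].map(
        lambda x: x.strip() if isinstance(x, str) else x
    )

print(repr(stripped))
print(demo_df["platform"].tolist())
```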

### Other data inspections

For completeness, the data was also inspected for missing and duplicate values. However, as noted above, no actions are taken for the purposes of analyses.


#### Missing values


In [22]:
# counting missing values in each column
pd.read_sql_query(
    """
                    SELECT SUM(CASE WHEN name IS NULL THEN 1 ELSE 0 END) name_miss,
                        SUM(CASE WHEN platform IS NULL THEN 1 ELSE 0 END) platform_miss, 
                        SUM(CASE WHEN release_date IS NULL THEN 1 ELSE 0 END) rdate_miss,
                        SUM(CASE WHEN summary IS NULL THEN 1 ELSE 0 END) summ_miss,
                        SUM(CASE WHEN meta_score IS NULL THEN 1 ELSE 0 END) meta_score_miss,
                        SUM(CASE WHEN user_review = 'tbd' OR user_review IS NULL THEN 1 ELSE 0 END) user_review_miss
                    FROM vg""",
    engine,
)


Unnamed: 0,name_miss,platform_miss,rdate_miss,summ_miss,meta_score_miss,user_review_miss
0,0,0,0,114,0,1365


In [23]:
# selecting rows with missing values in 1+ columns identified above
pd.read_sql_query(
    """
                    SELECT *
                    FROM vg
                    WHERE summary IS NULL
                        OR user_review = 'tbd'
                        OR user_review IS NULL""",
    engine,
)


Unnamed: 0,name,platform,release_date,summary,meta_score,user_review
0,Synth Riders,PlayStation 4,2021-08-10 00:00:00.000000,Synth Riders is your freestyle dancing VR rhyt...,89,tbd
1,Injustice 2: Legendary Edition,PlayStation 4,2018-03-27 00:00:00.000000,,88,7.6
2,Tiger Woods PGA Tour 2005,GameCube,2004-09-20 00:00:00.000000,Challenge professional golfer Tiger Woods to c...,88,tbd
3,NASCAR 2005: Chase for the Cup,Xbox,2004-08-31 00:00:00.000000,Do you have what it takes to be a top NASCAR d...,86,tbd
4,Moto Racer Advance,Game Boy Advance,2002-12-05 00:00:00.000000,,86,tbd
...,...,...,...,...,...,...
1454,Air Conflicts: Aces of World War II,PSP,2009-04-14 00:00:00.000000,Air Conflicts is an arcade flight simulator ga...,36,tbd
1455,King of Clubs,Wii,2008-08-04 00:00:00.000000,"Never the same game twice, this absorbing and ...",35,tbd
1456,Jenga World Tour,DS,2007-11-13 00:00:00.000000,Jenga is based on the world famous wooden bloc...,32,tbd
1457,Dream Chronicles,PlayStation 3,2010-11-23 00:00:00.000000,Unlock the secrets of the beautiful and myster...,31,tbd


#### Duplicate values

Next, we check whether the game name (`name`) and `summary` columns contain any duplicates. The other columns are not expected to be unique.


In [24]:
for col in ["name", "summary"]:
    temp_dup_df = pd.read_sql_query(
        f"""
        SELECT {col}, 
            COUNT(*) row_counts
        FROM vg
        GROUP BY 1 
        HAVING row_counts > 1
        ORDER BY 2 DESC""",
        engine,
    )
    display(col, temp_dup_df)


'name'

Unnamed: 0,name,row_counts
0,Madden NFL 07,9
1,Cars,9
2,Madden NFL 06,8
3,X-Men: The Official Game,7
4,Tiger Woods PGA Tour 07,7
...,...,...
3828,1942: Joint Strike,2
3829,187 Ride or Die,2
3830,10 Second Ninja X,2
3831,007 Legends,2


'summary'

Unnamed: 0,summary,row_counts
0,,114
1,The game involves players creating and destroy...,7
2,Need for Speed ProStreet accelerates street ra...,7
3,LEGO Indiana Jones: The Original Adventures ta...,7
4,"Iron Man, an explosive third-person action sho...",7
...,...,...
2726,'DARK SOULS II Crown of the Ivory King' is the...,2
2727,"""Street Fighter"" is best known for its well-po...",2
2728,"""Our goal was to create a game that is perfect...",2
2729,"""Only the possibility of you can change our fa...",2


In [25]:
# join other columns from the original table to review cause of duplication
pd.read_sql_query(
    """
                WITH duplicate_names AS (
                    SELECT name, 
                        COUNT(*) row_counts
                    FROM vg
                    GROUP BY 1 
                    HAVING row_counts > 1
                )
                
                SELECT d.name,
                    vg.platform,
                    vg.release_date,
                    vg.summary,
                    vg.meta_score,
                    vg.user_review
                FROM duplicate_names d
                JOIN vg
                    USING (name)
                ORDER BY 1
                """,
    engine,
)


Unnamed: 0,name,platform,release_date,summary,meta_score,user_review
0,.hack//G.U. Last Recode,PlayStation 4,2017-11-03 00:00:00.000000,.hack//G.U. is back. Log back into the .hack//...,76,8.2
1,.hack//G.U. Last Recode,PC,2017-11-03 00:00:00.000000,.hack//G.U. is back! Log back into the .hack//...,69,8.1
2,007 Legends,PlayStation 2,2012-10-16 00:00:00.000000,"007 Legends features an original, overarching ...",45,4.5
3,007 Legends,Wii,2012-10-16 00:00:00.000000,Each of the 007 Legends movie-inspired mission...,41,4.1
4,007: Quantum of Solace,PC,2008-11-04 00:00:00.000000,Introducing a more lethal and cunningly effici...,70,6.3
...,...,...,...,...,...,...
10374,kill.switch,PlayStation 2,2003-10-28 00:00:00.000000,"In a world on the brink of global conflict, yo...",73,8.2
10375,kill.switch,PC,2004-03-30 00:00:00.000000,"In a world on the brink of global conflict, yo...",66,6.9
10376,nail'd,PC,2010-11-30 00:00:00.000000,nail’d is all about eschewing boring realism f...,69,7.6
10377,nail'd,PlayStation 3,2010-11-30 00:00:00.000000,nail’d is all about eschewing boring realism f...,66,6.2


The `name` column's duplicates are mostly caused by the same game being released on different platforms (`platform` column). However, checking these two columns together in the original table (`vg`) still returns 104 repeated `name`-`platform` pairs:


In [26]:
pd.read_sql_query(
    """
        SELECT name,
            platform,
            COUNT(*) as row_counts
        FROM vg
        GROUP BY 1, 2
        HAVING row_counts > 1
                """,
    engine,
)


Unnamed: 0,name,platform,row_counts
0,18 Wheeler: American Pro Trucker,PlayStation 2,2
1,4x4 EVO 2,PC,2
2,ADR1FT,PC,2
3,Agatha Christie: The ABC Murders,PC,2
4,Air Conflicts: Secret Wars,Xbox 360,2
...,...,...,...
100,X-Blades,Xbox One,2
101,X-Men Origins: Wolverine,PlayStation 3,2
102,X-Men: The Official Game,PlayStation 4,2
103,X-Men: The Official Game,Xbox 360,2


In [41]:
vg_df[(vg_df.name == "X-Men: The Official Game") & (vg_df.platform == "PlayStation 4")]


Unnamed: 0,name,platform,release_date,summary,meta_score,user_review
17120,X-Men: The Official Game,PlayStation 4,2006-05-16,Master the power of the X-Men--Obliterate enem...,53,8.1
17292,X-Men: The Official Game,PlayStation 4,2006-05-16,Master the power of the X-Men—Obliterate enemi...,52,6.0
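
The same check can be done directly in `pandas`. The sketch below uses made-up rows that mimic the pattern in the output above, and flags every row belonging to a repeated (`name`, `platform`) pair. The averaging step at the end is only one possible policy for collapsing such rows and is an assumption, not something this notebook applies.

```python
import pandas as pd

# Illustrative rows only; they mimic the pattern seen in the output above.
demo_df = pd.DataFrame(
    {
        "name": ["Game A", "Game A", "Game A", "Game B"],
        "platform": ["PC", "PC", "Wii", "PC"],
        "meta_score": [53, 52, 60, 70],
    }
)

# keep=False flags every row in a repeated (name, platform) group,
# rather than only the second and later occurrences.
mask = demo_df.duplicated(subset=["name", "platform"], keep=False)
print(demo_df[mask])

# One possible policy (an assumption, not the notebook's choice):
# average the scores so each (name, platform) pair appears once.
deduped = demo_df.groupby(["name", "platform"], as_index=False)["meta_score"].mean()
print(deduped)
```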


## Insights


### Games


#### What are the 10 most popular games?


Because each row represents a game on one platform, we need to `GROUP BY` the game name and take the average score across platforms.


##### By Metascore

Metascore is a weighted average score of curated reviews by critics selected by Metacritic [(source)](https://www.metacritic.com/about-metascores).


In [28]:
pd.read_sql_query(
    """SELECT name,
                         summary,
                         AVG(1.0 * meta_score) meta_score_avg
                     FROM vg
                     GROUP BY 1
                     ORDER BY 3 DESC, 1
                     LIMIT 10""",
    engine,
)


Unnamed: 0,name,summary,meta_score_avg
0,The Legend of Zelda: Ocarina of Time,"As a young boy, Link is tricked by Ganondorf, ...",99.0
1,Metroid Prime,Samus returns in a new mission to unravel the ...,97.0
2,NFL 2K1,"In the end, NFL 2K1 is a deeper, more refined ...",97.0
3,Super Mario Galaxy,[Metacritic's 2007 Wii Game of the Year] The u...,97.0
4,Super Mario Galaxy 2,"Super Mario Galaxy 2, the sequel to the galaxy...",97.0
5,Super Mario Odyssey,New Evolution of Mario Sandbox-Style Gameplay....,97.0
6,The House in Fata Morgana - Dreams of the Reve...,A gothic suspense tale set in a cursed mansion...,97.0
7,Grand Theft Auto V,Grand Theft Auto 5 melds storytelling and game...,96.8
8,The Legend of Zelda: Breath of the Wild,Forget everything you know about The Legend of...,96.5
9,Gran Turismo,Welcome to the most advanced racing game ever ...,96.0


##### By user reviews


In [29]:
pd.read_sql_query(
    """SELECT name,
                         summary,
                         ROUND(AVG(1.0 * user_review), 1) user_review_avg,
                         AVG(1.0 * meta_score) meta_score_avg
                     FROM vg
                     WHERE user_review != 'tbd'
                     GROUP BY 1
                     ORDER BY 3 DESC, 4 DESC, 1
                     LIMIT 10""",
    engine,
)


Unnamed: 0,name,summary,user_review_avg,meta_score_avg
0,Ghost Trick: Phantom Detective,Ghost Trick is a story of mystery and intrigue...,9.7,83.0
1,Z.H.P. Unlosing Ranger vs Darkdeath Evilman,"Known as ZettaiHero Keikakuin Japan, Z.H.P. is...",9.7,81.0
2,GrimGrimoire,Lillet Blan was very excited. Her heart had be...,9.7,79.0
3,Tengami,"Set in Japan of ancient dark fairy tales, Teng...",9.7,70.0
4,Metal Torrent,[DSiWare] Prepare for a high level of intensit...,9.7,62.0
5,Diaries of a Spaceport Janitor,Diaries of a Spaceport Janitor is an anti-adve...,9.6,69.0
6,Crystar,"For when I weep, then I am strong. Battle thro...",9.6,67.0
7,The Witcher 3: Wild Hunt,With the Empire attacking the Kingdoms of the ...,9.3,92.0
8,Astro's Playroom,Astro and his crew lead you on a magical intro...,9.3,83.0
9,The Last of Us,Twenty years after a pandemic radically transf...,9.2,95.0


#### What are the 10 least popular games?


##### By Metascore


In [30]:
pd.read_sql_query(
    """SELECT name,
                         summary,
                         AVG(1.0 * meta_score) meta_score_avg,
                         ROUND(AVG(1.0 * user_review), 2) user_review_avg
                     FROM vg
                     GROUP BY 1
                     ORDER BY 3, 4, 1
                     LIMIT 10""",
    engine,
)


Unnamed: 0,name,summary,meta_score_avg,user_review_avg
0,Infestation: Survivor Stories (The War Z),"(Formerly known as ""The War Z"") It has been 5 ...",20.0,1.7
1,Afro Samurai 2: Revenge of Kuma Volume One,"Head out on a journey of redemption, driven by...",21.0,2.9
2,Fast & Furious: Showdown,Fast & Furious: Showdown takes some of the fra...,22.0,1.3
3,Drake of the 99 Dragons,Drake is out for revenge in a supernatural Hon...,22.0,1.7
4,Leisure Suit Larry: Box Office Bust,The Leisure Suit Larry: Box Office Bust video ...,22.5,2.25
5,Fighter Within,Unleash your inner fighter to beat your friend...,23.0,2.8
6,FlatOut 3: Chaos & Destruction,FlatOut 3: Chaos & Destruction brings a new di...,23.0,3.0
7,Homie Rollerz,"Homie Rollerz is a fast-paced, mayhem-laden ka...",23.0,3.0
8,Charlie's Angels,"Join Natalie, Dylan, and Alex for an intense a...",23.0,4.3
9,Pulse Racer,Pulse Racer takes you to a future where racers...,24.0,2.2
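
One caveat about the query above: unlike the user-review queries, it has no `WHERE user_review != 'tbd'` filter, so any `tbd` rows take part in `user_review_avg`. The sketch below, on a made-up two-row table using the standard library's `sqlite3` module, shows how SQLite treats non-numeric text in arithmetic, and one way to exclude it with `NULLIF`. The table and values are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (name TEXT, user_review TEXT)")

# Illustrative rows only: one real score and one 'tbd' placeholder.
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("Game A", "4.0"), ("Game A", "tbd")],
)

# Multiplying non-numeric text by 1.0 yields 0.0, which drags the mean down.
naive = conn.execute("SELECT AVG(1.0 * user_review) FROM demo").fetchone()[0]

# NULLIF turns 'tbd' into NULL, and AVG skips NULL values.
safe = conn.execute(
    "SELECT AVG(1.0 * NULLIF(user_review, 'tbd')) FROM demo"
).fetchone()[0]
conn.close()

print(naive, safe)
```

Whether this affects the ten rows shown above depends on whether any of those games have `tbd` entries on some platform.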


##### By user review


In [31]:
pd.read_sql_query(
    """SELECT name,
                         summary,
                         ROUND(AVG(user_review), 2) user_review_avg
                     FROM vg
                     WHERE user_review != 'tbd'
                     GROUP BY 1
                     ORDER BY 3, 1
                     LIMIT 10""",
    engine,
)


Unnamed: 0,name,summary,user_review_avg
0,Madden NFL 21,Innovative new gameplay mechanics in Madden NF...,0.35
1,Madden NFL 22,There will be more detailed staff management a...,0.55
2,Warcraft III: Reforged,"A Classic Favorite, Reforged. Warcraft III: Re...",0.6
3,FIFA 20: Legacy Edition,Enjoy more control over the Decisive Moments t...,0.7
4,The Sims 4: Star Wars - Journey to Batuu,Your Sims are definitely not at home anymore. ...,1.0
5,When Ski Lifts Go Wrong,,1.0
6,FIFA 21,"On the street and in the stadium, FIFA 21 has ...",1.07
7,Call of Duty: Modern Warfare 3 - Defiance,A sequel to the $1 billion-grossing shooter Mo...,1.2
8,FIFA 20,Enjoy more control over the Decisive Moments t...,1.3
9,Fast & Furious: Showdown,Fast & Furious: Showdown takes some of the fra...,1.3


#### What is the top game for each platform?


##### By Metascore


In [32]:
pd.read_sql_query(
    """SELECT platform, 
                         name,
                         MAX(meta_score) meta_score_max, 
                         user_review
                     FROM vg
                     GROUP BY 1""",
    engine,
)


Unnamed: 0,platform,name,meta_score_max,user_review
0,3DS,The Legend of Zelda: Ocarina of Time 3D,94,9.0
1,DS,Grand Theft Auto: Chinatown Wars,93,8.3
2,Dreamcast,SoulCalibur,98,8.4
3,Game Boy Advance,The Legend of Zelda: A Link to the Past,95,9.0
4,GameCube,Metroid Prime,97,8.6
5,Nintendo 64,The Legend of Zelda: Ocarina of Time,99,9.1
6,PC,Disco Elysium: The Final Cut,97,8.3
7,PSP,God of War: Chains of Olympus,91,8.6
8,PlayStation,Tony Hawk's Pro Skater 2,98,7.4
9,PlayStation 2,Tony Hawk's Pro Skater 3,97,7.5
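
The query above relies on SQLite-specific behaviour: when a query contains a single `MAX()` aggregate, bare columns such as `name` take their values from the row holding the maximum. Other databases generally reject or handle this differently. As a portable alternative, here is a minimal `pandas` sketch using `idxmax` on made-up rows; ties resolve to the first matching row.

```python
import pandas as pd

# Illustrative rows only.
demo_df = pd.DataFrame(
    {
        "platform": ["PC", "PC", "Wii", "Wii"],
        "name": ["Game A", "Game B", "Game C", "Game D"],
        "meta_score": [97, 90, 80, 95],
    }
)

# idxmax returns the row label of the highest score within each platform.
top_idx = demo_df.groupby("platform")["meta_score"].idxmax()

# Look up the full rows so name and score stay aligned.
top_per_platform = demo_df.loc[top_idx].reset_index(drop=True)
print(top_per_platform)
```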


##### By user review


In [33]:
pd.read_sql_query(
    """SELECT platform, 
                         name, 
                         MAX(1.0 * user_review) user_review_max,
                         meta_score
                     FROM vg
                     WHERE user_review != 'tbd'
                     GROUP BY 1""",
    engine,
)


Unnamed: 0,platform,name,user_review_max,meta_score
0,3DS,The Legend of Zelda: Ocarina of Time 3D,9.0,94
1,DS,Ghost Trick: Phantom Detective,9.7,83
2,Dreamcast,Resident Evil 2,9.2,77
3,Game Boy Advance,Metroid Fusion,9.1,92
4,GameCube,Resident Evil 4,9.2,96
5,Nintendo 64,The Legend of Zelda: Ocarina of Time,9.1,99
6,PC,Diaries of a Spaceport Janitor,9.6,69
7,PSP,Z.H.P. Unlosing Ranger vs Darkdeath Evilman,9.7,81
8,PlayStation,Metal Gear Solid,9.2,94
9,PlayStation 2,GrimGrimoire,9.7,79


### Over the years


#### Which years had the most popular games released?


By both Metascore and user reviews, 1995 had the most popular games released. However, the data includes only one game from 1995, so this average is not reliable.

Although the ordering differs slightly, the top 5 years are all in the 1990s (**NB**: the data does not include games from 1990-1994 inclusive).

However, for games released in 2019-2021, Metascores were far higher than user reviews.


##### By Metascore


In [34]:
pd.read_sql_query(
    """SELECT STRFTIME('%Y', release_date) release_year,
                         ROUND(AVG(1.0 * meta_score), 2) meta_score_avg,
                         COUNT(meta_score) meta_score_count
                     FROM vg
                     GROUP BY 1
                     ORDER BY 2 DESC""",
    engine,
)


##### By user reviews


In [None]:
pd.read_sql_query(
    """SELECT STRFTIME('%Y', release_date) release_year,
                         ROUND(AVG(1.0 * user_review), 2) user_review_avg,
                         COUNT(user_review) user_review_count
                     FROM vg
                     WHERE user_review != 'tbd'
                     GROUP BY 1
                     ORDER BY 2 DESC""",
    engine,
)
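
Because a year with a single game can top the ranking, it can help to look at the count alongside the average and filter on it. The sketch below does this in `pandas` on made-up rows; `MIN_GAMES` is a hypothetical threshold chosen for illustration, not a value used in this analysis.

```python
import pandas as pd

# Illustrative rows only.
demo_df = pd.DataFrame(
    {
        "release_date": pd.to_datetime(["1995-04-30", "2020-03-20", "2020-11-12"]),
        "meta_score": [94, 90, 80],
    }
)

# Average score and number of games per release year.
by_year = (
    demo_df.assign(release_year=demo_df["release_date"].dt.year)
    .groupby("release_year")["meta_score"]
    .agg(["mean", "count"])
)

# Hypothetical threshold: ignore years with too few games to average.
MIN_GAMES = 2
reliable = by_year[by_year["count"] >= MIN_GAMES]
print(reliable)
```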


### Platforms


#### Which platforms had the most popular games?


As above, critics on average rate the latest games (on Xbox Series X and PlayStation 5) far higher than users do.

Both Metacritic and users rank the Nintendo 64 as having the most popular games. The Dreamcast and the original PlayStation also reach the top 5 on both metrics.


##### By Metascore


In [None]:
pd.read_sql_query(
    """SELECT platform,
                         ROUND(AVG(1.0 * meta_score), 2) meta_score_avg,
                         COUNT(meta_score) meta_score_count
                     FROM vg
                     GROUP BY 1
                     ORDER BY 2 DESC""",
    engine,
)


##### By user review


In [None]:
pd.read_sql_query(
    """SELECT platform,
                         ROUND(AVG(1.0 * user_review), 2) user_review_avg,
                         COUNT(user_review) user_review_count
                     FROM vg
                     WHERE user_review != 'tbd'
                     GROUP BY 1
                     ORDER BY 2 DESC""",
    engine,
)
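
Metascores in this data run from 0 to 100 while user reviews run from 0 to 10, so comparing the two is easier on a common scale. The sketch below is one way to quantify the critic-user gap per platform: it multiplies user scores by 10 and averages the difference. The rows are made up, and the scaling choice is an assumption rather than part of the analysis above.

```python
import pandas as pd

# Illustrative rows only.
demo_df = pd.DataFrame(
    {
        "platform": ["PlayStation 5", "PlayStation 5", "Nintendo 64"],
        "meta_score": [80, 90, 90],
        "user_review": ["6.0", "tbd", "9.0"],
    }
)

# Coerce 'tbd' to NaN, then put user scores on the 0-100 scale.
user_100 = pd.to_numeric(demo_df["user_review"], errors="coerce") * 10

# Positive gap: critics scored higher than users.
# Rows without a user score become NaN and drop out of the mean.
gap = (demo_df["meta_score"] - user_100).groupby(demo_df["platform"]).mean()
print(gap)
```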


#### Do _home_ video game consoles have more popular games than _handheld_? Do _PCs_ have more popular games than _home_ video game consoles?


By Metascore, handheld consoles have the lowest-rated games of all 4 categories, and PC games edge out home consoles.

By contrast, user review scores are highest for games on handheld consoles, with PC slightly lower than home consoles.


##### By Metascore


In [None]:
pd.read_sql_query(
    """SELECT 
                         CASE
                             WHEN platform IN ('3DS', 'DS', 'Game Boy Advance', 'PSP', 'PlayStation Vita')
                                 THEN 'Handheld'
                             WHEN platform = 'Switch'
                                 THEN 'Hybrid'
                             WHEN platform = 'PC' 
                                 THEN 'PC'
                             ELSE 'Home'
                             END platform_type,
                         ROUND(AVG(1.0 * meta_score), 2) meta_score_avg,
                         COUNT(meta_score) meta_score_count
                     FROM vg
                     GROUP BY 1
                     ORDER BY 2 DESC""",
    engine,
)


##### By user review


In [None]:
pd.read_sql_query(
    """SELECT 
                         CASE
                             WHEN platform IN ('3DS', 'DS', 'Game Boy Advance', 'PSP', 'PlayStation Vita') THEN 'Handheld'
                             WHEN platform = 'Switch' THEN 'Hybrid'
                             WHEN platform = 'PC' THEN 'PC'
                             ELSE 'Home'
                             END platform_type,
                         ROUND(AVG(1.0 * user_review), 2) user_review_avg,
                         COUNT(user_review) user_review_count
                     FROM vg
                     WHERE user_review != 'tbd'
                     GROUP BY 1
                     ORDER BY 2 DESC""",
    engine,
)
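
The `CASE` expression used in both queries can also be expressed as a lookup table in `pandas`. The sketch below mirrors the same grouping with a dictionary (`PLATFORM_TYPES` is a name introduced here for illustration). As in the SQL, any platform not listed falls through to `Home`, which in this data includes Stadia.

```python
import pandas as pd

# Same grouping as the CASE expression above; unlisted platforms become 'Home'.
PLATFORM_TYPES = {
    "3DS": "Handheld",
    "DS": "Handheld",
    "Game Boy Advance": "Handheld",
    "PSP": "Handheld",
    "PlayStation Vita": "Handheld",
    "Switch": "Hybrid",
    "PC": "PC",
}

# A few platform labels taken from the list of unique platforms.
platforms = pd.Series(["DS", "Switch", "PC", "Xbox 360", "Stadia"])

# map() leaves unlisted platforms as NaN; fillna() plays the role of ELSE.
platform_type = platforms.map(PLATFORM_TYPES).fillna("Home")
print(platform_type.tolist())
```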
