# Video Games Review: Analysis

In this project, I explore [Metacritic's video game reviews](https://www.metacritic.com/browse/games/score/metascore/all/all/filtered?page=0) for games released between 1995 and 2021.

Please see the [README](https://github.com/henrylin03/video-games) for more information.


## Setup

In this section, I import all necessary libraries and set up an in-memory SQLite database. I then load the input `.csv` files as DataFrames and write them to SQLite tables, so they can be manipulated with both SQL and `pandas`:


In [1]:
# import necessary libraries
import os
import pandas as pd
from sqlalchemy import create_engine

# create an in-memory SQLite database ("sqlite://" has no file path, so nothing is written to disk)
engine = create_engine("sqlite://", echo=False)

# import CSV as DataFrames
INPUT_PATH = "./input"
meta_df = pd.read_csv(os.path.join(INPUT_PATH, "meta.csv"))
user_df = pd.read_csv(os.path.join(INPUT_PATH, "user.csv"))


In [2]:
meta_df.to_sql("meta", engine, if_exists="replace", index=False)
pd.read_sql_query("SELECT * FROM meta LIMIT 3", engine)


|   | meta_score | meta_rank | name | platform | release_date | summary |
|---|---|---|---|---|---|---|
| 0 | 99 | 1.0 | The Legend of Zelda: Ocarina of Time | Platform: Nintendo 64 | November 23, 1998 | As a young boy, Link is tricked by Ganondorf, ... |
| 1 | 98 | 2.0 | Tony Hawk's Pro Skater 2 | Platform: PlayStation | September 20, 2000 | As most major publishers' development efforts ... |
| 2 | 98 | 3.0 | Grand Theft Auto IV | Platform: PlayStation 3 | April 29, 2008 | [Metacritic's 2008 PS3 Game of the Year; Also ... |


In [3]:
user_df.to_sql("user", engine, if_exists="replace", index=False)
pd.read_sql_query("SELECT * FROM user LIMIT 3", engine)


|   | user_score | user_rank | name | platform | release_date | summary |
|---|---|---|---|---|---|---|
| 0 | 9.7 | 1.0 | Ghost Trick: Phantom Detective | Platform: DS | January 11, 2011 | Ghost Trick is a story of mystery and intrigue... |
| 1 | 9.7 | 2.0 | Z.H.P. Unlosing Ranger vs Darkdeath Evilman | Platform: PSP | October 25, 2010 | Known as ZettaiHero Keikakuin Japan, Z.H.P. is... |
| 2 | 9.6 | 3.0 | Superliminal | Platform: Xbox One | July 7, 2020 | Perception is reality. In this mind-bending fi... |


## Cleaning

Before analysis, the following steps should be taken:

1. Merge the two tables. If a game is missing a `meta_score` or a `user_score`, that value is `NULL`.
2. Ensure correct data types: `release_date` should be a datetime, and the rank columns should be integers rather than floats.
3. Clean the `platform` column by removing the `"Platform: "` prefix.
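Steps 2 and 3 can be tried out on a small hypothetical frame before they are applied to the real data. The rows below are invented for illustration, and step 1 (the merge) is omitted. The sketch shows the three transformations and the resulting dtypes:

```python
import pandas as pd

# Hypothetical two-row sample shaped like the merged data (not real rows)
toy = pd.DataFrame(
    {
        "name": ["Game A", "Game B"],
        "platform": ["Platform: Nintendo 64", "Platform: PC"],
        "release_date": ["November 23, 1998", "July 7, 2020"],
        "meta_rank": [1.0, None],
    }
)

# Step 2: parse "Month day, year" strings into datetime64 values
toy["release_date"] = pd.to_datetime(toy["release_date"], format="%B %d, %Y")

# Step 2: whole-number floats -> nullable integers (the missing rank stays <NA>)
toy["meta_rank"] = toy["meta_rank"].astype("Int64")

# Step 3: drop the "Platform: " prefix
toy["platform"] = toy["platform"].str.replace("Platform: ", "", regex=False)

print(toy.dtypes)
```

The nullable `Int64` dtype is used instead of plain `int` because a NumPy integer column cannot hold missing values.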


### Merging Tables

Some games have a Metascore but no User Score because they have too few reviews (and vice versa). To keep every game from both tables, we need a full outer join.

Here, we use the `pandas.DataFrame.merge()` method, because SQLite versions before 3.39.0 do not support `FULL OUTER JOIN` _([SQLite Tutorial](https://www.sqlitetutorial.net/sqlite-full-outer-join/))_.
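SQLite 3.39.0 added native `RIGHT` and `FULL OUTER JOIN`. On older versions, the usual workaround is to combine a `LEFT JOIN` with the unmatched rows from the other side using `UNION ALL`. The following is a minimal sketch using Python's built-in `sqlite3` module. The two toy tables and their rows are hypothetical stand-ins for `meta` and `user`:

```python
import sqlite3

# In-memory database with two toy tables (hypothetical rows)
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE meta (name TEXT, platform TEXT, meta_score INTEGER);
    CREATE TABLE user (name TEXT, platform TEXT, user_score REAL);
    INSERT INTO meta VALUES ('A', 'PC', 90), ('B', 'PC', 80);
    INSERT INTO user VALUES ('B', 'PC', 7.5), ('C', 'PC', 6.0);
    """
)

# FULL OUTER JOIN emulated as: every meta row (matched or not) ...
# ... plus the user rows that have no meta counterpart
rows = con.execute(
    """
    SELECT m.name, m.platform, m.meta_score, u.user_score
    FROM meta m
    LEFT JOIN user u ON m.name = u.name AND m.platform = u.platform
    UNION ALL
    SELECT u.name, u.platform, NULL, u.user_score
    FROM user u
    LEFT JOIN meta m ON m.name = u.name AND m.platform = u.platform
    WHERE m.name IS NULL
    ORDER BY 1
    """
).fetchall()
print(rows)
```

`UNION ALL` is used instead of `UNION` because the second query only returns rows that the first one did not. There are no duplicates to remove, so the extra de-duplication work is unnecessary.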


In [29]:
merged_df = meta_df.merge(user_df, how="outer", on=["name", "platform", "release_date"])

merged_df.head(3)


|   | meta_score | meta_rank | name | platform | release_date | summary_x | user_score | user_rank | summary_y |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 99 | 1.0 | The Legend of Zelda: Ocarina of Time | Platform: Nintendo 64 | November 23, 1998 | As a young boy, Link is tricked by Ganondorf, ... | 9.1 | 84.0 | As a young boy, Link is tricked by Ganondorf, ... |
| 1 | 98 | 2.0 | Tony Hawk's Pro Skater 2 | Platform: PlayStation | September 20, 2000 | As most major publishers' development efforts ... | 7.4 | 8651.0 | As most major publishers' development efforts ... |
| 2 | 98 | 3.0 | Grand Theft Auto IV | Platform: PlayStation 3 | April 29, 2008 | [Metacritic's 2008 PS3 Game of the Year; Also ... | 7.8 | 5567.0 | [Metacritic's 2008 PS3 Game of the Year; Also ... |


In [30]:
# check whether either summary column holds values the other lacks
x_only = merged_df.summary_x.notna() & merged_df.summary_y.isna()
y_only = merged_df.summary_y.notna() & merged_df.summary_x.isna()
if x_only.any():
    print('There are rows with a "summary" in column "summary_x" and not "summary_y"')
if y_only.any():
    print('There are rows with a "summary" in column "summary_y" and not "summary_x"')

# only the first message is printed, so summary_y never adds a summary
# that summary_x lacks; we can safely drop summary_y
merged_df = merged_df.drop(columns=["summary_y"]).rename(
    columns={"summary_x": "summary"}
)
merged_df.head(3)


There are rows with a "summary" in column "summary_x" and not "summary_y"


|   | meta_score | meta_rank | name | platform | release_date | summary | user_score | user_rank |
|---|---|---|---|---|---|---|---|---|
| 0 | 99 | 1.0 | The Legend of Zelda: Ocarina of Time | Platform: Nintendo 64 | November 23, 1998 | As a young boy, Link is tricked by Ganondorf, ... | 9.1 | 84.0 |
| 1 | 98 | 2.0 | Tony Hawk's Pro Skater 2 | Platform: PlayStation | September 20, 2000 | As most major publishers' development efforts ... | 7.4 | 8651.0 |
| 2 | 98 | 3.0 | Grand Theft Auto IV | Platform: PlayStation 3 | April 29, 2008 | [Metacritic's 2008 PS3 Game of the Year; Also ... | 7.8 | 5567.0 |
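Dropping `summary_y` is safe only because the check found no row where `summary_y` holds a summary that `summary_x` lacks. A more defensive option is to coalesce the two columns, which would stay correct if a future version of the data broke that assumption. This is a minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows mimicking the two summary columns produced by the merge
df = pd.DataFrame(
    {
        "summary_x": ["Meta summary", None, None],
        "summary_y": ["User summary", "Only in user", None],
    }
)

# Keep summary_x where present, otherwise fall back to summary_y
df["summary"] = df["summary_x"].fillna(df["summary_y"])
df = df.drop(columns=["summary_x", "summary_y"])
print(df["summary"].tolist())
```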


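### Ensuring correct data types

`release_date` is stored as text such as "November 23, 1998", so we parse it into a datetime using the matching format string. The rank columns are stored as floats (e.g. `1.0`) and can contain missing values after the outer join, so we cast them to pandas' nullable `Int64` type.

In [None]:
vg_df = merged_df.copy()

# parse dates such as "November 23, 1998" into datetime64 values;
# the queries below extract the year from this column with STRFTIME('%Y', ...)
vg_df["release_date_formatted"] = pd.to_datetime(vg_df.release_date, format="%B %d, %Y")

# whole-number floats -> nullable integers (missing ranks stay <NA>)
vg_df[["meta_rank", "user_rank"]] = vg_df[["meta_rank", "user_rank"]].astype("Int64")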


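### Removing the `"Platform: "` prefix from `platform`

In [None]:
# strip the "Platform: " prefix and any surrounding whitespace
vg_df["platform"] = vg_df.platform.str.replace("Platform: ", "", regex=False).str.strip()

print(vg_df.platform.unique())

In [None]:
# store the cleaned data as the `vg` table queried below;
# the queries refer to the user score as `user_review`
vg_df.rename(columns={"user_score": "user_review"}).to_sql(
    "vg", engine, if_exists="replace", index=False
)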


## Insights


### Games


#### What are the 10 most popular games?


Because each row represents one game on one platform, we `GROUP BY` the game name and average its scores across platforms.
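The same `GROUP BY` plus `AVG()` aggregation has a direct pandas counterpart. In this minimal sketch the rows are hypothetical. Each platform release counts equally, just as in the SQL, so a game with one weak port is pulled down by it:

```python
import pandas as pd

# Hypothetical rows: one game released on two platforms, another on one
scores = pd.DataFrame(
    {
        "name": ["Game A", "Game A", "Game B"],
        "platform": ["PC", "Switch", "PC"],
        "meta_score": [90, 80, 88],
    }
)

# Equivalent of: SELECT name, AVG(meta_score) FROM ... GROUP BY name ORDER BY 2 DESC
avg_by_game = (
    scores.groupby("name", as_index=False)["meta_score"]
    .mean()
    .sort_values("meta_score", ascending=False)
)
print(avg_by_game)
```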


##### By Metascore

A Metascore is a weighted average of the scores given by critics whose reviews Metacritic selects and curates [(source)](https://www.metacritic.com/about-metascores).


In [None]:
pd.read_sql_query(
    """SELECT name,
                         summary,
                         AVG(1.0 * meta_score) meta_score_avg
                     FROM vg
                     GROUP BY 1
                     ORDER BY 3 DESC, 1
                     LIMIT 10""",
    engine,
)


##### By user reviews


In [None]:
pd.read_sql_query(
    """SELECT name,
                         summary,
                         ROUND(AVG(1.0 * user_review), 1) user_review_avg,
                         AVG(1.0 * meta_score) meta_score_avg
                     FROM vg
                     WHERE user_review IS NOT NULL
                     GROUP BY 1
                     ORDER BY 3 DESC, 4 DESC, 1
                     LIMIT 10""",
    engine,
)


#### What are the 10 least popular games?


##### By Metascore


In [None]:
pd.read_sql_query(
    """SELECT name,
                         summary,
                         AVG(1.0 * meta_score) meta_score_avg,
                         ROUND(AVG(1.0 * user_review), 2) user_review_avg
                     FROM vg
                     WHERE meta_score IS NOT NULL
                     GROUP BY 1
                     ORDER BY 3, 4, 1
                     LIMIT 10""",
    engine,
)
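SQLite treats `NULL` as smaller than any other value. In an ascending `ORDER BY`, games with no Metascore (whose average is `NULL` after the outer join) therefore come first and would be reported as the "least popular". A query like the one above needs a `WHERE meta_score IS NOT NULL` filter to avoid this. This is a minimal sketch with hypothetical rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE vg (name TEXT, meta_score INTEGER)")
# Hypothetical rows: 'C' has no Metascore, as happens after the outer join
con.executemany("INSERT INTO vg VALUES (?, ?)", [("A", 40), ("B", 75), ("C", None)])

# Without a filter, the NULL average sorts first in ascending order
unfiltered = con.execute(
    "SELECT name, AVG(1.0 * meta_score) FROM vg GROUP BY 1 ORDER BY 2, 1"
).fetchall()

# Filtering out missing scores keeps only games that were actually rated
filtered = con.execute(
    "SELECT name, AVG(1.0 * meta_score) FROM vg "
    "WHERE meta_score IS NOT NULL GROUP BY 1 ORDER BY 2, 1"
).fetchall()

print(unfiltered)
print(filtered)
```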


##### By user review


In [None]:
pd.read_sql_query(
    """SELECT name,
                         summary,
                         ROUND(AVG(user_review), 2) user_review_avg
                     FROM vg
                     WHERE user_review IS NOT NULL
                     GROUP BY 1
                     ORDER BY 3, 1
                     LIMIT 10""",
    engine,
)


#### What is the top game for each platform?


##### By Metascore


In [None]:
pd.read_sql_query(
    """SELECT platform, 
                         name,
                         MAX(meta_score) meta_score_max, 
                         user_review
                     FROM vg
                     GROUP BY 1""",
    engine,
)


##### By user review


In [None]:
pd.read_sql_query(
    """SELECT platform, 
                         name, 
                         MAX(1.0 * user_review) user_review_max,
                         meta_score
                     FROM vg
                     WHERE user_review IS NOT NULL
                     GROUP BY 1""",
    engine,
)
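The two per-platform queries above rely on a SQLite-specific behaviour. When a query contains a single `MAX()` aggregate, bare columns such as `name` are taken from the row holding the maximum. If several games tie for the top score, only one of them is returned, and which one is not specified.

A sketch using the `RANK()` window function (available since SQLite 3.25.0) keeps every tied game instead. The table and rows here are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE vg (platform TEXT, name TEXT, meta_score INTEGER)")
# Hypothetical rows: two PC games tie for the top Metascore
con.executemany(
    "INSERT INTO vg VALUES (?, ?, ?)",
    [("PC", "A", 96), ("PC", "B", 96), ("PC", "C", 90), ("DS", "D", 91)],
)

# RANK() gives tied games the same rank, so every top game per platform is kept
top_per_platform = con.execute(
    """
    SELECT platform, name, meta_score
    FROM (
        SELECT platform, name, meta_score,
               RANK() OVER (PARTITION BY platform ORDER BY meta_score DESC) AS rnk
        FROM vg
        WHERE meta_score IS NOT NULL
    )
    WHERE rnk = 1
    ORDER BY platform, name
    """
).fetchall()
print(top_per_platform)
```

Swapping `RANK()` for `ROW_NUMBER()` would return exactly one game per platform again, but with a deterministic tie-break if extra `ORDER BY` keys are added inside `OVER (...)`.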


### Over the years


#### Which years had the most popular games released?


By both Metascore and user reviews, 1995 had the most popular games. However, the data includes only one game from 1995, so that year's average rests on a single score and is unreliable.

Although the ordering differs slightly between the two metrics, the top 5 years are all in the 1990s (**NB**: the data does not include games from 1990-1994 inclusive).

In contrast, for games released in 2019-2021, Metascores were far higher than user reviews.
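A single game is a very small sample. One way to make the yearly ranking more robust is to ignore years with fewer than some minimum number of scored games, using `HAVING`. The sketch below makes several simplifying assumptions:

- The rows are hypothetical.
- The threshold `MIN_GAMES` is a hypothetical value, not one taken from the analysis.
- The year is stored directly instead of being derived with `STRFTIME`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE vg (release_year TEXT, meta_score INTEGER)")
# Hypothetical rows: 1995 has a single game, 1998 has three
con.executemany(
    "INSERT INTO vg VALUES (?, ?)",
    [("1995", 95), ("1998", 90), ("1998", 80), ("1998", 70)],
)

MIN_GAMES = 2  # hypothetical threshold; choose one that suits the data

yearly = con.execute(
    """
    SELECT release_year,
           ROUND(AVG(1.0 * meta_score), 2) AS meta_score_avg,
           COUNT(meta_score) AS meta_score_count
    FROM vg
    GROUP BY 1
    HAVING COUNT(meta_score) >= ?
    ORDER BY 2 DESC
    """,
    (MIN_GAMES,),
).fetchall()
print(yearly)
```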


##### By Metascore


In [None]:
pd.read_sql_query(
    """SELECT STRFTIME('%Y', release_date_formatted) release_year,
                         ROUND(AVG(1.0 * meta_score), 2) meta_score_avg,
                         COUNT(meta_score) meta_score_count
                     FROM vg
                     GROUP BY 1
                     ORDER BY 2 DESC""",
    engine,
)


##### By user reviews


In [None]:
pd.read_sql_query(
    """SELECT STRFTIME('%Y', release_date_formatted) release_year,
                         ROUND(AVG(1.0 * user_review), 2) user_review_avg,
                         COUNT(user_review) user_review_count
                     FROM vg
                     WHERE user_review IS NOT NULL
                     GROUP BY 1
                     ORDER BY 2 DESC""",
    engine,
)


### Platforms


#### Which platforms had the most popular games?


As with release years, critics' Metascores rate games on the latest consoles (Xbox Series X and PlayStation 5) far higher on average than users do.

Both Metascores and user reviews rank the Nintendo 64 as the platform with the most popular games. The Dreamcast and PlayStation 1 also reach the top 5 on both metrics.
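Metascores and user scores sit on different scales. The previews above show Metascores out of 100 (e.g. `99`) and user scores out of 10 (e.g. `9.7`), so a "far higher" comparison is easiest to quantify after dividing the Metascore by 10. This is a minimal sketch that computes a per-platform critic-minus-user gap over games that have both scores. The platform names and rows are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE vg (platform TEXT, meta_score INTEGER, user_review REAL)")
# Hypothetical rows; Metascores are out of 100, user scores out of 10
con.executemany(
    "INSERT INTO vg VALUES (?, ?, ?)",
    [("Console X", 85, 6.5), ("Console X", 75, 5.5), ("Console Y", 80, 8.2)],
)

# Rescale the Metascore to 0-10 and restrict to games with both scores
gaps = con.execute(
    """
    SELECT platform,
           ROUND(AVG(1.0 * meta_score) / 10 - AVG(user_review), 2) AS critic_minus_user
    FROM vg
    WHERE meta_score IS NOT NULL AND user_review IS NOT NULL
    GROUP BY 1
    ORDER BY 2 DESC
    """
).fetchall()
print(gaps)
```

A positive gap means critics rated that platform's games higher than users did.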


##### By Metascore


In [None]:
pd.read_sql_query(
    """SELECT platform,
                         ROUND(AVG(1.0 * meta_score), 2) meta_score_avg,
                         COUNT(meta_score) meta_score_count
                     FROM vg
                     GROUP BY 1
                     ORDER BY 2 DESC""",
    engine,
)


##### By user review


In [None]:
pd.read_sql_query(
    """SELECT platform,
                         ROUND(AVG(1.0 * user_review), 2) user_review_avg,
                         COUNT(user_review) user_review_count
                     FROM vg
                     WHERE user_review IS NOT NULL
                     GROUP BY 1
                     ORDER BY 2 DESC""",
    engine,
)


#### Do _home_ video game consoles have more popular games than _handheld_? Do _PCs_ have more popular games than _home_ video game consoles?


By Metascore, handheld consoles have the lowest-rated games of the four categories, and PC games narrowly edge out home consoles.

In contrast, by user reviews, games on handheld consoles are rated highest, with PC slightly below home consoles.
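Both of the following queries bucket platforms with the same `CASE` expression. If the categories are also needed on the pandas side, a plain Python function can mirror it. This sketch makes two assumptions:

- The platform names have already had their `"Platform: "` prefix removed.
- `HANDHELD` and `platform_type` are hypothetical names, not part of the original notebook.

```python
# Mirrors the CASE expression used in the platform-type queries
HANDHELD = {"3DS", "DS", "Game Boy Advance", "PSP", "PlayStation Vita"}


def platform_type(platform: str) -> str:
    """Classify a cleaned platform name into one of four categories."""
    if platform in HANDHELD:
        return "Handheld"
    if platform == "Switch":
        return "Hybrid"
    if platform == "PC":
        return "PC"
    return "Home"  # everything else is treated as a home console


print([platform_type(p) for p in ["DS", "Switch", "PC", "Nintendo 64"]])
```

With the cleaned DataFrame, `vg_df.platform.map(platform_type)` would add the category as a column that can then be passed to `groupby()`.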


##### By Metascore


In [None]:
pd.read_sql_query(
    """SELECT 
                         CASE
                             WHEN platform IN ('3DS', 'DS', 'Game Boy Advance', 'PSP', 'PlayStation Vita')
                                 THEN 'Handheld'
                             WHEN platform = 'Switch'
                                 THEN 'Hybrid'
                             WHEN platform = 'PC' 
                                 THEN 'PC'
                             ELSE 'Home'
                             END platform_type,
                         ROUND(AVG(1.0 * meta_score), 2) meta_score_avg,
                         COUNT(meta_score) meta_score_count
                     FROM vg
                     GROUP BY 1
                     ORDER BY 2 DESC""",
    engine,
)


##### By user review


In [None]:
pd.read_sql_query(
    """SELECT 
                         CASE
                             WHEN platform IN ('3DS', 'DS', 'Game Boy Advance', 'PSP', 'PlayStation Vita') THEN 'Handheld'
                             WHEN platform = 'Switch' THEN 'Hybrid'
                             WHEN platform = 'PC' THEN 'PC'
                             ELSE 'Home'
                             END platform_type,
                         ROUND(AVG(1.0 * user_review), 2) user_review_avg,
                         COUNT(user_review) user_review_count
                     FROM vg
                     WHERE user_review IS NOT NULL
                     GROUP BY 1
                     ORDER BY 2 DESC""",
    engine,
)
