# Nintendo Game Ratings: Association Rule Mining

This notebook is the second part of the Nintendo-Game-Ratings project and covers association rule mining (ARM) for all games in the Nintendo dataset. Section 1 consists mostly of data cleanup, so if you want to get straight to the results, skip to Section 2.

### Section 1: Dataset Cleanup

Closer inspection of this dataset's ARM-relevant features reveals a few minor issues: repeated words in genre entries, missing values, and developer names erroneously split across list entries. Examples below.

In [1]:
# import of csv
import pandas as pd

df_ratings = pd.read_csv('../nintendo_ratings.csv')
df_ratings.head()

Unnamed: 0,meta_score,title,platform,date,user_score,link,esrb_rating,developers,genres
0,,Super Mario 3D World + Bowser's Fury,Switch,"Feb 12, 2021",,/game/switch/super-mario-3d-world-+-bowsers-fury,,['Nintendo'],"['Action', 'Platformer', '3D']"
1,,Super Smash Bros. Ultimate: Sephiroth,Switch,"Dec 22, 2020",,/game/switch/super-smash-bros-ultimate-sephiroth,,['Nintendo'],"['Action', '2D', 'Fighting']"
2,66.0,Fitness Boxing 2: Rhythm & Exercise,Switch,"Dec 4, 2020",6.2,/game/switch/fitness-boxing-2-rhythm-exercise,E,"['Nintendo', ' Imagineer Co.', 'Ltd.']","['Miscellaneous', 'Exercise / Fitness']"
3,63.0,Fire Emblem: Shadow Dragon & the Blade of Light,Switch,"Dec 4, 2020",7.6,/game/switch/fire-emblem-shadow-dragon-the-bla...,E,['Intelligent Systems'],"['Strategy', 'Turn-Based', 'Tactics']"
4,79.0,Hyrule Warriors: Age of Calamity,Switch,"Nov 20, 2020",8.1,/game/switch/hyrule-warriors-age-of-calamity,T,"['Omega Force', ' Koei Tecmo Games']","['Action', ""Beat-'Em-Up"", '3D']"


In [2]:
# repeated genres
print(f'Title: {df_ratings.loc[302, "title"]}\nGenres: {df_ratings.loc[302, "genres"]}')

Title: Super Smash Bros. for Wii U
Genres: ['Action', 'Fighting', 'Fighting', '3D', '2D', '2D', '3D']


Note 'Fighting', '2D', and '3D' all appear twice in this list.
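
To gauge how widespread this repetition is, a check along the following lines could flag entries with repeated genres. This is only a sketch: `has_repeats` is a hypothetical helper (not part of the original notebook), and the two sample strings are copied from the outputs above. It assumes the `genres` column holds stringified Python lists, as the printed values suggest.

```python
import ast

def has_repeats(raw_genres):
    """Return True if a stringified genre list contains any repeated entry."""
    genres = ast.literal_eval(raw_genres)  # safely parse "['Action', ...]" into a list
    return len(genres) != len(set(genres))

# Sample values copied from the notebook outputs above
samples = [
    "['Action', 'Platformer', '3D']",
    "['Action', 'Fighting', 'Fighting', '3D', '2D', '2D', '3D']",
]
print([has_repeats(s) for s in samples])
```

Applied to the full frame, something like `df_ratings.genres.apply(has_repeats).sum()` should count the affected rows.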

In [16]:
# missing values
print(f'Missing counts:\n     ESRB Ratings: {df_ratings.esrb_rating.isnull().sum()}\n     Developers: {df_ratings.developers.isnull().sum()}\n     Genres: {df_ratings.genres.isnull().sum()}')

Missing counts:
     ESRB Ratings: 106
     Developers: 0
     Genres: 0


Rows missing developers and/or genres will be dropped, while missing ESRB ratings will be left in place.

In [4]:
# split developer names
print(f'Title: {df_ratings.loc[2, "title"]}\nDevelopers: {df_ratings.loc[2, "developers"]}')

Title: Fitness Boxing 2: Rhythm & Exercise
Developers: ['Nintendo', ' Imagineer Co.', 'Ltd.']


This list should only have two items, 'Nintendo' and 'Imagineer Co. Ltd.'; the last entry was erroneously split.
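
One possible way to repair such entries is to re-attach stray corporate suffixes to the name before them. The sketch below is an assumption about how this could be done, not the cleanup actually used for this project: `merge_split_developers` and the `SUFFIXES` set are hypothetical and would need extending to match whatever fragments appear in the data.

```python
import ast

# Hypothetical set of corporate suffixes that should never stand alone
SUFFIXES = {"Ltd.", "Ltd", "Inc.", "Inc", "LLC"}

def merge_split_developers(raw_developers):
    """Re-attach stray suffix fragments to the developer name preceding them."""
    names = [name.strip() for name in ast.literal_eval(raw_developers)]
    merged = []
    for name in names:
        if name in SUFFIXES and merged:
            # Fragment such as 'Ltd.' belongs to the previous developer
            merged[-1] = f"{merged[-1]} {name}"
        else:
            merged.append(name)
    return merged

print(merge_split_developers("['Nintendo', ' Imagineer Co.', 'Ltd.']"))
# ['Nintendo', 'Imagineer Co. Ltd.']
```

Stripping whitespace in the same pass also removes the leading spaces seen in entries such as `' Imagineer Co.'`.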

In [5]:
# dropping missing developers / genres
temp = df_ratings.shape
df_ratings.dropna(axis = 0, how = 'any', subset = ['developers', 'genres'], inplace = True)
df_ratings.reset_index(drop = True, inplace = True)
print(f'Old shape: {temp}\nNew shape: {df_ratings.shape}\nDropped rows: {temp[0] - df_ratings.shape[0]}')

Old shape: (1026, 9)
New shape: (1014, 9)
Dropped rows: 12


In [6]:
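# de-duping genres; ast.literal_eval parses the stringified lists safely, dict.fromkeys keeps the original order
import ast
df_ratings.genres = df_ratings.genres.apply(lambda g: str(list(dict.fromkeys(ast.literal_eval(g)))))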

In [9]:
# importing cleaned df
df_ratings = pd.read_csv('../ratings_clean.csv')


### Section 2: Association Rule Mining