# Recommendation System
# Game Recommendation with *Content-Based Filtering*

*By: [Rifqi Novandi](https://github.com/rifqinvnd)*

## Background
In this machine learning project, we build a recommendation system that suggests games a user is likely to enjoy, based on other games with similar attributes. The approach is *content-based filtering*, using features such as platform, release year, genre, and so on.
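
To make the idea concrete, here is a minimal, dependency-free sketch of content-based filtering with cosine similarity. The five-element feature vectors and the title "Some DS Racer" are made up for illustration; the helper names `cosine` and `recommend` are hypothetical and not part of this notebook.

```python
import math

# Hypothetical feature vectors: [is_Wii, is_DS, is_Sports, is_Racing, scaled_user_score]
games = {
    "Wii Sports":        [1, 0, 1, 0, 0.82],
    "Wii Sports Resort": [1, 0, 1, 0, 0.82],
    "Mario Kart Wii":    [1, 0, 0, 1, 0.86],
    "Some DS Racer":     [0, 1, 0, 1, 0.50],  # made-up title for illustration
}

def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recommend(title, k=2):
    """Return the k games whose feature vectors are most similar to `title`."""
    query = games[title]
    scores = [(other, cosine(query, vec)) for other, vec in games.items() if other != title]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:k]]

print(recommend("Wii Sports"))
```

Games that share a platform and genre with the query rank first because their vectors point in nearly the same direction.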

## 1. Installing and importing the required libraries

In [None]:
# install the required libraries
!pip install -U scikit-learn

Collecting scikit-learn
  Downloading scikit_learn-1.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (23.1 MB)
     |████████████████████████████████| 23.1 MB 1.5 MB/s 
Collecting threadpoolctl>=2.0.0
  Downloading threadpoolctl-3.0.0-py3-none-any.whl (14 kB)
Installing collected packages: threadpoolctl, scikit-learn
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 0.22.2.post1
    Uninstalling scikit-learn-0.22.2.post1:
      Successfully uninstalled scikit-learn-0.22.2.post1
Successfully installed scikit-learn-1.0 threadpoolctl-3.0.0


In [1]:
# os and zipfile are used to prepare the dataset
import os
import zipfile

# libraries for data processing
import pandas as pd
import numpy as np
from collections import Counter
from sklearn.preprocessing import MinMaxScaler

# libraries for building the recommendation model
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity

# metrics for evaluating the recommendation system
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

## 2. Preparing the Dataset

### 2.1 Setting up the Kaggle account username and key

In [2]:
# set the Kaggle credentials as environment variables
os.environ['KAGGLE_USERNAME'] = 'rifqinovandi'
os.environ['KAGGLE_KEY'] = '03877f2b4798e3c6def8f76fd42e3070'
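
Hard-coding an API key in a shared notebook exposes it to anyone who can read the file. One possible alternative, sketched here under the assumption that a `kaggle.json` token file has been downloaded from the Kaggle account page, is to read the credentials from that file at runtime. The helper name `load_kaggle_credentials` is hypothetical.

```python
import json
import os
from pathlib import Path

def load_kaggle_credentials(path="~/.kaggle/kaggle.json"):
    """Copy the username and key from a kaggle.json token file into the environment.

    Assumes the file holds a JSON object with "username" and "key" fields.
    """
    token_path = Path(path).expanduser()
    with token_path.open() as handle:
        token = json.load(handle)
    os.environ["KAGGLE_USERNAME"] = token["username"]
    os.environ["KAGGLE_KEY"] = token["key"]
    return token["username"]
```

Calling `load_kaggle_credentials()` before the download step keeps the key out of the notebook itself.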

### 2.2 Downloading and preparing the dataset

In [3]:
# download the dataset with the Kaggle CLI
!kaggle datasets download -d rush4ratio/video-game-sales-with-ratings

Downloading video-game-sales-with-ratings.zip to /content
  0% 0.00/476k [00:00<?, ?B/s]
100% 476k/476k [00:00<00:00, 32.2MB/s]


In [4]:
# extract the zip archive into /content
files = "/content/video-game-sales-with-ratings.zip"
with zipfile.ZipFile(files, 'r') as archive:
    archive.extractall('/content')

## 3. Data Understanding

### 3.1 Reading the data into a pandas DataFrame

In [5]:
df = pd.read_csv(files)
df.head()

Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
0,Wii Sports,Wii,2006.0,Sports,Nintendo,41.36,28.96,3.77,8.45,82.53,76.0,51.0,8.0,322.0,Nintendo,E
1,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24,,,,,,
2,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.68,12.76,3.79,3.29,35.52,82.0,73.0,8.3,709.0,Nintendo,E
3,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.61,10.93,3.28,2.95,32.77,80.0,73.0,8.0,192.0,Nintendo,E
4,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37,,,,,,


### 3.2 Understanding the dataset as a whole

In [6]:
# check the shape of the DataFrame
df.shape

(16719, 16)

In [7]:
# view column types and non-null counts
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16719 entries, 0 to 16718
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             16717 non-null  object 
 1   Platform         16719 non-null  object 
 2   Year_of_Release  16450 non-null  float64
 3   Genre            16717 non-null  object 
 4   Publisher        16665 non-null  object 
 5   NA_Sales         16719 non-null  float64
 6   EU_Sales         16719 non-null  float64
 7   JP_Sales         16719 non-null  float64
 8   Other_Sales      16719 non-null  float64
 9   Global_Sales     16719 non-null  float64
 10  Critic_Score     8137 non-null   float64
 11  Critic_Count     8137 non-null   float64
 12  User_Score       10015 non-null  object 
 13  User_Count       7590 non-null   float64
 14  Developer        10096 non-null  object 
 15  Rating           9950 non-null   object 
dtypes: float64(9), object(7)
memory usage: 2.0+ MB


In [8]:
# count the missing values in each column
df.isna().sum()

Name                  2
Platform              0
Year_of_Release     269
Genre                 2
Publisher            54
NA_Sales              0
EU_Sales              0
JP_Sales              0
Other_Sales           0
Global_Sales          0
Critic_Score       8582
Critic_Count       8582
User_Score         6704
User_Count         9129
Developer          6623
Rating             6769
dtype: int64
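
The raw counts are easier to judge as a share of the 16,719 rows. The sketch below redoes that arithmetic in plain Python using the counts printed above; on the DataFrame itself, `df.isna().mean()` would give the same fractions.

```python
# Missing-value counts copied from the df.isna().sum() output above
total_rows = 16719
missing = {
    "Critic_Score": 8582,
    "Critic_Count": 8582,
    "User_Score": 6704,
    "User_Count": 9129,
    "Developer": 6623,
    "Rating": 6769,
}

# Share of rows with a missing value, as a percentage rounded to one decimal
missing_pct = {col: round(100 * n / total_rows, 1) for col, n in missing.items()}

# Print the columns from most to least incomplete
for col, pct in sorted(missing_pct.items(), key=lambda item: item[1], reverse=True):
    print(f"{col:<13} {pct:5.1f}%")
```

More than half of the rows have no critic score or user count, which motivates dropping those columns in the next section.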

In [9]:
# summary statistics for the numeric columns
df.describe()

Unnamed: 0,Year_of_Release,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Count
count,16450.0,16719.0,16719.0,16719.0,16719.0,16719.0,8137.0,8137.0,7590.0
mean,2006.487356,0.26333,0.145025,0.077602,0.047332,0.533543,68.967679,26.360821,162.229908
std,5.878995,0.813514,0.503283,0.308818,0.18671,1.547935,13.938165,18.980495,561.282326
min,1980.0,0.0,0.0,0.0,0.0,0.01,13.0,3.0,4.0
25%,2003.0,0.0,0.0,0.0,0.0,0.06,60.0,12.0,10.0
50%,2007.0,0.08,0.02,0.0,0.01,0.17,71.0,21.0,24.0
75%,2010.0,0.24,0.11,0.04,0.03,0.47,79.0,36.0,81.0
max,2020.0,41.36,28.96,10.22,10.57,82.53,98.0,113.0,10665.0


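## 4. Data Preparation

### 4.1 Dropping columns with many missing values

In [10]:
# drop columns with many missing values (Critic_Score, Critic_Count, User_Count);
# Global_Sales has no missing values but is dropped too, presumably because it
# overlaps with the regional sales columns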
df.drop(['Global_Sales', 'Critic_Score', 'Critic_Count', 'User_Count'], axis=1, inplace=True)

### 4.2 Cleaning each column

#### 4.2.1 Name column

In [11]:
# inspect the rows with a missing Name
df[df['Name'].isna()]

Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,User_Score,Developer,Rating
659,,GEN,1993.0,,Acclaim Entertainment,1.78,0.53,0.0,0.08,,,
14246,,GEN,1993.0,,Acclaim Entertainment,0.0,0.0,0.03,0.0,,,


In [12]:
# drop rows with a missing Name
df = df.dropna(subset=['Name'])

In [13]:
# confirm that the missing values have been removed
if df['Name'].isna().sum() == 0:
  print("No missing values in the Name column")
else:
  print("The Name column still has missing values")

No missing values in the Name column


#### 4.2.2 Platform column

In [14]:
# use collections.Counter to count the games on each platform
platform_counter = Counter(df['Platform'])
platform_counter

Counter({'2600': 133,
         '3DO': 3,
         '3DS': 520,
         'DC': 52,
         'DS': 2152,
         'GB': 98,
         'GBA': 822,
         'GC': 556,
         'GEN': 27,
         'GG': 1,
         'N64': 319,
         'NES': 98,
         'NG': 12,
         'PC': 974,
         'PCFX': 1,
         'PS': 1197,
         'PS2': 2161,
         'PS3': 1331,
         'PS4': 393,
         'PSP': 1209,
         'PSV': 432,
         'SAT': 173,
         'SCD': 6,
         'SNES': 239,
         'TG16': 2,
         'WS': 6,
         'Wii': 1320,
         'WiiU': 147,
         'X360': 1262,
         'XB': 824,
         'XOne': 247})

In [15]:
# remove rows whose platform has fewer than 350 games
platform_less_than_350 = ['2600', '3DO', 'DC', 'GB', 'GEN', 'GG', 'N64','NES', 'NG',
                          'PCFX', 'SAT', 'SCD', 'SNES', 'TG16', 'WS', 'WiiU', 'XOne']

df = df[~df['Platform'].isin(platform_less_than_350)]

In [16]:
# check the remaining values in the Platform column
df['Platform'].unique()

array(['Wii', 'DS', 'X360', 'PS3', 'PS2', 'GBA', 'PS4', '3DS', 'PS', 'XB',
       'PC', 'PSP', 'GC', 'PSV'], dtype=object)
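
The hard-coded list can also be derived from the counts, which avoids typos if the threshold changes. This is a small sketch using the `Counter` values printed above; `MIN_GAMES`, `rare_platforms`, and `kept_platforms` are illustrative names, not variables from the notebook.

```python
from collections import Counter

# Platform counts copied from the Counter output above
platform_counter = Counter({
    '2600': 133, '3DO': 3, '3DS': 520, 'DC': 52, 'DS': 2152, 'GB': 98,
    'GBA': 822, 'GC': 556, 'GEN': 27, 'GG': 1, 'N64': 319, 'NES': 98,
    'NG': 12, 'PC': 974, 'PCFX': 1, 'PS': 1197, 'PS2': 2161, 'PS3': 1331,
    'PS4': 393, 'PSP': 1209, 'PSV': 432, 'SAT': 173, 'SCD': 6, 'SNES': 239,
    'TG16': 2, 'WS': 6, 'Wii': 1320, 'WiiU': 147, 'X360': 1262, 'XB': 824,
    'XOne': 247,
})

MIN_GAMES = 350  # same threshold as in the cell above

# Split the platforms by whether they reach the threshold
rare_platforms = sorted(p for p, n in platform_counter.items() if n < MIN_GAMES)
kept_platforms = sorted(p for p, n in platform_counter.items() if n >= MIN_GAMES)

print(len(rare_platforms), "platforms removed;", len(kept_platforms), "kept")
```

With the real data, `df[~df['Platform'].isin(rare_platforms)]` would apply the same filter as the cell above.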

#### 4.2.3 Genre column

In [17]:
# count the missing values in the Genre column
df['Genre'].isna().sum()

0

In [18]:
# list the distinct values in the Genre column
df['Genre'].unique()

array(['Sports', 'Racing', 'Platform', 'Misc', 'Simulation', 'Action',
       'Role-Playing', 'Puzzle', 'Shooter', 'Fighting', 'Adventure',
       'Strategy'], dtype=object)

In [19]:
# count the games in each genre
genre_counter = Counter(df['Genre'])
genre_counter

Counter({'Action': 3082,
         'Adventure': 1229,
         'Fighting': 718,
         'Misc': 1641,
         'Platform': 738,
         'Puzzle': 505,
         'Racing': 1132,
         'Role-Playing': 1359,
         'Shooter': 1182,
         'Simulation': 835,
         'Sports': 2108,
         'Strategy': 624})

In [20]:
# drop rows with the Misc genre, which is too broad to be informative
df = df[df['Genre'] != 'Misc']

In [21]:
# re-check the dataset info
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 13512 entries, 0 to 16718
Data columns (total 12 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             13512 non-null  object 
 1   Platform         13512 non-null  object 
 2   Year_of_Release  13293 non-null  float64
 3   Genre            13512 non-null  object 
 4   Publisher        13485 non-null  object 
 5   NA_Sales         13512 non-null  float64
 6   EU_Sales         13512 non-null  float64
 7   JP_Sales         13512 non-null  float64
 8   Other_Sales      13512 non-null  float64
 9   User_Score       8844 non-null   object 
 10  Developer        8925 non-null   object 
 11  Rating           8792 non-null   object 
dtypes: float64(5), object(7)
memory usage: 1.3+ MB


#### 4.2.4 Publisher column

In [22]:
# count the missing values in the Publisher column
df['Publisher'].isna().sum()

27

In [23]:
# drop rows with a missing Publisher
df = df.dropna(subset=['Publisher'])

In [24]:
# confirm that the missing Publisher values have been removed
if df['Publisher'].isna().sum() == 0:
  print("No missing values in the Publisher column")
else:
  print("The Publisher column still has missing values")

No missing values in the Publisher column


In [25]:
# list the distinct values in the Publisher column
df['Publisher'].unique()

array(['Nintendo', 'Take-Two Interactive', 'Sony Computer Entertainment',
       'Activision', 'Microsoft Game Studios', 'Bethesda Softworks',
       'Electronic Arts', 'Sega', 'SquareSoft', '505 Games', 'Ubisoft',
       'GT Interactive', 'Konami Digital Entertainment', 'Square Enix',
       'Sony Computer Entertainment Europe', 'Virgin Interactive',
       'LucasArts', 'Capcom', 'Warner Bros. Interactive Entertainment',
       'Universal Interactive', 'Eidos Interactive', 'Atari',
       'Vivendi Games', 'Enix Corporation', 'Hasbro Interactive',
       'Namco Bandai Games', 'THQ', 'Fox Interactive',
       'Acclaim Entertainment', 'Disney Interactive Studios',
       'Codemasters', 'Majesco Entertainment', 'Red Orb', 'Level 5',
       'Midway Games', 'JVC', 'Deep Silver', 'NCSoft', '989 Studios',
       'UEP Systems', 'Maxis', 'Tecmo Koei', 'ASCII Entertainment',
       'Valve Software', 'Unknown', 'Valve', 'Hello Games', 'D3Publisher',
       'Activision Value', 'Infogrames', 'Red S

In [26]:
# inspect the rows whose Publisher is 'Unknown'
df[df['Publisher'] == 'Unknown']

Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,User_Score,Developer,Rating
944,Gran Turismo Concept 2001 Tokyo,PS2,2001.0,Racing,Unknown,0.00,1.10,0.42,0.33,,,
1650,NASCAR Thunder 2003,PS2,,Racing,Unknown,0.60,0.46,0.00,0.16,8.7,EA Sports,E
2108,Suikoden III,PS2,,Role-Playing,Unknown,0.29,0.23,0.38,0.08,7.7,KCET,T
2224,Teenage Mutant Ninja Turtles,GBA,2003.0,Action,Unknown,0.67,0.25,0.00,0.02,8.8,Konami,E
2321,Blitz: The League,PS2,2005.0,Sports,Unknown,0.74,0.03,0.00,0.12,8,Midway,M
...,...,...,...,...,...,...,...,...,...,...,...,...
16558,"Horse Life 4: My Horse, My Friend, My Champion",3DS,2015.0,Action,Unknown,0.00,0.01,0.00,0.00,,,
16638,The Treasures of Mystery Island 3 Pack - Save ...,PC,2011.0,Puzzle,Unknown,0.01,0.00,0.00,0.00,,,
16653,Real Crimes: The Unicorn Killer,DS,2011.0,Puzzle,Unknown,0.00,0.01,0.00,0.00,,,
16706,STORM: Frontline Nation,PC,2011.0,Strategy,Unknown,0.00,0.01,0.00,0.00,7.2,SimBin,E10+


In [27]:
# drop rows whose Publisher is 'Unknown'
df = df[df['Publisher'] != 'Unknown']

#### 4.2.5 Year of Release column

In [28]:
# count the missing values in the Year_of_Release column
df['Year_of_Release'].isna().sum()

115

In [29]:
# drop rows with a missing Year_of_Release
df = df.dropna(subset=['Year_of_Release'])

In [30]:
# confirm that the missing values have been removed
if df['Year_of_Release'].isna().sum() == 0:
  print("No missing values in the Year_of_Release column")
else:
  print("The Year_of_Release column still has missing values")

No missing values in the Year_of_Release column


In [31]:
# list the distinct values in the Year_of_Release column
df['Year_of_Release'].unique()

array([2006., 2008., 2009., 2005., 2007., 2013., 2004., 2002., 2010.,
       2001., 2011., 2015., 2012., 2014., 1997., 1999., 2016., 2003.,
       1998., 1996., 2000., 1995., 1994., 1992., 2020., 2017., 1985.,
       1988.])

In [32]:
# convert the column to string so that it is treated as a categorical feature
df['Year_of_Release'] = df['Year_of_Release'].astype('str')
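
Because the column holds floats, converting it straight to string keeps the `.0` suffix, which is why labels such as `2006.0` appear later on. The plain-Python sketch below shows the difference; casting to integer first is an optional variation, not what this notebook does.

```python
# A few release years as they are stored in the float column
years = [2006.0, 2008.0, 1985.0]

# Converting a float directly to string keeps the decimal part
labels_as_written = [str(y) for y in years]

# Casting to int first gives cleaner category labels
labels_without_suffix = [str(int(y)) for y in years]

print(labels_as_written)
print(labels_without_suffix)
```

On the DataFrame, the equivalent variation would be `df['Year_of_Release'].astype(int).astype(str)`, which is only safe once the missing years have been dropped.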

In [33]:
# re-check the missing values across the dataset
df.isna().sum()

Name                  0
Platform              0
Year_of_Release       0
Genre                 0
Publisher             0
NA_Sales              0
EU_Sales              0
JP_Sales              0
Other_Sales           0
User_Score         4559
Developer          4496
Rating             4615
dtype: int64

#### 4.2.6 User Score column

In [34]:
# drop rows with missing values in the User_Score, Developer, or Rating columns
df = df.dropna(subset=['User_Score', 'Developer', 'Rating'])

In [35]:
# confirm that no missing values remain in the dataset
if df.isna().sum().sum() == 0:
  print('The dataset has no missing values')
else:
  print('The dataset still has missing values')

The dataset has no missing values


In [36]:
# list the distinct values in the User_Score column
df['User_Score'].unique()

array(['8', '8.3', '8.5', '8.4', '8.6', '7.7', '7.4', '8.2', '9', '8.1',
       '8.7', '7.1', '3.4', '6.3', '5.3', '4.8', '3.2', '8.9', '6.4',
       '7.8', '7.9', '7.5', '2.6', '7.2', '9.2', '7', '4.3', '6.6', '7.6',
       '5.7', '5', '9.1', '6.5', 'tbd', '8.8', '6.9', '7.3', '9.4', '6.8',
       '6.1', '6.7', '4', '5.4', '4.9', '4.5', '9.3', '4.2', '3.7', '5.8',
       '5.6', '5.9', '3.9', '5.5', '6.2', '5.2', '6', '4.1', '4.7', '4.4',
       '5.1', '3.5', '2.5', '3', '3.1', '2.9', '2.7', '2.2', '2', '4.6',
       '9.5', '2.1', '3.6', '2.8', '3.3', '1.8', '3.8', '0', '1.6', '9.6',
       '2.4', '1.7', '1.1', '0.3', '1.5', '0.7', '1.2', '2.3', '1.3',
       '0.2', '0.5', '0.6', '1.4', '0.9', '1.9', '1', '9.7'], dtype=object)

In [37]:
# inspect the rows whose User_Score is 'tbd'
df[df['User_Score'] == 'tbd']

Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,User_Score,Developer,Rating
119,Zumba Fitness,Wii,2010.0,Sports,505 Games,3.45,2.59,0.00,0.66,tbd,"Pipeworks Software, Inc.",E
520,Zumba Fitness 2,Wii,2011.0,Sports,Majesco Entertainment,1.51,1.03,0.00,0.27,tbd,"Majesco Games, Majesco",T
726,Dance Dance Revolution X2,PS2,2009.0,Simulation,Konami Digital Entertainment,1.09,0.85,0.00,0.28,tbd,Konami,E10+
821,The Incredibles,GBA,2004.0,Action,THQ,1.15,0.77,0.04,0.10,tbd,Helixe,E
1047,Tetris Worlds,GBA,2001.0,Puzzle,THQ,1.25,0.39,0.00,0.06,tbd,3d6 Games,E
...,...,...,...,...,...,...,...,...,...,...,...,...
16699,Planet Monsters,GBA,2001.0,Action,Titus,0.01,0.00,0.00,0.00,tbd,Planet Interactive,E
16701,Bust-A-Move 3000,GC,2003.0,Puzzle,Ubisoft,0.01,0.00,0.00,0.00,tbd,Taito Corporation,E
16702,Mega Brain Boost,DS,2008.0,Puzzle,Majesco Entertainment,0.01,0.00,0.00,0.00,tbd,Interchannel-Holon,E
16708,Plushees,DS,2008.0,Simulation,Destineer,0.01,0.00,0.00,0.00,tbd,Big John Games,E


In [38]:
# drop rows whose User_Score is 'tbd'
df = df[df['User_Score'] != 'tbd']

In [39]:
# convert User_Score to float so it can be used as a numeric feature
df['User_Score'] = df['User_Score'].astype('float')
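
The drop-then-cast steps can also be written with `pd.to_numeric`, which turns any unparseable entry into `NaN` when `errors='coerce'` is set. The sketch below uses a small made-up Series rather than the real column.

```python
import pandas as pd

# A small made-up sample that mimics the raw User_Score column
scores = pd.Series(['8', '8.3', 'tbd', '7.7', None], name='User_Score')

# Entries that cannot be parsed as numbers (here 'tbd') become NaN
numeric = pd.to_numeric(scores, errors='coerce')

# Dropping NaN removes both the original gaps and the coerced 'tbd' rows
cleaned = numeric.dropna()

print(cleaned.tolist())
```

This handles `'tbd'` and any other unexpected text in one step, at the cost of silently discarding values that might deserve a closer look.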

#### 4.2.7 Developer column

In [40]:
# count the distinct values in the Developer column
df['Developer'].nunique()

1267

In [41]:
# the column is categorical with too many distinct values, so it is dropped
df.drop('Developer', axis=1, inplace=True)

#### 4.2.8 Rating column

In [42]:
# list the distinct values in the Rating column
df['Rating'].unique()

array(['E', 'M', 'T', 'E10+', 'K-A', 'AO', 'EC', 'RP'], dtype=object)

### 4.3 Checking for duplicate rows

In [43]:
df.duplicated().sum()

0

In [44]:
# re-check the dataset info
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6662 entries, 0 to 16700
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             6662 non-null   object 
 1   Platform         6662 non-null   object 
 2   Year_of_Release  6662 non-null   object 
 3   Genre            6662 non-null   object 
 4   Publisher        6662 non-null   object 
 5   NA_Sales         6662 non-null   float64
 6   EU_Sales         6662 non-null   float64
 7   JP_Sales         6662 non-null   float64
 8   Other_Sales      6662 non-null   float64
 9   User_Score       6662 non-null   float64
 10  Rating           6662 non-null   object 
dtypes: float64(5), object(6)
memory usage: 624.6+ KB


In [45]:
# the data after cleaning
df.head()

Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,User_Score,Rating
0,Wii Sports,Wii,2006.0,Sports,Nintendo,41.36,28.96,3.77,8.45,8.0,E
2,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.68,12.76,3.79,3.29,8.3,E
3,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.61,10.93,3.28,2.95,8.0,E
6,New Super Mario Bros.,DS,2006.0,Platform,Nintendo,11.28,9.14,6.5,2.88,8.5,E
8,New Super Mario Bros. Wii,Wii,2009.0,Platform,Nintendo,14.44,6.94,4.7,2.24,8.4,E


In [46]:
# summary statistics for the numeric columns after cleaning
df.describe()

Unnamed: 0,NA_Sales,EU_Sales,JP_Sales,Other_Sales,User_Score
count,6662.0,6662.0,6662.0,6662.0,6662.0
mean,0.370982,0.224319,0.059917,0.080369,7.165596
std,0.925752,0.666564,0.275964,0.267998,1.492732
min,0.0,0.0,0.0,0.0,0.0
25%,0.06,0.02,0.0,0.01,6.5
50%,0.14,0.05,0.0,0.02,7.5
75%,0.37,0.2,0.01,0.07,8.2
max,41.36,28.96,6.5,10.57,9.7


### 4.4 Restructuring the data

#### 4.4.1 Creating a DataFrame of game names

In [47]:
# store the game names in a new DataFrame
df_game_name = pd.DataFrame({'Game': df['Name']}).reset_index(drop=True)
df_game_name.head()

Unnamed: 0,Game
0,Wii Sports
1,Mario Kart Wii
2,Wii Sports Resort
3,New Super Mario Bros.
4,New Super Mario Bros. Wii


In [48]:
# use the game name column as the index
df.set_index('Name', inplace=True)
df.head()

Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,User_Score,Rating
Wii Sports,Wii,2006.0,Sports,Nintendo,41.36,28.96,3.77,8.45,8.0,E
Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.68,12.76,3.79,3.29,8.3,E
Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.61,10.93,3.28,2.95,8.0,E
New Super Mario Bros.,DS,2006.0,Platform,Nintendo,11.28,9.14,6.5,2.88,8.5,E
New Super Mario Bros. Wii,Wii,2009.0,Platform,Nintendo,14.44,6.94,4.7,2.24,8.4,E


#### 4.4.2 Converting categorical labels with one-hot encoding

In [49]:
# select all columns with the object dtype
column_object = df.dtypes[df.dtypes == 'object'].keys()
column_object

Index(['Platform', 'Year_of_Release', 'Genre', 'Publisher', 'Rating'], dtype='object')

In [50]:
# convert the categorical columns to one-hot encoding
one_hot_label = pd.get_dummies(df[column_object])
one_hot_label.head(3)

Unnamed: 0_level_0,Platform_3DS,Platform_DS,Platform_GBA,Platform_GC,Platform_PC,Platform_PS,Platform_PS2,Platform_PS3,Platform_PS4,Platform_PSP,Platform_PSV,Platform_Wii,Platform_X360,Platform_XB,Year_of_Release_1985.0,Year_of_Release_1988.0,Year_of_Release_1992.0,Year_of_Release_1994.0,Year_of_Release_1996.0,Year_of_Release_1997.0,Year_of_Release_1998.0,Year_of_Release_1999.0,Year_of_Release_2000.0,Year_of_Release_2001.0,Year_of_Release_2002.0,Year_of_Release_2003.0,Year_of_Release_2004.0,Year_of_Release_2005.0,Year_of_Release_2006.0,Year_of_Release_2007.0,Year_of_Release_2008.0,Year_of_Release_2009.0,Year_of_Release_2010.0,Year_of_Release_2011.0,Year_of_Release_2012.0,Year_of_Release_2013.0,Year_of_Release_2014.0,Year_of_Release_2015.0,Year_of_Release_2016.0,Genre_Action,...,Publisher_Titus,Publisher_Tomy Corporation,Publisher_Touchstone,Publisher_Trion Worlds,Publisher_Tripwire Interactive,Publisher_Tru Blu Entertainment,Publisher_Ubisoft,Publisher_Ubisoft Annecy,Publisher_Universal Interactive,Publisher_Valcon Games,Publisher_ValuSoft,Publisher_Valve,Publisher_Valve Software,Publisher_Vir2L Studios,Publisher_Virgin Interactive,Publisher_Visco,Publisher_Vivendi Games,Publisher_Wanadoo,Publisher_Wargaming.net,Publisher_Warner Bros. Interactive Entertainment,Publisher_White Park Bay Software,Publisher_XS Games,Publisher_Xicat Interactive,Publisher_Xplosiv,Publisher_Xseed Games,Publisher_Yacht Club Games,Publisher_Zoo Digital Publishing,Publisher_Zoo Games,Publisher_Zushi Games,Publisher_bitComposer Games,Publisher_id Software,Publisher_inXile Entertainment,Rating_AO,Rating_E,Rating_E10+,Rating_EC,Rating_K-A,Rating_M,Rating_RP,Rating_T
Wii Sports,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
Mario Kart Wii,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
Wii Sports Resort,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
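
To show what `pd.get_dummies` produces, here is a dependency-free sketch of one-hot encoding on three rows taken from the cleaned data. The `one_hot` helper is hypothetical and only mirrors the `column_value` naming seen in the output above.

```python
def one_hot(rows, column):
    """Return one dict per row with a 0/1 indicator for each category of `column`."""
    categories = sorted({row[column] for row in rows})
    return [
        {f"{column}_{cat}": int(row[column] == cat) for cat in categories}
        for row in rows
    ]

# Three games from the cleaned dataset
rows = [
    {"Name": "Wii Sports", "Platform": "Wii", "Genre": "Sports"},
    {"Name": "Mario Kart Wii", "Platform": "Wii", "Genre": "Racing"},
    {"Name": "New Super Mario Bros.", "Platform": "DS", "Genre": "Platform"},
]

encoded = one_hot(rows, "Platform")
for row, indicators in zip(rows, encoded):
    print(row["Name"], indicators)
```

Each category becomes its own column, so a feature with many distinct values, such as Publisher, adds many sparse columns to the table.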


In [51]:
# drop the original columns with the object dtype
df.drop(column_object,axis=1,inplace=True)
df.head()

Name,NA_Sales,EU_Sales,JP_Sales,Other_Sales,User_Score
Wii Sports,41.36,28.96,3.77,8.45,8.0
Mario Kart Wii,15.68,12.76,3.79,3.29,8.3
Wii Sports Resort,15.61,10.93,3.28,2.95,8.0
New Super Mario Bros.,11.28,9.14,6.5,2.88,8.5
New Super Mario Bros. Wii,14.44,6.94,4.7,2.24,8.4


In [52]:
# join the one-hot encoded columns with the rest of the data
df = pd.concat([df,one_hot_label],axis=1)
df.head()

Unnamed: 0_level_0,NA_Sales,EU_Sales,JP_Sales,Other_Sales,User_Score,Platform_3DS,Platform_DS,Platform_GBA,Platform_GC,Platform_PC,Platform_PS,Platform_PS2,Platform_PS3,Platform_PS4,Platform_PSP,Platform_PSV,Platform_Wii,Platform_X360,Platform_XB,Year_of_Release_1985.0,Year_of_Release_1988.0,Year_of_Release_1992.0,Year_of_Release_1994.0,Year_of_Release_1996.0,Year_of_Release_1997.0,Year_of_Release_1998.0,Year_of_Release_1999.0,Year_of_Release_2000.0,Year_of_Release_2001.0,Year_of_Release_2002.0,Year_of_Release_2003.0,Year_of_Release_2004.0,Year_of_Release_2005.0,Year_of_Release_2006.0,Year_of_Release_2007.0,Year_of_Release_2008.0,Year_of_Release_2009.0,Year_of_Release_2010.0,Year_of_Release_2011.0,Year_of_Release_2012.0,...,Publisher_Titus,Publisher_Tomy Corporation,Publisher_Touchstone,Publisher_Trion Worlds,Publisher_Tripwire Interactive,Publisher_Tru Blu Entertainment,Publisher_Ubisoft,Publisher_Ubisoft Annecy,Publisher_Universal Interactive,Publisher_Valcon Games,Publisher_ValuSoft,Publisher_Valve,Publisher_Valve Software,Publisher_Vir2L Studios,Publisher_Virgin Interactive,Publisher_Visco,Publisher_Vivendi Games,Publisher_Wanadoo,Publisher_Wargaming.net,Publisher_Warner Bros. Interactive Entertainment,Publisher_White Park Bay Software,Publisher_XS Games,Publisher_Xicat Interactive,Publisher_Xplosiv,Publisher_Xseed Games,Publisher_Yacht Club Games,Publisher_Zoo Digital Publishing,Publisher_Zoo Games,Publisher_Zushi Games,Publisher_bitComposer Games,Publisher_id Software,Publisher_inXile Entertainment,Rating_AO,Rating_E,Rating_E10+,Rating_EC,Rating_K-A,Rating_M,Rating_RP,Rating_T
Wii Sports,41.36,28.96,3.77,8.45,8.0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
Mario Kart Wii,15.68,12.76,3.79,3.29,8.3,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
Wii Sports Resort,15.61,10.93,3.28,2.95,8.0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
New Super Mario Bros.,11.28,9.14,6.5,2.88,8.5,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
New Super Mario Bros. Wii,14.44,6.94,4.7,2.24,8.4,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


#### 4.4.3 Scaling the numeric columns

In [53]:
# select all columns with the float dtype
column_numeric = list(df.dtypes[df.dtypes == 'float64'].keys())
column_numeric

['NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'User_Score']

In [54]:
# initialize the MinMaxScaler
scaler = MinMaxScaler()

In [55]:
# scale the numeric columns to the [0, 1] range (min-max normalization)
scaled = scaler.fit_transform(df[column_numeric])

In [56]:
# replace the original values with the scaled ones
df[column_numeric] = scaled
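
With its default settings, `MinMaxScaler` maps each column to the range [0, 1] using `(x - min) / (max - min)`. The sketch below redoes that arithmetic by hand for Mario Kart Wii, using the column minima and maxima from the `df.describe()` output in section 4.3; `min_max_scale` is a hypothetical helper.

```python
def min_max_scale(value, col_min, col_max):
    """Rescale a value so that col_min maps to 0 and col_max maps to 1."""
    return (value - col_min) / (col_max - col_min)

# Mario Kart Wii: NA_Sales = 15.68 (column range 0.0 to 41.36)
na_sales_scaled = min_max_scale(15.68, 0.0, 41.36)

# Mario Kart Wii: JP_Sales = 3.79 (column range 0.0 to 6.5)
jp_sales_scaled = min_max_scale(3.79, 0.0, 6.5)

# Wii Sports: User_Score = 8.0 (column range 0.0 to 9.7)
user_score_scaled = min_max_scale(8.0, 0.0, 9.7)

print(round(na_sales_scaled, 6), round(jp_sales_scaled, 6), round(user_score_scaled, 6))
```

Because the sales columns have a few very large values, most games end up close to 0 after scaling.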

In [57]:
# view the data after scaling and one-hot encoding
df.head()

Unnamed: 0_level_0,NA_Sales,EU_Sales,JP_Sales,Other_Sales,User_Score,Platform_3DS,Platform_DS,Platform_GBA,Platform_GC,Platform_PC,Platform_PS,Platform_PS2,Platform_PS3,Platform_PS4,Platform_PSP,Platform_PSV,Platform_Wii,Platform_X360,Platform_XB,Year_of_Release_1985.0,Year_of_Release_1988.0,Year_of_Release_1992.0,Year_of_Release_1994.0,Year_of_Release_1996.0,Year_of_Release_1997.0,Year_of_Release_1998.0,Year_of_Release_1999.0,Year_of_Release_2000.0,Year_of_Release_2001.0,Year_of_Release_2002.0,Year_of_Release_2003.0,Year_of_Release_2004.0,Year_of_Release_2005.0,Year_of_Release_2006.0,Year_of_Release_2007.0,Year_of_Release_2008.0,Year_of_Release_2009.0,Year_of_Release_2010.0,Year_of_Release_2011.0,Year_of_Release_2012.0,...,Publisher_Titus,Publisher_Tomy Corporation,Publisher_Touchstone,Publisher_Trion Worlds,Publisher_Tripwire Interactive,Publisher_Tru Blu Entertainment,Publisher_Ubisoft,Publisher_Ubisoft Annecy,Publisher_Universal Interactive,Publisher_Valcon Games,Publisher_ValuSoft,Publisher_Valve,Publisher_Valve Software,Publisher_Vir2L Studios,Publisher_Virgin Interactive,Publisher_Visco,Publisher_Vivendi Games,Publisher_Wanadoo,Publisher_Wargaming.net,Publisher_Warner Bros. Interactive Entertainment,Publisher_White Park Bay Software,Publisher_XS Games,Publisher_Xicat Interactive,Publisher_Xplosiv,Publisher_Xseed Games,Publisher_Yacht Club Games,Publisher_Zoo Digital Publishing,Publisher_Zoo Games,Publisher_Zushi Games,Publisher_bitComposer Games,Publisher_id Software,Publisher_inXile Entertainment,Rating_AO,Rating_E,Rating_E10+,Rating_EC,Rating_K-A,Rating_M,Rating_RP,Rating_T
Wii Sports,1.0,1.0,0.58,0.799432,0.824742,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
Mario Kart Wii,0.37911,0.440608,0.583077,0.311258,0.85567,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
Wii Sports Resort,0.377418,0.377417,0.504615,0.279092,0.824742,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
New Super Mario Bros.,0.272727,0.315608,1.0,0.272469,0.876289,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
New Super Mario Bros. Wii,0.34913,0.239641,0.723077,0.211921,0.865979,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


In [58]:
# describe the data again after encoding and scaling
df.describe()

Unnamed: 0,NA_Sales,EU_Sales,JP_Sales,Other_Sales,User_Score,Platform_3DS,Platform_DS,Platform_GBA,Platform_GC,Platform_PC,Platform_PS,Platform_PS2,Platform_PS3,Platform_PS4,Platform_PSP,Platform_PSV,Platform_Wii,Platform_X360,Platform_XB,Year_of_Release_1985.0,Year_of_Release_1988.0,Year_of_Release_1992.0,Year_of_Release_1994.0,Year_of_Release_1996.0,Year_of_Release_1997.0,Year_of_Release_1998.0,Year_of_Release_1999.0,Year_of_Release_2000.0,Year_of_Release_2001.0,Year_of_Release_2002.0,Year_of_Release_2003.0,Year_of_Release_2004.0,Year_of_Release_2005.0,Year_of_Release_2006.0,Year_of_Release_2007.0,Year_of_Release_2008.0,Year_of_Release_2009.0,Year_of_Release_2010.0,Year_of_Release_2011.0,Year_of_Release_2012.0,...,Publisher_Titus,Publisher_Tomy Corporation,Publisher_Touchstone,Publisher_Trion Worlds,Publisher_Tripwire Interactive,Publisher_Tru Blu Entertainment,Publisher_Ubisoft,Publisher_Ubisoft Annecy,Publisher_Universal Interactive,Publisher_Valcon Games,Publisher_ValuSoft,Publisher_Valve,Publisher_Valve Software,Publisher_Vir2L Studios,Publisher_Virgin Interactive,Publisher_Visco,Publisher_Vivendi Games,Publisher_Wanadoo,Publisher_Wargaming.net,Publisher_Warner Bros. Interactive Entertainment,Publisher_White Park Bay Software,Publisher_XS Games,Publisher_Xicat Interactive,Publisher_Xplosiv,Publisher_Xseed Games,Publisher_Yacht Club Games,Publisher_Zoo Digital Publishing,Publisher_Zoo Games,Publisher_Zushi Games,Publisher_bitComposer Games,Publisher_id Software,Publisher_inXile Entertainment,Rating_AO,Rating_E,Rating_E10+,Rating_EC,Rating_K-A,Rating_M,Rating_RP,Rating_T
count,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,...,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0,6662.0
mean,0.00897,0.007746,0.009218,0.007604,0.738721,0.023717,0.070249,0.035575,0.050886,0.104473,0.022666,0.173822,0.120384,0.035575,0.058991,0.020114,0.069799,0.130591,0.083158,0.00015,0.00015,0.00015,0.00015,0.000901,0.002402,0.004503,0.004053,0.01456,0.036476,0.067397,0.073852,0.069198,0.083158,0.080156,0.086761,0.089313,0.083458,0.063194,0.066046,0.044881,...,0.000751,0.000901,0.0006,0.00045,0.0003,0.0006,0.068898,0.001651,0.002702,0.00045,0.00015,0.00015,0.00045,0.00015,0.002702,0.00015,0.017863,0.00045,0.00015,0.018763,0.00015,0.00045,0.00015,0.0003,0.0006,0.00045,0.004653,0.00045,0.0003,0.0003,0.00015,0.00015,0.00015,0.311618,0.130742,0.00015,0.0003,0.209397,0.00015,0.347493
std,0.022383,0.023017,0.042456,0.025355,0.15389,0.152176,0.255586,0.185242,0.219781,0.305896,0.148847,0.378985,0.325435,0.185242,0.235626,0.140401,0.254827,0.336978,0.276142,0.012252,0.012252,0.012252,0.012252,0.029999,0.048952,0.066959,0.063538,0.119793,0.187484,0.250727,0.261549,0.25381,0.276142,0.271555,0.281505,0.285216,0.276595,0.24333,0.248382,0.207059,...,0.027387,0.029999,0.024498,0.021217,0.017325,0.024498,0.2533,0.040604,0.051913,0.021217,0.012252,0.012252,0.021217,0.012252,0.051913,0.012252,0.132462,0.021217,0.012252,0.135698,0.012252,0.021217,0.012252,0.017325,0.024498,0.021217,0.068061,0.021217,0.017325,0.017325,0.012252,0.012252,0.012252,0.463189,0.337143,0.012252,0.017325,0.406908,0.012252,0.47621
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.001451,0.000691,0.0,0.000946,0.670103,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.003385,0.001727,0.0,0.001892,0.773196,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.008946,0.006906,0.001538,0.006623,0.845361,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


## 5. Building the *Content-based Filtering* Recommendation Model

### 5.1 Using the K-Nearest Neighbors algorithm

In [59]:
# Build the recommendation system with a K-Nearest Neighbors model
# Initialize the model
model = NearestNeighbors(metric='euclidean')

# Fit the model to the data
model.fit(df)

NearestNeighbors(algorithm='auto', leaf_size=30, metric='euclidean',
                 metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                 radius=1.0)
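
To make the fitted model's query step concrete, here is a minimal sketch on toy data. The array `features` and the name `toy_model` are hypothetical and not part of the notebook. The point it illustrates is that `kneighbors` returns the query row itself first (distance 0) when the query is one of the fitted rows.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy feature matrix (hypothetical values): 4 items, 3 scaled features each.
features = np.array([
    [0.0, 0.0, 1.0],
    [0.1, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.9, 1.0, 0.0],
])

toy_model = NearestNeighbors(metric='euclidean')
toy_model.fit(features)

# Ask for the 2 nearest rows to item 0. The item itself comes back first
# with distance 0, so the actual neighbor sits in the second position.
distances, neighbors = toy_model.kneighbors(features[[0]], n_neighbors=2)
nearest_index = int(neighbors[0][1])
nearest_distance = float(distances[0][1])
print(nearest_index, round(nearest_distance, 2))
```

This is why a function that wants 5 recommendations has to request 6 neighbors and discard the first one.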

In [60]:
# Function that returns game recommendations from the KNN model
def GameRecommended(gamename: pd.Series, recommended_games: int = 6):
  # gamename is a single row of df_game_name; its first value is the title
  title = gamename.iloc[0]
  # The closest neighbor is the game itself, so one fewer game is recommended
  print(f'If the user likes the game: \n{title}\nThe following {recommended_games - 1} games are recommended:')
  # Find the games most similar to the game the user likes
  distances, neighbors = model.kneighbors(df.loc[[title]], n_neighbors=recommended_games)
  # Collect the recommended game names in a list
  similar_game = []
  for name in df_game_name.loc[neighbors[0]].values:
    similar_game.append(name[0])
  # Collect the scores in a list; 100 - distance is a heuristic score
  # derived from the Euclidean distance, not a true percentage
  similar_distance = []
  for distance in distances[0]:
    similar_distance.append(f"{round(100-distance, 2)}%")
  # Return the recommendations as a dataframe, skipping the first entry (the game itself)
  return pd.DataFrame(data = {"Game Name" : similar_game[1:], "Similarity Level" : similar_distance[1:]})

In [61]:
# Recommend games similar to the selected game
GameRecommended(df_game_name.loc[111])

If the user likes the game: 
Final Fantasy IX
The following 5 games are recommended:


Unnamed: 0,Game Name,Similarity Level
0,Final Fantasy VIII,98.58%
1,Final Fantasy Tactics,98.57%
2,Xenogears,98.55%
3,Tales of Destiny II,98.55%
4,Chrono Cross,98.55%


### 5.2 Using Cosine Similarity

In [62]:
# Compute the pairwise cosine similarity between all rows of the dataframe
cosine_sim = cosine_similarity(df)

# Store the result in a dataframe indexed by game name on both axes
cosine_sim_df = pd.DataFrame(cosine_sim, index=df_game_name['Game'], columns=df_game_name['Game'])
cosine_sim_df.head(3)

Game,Wii Sports,Mario Kart Wii,Wii Sports Resort,New Super Mario Bros.,New Super Mario Bros. Wii,Mario Kart DS,Wii Fit,Wii Fit Plus,Grand Theft Auto V,Grand Theft Auto: San Andreas,Grand Theft Auto V,Grand Theft Auto: Vice City,Brain Age 2: More Training in Minutes a Day,Gran Turismo 3: A-Spec,Call of Duty: Modern Warfare 3,Call of Duty: Black Ops,Call of Duty: Black Ops II,Call of Duty: Black Ops II,Call of Duty: Modern Warfare 2,Call of Duty: Modern Warfare 3,Grand Theft Auto III,Super Smash Bros. Brawl,Mario Kart 7,Call of Duty: Black Ops,Grand Theft Auto V,Animal Crossing: Wild World,Halo 3,Gran Turismo 4,Super Mario Galaxy,Grand Theft Auto IV,Gran Turismo,Super Mario 3D Land,Gran Turismo 5,Call of Duty: Modern Warfare 2,Grand Theft Auto IV,Call of Duty: Ghosts,New Super Mario Bros. 2,Halo: Reach,Final Fantasy VII,Halo 4,...,Young Justice: Legacy,TrackMania Turbo,Myst,Darkened Skye,Juiced 2: Hot Import Nights,Super Dungeon Bros,Dungeon Explorer: Warriors of Ancient Arts,Sherlock Holmes: The Devil's Daughter,Ride 2,Rugby World Cup 2015,Icewind Dale II,Dungeons 2,Pro Evolution Soccer 2010,Hoshigami: Ruining Blue Earth Remix,Carmageddon: Max Damage,Alone in the Dark,Clive Barker's Jericho,Grand Prix Legends,Madagascar: Escape 2 Africa,Wade Hixton's Counter Punch,MotoGP 14,Sega Rally Revo,Egg Mania: Eggstreme Madness,The Eye of Judgment: Legends,King's Bounty: Armored Princess,Transformers: Fall of Cybertron,Micro Machines V4,Dragon Ball Z for Kinect,Legacy of Kain: Defiance,Xblaze: Lost Memories,Trine,Karnaaj Rally,Hospital Tycoon,Ben 10 Omniverse 2,Bookworm Deluxe,E.T. The Extra-Terrestrial,Mortal Kombat: Deadly Alliance,Worms 2,Metal Gear Solid V: Ground Zeroes,Breach
Game,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
Wii Sports,1.0,0.681225,0.806006,0.65558,0.652319,0.504198,0.774904,0.769941,0.217456,0.238192,0.171001,0.182963,0.466721,0.323189,0.114013,0.14379,0.147556,0.124546,0.140593,0.114298,0.164948,0.458519,0.456036,0.147379,0.169285,0.461229,0.150258,0.333357,0.584422,0.146714,0.310686,0.446263,0.300416,0.136056,0.148824,0.081855,0.437377,0.138939,0.17858,0.128927,...,0.077096,0.228957,0.357456,0.07937,0.063022,0.029597,0.068939,0.083896,0.238815,0.321492,0.100237,0.095972,0.37474,0.096065,0.068939,0.071301,0.09166,0.235322,0.097042,0.251738,0.235322,0.236205,0.239657,0.096065,0.103373,0.095972,0.373988,0.027052,0.104409,0.086133,0.099181,0.251738,0.048428,0.064216,0.382273,0.181952,0.105425,0.098121,0.092749,0.072453
Mario Kart Wii,0.681225,1.0,0.686753,0.541707,0.686951,0.682935,0.677577,0.66904,0.180855,0.184008,0.151072,0.164835,0.509437,0.497469,0.085517,0.124278,0.122988,0.102481,0.122645,0.087803,0.153763,0.661797,0.659979,0.130517,0.155349,0.514492,0.13968,0.483503,0.648112,0.303219,0.497695,0.492151,0.476251,0.123617,0.304687,0.062475,0.484687,0.134865,0.18919,0.122807,...,0.092219,0.267665,0.243684,0.094948,0.246215,0.035388,0.082466,0.100364,0.444289,0.196588,0.11991,0.114815,0.268784,0.114947,0.082466,0.254845,0.109648,0.441033,0.280582,0.134594,0.441033,0.441862,0.280582,0.114947,0.123664,0.114815,0.434944,0.032342,0.124903,0.103041,0.118648,0.455775,0.057913,0.076806,0.280582,0.211163,0.126127,0.117379,0.11095,0.086671
Wii Sports Resort,0.806006,0.686753,1.0,0.530365,0.838024,0.520196,0.835151,0.995444,0.172429,0.176879,0.146538,0.159171,0.501545,0.331115,0.082378,0.120674,0.11734,0.099133,0.289034,0.083371,0.148907,0.49948,0.495664,0.125637,0.149494,0.506979,0.13588,0.322764,0.650163,0.135305,0.331867,0.490123,0.308507,0.289485,0.13504,0.060438,0.482395,0.131564,0.179902,0.119709,...,0.090348,0.268385,0.245033,0.093025,0.073847,0.034672,0.080796,0.098331,0.279942,0.376868,0.117478,0.112489,0.609117,0.112604,0.080796,0.083553,0.107423,0.275847,0.113743,0.295104,0.275847,0.276882,0.280941,0.112604,0.286817,0.112489,0.268378,0.031688,0.12237,0.100954,0.282934,0.295104,0.056737,0.075247,0.280941,0.213287,0.123572,0.114998,0.1087,0.084915
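
As a small illustration of what each cell in the matrix above means, the following sketch computes cosine similarity on three toy vectors and checks one entry by hand. The array `vectors` is hypothetical. The formula used for the manual check is the standard definition: the dot product divided by the product of the vector norms.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical feature vectors for three items.
vectors = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],   # same direction as row 0
    [0.0, 1.0, 0.0],   # orthogonal to row 0
])

# Pairwise similarity matrix: sim[i, j] compares row i with row j.
sim = cosine_similarity(vectors)

# Manual check for one pair: dot product divided by the product of norms.
a, b = vectors[0], vectors[2]
manual = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(sim.round(2))
```

The matrix is symmetric with ones on the diagonal, which is why every game's most similar entry is the game itself.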


In [63]:
# Function that returns recommendations based on Cosine Similarity
def CosineGameRecommended(gamename: pd.Series, recommended_games: int = 5):
  # gamename is a single row of df_game_name; its first value is the title
  title = gamename.iloc[0]
  print(f'If the user likes the game: \n{title}\nThe following {recommended_games} games are recommended:')
  # Get the unique similarity values in the selected game's row of cosine_sim_df
  # np.unique returns them sorted in ascending order, with the index of each first occurrence
  # Caveat: games that share an identical score collapse into a single entry
  arr, ind = np.unique(cosine_sim_df.loc[title], return_index=True)
  # Collect the names of the top n games, skipping the last entry (normally the selected game itself)
  similar_game = []
  for index in ind[-(recommended_games+1):-1]:
    similar_game.append(df_game_name.loc[index].iloc[0])
  # Collect the cosine scores for the same slice
  cosine_score = []
  for score in arr[-(recommended_games+1):-1]:
    cosine_score.append(score)
  # Return the recommendations as a dataframe, sorted from most to least similar
  return pd.DataFrame(data = {"Game Name" : similar_game, "Cosine Similarity" : cosine_score}).sort_values(by='Cosine Similarity',ascending=False)

In [64]:
# Recommend games for the selected game using cosine similarity
CosineGameRecommended(df_game_name.loc[111])

If the user likes the game: 
Final Fantasy IX
The following 5 games are recommended:


Unnamed: 0,Game Name,Cosine Similarity
4,Final Fantasy VIII,0.833562
3,Final Fantasy Tactics,0.825829
2,Xenogears,0.823134
1,Tales of Destiny II,0.822043
0,Chrono Cross,0.820439
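
Because `np.unique` merges tied scores, and a name-based lookup can match several rows when a title was released on more than one platform, a position-based ranking may be more robust. The sketch below is one possible alternative, not the notebook's method. The helper `top_k_similar` and the toy inputs `toy_sim` and `toy_names` are hypothetical.

```python
import numpy as np
import pandas as pd

def top_k_similar(sim_matrix, names, row_pos, k=5):
    """Return the k items most similar to the item at position row_pos."""
    scores = np.asarray(sim_matrix)[row_pos]
    # Sort positions by descending similarity; a stable sort keeps ties in order.
    order = np.argsort(-scores, kind='stable')
    # Drop the query item by position rather than by score.
    order = order[order != row_pos][:k]
    return pd.DataFrame({
        'Game': [names[i] for i in order],
        'Cosine Similarity': scores[order],
    })

# Hypothetical 4x4 similarity matrix with a tie and a duplicated name.
toy_names = ['A', 'B', 'B', 'C']
toy_sim = np.array([
    [1.0, 0.8, 0.8, 0.1],
    [0.8, 1.0, 0.9, 0.2],
    [0.8, 0.9, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
])
result = top_k_similar(toy_sim, toy_names, row_pos=0, k=2)
print(result)
```

Here both tied entries are kept, whereas a unique-value approach would return only one of them.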


## 6. Evaluating the K-Nearest Neighbors Recommendation Model

### 6.1 Calinski-Harabasz Score

In [65]:
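# Clustering metric (higher is better); game names serve as the cluster labels
# Passing the 'Game' column as a 1-D Series avoids the column-vector warning
calinski_harabasz_score(df, df_game_name['Game']).round(2)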


5.09

### 6.2 Davies-Bouldin Score

In [66]:
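# Clustering metric (lower is better); game names serve as the cluster labels
# Passing the 'Game' column as a 1-D Series avoids the column-vector warning
davies_bouldin_score(df, df_game_name['Game']).round(2)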


2.93
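
To help read the two scores above, the following sketch compares both metrics on toy data with a labeling that matches the geometry and one that does not. The arrays `points`, `good_labels`, and `bad_labels` are hypothetical. Both metrics are designed for cluster assignments, so scores computed with game names as labels should probably be interpreted with caution.

```python
import numpy as np
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score

# Two well-separated hypothetical clusters of 2-D points.
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])
good_labels = np.array([0, 0, 0, 1, 1, 1])   # matches the geometry
bad_labels = np.array([0, 1, 0, 1, 0, 1])    # mixes the two groups

# Calinski-Harabasz: ratio of between-cluster to within-cluster dispersion.
ch_good = calinski_harabasz_score(points, good_labels)
ch_bad = calinski_harabasz_score(points, bad_labels)

# Davies-Bouldin: average similarity of each cluster to its closest cluster.
db_good = davies_bouldin_score(points, good_labels)
db_bad = davies_bouldin_score(points, bad_labels)

print(f"Calinski-Harabasz: good={ch_good:.2f}, bad={ch_bad:.2f}")
print(f"Davies-Bouldin:    good={db_good:.2f}, bad={db_bad:.2f}")
```

A good labeling gives a higher Calinski-Harabasz score and a lower Davies-Bouldin score than a poor one.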

## Conclusion
The game recommendation model based on *content-based filtering* is complete. In testing, it works reasonably well at returning the top 5 recommendations for games a user may like or play. However, the model still has shortcomings, as indicated by the Calinski-Harabasz score (5.09, where higher is better) and the Davies-Bouldin score (2.93, where lower is better). To improve on it, other approaches to building a recommendation model, such as deep learning or *collaborative filtering*, could be tried and their performance compared with the current KNN model.

### References
- Scikit-learn documentation: [https://scikit-learn.org/stable/modules/classes.html](https://scikit-learn.org/stable/modules/classes.html)
- Report reference: [Example Recommendation System Algorithm with Documentation](https://github.com/fahmij8/ML-Exercise/blob/main/MLT-2/MLT_Proyek_Submission_2.ipynb)
- Dataset: [Game Sales with Rating Dataset](https://www.kaggle.com/rush4ratio/video-game-sales-with-ratings)