# Machine Learning

## Source: the video [Python Machine Learning Tutorial (Data Science)](https://www.youtube.com/watch?v=7eh4d6sabA0)

### Install dependencies
Note: The libraries below are already installed with Anaconda. I list them here again for reference.

In [1]:
!pip install numpy pandas matplotlib scikit-learn
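To confirm that the libraries are actually available in the current environment, a small check like the one below can help. It is only a sketch: the helper name `installed_version` is made up for this note, and it relies on the standard-library `importlib.metadata` module (Python 3.8 or newer).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# The same four packages as in the pip command above
for name in ["numpy", "pandas", "matplotlib", "scikit-learn"]:
    print(name, installed_version(name) or "not installed")
```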



### Dataset
Note: The dataset comes from [Kaggle](https://www.kaggle.com/). Sign up first, then download the [video game sales](https://www.kaggle.com/gregorut/videogamesales) dataset. You can also build your own dataset, either by exporting a database table as CSV or by writing a CSV file yourself.
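If you build the dataset yourself, a CSV file is just a text file with a header row and comma-separated values. The sketch below writes a tiny file and reads it back with pandas. The file name `my_dataset.csv` and the choice of three columns are assumptions made for this example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical hand-made dataset: three columns, two rows
rows = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros."],
    "Platform": ["Wii", "NES"],
    "Global_Sales": [82.74, 40.24],
})

# Write to a temporary folder so the example leaves no files behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_dataset.csv")
    rows.to_csv(path, index=False)  # index=False avoids an extra unnamed column
    loaded = pd.read_csv(path)

print(loaded)
```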

In [23]:
# Import the pandas library
import pandas as pd

In [3]:
# Read the CSV dataset
df = pd.read_csv('Datasets/vgsales.csv')

In [4]:
# Display the dataset
df

| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
| 1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
| 2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
| 3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
| 4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16593 | 16596 | Woody Woodpecker in Crazy Castle 5 | GBA | 2002.0 | Platform | Kemco | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16594 | 16597 | Men in Black II: Alien Escape | GC | 2003.0 | Shooter | Infogrames | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16595 | 16598 | SCORE International Baja 1000: The Official Game | PS2 | 2008.0 | Racing | Activision | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
| 16596 | 16599 | Know How 2 | DS | 2010.0 | Puzzle | 7G//AMES | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 |


In [5]:
# Show the number of rows and columns
df.shape

(16598, 11)

In [6]:
# Show summary statistics for the numeric columns
df.describe()

| | Rank | Year | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|---|---|
| count | 16598.0 | 16327.0 | 16598.0 | 16598.0 | 16598.0 | 16598.0 | 16598.0 |
| mean | 8300.605254 | 2006.406443 | 0.264667 | 0.146652 | 0.077782 | 0.048063 | 0.537441 |
| std | 4791.853933 | 5.828981 | 0.816683 | 0.505351 | 0.309291 | 0.188588 | 1.555028 |
| min | 1.0 | 1980.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 |
| 25% | 4151.25 | 2003.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.06 |
| 50% | 8300.5 | 2007.0 | 0.08 | 0.02 | 0.0 | 0.01 | 0.17 |
| 75% | 12449.75 | 2010.0 | 0.24 | 0.11 | 0.04 | 0.04 | 0.47 |
| max | 16600.0 | 2020.0 | 41.49 | 29.02 | 10.22 | 10.57 | 82.74 |
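In the output above, `count` for `Year` is 16327 while every other column has 16598, so 16598 - 16327 = 271 rows have no year. `describe()` skips missing values when it counts. A minimal sketch of how to count missing values per column, using a small made-up DataFrame instead of the real file:

```python
import numpy as np
import pandas as pd

# Made-up sample: one of the three rows has no Year
sample = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Year": [2006.0, np.nan, 2008.0],
    "Global_Sales": [1.5, 0.3, 0.01],
})

missing_per_column = sample.isna().sum()  # number of missing values per column
year_count = sample["Year"].count()       # count() ignores missing values

print(missing_per_column)
print("rows:", len(sample), "non-missing Year:", year_count)
```

On the real data the equivalent call would be `df.isna().sum()`.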


In [7]:
# Show the data as a two-dimensional NumPy array
df.values

array([[1, 'Wii Sports', 'Wii', ..., 3.77, 8.46, 82.74],
       [2, 'Super Mario Bros.', 'NES', ..., 6.81, 0.77, 40.24],
       [3, 'Mario Kart Wii', 'Wii', ..., 3.79, 3.31, 35.82],
       ...,
       [16598, 'SCORE International Baja 1000: The Official Game', 'PS2',
        ..., 0.0, 0.0, 0.01],
       [16599, 'Know How 2', 'DS', ..., 0.0, 0.0, 0.01],
       [16600, 'Spirits & Spells', 'GBA', ..., 0.0, 0.0, 0.01]],
      dtype=object)
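Because the columns mix text and numbers, the resulting array has `dtype=object`. The pandas documentation recommends `DataFrame.to_numpy()` over `.values`; for a frame like this one both give the same array. A small sketch with made-up rows:

```python
import pandas as pd

# Made-up frame with one text column and two numeric columns
frame = pd.DataFrame({
    "Name": ["Wii Sports", "Know How 2"],
    "Rank": [1, 16599],
    "Global_Sales": [82.74, 0.01],
})

array = frame.to_numpy()  # same result as frame.values here

print(array.dtype)  # mixed column types fall back to a generic object dtype
print(array.shape)  # (rows, columns)
```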

## Machine learning to predict music genre from age and gender

### Preparing the data
Note: The dataset can be downloaded from [https://bit.ly/3muqqta](https://bit.ly/3muqqta). In this step the data is split into two parts: input and output.

In [24]:
# Import pandas library
import pandas as pd

In [25]:
# Read the dataset and store it in the variable music_data
music_data = pd.read_csv('Datasets/music.csv')

In [26]:
# Split the data into two separate datasets (input and output)
x = music_data.drop(columns=['genre']).values
y = music_data['genre']
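To see what this split produces, here is a sketch on a made-up table with the same three column names (`age`, `gender`, `genre`). The rows are invented for illustration and are not taken from `music.csv`.

```python
import pandas as pd

# Invented rows with the same columns as the tutorial dataset
music_sample = pd.DataFrame({
    "age": [20, 23, 26, 31],
    "gender": [1, 1, 0, 0],
    "genre": ["HipHop", "HipHop", "Dance", "Acoustic"],
})

x = music_sample.drop(columns=["genre"]).values  # inputs: 2-D array of age and gender
y = music_sample["genre"]                        # output: the label to predict

print(x.shape)  # one row per record, one column per input feature
print(list(y))
```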

### Training the model
Note: Build a model with DecisionTreeClassifier, then make predictions for sample data.

In [11]:
# Import the DecisionTreeClassifier class from the scikit-learn library
from sklearn.tree import DecisionTreeClassifier

In [12]:
# Create and train the model
model = DecisionTreeClassifier()
model.fit(x, y)

DecisionTreeClassifier()

In [None]:
# Create a two-dimensional list of sample inputs for prediction (each row is [age, gender])
samples = [ [21, 1], [22, 0] ]

In [27]:
# Predict the genre for each row in samples
prediction = model.predict(samples)
prediction

array(['HipHop', 'Dance'], dtype=object)
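Each sample row must list the inputs in the same order as the training columns, here `[age, gender]`. The sketch below trains a tree on a few invented rows and then predicts for rows it has already seen. The data and the `random_state` value are assumptions made only for this example.

```python
from sklearn.tree import DecisionTreeClassifier

# Invented training data: [age, gender] -> genre
x = [[20, 1], [23, 1], [25, 1], [20, 0], [22, 0], [25, 0]]
y = ["HipHop", "HipHop", "HipHop", "Dance", "Dance", "Dance"]

model = DecisionTreeClassifier(random_state=0)  # fixed seed for repeatable trees
model.fit(x, y)

samples = [[23, 1], [22, 0]]  # same column order as x
prediction = model.predict(samples)

print(model.classes_)  # the genres the model knows about, in sorted order
print(prediction)
```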

### Measuring accuracy
Note: Split the data into training and test sets, then compute the prediction score.

In [29]:
# Import the accuracy_score and train_test_split functions from the scikit-learn library
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

In [30]:
# Split the data into training and test sets, holding out 20% for testing
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2)
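`train_test_split` shuffles the rows before splitting, so every run produces a different split (and usually a different score) unless a seed is fixed. A sketch with invented data, where `random_state=42` is an arbitrary choice:

```python
from sklearn.model_selection import train_test_split

# Invented data: 20 rows of [age, gender] with an alternating label
x = [[20 + i, i % 2] for i in range(20)]
y = ["HipHop" if i % 2 else "Dance" for i in range(20)]

# Same seed -> same split on every run
first = train_test_split(x, y, test_size=0.2, random_state=42)
second = train_test_split(x, y, test_size=0.2, random_state=42)

x_train, x_test, y_train, y_test = first
print(len(x_train), len(x_test))  # 80% / 20% of the 20 rows
print(first[1] == second[1])      # compares the test rows of both calls
```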

In [31]:
# Create a DecisionTreeClassifier model and train it on the training data
model = DecisionTreeClassifier()
model.fit(x_train, y_train)

DecisionTreeClassifier()

In [32]:
# Predict the genre for the rows in x_test
prediction = model.predict(x_test)
prediction

array(['Dance', 'Jazz', 'Classical', 'Classical'], dtype=object)

In [33]:
# Compute the accuracy score
score = accuracy_score(y_test, prediction)
score # As a rule of thumb, a good result is between 0.75 and 1.0

0.75
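Accuracy is the fraction of test rows predicted correctly. With four test rows, a score of 0.75 means three of the four predictions matched. The sketch below computes this by hand. The `actual` labels are invented for illustration, since the notebook does not show `y_test`.

```python
def accuracy(actual, predicted):
    """Fraction of positions where the predicted label equals the actual one."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)


predicted = ["Dance", "Jazz", "Classical", "Classical"]  # predictions shown above
actual = ["Dance", "Jazz", "Classical", "Acoustic"]      # invented true labels

print(accuracy(actual, predicted))
```

`accuracy_score(y_test, prediction)` returns the same fraction. Because the split is random, rerunning the notebook can give a different score.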

### Saving the model
Note: Save the model so it does not need to be retrained.

In [18]:
# Import the joblib library
import joblib

In [19]:
# Save the model file to the Models folder.
joblib.dump(model, 'Models/music-recomender.joblib')

['Models/music-recomender.joblib']
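`joblib.dump` does not create missing folders, so the `Models` folder has to exist before the call. A sketch of the save-and-load round trip, using a temporary folder and a plain dictionary as a stand-in for the trained model (both are assumptions for this example):

```python
import tempfile
from pathlib import Path

import joblib

# Stand-in object; a fitted scikit-learn model is saved the same way
stand_in_model = {"name": "music-recommender", "classes": ["Dance", "HipHop"]}

with tempfile.TemporaryDirectory() as base:
    folder = Path(base) / "Models"
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if needed

    path = folder / "example.joblib"
    joblib.dump(stand_in_model, path)  # write the object to disk
    restored = joblib.load(path)       # read it back

print(restored == stand_in_model)
```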

In [None]:
# Load the model from the saved file.
model = joblib.load('Models/music-recomender.joblib')

In [20]:
# Make predictions with the loaded model
prediction = model.predict([ [21, 1], [22, 0] ])
prediction

array(['HipHop', 'Dance'], dtype=object)

### Data visualization
Note: Visualize the trained decision tree with the tree module.

In [21]:
# Import the tree module from the scikit-learn library
from sklearn import tree

In [22]:
# Export the decision tree as a visualization and save it to a .dot file
tree.export_graphviz(
    model, # the trained model to export
    out_file='Graph/music-recommender.dot', # output path
    feature_names=['age', 'gender'], # feature names shown in the split rules
    class_names=sorted(y.unique()), # class names shown in the nodes
    label='all', # show labels on every node
    rounded=True, # draw boxes with rounded corners
    filled=True # color the nodes by class
)
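The `.dot` file is a text description of the graph, so it still needs a Graphviz-capable viewer to be drawn. For a quick look without extra tools, scikit-learn also offers `tree.export_text`, which returns the split rules as plain text. A sketch on invented data (the rows and `random_state` are assumptions for this example):

```python
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

# Invented training data: [age, gender] -> genre
x = [[20, 1], [25, 1], [27, 1], [31, 1], [20, 0], [25, 0]]
y = ["HipHop", "HipHop", "Jazz", "Jazz", "Dance", "Dance"]

model = DecisionTreeClassifier(random_state=0)
model.fit(x, y)

# Plain-text view of the learned split rules
rules = tree.export_text(model, feature_names=["age", "gender"])
print(rules)
```

The `Graph` folder must already exist when `export_graphviz` runs, because the output file is opened for writing directly.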