# Summary

- Python forms:
    - Script-based
    - Block-based


- Python uses:
    - Web (back end): Flask, Django
    - Data analytics & visualization
    - Machine learning => deep learning: sklearn, keras, mlstat, tensorflow
    - Hardware: microcontrollers => MicroPython & mini-PC / PC


- Scikit-Learn:
    one of the Python packages for building ML models

<hr>

Sklearn's ML models:
- Regression
    - OLS / Simple Linear Regression
    - Multiple Variable / Multivariate Linear Regression
    - LASSO regression
    - Ridge regression
    - Elastic-Net regression
    - Polynomial regression
    - Robust regression
    - Bayesian regression
    - Decision Tree (Regressor)
    - Random Forest (Regressor)
    - Extremely Randomized Trees / Extra Trees (Regressor)
    - Support Vector Machine / SVR (Regressor)
    - Voting regressor
    - Gradient Boosting regressor
    - Adaboost regressor
    - Nearest Neighbors regressor
- Classification
    - Logistic Regression
    - Decision Tree (Classifier)
    - Random Forest (Classifier)
    - Extremely Randomized Trees / Extra Trees (Classifier)
    - Support Vector Machine / SVC (Classifier)
    - Voting classifier
    - Gradient Boosting classifier
    - Adaboost classifier
    - K-Nearest Neighbors
    - Nearest Neighbors Classifier
- Clustering
    - K-Means
    - Affinity propagation
    - DBSCAN
    - OPTICS
    - Mean shift
    - Spectral clustering
    - Agglomerative clustering
    - Gaussian mixture models
    - Birch / branching factor
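
The estimators listed above share the same `fit` / `predict` interface in scikit-learn. The following is a minimal sketch with one model per family; the synthetic datasets (`make_regression`, `make_classification`, `make_blobs`) and all parameter values are illustrative assumptions, not part of these notes.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Regression: predict a continuous target (synthetic data, assumed for illustration).
X_reg, y_reg = make_regression(n_samples=100, n_features=4, noise=0.1, random_state=0)
reg = Ridge(alpha=1.0).fit(X_reg, y_reg)
reg_pred = reg.predict(X_reg[:5])

# Classification: predict a class label.
X_clf, y_clf = make_classification(n_samples=100, n_features=4, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_clf, y_clf)
clf_pred = clf.predict(X_clf[:5])

# Clustering: no target, the model assigns a cluster label to every sample.
X_clu, _ = make_blobs(n_samples=100, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_clu)
labels = km.labels_
```

Swapping `Ridge` for another regressor from the list (or `RandomForestClassifier` for another classifier) usually only changes the import and the constructor call.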

<hr>

Before building a model (optional):
- Split the dataset: training & testing
- Scaling & transforming
- Handle categorical data: dummy variables & one-hot encoding
- Dimensionality reduction: Principal Component Analysis
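
The preparation steps above could look roughly like the sketch below. The toy table, its column names (`age`, `income`, `city`, `bought`) and the split ratio are hypothetical and only serve to show the calls.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy dataset: two numeric features, one categorical feature, one target.
data = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63, 41, 38],
    "income": [25, 48, 62, 75, 33, 80, 55, 50],
    "city": ["A", "B", "A", "C", "B", "C", "A", "B"],
    "bought": [0, 1, 1, 1, 0, 1, 1, 0],
})
X = data[["age", "income", "city"]]
y = data["bought"]

# 1. Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Scale numeric columns and one-hot encode the categorical column.
#    The transformer is fitted on the training set only, then reused on the test set.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), ["age", "income"]),
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)

# Dummy variables with pandas, as an alternative to OneHotEncoder.
dummies = pd.get_dummies(data["city"], prefix="city")
```

Fitting the scaler and encoder on the training split only keeps information from the test split out of the model.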

After building a model:
- Evaluate it with evaluation metrics
- Cross-validation
- Hyperparameter tuning: grid & random search
- Deployment: the saved model is embedded in the back end
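
A minimal sketch of these post-modelling steps, assuming the iris dataset, a k-nearest-neighbors classifier and a temporary file path (all chosen for illustration only). `RandomizedSearchCV` is the random-search counterpart of the `GridSearchCV` used here.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 1. Evaluation metric on the held-out test set.
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))

# 2. Cross-validation: five accuracy scores, one per fold.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X_train, y_train, cv=5)

# 3. Hyperparameter tuning with an exhaustive grid search.
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}, cv=5)
grid.fit(X_train, y_train)
best_model = grid.best_estimator_

# 4. Save the model; a back end would load this file and call predict().
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(best_model, model_path)
loaded_model = joblib.load(model_path)
```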

Original features: feature A, feature B, feature C, feature D
     |
     v
    PCA
     |
     v
Derived features (principal components)
            : PC 1, PC 2, PC 3, PC 4
            :  70%,  20%,   8%,   2%   (share of variance explained)
            : total 100%

Features used: PC 1 & PC 2, which together explain 70 + 20 = 90% of the variance in the original features

model => training data => features A - D => 4 features!
model => training data => PC 1 & PC 2 => only 2 features!
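
The reduction from four features to two principal components can be sketched with scikit-learn's `PCA`. The data below is synthetic (four features generated from two hidden factors plus noise), so its variance shares will not be the 70 / 20 / 8 / 2 split used in the diagram.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: 4 observed features driven by 2 latent factors plus small noise.
latent = rng.normal(size=(200, 2))
mixing = np.array([[3.0, 2.0, 1.0, 0.5],
                   [0.5, -1.0, 1.5, 1.0]])
X = latent @ mixing + 0.1 * rng.normal(size=(200, 4))

# Fit all 4 components to inspect how much variance each one explains.
pca = PCA(n_components=4).fit(X)
ratios = pca.explained_variance_ratio_   # one share per PC, sorted descending
cumulative = np.cumsum(ratios)           # running total, ends at 1.0

# Keep only the first 2 components as model inputs.
X_reduced = PCA(n_components=2).fit_transform(X)
print(ratios.round(3), X_reduced.shape)
```

The model is then trained on `X_reduced` instead of `X`, trading a small loss of information for fewer input columns.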

In [1]:
import numpy as np
import pandas as pd

In [7]:
df = pd.read_csv('rating.csv')
df = df.iloc[:1000]
df.head()

Unnamed: 0,user_id,anime_id,rating
0,1,20,-1
1,1,24,-1
2,1,79,-1
3,1,226,-1
4,1,241,-1


In [9]:
df['user_id'].unique()

array([1, 2, 3, 4, 5, 6, 7], dtype=int64)

In [10]:
df2 = df.pivot_table(
    index = 'user_id',
    columns = 'anime_id'
)
df2

rating (rows: user_id, columns: anime_id)
user_id,6,15,17,18,20,22,24,30,31,32,...,31043,31240,31338,31636,31722,31845,31859,31964,32182,32828
1,,,,,-1.0,,-1.0,,,,...,,,-1.0,,,-1.0,,,,
2,,,,,,,,,,,...,,,,,,,,,,
3,,,,,8.0,,,,,,...,10.0,,,,,,7.0,,,
4,-1.0,,,,,,,,,,...,,,,,,,,,,
5,8.0,6.0,6.0,6.0,6.0,5.0,1.0,1.0,,,...,,8.0,,-1.0,7.0,,,-1.0,9.0,7.0
6,,,,,-1.0,,,,,,...,,,,,,,,,,
7,,,,,,7.0,,10.0,9.0,9.0,...,,,,,,,,,,


In [13]:
df2 = df2.replace([np.nan, -1], 0)
df2

rating (rows: user_id, columns: anime_id)
user_id,6,15,17,18,20,22,24,30,31,32,...,31043,31240,31338,31636,31722,31845,31859,31964,32182,32828
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,...,10.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,8.0,6.0,6.0,6.0,6.0,5.0,1.0,1.0,0.0,0.0,...,0.0,8.0,0.0,0.0,7.0,0.0,0.0,0.0,9.0,7.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,7.0,0.0,10.0,9.0,9.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [23]:
df2cor = df2.corr().loc['rating']
df2cor = df2cor.fillna(0)
df2cor

rating (rows: anime_id, columns: anime_id)
anime_id,6,15,17,18,20,22,24,30,31,32,...,31043,31240,31338,31636,31722,31845,31859,31964,32182,32828
6,1.000000,1.000000,1.000000,1.000000,0.509175,0.485530,1.000000,-0.067458,-0.166667,-0.166667,...,-0.166667,1.000000,0.0,0.0,1.000000,0.0,-0.166667,0.0,1.000000,1.000000
15,1.000000,1.000000,1.000000,1.000000,0.509175,0.485530,1.000000,-0.067458,-0.166667,-0.166667,...,-0.166667,1.000000,0.0,0.0,1.000000,0.0,-0.166667,0.0,1.000000,1.000000
17,1.000000,1.000000,1.000000,1.000000,0.509175,0.485530,1.000000,-0.067458,-0.166667,-0.166667,...,-0.166667,1.000000,0.0,0.0,1.000000,0.0,-0.166667,0.0,1.000000,1.000000
18,1.000000,1.000000,1.000000,1.000000,0.509175,0.485530,1.000000,-0.067458,-0.166667,-0.166667,...,-0.166667,1.000000,0.0,0.0,1.000000,0.0,-0.166667,0.0,1.000000,1.000000
20,0.509175,0.509175,0.509175,0.509175,1.000000,0.096738,0.509175,-0.206089,-0.254588,-0.254588,...,0.763763,0.509175,0.0,0.0,0.509175,0.0,0.763763,0.0,0.509175,0.509175
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
31845,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000
31859,-0.166667,-0.166667,-0.166667,-0.166667,0.763763,-0.253320,-0.166667,-0.185510,-0.166667,-0.166667,...,1.000000,-0.166667,0.0,0.0,-0.166667,0.0,1.000000,0.0,-0.166667,-0.166667
31964,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.000000
32182,1.000000,1.000000,1.000000,1.000000,0.509175,0.485530,1.000000,-0.067458,-0.166667,-0.166667,...,-0.166667,1.000000,0.0,0.0,1.000000,0.0,-0.166667,0.0,1.000000,1.000000
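
The item-to-item correlation step can be checked on a small matrix. This sketch assumes a hypothetical table of 4 users and 3 anime; it also shows why `fillna(0)` is needed: a column whose values never vary has zero variance, so its correlations come out as `NaN`.

```python
import pandas as pd

# Hypothetical user x anime rating matrix (0 = not rated), mirroring df2.
ratings = pd.DataFrame(
    {101: [8, 0, 6, 0], 102: [4, 0, 3, 0], 103: [0, 0, 0, 0]},
    index=pd.Index([1, 2, 3, 4], name="user_id"),
)
ratings.columns.name = "anime_id"

# Pearson correlation between every pair of anime columns.
item_corr = ratings.corr()

# Anime 103 was never rated, so its column is constant and its correlations are NaN.
has_nan = item_corr.isna().any().any()

# Treat "undefined" as "no relationship".
item_corr = item_corr.fillna(0)
print(item_corr)
```

Here anime 101 and 102 are perfectly correlated because every user rated 102 at exactly half of 101.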


In [24]:
# ratings given by Andi, as (anime_id, rating) pairs
Andi = [(6, 9), (15, 9)]

# Give recommendations for Caca
# Caca gives a rating of 10 to Doraemon and One Piece
# Caca gives a rating of 9 to Naruto

In [35]:
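# recommendations for Andi
rows = []
for anime_id, rating in Andi:
    # NOTE: .iloc selects the row at position `anime_id`, not the row labelled
    # with that anime_id; df2cor.loc[anime_id] would look it up by label.
    skor = df2cor.iloc[anime_id] * (rating / 11)
    skor = skor.sort_values(ascending=False)
    rows.append(skor)
# DataFrame.append was removed in pandas 2.0, so the rows are combined with pd.concat
skorSimilar = pd.concat(rows, axis=1, ignore_index=True).T
skorSimilar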

Unnamed: 0,"(rating, 32828)","(rating, 11013)","(rating, 9253)","(rating, 9355)","(rating, 9471)","(rating, 9741)","(rating, 32182)","(rating, 9936)","(rating, 9969)","(rating, 10108)",...,"(rating, 28497)","(rating, 2201)","(rating, 2237)","(rating, 15451)","(rating, 16512)","(rating, 10507)","(rating, 5231)","(rating, 11771)","(rating, 18097)","(rating, 11757)"
0,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182,0.818182,...,-0.136364,-0.136364,-0.136364,-0.142248,-0.209451,-0.210599,-0.211254,-0.211254,-0.211254,-0.22903
1,-0.136364,-0.136364,-0.136364,-0.136364,-0.136364,-0.136364,-0.136364,-0.136364,-0.136364,-0.136364,...,-0.136364,-0.136364,0.818182,0.438597,0.442175,0.477359,0.528134,-0.211254,0.528134,0.305373


In [36]:
skorSimilar.sum().sort_values(ascending=False)

(rating, 15809)    1.056268
(rating, 3515)     1.056268
(rating, 136)      1.056268
(rating, 1253)     1.056268
(rating, 1257)     1.056268
                     ...   
(rating, 10408)   -0.272727
(rating, 9760)    -0.272727
(rating, 28701)   -0.272727
(rating, 8074)    -0.286910
(rating, 11771)   -0.422507
Length: 716, dtype: float64

- Recommendations for Andi:
    - (rating, 3515)     1.056268
    - (rating, 1257)     1.056268
    - (rating, 15809)    1.056268
    - (rating, 136)      1.056268
    - (rating, 1253)     1.056268
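
The scoring loop can also be wrapped in a reusable function. This is a sketch under stated assumptions rather than the notebook's exact method: the function name `recommend`, the `max_rating=10` normalisation, the label-based lookup with `.loc`, and dropping already-rated anime are all choices made for illustration, and the correlation matrix is a hypothetical 4 x 4 example.

```python
import pandas as pd


def recommend(item_corr, user_ratings, max_rating=10, top_n=5):
    """Rank unseen items by the sum of (correlation * normalised rating)."""
    scores = pd.Series(0.0, index=item_corr.columns)
    for item_id, rating in user_ratings:
        # Look up the row by anime_id label and weight it by the user's rating.
        scores = scores + item_corr.loc[item_id] * (rating / max_rating)
    # Do not recommend anime the user has already rated.
    seen = [item_id for item_id, _ in user_ratings]
    return scores.drop(labels=seen).sort_values(ascending=False).head(top_n)


# Hypothetical item-item correlation matrix indexed by anime_id.
item_corr = pd.DataFrame(
    [[1.0, 0.9, -0.2, 0.1],
     [0.9, 1.0, 0.0, 0.3],
     [-0.2, 0.0, 1.0, 0.5],
     [0.1, 0.3, 0.5, 1.0]],
    index=[6, 15, 20, 22],
    columns=[6, 15, 20, 22],
)

andi = [(6, 9), (15, 9)]
top = recommend(item_corr, andi, top_n=2)
print(top)
```

With these made-up correlations, anime 22 ranks first (score 0.36) and anime 20 second (score -0.18).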