# Preparation

Preparation gets the data ready before it enters the modeling stage. <br>
The `Iris.csv` data goes through the following steps before modeling:
1. Import Library
2. Input Dataset
3. Preprocessing
4. Train-Test Split

## Import Library

In [None]:
import pandas as pd
import numpy as np

## Input Dataset

In [None]:
df = pd.read_csv('Iris.csv')

## Preprocessing

In [None]:
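# Features: every column except the target ('Species') and the row identifier ('Id')
X = df.drop(['Species','Id'],axis=1)
# Target: the species label
y = df['Species']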

## Train-Test Split

In [None]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.3,random_state = 123)
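To make the effect of `test_size = 0.3` concrete, here is a minimal sketch of a shuffled index split in plain Python. It assumes the Iris file has 150 rows, and the helper `split_indices` is a hypothetical name used only for illustration. It will not pick the same rows as `train_test_split`, because scikit-learn uses its own random number generator.

```python
import math
import random

def split_indices(n_rows, test_size, seed):
    """Shuffle the row indices and hold out ceil(test_size * n_rows) of them."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_size * n_rows)
    # Everything after the held-out block is used for training.
    return indices[n_test:], indices[:n_test]

# Assumption: 150 rows, as in the usual Iris dataset.
train_idx, test_idx = split_indices(n_rows=150, test_size=0.3, seed=123)
print(len(train_idx), len(test_idx))  # 105 45
```

Fixing the seed plays the same role as `random_state = 123`: rerunning the cell gives the same split.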

# Modeling

In this section, we implement in Python several of the models we have studied:
1. Support Vector Machine (SVM)
2. Ensemble Method
    - Voting Classifier
    - Bagging Classifier
    - Random Forest Classifier
    - Adaptive Boosting Classifier
    - Gradient Boosting Classifier
    
Examples of **hyperparameter tuning** for SVM and for an ensemble model are also included.

## Support Vector Machine

A support vector machine is a model built around <br>
**maximizing the margin** between the separating hyperplane and the nearest data points,<br>
which lets it classify well.
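The margin idea can be illustrated without any library. The sketch below compares two hand-picked separating lines on four made-up 2D points and computes the geometric margin of each, that is, the distance from the line to its closest point. The points, the two candidate lines, and the helper `geometric_margin` are all illustrative assumptions. An actual SVM searches over every possible hyperplane rather than comparing two.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w . x + b) / ||w|| over all points."""
    norm = math.hypot(*w)
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Made-up, linearly separable toy data.
points = [(2, 2), (3, 2), (0, 0), (1, 0)]
labels = [1, 1, -1, -1]

margin_a = geometric_margin((0, 1), -1.0, points, labels)  # horizontal line y = 1
margin_b = geometric_margin((1, 0), -1.5, points, labels)  # vertical line x = 1.5
print(margin_a, margin_b)  # 1.0 0.5
```

Both lines separate the classes, but the first keeps every point at least twice as far away, so a margin-maximizing model would prefer it over the second.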

In [None]:
def evaluasi_model(model,X_test,y_test):
    """Return the accuracy of a fitted model on the test set."""
    from sklearn.metrics import accuracy_score
    y_pred = model.predict(X_test)
    return accuracy_score(y_test,y_pred)

In [None]:
from sklearn.svm import SVC
svm = SVC()
svm.fit(X_train,y_train)

SVC()

In [None]:
evaluasi_model(svm,X_test,y_test)

0.9111111111111111
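Accuracy is simply the share of test rows that were predicted correctly. The sketch below computes it by hand on a few made-up labels; the function name `accuracy` and the sample labels are illustrative only. If the test set holds 45 rows, which is what 30% of a 150-row file would give, a score of 0.9111 corresponds to 41 correct predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up labels: three of the four predictions are right.
y_true = ["setosa", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "versicolor", "versicolor", "virginica"]
print(accuracy(y_true, y_pred))  # 0.75
```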

### Tuning Hyperparameter - Support Vector Machine

In [None]:
params = {'C':[0.5,1,2],'kernel':['linear','rbf']}

In [None]:
from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(estimator=svm,
             param_grid=params,
             scoring = 'accuracy',
             n_jobs = 2,
             cv = 3
            )
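A grid search tries every combination of the listed values. As a rough sketch of what that means for the grid above, the snippet below enumerates the candidates with `itertools.product` and counts the model fits implied by 3-fold cross-validation. It only mirrors the bookkeeping and does not train anything.

```python
from itertools import product

params = {'C': [0.5, 1, 2], 'kernel': ['linear', 'rbf']}
cv = 3

# Build one dict per combination of parameter values.
names = list(params)
candidates = [dict(zip(names, values)) for values in product(*params.values())]

for candidate in candidates:
    print(candidate)

# Each candidate is fitted once per cross-validation fold.
n_fits = len(candidates) * cv
print(len(candidates), n_fits)  # 6 18
```

With the default `refit=True`, `GridSearchCV` then fits the best candidate once more on the whole training set, which is why the fitted grid object can be passed straight to `evaluasi_model`.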

In [None]:
grid.fit(X_train,y_train)

GridSearchCV(cv=3, estimator=SVC(), n_jobs=2,
             param_grid={'C': [0.5, 1, 2], 'kernel': ['linear', 'rbf']},
             scoring='accuracy')

In [None]:
grid.best_params_

{'C': 2, 'kernel': 'rbf'}

In [None]:
evaluasi_model(grid,X_test,y_test)

0.9333333333333333

## Ensemble Method

An ensemble **combines** several models into one, <br>
which often gives a more powerful model than any single member.

### Ensemble Method - Voting Classifier (Combining Several Models)

In [None]:
from sklearn.ensemble import VotingClassifier

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

logreg = LogisticRegression()
dtc = DecisionTreeClassifier()
svm = SVC()
knn = KNeighborsClassifier()

list_model = [('lr',logreg),('tree',dtc),('svm',svm),('knn',knn)]

In [None]:
vote = VotingClassifier(list_model)
vote.fit(X_train,y_train)

VotingClassifier(estimators=[('lr', LogisticRegression()),
                             ('tree', DecisionTreeClassifier()), ('svm', SVC()),
                             ('knn', KNeighborsClassifier())])

In [None]:
evaluasi_model(vote,X_test,y_test)

0.9333333333333333
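By default `VotingClassifier` uses hard voting: each model predicts a class and the most frequent class wins. A minimal sketch of that rule in plain Python follows. The per-model predictions are made up, and `majority_vote` is a hypothetical helper. Ties here go to the label that sorts first, which is how the scikit-learn user guide describes its tie-breaking, though the library's internals are not reproduced.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label; ties go to the label that sorts first."""
    counts = Counter(predictions)
    # max() keeps the first maximum it sees, so iterate labels in sorted order.
    return max(sorted(counts), key=counts.get)

# Made-up predictions from four models (lr, tree, svm, knn) for three samples.
votes_per_sample = [
    ["setosa", "setosa", "setosa", "setosa"],
    ["versicolor", "virginica", "versicolor", "versicolor"],
    ["virginica", "versicolor", "virginica", "versicolor"],  # 2-2 tie
]
final = [majority_vote(votes) for votes in votes_per_sample]
print(final)  # ['setosa', 'versicolor', 'versicolor']
```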

### Ensemble Method - Bagging (Bootstrap Aggregating)

In [None]:
from sklearn.ensemble import BaggingClassifier

In [None]:
bagging = BaggingClassifier()
bagging.fit(X_train,y_train)

BaggingClassifier()

In [None]:
evaluasi_model(bagging,X_test,y_test)

0.9333333333333333
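Bagging trains each base model on a bootstrap sample, meaning rows drawn with replacement from the training set. The sketch below shows only that sampling step, assuming 105 training rows (70% of 150); the seed and the helper `bootstrap_sample` are illustrative. Rows that are never drawn for a given model are called out-of-bag.

```python
import random

def bootstrap_sample(n_rows, rng):
    """Draw n_rows indices with replacement; also return the rows never drawn."""
    drawn = [rng.randrange(n_rows) for _ in range(n_rows)]
    out_of_bag = sorted(set(range(n_rows)) - set(drawn))
    return drawn, out_of_bag

rng = random.Random(0)  # hypothetical seed, for repeatability only
for i in range(3):
    drawn, oob = bootstrap_sample(105, rng)
    print(f"model {i}: {len(set(drawn))} unique rows, {len(oob)} out-of-bag")
```

Because every model sees a slightly different sample, their errors differ, and aggregating their predictions tends to reduce variance. With default settings `BaggingClassifier()` uses decision trees as base models.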

### Ensemble Method - Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
rf = RandomForestClassifier(random_state=444)
rf.fit(X_train,y_train)

RandomForestClassifier(random_state=444)

In [None]:
evaluasi_model(rf,X_test,y_test)

0.9555555555555556

### Ensemble Method - Gradient Boosting Classifier

In [None]:
from sklearn.ensemble import GradientBoostingClassifier

In [None]:
grad = GradientBoostingClassifier()
grad.fit(X_train,y_train)

GradientBoostingClassifier()

In [None]:
evaluasi_model(grad,X_test,y_test)
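Boosting methods, including the Adaptive Boosting Classifier from the model list, train their members one after another, with each round paying more attention to the samples the previous round got wrong. The sketch below shows a single reweighting round of classic binary AdaBoost on made-up labels. It illustrates the idea only and is not scikit-learn's multi-class implementation.

```python
import math

y_true = [1, 1, -1, -1, 1]    # made-up labels
y_pred = [1, -1, -1, -1, 1]   # made-up weak-learner output with one mistake
weights = [1 / len(y_true)] * len(y_true)  # start with uniform sample weights

# Weighted error rate of the weak learner.
error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
# How much say this learner gets in the final vote.
alpha = 0.5 * math.log((1 - error) / error)

# Raise the weight of misclassified samples, lower the rest, then renormalize.
updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
total = sum(updated)
weights = [w / total for w in updated]

print(round(error, 3), round(alpha, 3), [round(w, 3) for w in weights])
# 0.2 0.693 [0.125, 0.5, 0.125, 0.125, 0.125]
```

After one round the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.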

### Tuning Hyperparameter - Ensemble Method (Random Forest)

In [None]:
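# 'auto' is only accepted by older scikit-learn releases; for classifiers it
# meant the same as 'sqrt', so drop it if the installed version rejects it.
params = {'n_estimators':[50,100,150],
          'max_features':['auto','sqrt','log2']}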
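The `max_features` setting controls how many features each tree split may consider. The sketch below works out what `'sqrt'` and `'log2'` resolve to, assuming the feature matrix has four columns (the measurements left after dropping `Species` and `Id`). The rounding used here follows my understanding of scikit-learn's behavior and should be treated as an assumption.

```python
import math

# Assumption: four feature columns remain in X.
n_features = 4

# Number of candidate features per split, rounded down with a floor of 1.
per_split = {
    'sqrt': max(1, int(math.sqrt(n_features))),
    'log2': max(1, int(math.log2(n_features))),
}
print(per_split)  # {'sqrt': 2, 'log2': 2}
```

With only four features both options come out the same, so on this dataset the grid mostly compares values of `n_estimators`.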

In [None]:
from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(estimator=rf,
             param_grid=params,
             scoring = 'accuracy',
             n_jobs = 2,
             cv = 3
            )

In [None]:
grid.fit(X_train,y_train)

GridSearchCV(cv=3, estimator=RandomForestClassifier(random_state=444), n_jobs=2,
             param_grid={'max_features': ['auto', 'sqrt', 'log2'],
                         'n_estimators': [50, 100, 150]},
             scoring='accuracy')

In [None]:
grid.best_params_

{'max_features': 'auto', 'n_estimators': 100}

In [None]:
evaluasi_model(grid,X_test,y_test)

0.9555555555555556