# Hyperparameter Tuning

- Uses cross-validation to find the best values for the model's parameters.


- Step-by-step:
    1. Run cross-validation.
    2. Provide the candidate values for each parameter to be tuned, e.g.:
        
        ```penalty = ['l1', 'l2', 'elasticnet', 'none']```
        
        ```max_iter = [10, 100, 1000, 10000]```
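
The two steps above can be sketched by hand before reaching for a search class. This is a minimal illustration, not the notebook's own method: it assumes the iris data and tunes only `max_iter`, scoring each candidate with `cross_val_score`. The names `max_iter_options`, `mean_scores` and `best_max_iter` are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Step 2: candidate values for the parameter being tuned
max_iter_options = [10, 100, 1000, 10000]

# Step 1: run 5-fold cross-validation once per candidate value
mean_scores = {}
for max_iter in max_iter_options:
    model = LogisticRegression(max_iter=max_iter)
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[max_iter] = scores.mean()

# Keep the candidate with the highest mean validation accuracy
best_max_iter = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best max_iter:", best_max_iter)
```

Small `max_iter` values may print convergence warnings; the loop still completes.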


- There are 2 ways to do hyperparameter tuning:
    - __Randomized Search CV__: parameter values are sampled at random from the candidates
    - __Grid Search CV__: tries every combination of the candidate values
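
The difference between the two approaches is easiest to see by counting candidates. The sketch below uses only the standard library to illustrate the idea; it is not how scikit-learn implements either search. It assumes the parameter options used later in this notebook and a sample size of 10, which matches the `n_iter=10` default visible in the `RandomizedSearchCV` output further down.

```python
import itertools
import random

param = {
    'penalty': ['l1', 'l2', 'elasticnet', 'none'],
    'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
    'max_iter': [10, 100, 1000, 10000],
}

# Grid search: every combination of the candidate values
names = list(param)
grid = [dict(zip(names, values)) for values in itertools.product(*param.values())]

# Randomized search: only a random subset of those combinations
rng = random.Random(0)
sampled = rng.sample(grid, 10)

print(len(grid))     # 4 * 5 * 4 = 80
print(len(sampled))  # 10
```

With `cv = 5`, that means 400 model fits for the full grid versus 50 for the random sample.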

<hr>

### Iris dataset & Logistic Regression

In [44]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

In [62]:
df = pd.DataFrame(
    load_iris()['data'],
    columns = ['SL', 'SW', 'PL', 'PW']
)
df['target'] = load_iris()['target']
df.head()

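|   | SL  | SW  | PL  | PW  | target |
|---|-----|-----|-----|-----|--------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0      |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0      |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0      |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0      |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0      |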


In [63]:
# splitting
xtr, xts, ytr, yts = train_test_split(
    df[['SL', 'SW', 'PL', 'PW']],
    df['target'],
    test_size = .2
)
len(xtr)

120

In [64]:
# create model w/ default param
modelInit = LogisticRegression()
modelInit.fit(xtr, ytr)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [65]:
print(modelInit.score(xts, yts))

0.9666666666666667


- Try to improve the model's accuracy by searching for the best values of these parameters:
    - ```penalty = ['l1', 'l2', 'elasticnet', 'none']```
    - ```solver = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']```
    - ```max_iter = [10, 100, 1000, 10000]```

In [66]:
penalty = ['l1', 'l2', 'elasticnet', 'none']
solver = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']
max_iter = [10, 100, 1000, 10000]

param = {'penalty': penalty, 'solver': solver, 'max_iter': max_iter}

<hr>

### 1. Hyperparameter Tuning: Randomized Search CV

In [67]:
from sklearn.model_selection import RandomizedSearchCV

In [68]:
model = LogisticRegression()

In [69]:
modelrs = RandomizedSearchCV(
    estimator = model,
    param_distributions = param,
    cv = 5
)

In [70]:
modelrs.fit(xtr, ytr)

ValueError: Only 'saga' solver supports elasticnet penalty, got solver=liblinear.

ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got elasticnet penalty.

ValueError: Solver sag supports only 'l2' or 'none' penalties, got elasticnet penalty.



RandomizedSearchCV(cv=5, error_score=nan,
                   estimator=LogisticRegression(C=1.0, class_weight=None,
                                                dual=False, fit_intercept=True,
                                                intercept_scaling=1,
                                                l1_ratio=None, max_iter=100,
                                                multi_class='auto', n_jobs=None,
                                                penalty='l2', random_state=None,
                                                solver='lbfgs', tol=0.0001,
                                                verbose=0, warm_start=False),
                   iid='deprecated', n_iter=10, n_jobs=None,
                   param_distributions={'max_iter': [10, 100, 1000, 10000],
                                        'penalty': ['l1', 'l2', 'elasticnet',
                                                    'none'],
                                        'solver': ['newton-cg', 'l
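
The `ValueError` messages in the output above come from solver/penalty pairs that `LogisticRegression` does not support; those candidates fail to fit and are scored as `nan`. One way to avoid this is to pass a list of dicts, where each dict is a separate sub-grid containing only compatible pairs. This is a sketch under assumptions: the name `valid_param` is hypothetical, `'elasticnet'` is left out because it also needs `l1_ratio`, and `'none'` and `'liblinear'` are left out because their handling differs between scikit-learn versions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

max_iter = [100, 1000, 10000]

# Each dict is its own sub-grid, so only compatible pairs are generated
valid_param = [
    {'solver': ['newton-cg', 'lbfgs', 'sag'], 'penalty': ['l2'], 'max_iter': max_iter},
    {'solver': ['saga'], 'penalty': ['l1', 'l2'], 'max_iter': max_iter},
]

search = RandomizedSearchCV(
    estimator=LogisticRegression(),
    param_distributions=valid_param,
    n_iter=10,
    cv=5,
    random_state=0,  # fixed seed so the sampled candidates are repeatable
)
search.fit(X, y)
print(search.best_params_)
```

The same list of dicts can be passed to `GridSearchCV` as `param_grid`.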

In [71]:
modelrs.best_params_

{'solver': 'sag', 'penalty': 'l2', 'max_iter': 100}

In [73]:
modelRSbest = LogisticRegression(solver='sag', penalty='l2', max_iter= 100)
modelRSbest.fit(xtr, ytr)



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='sag', tol=0.0001, verbose=0,
                   warm_start=False)

In [74]:
modelRSbest.score(xts, yts)

0.9666666666666667
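
Retyping the best parameters into a new `LogisticRegression` works, but it is not required. With the default `refit=True` (visible in the `GridSearchCV` output later in this notebook), the search object refits the best candidate on the whole training set and exposes it as `best_estimator_`. A minimal sketch, assuming a smaller parameter set and a fixed `random_state` purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
xtr, xts, ytr, yts = train_test_split(X, y, test_size=.2, random_state=0)

search = RandomizedSearchCV(
    estimator=LogisticRegression(),
    param_distributions={
        'solver': ['lbfgs', 'newton-cg'],
        'max_iter': [100, 1000, 10000],
    },
    n_iter=5,
    cv=5,
    random_state=0,
)
search.fit(xtr, ytr)

print(search.best_params_)   # winning combination
print(search.best_score_)    # its mean cross-validation accuracy

# best_estimator_ is already fitted on xtr, ytr; no manual refit needed
print(search.best_estimator_.score(xts, yts))
# search.score() delegates to best_estimator_, so this gives the same value
print(search.score(xts, yts))
```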

<hr>

### 2. Hyperparameter Tuning: Grid Search CV

In [76]:
from sklearn.model_selection import GridSearchCV

In [77]:
model = LogisticRegression()

In [79]:
modelgs = GridSearchCV(
    model,
    param,
    cv = 5
)

In [80]:
modelgs.fit(xtr, ytr)

ValueError: Solver newton-cg supports only 'l2' or 'none' penalties, got l1 penalty.

ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.

ValueError: Solver sag supports only 'l2' or 'none' penalties, got l1 penalty.

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

GridSearchCV(cv=5, error_score=nan,
             estimator=LogisticRegression(C=1.0, class_weight=None, dual=False,
                                          fit_intercept=True,
                                          intercept_scaling=1, l1_ratio=None,
                                          max_iter=100, multi_class='auto',
                                          n_jobs=None, penalty='l2',
                                          random_state=None, solver='lbfgs',
                                          tol=0.0001, verbose=0,
                                          warm_start=False),
             iid='deprecated', n_jobs=None,
             param_grid={'max_iter': [10, 100, 1000, 10000],
                         'penalty': ['l1', 'l2', 'elasticnet', 'none'],
                         'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag',
                                    'saga']},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scorin
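
The convergence warning in the output above suggests either raising `max_iter` or scaling the data. Below is a sketch of the scaling option, assuming a `StandardScaler` placed in a pipeline so that it is fitted only on the training folds of each split. The step name `logisticregression` is the lowercase class name that `make_pipeline` assigns automatically; the reduced parameter grid is an assumption for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale the features first, then fit the classifier
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {
    'logisticregression__solver': ['lbfgs', 'sag', 'saga'],
    'logisticregression__max_iter': [100, 1000, 10000],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```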

In [81]:
modelgs.best_params_

{'max_iter': 100, 'penalty': 'l2', 'solver': 'sag'}
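
`best_params_` shows only the winner. To see how close the other candidates were, the `cv_results_` attribute can be loaded into a DataFrame. A minimal sketch, assuming a reduced grid so the table stays short; the variable names `results` and `cols` are hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    LogisticRegression(),
    {'solver': ['lbfgs', 'newton-cg', 'sag'], 'max_iter': [100, 1000]},
    cv=5,
)
search.fit(X, y)

# One row per candidate, with its mean and std score across the 5 folds
results = pd.DataFrame(search.cv_results_)
cols = ['param_solver', 'param_max_iter',
        'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score'))
```

Several candidates often tie or differ by less than one standard deviation, which is worth checking before reading much into a single "best" combination.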

In [82]:
modelGSbest = LogisticRegression(max_iter=100, penalty='l2', solver='sag')

In [83]:
modelGSbest.fit(xtr, ytr)



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='sag', tol=0.0001, verbose=0,
                   warm_start=False)

In [84]:
modelGSbest.score(xts, yts)

0.9666666666666667