# Hyperparameters

#### What are hyperparameters?

Hyperparameters in machine learning are parameters that the user sets explicitly to control the learning process.
Their values are fixed before training starts, unlike model parameters, which are learned from the data during training.
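The difference between a hyperparameter and a learned parameter can be seen on a small example. This is a minimal sketch on synthetic data from `make_classification` (an assumption, not the dataset used later in this notebook): `max_depth` is set by hand before training, while the tree's nodes are only found during `fit()`.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the heart-disease dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hyperparameter: chosen by us before training starts.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
hyperparams = model.get_params()
print("max_depth set by the user:", hyperparams["max_depth"])

# Learned parameters: the split features and thresholds found during fit().
model.fit(X, y)
learned_depth = model.get_depth()
print("nodes learned from data:", model.tree_.node_count)
print("depth actually reached:", learned_depth)
```

The depth reached after training can never exceed the `max_depth` chosen beforehand, which is what "controlling the learning process" means here.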

#### Examples of hyperparameters in ML
- The k in the kNN (k-Nearest Neighbours) algorithm
- The train-test split ratio
- The number of branches in a decision tree
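To illustrate the first example, the sketch below trains a kNN classifier with a few values of k and records the test accuracy for each. The data is synthetic and the candidate values `(1, 5, 15)` are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption, for illustration only).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Same algorithm, same data: only the hyperparameter k changes.
scores_by_k = {}
for k in (1, 5, 15):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores_by_k[k] = knn.score(X_test, y_test)

print(scores_by_k)
```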

#### What is hyperparameter tuning?
Hyperparameter tuning, also known as hyperparameter optimization, is the process of selecting the hyperparameter values that give the best model performance.
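In its simplest form, tuning is just a loop: try each candidate value, score it with cross-validation, and keep the best one. The following is a rough sketch of that idea on synthetic data, tuning only `max_depth`; the candidate range and `cv=5` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption, for illustration only).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Score every candidate value with 5-fold cross-validation.
cv_scores = {}
for depth in range(1, 6):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    cv_scores[depth] = cross_val_score(model, X, y, cv=5).mean()

# Keep the value with the highest mean validation accuracy.
best_depth = max(cv_scores, key=cv_scores.get)
print("best max_depth:", best_depth)
print("mean CV accuracy:", cv_scores[best_depth])
```

`GridSearchCV` and `RandomizedSearchCV` automate this loop over several hyperparameters at once.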


### Hyperparameters for Decision Tree

The following hyperparameters restrict the structure of a decision tree:

- min_samples_split – Minimum number of samples a node must contain before it can be split.
- min_samples_leaf – Minimum number of samples a leaf node must contain.
- min_weight_fraction_leaf – Minimum fraction of the total sample weight that must be at a leaf node.
- max_leaf_nodes – Maximum number of leaf nodes the decision tree can have.
- max_features – Number of features considered when searching for the best split at each node.
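The effect of these limits can be checked directly on a fitted tree. The sketch below, on synthetic data, compares an unrestricted tree with one limited by `max_leaf_nodes` and `min_samples_leaf`; the specific values 8 and 10 are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption, for illustration only).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

unrestricted = DecisionTreeClassifier(random_state=0).fit(X, y)
restricted = DecisionTreeClassifier(
    max_leaf_nodes=8, min_samples_leaf=10, random_state=0
).fit(X, y)

print("leaves without limits:", unrestricted.get_n_leaves())
print("leaves with limits:   ", restricted.get_n_leaves())

# Leaves are the nodes without children (children_left == -1).
fitted = restricted.tree_
is_leaf = fitted.children_left == -1
smallest_leaf = fitted.n_node_samples[is_leaf].min()
print("smallest leaf size:   ", smallest_leaf)
```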


# 1. Hyperparameter tuning using GridSearchCV 

In [1]:
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score


In [2]:
df=pd.read_csv('heart-disease.csv')
df.head()

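| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |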


In [3]:
# Check for missing values
df.isnull().sum()


age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64

In [4]:
# Check for categorical features: collect all object-dtype columns in the dataframe
catCols = [col for col in df.columns if df[col].dtype=="O"]
catCols

[]
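An alternative way to run the same check is `DataFrame.select_dtypes`, which selects columns by dtype family instead of comparing each dtype by hand. This is a sketch on a small made-up frame; the column `city` is hypothetical and only there so that the check finds something.

```python
import pandas as pd

# Small made-up frame (assumption): two numeric columns, one text column.
demo = pd.DataFrame({"age": [63, 37], "sex": [1, 0], "city": ["a", "b"]})

# Everything that is not numeric needs encoding before it goes into a tree.
non_numeric_cols = demo.select_dtypes(exclude="number").columns.tolist()
print(non_numeric_cols)
```

An empty list, as in the output above, means every column is already numeric and no encoding step is needed.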

In [5]:
# Features and Target
x=df.drop(['target'],axis=1)
y=df['target']


In [6]:
# Split it into train and test dataset
x_train, x_test, y_train, y_test=train_test_split(x,y,test_size=0.2)
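Because the split above is random, the accuracy values reported in this notebook change from run to run. A possible way to make the split repeatable is to pass `random_state`, and `stratify` keeps the class ratio equal in both parts. The sketch below uses made-up arrays; the seed 42 is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data (assumption): 100 samples, 70 of class 0 and 30 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 70 + [1] * 30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # 20% of the rows go to the test set
    random_state=42,  # fixed seed: the same split on every run
    stratify=y,       # keep the 70/30 class ratio in both parts
)

print(X_train.shape, X_test.shape)
print("share of class 1 in test set:", y_test.mean())
```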

In [7]:
# Build a model (DecisionTreeClassifier was imported in the first cell)
classifier=DecisionTreeClassifier()

In [8]:
# Training of model
classifier.fit(x_train, y_train)

DecisionTreeClassifier()

In [9]:
# Testing to make predictions
pred=classifier.predict(x_test)

In [10]:
# Calculate accuracy on the test set (accuracy_score was imported in the first cell)
accuracy_score(y_test,pred)

0.7377049180327869

# Tuning with GridSearchCV

In [11]:

param_dict={
             'criterion':['gini','entropy'],
             'max_depth':range(1,10),
             'min_samples_split':(2,10),   # a tuple, so only the two values 2 and 10 are tried
             'min_samples_leaf':range(1,10)
           }
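A grid search tries every combination in the dictionary, so the cost grows multiplicatively with each added hyperparameter. The sketch below counts the candidates with `ParameterGrid`, using the same dictionary as above, and multiplies by the 10 folds used in the next cell.

```python
from sklearn.model_selection import ParameterGrid

param_dict = {
    "criterion": ["gini", "entropy"],  # 2 values
    "max_depth": range(1, 10),         # 9 values (1..9)
    "min_samples_split": (2, 10),      # 2 values (the tuple holds 2 and 10)
    "min_samples_leaf": range(1, 10),  # 9 values (1..9)
}

n_candidates = len(ParameterGrid(param_dict))  # 2 * 9 * 2 * 9
n_fits = n_candidates * 10                     # each candidate is fit once per fold
print(n_candidates, n_fits)  # 324 3240
```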



In [77]:
grid=GridSearchCV(classifier,
                  param_grid=param_dict,
                  cv=10,
                  n_jobs=1)
grid.fit(x_train,y_train)

GridSearchCV(cv=10, estimator=DecisionTreeClassifier(), n_jobs=1,
             param_grid={'criterion': ['gini', 'entropy'],
                         'max_depth': range(1, 10),
                         'min_samples_leaf': range(1, 10),
                         'min_samples_split': (2, 10)})
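After fitting, the search object stores the score of every candidate in `cv_results_`, not only the winner. Loading it into a DataFrame makes it easy to see how close the runners-up were. This is a sketch on synthetic data with a deliberately small grid so that it runs quickly.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data and a small grid (assumptions, for illustration only).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3], "criterion": ["gini", "entropy"]},
    cv=5,
)
grid.fit(X, y)

# One row per candidate; rank_test_score == 1 marks the best mean score.
results = pd.DataFrame(grid.cv_results_)
top = results.sort_values("rank_test_score")[
    ["params", "mean_test_score", "std_test_score"]
].head(3)
print(top)
```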

In [78]:
# Finding the best hyperparameter
grid.best_params_

{'criterion': 'gini',
 'max_depth': 3,
 'min_samples_leaf': 5,
 'min_samples_split': 2}

In [79]:
grid.best_estimator_

DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)

In [80]:
grid.best_score_

0.8145
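Note that `best_score_` is the mean cross-validated accuracy on the training folds, so it is not directly comparable to the test accuracy of the untuned model. A fairer comparison scores the tuned model on the held-out test set. The sketch below shows the pattern on synthetic data with a reduced grid; it relies on the default `refit=True`, which retrains the best candidate on the full training set.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption, for illustration only).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": range(1, 6), "min_samples_leaf": [1, 5, 10]},
    cv=5,
)
grid.fit(X_train, y_train)

# best_estimator_ has already been refit on all of X_train.
tuned_pred = grid.best_estimator_.predict(X_test)
test_accuracy = accuracy_score(y_test, tuned_pred)
print("mean CV accuracy:", grid.best_score_)
print("test accuracy:   ", test_accuracy)
```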

# 2. Hyperparameter tuning using RandomizedSearchCV

In [112]:
from sklearn.model_selection import RandomizedSearchCV
param_dict={
             'criterion':['gini','entropy'],
             'max_depth':range(1,10),
             'min_samples_split':(2,10),   # a tuple, so only the two values 2 and 10 are sampled
             'min_samples_leaf':range(1,10)
           }

In [113]:
randomcv=RandomizedSearchCV(classifier,
                           param_distributions=param_dict)
randomcv.fit(x_train,y_train)

RandomizedSearchCV(estimator=DecisionTreeClassifier(),
                   param_distributions={'criterion': ['gini', 'entropy'],
                                        'max_depth': range(1, 10),
                                        'min_samples_leaf': range(1, 10),
                                        'min_samples_split': (2, 10)})
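With the default settings, `RandomizedSearchCV` samples only 10 candidates (`n_iter=10`) and uses 5-fold cross-validation, so it evaluates far fewer combinations than the grid search. It also accepts distributions instead of fixed lists. The sketch below, on synthetic data, samples integer values with `scipy.stats.randint`; `n_iter=20` and the seeds are arbitrary choices for illustration.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption, for illustration only).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# randint(low, high) draws integers from low up to high - 1.
param_distributions = {
    "criterion": ["gini", "entropy"],
    "max_depth": randint(1, 10),          # 1..9
    "min_samples_split": randint(2, 11),  # 2..10
    "min_samples_leaf": randint(1, 10),   # 1..9
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=20,       # number of sampled candidates
    cv=5,
    random_state=0,  # fixed seed: the same candidates on every run
)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
```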

In [114]:
# Finding the best hyperparameters of the randomized search.
# Query randomcv here: grid.best_params_ would only repeat the GridSearchCV result from section 1.
# The values vary between runs because the candidates are sampled at random.
randomcv.best_params_

In [115]:
randomcv.best_estimator_

In [116]:
randomcv.best_score_