# **Hyperparameter Tuning:**
for a `Decision Tree Classification Model`

* We will train a decision tree classifier on the iris dataset.
* We will tune its hyperparameters with `GridSearchCV` and `RandomizedSearchCV` to find the best settings for the model.

In [3]:
# importing the libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [4]:
# loading the dataset:

iris = load_iris()
iris.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [6]:
# print all the keys using for loop:

for k in iris.keys():
    print(k)

data
target
frame
target_names
DESCR
feature_names
filename
data_module


In [7]:
# now, print the feature_names and target_names:

print("the feature names are: ", iris.feature_names)
print("the target names are: ", iris.target_names)

the feature names are:  ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
the target names are:  ['setosa' 'versicolor' 'virginica']


In [9]:
# let's split the data into features and target:

X = iris.data
y = iris.target

In [10]:
# split the data into training and testing:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
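
With only 150 samples, a plain random split can leave the three classes slightly unbalanced between the training and test partitions. The following is a minimal sketch, assuming you want the class proportions preserved; it passes `stratify=y`, which the cell above does not do, and the `_s` variable names are introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class balance of the full dataset in both partitions
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# shapes of the two partitions and per-class counts in each
print(X_train_s.shape, X_test_s.shape)
print(np.bincount(y_train_s), np.bincount(y_test_s))
```

Comparing the per-class counts with and without `stratify` is a quick way to see whether the split matters for a given dataset.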


In [11]:
# call the model:

model = DecisionTreeClassifier()

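# define the parameter grid:
# NOTE: the failed fits reported in the output below are consistent with two
# entries in this grid: min_samples_split=1 (an integer value must be >= 2)
# and max_features='auto' (no longer accepted by recent scikit-learn releases).

param_grid = { 'criterion': ['gini', 'entropy'],
                'max_depth': np.arange(1, 10),
                'min_samples_split': np.arange(1, 10),
                'min_samples_leaf': np.arange(1, 10),
                'max_features': ['auto', 'sqrt', 'log2']
                 }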

# call the GridSearchCV:

grid = GridSearchCV(model, param_grid, cv=5, n_jobs=-1)

# fit the model:

grid.fit(X_train, y_train)

# print the best parameters:

print(f"The best parameters are: {grid.best_params_}")

8910 fits failed out of a total of 21870.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
3695 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\Muhammad Faizan\.conda\envs\python_machinelearning\Lib\site-packages\sklearn\model_selection\_validation.py", line 888, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\Muhammad Faizan\.conda\envs\python_machinelearning\Lib\site-packages\sklearn\base.py", line 1466, in wrapper
    estimator._validate_params()
  File "c:\Users\Muhammad Faizan\.conda\envs\python_machinelearning\Lib\site-packages\sklearn\base.py", line 666, in _validate_params
    validate_parameter_constraints(
  File "c:\Users\Muhammad Faizan\.conda\envs\
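
The warning above reports 8910 failed fits out of 21870. That count is consistent with two entries in `param_grid` being rejected during parameter validation: `min_samples_split=1`, because an integer value must be at least 2, and `max_features='auto'`, which decision trees no longer accept in recent scikit-learn releases (version 1.3 or later is an assumption based on the traceback). The sketch below only counts combinations with `ParameterGrid` and fits nothing; `original_grid`, `valid_grid`, and `n_folds` are names introduced here for illustration.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# the grid exactly as defined in the cell above
original_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": np.arange(1, 10),
    "min_samples_split": np.arange(1, 10),
    "min_samples_leaf": np.arange(1, 10),
    "max_features": ["auto", "sqrt", "log2"],
}
n_folds = 5  # cv=5 in the GridSearchCV call

# combinations that contain at least one rejected value
invalid = [
    p for p in ParameterGrid(original_grid)
    if p["min_samples_split"] < 2 or p["max_features"] == "auto"
]
failed_fits = len(invalid) * n_folds
total_fits = len(ParameterGrid(original_grid)) * n_folds
print(failed_fits, total_fits)  # 8910 21870

# assumed fix: start min_samples_split at 2 and use None (all features)
# in place of 'auto'
valid_grid = dict(
    original_grid,
    min_samples_split=np.arange(2, 10),
    max_features=[None, "sqrt", "log2"],
)
valid_fits = len(ParameterGrid(valid_grid)) * n_folds
print(valid_fits)  # 19440
```

Passing `valid_grid` to `GridSearchCV` in place of `param_grid` should remove the failed fits. Setting `error_score='raise'`, as the warning itself suggests, makes any remaining invalid combination stop the search immediately.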

In [12]:
# print the best score:

print(f"The best score is: {grid.best_score_}")

The best score is: 0.9583333333333334
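
`best_score_` is the mean cross-validated accuracy of the single best parameter combination, so it says nothing about how close the runner-up settings were. As a rough sketch, the full results can be ranked through `cv_results_`; the small grid `small_grid` and the fixed `random_state` below are assumptions chosen to keep the example quick, not the grid used above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# hypothetical reduced grid: 3 x 3 = 9 combinations
small_grid = {"max_depth": [2, 3, 4], "min_samples_leaf": [1, 3, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), small_grid, cv=5)
search.fit(X_train, y_train)

# one row per parameter combination, best-ranked first
results = pd.DataFrame(search.cv_results_)
top = results.sort_values("rank_test_score")[
    ["params", "mean_test_score", "std_test_score"]
].head(5)
print(top.to_string(index=False))
```

If several combinations score within one standard deviation of each other, the simpler tree (smaller `max_depth`, larger `min_samples_leaf`) is often a reasonable choice.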


## Now let's try RandomizedSearchCV:

In [14]:
# call the model:

model = DecisionTreeClassifier()

# define the parameter grid:

param_grid = { 'criterion': ['gini', 'entropy'],
                'max_depth': np.arange(1, 10),
                'min_samples_split': np.arange(1, 10),
                'min_samples_leaf': np.arange(1, 10),
                'max_features': ['auto', 'sqrt', 'log2']
                 }

# call the RandomizedSearchCV:

random = RandomizedSearchCV(model, param_grid, cv=5, n_jobs=-1)

# fit the randomized search (not the earlier grid search):

random.fit(X_train, y_train)

# print the best parameters:

print(f"The best parameters are: {random.best_params_}")

The best parameters are: {'min_samples_split': np.int64(2), 'min_samples_leaf': np.int64(7), 'max_features': 'log2', 'max_depth': np.int64(2), 'criterion': 'gini'}


25 fits failed out of a total of 50.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
7 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\Muhammad Faizan\.conda\envs\python_machinelearning\Lib\site-packages\sklearn\model_selection\_validation.py", line 888, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\Muhammad Faizan\.conda\envs\python_machinelearning\Lib\site-packages\sklearn\base.py", line 1466, in wrapper
    estimator._validate_params()
  File "c:\Users\Muhammad Faizan\.conda\envs\python_machinelearning\Lib\site-packages\sklearn\base.py", line 666, in _validate_params
    validate_parameter_constraints(
  File "c:\Users\Muhammad Faizan\.conda\envs\python_m
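
The 50 fits in the warning above correspond to the default `n_iter=10` sampled candidates times 5 folds, and half of them failed, most likely for the same invalid values (`min_samples_split=1` and `max_features='auto'`). Below is a minimal sketch of a randomized search that avoids them, assuming `scipy.stats.randint` distributions in place of the fixed lists; `param_distributions`, `n_iter=20`, and the `random_state` values are choices made here for illustration.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# randint(low, high) samples integers from low up to high - 1
param_distributions = {
    "criterion": ["gini", "entropy"],
    "max_depth": randint(1, 10),
    "min_samples_split": randint(2, 10),   # must be at least 2
    "min_samples_leaf": randint(1, 10),
    "max_features": [None, "sqrt", "log2"],  # None = use all features
}

random_search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_distributions,
    n_iter=20,            # number of sampled candidates
    cv=5,
    random_state=42,      # makes the sampled candidates reproducible
    error_score="raise",  # fail loudly on an invalid combination
)
random_search.fit(X_train, y_train)

print(random_search.best_params_)
print(random_search.best_score_)
```

Because only `n_iter` candidates are tried, the result depends on the seed; raising `n_iter` trades run time for a better chance of landing near the best settings.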

---

* In this notebook, I practiced GridSearchCV and RandomizedSearchCV on the iris dataset.
* I used the DecisionTreeClassifier model.
* I used GridSearchCV and RandomizedSearchCV to find the best parameters for the model.
* I imported accuracy_score, classification_report, and confusion_matrix, but the cells above score the model only by cross-validated accuracy on the training split.
* The best cross-validated score from GridSearchCV was about `0.958`.
* I hope you enjoyed this notebook and found it helpful. :)
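
The metrics imported at the top of the notebook can be applied to the held-out test set once a search has been fitted. What follows is only a sketch under stated assumptions: the one-parameter grid over `max_depth` and the fixed `random_state` are placeholders, not the tuned model from the cells above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# hypothetical small grid, used only to obtain a fitted search object
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42), {"max_depth": [2, 3, 4, 5]}, cv=5
)
search.fit(X_train, y_train)

# predict() uses the best estimator, refit on the whole training split
y_pred = search.predict(X_test)

test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(f"Test accuracy: {test_accuracy:.3f}")
print(cm)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

The test accuracy is measured on data the search never saw, so it is a fairer estimate of generalization than `best_score_`.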

---

# About Me:

<img src="https://scontent.flhe6-1.fna.fbcdn.net/v/t39.30808-6/449152277_18043153459857839_8752993961510467418_n.jpg?_nc_cat=108&ccb=1-7&_nc_sid=127cfc&_nc_ohc=6slHzGIxf0EQ7kNvgEeodY9&_nc_ht=scontent.flhe6-1.fna&oh=00_AYCiVUtssn2d_rREDU_FoRbXvszHQImqOjfNEiVq94lfBA&oe=66861B78" width="30%">

**Muhammad Faizan**

3rd Year BS Computer Science student at University of Agriculture, Faisalabad.\
Contact me for queries, collaborations, or corrections.

[Kaggle](https://www.kaggle.com/faizanyousafonly/)\
[LinkedIn](https://www.linkedin.com/in/mrfaizanyousaf/)\
[GitHub](https://github.com/faizan-yousaf/)\
Email: faizan6t45@gmail.com or faizanyousaf815@gmail.com \
Phone/WhatsApp: +923065375389