# Decision Tree & Grid Search Example

This notebook shows how to tune the hyperparameters of a simple classifier, a decision tree, with grid search using scikit-learn's `GridSearchCV`.

In [1]:
%load_ext watermark
%watermark -p scikit-learn,mlxtend,xgboost

scikit-learn: 1.0
mlxtend     : 0.19.0
xgboost     : 1.5.0
## Dataset

In [2]:
from sklearn.model_selection import train_test_split
from sklearn import datasets


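# Breast Cancer Wisconsin dataset: 569 samples, 30 features, binary target
data = datasets.load_breast_cancer()
X, y = data.data, data.target

# 70/30 train/test split, stratified so both sets keep the class proportions
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

# Optional hold-out validation set carved out of the training data; it is not used below,
# because GridSearchCV runs its own 10-fold cross-validation on X_train
X_train_sub, X_valid, y_train_sub, y_valid = \
    train_test_split(X_train, y_train, test_size=0.2, random_state=1, stratify=y_train)

# Note: y_train is the full training set, so the reported training size includes the validation samples
print('Train/Valid/Test sizes:', y_train.shape[0], y_valid.shape[0], y_test.shape[0])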

Train/Valid/Test sizes: 398 80 171
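The printed sizes follow from how `train_test_split` handles a fractional `test_size`: scikit-learn rounds the test part up (ceiling of `test_size * n_samples`) and gives the remainder to training. The dataset has 569 samples, so the arithmetic can be checked with a small sketch. The helper `split_sizes` is hypothetical, not part of scikit-learn:

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size, rounding the test part up."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(569, 0.3)           # first split: train vs. test
n_train_sub, n_valid = split_sizes(n_train, 0.2)  # second split: validation carved out of train

print(n_train, n_valid, n_test)  # 398 80 171
print(n_train_sub)               # 318
```

So the 398 reported as "Train" still contains the 80 validation samples. The sub-training set produced by the second split has 318 samples.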


## Grid Search

In [3]:
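from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier


# Fixed random_state so that the fitted trees are reproducible
clf = DecisionTreeClassifier(random_state=123)

# 3 x 3 = 9 hyperparameter combinations to evaluate
params = {
    'min_samples_split': [2, 3, 4],
    'max_depth': [6, 16, None]  # None: grow until leaves are pure or too small to split
}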


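grid = GridSearchCV(estimator=clf,
                    param_grid=params,
                    cv=10,      # 10-fold cross-validation (stratified, since clf is a classifier)
                    n_jobs=1,   # run the fits sequentially
                    verbose=2)  # log every individual fit

grid.fit(X_train, y_train)

# Mean cross-validated accuracy of the best parameter combination
grid.best_score_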

Fitting 10 folds for each of 9 candidates, totalling 90 fits
[CV] END ...................max_depth=6, min_samples_split=2; total time=   0.0s
[CV] END ...................max_depth=6, min_samples_split=2; total time=   0.0s
[CV] END ...................max_depth=6, min_samples_split=2; total time=   0.0s
[CV] END ...................max_depth=6, min_samples_split=2; total time=   0.0s
[CV] END ...................max_depth=6, min_samples_split=2; total time=   0.0s
[CV] END ...................max_depth=6, min_samples_split=2; total time=   0.0s
[CV] END ...................max_depth=6, min_samples_split=2; total time=   0.0s
[CV] END ...................max_depth=6, min_samples_split=2; total time=   0.0s
[CV] END ...................max_depth=6, min_samples_split=2; total time=   0.0s
[CV] END ...................max_depth=6, min_samples_split=2; total time=   0.0s
[CV] END ...................max_depth=6, min_samples_split=3; total time=   0.0s
...

0.9274358974358975
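`GridSearchCV` evaluates every combination in the Cartesian product of the value lists in `params`. Three values of `max_depth` times three values of `min_samples_split` gives 9 candidates, and with `cv=10` each candidate is fit 10 times, hence the 90 fits reported in the log. The sketch below reproduces that count with `itertools.product`. Iterating over the sorted parameter names with the last name varying fastest matches the order in which the log lists candidates (`max_depth` held fixed while `min_samples_split` changes), but it is a plain-Python stand-in, not scikit-learn's internal implementation:

```python
from itertools import product

# Same grid as the `params` dict in the cell above.
params = {
    'min_samples_split': [2, 3, 4],
    'max_depth': [6, 16, None],
}
cv = 10

# Sort the parameter names; product() then varies the last name fastest.
names = sorted(params)
candidates = [dict(zip(names, values)) for values in product(*(params[n] for n in names))]

print(len(candidates))       # 9
print(len(candidates) * cv)  # 90
print(candidates[:2])        # [{'max_depth': 6, 'min_samples_split': 2}, {'max_depth': 6, 'min_samples_split': 3}]
```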

In [4]:
grid.best_params_

{'max_depth': 16, 'min_samples_split': 4}
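`best_params_` is the candidate with the highest mean score across the cross-validation folds, and `best_score_` is that mean. With the default `refit=True`, `best_estimator_` is then refit on all of `X_train` using these parameters. Because `clf` is a classifier and `cv` is an integer, scikit-learn uses stratified folds. The sketch below mimics the selection step under simplifying assumptions. `stratified_folds` deals each class's indices round-robin into the folds, which is not exactly how `StratifiedKFold` allocates samples. `toy_fold_score` is a hypothetical stand-in for "fit a tree on the training folds and return accuracy on the held-out fold", and it ignores the data entirely:

```python
from collections import defaultdict
from itertools import product
from statistics import mean


def stratified_folds(y, k):
    """Simplified stratified k-fold: deal each class's indices round-robin into k folds."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for label in sorted(by_class):
        for i in by_class[label]:
            folds[pos % k].append(i)  # each fold gets ~1/k of every class
            pos += 1
    return folds


def grid_search(params, y, fold_score, k=10):
    """Return (best_score, best_params): the candidate with the highest mean fold score."""
    names = sorted(params)
    folds = stratified_folds(y, k)
    best_score, best_params = float('-inf'), None
    for values in product(*(params[name] for name in names)):
        candidate = dict(zip(names, values))
        scores = []
        for valid_idx in folds:
            held_out = set(valid_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            scores.append(fold_score(candidate, train_idx, valid_idx))
        score = mean(scores)
        if score > best_score:  # strict '>' keeps the first candidate on ties
            best_score, best_params = score, candidate
    return best_score, best_params


params = {'min_samples_split': [2, 3, 4], 'max_depth': [6, 16, None]}
y = [0] * 20 + [1] * 30  # hypothetical toy labels (40% / 60% class balance)


def toy_fold_score(candidate, train_idx, valid_idx):
    # Hypothetical scorer: ignores the folds and simply favors small parameter values.
    depth = candidate['max_depth'] if candidate['max_depth'] is not None else 100
    return 1.0 / (depth + candidate['min_samples_split'])


best_score, best_params = grid_search(params, y, toy_fold_score, k=10)
print(best_params, best_score)  # {'max_depth': 6, 'min_samples_split': 2} 0.125
```

The toy scorer only demonstrates the selection mechanics. It says nothing about which parameters suit the breast-cancer data.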

In [5]:
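# best_estimator_ was refit on all of X_train with best_params_ (refit=True is the default)
print(f"Training Accuracy: {grid.best_estimator_.score(X_train, y_train):0.2f}")
# Validation accuracy is skipped: X_valid is a subset of X_train, on which the model was refit
#print(f"Validation Accuracy: {grid.best_estimator_.score(X_valid, y_valid):0.2f}")
print(f"Test Accuracy: {grid.best_estimator_.score(X_test, y_test):0.2f}")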

Training Accuracy: 1.00
Test Accuracy: 0.94
