### Hyperparameter tuning with RandomizedSearchCV
GridSearchCV can be computationally expensive, especially when the search space is large and spans several hyperparameters. RandomizedSearchCV addresses this by not trying every combination of values. Instead, it samples a fixed number of hyperparameter settings (set by the n_iter argument, 10 by default) from the probability distributions or lists you specify.
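
To make the cost difference concrete, here is a minimal sketch in plain Python. It is not scikit-learn's implementation: the search space is hypothetical, and the sampling uses the standard library's random module purely for illustration.

```python
import itertools
import random

# Hypothetical search space, chosen only for illustration.
grid = {
    "max_depth": [3, 5, 10, None],
    "max_features": list(range(1, 9)),
    "min_samples_leaf": list(range(1, 9)),
    "criterion": ["gini", "entropy"],
}

# An exhaustive grid search evaluates every combination of values.
all_combinations = list(itertools.product(*grid.values()))
n_grid = len(all_combinations)

# A randomized search draws a fixed number of settings, whatever the grid size.
# (Simplification: each value is drawn uniformly and independently.)
rng = random.Random(0)
n_iter = 10
sampled = [
    {name: rng.choice(values) for name, values in grid.items()}
    for _ in range(n_iter)
]

print(f"Grid search candidates:   {n_grid}")
print(f"Random search candidates: {len(sampled)}")
```

With 5-fold cross-validation, the exhaustive search above would need 512 × 5 = 2560 model fits, while the randomized search needs 10 × 5 = 50.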

Just like k-NN, linear regression, and logistic regression, decision trees in scikit-learn have .fit() and .predict() methods.

Decision trees have many parameters that can be tuned, such as __max_features__, __max_depth__, and __min_samples_leaf__. This makes them an ideal use case for RandomizedSearchCV.
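
As a quick illustration of both points, the sketch below fits a tree on a tiny made-up dataset (the data and the names X_toy and y_toy are assumptions for this example only) and lists the parameters the estimator exposes through get_params().

```python
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up dataset: the target simply equals the first feature.
X_toy = [[0, 0], [1, 1], [0, 1], [1, 0]]
y_toy = [0, 1, 0, 1]

# Same estimator interface as other scikit-learn models: fit, then predict.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X_toy, y_toy)
predictions = clf.predict([[1, 1], [0, 0]])

# get_params() returns every constructor argument that can be tuned.
tunable = sorted(clf.get_params())

print(predictions)
print(tunable)
```

The printed list includes max_depth, max_features, and min_samples_leaf alongside the other constructor arguments of the estimator.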

In [1]:
# Import necessary modules
import pandas as pd
import numpy as np

In [2]:
column_names = ['pregnancies', 'glucose', 'diastolic', 'triceps', 'insulin', 'bmi',
       'dpf', 'age', 'diabetes']

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data',
                 names = column_names)

In [3]:
# Use .drop() to remove the target column 'diabetes' from the feature array X, and the
# .values attribute to make X and y NumPy arrays. Without .values, X is a DataFrame and y is a
# Series; the scikit-learn API accepts those as well, as long as they have the right shape.

# Build the feature array X and the target array y
X, y = df.drop('diabetes', axis=1).values, df['diabetes'].values

In [4]:
# Import necessary modules
from scipy.stats import randint
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

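# Set up the parameters and distributions to sample from: param_dist
# Lists are sampled uniformly; randint(1, 9) draws integers from 1 to 8 (upper bound excluded),
# which matches the 8 feature columns available for max_features
param_dist = {"max_depth": [3, None],
              "max_features": randint(1, 9),
              "min_samples_leaf": randint(1, 9),
              "criterion": ["gini", "entropy"]}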

# Instantiate a Decision Tree classifier: tree
tree = DecisionTreeClassifier()

# Instantiate the RandomizedSearchCV object: tree_cv
# (n_iter is left at its default of 10, so 10 settings are sampled and scored with 5-fold CV)
tree_cv = RandomizedSearchCV(tree, param_dist, cv=5)

# Fit it to the data
tree_cv.fit(X, y)

# Print the tuned parameters and score
print("Tuned Decision Tree Parameters: {}".format(tree_cv.best_params_))
print("Best score is {}".format(tree_cv.best_score_))

Tuned Decision Tree Parameters: {'criterion': 'entropy', 'max_depth': 3, 'max_features': 7, 'min_samples_leaf': 4}
Best score is 0.7421875
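
Because the settings are sampled at random and no seed is fixed, the tuned parameters and score above will generally differ from run to run. The sketch below shows one way to make a search repeatable by passing random_state, and how to raise the sampling budget with n_iter. It uses synthetic data from make_classification as a stand-in for the diabetes table; the helper run_search and the names X_demo and y_demo are hypothetical.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 8 features, like the diabetes table.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

param_dist = {
    "max_depth": [3, None],
    "max_features": randint(1, 9),      # integers 1 to 8
    "min_samples_leaf": randint(1, 9),  # integers 1 to 8
    "criterion": ["gini", "entropy"],
}

def run_search(seed):
    """Run a seeded randomized search and return the fitted search object."""
    search = RandomizedSearchCV(
        DecisionTreeClassifier(random_state=seed),  # seed the tree itself
        param_dist,
        n_iter=20,          # sample 20 settings instead of the default 10
        cv=5,
        random_state=seed,  # seed the hyperparameter sampling
    )
    return search.fit(X_demo, y_demo)

first = run_search(42)
second = run_search(42)

# Identical seeds give identical sampled settings and results.
print(first.best_params_ == second.best_params_)
# One entry per sampled setting is recorded in cv_results_.
print(len(first.cv_results_["params"]))
```

The cv_results_ attribute also holds the mean cross-validated score of every sampled setting, which helps when judging how much the best setting stands out from the rest.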
