# GridSearchCV Tutorial Notebook

GridSearchCV is a class in scikit-learn's model_selection module. It performs an exhaustive search over a user-specified grid of hyperparameter values, using cross-validation to score every combination. You pass it a model (an estimator) and a parameter grid; after fitting, it reports the best-scoring combination and, by default, provides a model refit on the training data with those values.

For additional information on GridSearchCV, see the [scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).

--------------------------------------------------------------------------------

To get started, import GridSearchCV and train_test_split from sklearn.model_selection, and SVC from sklearn.svm.

For demonstration purposes, numpy and the datasets module from sklearn are also imported.

In [None]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

Below, we load the Iris dataset and separate it into training and testing sets.

In [None]:
# Load the dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
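
As a quick sanity check on the split above, the sketch below prints the resulting shapes. It also shows an optional variant that passes stratify=y so each class keeps the same proportion in both sets; the tutorial itself does not use stratification, so treat that part (and the `_s` variable names) as an illustrative assumption.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same split as in the tutorial: 30% of the 150 samples are held out
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)

# Optional variant (not used in the tutorial): keep class proportions equal
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
test_class_counts = np.bincount(y_test_s)
print(test_class_counts)  # number of test samples per class
```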

Next, we create a support vector machine (SVM) classifier using SVC from the sklearn.svm module. SVC is the type of model used in this example to predict our data. GridSearchCV needs an estimator to tune, so a model must be defined before the search is created.

In [None]:
# Create an SVM classifier
svm = SVC()

Below, we define the grid of hyperparameter values that the search will try against our data.

In [None]:
# Define the hyperparameters grid to search
param_grid = {
    'C': [0.1, 1, 10, 100],  # Regularization parameter (smaller C means stronger regularization)
    'gamma': [1, 0.1, 0.01, 0.001],  # Kernel coefficient (used by 'rbf' and 'poly', ignored by 'linear')
    'kernel': ['rbf', 'linear', 'poly']  # Kernel types to compare
}
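
To see how many combinations this grid produces, one option is scikit-learn's ParameterGrid helper, which expands a grid the same way GridSearchCV does. The second grid below is a hypothetical alternative, not part of the tutorial: because gamma has no effect on the linear kernel, a list of dictionaries can avoid fitting redundant linear candidates.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['rbf', 'linear', 'poly'],
}

# Every combination of the three lists: 4 * 4 * 3
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates)  # 48

# Hypothetical alternative: only search gamma for kernels that use it
param_grid_split = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
    {'kernel': ['rbf', 'poly'], 'C': [0.1, 1, 10, 100],
     'gamma': [1, 0.1, 0.01, 0.001]},
]
n_candidates_split = len(ParameterGrid(param_grid_split))
print(n_candidates_split)  # 4 + 2 * 4 * 4 = 36
```

Either form can be passed as param_grid to GridSearchCV.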

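Now, we pass GridSearchCV our model, the parameter grid, and the number of cross-validation folds (cv=5). Setting n_jobs=-1 runs the fits in parallel on all available CPU cores, and verbose=2 prints progress messages during the search.

From there, we fit the grid search to the training data. The grid holds 4 x 4 x 3 = 48 parameter combinations, and each one is evaluated on 5 folds, for 240 fits in total.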

In [None]:
# Create GridSearchCV
grid_search = GridSearchCV(estimator=svm, param_grid=param_grid, cv=5, n_jobs=-1, verbose=2)
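
When cv is an integer and the estimator is a classifier, scikit-learn uses stratified k-fold splitting. The sketch below makes that choice explicit by passing a StratifiedKFold object instead, which also allows shuffling with a fixed seed. The reduced grid (small_grid), the shuffling, and the explicit scoring='accuracy' are assumptions made to keep the example short; they are not the tutorial's settings.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Explicit 5-fold stratified splitter (shuffled here for illustration)
cv_splitter = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Reduced grid, assumed only to keep this sketch quick to run
small_grid = {'C': [1, 10], 'kernel': ['rbf', 'linear']}

search = GridSearchCV(
    estimator=SVC(),
    param_grid=small_grid,
    cv=cv_splitter,
    scoring='accuracy',  # the default for classifiers, stated explicitly
)
search.fit(X_train, y_train)

print("Folds used:", search.n_splits_)
print("Candidates tried:", len(search.cv_results_['params']))
```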

In [None]:
# Fit the GridSearchCV with the training data
grid_search.fit(X_train, y_train)

Fitting 5 folds for each of 48 candidates, totalling 240 fits


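Now that the search has been fitted to the training data, we can print the best parameter combination and its score. The value in best_score_ is the mean cross-validated score of that combination on the training data (accuracy, which is the default metric for classifiers).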

In [None]:
# Get the best parameters and the best score
best_params = grid_search.best_params_
best_score = grid_search.best_score_

print("Best Parameters:", best_params)
print("Best Score:", best_score)

Best Parameters: {'C': 100, 'gamma': 0.01, 'kernel': 'rbf'}
Best Score: 0.9714285714285715
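
Beyond the single best result, the fitted search keeps per-candidate scores in its cv_results_ dictionary. The sketch below lists the three top-ranked candidates with their mean and standard deviation across folds. It refits a reduced grid (small_grid, an assumption for brevity) so that it runs on its own; with the tutorial's grid_search object, the same keys apply.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Reduced grid, assumed only to keep this sketch quick to run
small_grid = {'C': [0.1, 1, 10], 'kernel': ['rbf', 'linear']}
search = GridSearchCV(estimator=SVC(), param_grid=small_grid, cv=5)
search.fit(X_train, y_train)

results = search.cv_results_

# rank_test_score is 1 for the best candidate; sort ascending by rank
order = np.argsort(results['rank_test_score'])
for i in order[:3]:
    print(
        results['params'][i],
        "mean:", round(results['mean_test_score'][i], 4),
        "std:", round(results['std_test_score'][i], 4),
    )
```

Comparing the standard deviations can help judge whether the top candidates are meaningfully different or within fold-to-fold noise of each other.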


Using the best model found by the search, we can now make predictions on our testing data.

In [None]:
# Use the best model to make predictions on the test data
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

In [None]:
# Evaluate the best model's predictions on the held-out test data
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 1.0
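
Accuracy is a single number, so it can be useful to look at per-class results as well. The sketch below prints a confusion matrix and a classification report. It also relies on the fact that, with the default refit=True, calling predict on the search object delegates to best_estimator_. The reduced grid (small_grid) is an assumption to keep the example self-contained and quick.

```python
from sklearn import datasets
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Reduced grid, assumed only to keep this sketch quick to run
small_grid = {'C': [1, 10, 100], 'gamma': [0.1, 0.01], 'kernel': ['rbf']}
search = GridSearchCV(estimator=SVC(), param_grid=small_grid, cv=5)
search.fit(X_train, y_train)

# With refit=True (the default), this uses the refit best estimator
y_pred = search.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall, and F1
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```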


That wraps up this tutorial. Thanks for reading.