In this tutorial, we will do two things:
1. Evaluate our model's performance
2. Improve our model's performance

Improving the model's performance can be done with a technique called __Model Selection__, which consists of choosing the best hyperparameters for your machine learning models. Every time we build a machine learning model, we have two types of parameters:
- Parameters whose optimal values are learned from the data when the model is trained (model parameters).
- Parameters that we choose ourselves (hyperparameters).
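To make the distinction concrete, here is a minimal sketch using scikit-learn's `LogisticRegression`. It assumes synthetic data from `make_classification` (not the tutorial's dataset): `C` is a hyperparameter we pass in ourselves, while `coef_` and `intercept_` are model parameters that `fit()` estimates from the data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic toy data (an assumption; the tutorial's own dataset is loaded later).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Hyperparameter: chosen by us before training (C = inverse regularization strength).
model = LogisticRegression(C=1.0)

# Model parameters: estimated from the data by fit().
model.fit(X, y)
print("hyperparameter C:", model.C)
print("learned coefficients:", model.coef_)
print("learned intercept:", model.intercept_)
```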

***

__Reference:__
[What is the Difference Between a Parameter and a Hyperparameter?](https://machinelearningmastery.com/difference-between-a-parameter-and-a-hyperparameter/)

A __model parameter__ is a configuration variable that is internal to the model and whose value can be estimated from data. They are often not set manually by the practitioner. Some examples of model parameters include:

- The weights in an artificial neural network.
- The support vectors in a support vector machine.
- The coefficients in a linear regression or logistic regression.

A __model hyperparameter__ is a configuration that is external to the model and whose value cannot be estimated from data. They are often used in processes to help estimate model parameters. They are often specified by the practitioner. They can often be set using heuristics. They are often tuned for a given predictive modeling problem.

We cannot know the best value for a model hyperparameter on a given problem. We may use rules of thumb, copy values used on other problems, or search for the best value by trial and error.
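Searching "by trial and error" can be as simple as a loop that scores each candidate value. The sketch below tries several values of k for k-nearest neighbors and keeps the best mean cross-validated accuracy. The synthetic data, the candidate list, and the choice of 5 folds are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (an assumption, standing in for a real dataset).
X, y = make_classification(n_samples=300, n_features=4, random_state=0)

scores = {}
for k in [1, 3, 5, 7, 9, 11]:
    # k is a hyperparameter: we set it, the model does not learn it.
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds for this value of k.
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```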

When a machine learning algorithm is tuned for a specific problem, such as when you are using a grid search or a random search, you are tuning the hyperparameters of the model in order to discover the parameters of the model that result in the most skillful predictions.

Some examples of model hyperparameters include:
- The learning rate for training a neural network.
- The C and sigma hyperparameters for support vector machines.
- The k in k-nearest neighbors.
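For the RBF kernel, the sigma hyperparameter is the kernel width. scikit-learn's `SVC` (used later in this tutorial) exposes it as `gamma` instead, and the two are related as follows, so a larger `gamma` means a narrower kernel:

$$
K(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^2}{2\sigma^2}\right) = \exp\left(-\gamma \lVert x - x' \rVert^2\right), \qquad \gamma = \frac{1}{2\sigma^2}
$$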

A good rule of thumb to overcome this confusion is as follows:
> If you have to specify a model parameter manually, then it is probably a model hyperparameter.

***

__Grid Search__ helps us choose the model hyperparameters by trying every combination of candidate values and keeping the one that scores best.
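A minimal sketch of grid search with scikit-learn's `GridSearchCV`, assuming synthetic data from `make_classification` in place of the tutorial's dataset. The candidate values for `C` and `gamma` are illustrative, not recommendations. Each combination is scored with 10-fold cross validation, the technique described next.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data (an assumption, standing in for the tutorial's dataset).
X, y = make_classification(n_samples=300, n_features=4, random_state=0)

# Candidate hyperparameter values; every combination is evaluated.
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1, 1],
}

# Each of the 3 x 3 = 9 combinations is scored with 10-fold cross validation.
grid_search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="accuracy", cv=10)
grid_search.fit(X, y)

print("best hyperparameters:", grid_search.best_params_)
print("best mean CV accuracy:", grid_search.best_score_)
```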

Checking the accuracy of our model on a single test set is not enough to be sure of its performance: a different train/test split could give a noticeably different accuracy. This is the variance problem.

A technique called __k-Fold cross validation__ improves this a lot, because it addresses this variance problem.

__k-Fold cross validation__ splits the training set into k folds (10 folds when k=10). We train the model on 9 folds and test it on the remaining fold, then repeat so that each fold serves exactly once as the test fold. This gives 10 combinations of the data, each with 9 folds to train the model and 1 fold to test it.
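The splitting itself can be inspected with scikit-learn's `KFold`. This sketch uses 20 toy samples (an assumption, chosen so the output stays short) and k=10, and checks that every sample ends up in the test fold exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(40).reshape(20, 2)  # 20 toy samples, 2 features
kf = KFold(n_splits=10)

test_counts = np.zeros(len(X), dtype=int)
for i, (train_idx, test_idx) in enumerate(kf.split(X)):
    # 18 samples (9 folds) for training, 2 samples (1 fold) for testing.
    print(f"fold {i}: train={train_idx}, test={test_idx}")
    test_counts[test_idx] += 1

# Every sample lands in the test fold exactly once across the k iterations.
print(test_counts)
```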

![](k1.png)

Then we can take the average of the 10 accuracies and also compute their standard deviation to measure the variance.
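The same procedure can be written out by hand to show where the mean and standard deviation come from. This is a sketch on synthetic data (an assumption); it uses `StratifiedKFold`, which is what scikit-learn's `cross_val_score` uses for a classifier when `cv` is an integer, and `clone` so each fold trains a fresh, unfitted model.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Synthetic data (an assumption, standing in for the tutorial's dataset).
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
model = SVC(kernel="rbf")

accuracies = []
for train_idx, test_idx in StratifiedKFold(n_splits=10).split(X, y):
    fold_model = clone(model)               # fresh, unfitted copy for each fold
    fold_model.fit(X[train_idx], y[train_idx])
    accuracies.append(fold_model.score(X[test_idx], y[test_idx]))

accuracies = np.array(accuracies)
print("mean accuracy:", accuracies.mean())  # overall performance estimate
print("std deviation:", accuracies.std())   # spread across folds (variance)
```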

This tells us which of the four bias/variance categories our model falls into:
![](k2.png)

This makes our evaluation of the model much more relevant than a single test-set accuracy.

The natural place to apply k-fold cross validation is right after we have built the model.

```python
# k-Fold Cross Validation

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting Kernel SVM to the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
```


[Sklearn k-Fold Cross Validation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html)

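The __cv__ parameter is the number of folds that we want to split the training set into. When `cv` is an integer and the estimator is a classifier with a binary or multiclass target, scikit-learn uses stratified k-fold, so each fold keeps roughly the same class proportions as the whole training set.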
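`cv` also accepts a splitter object instead of an integer. This sketch, on synthetic data (an assumption), compares `cv=10` with an explicit `StratifiedKFold` that shuffles the data before splitting. Shuffling is off by default, so passing the object is one way to control it.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic data (an assumption, standing in for the tutorial's dataset).
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
classifier = SVC(kernel="rbf", random_state=0)

# cv=10: for a classifier, scikit-learn uses 10 stratified, unshuffled folds.
scores_int = cross_val_score(classifier, X, y, cv=10)

# Passing a splitter object instead gives control over shuffling.
splitter = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores_obj = cross_val_score(classifier, X, y, cv=splitter)

print(scores_int.mean(), scores_obj.mean())
```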



```python
# Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
# accuracies is a vector holding the 10 accuracies computed by k-fold cross validation
accuracies
```

```
array([0.8       , 0.96666667, 0.8       , 0.96666667, 0.86666667,
       0.86666667, 0.9       , 0.93333333, 1.        , 0.93333333])
```

```python
accuracies.mean()
```

```
0.9033333333333333
```

```python
accuracies.std() # about 6.6% standard deviation: low variance
```

```
0.06574360974438671
```
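In the cells above, `StandardScaler` was fitted on the whole training set before cross validation, so each held-out fold had already influenced the scaling statistics. A common refinement wraps the scaler and the classifier in a scikit-learn `Pipeline`, so the scaler is re-fitted on the training folds only. This is a sketch on synthetic data (an assumption; the tutorial uses `Social_Network_Ads.csv`), not the tutorial's own method.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, unscaled data (an assumption, standing in for the tutorial's dataset).
X_train, y_train = make_classification(n_samples=300, n_features=2, n_informative=2,
                                       n_redundant=0, random_state=0)

# The scaler is fitted inside each training split, so the held-out fold
# never influences the scaling statistics.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=0))
accuracies = cross_val_score(pipeline, X_train, y_train, cv=10)

print(f"mean = {accuracies.mean():.3f}, std = {accuracies.std():.3f}")
```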