
Exercise

Use the iris flower dataset from the sklearn library and apply cross_val_score to the following models to measure the performance of each. Finally, determine which model performs best:

1. Logistic Regression
2. SVM
3. Decision Tree
4. Random Forest

In [46]:
# import required libraries

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
import numpy as np
from sklearn.datasets import load_iris

In [47]:
# loading iris dataset

iris_dataset = load_iris()

In [48]:
X = iris_dataset.data
y = iris_dataset.target

In [49]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3)

In [50]:
# Logistic Regression

logreg = LogisticRegression(solver='liblinear')
logreg.fit(X_train,y_train)
logreg.score(X_test,y_test)

0.9555555555555556

In [51]:
# SVM

svmmodel = SVC(gamma='auto')
svmmodel.fit(X_train,y_train)
svmmodel.score(X_test,y_test)

0.9555555555555556

In [52]:
# RandomForestClassifier

rf = RandomForestClassifier(n_estimators=90)
rf.fit(X_train,y_train)
rf.score(X_test,y_test)

0.9777777777777777

In [53]:
# DecisionTreeClassifier

dt = DecisionTreeClassifier()
dt.fit(X_train,y_train)
dt.score(X_test,y_test)

0.9777777777777777

KFold cross validation
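Before applying it to the iris data, it may help to see how K-fold splitting partitions sample indices. The sketch below is a minimal illustration on a hypothetical nine-element array (`toy_data` is not part of the original notebook); it uses plain `KFold` without shuffling, so each fold's test set is a contiguous block and every sample is tested exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

toy_data = np.arange(9)  # hypothetical toy data: nine samples
kf = KFold(n_splits=3)   # no shuffling, so test folds are contiguous blocks

# Collect (train indices, test indices) for each of the three folds
splits = [(train_idx.tolist(), test_idx.tolist())
          for train_idx, test_idx in kf.split(toy_data)]

for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

`StratifiedKFold`, used in the next cells, follows the same pattern but also keeps the class proportions roughly equal in every fold.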

In [54]:
def get_score(model,X_train,X_test,y_train,y_test):
    # fit the model on the training fold and return its accuracy on the test fold
    model.fit(X_train,y_train)
    return model.score(X_test,y_test)

In [55]:
from sklearn.model_selection import StratifiedKFold
folds = StratifiedKFold(n_splits=3)

scores_Logistic = []
scores_SVM = []
scores_RandomForestClassifier = []
scores_DecisionTree = []

for train_index,test_index in folds.split(X,y):
    X_train,X_test,y_train,y_test = X[train_index],X[test_index],y[train_index],y[test_index]

    # appending the scores
    scores_Logistic.append(get_score(LogisticRegression(solver='liblinear'),X_train,X_test,y_train,y_test))
    scores_SVM.append(get_score(SVC(gamma='auto'),X_train,X_test,y_train,y_test))
    scores_RandomForestClassifier.append(get_score(RandomForestClassifier(n_estimators=90),X_train,X_test,y_train,y_test))
    scores_DecisionTree.append(get_score(DecisionTreeClassifier(),X_train,X_test,y_train,y_test))


In [56]:
scores_Logistic

[0.96, 0.96, 0.94]

In [57]:
scores_SVM

[0.98, 0.98, 0.96]

In [58]:
scores_RandomForestClassifier

[0.98, 0.94, 0.96]

In [59]:
scores_DecisionTree

[0.98, 0.94, 0.98]

Using cross_val_score function

In [60]:
from sklearn.model_selection import cross_val_score

Logistic regression model performance using cross_val_score

In [61]:
logistic_scores = cross_val_score(LogisticRegression(solver='liblinear'),X,y,cv=3)
logistic_scores

array([0.96, 0.96, 0.94])

In [62]:
np.average(logistic_scores)

0.9533333333333333
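The fold scores from cross_val_score match those from the manual StratifiedKFold loop. That is expected: when the estimator is a classifier and cv is an integer, cross_val_score uses stratified folds. The sketch below checks this equivalence; it uses SVC rather than logistic regression because SVC is deterministic here, and the names `manual_scores` and `auto_scores` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Manual loop over stratified folds, same idea as the loop earlier in the notebook
manual_scores = []
for train_index, test_index in StratifiedKFold(n_splits=3).split(X, y):
    model = SVC(gamma='auto')
    model.fit(X[train_index], y[train_index])
    manual_scores.append(model.score(X[test_index], y[test_index]))

# With a classifier and an integer cv, cross_val_score also uses stratified folds
auto_scores = cross_val_score(SVC(gamma='auto'), X, y, cv=3)

# Both approaches should produce the same per-fold accuracies
print(np.allclose(manual_scores, auto_scores))
```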

SVM model performance using cross_val_score

In [63]:
svm_scores = cross_val_score(SVC(gamma='auto'),X,y,cv=3)
svm_scores

array([0.98, 0.98, 0.96])

In [64]:
np.average(svm_scores)

0.9733333333333333

Random Forest performance using cross_val_score

In [65]:
random_scores = cross_val_score(RandomForestClassifier(n_estimators=90),X,y,cv=3)
random_scores

array([0.98, 0.94, 0.96])

In [66]:
np.average(random_scores)

0.96

Decision Tree performance using cross_val_score

In [67]:
decision_scores = cross_val_score(DecisionTreeClassifier(),X,y,cv=3)
decision_scores

array([0.98, 0.94, 0.96])

In [68]:
np.average(decision_scores)

0.96

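We can observe that SVM gives the highest mean accuracy, about 97.3%, followed by Random Forest and Decision Tree at 96% each and Logistic Regression at about 95.3%. On this 3-fold cross-validation, SVM is therefore the best-performing model.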
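The comparison can also be done in one pass instead of one cell per model. The following is a minimal sketch, not the notebook's original code: the `models` dictionary and the `random_state=0` settings are assumptions added for reproducibility, and logistic regression uses the default solver with `max_iter=1000` instead of the notebook's liblinear setting, so its scores may differ slightly from those shown earlier.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical registry of the four models; random_state is fixed (an assumption)
# so that the tree-based models give the same scores on every run
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(gamma='auto'),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=90, random_state=0),
}

# Mean 3-fold cross-validated accuracy for each model
mean_scores = {
    name: float(np.mean(cross_val_score(model, X, y, cv=3)))
    for name, model in models.items()
}

# Print the models from best to worst mean accuracy
for name, score in sorted(mean_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.4f}")

best_model = max(mean_scores, key=mean_scores.get)
print("Best:", best_model)
```

With only 150 samples and three folds, differences of one or two percentage points between models correspond to a handful of samples, so the ranking should be read with some caution.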