Using K-Neighbors-Classifier

In [1]:
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

In [2]:
iris = load_iris()

In [3]:
X, Y = iris.data, iris.target

In [4]:
knn = KNeighborsClassifier(n_neighbors=3)

In [5]:
cv_scores = cross_val_score(knn, X, Y, cv=5)

In [6]:
print(f"\nCross Validation Scores: {cv_scores}")
print(f"\nMean CV Score: {cv_scores.mean()}\n")


Cross Validation Scores: [0.96666667 0.96666667 0.93333333 0.96666667 1.        ]

Mean CV Score: 0.9666666666666668



Using K-Fold and Decision Tree Classifier

In [7]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import KFold

In [8]:
clf = DecisionTreeClassifier(random_state=42)

In [9]:
kfolds = KFold(n_splits=5)
kfold_cv_scores = cross_val_score(clf, X, Y, cv=kfolds)

In [10]:
print(f"Cross Validation Scores by using K-Fold: {kfold_cv_scores}")
print(f"\nAverage CV Scores: {kfold_cv_scores.mean()}")
print(f"\nNumber of CV Scores used in Average: {len(kfold_cv_scores)}\n")

Cross Validation Scores by using K-Fold: [1.         1.         0.83333333 0.93333333 0.8       ]

Average CV Scores: 0.9133333333333333

Number of CV Scores used in Average: 5



Using Stratified K-Fold

When classes are imbalanced, the training and validation sets should both reflect that imbalance. Stratifying on the target achieves this: each fold keeps approximately the same class proportions as the full dataset.
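
To make the effect of stratification concrete, the sketch below counts how many samples of each class land in every validation fold, once for plain KFold and once for StratifiedKFold. The helper name `fold_class_counts` is made up for this illustration. Because the iris samples are stored sorted by class, the unshuffled KFold folds come out dominated by one or two classes, while the stratified folds are balanced.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, StratifiedKFold

X, Y = load_iris(return_X_y=True)


def fold_class_counts(splitter):
    """Per-class sample counts of each validation fold (hypothetical helper)."""
    return [
        np.bincount(Y[test_idx], minlength=3).tolist()
        for _, test_idx in splitter.split(X, Y)
    ]


# Same number of splits as used elsewhere in this notebook.
kfold_counts = fold_class_counts(KFold(n_splits=5))
strat_counts = fold_class_counts(StratifiedKFold(n_splits=5))

print("KFold validation folds:          ", kfold_counts)
print("StratifiedKFold validation folds:", strat_counts)
```

Passing `shuffle=True` to KFold would also break up the class ordering, but only stratification keeps the class proportions fixed in every fold.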

In [11]:
from sklearn.model_selection import StratifiedKFold

In [12]:
sk_folds = StratifiedKFold(n_splits=5)

In [13]:
sk_folds_cv_scores = cross_val_score(clf, X, Y, cv=sk_folds)

print(f"Cross Validation Scores by using SK-Fold: {sk_folds_cv_scores}")
print(f"\nAverage CV Scores: {sk_folds_cv_scores.mean()}")
print(f"\nNumber of CV Scores used in Average: {len(sk_folds_cv_scores)}\n")

Using Leave-One-Out (LOO)

Unlike k-fold, where we choose the number of splits, LeaveOneOut uses a single observation for validation and the remaining n-1 observations for training, repeating this once for every observation. This makes it an exhaustive technique.
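
As a small illustration of the splitting pattern, the sketch below runs LeaveOneOut on a toy array of four samples (the array `X_demo` is invented for this example and is not the iris data). Each split holds out exactly one index and trains on the other three.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Toy data: 4 samples with 2 features each (illustrative only).
X_demo = np.arange(8).reshape(4, 2)

loo_demo = LeaveOneOut()
loo_splits = list(loo_demo.split(X_demo))

for train_idx, test_idx in loo_splits:
    print(f"train={train_idx} validate={test_idx}")

# The number of splits always equals the number of samples.
n_loo_splits = loo_demo.get_n_splits(X_demo)
print(f"Number of splits: {n_loo_splits}")
```

On the 150-sample iris data the same rule gives 150 fits, which is why the cost grows with the size of the dataset.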

In [14]:
from sklearn.model_selection import LeaveOneOut

In [15]:
loo = LeaveOneOut()

In [16]:
loo_cv_score = cross_val_score(clf, X, Y, cv=loo)

In [17]:
print(f"Cross Validation Scores by using Leave-One-Out: {loo_cv_score}")
print(f"\nAverage CV Scores: {loo_cv_score.mean()}")
print(f"\nNumber of CV Scores used in Average: {len(loo_cv_score)}\n")

Cross Validation Scores by using Leave-One-Out: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1.
 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 0. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1.]

Average CV Scores: 0.94

Number of CV Scores used in Average: 150



Using Leave-P-Out

Leave-P-Out generalizes the Leave-One-Out idea: we choose p, the number of observations held out for validation in each split, and every possible combination of p observations is used as a validation set once.
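
Since every combination of p observations is held out once, the number of splits is the binomial coefficient "n choose p". The sketch below checks this for n = 150 and p = 2 using a placeholder array (`X_placeholder` is an invented name; only its length matters here).

```python
from math import comb

import numpy as np
from sklearn.model_selection import LeavePOut

n_samples, p = 150, 2

# Only the number of rows matters for counting splits.
X_placeholder = np.zeros((n_samples, 1))

n_lpo_splits = LeavePOut(p=p).get_n_splits(X_placeholder)
expected_splits = comb(n_samples, p)

print(f"Splits reported by LeavePOut: {n_lpo_splits}")
print(f"n choose p:                   {expected_splits}")
```

The count grows quickly with p, so larger values of p can become expensive even on small datasets.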

In [18]:
from sklearn.model_selection import LeavePOut

In [19]:
lpo = LeavePOut(p=2)

In [20]:
lpo_cv_score = cross_val_score(clf, X, Y, cv=lpo)

In [21]:
print(f"Cross Validation Scores by using Leave-P-Out: {lpo_cv_score}")
print(f"\nAverage CV Scores: {lpo_cv_score.mean()}")
print(f"\nNumber of CV Scores used in Average: {len(lpo_cv_score)}\n")

Cross Validation Scores by using Leave-P-Out: [1. 1. 1. ... 1. 1. 1.]

Average CV Scores: 0.9382997762863534

Number of CV Scores used in Average: 11175



Using Shuffle Split

Unlike KFold, ShuffleSplit draws each train/validation split at random, and when train_size and test_size sum to less than 1 the remaining data is left out of that split entirely. We therefore have to choose the train and test sizes as well as the number of splits.
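
The sketch below shows what those sizes mean in practice for 150 samples with train_size=0.6 and test_size=0.3. It uses a placeholder array and a fixed `random_state=0`, both chosen only for this illustration, and reports how many samples each split trains on, validates on, and leaves unused.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

n_samples = 150
X_placeholder = np.zeros((n_samples, 1))  # only the row count matters

# random_state is fixed here only to make the illustration repeatable.
shs_demo = ShuffleSplit(train_size=0.6, test_size=0.3, n_splits=5, random_state=0)

split_sizes = []
for train_idx, test_idx in shs_demo.split(X_placeholder):
    # Train and validation indices never overlap within a split.
    assert not set(train_idx) & set(test_idx)
    unused = n_samples - len(train_idx) - len(test_idx)
    split_sizes.append((len(train_idx), len(test_idx), unused))

for n_train, n_test, n_unused in split_sizes:
    print(f"train={n_train} validate={n_test} unused={n_unused}")
```

Because every split is drawn independently, a sample can appear in the validation set of several splits or of none, which is another difference from KFold.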

In [22]:
from sklearn.model_selection import ShuffleSplit

In [23]:
shs = ShuffleSplit(train_size=0.6, test_size=0.3, n_splits=5)

In [24]:
shs_cv_score = cross_val_score(clf, X, Y, cv=shs)

In [25]:
print(f"Cross Validation Scores by using Shuffle-Split: {shs_cv_score}")
print(f"\nAverage CV Scores: {shs_cv_score.mean()}")
print(f"\nNumber of CV Scores used in Average: {len(shs_cv_score)}\n")

Cross Validation Scores by using Shuffle-Split: [0.91111111 0.93333333 0.95555556 0.93333333 0.93333333]

Average CV Scores: 0.9333333333333333

Number of CV Scores used in Average: 5

