# Methods to Create Learning and Testing Environments for Classification Models

Broadly, classification models work in two major steps: learning and testing. The learning phase is used to train the classification model, and the performance of the trained model is evaluated during the testing phase. To set up these steps, the given data set is divided into a training set and a test set. The key difference between these two subsets is that in the training set the class labels are provided to the model, whereas in the test set the class labels are hidden from the model.

There are different techniques that use the given data set to create a robust learning process and useful estimates of the performance of classification models. Useful techniques are described below.

1. Hold-out

Under the Hold-out method, the given data set is randomly divided into a training set (70-80%) and a test set (30-20%). The training set is used to train the model, and the trained model is tested on the test set using evaluation metrics such as accuracy, confusion matrix, precision, recall and more.

Key features of the Hold-out technique

1. It is the simplest method to train and test any classification model. The size of the
split can depend on the size and specifics of the data set, although a 70-30% split is a
common setting.

2. Using Hold-out, the algorithm evaluation is fast.

3. It is suitable for large data sets where there is strong evidence that both splits
of the data are truly representative of the underlying domain.

4. The major downside of the Hold-out method is that the trained model can be a high-variance
model, because training is done on only one sample of training data. Learning from a
single sample may not be robust. The performance of a robust, low-variance model does not
fluctuate much, whereas a high-variance model is less stable in terms of
performance. This method does not allow exhaustive learning of the model.
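The random split described above can be sketched in plain Python. This is only an illustration of the idea, not scikit-learn's implementation; the helper name `holdout_split` and its parameters are hypothetical, and the sample count 303 is taken from the heart data set used later in this tutorial.

```python
import random

def holdout_split(n_samples, test_fraction=0.3, seed=10):
    """Return (train_idx, test_idx) for one random hold-out split.

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random order of row indices
    n_test = round(n_samples * test_fraction)   # size of the test set
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = holdout_split(303)
print(len(train_idx), len(test_idx))            # 212 training rows, 91 test rows
```

Every row lands in exactly one of the two sets, so the model is never tested on a row it was trained on.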




2. Repeated Hold-out

This is a variation of the Hold-out method. Instead of making one pair of training and test sets, we create n pairs of training and test sets, where n is chosen by the user. The model is trained and tested on each pair and its performance is evaluated. This way, we get the performance of the model on n different sample inputs. The average performance of the model is obtained by averaging its performance over the different runs.

Key features of the Repeated Hold-out technique

1. It is suitable for large data sets.

2. Repeated Hold-out helps to produce a lower-variance performance estimate, because n different
training samples are used for learning.

3. The major downside of this method is that the n training and test splits are drawn
independently from the same data set, so the same instances can be repeated across splits.
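The repetition mentioned in the last point can be seen in a small plain-Python sketch. It assumes 212 training rows, 10 splits and a 30% test fraction (the values used later in this tutorial); the helper name `repeated_holdout` is hypothetical and this is not how scikit-learn implements `ShuffleSplit` internally.

```python
import random

def repeated_holdout(n_samples, n_splits=10, test_fraction=0.3, seed=10):
    """Return a list of (train_idx, test_idx) pairs, one per random split."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    splits = []
    for _ in range(n_splits):
        indices = list(range(n_samples))
        rng.shuffle(indices)                    # a fresh, independent shuffle each time
        splits.append((indices[n_test:], indices[:n_test]))
    return splits

splits = repeated_holdout(212)

# Count how many times each row appears in a test set across the 10 splits.
counts = [0] * 212
for _, test_idx in splits:
    for i in test_idx:
        counts[i] += 1

print("test set size per split:", len(splits[0][1]))
print("fewest / most test appearances of a row:", min(counts), max(counts))
```

Because each split is drawn independently, some rows are tested several times while others may be tested rarely or not at all.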


3. k-fold cross validation

k-fold cross validation is an approach that helps to develop a more robust, stable, low-variance model. The strategy is to first create a random training and test split of the given data set, and then apply k-fold cross validation to the training set. The training set is divided into k folds. The model is trained on k-1 folds and the remaining fold is held back for evaluation. This process is repeated k times so that each fold is used exactly once for testing and k-1 times for training. After running k-fold cross validation we end up with k performance scores, which we summarize using the mean. The mean score is an estimate of the model's performance on unseen data. Under k-fold cross validation the model is trained on a different set of training data on each run. A mean score above 80% with little variation between folds suggests a stable, low-variance model.

Once we have an estimate of the stability of the model, we train the final model and test it using the split created in the initial step of the technique.

Key features of k-fold cross validation

1. This method gives a more reliable estimate of the performance of the model on new data. It is more accurate because the algorithm is trained and evaluated multiple times on different data.

2. The choice of k is usually guided by the number of instances in the data set. For example, suppose the training set has n = 15 instances and k is set to 2. The data is then divided into two nearly equal sets: set 1 with 8 samples (53% of n) and set 2 with 7 samples (47% of n). Under 2-fold cross validation, each set is used once for training and once for testing, and the mean performance over the two runs is reported. In this setting the model gets roughly the same amount of data for learning as for testing. Ideally we train the model on more data so that learning is stronger, so k = 2 may not be a good choice for a data set of 15 instances. Now set k = 3. We get three equal sets of 5 instances. In each run of 3-fold cross validation, 2 of the 3 sets (about 67% of the data) are used for training and the remaining set (about 33%) for testing. This is better than k = 2 because the model is given more data to train on.

3. For modest-sized data sets with thousands of instances, k = 5 and k = 10 are the most common choices.
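The fold sizes in the n = 15 example can be checked with a short plain-Python sketch. The helper `kfold_indices` is hypothetical; it assumes consecutive, unshuffled folds whose sizes differ by at most one.

```python
def kfold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k consecutive folds.

    The first (n_samples % k) folds get one extra index, so fold sizes
    differ by at most one.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

for k in (2, 3):
    folds = kfold_indices(15, k)
    for test_fold in folds:
        # Train on every fold except the one held back for testing.
        train = [i for fold in folds if fold is not test_fold for i in fold]
        print(f"k={k}: train on {len(train)} rows, test on {len(test_fold)} rows")
```

With k = 2 the runs train on 7 or 8 rows, while with k = 3 every run trains on 10 rows and tests on 5.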
    


4. Leave One Out Cross Validation

It is very similar to k-fold cross validation, with one key difference: k is set to the number of observations in the data set. The key features of this method are the same as those of k-fold cross validation. However, one downside is that it can be computationally more expensive than k-fold cross validation, because the model is trained once per observation. It is a good choice when the data set is small and the aim is to balance model variance.
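As a rough illustration of why leave one out is the special case k = n, the splits can be generated in plain Python. The generator name `leave_one_out` is hypothetical and the five-row data set is a toy assumption.

```python
def leave_one_out(n_samples):
    """Yield (train_idx, test_idx) pairs where each test set holds one row."""
    for test_i in range(n_samples):
        train = [i for i in range(n_samples) if i != test_i]
        yield train, [test_i]

loo_splits = list(leave_one_out(5))
for train, test in loo_splits:
    print("train:", train, "test:", test)

# One model fit is needed per row, which is where the extra cost comes from.
print("number of model fits:", len(loo_splits))
```

For the 212 training rows used later in this tutorial, this means 212 model fits instead of the 10 needed for 10-fold cross validation.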


This Python hands-on session demonstrates the implementation of these techniques on the heart data set. The data set is taken from https://www.kaggle.com/ronitf/heart-disease-uci. The size of the data set is (303 x 14). The tutorial will help you learn how to train and test a classification model under the four methods described above. The classification model considered for illustration is a decision tree. However, the four methods of defining the training and test environment remain the same for any other classification model.

# 1. Importing Libraries

In [None]:
# importing Pandas for data manipulation
import pandas as pd
import numpy as np

# importing decision tree classifier

from sklearn.tree import DecisionTreeClassifier

# importing method for Hold-out

from sklearn.model_selection import train_test_split 

# importing method for k-fold cross validation
from sklearn.model_selection import KFold

# importing method for evaluating performance score of the model under k-fold cross validation
from sklearn.model_selection import cross_val_score

# importing method for repeated Hold-out
from sklearn.model_selection import ShuffleSplit

# importing method for leave one out cross validation
from sklearn.model_selection import LeaveOneOut


#importing methods for model evaluation
from sklearn import metrics

# 2. Loading data set

In [None]:
dataset = pd.read_csv("Data sets/heart.csv")
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
age         303 non-null int64
sex         303 non-null int64
cp          303 non-null int64
trestbps    303 non-null int64
chol        303 non-null int64
fbs         303 non-null int64
restecg     303 non-null int64
thalach     303 non-null int64
exang       303 non-null int64
oldpeak     303 non-null float64
slope       303 non-null int64
ca          303 non-null int64
thal        303 non-null int64
target      303 non-null int64
dtypes: float64(1), int64(13)
memory usage: 33.2 KB


# 3. Creating Training and Test split

This step is common to Hold-out, Repeated Hold-out, k-fold cross validation and Leave one out cross validation.

In [None]:
# My_data contains all rows of the first 13 columns (column positions 0 to 12), i.e. the indicator features
My_data = dataset.iloc[:,0:13] 

# My_data_target contains the class information, which is the 14th column (position 13, 'target')

My_data_target=dataset.iloc[:,13]


X_train, X_test, Y_train, Y_test = train_test_split(My_data, My_data_target, test_size=0.3, random_state=10)

print("The sample training data without target feature\n")
print(X_train.shape)
print("\nThe sample with only target feature\n")
print(Y_test.shape)

The sample training data without target feature

(212, 13)

The sample with only target feature

(91,)
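The column selection in the cell above relies on half-open slicing: `iloc[:, 0:13]` takes positions 0 through 12, and position 13 is the target. The same rule applies to ordinary Python lists, which gives a quick way to check it. The column names below are copied from the `dataset.info()` output shown earlier.

```python
# Column names as listed by dataset.info() above.
columns = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]

feature_columns = columns[0:13]   # positions 0..12, the end position is excluded
target_column = columns[13]       # position 13, the 14th column

print(len(feature_columns), "feature columns, last one:", feature_columns[-1])
print("target column:", target_column)
```

This is why the training data has 13 columns, matching the shape (212, 13) printed above.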


# 4. Learning and Testing Model Performance under Hold-out method

In [None]:
# creating instance of Decision tree classifier

DT_model_Holdout = DecisionTreeClassifier()

# fitting the model to training data set
DT_model_Holdout.fit(X_train, Y_train)

# Getting prediction on test set

DT_model_Holdout_pred_test= DT_model_Holdout.predict(X_test)

# Computing Model Accuracy

print("Accuracy:",round(metrics.accuracy_score(Y_test, DT_model_Holdout_pred_test),2) * 100, "%")

print ("---------------")

# Printing confusion matrix

print ("Confusion matrix")

print ("---------------")

print(metrics.confusion_matrix(Y_test, DT_model_Holdout_pred_test))

# Model detailed classification report
target_names = ['class 0', 'class 1']


print ("---------------")

print("Classification report", metrics.classification_report(Y_test, DT_model_Holdout_pred_test,target_names =target_names))


Accuracy: 78.0 %
---------------
Confusion matrix
---------------
[[34 16]
 [ 4 37]]
---------------
Classification report               precision    recall  f1-score   support

     class 0       0.89      0.68      0.77        50
     class 1       0.70      0.90      0.79        41

    accuracy                           0.78        91
   macro avg       0.80      0.79      0.78        91
weighted avg       0.81      0.78      0.78        91



The model accuracy on the test set is 78%, which appears to be good. However, a model evaluated with the Hold-out method on a single split cannot guarantee stable performance on different test sets.
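To see where the numbers in the classification report come from, the metrics can be recomputed by hand from the confusion matrix printed above. This sketch assumes the scikit-learn convention that rows are true classes and columns are predicted classes; the variable names are illustrative.

```python
# Confusion matrix from the Hold-out run above.
# Rows: true class (0, 1); columns: predicted class (0, 1).
cm = [[34, 16],
      [4, 37]]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total        # correct predictions / all predictions

report = {}
for c in (0, 1):
    predicted_c = cm[0][c] + cm[1][c]           # column sum: rows predicted as class c
    actual_c = sum(cm[c])                       # row sum: rows that truly are class c
    precision = cm[c][c] / predicted_c
    recall = cm[c][c] / actual_c
    f1 = 2 * precision * recall / (precision + recall)
    report[c] = (precision, recall, f1)
    print(f"class {c}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

print(f"accuracy={accuracy:.2f}")
```

The rounded values agree with the report: 0.89, 0.68, 0.77 for class 0, 0.70, 0.90, 0.79 for class 1, and an accuracy of 0.78.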

# 5. Learning and Testing Model Performance under Repeated Hold-out method

In [None]:
# creating instance of Decision tree classifier
DT_Repeated_holdout = DecisionTreeClassifier()
# Creating the splitter with the required number of random splits and the size of the test sets.
# The following code creates 10 random training and test sets with a 70-30% ratio.
Repeated_holdout = ShuffleSplit(n_splits = 10, test_size=.30,random_state=10)
# Evaluating performance of model on each sample set of training and test set
Repeated_holdout_results = cross_val_score(DT_Repeated_holdout,
                                           X_train,Y_train, cv= Repeated_holdout 
                                          )
Model_Eval_Score_Repeatedholdout =[]
Model_Eval_Score_Repeatedholdout.append(Repeated_holdout_results)
CV_IterationsBy_Repeatedholdout = pd.DataFrame(np.transpose(Model_Eval_Score_Repeatedholdout), columns=['score'])
print(CV_IterationsBy_Repeatedholdout)

# printing the mean accuracy of the model
print("The mean performance of the model using Repeated Hold out: \n")
print("Average: ", round(CV_IterationsBy_Repeatedholdout.mean(), 2)*100)

      score
0  0.765625
1  0.828125
2  0.765625
3  0.796875
4  0.796875
5  0.703125
6  0.671875
7  0.828125
8  0.687500
9  0.734375
The mean performance of the model using Repeated Hold out: 

Average:  score    76.0
dtype: float64


The model performance at each iteration is printed. The average accuracy of 76% indicates the model's stability across 10 different random splits of the training data. Once we accept this stability, the next step is to do the final training of the model and test it on the given test set, as done in the Hold-out method.
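Stability is easier to judge when the mean is reported together with the spread of the scores. The sketch below uses only the standard library and the ten scores printed above; the variable names are illustrative.

```python
import statistics

# Accuracy scores of the 10 repeated hold-out iterations printed above.
scores = [0.765625, 0.828125, 0.765625, 0.796875, 0.796875,
          0.703125, 0.671875, 0.828125, 0.687500, 0.734375]

mean_score = statistics.mean(scores)
spread = statistics.stdev(scores)               # sample standard deviation

print(f"mean accuracy: {mean_score:.4f}")
print(f"lowest / highest: {min(scores)} / {max(scores)}")
print(f"standard deviation: {spread:.4f}")
```

A small standard deviation relative to the mean supports the claim that the model behaves similarly across splits, while a wide gap between the lowest and highest scores would indicate the opposite.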

# 6. Testing Model Performance under Repeated Hold-out method

This step is the same in all methods: the model is trained on the full training set and tested on the held-out test set.

In [None]:
DT_Repeated_holdout.fit(X_train, Y_train)

# Getting prediction on the test set

DT_model_RepeatedHoldout_pred_test= DT_Repeated_holdout.predict(X_test)

# Computing Model Accuracy

print("Accuracy:",round(metrics.accuracy_score(Y_test, DT_model_RepeatedHoldout_pred_test),2) * 100, "%")

print ("---------------")

# Printing confusion matrix

print ("Confusion matrix")

print ("---------------")

print(metrics.confusion_matrix(Y_test,DT_model_RepeatedHoldout_pred_test))

# Model detailed classification report
target_names = ['class 0', 'class 1']


print ("---------------")

print("Classification report", metrics.classification_report(Y_test, DT_model_RepeatedHoldout_pred_test,target_names =target_names))



Accuracy: 77.0 %
---------------
Confusion matrix
---------------
[[33 17]
 [ 4 37]]
---------------
Classification report               precision    recall  f1-score   support

     class 0       0.89      0.66      0.76        50
     class 1       0.69      0.90      0.78        41

    accuracy                           0.77        91
   macro avg       0.79      0.78      0.77        91
weighted avg       0.80      0.77      0.77        91



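The result is close to that of the Hold-out method (77% versus 78%). The same training and test sets are used; the small difference is expected because the decision tree is created without a fixed random_state.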

# 7. Learning and Testing Model Performance under k-fold cross validation

In [None]:
# creating instance of decision tree classifier
DT_Kfold = DecisionTreeClassifier()
# Setting number of folds as 10
# random_state is omitted: it only takes effect when shuffle=True,
# and recent scikit-learn versions raise an error if it is set without shuffling
Kfold = KFold(n_splits = 10)
# evaluating model performance on each fold
Kfold_results = cross_val_score(DT_Kfold,X_train,Y_train, cv= Kfold 
                                          )
Model_Eval_Score_kfold =[]
Model_Eval_Score_kfold.append(Kfold_results)
CV_IterationsBy_model_kfold = pd.DataFrame(np.transpose(Model_Eval_Score_kfold), columns=['score'])
print(CV_IterationsBy_model_kfold)

# printing the mean accuracy of the model
print("The mean performance of the model using k-fold cross validation: \n")
print("Average: ", round(CV_IterationsBy_model_kfold.mean(), 2)*100)

      score
0  0.681818
1  0.590909
2  0.619048
3  0.714286
4  0.809524
5  0.761905
6  0.761905
7  0.619048
8  0.666667
9  0.761905
The mean performance of the model using k-fold cross validation: 

Average:  score    70.0
dtype: float64


The model performance at each iteration is printed. The average accuracy of 70% indicates the model's stability across 10 different folds of the training data. The difference between the average performance under repeated hold-out and under k-fold cross validation is at least partly because repeated hold-out can reuse the same instances across its test sets, whereas in k-fold cross validation each instance is tested exactly once.

Once we accept this stability, the next step is to do the final training of the model and test it on the given test set, as done in the Hold-out method.

# 8. Testing Model Performance under k-fold Cross Validation

This step is the same in all methods: the model is trained on the full training set and tested on the held-out test set.

In [None]:
DT_Kfold.fit(X_train, Y_train)

# Getting prediction on the test set

DT_model_kfold_pred_test= DT_Kfold.predict(X_test)

# Computing Model Accuracy

print("Accuracy:",round(metrics.accuracy_score(Y_test, DT_model_kfold_pred_test),2) * 100, "%")

print ("---------------")

# Printing confusion matrix

print ("Confusion matrix")

print ("---------------")

print(metrics.confusion_matrix(Y_test,DT_model_kfold_pred_test))

# Model detailed classification report
target_names = ['class 0', 'class 1']


print ("---------------")

print("Classification report", metrics.classification_report(Y_test, DT_model_kfold_pred_test,target_names =target_names))




Accuracy: 77.0 %
---------------
Confusion matrix
---------------
[[33 17]
 [ 4 37]]
---------------
Classification report               precision    recall  f1-score   support

     class 0       0.89      0.66      0.76        50
     class 1       0.69      0.90      0.78        41

    accuracy                           0.77        91
   macro avg       0.79      0.78      0.77        91
weighted avg       0.80      0.77      0.77        91



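The result is again close to that of the Hold-out method (77% versus 78%), since the same training and test sets are used and the decision tree is created without a fixed random_state.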

# 9. Learning and Testing Model Performance under Leave One Out Cross validation

In [None]:
# creating instance of decision tree classifier
DT_LOOCV = DecisionTreeClassifier()
# each observation of the training set is used once as a single-row test set
loocv= LeaveOneOut()
# evaluating model performance on each left-out observation
LOOCV_results = cross_val_score(DT_LOOCV,X_train,Y_train, cv= loocv
                                          )
Model_Eval_Score_LOOCV =[]
Model_Eval_Score_LOOCV.append(LOOCV_results)
CV_IterationsBy_model_LOOCV = pd.DataFrame(np.transpose(Model_Eval_Score_LOOCV), columns=['score'])
#print(CV_IterationsBy_model_LOOCV)

# printing the mean accuracy of the model
print("The mean performance of the model using Leave One Out cross validation: \n")
print("Average: ", round(CV_IterationsBy_model_LOOCV.mean(), 2)*100)

The mean performance of the model using Leave One Out cross validation: 

Average:  score    73.0
dtype: float64


# 10. Testing Model Performance under Leave One Out Cross Validation
This step is the same in all methods: the model is trained on the full training set and tested on the held-out test set.

In [None]:
DT_LOOCV.fit(X_train, Y_train)

# Getting prediction on the test set

DT_model_LOOCV_pred_test= DT_LOOCV.predict(X_test)

# Computing Model Accuracy

print("Accuracy:",round(metrics.accuracy_score(Y_test, DT_model_LOOCV_pred_test),2) * 100, "%")

print ("---------------")

# Printing confusion matrix

print ("Confusion matrix")

print ("---------------")

print(metrics.confusion_matrix(Y_test,DT_model_LOOCV_pred_test))

# Model detailed classification report
target_names = ['class 0', 'class 1']


print ("---------------")

print("Classification report", metrics.classification_report(Y_test, DT_model_LOOCV_pred_test,target_names =target_names))




Accuracy: 74.0 %
---------------
Confusion matrix
---------------
[[31 19]
 [ 5 36]]
---------------
Classification report               precision    recall  f1-score   support

     class 0       0.86      0.62      0.72        50
     class 1       0.65      0.88      0.75        41

    accuracy                           0.74        91
   macro avg       0.76      0.75      0.74        91
weighted avg       0.77      0.74      0.73        91

