This cell runs k-fold cross-validation with k=5: in each split, a Linear Regression model is trained on four folds of the training data and evaluated on the remaining held-out fold. Performance on each fold is measured with Mean Squared Error (MSE), normalized MSE (NMSE, computed here as 1 - R2), and R-squared (R2). The per-fold results are recorded in a DataFrame, `cross_val_metrics`, so that the model's consistency and generalization across data splits can be compared.

In [3]:
# Imports used by this cell
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold

# One row per fold, one column per metric
cross_val_metrics = pd.DataFrame(columns=['MSE', 'norm_MSE', 'R2'])

kf = KFold(n_splits=5)
for i, (train_index, test_index) in enumerate(kf.split(X_train), start=1):
    print('Split {}: \n\tTest Folds: [{}] \n\tTrain Folds {}'.format(i, i, [j for j in range(1, 6) if j != i]))

    # Select the rows of the training data that belong to this split
    x_train_fold = X_train.values[train_index]
    y_train_fold = y_train.values[train_index]
    x_test_fold = X_train.values[test_index]
    y_test_fold = y_train.values[test_index]

    # Fit on four folds, then predict on the held-out fold
    lr = LinearRegression().fit(x_train_fold, y_train_fold)
    y_pred_fold = lr.predict(x_test_fold)

    fold_mse = mean_squared_error(y_test_fold, y_pred_fold)
    fold_r2 = r2_score(y_test_fold, y_pred_fold)
    fold_nmse = 1 - fold_r2  # NMSE = MSE / Var(y) = 1 - R2
    print(f'\tMSE: {fold_mse:3.3f} NMSE: {fold_nmse:3.3f} R2: {fold_r2:3.3f}')

    cross_val_metrics.loc[f'Fold {i}', :] = [fold_mse, fold_nmse, fold_r2]

Split 1: 
	Test Folds: [1] 
	Train Folds [2, 3, 4, 5]
	MSE: 15.126 NMSE: 0.156 R2: 0.844
Split 2: 
	Test Folds: [2] 
	Train Folds [1, 3, 4, 5]
	MSE: 12.915 NMSE: 0.186 R2: 0.814
Split 3: 
	Test Folds: [3] 
	Train Folds [1, 2, 4, 5]
	MSE: 15.121 NMSE: 0.205 R2: 0.795
Split 4: 
	Test Folds: [4] 
	Train Folds [1, 2, 3, 5]
	MSE: 13.265 NMSE: 0.170 R2: 0.830
Split 5: 
	Test Folds: [5] 
	Train Folds [1, 2, 3, 4]
	MSE: 13.932 NMSE: 0.189 R2: 0.811
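
The `fold_nmse` line in the cell above computes NMSE as `1 - r2_score(...)`. This works because normalizing the MSE by the variance of the held-out targets gives exactly `1 - R2`. The following is a minimal pure-Python sketch of that identity; the `y_true` and `y_pred` values are made-up toy numbers for illustration only, not data from this notebook.

```python
# Toy values (hypothetical, for illustration only)
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.5, 2.0, 8.0]

n = len(y_true)
y_mean = sum(y_true) / n

# Residual and total sums of squares
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - y_mean) ** 2 for t in y_true)

mse = ss_res / n       # mean squared error
var_y = ss_tot / n     # population variance of the targets
nmse = mse / var_y     # normalized MSE
r2 = 1.0 - ss_res / ss_tot

# The two quantities agree up to floating-point rounding
print(f'NMSE: {nmse:.6f}  1 - R2: {1.0 - r2:.6f}')
```

Since the factor `1/n` cancels in `mse / var_y`, NMSE and R2 carry the same information, which is why the two columns in each fold always sum to 1.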


In [4]:
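# Average over the fold rows only, so re-running this cell does not fold an existing 'Mean' row into the average
cross_val_metrics.loc['Mean', :] = cross_val_metrics.drop(index='Mean', errors='ignore').mean()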

cross_val_metrics

,MSE,norm_MSE,R2
Fold 1,15.125774,0.156393,0.843607
Fold 2,12.915315,0.185853,0.814147
Fold 3,15.1209,0.205437,0.794563
Fold 4,13.264579,0.169535,0.830465
Fold 5,13.932224,0.189038,0.810962
Mean,14.071759,0.181251,0.818749
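
The mean alone does not show how much the metrics vary between folds. As a rough sketch, the sample standard deviation can be reported next to it. The lists below are copied from the table above; the variable names are chosen here for illustration and are not part of the original notebook.

```python
from statistics import mean, stdev

# Per-fold values copied from the cross_val_metrics table above
fold_mse_values = [15.125774, 12.915315, 15.1209, 13.264579, 13.932224]
fold_r2_values = [0.843607, 0.814147, 0.794563, 0.830465, 0.810962]

# Mean and sample standard deviation across the five folds
mse_mean, mse_std = mean(fold_mse_values), stdev(fold_mse_values)
r2_mean, r2_std = mean(fold_r2_values), stdev(fold_r2_values)

print(f'MSE: {mse_mean:.3f} +/- {mse_std:.3f}')
print(f'R2:  {r2_mean:.3f} +/- {r2_std:.3f}')
```

Inside the notebook, the same figures could be obtained with `cross_val_metrics.drop(index='Mean').astype(float).std()`, assuming the `Mean` row has already been added.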
