# Cross Validation
  * **1. Train|Test split**
  * **2. Train|Validation|Test split**
  * **3. Sklearn Cross_validation score**
  * **4. Sklearn Cross_Validate**

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

## 1. Train|Test split

### Process of Train|Test split
   * **1. Clean and adjust the data as necessary for X and y**
   * **2. Split the data into `Train|Test` sets for X and y**
   * **3. Fit the scaler on the training X data only**
   * **4. Scale both X_train and X_test with the fitted scaler**
   * **5. Create the model**
   * **6. Fit/train the model on the X_train data**
   * **7. Evaluate the model on the X_test data (comparing predictions with the y_test data)**
   * **8. Adjust parameters as necessary and repeat steps 6 and 7**

In [2]:
df = pd.read_csv("C:/Users/Rahul/ML_data/Advertising.csv")
df.head()

|   | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |


In [3]:
# Setting up dataset
X = df.drop("sales", axis=1)
y= df['sales']


In [4]:
X.head()

|   | TV | radio | newspaper |
|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 |
| 1 | 44.5 | 39.3 | 45.1 |
| 2 | 17.2 | 45.9 | 69.3 |
| 3 | 151.5 | 41.3 | 58.5 |
| 4 | 180.8 | 10.8 | 58.4 |


In [5]:
y.head()

0    22.1
1    10.4
2     9.3
3    18.5
4    12.9
Name: sales, dtype: float64

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 101 )

In [8]:
# preprocessing dataset
from sklearn.preprocessing import StandardScaler

In [9]:
scaler = StandardScaler()

In [10]:
# fit the scaler on the training data only, so no test-set statistics leak into preprocessing
scaler.fit(X_train)

StandardScaler()

In [11]:
X_train = scaler.transform(X_train)

In [12]:
X_test = scaler.transform(X_test)
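
The fit-on-train, transform-both pattern used in the cells above can also be bundled into a single estimator. The following is a minimal sketch, assuming synthetic data from `make_regression` in place of the advertising CSV; the `*_demo` and `Xd_*` names are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the advertising data (assumption: 200 rows, 3 features)
X_demo, y_demo = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)

# Manual route: fit the scaler on the training split only, then transform both splits
demo_scaler = StandardScaler().fit(Xd_train)
manual_model = Ridge(alpha=1).fit(demo_scaler.transform(Xd_train), yd_train)
manual_preds = manual_model.predict(demo_scaler.transform(Xd_test))

# Pipeline route: fit() learns the scaling from the training data, predict() reuses it
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1))
pipe.fit(Xd_train, yd_train)
pipe_preds = pipe.predict(Xd_test)

# Both routes should give the same predictions
same_predictions = np.allclose(manual_preds, pipe_preds)
```

A pipeline like this makes it harder to accidentally fit the scaler on test data, because scaling and modelling are fitted together in one `fit` call.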

In [13]:
# Create the model
from sklearn.linear_model import Ridge

In [14]:
model_1 = Ridge(alpha = 100)

In [15]:
# Fit the model
model_1.fit(X_train, y_train)

Ridge(alpha=100)

In [16]:
# Evaluate our model prediction
y_pred = model_1.predict(X_test)

In [17]:
y_pred

array([15.34908128, 17.05755308, 12.73784965, 16.18231062, 10.85075815,
        9.87999576, 17.6105132 , 15.80786278, 11.32616781, 17.30158479,
       12.8883864 , 13.64670913, 13.71636726, 18.83377117, 17.38617584,
       11.59912699, 14.88899736, 10.07145317, 10.14692243, 17.90771073,
       10.25837266, 16.71492563, 20.57087744, 19.66643199, 10.14020781,
       13.40084066, 18.09910709, 10.80433113, 13.00876939, 13.79206361,
       12.73015096, 17.42108555, 11.50183684, 10.10362749, 16.18778637,
       10.45161746, 11.25953403, 10.42658319, 12.30681396, 11.82281519,
       14.75707677, 11.58372535, 12.01609545, 10.90016204, 12.55896716,
       11.62961585, 10.8495293 , 15.74187916, 14.09264772, 18.45114683,
       13.43419788, 14.05075373, 16.0980788 , 12.07046074, 13.15048011,
        8.75095421, 19.21013193, 12.92686996, 16.49277745, 14.83525505])

In [18]:
# import evaluation metrics
from sklearn.metrics import mean_squared_error

In [19]:
mse = mean_squared_error( y_test, y_pred)
mse

7.34177578903413

In [20]:
rmse = np.sqrt(mse)
rmse

2.709571144855608

**With a plain train|test split, we can only guess an alpha value, try it, and look at the test error. Let's try a different value.**

In [21]:
model_2 = Ridge(alpha=1)

In [22]:
model_2.fit(X_train, y_train)

Ridge(alpha=1)

In [23]:
y_pred_2 = model_2.predict(X_test)

In [24]:
mse_2 = mean_squared_error(y_test, y_pred_2)
mse_2

2.319021579428751

**The MSE drops from about 7.34 to about 2.32, so the model performs better with an alpha value of 1.**
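
The two manual trials above can be generalised to a loop over candidate values. This is a minimal sketch, assuming synthetic data and an arbitrary list of candidate alphas; neither is taken from the notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the advertising data (assumption)
X_demo, y_demo = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)

# Scale using training statistics only
demo_scaler = StandardScaler().fit(Xd_train)
Xd_train_s = demo_scaler.transform(Xd_train)
Xd_test_s = demo_scaler.transform(Xd_test)

# Hypothetical candidate values; any list of positive numbers works here
candidate_alphas = [0.1, 1, 10, 100]
mse_by_alpha = {}
for alpha in candidate_alphas:
    model = Ridge(alpha=alpha).fit(Xd_train_s, yd_train)
    mse_by_alpha[alpha] = mean_squared_error(yd_test, model.predict(Xd_test_s))

# Alpha with the lowest test MSE
best_alpha = min(mse_by_alpha, key=mse_by_alpha.get)
```

Choosing alpha from the test error in this way means the test score is no longer an unbiased estimate of performance on unseen data.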

### Advantages|Disadvantages of Train|Test split
**Advantages-**
   * **1. The split is done in a single line of code, in a single step**
   * **2. The split is performed only once**
   
**Disadvantages-**
   * **1. Finding the best (optimal) `hyperparameter` is very tedious with a Train|Test split**
   * **2. We adjust the `hyperparameter` after evaluating the model on the test data, so the test error is no longer a fair estimate of performance on unseen data**

## 2. Train|Validation|Test Split

**Procedure of Train|Validation|Test Split**

   * **1. Clean and adjust the data as necessary for X and y**
   * **2. Split the data into `Train|Validation|Test` sets for X and y**
   * **3. Fit the scaler on the training X data only**
   * **4. Scale the X validation and X test data with the fitted scaler**
   * **5. Create the model**
   * **6. Fit/train the model on the X_train data**
   * **7. Evaluate the model on the X validation data (comparing predictions with the y_val data)**
   * **8. Adjust parameters as necessary and repeat steps 6 and 7**
   * **9. Get the final error metrics on the test data (after this step, do not alter the `hyperparameters` any further)**

In [26]:
X1 = df.drop("sales", axis=1)
y1 = df["sales"]

**We apply `train_test_split` twice to get the `Train|Validation|Test` split: first 70% train against a 30% hold-out, then the hold-out is halved into validation and test sets.**

In [27]:
X1_train, X1_val, y1_train, y1_val = train_test_split(X1, y1, test_size=0.3, random_state = 101)

In [28]:
X_val, X_test, y_val, y_test = train_test_split(X1_val, y1_val, test_size= 0.5, random_state=101)
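
As a sanity check on the proportions, the two calls above give a 70/15/15 split. This is a minimal sketch, assuming a placeholder array of 200 rows (the row count implied by the 60 test predictions printed earlier for a 30% test split); the `Xd_*` names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the feature matrix (assumption: 200 x 3)
X_demo = np.arange(600).reshape(200, 3)
y_demo = np.arange(200)

# First split: 70% train, 30% hold-out
Xd_train, Xd_hold, yd_train, yd_hold = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)
# Second split: the hold-out is halved into validation and test
Xd_val, Xd_test, yd_val, yd_test = train_test_split(
    Xd_hold, yd_hold, test_size=0.5, random_state=101
)

split_sizes = (len(Xd_train), len(Xd_val), len(Xd_test))
print(split_sizes)  # (140, 30, 30)
```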

In [29]:
from sklearn.preprocessing import StandardScaler

In [31]:
scaler = StandardScaler()

In [32]:
scaler.fit(X1_train)  # fit() returns the scaler itself, so its result is not assigned to X1_train

In [33]:
X_test = scaler.transform(X_test)

In [34]:
X_val = scaler.transform(X_val)

In [36]:
X1_train = scaler.transform(X1_train)

In [37]:
from sklearn.linear_model import Ridge

In [38]:
model_3 = Ridge(alpha=100)

In [39]:
model_3.fit(X1_train, y1_train)

Ridge(alpha=100)

In [42]:
y_preds = model_3.predict(X_val)
y_preds

array([16.0980788 , 10.8495293 ,  8.75095421, 14.83525505, 12.55896716,
       12.8883864 , 11.58372535, 12.01609545, 16.18778637, 10.90016204,
       11.32616781, 17.90771073, 14.09264772, 13.79206361, 13.71636726,
        9.87999576, 15.34908128, 13.00876939, 13.43419788, 10.85075815,
       14.75707677, 18.83377117, 17.30158479, 15.74187916, 16.49277745,
       19.66643199, 17.6105132 , 10.07145317, 13.64670913, 17.42108555])

In [45]:
# mse on validation data
mse = mean_squared_error(y_val, y_preds)

In [46]:
mse

7.320101458823871

In [47]:
model_4 = Ridge(alpha = 1)

In [48]:
model_4.fit(X1_train, y1_train)

Ridge(alpha=1)

In [49]:
model_4_preds = model_4.predict(X_val)

In [50]:
model_4_mse = mean_squared_error(y_val, model_4_preds)
model_4_mse

2.3837830750569853

In [51]:
# now let's calculate final error
final_preds = model_4.predict(X_test)

In [52]:
final_mse = mean_squared_error(y_test, final_preds)
final_mse

2.254260083800517

## 3. Sklearn Cross_validation score | K-Fold Cross-Validation
   **Process of K-Fold Cross-Validation**
   * **1. Divide the dataset into `Train` and `Test` sets.**
   * **2. Choose the value of `K` and split the training set into `K` folds.**
   * **3. For each of the `K` rounds, train on `K-1` folds and calculate the error on the remaining fold.**
   * **4. After all `K` rounds, calculate the `Average_error`.**
   * **5. If the `Average_error` is satisfactory, evaluate on the test set; if not, change the alpha value and calculate the `Average_error` again.**
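
The steps above can be sketched as an explicit loop, which is roughly what `cross_val_score` automates. This is a minimal sketch, assuming synthetic data in place of the scaled training set; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the training set (assumption: 140 rows, 3 features)
X_demo, y_demo = make_regression(n_samples=140, n_features=3, noise=10.0, random_state=0)

k = 5
kfold = KFold(n_splits=k)  # no shuffling, as with an integer cv on a regressor

fold_errors = []
for fit_idx, holdout_idx in kfold.split(X_demo):
    # Train on K-1 folds, measure the error on the fold left out
    fold_model = Ridge(alpha=1).fit(X_demo[fit_idx], y_demo[fit_idx])
    fold_preds = fold_model.predict(X_demo[holdout_idx])
    fold_errors.append(mean_squared_error(y_demo[holdout_idx], fold_preds))

average_error = np.mean(fold_errors)

# cross_val_score returns the same per-fold errors with the sign flipped
library_scores = cross_val_score(
    Ridge(alpha=1), X_demo, y_demo, cv=k, scoring="neg_mean_squared_error"
)
```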

In [53]:
df3 = pd.read_csv("C:/Users/Rahul/ML_data/Advertising.csv")
df3.head()

|   | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |


In [54]:
X = df3.drop("sales", axis=1)
X.head()

|   | TV | radio | newspaper |
|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 |
| 1 | 44.5 | 39.3 | 45.1 |
| 2 | 17.2 | 45.9 | 69.3 |
| 3 | 151.5 | 41.3 | 58.5 |
| 4 | 180.8 | 10.8 | 58.4 |


In [55]:
y = df3["sales"]

In [57]:
from sklearn.model_selection import train_test_split

In [58]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state=101)

In [59]:
from sklearn.preprocessing import StandardScaler

In [60]:
scaler = StandardScaler()

In [61]:
scaler.fit(X_train)

StandardScaler()

In [62]:
X_train = scaler.transform(X_train)

In [63]:
X_test = scaler.transform(X_test)

In [65]:
from sklearn.model_selection import cross_val_score

In [66]:
from sklearn.linear_model import Ridge

In [67]:
model_5 = Ridge(alpha = 5)

In [71]:
cv_score = cross_val_score(model_5, X_train, y_train, cv=5, scoring='neg_mean_squared_error')

In [72]:
cv_score

array([-3.25206636, -1.46242717, -5.55131475, -2.16131556, -4.48235569])

In [74]:
# average error
abs(cv_score.mean())

3.3818959068533525
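
scikit-learn scorers follow a "higher is better" convention, which is why the MSE comes back negated. The sketch below converts the fold scores printed above back into ordinary error values; `cv_score_demo` is an illustrative copy of that output, not a new result.

```python
import numpy as np

# Fold scores copied from the cross_val_score output above (alpha=5, cv=5)
cv_score_demo = np.array([-3.25206636, -1.46242717, -5.55131475, -2.16131556, -4.48235569])

mse_per_fold = -cv_score_demo          # flip the sign to recover ordinary MSE
mean_mse = mse_per_fold.mean()         # about 3.38, matching abs(cv_score.mean())
rmse_per_fold = np.sqrt(mse_per_fold)  # back in the units of the target
```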

In [75]:
model_6 = Ridge(alpha=1)

In [76]:
cv_score2 = cross_val_score(model_6, X_train, y_train, cv=5, scoring='neg_mean_squared_error')  # score model_6 (alpha=1), not model_5

In [77]:
cv_score2

In [78]:
abs(cv_score2.mean())

In [79]:
# fit the model with train data
model_6.fit(X_train, y_train)

Ridge(alpha=1)

In [80]:
model_6_preds = model_6.predict(X_test)

In [81]:
model_6_mse = mean_squared_error(y_test, model_6_preds)

In [82]:
model_6_mse

2.319021579428751

**This is our final error score on the held-out test set.**

## 4. Sklearn Cross_Validate

In [83]:
df = pd.read_csv("C:/Users/Rahul/ML_data/Advertising.csv")

X = df.drop("sales", axis=1)
y = df["sales"]

# Train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.3, random_state = 101)

# preprocessing
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)



In [84]:
from sklearn.model_selection import cross_validate

In [85]:
model_7 = Ridge(alpha=100)


In [86]:
score = cross_validate(model_7, X_train, y_train, scoring = ['neg_mean_squared_error', 'neg_mean_absolute_error'],
                      cv=10)

In [87]:
score

{'fit_time': array([0.0550065 , 0.0030005 , 0.00200462, 0.00400209, 0.002002  ,
        0.00199962, 0.00300193, 0.00200009, 0.00300145, 0.00200009]),
 'score_time': array([0.0210011 , 0.00100112, 0.00099707, 0.00199962, 0.00200486,
        0.00100088, 0.00199604, 0.00100183, 0.00099611, 0.00199604]),
 'test_neg_mean_squared_error': array([ -6.06067062, -10.62703078,  -3.99342608,  -5.00949402,
         -9.14179955, -13.08625636,  -3.83940454,  -9.05878567,
         -9.05545685,  -5.77888211]),
 'test_neg_mean_absolute_error': array([-1.8102116 , -2.54195751, -1.46959386, -1.86276886, -2.52069737,
        -2.45999491, -1.45197069, -2.37739501, -2.44334397, -1.89979708])}

In [88]:
# the raw dictionary of scores is hard to read, so we convert it into a DataFrame
score = pd.DataFrame(score)

In [89]:
score

|   | fit_time | score_time | test_neg_mean_squared_error | test_neg_mean_absolute_error |
|---|---|---|---|---|
| 0 | 0.055007 | 0.021001 | -6.060671 | -1.810212 |
| 1 | 0.003 | 0.001001 | -10.627031 | -2.541958 |
| 2 | 0.002005 | 0.000997 | -3.993426 | -1.469594 |
| 3 | 0.004002 | 0.002 | -5.009494 | -1.862769 |
| 4 | 0.002002 | 0.002005 | -9.1418 | -2.520697 |
| 5 | 0.002 | 0.001001 | -13.086256 | -2.459995 |
| 6 | 0.003002 | 0.001996 | -3.839405 | -1.451971 |
| 7 | 0.002 | 0.001002 | -9.058786 | -2.377395 |
| 8 | 0.003001 | 0.000996 | -9.055457 | -2.443344 |
| 9 | 0.002 | 0.001996 | -5.778882 | -1.899797 |


In [90]:
score.mean()

fit_time                        0.007802
score_time                      0.003399
test_neg_mean_squared_error    -7.565121
test_neg_mean_absolute_error   -2.083773
dtype: float64
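
The negated metric columns can be turned into plain, positive error columns before summarising. This is a minimal sketch, assuming synthetic data in place of the scaled training set; `demo_scores`, `errors`, and `summary` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate

# Synthetic stand-in for the training set (assumption: 140 rows, 3 features)
X_demo, y_demo = make_regression(n_samples=140, n_features=3, noise=10.0, random_state=0)

demo_scores = pd.DataFrame(
    cross_validate(
        Ridge(alpha=100), X_demo, y_demo, cv=10,
        scoring=["neg_mean_squared_error", "neg_mean_absolute_error"],
    )
)

# Keep only the metric columns, flip their sign, and give them plain names
metric_cols = [c for c in demo_scores.columns if c.startswith("test_neg_")]
errors = -demo_scores[metric_cols]
errors.columns = [c.replace("test_neg_", "") for c in metric_cols]

# Mean and spread of each error across the 10 folds
summary = errors.agg(["mean", "std"])
```

Looking at the standard deviation alongside the mean gives a sense of how much the error varies from fold to fold.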

In [91]:
# build the model again with a different alpha value
model_8 = Ridge(alpha = 1)

In [92]:
scores = cross_validate(model_8, X_train, y_train, scoring = ['neg_mean_squared_error', 'neg_mean_absolute_error'],
                       cv= 10)
scores

{'fit_time': array([0.00299931, 0.00199556, 0.00199819, 0.00301075, 0.00301838,
        0.00200534, 0.00198579, 0.0019846 , 0.00202084, 0.00100112]),
 'score_time': array([0.00200009, 0.00200224, 0.00100231, 0.00198483, 0.00099707,
        0.00099444, 0.00101447, 0.00200343, 0.00100422, 0.00098634]),
 'test_neg_mean_squared_error': array([-2.96250773, -3.05737833, -2.1737403 , -0.83303438, -3.46401792,
        -8.2326467 , -1.90586431, -2.76504844, -4.98950515, -2.84643818]),
 'test_neg_mean_absolute_error': array([-1.45717399, -1.5553078 , -1.23877012, -0.76893775, -1.43448944,
        -1.4943158 , -1.08136203, -1.25001123, -1.58097132, -1.22332553])}

In [93]:
scores = pd.DataFrame(scores)
scores

|   | fit_time | score_time | test_neg_mean_squared_error | test_neg_mean_absolute_error |
|---|---|---|---|---|
| 0 | 0.002999 | 0.002 | -2.962508 | -1.457174 |
| 1 | 0.001996 | 0.002002 | -3.057378 | -1.555308 |
| 2 | 0.001998 | 0.001002 | -2.17374 | -1.23877 |
| 3 | 0.003011 | 0.001985 | -0.833034 | -0.768938 |
| 4 | 0.003018 | 0.000997 | -3.464018 | -1.434489 |
| 5 | 0.002005 | 0.000994 | -8.232647 | -1.494316 |
| 6 | 0.001986 | 0.001014 | -1.905864 | -1.081362 |
| 7 | 0.001985 | 0.002003 | -2.765048 | -1.250011 |
| 8 | 0.002021 | 0.001004 | -4.989505 | -1.580971 |
| 9 | 0.001001 | 0.000986 | -2.846438 | -1.223326 |


In [94]:
scores.mean()

fit_time                        0.002202
score_time                      0.001399
test_neg_mean_squared_error    -3.323018
test_neg_mean_absolute_error   -1.308467
dtype: float64

In [95]:
model_8.fit(X_train, y_train)

Ridge(alpha=1)

In [96]:
y_preds = model_8.predict(X_test)
y_preds

array([15.73544249, 19.56177685, 11.47282584, 16.99614361,  9.19583919,
        7.06034338, 20.24078477, 17.27047482,  9.7997058 , 19.18969381,
       12.40827613, 13.88321006, 13.72330625, 21.24960621, 18.41451801,
       10.00739858, 15.54023734,  7.72694272,  7.59886443, 20.3595504 ,
        7.831815  , 18.21607253, 24.61611392, 22.77116018,  8.0117733 ,
       12.667102  , 21.40567156,  8.10250725, 12.43158049, 12.53481984,
       10.81678067, 19.21537816, 10.09192883,  6.76998079, 17.29636618,
        7.81497124,  9.28808588,  8.31202002, 10.6122371 , 10.6533735 ,
       13.05491413,  9.80364168, 10.24764859,  8.09836046, 11.58209801,
       10.10783927,  9.025001  , 16.24936342, 13.26025422, 20.77690029,
       12.51477346, 13.96784546, 17.53696507, 11.15686875, 12.57233878,
        5.56009018, 23.21824128, 12.62301353, 18.72931877, 15.18197827])

In [97]:
mse = mean_squared_error(y_test, y_preds)
mse

2.319021579428751

# Grid-Search
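**Grid search is a form of `cross-validation` used when a model has more than one `hyperparameter`: every combination of the candidate values is scored with cross-validation, and the best combination is kept.**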

In [98]:
df = pd.read_csv("C:/Users/Rahul/ML_data/Advertising.csv")
df.head()

|   | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |


In [99]:
# Creating X and y
X = df.drop("sales", axis=1)
y = df["sales"]

# Train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= 0.3, random_state = 101)

# preprocessing
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)


In [100]:
from sklearn.linear_model import ElasticNet

In [101]:
# setting up elastic net as our base model
base_model = ElasticNet()

In [102]:
parms_grid = {'alpha' : [0.1, 1, 5, 10, 50, 100], 
    "l1_ratio" : [.1, .5, .7, .95, .99, 1]}

In [104]:
from sklearn.model_selection import GridSearchCV

In [110]:
grid_model = GridSearchCV(estimator=base_model,
                         param_grid = parms_grid,
                         scoring = 'neg_mean_squared_error',
                         cv=5, verbose=1)

In [113]:
grid_model.fit(X_train, y_train)

Fitting 5 folds for each of 36 candidates, totalling 180 fits


GridSearchCV(cv=5, estimator=ElasticNet(),
             param_grid={'alpha': [0.1, 1, 5, 10, 50, 100],
                         'l1_ratio': [0.1, 0.5, 0.7, 0.95, 0.99, 1]},
             scoring='neg_mean_squared_error', verbose=1)
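
The "36 candidates, 180 fits" in the log above follows directly from the grid: 6 alpha values times 6 `l1_ratio` values, each evaluated on 5 folds. A small sketch of that arithmetic, using scikit-learn's `ParameterGrid` to enumerate the combinations (`demo_grid` is an illustrative copy of the grid defined earlier):

```python
from sklearn.model_selection import ParameterGrid

# Same candidate values as the grid passed to GridSearchCV
demo_grid = {
    "alpha": [0.1, 1, 5, 10, 50, 100],
    "l1_ratio": [0.1, 0.5, 0.7, 0.95, 0.99, 1],
}

candidates = list(ParameterGrid(demo_grid))  # one dict per combination
n_candidates = len(candidates)               # 6 alphas x 6 l1_ratios = 36
n_fits = n_candidates * 5                    # 5 folds each = 180
```

The number of fits grows multiplicatively with every hyperparameter added to the grid, so large grids can become expensive quickly.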

In [121]:
grid_model.best_params_

{'alpha': 0.1, 'l1_ratio': 1}

In [117]:
pd.DataFrame(grid_model.cv_results_)

|   | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_alpha | param_l1_ratio | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.001799 | 0.0003978129 | 0.000601 | 0.0004904863 | 0.1 | 0.1 | {'alpha': 0.1, 'l1_ratio': 0.1} | -3.453021 | -1.40519 | -5.789125 | -2.187302 | -4.645576 | -3.496043 | 1.591601 | 6 |
| 1 | 0.001602 | 0.0008045578 | 0.000601 | 0.0004903833 | 0.1 | 0.5 | {'alpha': 0.1, 'l1_ratio': 0.5} | -3.32544 | -1.427522 | -5.59561 | -2.163089 | -4.451679 | -3.392668 | 1.506827 | 5 |
| 2 | 0.001401 | 0.000497934 | 0.0008 | 0.0004000283 | 0.1 | 0.7 | {'alpha': 0.1, 'l1_ratio': 0.7} | -3.26988 | -1.442432 | -5.502437 | -2.16395 | -4.356738 | -3.347088 | 1.462765 | 4 |
| 3 | 0.001801 | 0.0003991374 | 0.000399 | 0.000488753 | 0.1 | 0.95 | {'alpha': 0.1, 'l1_ratio': 0.95} | -3.213052 | -1.472417 | -5.396258 | -2.177452 | -4.24108 | -3.300052 | 1.406248 | 3 |
| 4 | 0.001603 | 0.0004947494 | 0.000999 | 0.0006310539 | 0.1 | 0.99 | {'alpha': 0.1, 'l1_ratio': 0.99} | -3.208124 | -1.478489 | -5.380242 | -2.181097 | -4.222968 | -3.294184 | 1.396953 | 2 |
| 5 | 0.001198 | 0.0004024839 | 0.0002 | 0.0003996849 | 0.1 | 1.0 | {'alpha': 0.1, 'l1_ratio': 1} | -3.206943 | -1.480065 | -5.376257 | -2.182076 | -4.21846 | -3.29276 | 1.394613 | 1 |
| 6 | 0.001 | 2.780415e-07 | 0.0008 | 0.0004000916 | 1.0 | 0.1 | {'alpha': 1, 'l1_ratio': 0.1} | -9.827475 | -5.261525 | -11.875347 | -7.449195 | -8.542329 | -8.591174 | 2.222939 | 12 |
| 7 | 0.001 | 2.780415e-07 | 0.0008 | 0.000400091 | 1.0 | 0.5 | {'alpha': 1, 'l1_ratio': 0.5} | -8.707071 | -4.214228 | -10.879261 | -6.204545 | -7.173031 | -7.435627 | 2.255532 | 11 |
| 8 | 0.0012 | 0.000399733 | 0.0002 | 0.0004002571 | 1.0 | 0.7 | {'alpha': 1, 'l1_ratio': 0.7} | -7.92087 | -3.549562 | -10.024877 | -5.379553 | -6.324836 | -6.63994 | 2.206213 | 10 |
| 9 | 0.001034 | 4.558633e-05 | 0.000215 | 0.0002632124 | 1.0 | 0.95 | {'alpha': 1, 'l1_ratio': 0.95} | -6.729435 | -2.591285 | -8.709842 | -4.156317 | -5.329916 | -5.503359 | 2.102835 | 9 |


In [114]:
grid_preds = grid_model.predict(X_test)

In [115]:
grid_mse = mean_squared_error(y_test, grid_preds)
grid_mse

2.3873426420874737
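
`grid_model.predict` can be called directly because `GridSearchCV`, with its default `refit=True`, refits the best parameter combination on the full training set after the search. This is a minimal sketch of that behaviour, assuming synthetic data and a smaller, illustrative grid; none of the names below come from the notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the advertising data (assumption)
X_demo, y_demo = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)

demo_search = GridSearchCV(
    ElasticNet(),
    param_grid={"alpha": [0.1, 1, 10], "l1_ratio": [0.1, 0.5, 1]},  # illustrative grid
    scoring="neg_mean_squared_error",
    cv=5,
)
demo_search.fit(Xd_train, yd_train)

best_cv_mse = -demo_search.best_score_  # mean cross-validated MSE of the winner
search_preds = demo_search.predict(Xd_test)                  # delegates to the refit model
refit_preds = demo_search.best_estimator_.predict(Xd_test)   # the refit model itself
```

The cross-validated score of the best candidate and the test-set error measure different things, so they will generally not match exactly.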