### The concept of a hyperparameter
Hyperparameter tuning is one of the most important parts of a machine learning pipeline. A wrong choice of the hyperparameters’ values may lead to wrong results and a model with poor performance.<br>

Hyperparameters are configuration values of a model that are set **before** training.<br> These hyperparameters might address model design questions such as:

- What **degree of polynomial features** should I use for my linear model?
- What should be the **maximum depth** allowed for my decision tree?
- What should be the **minimum number of samples** required at a leaf node in my decision tree?
- **How many trees** should I include in my random forest?
- **How many neurons** should I have in my neural network layer?
- **How many layers** should I have in my neural network?
- What should I set my **learning rate** to for gradient descent?

Let's make it simple. For example, **the number of neurons** of a feed-forward neural network is a hyperparameter, because we set it before training. Other examples of hyperparameters are **the number of trees** in a random forest and the penalty strength of a Lasso regression. As you can see, hyperparameters are values (often numbers) that are set before the training phase, and they affect the behavior of the model.

### IMPORTANT!
Hyperparameters are **not** model parameters and they cannot be directly trained from the data. Model parameters are **learned** during training when we optimize a loss function using something like gradient descent.
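
To make the distinction concrete, here is a minimal sketch in plain Python. The toy data, the variable names, and the values of `learning_rate` and `n_epochs` are all assumptions chosen for illustration: the two hyperparameters are fixed by hand before the loop starts, while the weight `w` is the model parameter that gradient descent learns from the data.

```python
# Toy data generated from y = 2 * x, so the ideal weight is 2.0
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Hyperparameters: chosen by us before training starts
learning_rate = 0.01
n_epochs = 500

# Model parameter: learned from the data during training
w = 0.0

for _ in range(n_epochs):
    # Gradient of the mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad

print(f"learned weight w = {w:.4f}")
```

Changing `learning_rate` or `n_epochs` changes how (and whether) `w` converges, but neither of them is ever updated by the training loop itself.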


### The reason for tuning the hyperparameters
Why should we tune the hyperparameters of a model?<br>

That is because we don’t know the optimal hyperparameter values in advance. A model with different hyperparameters is, in effect, a different model, so its performance may be better or worse.<br>

In the case of neural networks, a low number of neurons could lead to underfitting and a high number could lead to overfitting.<br>

In both cases, the model is not good, so we need to find the intermediate number of neurons that leads to the best performance.<br>

If the model has several hyperparameters, we need to find the best combination of values by searching in a multi-dimensional space. That’s why hyperparameter tuning, which is the process of finding the right values of the hyperparameters, is a very complex and time-consuming task.

### Hyperparameter tuning in practice
For a decision tree, tuning hyperparameters largely means making decisions on the **stopping criteria**. There are several stopping criteria; we're going to deal with four first:
1. The maximum depth: max_depth
2. The minimum size of the node: min_samples_split
3. The minimum lift: min_impurity_decrease
4. The cost-complexity: ccp_alpha<br>

---
The **max depth** is the maximum depth of the decision tree. The tree cannot grow deeper than the value we set using **`max_depth`**. The smaller it is, the smaller the tree will be.<br>

The **minimum size of the node** is the minimum number of samples a node must contain before it can be split. We can set this using **`min_samples_split`**, and its default value is 2. The smaller the value, the larger the tree will be.<br>

The **minimum lift** is the minimum decrease of the impurity that a split must achieve. We can set it using **`min_impurity_decrease`**. A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease is:<br>

$$\frac{N_t}{N} \times \left(\text{impurity} - \frac{N_{tR}}{N_t} \times \text{right impurity} - \frac{N_{tL}}{N_t} \times \text{left impurity}\right)$$

Where<br>
$N$ is the total number of samples<br>
$N_t$ is the number of samples in the current node<br>
$N_{tL}$ is the number of samples in the left child<br>
$N_{tR}$ is the number of samples in the right child<br>
$N$, $N_t$, $N_{tL}$, $N_{tR}$ all refer to the weighted sum, if `sample_weight` is passed.<br>

When the impurity decrease of a split is smaller than the value set, the node is not split any further. The smaller the value, the larger the tree will be.<br>
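
As a worked example of the formula, the sketch below computes the weighted impurity decrease for one candidate split in plain Python. It assumes Gini impurity and unweighted samples; the helper names `gini` and `weighted_impurity_decrease`, the toy labels, and the threshold of 0.05 are hypothetical and only serve the illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def weighted_impurity_decrease(parent, left, right, n_total):
    """N_t / N * (impurity - N_tR / N_t * right impurity - N_tL / N_t * left impurity)."""
    n_t = len(parent)
    return (n_t / n_total) * (
        gini(parent)
        - len(right) / n_t * gini(right)
        - len(left) / n_t * gini(left)
    )


# Hypothetical node holding 8 of the 10 training samples
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [0, 1, 1, 1]

decrease = weighted_impurity_decrease(parent, left, right, n_total=10)
min_impurity_decrease = 0.05  # hypothetical threshold

# The split is allowed only if the decrease reaches the threshold
print(round(decrease, 4), decrease >= min_impurity_decrease)
```

Here the parent impurity is 0.5 and each child has impurity 0.375, so the decrease is 0.8 × 0.125 = 0.1 and the split would be accepted at this threshold.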

There are two types of pruning: **pre-pruning** and **post-pruning**. Pre-pruning is also called **early stopping**. It literally means stopping the growth of the tree early, and we can do it by setting the max depth or the number of branches. Post-pruning means pruning the tree after we train the model. We can do post-pruning using the cost-complexity pruning technique.<br>
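
The two kinds of pruning can be sketched with scikit-learn's `DecisionTreeClassifier`. This is only an illustration under assumptions: the iris dataset stands in for real data, and the values `max_depth=2`, `min_samples_split=10` and `ccp_alpha=0.02` are arbitrary choices, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unrestricted tree: grows until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruning: stop growing early through the stopping criteria
pre_pruned = DecisionTreeClassifier(
    max_depth=2, min_samples_split=10, random_state=0
).fit(X, y)

# Post-pruning: grow fully, then prune with a cost-complexity penalty
post_pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

for name, tree in [("full", full_tree), ("pre", pre_pruned), ("post", post_pruned)]:
    print(name, "depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
```

Both pruned trees end up smaller than the unrestricted one; which of the two approaches generalizes better depends on the data and has to be checked with validation.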

The **cost complexity** is a concept that is used in **cost complexity pruning**. Pruning is a technique to prevent overfitting: it limits the model by penalizing both the impurity (training error) of the tree and the size of the tree.<br>

In practice, we do cost complexity pruning by repeatedly finding the internal node whose removal increases the error the least per pruned leaf (the node with the smallest effective **$\alpha$**) and pruning the sub-tree below it. The cost complexity measure is:

$$R_\alpha (T) = R(T) + \alpha |T|$$

where<br>
$R(T)$ is the total training error (impurity) of the leaf nodes<br>
$|T|$ is the number of leaf nodes<br>
$\alpha$ is the complexity parameter

When we focus on reducing the $R(T)$ value only, the size of the tree gets bigger, which means the tree structure has more branches. $\alpha$ controls how many leaf nodes remain, so we need to adjust it to prevent overfitting. The bigger the $\alpha$ value, the more nodes are pruned.<br>

Note that we also need to calculate $R_\alpha (T_t)$ for the sub-tree $T_t$ rooted at each node $t$. The equation is very similar to the one above.

$$R_\alpha (T_t) = R(T_t) + \alpha |T_t|$$
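
The effect of $\alpha$ can be seen with a few lines of arithmetic. The sketch below uses three hypothetical candidate trees with made-up errors and leaf counts (they do not come from any real model) and picks, for each $\alpha$, the tree with the lowest $R_\alpha(T)$.

```python
def cost_complexity(error, n_leaves, alpha):
    """R_alpha(T) = R(T) + alpha * |T|."""
    return error + alpha * n_leaves


# Hypothetical candidate trees: (name, training error R(T), leaf count |T|)
candidates = [("large", 0.02, 12), ("medium", 0.06, 5), ("small", 0.15, 2)]

best_by_alpha = {}
for alpha in (0.0, 0.01, 0.05):
    # Pick the candidate with the smallest cost-complexity measure
    best = min(candidates, key=lambda c: cost_complexity(c[1], c[2], alpha))
    best_by_alpha[alpha] = best[0]
    print(f"alpha={alpha}: best tree is {best[0]}")
```

With $\alpha = 0$ only the training error counts, so the largest tree wins; as $\alpha$ grows, the penalty on the number of leaves makes the medium and then the small tree preferable.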

---
Using stopping criteria such as those above, we can control how the model is trained. Searching for the values that give the best performance is called hyperparameter tuning.

### GridSearch

Among the hyperparameter tuning techniques, GridSearch is an exhaustive search: it evaluates every combination of the candidate values we specify and picks the one with the best cross-validated score, so within the given grid it cannot miss the best combination. However, GridSearch also has a drawback: the number of combinations grows multiplicatively with every hyperparameter we add, so the training consumes a lot of time.<br>

For now, we will implement an exhaustive search using the `GridSearchCV` class.
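
Before using the library, the idea of an exhaustive search can be sketched by hand. In the snippet below, the grid and the function `toy_score` are hypothetical: `toy_score` is a stand-in for a cross-validated score and is simply defined to peak at `max_depth=4`, `min_samples_leaf=3`.

```python
from itertools import product

# Hypothetical grid: 3 x 3 = 9 combinations
grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 3, 5]}


def toy_score(max_depth, min_samples_leaf):
    """Stand-in for a cross-validated score; peaks at (4, 3)."""
    return -((max_depth - 4) ** 2) - (min_samples_leaf - 3) ** 2


# Enumerate every combination of the candidate values
names = list(grid)
combos = [dict(zip(names, values)) for values in product(*grid.values())]

# Exhaustive search: score every combination and keep the best one
best_params = max(combos, key=lambda p: toy_score(**p))
print(len(combos), "combinations tried; best:", best_params)
```

Adding a third hyperparameter with three candidate values would raise the count from 9 to 27 combinations, which is where the time cost of grid search comes from.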

In [1]:
# Downloading data
!wget 'https://bit.ly/3gLj0Q6'

# Unzip the downloaded data
import zipfile
with zipfile.ZipFile('3gLj0Q6', 'r') as existing_zip:
    existing_zip.extractall('data')

--2022-09-09 14:04:57--  https://bit.ly/3gLj0Q6
Resolving bit.ly (bit.ly)... 67.199.248.11, 67.199.248.10
Connecting to bit.ly (bit.ly)|67.199.248.11|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://drive.google.com/uc?export=download&id=1or_QN1ksv81DNog6Tu_kWcZ5jJWf5W9E [following]
--2022-09-09 14:04:57--  https://drive.google.com/uc?export=download&id=1or_QN1ksv81DNog6Tu_kWcZ5jJWf5W9E
Resolving drive.google.com (drive.google.com)... 172.217.174.110, 2404:6800:4004:825::200e
Connecting to drive.google.com (drive.google.com)|172.217.174.110|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://doc-0c-10-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/jevu56pm6ghc5grm1s630dvrmn1eot8k/1662699825000/17946651057176172524/*/1or_QN1ksv81DNog6Tu_kWcZ5jJWf5W9E?e=download&uuid=a9050603-3434-4aea-b16b-158a74693759 [following]
--2022-09-09 14:04:58--  https://doc-0c-10-docs.googleuserc

In [2]:
# Import pandas and RandomForestRegressor
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

In [3]:
# Load data
train = pd.read_csv('data/train.csv')
test = pd.read_csv('data/test.csv')

In [4]:
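# Check if the data loading is successful
# Note: DataFrame.info() prints its report directly and returns None,
# so "None" is what appears after the "Information" label in the output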
print('============ Train Data ============\n')
print('Train Data Information\n', train.info(), '\n')
print('Train Data Shape: ', train.shape, '\n')

print('============ Test Data ============')
print('Test Data Information\n', test.info(), '\n')
print('Test Data Shape: ', test.shape, '\n')


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1459 entries, 0 to 1458
Data columns (total 11 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   id                      1459 non-null   int64  
 1   hour                    1459 non-null   int64  
 2   hour_bef_temperature    1457 non-null   float64
 3   hour_bef_precipitation  1457 non-null   float64
 4   hour_bef_windspeed      1450 non-null   float64
 5   hour_bef_humidity       1457 non-null   float64
 6   hour_bef_visibility     1457 non-null   float64
 7   hour_bef_ozone          1383 non-null   float64
 8   hour_bef_pm10           1369 non-null   float64
 9   hour_bef_pm2.5          1342 non-null   float64
 10  count                   1459 non-null   float64
dtypes: float64(9), int64(2)
memory usage: 125.5 KB
Train Data Information
 None 

Train Data Shape:  (1459, 11) 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 715 entries, 0 to 714
Data columns (to

In [5]:
# Check if there are missing values
print(train.isnull().sum(), '\n')
print(test.isnull().sum())

id                          0
hour                        0
hour_bef_temperature        2
hour_bef_precipitation      2
hour_bef_windspeed          9
hour_bef_humidity           2
hour_bef_visibility         2
hour_bef_ozone             76
hour_bef_pm10              90
hour_bef_pm2.5            117
count                       0
dtype: int64 

id                         0
hour                       0
hour_bef_temperature       1
hour_bef_precipitation     1
hour_bef_windspeed         1
hour_bef_humidity          1
hour_bef_visibility        1
hour_bef_ozone            35
hour_bef_pm10             37
hour_bef_pm2.5            36
dtype: int64


In [6]:
# Fill the missing values using linear interpolation
train.interpolate(inplace=True)
test.interpolate(inplace=True)
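
As a small aside on what `interpolate` does, here is a sketch on a hypothetical four-element Series (the values are made up). With the default linear method, an interior gap is filled from its neighbours, while a gap at the very start has no earlier value to interpolate from and stays missing. That is one reason the null check in the next cell is worth running.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a leading gap and an interior gap
s = pd.Series([np.nan, 1.0, np.nan, 3.0])
filled = s.interpolate()  # default method is 'linear'

print(filled.tolist())
```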

In [7]:
# Check if the null values are replaced well.
print(train.isnull().sum(), '\n')
print(test.isnull().sum())

id                        0
hour                      0
hour_bef_temperature      0
hour_bef_precipitation    0
hour_bef_windspeed        0
hour_bef_humidity         0
hour_bef_visibility       0
hour_bef_ozone            0
hour_bef_pm10             0
hour_bef_pm2.5            0
count                     0
dtype: int64 

id                        0
hour                      0
hour_bef_temperature      0
hour_bef_precipitation    0
hour_bef_windspeed        0
hour_bef_humidity         0
hour_bef_visibility       0
hour_bef_ozone            0
hour_bef_pm10             0
hour_bef_pm2.5            0
dtype: int64


In [8]:
# Separate the features and the target
X_train = train.drop(['count'], axis=1)
Y_train = train['count']

# Declare and train the model
model = RandomForestRegressor(criterion = 'squared_error')
model.fit(X_train, Y_train)

RandomForestRegressor()

In [9]:
# Print the feature importances
model.feature_importances_

array([0.02539346, 0.5968231 , 0.17926783, 0.01670445, 0.02576764,
       0.03619231, 0.03358093, 0.03428622, 0.03153636, 0.02044771])
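
The raw array is easier to read when each value is paired with its column name. The sketch below copies the importances from the output above and assumes the columns are in the order listed by `train.info()` with `count` removed; in the notebook itself the same pairing could be made from `X_train.columns` and `model.feature_importances_`.

```python
# Column order of X_train (all columns except 'count'), as listed by train.info()
columns = ["id", "hour", "hour_bef_temperature", "hour_bef_precipitation",
           "hour_bef_windspeed", "hour_bef_humidity", "hour_bef_visibility",
           "hour_bef_ozone", "hour_bef_pm10", "hour_bef_pm2.5"]

# Importances copied from the output above
importances = [0.02539346, 0.5968231, 0.17926783, 0.01670445, 0.02576764,
               0.03619231, 0.03358093, 0.03428622, 0.03153636, 0.02044771]

# Sort from most to least important
ranked = sorted(zip(columns, importances), key=lambda pair: pair[1], reverse=True)
for name, value in ranked:
    print(f"{name:<24}{value:.4f}")
```

Because the forest was trained without a fixed `random_state`, the exact values, and possibly the order of the low-ranked features, can change from run to run, so the ranking is worth re-checking before dropping columns.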

In [10]:
# Create train datasets by removing the less important features
X_train1 = train.drop(['count', 'id'], axis=1)
X_train2 = train.drop(['count', 'id', 'hour_bef_windspeed'], axis=1)
X_train3 = train.drop(['count', 'id', 'hour_bef_windspeed', 'hour_bef_pm2.5'], axis=1)

Y_train = train['count']

# Create test datasets
test1 = test.drop(['id'], axis=1)
test2 = test.drop(['id', 'hour_bef_windspeed'], axis=1)
test3 = test.drop(['id', 'hour_bef_windspeed', 'hour_bef_pm2.5'], axis=1)

In [11]:
# Check the shape of training and test data
print('X_train1.shape: ', X_train1.shape, '\n')
print('X_train2.shape: ', X_train2.shape, '\n')
print('X_train3.shape: ', X_train3.shape, '\n')
print('Y_train.shape: ', Y_train.shape, '\n')
print('test1.shape', test1.shape, '\n')
print('test2.shape', test2.shape, '\n')
print('test3.shape', test3.shape, '\n')

X_train1.shape:  (1459, 9) 

X_train2.shape:  (1459, 8) 

X_train3.shape:  (1459, 7) 

Y_train.shape:  (1459,) 

test1.shape (715, 9) 

test2.shape (715, 8) 

test3.shape (715, 7) 



In [12]:
# Declare separate models
model1 = RandomForestRegressor(criterion = 'squared_error')
model2 = RandomForestRegressor(criterion = 'squared_error')
model3 = RandomForestRegressor(criterion = 'squared_error')

# Train the separate models
model1.fit(X_train1, Y_train)
model2.fit(X_train2, Y_train)
model3.fit(X_train3, Y_train)

RandomForestRegressor()

### RandomForest Hyperparameters

**n_estimators**: The number of decision trees in the forest
- Default = 100 (it was 10 before scikit-learn 0.22)
- Increasing it may improve the performance, but it also increases the training time.<br>

**min_samples_split**: The minimum number of samples required to split a node
- Used to control overfitting
- Default = 2: The smaller the value, the more the nodes are split, and the greater the possibility of overfitting<br>

**min_samples_leaf**: The minimum number of samples required at a leaf node
- Along with min_samples_split, it is used to control overfitting
- When the data is imbalanced, a specific class may have extremely few samples, so the value needs to be kept small<br>

**max_features**: The maximum number of features considered when looking for the best split
- The default depends on the estimator and the scikit-learn version
    - RandomForestClassifier: 'sqrt' (spelled 'auto' in older versions)
    - RandomForestRegressor: all features (1.0, spelled 'auto' in older versions)
    - Note: The default value of max_features is None (all features) in a decision tree
- When specified as an int: The number of features
- When specified as a float: The fraction of features
- 'sqrt': Samples as many as $\sqrt{The\;number\;of\;whole\;features}$
- 'log2': Samples as many as $\log_2{(The\;number\;of\;whole\;features)}$<br>

**max_depth**: Maximum depth of the tree
- Default = None
    - Split until all leaves are pure
    - Or until the number of samples in a node is less than min_samples_split
- As the depth increases, the model may overfit, so proper control is required.<br>

**max_leaf_nodes**: The maximum number of leaf nodes
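
Since some of these defaults have changed between scikit-learn releases, a safe way to check them is to ask the installed version directly. The sketch below uses `get_params()`, which every scikit-learn estimator provides; the printed values depend on the version you have installed.

```python
from sklearn.ensemble import RandomForestRegressor

# Defaults depend on the installed scikit-learn version
defaults = RandomForestRegressor().get_params()
for name in ("n_estimators", "min_samples_split", "min_samples_leaf",
             "max_features", "max_depth", "max_leaf_nodes"):
    print(f"{name} = {defaults[name]!r}")
```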

### GridSearchCV initializer
- estimator: The estimator to tune: a classifier, regressor, pipeline, and so on.

- param_grid: A dictionary that maps the names of the parameters to tune to the lists of values to try.

- scoring: Method to evaluate the prediction performance. If it is not set, the estimator's own `score` method is used (accuracy for classifiers, $R^2$ for regressors).

- cv: Specifies the number of folds in cross-validation. The default is 5.

- refit: The default value is True. In that case, after finding the optimal hyperparameters, it retrains the estimator on the whole training data with them.

- n_jobs: The number of jobs to run in parallel. The default value is None, which normally means 1. Set -1 to use all cores.
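
The arguments above can be tried out on a small example before running the full search. The sketch below is an illustration under assumptions: the data comes from `make_regression` rather than the bike dataset, the grid is deliberately tiny, and `neg_root_mean_squared_error` is just one possible regression metric for `scoring`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training data
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

search = GridSearchCV(
    estimator=RandomForestRegressor(n_estimators=20, random_state=0),
    param_grid={"max_features": [2, 5], "min_samples_leaf": [1, 5]},
    scoring="neg_root_mean_squared_error",  # a regression metric
    cv=3,
    refit=True,
)
search.fit(X, y)

print(search.best_params_)  # best combination found
print(-search.best_score_)  # mean cross-validated RMSE of that combination
print(type(search.best_estimator_).__name__)  # refitted on all of X, y
```

scikit-learn scorers follow a "higher is better" convention, which is why the RMSE is stored as a negative number and has to be negated for reading.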

In [13]:
from sklearn.model_selection import GridSearchCV
import time

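# Note: this seeded model is declared but not used below;
# the searches use model1, model2 and model3 instead
model = RandomForestRegressor(criterion = 'squared_error',
                              random_state=2022)

# Parameter grid for GridSearchCV
# Note: max_features=8 exceeds the 7 features of X_train3
params = {'n_estimators': [200, 300, 500],
          'max_features': [5, 6, 8],
          'min_samples_leaf': [1, 3, 5]}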

# Separate GridSearchCV for each model
greedy_CV1 = GridSearchCV(model1,
                          param_grid = params,
                          cv = 3,
                          n_jobs = -1,
                          verbose=2)

greedy_CV2 = GridSearchCV(model2,
                          param_grid = params,
                          cv = 3,
                          n_jobs = -1,
                          verbose=2)

greedy_CV3 = GridSearchCV(model3,
                          param_grid = params,
                          cv = 3,
                          n_jobs = -1,
                          verbose=2)

start_time = time.time()

# Train the model with each train data
greedy_CV1.fit(X_train1, Y_train)
greedy_CV2.fit(X_train2, Y_train)
greedy_CV3.fit(X_train3, Y_train)

end_time = time.time()

print(end_time-start_time)

Fitting 3 folds for each of 27 candidates, totalling 81 fits
Fitting 3 folds for each of 27 candidates, totalling 81 fits
Fitting 3 folds for each of 27 candidates, totalling 81 fits


27 fits failed out of a total of 81.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
27 fits failed with the following error:
Traceback (most recent call last):
  File "/home/raymond/anaconda3/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 680, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/raymond/anaconda3/lib/python3.9/site-packages/sklearn/ensemble/_forest.py", line 450, in fit
    trees = Parallel(
  File "/home/raymond/anaconda3/lib/python3.9/site-packages/joblib/parallel.py", line 1043, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/raymond/anaconda3/lib/python3.9/site-packages/joblib/parallel.py", line 861, in dispatch_one_batch
    self._dispatch(

20.009126663208008
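
The warning above reports 27 failed fits out of 81. That number matches the 9 candidates with `max_features=8` times 3 folds, and `X_train3` has only 7 features, so the most likely cause is that `max_features` is larger than the number of available columns (the traceback is cut off, so this is an inference). One way to avoid it, sketched below with a hypothetical helper `grid_for`, is to build the grid per dataset and keep only the values that fit.

```python
def grid_for(n_features, max_features_candidates=(5, 6, 8)):
    """Build a parameter grid whose max_features values fit the dataset."""
    valid = [m for m in max_features_candidates if m <= n_features]
    return {"n_estimators": [200, 300, 500],
            "max_features": valid,
            "min_samples_leaf": [1, 3, 5]}


# Feature counts of X_train1, X_train2 and X_train3, from the shapes printed earlier
for n_features in (9, 8, 7):
    print(n_features, grid_for(n_features)["max_features"])
```

In the notebook, the grid for the third search would then be `grid_for(X_train3.shape[1])`, which drops the value 8 and leaves 18 candidates that can all be fitted.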


In [14]:
# Predict with the model
prediction1 = greedy_CV1.predict(test1)
prediction2 = greedy_CV2.predict(test2)
prediction3 = greedy_CV3.predict(test3)

print(prediction1)
print(prediction2)
print(prediction3)

[ 97.74  214.305  82.48   38.185  72.06  125.79  181.56  304.565  44.4
 117.175 302.65  257.535 100.005  43.15  200.18  163.025  27.14  169.93
 335.46  159.795 224.225  85.985  27.475 140.08  133.92  116.725  26.93
 118.745 111.11  160.51   77.215  36.575  62.76  131.765 283.16   39.705
 135.25  103.15  233.845  84.745  61.55  123.305 163.88   88.385 327.975
 178.39   96.23   62.465  19.245  87.145 227.205  92.36  168.635  88.74
 197.24  139.945  51.425 175.54   25.905  19.72   92.345  82.185 258.2
 303.2   146.265 312.255  27.94  244.4   112.38   32.455 102.925  34.335
 124.425  15.395 318.785 227.815  35.975 171.735 231.08   27.545 248.265
 134.395  86.215  83.295  91.04  324.685  49.845 171.745 112.2   273.94
 292.135 159.215  65.48  101.275  41.865  80.64  102.63   27.885 220.145
 143.53   18.935 153.795  36.73  120.515  78.54   72.54  101.385  25.77
 176.775 123.345 179.    243.47  170.335 120.785  61.4   129.475 230.61
  39.125 201.34   19.205 103.71  106.26  175.085 123.68   51.

In [15]:
# Fill the submission template with the rounded predictions
import numpy as np

GridSearchCV_result1 = pd.read_csv('data/submission.csv')
GridSearchCV_result1['count'] = np.round(prediction1, 2)

GridSearchCV_result2 = pd.read_csv('data/submission.csv')
GridSearchCV_result2['count'] = np.round(prediction2, 2)

GridSearchCV_result3 = pd.read_csv('data/submission.csv')
GridSearchCV_result3['count'] = np.round(prediction3, 2)

In [16]:
print(GridSearchCV_result1.head(), '\n')
print(GridSearchCV_result2.head(), '\n')
print(GridSearchCV_result3.head())

   id   count
0   0   97.74
1   1  214.30
2   2   82.48
3   4   38.18
4   5   72.06 

   id   count
0   0   99.89
1   1  207.40
2   2   76.83
3   4   32.95
4   5   64.37 

   id   count
0   0  108.35
1   1  207.19
2   2   88.80
3   4   49.33
4   5   57.73


In [17]:
# Save the results in csv files
GridSearchCV_result1.to_csv('GridSearchCV_result1.csv', index=False)
GridSearchCV_result2.to_csv('GridSearchCV_result2.csv', index=False)
GridSearchCV_result3.to_csv('GridSearchCV_result3.csv', index=False)