# ClimateWins 2.4: Machine Learning, Optimizing Hyperparameters

  
ClimateWins has been contacted by an Air Ambulance company that has noticed its work on weather prediction. The company is interested in predicting, with high precision, the days on which it is safe to fly its helicopters. ClimateWins has asked you to start the process by finding optimized hyperparameters for your deep learning and random forest models. A well-optimized model should give better predictions from past data and, in turn, a better ability to predict future data in many different applications. Do a hyperparameter search on your models, iterating over smaller sections of the data if needed.

### Table of Contents
#### Part II
[1. Import libraries and dataset](#1.-Import-libraries-and-dataset)  
[2. Split the data](#2.-Split-the-data)  
[3. The Bayesian optimization](#3.-The-Bayesian-optimization)


#### Part III
4. Iteration

### 1. Import libraries and dataset

In [17]:
import pandas as pd
import numpy as np
import seaborn as sns
import os
import operator
import time
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf
from numpy import unique
from numpy import reshape
from keras.models import Sequential
from sklearn.model_selection import cross_val_score
from keras.layers import Conv1D, Conv2D, Dense, Dropout, BatchNormalization, Flatten, MaxPooling1D
from keras.optimizers import Adam, SGD, RMSprop, Adadelta, Adagrad, Adamax, Nadam, Ftrl
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.wrappers.scikit_learn import KerasClassifier
from math import floor
from sklearn.metrics import make_scorer, accuracy_score
from bayes_opt import BayesianOptimization
from sklearn.model_selection import StratifiedKFold
from keras.layers import LeakyReLU
# Note: this rebinds the name LeakyReLU from the layer class to a configured layer instance
LeakyReLU = LeakyReLU(alpha=0.1)
import warnings
warnings.filterwarnings('ignore')
pd.set_option("display.max_columns", None)

In [18]:
path = r'C:\Users\jinu5\Desktop\careerfoundry\ML\DataSet\original data'
df = pd.read_csv(os.path.join(path, 'df_cleaned.csv'), index_col=False)
df_y = pd.read_csv(os.path.join(path, 'Dataset-Answers-Weather_Prediction_Pleasant_Weather.csv'), index_col=False)

In [19]:
city_dict = {
 0: 'BASEL',
 1: 'BELGRADE',
 2: 'BUDAPEST',
 3: 'DEBILT',
 4: 'DUSSELDORF',
 5: 'HEATHROW',
 6: 'KASSEL',
 7: 'LJUBLJANA',
 8: 'MAASTRICHT',
 9: 'MADRID',
 10: 'MUNCHENB',
 11: 'OSLO',
 12: 'SONNBLICK',
 13: 'STOCKHOLM',
 14: 'VALENTIA'
}

In [20]:
obtype = [
    'cloud_cover',
 'humidity',
 'pressure',
 'global_radiation',
 'precipitation',
 'sunshine',
 'temp_mean',
 'temp_min',
 'temp_max'
]

In [21]:
tf.random.set_seed(23)

### ------

### 2. Split the data

In [22]:
dfy = df_y.drop(['DATE'], axis=1)

In [23]:
# Convert the X and y DataFrames to NumPy arrays
X=df.to_numpy()
y=pd.get_dummies(dfy).to_numpy()

In [24]:
X.shape

(22950, 135)

In [25]:
X = X.reshape(-1,15,9)

In [26]:
X.shape

(22950, 15, 9)
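
The reshape above turns each 135-value row into a 15 × 9 grid. The sketch below shows what that implies, under the assumption (not verified in this notebook) that the columns of `df` are ordered station by station, with the nine observation types grouped together for each station. The encoded values are made up purely so the layout is easy to read.

```python
import numpy as np

n_stations, n_obs = 15, 9  # matches len(city_dict) and len(obtype) in this notebook

# Hypothetical flat row: value = station_index * 100 + observation_index,
# laid out station by station, as the reshape implicitly assumes.
flat_row = np.array([s * 100 + o for s in range(n_stations) for o in range(n_obs)])

grid = flat_row.reshape(1, n_stations, n_obs)

# Row s of the grid now holds the nine observations for station s.
station_3 = grid[0, 3]
print(station_3)  # the nine values 300 through 308
```

If the columns were instead grouped by observation type, the same reshape would silently mix stations, so checking `df.columns` before reshaping is worthwhile.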

In [28]:
y

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

In [29]:
y.shape

(22950, 15)

In [30]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=23)

In [31]:
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

(17212, 15, 9) (17212, 15)
(5738, 15, 9) (5738, 15)


In [32]:
from sklearn.utils.multiclass import type_of_target
type_of_target(y_train)

'multilabel-indicator'

In [101]:
type_of_target(y_train)

'multiclass'

In [102]:
y_train

array([0, 0, 0, ..., 0, 0, 0], dtype=int64)

In [33]:
# y_train is a multilabel indicator matrix (one 0/1 column per station), not a strict one-hot encoding.
# np.argmax collapses each row to the index of its first 1, and to 0 when the row is all zeros.
Y_train = np.argmax(y_train, axis = 1)
print(Y_train.shape)
Y_train

(17212,)


array([0, 0, 0, ..., 0, 0, 0], dtype=int64)

In [34]:
type_of_target(Y_train)

'multiclass'

In [35]:
# n_classes = len(y_train[0])
len(y_train[0])

15
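
Because `type_of_target(y_train)` reports `'multilabel-indicator'`, a row can contain several 1s or none at all. The toy example below (made-up labels, three stations instead of fifteen) sketches what `np.argmax` does in those cases.

```python
import numpy as np

# Hypothetical multilabel rows: one column per station, 1 = pleasant day.
y_toy = np.array([
    [0, 0, 0],  # no station is pleasant
    [0, 1, 1],  # stations 1 and 2 are pleasant
    [1, 0, 0],  # only station 0 is pleasant
])

collapsed = np.argmax(y_toy, axis=1)
print(collapsed)  # [0 1 0]
```

Rows with no pleasant station and rows where only station 0 is pleasant both collapse to class 0. This may partly explain why BASEL dominates the confusion matrices later in the notebook, although that has not been checked here.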

### 3. The Bayesian optimization
[Go back to the Table of Contents](#Table-of-Contents)

In [56]:
timesteps = len(X_train[0])
input_dim = len(X_train[0][0])
n_classes = 15 
# Make scorer accuracy
score_acc = make_scorer(accuracy_score)

In [57]:
len(X_train[0][0])

9

In [58]:
# Create function
def bay_area(neurons, activation, kernel, optimizer, learning_rate, batch_size, epochs,
              layers1, layers2, normalization, dropout, dropout_rate): 
    # Lookup tables for the categorical choices. The optimizer proposes floats,
    # which are rounded below and used as indices into these lists.
    optimizerL = ['SGD', 'Adam', 'RMSprop', 'Adadelta', 'Adagrad', 'Adamax', 'Nadam', 'Ftrl','SGD']
    optimizerD= {'Adam':Adam(learning_rate=learning_rate), 'SGD':SGD(learning_rate=learning_rate),
                 'RMSprop':RMSprop(learning_rate=learning_rate), 'Adadelta':Adadelta(learning_rate=learning_rate),
                 'Adagrad':Adagrad(learning_rate=learning_rate), 'Adamax':Adamax(learning_rate=learning_rate),
                 'Nadam':Nadam(learning_rate=learning_rate), 'Ftrl':Ftrl(learning_rate=learning_rate)}
    activationL = ['relu', 'sigmoid', 'softplus', 'softsign', 'tanh', 'selu',
                   'elu', 'exponential', LeakyReLU,'relu']
    
    neurons = round(neurons)
    kernel = round(kernel)
    activation = activationL[round(activation)]
    optimizer = optimizerD[optimizerL[round(optimizer)]]
    batch_size = round(batch_size)
    
    epochs = round(epochs)
    layers1 = round(layers1)
    layers2 = round(layers2)
    
    def cnn_model():
        model = Sequential()
        model.add(Conv1D(neurons, kernel_size=kernel,activation=activation, input_shape=(timesteps, input_dim)))
        #model.add(Conv1D(32, kernel_size=1,activation='relu', input_shape=(timesteps, input_dim)))
        
        if normalization > 0.5:
            model.add(BatchNormalization())
        for i in range(layers1):
            model.add(Dense(neurons, activation=activation)) #(neurons, activation=activation))
        if dropout > 0.5:
            model.add(Dropout(dropout_rate, seed=123))
        for i in range(layers2):
            model.add(Dense(neurons, activation=activation))
        model.add(MaxPooling1D())
        model.add(Flatten())
        model.add(Dense(n_classes, activation='softmax')) #sigmoid softmax
        #model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy']) #categorical_crossentropy
        model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy']) #categorical_crossentropy
        return model
    es = EarlyStopping(monitor='accuracy', mode='max', verbose=2, patience=20)
    nn = KerasClassifier(build_fn=cnn_model, epochs=epochs, batch_size=batch_size, verbose=2)
    kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
    score = cross_val_score(nn, X_train, Y_train, scoring=score_acc, cv=kfold, fit_params={'callbacks':[es]}).mean()
    return score
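
The optimizer only proposes floats, so `bay_area` rounds them and uses the result as a list index. A minimal sketch of that decoding step follows. `decode_choice` is a hypothetical helper, not part of `bayes_opt`. Keeping one shared list for both the search and any later decoding is one way to stop the two from drifting apart.

```python
# Shared list of optimizer names, in the order used inside bay_area.
OPTIMIZER_NAMES = ['SGD', 'Adam', 'RMSprop', 'Adadelta',
                   'Adagrad', 'Adamax', 'Nadam', 'Ftrl']

def decode_choice(value, choices):
    """Map a continuous value proposed by the optimizer to one item of choices."""
    index = round(value)
    # Clamp so that values at the edge of the search bounds stay in range.
    index = max(0, min(index, len(choices) - 1))
    return choices[index]

print(decode_choice(2.3, OPTIMIZER_NAMES))   # RMSprop
print(decode_choice(6.49, OPTIMIZER_NAMES))  # Nadam
```

Clamping the index is an alternative to padding the list with a repeated final entry, as the notebook does.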

### For a quick test, `init_points` is set to only 10.  
##### nn_opt.maximize(init_points=10, n_iter=4) #25 


In [59]:
start = time.time()
params ={
    'neurons': (10, 100),
    'kernel': (1, 3),
    'activation':(0, 9), #9
    'optimizer':(0,7), #7
    'learning_rate':(0.01, 1),
    'batch_size': (200, 1000), #(10, 50), #
    'epochs':(20, 100),
    'layers1':(1,3),
    'layers2':(1,3),
    'normalization':(0,1),
    'dropout':(0,1),
    'dropout_rate':(0,0.3)
}
# Run Bayesian Optimization
nn_opt = BayesianOptimization(bay_area, params, random_state=42)
nn_opt.maximize(init_points=10, n_iter=4) #25
print('Search took %s minutes' % ((time.time() - start)/60))

|   iter    |  target   | activa... | batch_... |  dropout  | dropou... |  epochs   |  kernel   |  layers1  |  layers2  | learni... |  neurons  | normal... | optimizer |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Epoch 1/32
15/15 - 2s - loss: 2.7069 - accuracy: 0.6016 - 2s/epoch - 138ms/step
Epoch 2/32
15/15 - 1s - loss: 2.7004 - accuracy: 0.6451 - 984ms/epoch - 66ms/step
Epoch 3/32
15/15 - 1s - loss: 2.6970 - accuracy: 0.6451 - 964ms/epoch - 64ms/step
Epoch 4/32
15/15 - 1s - loss: 2.6942 - accuracy: 0.6451 - 857ms/epoch - 57ms/step
Epoch 5/32
15/15 - 1s - loss: 2.6916 - accuracy: 0.6451 - 820ms/epoch - 55ms/step
Epoch 6/32
15/15 - 1s - loss: 2.6894 - accuracy: 0.6451 - 883ms/epoch - 59ms/step
Epoch 7/32
15/15 - 1s - loss: 2.6872 - accuracy: 0.6451 - 1s/epoch - 92ms/step
Epoch 8/32
15/15 - 1s - loss: 2.6852 - accuracy: 0.6451 - 1s/epoch - 83ms/step
Epoch 9/32
1

ValueError: Input y contains NaN.
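
The search stops with `ValueError: Input y contains NaN.` One plausible cause (an assumption, not confirmed by the truncated log) is that an unstable trial, for example one with a very high learning rate, produces NaN predictions that the scorer rejects. The sketch below shows one possible guard. `safe_objective` is a hypothetical wrapper that returns a low fallback score instead of aborting the whole search, and `toy_objective` merely stands in for `bay_area`.

```python
import math

def safe_objective(objective, fallback=0.0):
    """Wrap an objective so a failed or NaN trial scores `fallback` instead of raising."""
    def wrapped(**params):
        try:
            score = objective(**params)
        except ValueError:
            # e.g. the scorer rejecting NaN predictions from a diverged model
            return fallback
        return fallback if math.isnan(score) else score
    return wrapped

# Toy objective standing in for bay_area: it fails for large learning rates.
def toy_objective(learning_rate):
    if learning_rate > 0.5:
        raise ValueError("Input y contains NaN.")
    return 1.0 - learning_rate

guarded = safe_objective(toy_objective)
print(guarded(learning_rate=0.1))  # 0.9
print(guarded(learning_rate=0.9))  # 0.0
```

In this notebook, that would mean passing `safe_objective(bay_area)` to `BayesianOptimization` in place of `bay_area`. Narrowing the `learning_rate` bounds is another option worth trying.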

In [60]:
optimum = nn_opt.max['params']
learning_rate = optimum['learning_rate']
activationL = ['relu', 'sigmoid', 'softplus', 'softsign', 'tanh', 'selu',
               'elu', 'exponential', LeakyReLU,'relu']
optimum['activation'] = activationL[round(optimum['activation'])]
optimum['batch_size'] = round(optimum['batch_size'])
optimum['epochs'] = round(optimum['epochs'])
optimum['layers1'] = round(optimum['layers1'])
optimum['layers2'] = round(optimum['layers2'])
optimum['neurons'] = round(optimum['neurons'])
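# Use the same optimizer order as in bay_area so the index decodes to the optimizer that was actually searched
optimizerL = ['SGD', 'Adam', 'RMSprop', 'Adadelta', 'Adagrad', 'Adamax', 'Nadam', 'Ftrl','SGD']
optimizerD= {'Adam':Adam(learning_rate=learning_rate), 'SGD':SGD(learning_rate=learning_rate),
             'RMSprop':RMSprop(learning_rate=learning_rate), 'Adadelta':Adadelta(learning_rate=learning_rate),
             'Adagrad':Adagrad(learning_rate=learning_rate), 'Adamax':Adamax(learning_rate=learning_rate),
             'Nadam':Nadam(learning_rate=learning_rate), 'Ftrl':Ftrl(learning_rate=learning_rate)}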
optimum['optimizer'] = optimizerD[optimizerL[round(optimum['optimizer'])]]
optimum

{'activation': 'elu',
 'batch_size': 277,
 'dropout': 0.22164503518759693,
 'dropout_rate': 0.010262915259696382,
 'epochs': 29,
 'kernel': 2.442337694118109,
 'layers1': 1,
 'layers2': 2,
 'learning_rate': 0.6874254256386801,
 'neurons': 67,
 'normalization': 0.8969842748803915,
 'optimizer': <keras.optimizers.legacy.rmsprop.RMSprop at 0x18d03ff0650>}

In [76]:
# Note: the optimizer is passed by name below, so Keras uses its default learning rate
# rather than the tuned learning_rate found by the search.
epochs = 29
batch_size = 277
#n_hidden = 64

timesteps = len(X_train[0])
input_dim = len(X_train[0][0])
n_classes = 15 #_count_classes(Y_train)
layers1 = 1
layers2 = 2
activation = 'elu'
kernel = 3  # note: bay_area uses round(kernel), and round(2.44) from the search gives 2
neurons = 67
normalization = 0.896
dropout = 0.22164503518759693
dropout_rate =0.010262915259696382
optimizer = 'RMSprop'

model = Sequential()
model.add(Conv1D(neurons, kernel_size=kernel, activation=activation, input_shape=(timesteps, input_dim)))
if normalization > 0.5:
    model.add(BatchNormalization())
for i in range(layers1):
    model.add(Dense(neurons, activation=activation))
if dropout > 0.5:
    model.add(Dropout(dropout_rate, seed=123))
for i in range(layers2):
    model.add(Dense(neurons, activation=activation))
model.add(MaxPooling1D())
model.add(Flatten())
model.add(Dense(n_classes, activation='softmax')) #softmax sigmoid
model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy']) #binary_crossentropy

In [77]:
model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, verbose=2)

Epoch 1/29
63/63 - 2s - loss: 0.8778 - accuracy: 0.7018 - 2s/epoch - 35ms/step
Epoch 2/29
63/63 - 1s - loss: 0.6828 - accuracy: 0.7637 - 1s/epoch - 16ms/step
Epoch 3/29
63/63 - 1s - loss: 0.6244 - accuracy: 0.7786 - 987ms/epoch - 16ms/step
Epoch 4/29
63/63 - 1s - loss: 0.5786 - accuracy: 0.7909 - 1s/epoch - 16ms/step
Epoch 5/29
63/63 - 1s - loss: 0.5461 - accuracy: 0.8041 - 992ms/epoch - 16ms/step
Epoch 6/29
63/63 - 1s - loss: 0.5166 - accuracy: 0.8109 - 1s/epoch - 16ms/step
Epoch 7/29
63/63 - 1s - loss: 0.4957 - accuracy: 0.8201 - 968ms/epoch - 15ms/step
Epoch 8/29
63/63 - 1s - loss: 0.4697 - accuracy: 0.8304 - 1s/epoch - 16ms/step
Epoch 9/29
63/63 - 1s - loss: 0.4506 - accuracy: 0.8348 - 963ms/epoch - 15ms/step
Epoch 10/29
63/63 - 1s - loss: 0.4311 - accuracy: 0.8428 - 963ms/epoch - 15ms/step
Epoch 11/29
63/63 - 1s - loss: 0.4093 - accuracy: 0.8500 - 1s/epoch - 17ms/step
Epoch 12/29
63/63 - 1s - loss: 0.3929 - accuracy: 0.8554 - 979ms/epoch - 16ms/step
Epoch 13/29
63/63 - 1s - loss: 

<keras.callbacks.History at 0x18d2f2d6dd0>

In [78]:
def confusion_matrix(Y_true, Y_pred):
    Y_true = pd.Series([city_dict[y] for y in np.argmax(Y_true, axis=1)])
    Y_pred = pd.Series([city_dict[y] for y in np.argmax(Y_pred, axis=1)])

    return pd.crosstab(Y_true, Y_pred, rownames=['True'], colnames=['Pred'])

In [79]:
print(confusion_matrix(y_test, model.predict(X_test)))

Pred        BASEL  BELGRADE  BUDAPEST  DEBILT  DUSSELDORF  HEATHROW  KASSEL  \
True                                                                          
BASEL        2991       208       113      51          19        24       7   
BELGRADE       57       823        87      32           2        11       5   
BUDAPEST       12        25       132      18           2         2       1   
DEBILT          7         5         4      59          11         5       1   
DUSSELDORF      2         2         3       4          15         3       0   
HEATHROW        2         3         7       4           2        64       1   
KASSEL          2         2         3       0           1         1       0   
LJUBLJANA       6         1         7       0           3         2       0   
MAASTRICHT      0         0         0       0           4         0       0   
MADRID          4         2        15       1           1        17       0   
MUNCHENB        0         1         0       0       

### For higher accuracy, `init_points` is set to 25.  
##### nn_opt.maximize(init_points=25, n_iter=4) #25 


In [70]:
start = time.time()
params ={
    'neurons': (10, 100),
    'kernel': (1, 3),
    'activation':(0, 9), #9
    'optimizer':(0,7), #7
    'learning_rate':(0.01, 1),
    'batch_size': (200, 1000), #(10, 50), #
    'epochs':(20, 100),
    'layers1':(1,3),
    'layers2':(1,3),
    'normalization':(0,1),
    'dropout':(0,1),
    'dropout_rate':(0,0.3)
}
# Run Bayesian Optimization
nn_opt = BayesianOptimization(bay_area, params, random_state=42)
nn_opt.maximize(init_points=25, n_iter=4) #25
print('Search took %s minutes' % ((time.time() - start)/60))

|   iter    |  target   | activa... | batch_... |  dropout  | dropou... |  epochs   |  kernel   |  layers1  |  layers2  | learni... |  neurons  | normal... | optimizer |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Epoch 1/32
15/15 - 3s - loss: 2.7081 - accuracy: 0.6008 - 3s/epoch - 190ms/step
Epoch 2/32
15/15 - 1s - loss: 2.7004 - accuracy: 0.6451 - 1s/epoch - 69ms/step
Epoch 3/32
15/15 - 1s - loss: 2.6971 - accuracy: 0.6451 - 1s/epoch - 68ms/step
Epoch 4/32
15/15 - 1s - loss: 2.6942 - accuracy: 0.6451 - 1s/epoch - 67ms/step
Epoch 5/32
15/15 - 1s - loss: 2.6917 - accuracy: 0.6451 - 1s/epoch - 68ms/step
Epoch 6/32
15/15 - 1s - loss: 2.6894 - accuracy: 0.6451 - 1s/epoch - 68ms/step
Epoch 7/32
15/15 - 1s - loss: 2.6873 - accuracy: 0.6451 - 1s/epoch - 68ms/step
Epoch 8/32
15/15 - 1s - loss: 2.6853 - accuracy: 0.6451 - 1s/epoch - 68ms/step
Epoch 9/32
15/15 - 1s - los

ValueError: Input y contains NaN.

In [71]:
optimum = nn_opt.max['params']
learning_rate = optimum['learning_rate']
activationL = ['relu', 'sigmoid', 'softplus', 'softsign', 'tanh', 'selu',
               'elu', 'exponential', LeakyReLU,'relu']
optimum['activation'] = activationL[round(optimum['activation'])]
optimum['batch_size'] = round(optimum['batch_size'])
optimum['epochs'] = round(optimum['epochs'])
optimum['layers1'] = round(optimum['layers1'])
optimum['layers2'] = round(optimum['layers2'])
optimum['neurons'] = round(optimum['neurons'])
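# Use the same optimizer order as in bay_area so the index decodes to the optimizer that was actually searched
optimizerL = ['SGD', 'Adam', 'RMSprop', 'Adadelta', 'Adagrad', 'Adamax', 'Nadam', 'Ftrl','SGD']
optimizerD= {'Adam':Adam(learning_rate=learning_rate), 'SGD':SGD(learning_rate=learning_rate),
             'RMSprop':RMSprop(learning_rate=learning_rate), 'Adadelta':Adadelta(learning_rate=learning_rate),
             'Adagrad':Adagrad(learning_rate=learning_rate), 'Adamax':Adamax(learning_rate=learning_rate),
             'Nadam':Nadam(learning_rate=learning_rate), 'Ftrl':Ftrl(learning_rate=learning_rate)}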
optimum['optimizer'] = optimizerD[optimizerL[round(optimum['optimizer'])]]
optimum

{'activation': <keras.layers.activation.leaky_relu.LeakyReLU at 0x18d6bf93a50>,
 'batch_size': 706,
 'dropout': 0.3390297910487007,
 'dropout_rate': 0.10476287238379826,
 'epochs': 78,
 'kernel': 2.794220519905154,
 'layers1': 3,
 'layers2': 3,
 'learning_rate': 0.6456113296927449,
 'neurons': 18,
 'normalization': 0.16162871409461377,
 'optimizer': <keras.optimizers.legacy.nadam.Nadam at 0x18d3840a4d0>}

In [99]:
from keras.optimizers import Nadam

epochs = 78
batch_size = 706
n_hidden = 64

timesteps = len(X_train[0])
input_dim = len(X_train[0][0])
n_classes = 15 
layers1 = 3
layers2 = 3
activation = LeakyReLU
kernel = 2.794220519905154  # note: int(kernel) below truncates to 2, whereas bay_area used round(kernel) = 3
neurons = 18
normalization = 0.16162871409461377
dropout = 0.3390297910487007
dropout_rate = 0.10476287238379826
learning_rate = 0.6456113296927449

optimizer = Nadam(learning_rate=0.6456113296927449)

model = Sequential()
model.add(Conv1D(neurons, kernel_size=int(kernel), activation=activation, input_shape=(timesteps, input_dim)))
if normalization > 0.5:
    model.add(BatchNormalization())
for i in range(layers1):
    model.add(Dense(neurons, activation=activation))
if dropout > 0.5:
    model.add(Dropout(dropout_rate, seed=123))
for i in range(layers2):
    model.add(Dense(neurons, activation=activation))
model.add(MaxPooling1D())
model.add(Flatten())
model.add(Dense(n_classes, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])


In [100]:
model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, verbose=2)

Epoch 1/78
25/25 - 2s - loss: 47740500.0000 - accuracy: 0.3890 - 2s/epoch - 82ms/step
Epoch 2/78
25/25 - 0s - loss: 239487.0312 - accuracy: 0.4712 - 303ms/epoch - 12ms/step
Epoch 3/78
25/25 - 0s - loss: 10747.9639 - accuracy: 0.5688 - 299ms/epoch - 12ms/step
Epoch 4/78
25/25 - 0s - loss: 5788.5073 - accuracy: 0.5659 - 339ms/epoch - 14ms/step
Epoch 5/78
25/25 - 0s - loss: 3991.8699 - accuracy: 0.5640 - 352ms/epoch - 14ms/step
Epoch 6/78
25/25 - 0s - loss: 3195.7712 - accuracy: 0.5495 - 306ms/epoch - 12ms/step
Epoch 7/78
25/25 - 0s - loss: 2758.0210 - accuracy: 0.5498 - 308ms/epoch - 12ms/step
Epoch 8/78
25/25 - 0s - loss: 2413.5681 - accuracy: 0.5445 - 306ms/epoch - 12ms/step
Epoch 9/78
25/25 - 0s - loss: 2102.1936 - accuracy: 0.5455 - 313ms/epoch - 13ms/step
Epoch 10/78
25/25 - 0s - loss: 2005.4667 - accuracy: 0.5411 - 323ms/epoch - 13ms/step
Epoch 11/78
25/25 - 0s - loss: 1807.8378 - accuracy: 0.5381 - 329ms/epoch - 13ms/step
Epoch 12/78
25/25 - 0s - loss: 1723.5496 - accuracy: 0.5331

<keras.callbacks.History at 0x18d35618950>

In [101]:
print(confusion_matrix(y_test, model.predict(X_test)))

Pred        BASEL  BELGRADE  BUDAPEST  DEBILT  DUSSELDORF  HEATHROW  KASSEL  \
True                                                                          
BASEL        2628        28       426     117           9        53       8   
BELGRADE      550         0       410      50           7        29       1   
BUDAPEST       96         0        85      15           1         8       0   
DEBILT         47         0        41       6           0         1       0   
DUSSELDORF     18         0        10       1           1         2       0   
HEATHROW       47         0        34       3           0         7       0   
KASSEL          3         0         9       2           0         1       0   
LJUBLJANA      31         0        25       3           0         1       0   
MAASTRICHT      6         0         0       0           0         0       0   
MADRID        218         7        84      26           3        17       3   
MUNCHENB        0         0         2       0       

### Above is the final confusion matrix

### Compare your new values with the ones from your model in Exercise 2.2. Has the optimization helped? Does it seem to be overfitting the data? Record your observations in the same document from Part 1, along with a screenshot of the final confusion matrix.

After Bayesian optimization, the first 15x15 confusion matrix was generated, but I question its accuracy. If the model were overfitting, most of the counts should lie along the diagonal of the confusion matrix; instead, this matrix shows many misclassifications.
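
One way to make this kind of comparison concrete is to compute, from a confusion matrix, the share of samples that fall on the diagonal. The sketch below uses a small made-up 3 × 3 matrix, not the results from this notebook.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
cm = np.array([
    [50,  5,  5],
    [10, 20, 10],
    [ 0,  0,  0],   # a class that never occurs in the test set
])

# Overall accuracy: diagonal counts divided by all counts.
accuracy = np.trace(cm) / cm.sum()

# Per-class recall: diagonal divided by row totals (guarding against empty rows).
row_totals = cm.sum(axis=1)
recall = np.divide(np.diag(cm), row_totals,
                   out=np.zeros(len(cm)), where=row_totals > 0)

print(round(accuracy, 2))  # 0.7
print(recall.round(2))
```

Note that a `pd.crosstab` result is only square when every class appears in both the true and the predicted labels, so it may need reindexing before a calculation like this.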

In [69]:
## For comparison, the confusion matrix from Exercise 2.2 is reproduced below.
# Pred        BASEL  BELGRADE  BUDAPEST  DEBILT  DUSSELDORF  HEATHROW  KASSEL  LJUBLJANA  MAASTRICHT  MADRID  MUNCHENB  OSLO  STOCKHOLM  VALENTIA  
# True                                                                          
# BASEL        3488        97         7       4           4         5       0          7           2      67         0     0          0         1  
# BELGRADE       35      1022        16       2           0         1       0          2           1      12         1     0          0         0
# BUDAPEST       11        39       134       4           0         2       3          5           0      16         0     0          0         0
# DEBILT          5         6         3      58           3         4       0          0           0       3         0     0          0         0
# DUSSELDORF      4         3         3       4           9         5       0          0           0       1         0     0          0         0 
# HEATHROW        6         3         2       2           0        50       1          0           0      18         0     0          0         0
# KASSEL          0         2         1       0           0         1       5          2           0       0         0     0          0         0  
# LJUBLJANA       5         1         2       1           0         3       0         29           0      19         0     1          0         0 
# MAASTRICHT      5         0         0       0           1         1       1          0           1       0         0     0          0         0  
# MADRID         27         8         7       2           2         3       1          7           0     401         0     0          0         0   
# MUNCHENB        1         2         0       0           0         0       0          1           0       2         1     0          0         1  
# OSLO            3         0         1       0           0         0       0          0           0       1         0     0          0         0 
# STOCKHOLM       2         0         0       1           0         0       0          0           0       0         0     0          1         0 
# VALENTIA        0         0         0       0           0         0       0          0           0       0         0     0          0         1

### Part III
4. Iteration  
[Go back to the Table of Contents](#Table-of-Contents)

#### 1. In this same document, write out how you might break the data down into smaller components to test and iterate upon. Which model would you use for each iteration?

In Exercise 2.2, we learned that different climate factors carry different degrees of importance in different cities.  
When we analyzed data from stations across Europe together, the accuracy was lower than when we analyzed a single station.  
Weather prediction for all of Europe should therefore be broken down into smaller components whose results are then combined.  
For example, in Exercise 2.2, we used data from cities with similar climates to fill in missing or anomalous values.

Hence, I expect that K-means clustering (or KNN, to assign a station to its most similar neighbors) could divide the European cities into a few clusters.  
By modeling the climate of each cluster separately, I think we can make more accurate predictions for the whole of Europe.
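
A minimal sketch of the clustering idea, assuming each station is summarized by a few climate statistics. The station features below are synthetic placeholders, not values from the ClimateWins dataset, and the choice of three clusters is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(23)

# Hypothetical per-station summaries: [mean temperature, mean precipitation].
# Three made-up climate groups of five stations each.
warm_dry = rng.normal(loc=[18.0, 0.1], scale=[0.5, 0.02], size=(5, 2))
cool_wet = rng.normal(loc=[8.0, 0.4], scale=[0.5, 0.02], size=(5, 2))
cold_alpine = rng.normal(loc=[-5.0, 0.5], scale=[0.5, 0.02], size=(5, 2))
station_features = np.vstack([warm_dry, cool_wet, cold_alpine])

# Group the 15 stations into 3 clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=23)
cluster_ids = kmeans.fit_predict(station_features)
print(cluster_ids)
```

With real data the features would normally be standardized first, since K-means is sensitive to scale. A separate model could then be trained and tuned per cluster.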


#### 2. Expand on your observations from the random forest and deep learning models.                                                           
Random forests and deep learning models serve different purposes. The main thing a random forest can tell us is feature importance: which cities or which climate factors have more influence on pleasant weather. Deep learning models such as CNNs are better suited to learning patterns from the data and making predictions.
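
To illustrate the importance point, here is a hedged sketch on synthetic data. The feature names are borrowed from `obtype`, but the values and the "pleasant" rule are invented for the example and do not come from the ClimateWins dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(23)
feature_names = ['temp_max', 'precipitation', 'noise']

# Synthetic observations: two informative features and one irrelevant one.
n = 500
temp_max = rng.uniform(0, 35, n)
precipitation = rng.uniform(0, 2, n)
noise = rng.normal(size=n)
X_toy = np.column_stack([temp_max, precipitation, noise])

# Made-up rule: a day is "pleasant" if it is warm and nearly dry.
y_toy = ((temp_max > 18) & (precipitation < 0.5)).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=23)
forest.fit(X_toy, y_toy)

# feature_importances_ sums to 1; larger values mean more influence on the splits.
for name, importance in zip(feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.2f}")
```

The irrelevant `noise` column should receive the smallest importance, which is the kind of signal a random forest can provide about cities or climate factors.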

#### 3. What variables would you recommend that Air Ambulance pay the most attention to while deciding whether it’s safe to fly? 
Air Ambulance, which flies helicopters, should pay attention first to the pleasant-day label, the climate outcome that ClimateWins is primarily interested in.  
A pleasant day is the most broadly applicable criterion for outdoor activities.  
In addition, I would recommend that Air Ambulance pay particular attention to wind speed, cloud cover, and precipitation. However, since all of these factors combine to determine a pleasant day, I recommend that Air Ambulance continue to treat the pleasant-day prediction as a priority as well.
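
Since the brief asks for safe-to-fly days predicted "with high precision", precision is a natural metric to track: of the days the model calls safe, how many really were safe. A minimal sketch with made-up labels follows; the `precision` helper is written out by hand for illustration.

```python
def precision(y_true, y_pred, positive=1):
    """Share of predicted-positive days that were truly positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    predicted_pos = true_pos + false_pos
    return true_pos / predicted_pos if predicted_pos else 0.0

# Hypothetical labels: 1 = safe to fly, 0 = not safe.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]

print(round(precision(y_true, y_pred), 2))  # 0.67
```

A false positive here is a day flagged as safe that was not, which is likely the costliest error for a helicopter service, so a high-precision model may be preferable even at some cost in recall.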