# 2.2 ClimateWins: Keras Convolutional Neural Network

## Contents:

### 1. Importing Libraries and Data
- Libraries: Keras plus the supporting libraries used in the 2.2 scripts
- Data: unscaled weather observations (features, X) and pleasant-weather predictions (labels, y)

### 2. Data Wrangling

### 3. Reshaping for ML modeling

### 4. Split data

### 5. Create Keras Model
- Start from the CNN or RNN script used for the HAR (human activity recognition) data

### 6. Compile and Run the model

### 7. Create confusion matrix of results
    

### 1. Importing Libraries and Data

In [1]:
# Importing libraries for data handling, plotting, and Keras CNNs

import os
import operator

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from numpy import unique
from numpy import reshape
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Conv1D, Conv2D, Dense, BatchNormalization, Flatten, MaxPooling1D


In [2]:
# Setting Path 

path = r'/Users/jeremyobach/Documents/Data Analytics/CareerFoundry/Specialization - Machine Learning/Real World Applications of Machine Learning/ML Achievement 2 MASTER FOLDER'
path

'/Users/jeremyobach/Documents/Data Analytics/CareerFoundry/Specialization - Machine Learning/Real World Applications of Machine Learning/ML Achievement 2 MASTER FOLDER'

In [3]:
# Show all columns when displaying DataFrames (no truncation)
pd.options.display.max_columns = None

In [4]:
# Importing prediction data

pleasant = pd.read_csv(os.path.join(path, '02 Data', 'pleasant_wx_predictions.csv'), index_col=False)

In [5]:
# Importing unscaled observation data

unscaled = pd.read_csv(os.path.join(path, '02 Data', 'wx_unscaled.csv'), index_col=False)

In [6]:
unscaled.shape

(22950, 170)

In [7]:
pleasant.shape

(22950, 16)

### 2. Data Wrangling

- Ensure the weather (wx) data are structured with the correct shape to feed the deep learning model:
    - Drop DATE and MONTH from the observations, and DATE from the predictions.
    - Drop the 3 weather stations (Gdansk, Roma, Tours) that are not included in pleasant_wx.
    - Two observation types (columns) are missing multiple years of data for most weather stations; remove them.
    - Fill in the 3 station/feature columns that are still missing, using values from another station.
    - Export the data as a "cleaned" version; X should have shape (22950, 135) and y should have shape (22950, 15).
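The reshape in section 3 only produces meaningful per-station groups if the 135 cleaned columns form 15 contiguous blocks of 9, each with the same feature order. The sketch below checks that before reshaping. `check_station_grid` is a hypothetical helper, and it assumes every column is named `STATION_feature` with the station before the first underscore (as in `BASEL_temp_max`).

```python
def check_station_grid(columns, n_features=9):
    """Verify that `columns` form contiguous STATION_feature blocks with one shared layout.

    Assumes names follow the STATION_feature pattern (e.g. 'BASEL_temp_max'):
    the station is the text before the first underscore. Returns the station
    names in column order, or raises ValueError at the first problem found.
    """
    cols = list(columns)
    if len(cols) % n_features:
        raise ValueError(f"{len(cols)} columns is not a multiple of {n_features}")

    pairs = [(c.partition('_')[0], c.partition('_')[2]) for c in cols]
    expected = [feature for _, feature in pairs[:n_features]]  # layout of the first station

    stations = []
    for start in range(0, len(pairs), n_features):
        block = pairs[start:start + n_features]
        names = {station for station, _ in block}
        if len(names) != 1:
            raise ValueError(f"columns {start}-{start + n_features - 1} mix stations: {sorted(names)}")
        features = [feature for _, feature in block]
        if features != expected:
            raise ValueError(f"{block[0][0]} features {features} differ from {expected}")
        stations.append(block[0][0])
    return stations
```

Run after the wrangling steps, `check_station_grid(unscaled.columns)` should return the 15 station names if the layout targets above are met; a `ValueError` points at the first block that would be reshaped incorrectly.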

In [8]:
# Dropping all columns for Gdansk, Roma, and Tours from unscaled, as these stations aren't included in pleasant_wx

columns_to_drop = ['GDANSK_cloud_cover', 'GDANSK_humidity', 'GDANSK_precipitation', 
                   'GDANSK_snow_depth', 'GDANSK_temp_mean', 'GDANSK_temp_min', 
                   'GDANSK_temp_max', 'ROMA_cloud_cover', 'ROMA_wind_speed', 
                   'ROMA_humidity', 'ROMA_pressure', 'ROMA_sunshine', 'ROMA_temp_mean',
                   'TOURS_wind_speed', 'TOURS_humidity', 'TOURS_pressure',
                   'TOURS_global_radiation', 'TOURS_precipitation', 'TOURS_temp_mean', 
                   'TOURS_temp_min', 'TOURS_temp_max']

# Using the drop() method to drop the specified columns
unscaled.drop(columns=columns_to_drop, inplace=True)

In [9]:
unscaled.shape 

(22950, 149)
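Listing the 21 station columns by hand works, but a forgotten name would silently survive the drop. A sketch that builds the list from station prefixes instead, assuming every column starts with `STATION_`; `columns_for_stations` is a hypothetical helper and the toy columns stand in for `unscaled`.

```python
import pandas as pd

def columns_for_stations(columns, stations):
    """Return every column whose name starts with one of the given STATION_ prefixes."""
    prefixes = tuple(f"{s}_" for s in stations)
    return [c for c in columns if c.startswith(prefixes)]

# Toy frame with illustrative column names.
df = pd.DataFrame(columns=['BASEL_humidity', 'GDANSK_humidity', 'ROMA_sunshine',
                           'TOURS_temp_max', 'OSLO_humidity'])
to_drop = columns_for_stations(df.columns, ['GDANSK', 'ROMA', 'TOURS'])
df = df.drop(columns=to_drop)
print(list(df.columns))  # ['BASEL_humidity', 'OSLO_humidity']
```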

In [10]:
unscaled.drop(columns=['DATE', 'MONTH'], inplace=True)

In [11]:
pleasant.drop(columns='DATE', inplace=True)

In [12]:
pleasant.shape

# prediction dataset is correct shape

(22950, 15)

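Two types of observations are missing multiple years of data for most weather stations, so remove them:

- wind speed (recorded at only 11 stations; 9 columns remain after dropping Roma and Tours)
- snow depth (recorded at only 7 stations; 6 columns remain after dropping Gdansk)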
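Because both observation types share a column suffix across stations, the drop list can also be selected with a regular expression rather than typed out. A sketch using pandas `DataFrame.filter(regex=...)`; the toy frame and its values are illustrative, not the notebook's data.

```python
import pandas as pd

# Toy stand-in for `unscaled`; column names and values are illustrative.
df = pd.DataFrame([[1, 2, 3, 4, 5]],
                  columns=['BASEL_humidity', 'BASEL_snow_depth', 'OSLO_wind_speed',
                           'OSLO_humidity', 'OSLO_snow_depth'])

# Select every column ending in _wind_speed or _snow_depth, whatever the station.
further_drops = df.filter(regex=r'_(wind_speed|snow_depth)$').columns.tolist()
df = df.drop(columns=further_drops)
print(further_drops)        # ['BASEL_snow_depth', 'OSLO_wind_speed', 'OSLO_snow_depth']
print(df.columns.tolist())  # ['BASEL_humidity', 'OSLO_humidity']
```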

In [13]:
further_drops = ['BASEL_snow_depth', 'DUSSELDORF_snow_depth', 'HEATHROW_snow_depth',
                 'MUNCHENB_snow_depth', 'OSLO_snow_depth', 'VALENTIA_snow_depth',
                 'BASEL_wind_speed', 'DEBILT_wind_speed', 'DUSSELDORF_wind_speed',
                 'KASSEL_wind_speed', 'LJUBLJANA_wind_speed', 'MAASTRICHT_wind_speed',
                 'MADRID_wind_speed', 'OSLO_wind_speed', 'SONNBLICK_wind_speed']

unscaled.drop(columns=further_drops, inplace=True)

In [14]:
# Inserting "dummy values" for the 3 station/feature columns that are still missing:
#   KASSEL_cloud_cover  <- copy of LJUBLJANA_cloud_cover
#   MUNCHENB_pressure   <- copy of SONNBLICK_pressure
#   STOCKHOLM_humidity  <- copy of OSLO_humidity
# First, locate the neighboring columns so each new column lands inside its station's block.

unscaled.columns.get_loc('HEATHROW_temp_max')

53

In [15]:
unscaled.columns.get_loc('MUNCHENB_humidity')

90

In [16]:
unscaled.columns.get_loc('STOCKHOLM_cloud_cover')

115

In [17]:
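# Each position is the neighbor's get_loc result + 1, plus 1 for every column already inserted before it
unscaled.insert(54, 'KASSEL_cloud_cover', unscaled['LJUBLJANA_cloud_cover'])  # after HEATHROW_temp_max (53)
unscaled.insert(92, 'MUNCHENB_pressure', unscaled['SONNBLICK_pressure'])      # after MUNCHENB_humidity (90, now 91)
unscaled.insert(118, 'STOCKHOLM_humidity', unscaled['OSLO_humidity'])         # after STOCKHOLM_cloud_cover (115, now 117)
unscaled.columns.tolist()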

['BASEL_cloud_cover',
 'BASEL_humidity',
 'BASEL_pressure',
 'BASEL_global_radiation',
 'BASEL_precipitation',
 'BASEL_sunshine',
 'BASEL_temp_mean',
 'BASEL_temp_min',
 'BASEL_temp_max',
 'BELGRADE_cloud_cover',
 'BELGRADE_humidity',
 'BELGRADE_pressure',
 'BELGRADE_global_radiation',
 'BELGRADE_precipitation',
 'BELGRADE_sunshine',
 'BELGRADE_temp_mean',
 'BELGRADE_temp_min',
 'BELGRADE_temp_max',
 'BUDAPEST_cloud_cover',
 'BUDAPEST_humidity',
 'BUDAPEST_pressure',
 'BUDAPEST_global_radiation',
 'BUDAPEST_precipitation',
 'BUDAPEST_sunshine',
 'BUDAPEST_temp_mean',
 'BUDAPEST_temp_min',
 'BUDAPEST_temp_max',
 'DEBILT_cloud_cover',
 'DEBILT_humidity',
 'DEBILT_pressure',
 'DEBILT_global_radiation',
 'DEBILT_precipitation',
 'DEBILT_sunshine',
 'DEBILT_temp_mean',
 'DEBILT_temp_min',
 'DEBILT_temp_max',
 'DUSSELDORF_cloud_cover',
 'DUSSELDORF_humidity',
 'DUSSELDORF_pressure',
 'DUSSELDORF_global_radiation',
 'DUSSELDORF_precipitation',
 'DUSSELDORF_sunshine',
 'DUSSELDORF_temp_mean',


In [18]:
unscaled.shape

(22950, 135)
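The insert positions 54, 92, and 118 in In [17] were worked out by hand, so they break if an earlier insert or drop changes. A sketch that looks up the position at call time instead; `insert_after` is a hypothetical helper, and the toy frame's values are made up.

```python
import pandas as pd

def insert_after(df: pd.DataFrame, anchor: str, new_col: str, source_col: str) -> None:
    """Insert a copy of df[source_col], named new_col, directly after column `anchor` (in place)."""
    df.insert(df.columns.get_loc(anchor) + 1, new_col, df[source_col])

# Toy frame with hypothetical values, shaped like a slice of `unscaled`.
df = pd.DataFrame({'HEATHROW_temp_max': [10.9], 'KASSEL_humidity': [0.81],
                   'LJUBLJANA_cloud_cover': [6.0]})
insert_after(df, 'HEATHROW_temp_max', 'KASSEL_cloud_cover', 'LJUBLJANA_cloud_cover')
print(df.columns.tolist())
# ['HEATHROW_temp_max', 'KASSEL_cloud_cover', 'KASSEL_humidity', 'LJUBLJANA_cloud_cover']

# On the notebook's data, the three inserts would become:
# insert_after(unscaled, 'HEATHROW_temp_max', 'KASSEL_cloud_cover', 'LJUBLJANA_cloud_cover')
# insert_after(unscaled, 'MUNCHENB_humidity', 'MUNCHENB_pressure', 'SONNBLICK_pressure')
# insert_after(unscaled, 'STOCKHOLM_cloud_cover', 'STOCKHOLM_humidity', 'OSLO_humidity')
```

Since each call recomputes `get_loc`, the shifts caused by earlier inserts are handled automatically.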

In [19]:
unscaled.to_pickle(os.path.join(path, '02 Data', 'X_cleaned.pkl'))

### 3. Reshaping for ML modeling
- Ensure the data can be fed to the deep learning model correctly.
- You'll need to split the observations (X) into 15 groups (one per weather station) of 9 observation types each. The labels (y) are already in 15 groups, one per station, so they don't need to be transformed or reshaped.
- The final shapes should be X = (22950, 15, 9) and y = (22950, 15).
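`X.reshape(-1, 15, 9)` fills the new axes in row-major (C) order, so each group of 9 consecutive columns becomes one station. A small NumPy check with a fake row whose values equal their own column index:

```python
import numpy as np

n_stations, n_features = 15, 9
# One fake row whose value at column c is c, so we can see where each column lands.
row = np.arange(n_stations * n_features, dtype=float).reshape(1, -1)  # shape (1, 135)
grid = row.reshape(-1, n_stations, n_features)                        # shape (1, 15, 9)

# Station s receives columns s*9 through s*9 + 8.
print(grid[0, 1])  # [ 9. 10. 11. 12. 13. 14. 15. 16. 17.]
```

This is why the station blocks must be contiguous and share one feature order before reshaping: the reshape itself never looks at column names.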


In [20]:
X = pd.read_pickle(os.path.join(path, '02 Data', 'X_cleaned.pkl'))

In [21]:
y = pleasant

In [22]:
X.shape

(22950, 135)

In [23]:
# Converting X and y from DataFrames to NumPy arrays

X = np.array(X)
y = np.array(y)

In [24]:
X = X.reshape(-1,15,9)

In [25]:
# Inspecting the reshaped array (15 stations x 9 observations per row)

X

array([[[  7.    ,   0.85  ,   1.018 , ...,   6.5   ,   0.8   ,
          10.9   ],
        [  1.    ,   0.81  ,   1.0195, ...,   3.7   ,  -0.9   ,
           7.9   ],
        [  4.    ,   0.67  ,   1.017 , ...,   2.4   ,  -0.4   ,
           5.1   ],
        ...,
        [  4.    ,   0.73  ,   1.0304, ...,  -5.9   ,  -8.5   ,
          -3.2   ],
        [  5.    ,   0.98  ,   1.0114, ...,   4.2   ,   2.2   ,
           4.9   ],
        [  5.    ,   0.88  ,   1.0003, ...,   8.5   ,   6.    ,
          10.9   ]],

       [[  6.    ,   0.84  ,   1.018 , ...,   6.1   ,   3.3   ,
          10.1   ],
        [  6.    ,   0.84  ,   1.0172, ...,   2.9   ,   2.2   ,
           4.4   ],
        [  4.    ,   0.67  ,   1.017 , ...,   2.3   ,   1.4   ,
           3.1   ],
        ...,
        [  6.    ,   0.97  ,   1.0292, ...,  -9.5   , -10.5   ,
          -8.5   ],
        [  5.    ,   0.62  ,   1.0114, ...,   4.    ,   3.    ,
           5.    ],
        [  7.    ,   0.91  ,   1.0007, ...,   8.

### 4. Splitting data (training and test sets)

In [26]:
# Split data into training and test sets (default: 75% train, 25% test)
X_train, X_test, y_train, y_test = train_test_split(X,y,random_state = 39)
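With no `test_size` given, `train_test_split` holds out 25% of the rows (rounded up). The sketch below uses zero-filled arrays with the notebook's shapes to show the resulting sizes; they line up with the 861 steps per epoch in the training log (batch size 20) and the 180 steps in the prediction log (Keras's default prediction batch size of 32).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Zero-filled stand-ins with the notebook's shapes; only the sizes matter here.
X = np.zeros((22950, 15, 9), dtype=np.float32)
y = np.zeros((22950, 15), dtype=np.float32)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=39)

print(X_train.shape, X_test.shape)              # (17212, 15, 9) (5738, 15, 9)
train_steps = int(np.ceil(len(X_train) / 20))   # batch_size = 20 in model.fit
predict_steps = int(np.ceil(len(X_test) / 32))  # default batch size in model.predict
print(train_steps, predict_steps)               # 861 180
```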

### 5. Keras Model

In [159]:
epochs = 32
batch_size = 20
n_hidden = 32

timesteps = len(X_train[0])
input_dim = len(X_train[0][0])
n_classes = len(y_train[0])

model = Sequential()
model.add(Conv1D(n_hidden, kernel_size=3, activation='relu', input_shape=(timesteps, input_dim)))
model.add(Dense(16, activation='relu'))
model.add(MaxPooling1D())
model.add(Flatten())
model.add(Dense(n_classes, activation='sigmoid'))  # sigmoid: an independent 0-1 score per station (other options noted: tanh, softmax)
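The `model.summary()` output isn't shown, so here is the tensor shape traced by hand. The sketch assumes the Keras defaults for these layers: Conv1D with `padding='valid'` and stride 1, MaxPooling1D with `pool_size=2` and strides equal to the pool size. `Dense(16)` applied to a 3-D tensor acts on the last axis only, at each of the 13 positions. `conv1d_len` and `maxpool1d_len` are illustrative helpers.

```python
def conv1d_len(n, kernel_size, stride=1):
    """Output length of an unpadded ('valid') 1-D convolution."""
    return (n - kernel_size) // stride + 1

def maxpool1d_len(n, pool_size=2):
    """Output length of MaxPooling1D with Keras defaults (strides = pool_size, 'valid')."""
    return (n - pool_size) // pool_size + 1

timesteps, input_dim, n_hidden, n_classes = 15, 9, 32, 15

t = conv1d_len(timesteps, kernel_size=3)             # Conv1D: (13, 32)
conv_params = 3 * input_dim * n_hidden + n_hidden    # kernel weights + biases
dense16_params = n_hidden * 16 + 16                  # Dense(16) on the last axis: (13, 16)
t = maxpool1d_len(t)                                 # MaxPooling1D: (6, 16)
flat = t * 16                                        # Flatten: (96,)
out_params = flat * n_classes + n_classes            # Dense(15): (15,)

print(t, flat, conv_params + dense16_params + out_params)  # 6 96 2879
```

If the layer defaults are unchanged, these shapes and the 2,879 total parameters should match what `model.summary()` reports.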

In [160]:
model.summary()

### 6. Compile and Run the Model

In [161]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
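Each row of y holds 15 separate pleasant/not-pleasant flags, so a day can have several 1s or none. `categorical_crossentropy` treats a row as one class distribution, while binary cross-entropy scores each of the 15 outputs as its own yes/no problem, which is the usual pairing with a sigmoid output layer. A NumPy sketch of what binary cross-entropy computes (`binary_crossentropy` here is our own function, not the Keras one):

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all labels; each column is an independent yes/no problem."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return losses.mean()

# One toy day where 2 of 3 stations are pleasant: a multi-hot row, not one-hot.
y_true = np.array([[1.0, 0.0, 1.0]])
confident = np.array([[0.9, 0.1, 0.9]])
uninformed = np.array([[0.5, 0.5, 0.5]])
print(binary_crossentropy(y_true, confident) < binary_crossentropy(y_true, uninformed))  # True

# The corresponding Keras setting (not run here):
# model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy'])
```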

In [162]:
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=2)

Epoch 1/32
861/861 - 1s - 776us/step - accuracy: 0.1250 - loss: 2781.9458
Epoch 2/32
861/861 - 0s - 416us/step - accuracy: 0.1333 - loss: 24143.6504
Epoch 3/32
861/861 - 0s - 412us/step - accuracy: 0.1384 - loss: 83097.7031
Epoch 4/32
861/861 - 0s - 411us/step - accuracy: 0.1353 - loss: 183371.7188
Epoch 5/32
861/861 - 0s - 412us/step - accuracy: 0.1396 - loss: 340290.0000
Epoch 6/32
861/861 - 0s - 414us/step - accuracy: 0.1346 - loss: 538112.1875
Epoch 7/32
861/861 - 0s - 409us/step - accuracy: 0.1330 - loss: 796954.4375
Epoch 8/32
861/861 - 0s - 414us/step - accuracy: 0.1332 - loss: 1093977.0000
Epoch 9/32
861/861 - 0s - 413us/step - accuracy: 0.1336 - loss: 1486309.2500
Epoch 10/32
861/861 - 0s - 411us/step - accuracy: 0.1276 - loss: 1891779.1250
Epoch 11/32
861/861 - 0s - 415us/step - accuracy: 0.1271 - loss: 2399655.2500
Epoch 12/32
861/861 - 0s - 412us/step - accuracy: 0.1261 - loss: 2957358.7500
Epoch 13/32
861/861 - 0s - 415us/step - accuracy: 0.1272 - loss: 3587205.2500
Epoch 

<keras.src.callbacks.history.History at 0x3203e1890>
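In the log above, the loss climbs from about 2,800 to about 3.6 million over the first 13 epochs while accuracy stays near 13%, so the model is not learning. The notebook doesn't diagnose the cause. One standard step worth ruling out is input scaling: X comes from the unscaled file, where pressure sits near 1.0, cloud cover is in whole numbers, and temperatures run from below -10 to above 10. A sketch of z-scoring each (station, feature) slot with training-set statistics only; `standardize` is a hypothetical helper, and the toy data are random.

```python
import numpy as np

def standardize(X_train, X_test, eps=1e-8):
    """Z-score each (station, feature) slot using mean/std computed on the training set only."""
    mean = X_train.mean(axis=0, keepdims=True)  # shape (1, stations, features)
    std = X_train.std(axis=0, keepdims=True)
    return (X_train - mean) / (std + eps), (X_test - mean) / (std + eps)

rng = np.random.default_rng(0)
# Toy data: a pressure-like feature near 1.0 next to a temperature-like feature near 10.
X_tr = rng.normal(loc=[1.0, 10.0], scale=[0.01, 5.0], size=(100, 15, 2))
X_te = rng.normal(loc=[1.0, 10.0], scale=[0.01, 5.0], size=(20, 15, 2))
Xs_tr, Xs_te = standardize(X_tr, X_te)
```

Fitting the statistics on `X_train` alone keeps information from the test set out of training.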

### 7. Confusion Matrix of Results

In [163]:
# Map each label column index to its weather station (dict name carried over from the HAR template)
activities = {
    0: 'BASEL',
    1: 'BELGRADE',
    2: 'BUDAPEST',
    3: 'DEBILT',
    4: 'DUSSELDORF',
    5: 'HEATHROW',
    6: 'KASSEL',
    7: 'LJUBLJANA',
    8: 'MAASTRICHT',
    9: 'MADRID',
    10: 'MUNCHENB',
    11: 'OSLO',
    12: 'SONNBLICK',
    13: 'STOCKHOLM',
    14: 'VALENTIA',
}
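The hand-typed mapping has to match the column order of `pleasant` exactly. A sketch that derives it from the label columns instead, assuming each column name starts with the station prefix; the column names below are hypothetical, since the notebook never prints `pleasant.columns`.

```python
import pandas as pd

# Hypothetical label column names; assumes each starts with STATION_.
pleasant_cols = pd.Index(['BASEL_pleasant_weather', 'BELGRADE_pleasant_weather',
                          'BUDAPEST_pleasant_weather'])
stations = {i: name.split('_')[0] for i, name in enumerate(pleasant_cols)}
print(stations)  # {0: 'BASEL', 1: 'BELGRADE', 2: 'BUDAPEST'}
```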

In [164]:
def confusion_matrix(Y_true, Y_pred):
    Y_true = pd.Series([activities[y] for y in np.argmax(Y_true, axis=1)])
    Y_pred = pd.Series([activities[y] for y in np.argmax(Y_pred, axis=1)])

    return pd.crosstab(Y_true, Y_pred, rownames=['True'], colnames=['Pred'])
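This function decodes each row with `np.argmax`, which suits the HAR template's one-activity-per-row labels. Here a row can be all zeros or contain several 1s, and `np.argmax` returns the index of the first maximum, so an all-zero row decodes to BASEL (index 0) and a multi-pleasant row decodes to its leftmost pleasant station. This likely explains why BASEL dominates the True axis below. A toy demonstration:

```python
import numpy as np

stations = ['BASEL', 'BELGRADE', 'BUDAPEST']  # first three label columns
y_rows = np.array([
    [0, 0, 0],  # no station pleasant
    [1, 1, 0],  # BASEL and BELGRADE pleasant
    [0, 1, 1],  # BELGRADE and BUDAPEST pleasant
])
# np.argmax picks the FIRST maximum, so ties resolve to the leftmost column.
decoded = [stations[i] for i in np.argmax(y_rows, axis=1)]
print(decoded)  # ['BASEL', 'BASEL', 'BELGRADE']
```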

In [165]:
# Evaluate
print(confusion_matrix(y_test, model.predict(X_test)))

180/180 ━━━━━━━━━━━━━━━━━━━━ 0s 400us/step
Pred        BASEL
True             
BASEL        3707
BELGRADE     1081
BUDAPEST      196
DEBILT         89
DUSSELDORF     33
HEATHROW      107
KASSEL         15
LJUBLJANA      69
MAASTRICHT     10
MADRID        409
MUNCHENB       10
OSLO            7
STOCKHOLM       1
VALENTIA        4
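A multi-label alternative gives each station its own 2x2 confusion matrix, using scikit-learn's `multilabel_confusion_matrix` on thresholded predictions. `per_station_confusion` is a hypothetical helper and the 0.5 threshold is an assumption; the toy data below are made up.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import multilabel_confusion_matrix

def per_station_confusion(y_true, y_prob, stations, threshold=0.5):
    """Per-station TN/FP/FN/TP counts after thresholding sigmoid outputs."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    # Shape (n_stations, 2, 2); each matrix is [[TN, FP], [FN, TP]].
    mcm = multilabel_confusion_matrix(np.asarray(y_true).astype(int), y_pred)
    return pd.DataFrame(mcm.reshape(len(stations), 4), index=stations,
                        columns=['TN', 'FP', 'FN', 'TP'])

# Toy example with two stations and four days.
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y_prob = np.array([[0.9, 0.2], [0.4, 0.7], [0.3, 0.8], [0.6, 0.1]])
table = per_station_confusion(y_true, y_prob, ['A', 'B'])
print(table)
```

On the notebook's data this would be called as `per_station_confusion(y_test, model.predict(X_test), list(activities.values()))`, which assumes `activities` lists the stations in the same order as the label columns.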
