The original live copy is available here:
[Google Colab link (with comments enabled)](https://colab.research.google.com/drive/1VcIU5kY3YHIIWShgLgywwjfkJa9XW6qZ)


Import the necessary packages.


1.   numpy provides arrays and numerical (matrix) operations
2.   pandas loads and manages large tabular datasets
3.   keras is a neural network library
4.   sklearn (scikit-learn) is a data science and machine learning package



In [0]:
from math import sqrt
from numpy import array
from matplotlib import pyplot
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from pandas import to_datetime
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Flatten
from keras.layers import Dropout
from keras.layers import TimeDistributed
from keras import callbacks
from keras import losses
from keras import activations
from keras.layers import Activation

import os

In [0]:
from keras.utils.vis_utils import plot_model

In [0]:
from google.colab import drive
drive.mount('/gdrive')

# with open('/gdrive/My Drive/foo.txt', 'w') as f:
#   f.write('Hello Google Drive!')
# !cat '/gdrive/My Drive/foo.txt'


Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).


#For following code 

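Functions to organize the data into input chunks and matching outputs. `getTrain` slices the data into overlapping chunks of 'n' consecutive rows, each chunk starting one row after the previous one. `getTest` returns the row that immediately follows each chunk (the n+1th row), which is the expected output for that chunk. Despite its name, `getTest` builds the outputs (targets), not a separate test set.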

In [0]:
def getTrain(data,chunk):
    train = []
    for dataVal in range(len(data) - chunk):
        train.append(data[dataVal:dataVal+chunk])
    return array(train)

def getTest(data,chunk):
    test = []
    for dataVal in range(chunk,len(data)):
        test.append(data[dataVal])
    return array(test)
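
To see what the two helpers produce, here is a small sketch on a toy array. The functions are copied from the cell above; the toy data and the names `toy`, `X` and `y` are made up for illustration only.

```python
from numpy import arange, array

def getTrain(data, chunk):
    train = []
    for dataVal in range(len(data) - chunk):
        train.append(data[dataVal:dataVal + chunk])
    return array(train)

def getTest(data, chunk):
    test = []
    for dataVal in range(chunk, len(data)):
        test.append(data[dataVal])
    return array(test)

# Toy data: 6 rows (time steps) with 2 features each
toy = arange(12).reshape(6, 2)

X = getTrain(toy, 3)   # overlapping chunks of 3 consecutive rows
y = getTest(toy, 3)    # the row that follows each chunk

print(X.shape, y.shape)   # (3, 3, 2) (3, 2)
print(X[0].tolist())      # rows 0..2: [[0, 1], [2, 3], [4, 5]]
print(y[0].tolist())      # row 3: [6, 7]
```

Chunk `X[i]` covers rows `i` to `i + 2`, and `y[i]` is row `i + 3`, so both arrays have `len(data) - chunk` entries.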

#For following code 

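Functions to import the data. The data is read from 'Basel_Data.csv' and stored in a variable named 'df', short for data frame. The separate Year, Month, Day, Hour and Minute columns are combined into a single DateTime column, the columns are renamed, and missing values are filled with 0.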


In [0]:
def getDateFromData(rowVal):
    strVal = str(int(rowVal['Year'])) + "-"  + str(int(rowVal['Month']))  +"-"  + str(int(rowVal['Day'])) + " "  + str(int(rowVal['Hour'])) + ":" + str(int(rowVal['Minute'])) + ":00"
    return to_datetime(strVal)

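def convertToDateTime(file):
  todelete = ['Year','Month','Day','Hour','Minute']
  # Combine the separate date-part columns into a single DateTime column
  file['DateTime'] = file.apply(getDateFromData, axis=1)
  # Move DateTime (appended as the last column) to the front
  cols = file.columns.tolist()
  cols = cols[-1:] + cols[:-1]
  file = file[cols]
  # Drop the date-part columns now that DateTime replaces them
  return file.drop(todelete, axis=1)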

def importFile():
  return read_csv("/gdrive/My Drive/MLProj/Basel_Data.csv")

def fixColumns(df):
  df.columns = [
    'DateTime',
    'temperature_2m',
    'relative_humidity_2m',
    'mean_sea_level_pressure',
    'total_precipitation_highres_sfc',
    'total_precipitation_lowres_sfc',
    'snowfall_highres_sfc',
    'snowfall_lowres_sfc',
    'total_cloud_cover_sfc',
    'high_cloud_cover_sfc',
    'medium_cloud_cover_sfc',
    'low_cloud_cover_sfc',
    'sunshine_duration_sfc',
    'shortwave_duration_sfc',
    'wind_speed_10m',
    'wind_direction_10m',
    'wind_speed_80m',
    'wind_direction_80m',
    'wind_speed_900mb',
    'wind_direction_900mb',
    'wind_gust_sfc'
  ]
  return df

df = importFile()
df = convertToDateTime(df)
df = fixColumns(df)
df.fillna(0,inplace=True)
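
Building the timestamp one row at a time with `apply` can be slow on a few hundred thousand rows. As a possible alternative (a sketch only, not what this notebook runs), pandas can assemble the timestamps from the date-part columns in one call. The function name `convert_to_datetime_vectorized` and the toy frame are made up for illustration, and the sketch assumes the date-part columns contain no missing values.

```python
import pandas as pd

def convert_to_datetime_vectorized(frame):
    """Return a copy with a leading DateTime column replacing the date parts."""
    parts = ['Year', 'Month', 'Day', 'Hour', 'Minute']
    # to_datetime can assemble timestamps from a frame whose columns are
    # named year, month, day, hour, minute, so lowercase the names first.
    stamps = pd.to_datetime(frame[parts].astype(int).rename(columns=str.lower))
    out = frame.drop(columns=parts)
    out.insert(0, 'DateTime', stamps)
    return out

# Hypothetical two-row sample in the same layout as the CSV date columns
toy = pd.DataFrame({
    'Year': [1985, 1985], 'Month': [1, 1], 'Day': [1, 1],
    'Hour': [0, 1], 'Minute': [0, 0], 'Temperature': [0.99, 1.0],
})
converted = convert_to_datetime_vectorized(toy)
print(converted)
```

Unlike `convertToDateTime`, this version does not modify the frame passed in.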


#For following code 

Check that the data was imported correctly and without issues.

Line 2 removes the DateTime column from the input data, since the timestamp itself is not used as a training feature.


In [0]:
dataset = df
values = dataset.values[:,1:]
print(values)

[[0.99 96 1006.5 ... 20.46 265.42 11.9]
 [1.0 97 1006.0 ... 20.18 264.72 11.9]
 [1.05 97 1005.3 ... 18.86 263.89 11.9]
 ...
 [11.16 66 1018.4 ... 11.5 89.5 5.8]
 [10.72 66 1018.6 ... 12.71 92.7 4.4]
 [10.33 68 1018.8 ... 13.99 96.57 5.2]]


#For following code 

`chunk_size` is the number of consecutive hourly rows used as input to obtain 1 row of output. It is currently set to 96 hours, i.e. 4 days.

The data is split into training and validation sets: the first 20 years of hourly rows (`n_train_hours`) are used for training and the remaining rows for validation.

The `print` call checks the data format and its dimensions.


The input is an array of chunks of 'n' rows, each chunk starting one row after the previous one. That way, each chunk of 'n' rows is used to predict the n+1th row. The n+1th row is stored in the output value 'y', and only its column 0 (the temperature) is kept as the target.

To explore: implement inputs based on hourly + daily + weekly + fortnightly + monthly + quarterly + yearly data.

In [0]:
chunk_size = 96
# Make chunks in parts. 
n_train_hours = 365 * 24 * 20


train = values[:n_train_hours, :]
test = values[n_train_hours:, :]

(train_X,train_y,test_X,test_y) = (getTrain(train,chunk_size) , getTest(train,chunk_size) , getTrain(test,chunk_size) , getTest(test,chunk_size))
test_y = test_y[:,0]
train_y = train_y[:,0] 
print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)
# train_X = train_X[:,:,[0,1,2,4,11,19]]
# test_X = test_X[:,:,[0,1,2,4,11,19]]

(175104, 96, 20) (175104,) (124896, 96, 20) (124896,)
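
The printed shapes can be cross-checked with a little arithmetic. This is a sketch using only numbers that appear in the cell and its output above; the variable names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24

chunk_size = 96
n_train_hours = HOURS_PER_YEAR * 20

# Each split loses chunk_size rows, because the first chunk_size rows
# have no full chunk before them to predict from.
train_windows = n_train_hours - chunk_size
print(train_windows)                    # 175104, matches train_X.shape[0]

# Work backwards from the printed test shape to the number of raw rows
test_windows = 124896
test_rows = test_windows + chunk_size
total_rows = n_train_hours + test_rows
print(test_rows, total_rows)            # 124992 300192

# Share of the data used for training
print(round(n_train_hours / total_rows, 3))
```

The last dimension, 20, is the number of columns left after the DateTime column is removed.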


#For following code 

The following code saves the model weights at the end of every epoch, so that if training is interrupted, it can resume from the last saved weights.

In [0]:
checkpoint_path = "/gdrive/My Drive/MLProj/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create checkpoint callback
cp_callback = callbacks.ModelCheckpoint(checkpoint_path,
                                        save_weights_only=True,
                                        verbose=1)

#For following code 

[LSTM, or long short-term memory](https://en.wikipedia.org/wiki/Long_short-term_memory), based neural network. It is a form of recurrent neural network that takes a chunk of rows as input and produces 1 output (which fits my data model).


The first cell below defines the model: an LSTM layer with 400 units, a Dense layer with 20 units, and a Dense layer with 1 unit followed by a linear activation. It is compiled with mean squared error loss and the Adam optimizer.

The training cell further down fits the model on train_X, expecting output train_y. It runs for 25 epochs over all data in train_X with a batch size of 256, meaning 256 chunks are processed per weight update (each chunk is 96 rows x 20 columns of input => 1 output value).
It uses test_X and test_y as the validation dataset and evaluates the loss on them after each epoch.

`history.history['loss']` holds the training loss for each epoch, i.e. the error of the model's current weights on the training data. It should decrease as training proceeds, and it is plotted against the epoch number.


`history.history['val_loss']` holds the same for the validation loss.

The `pyplot` calls draw both loss curves against the epoch number.

Printing `input_shape` checks the shape of the data (just because I was worried I was mucking up the data).

In [0]:
input_shape=(train_X.shape[1],train_X.shape[2])
print(input_shape)
model = Sequential()
model.add(LSTM(400, input_shape=input_shape, use_bias=True))
# model.add(LSTM(20, use_bias=True, return_sequences=True ))
model.add(Dense(20))
model.add(Dense(1))
model.add(Activation(activations.linear))
model.compile(loss=losses.mse, optimizer='adam')
model.summary()

(96, 20)
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_2 (LSTM)                (None, 400)               673600    
_________________________________________________________________
dense_3 (Dense)              (None, 20)                8020      
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 21        
_________________________________________________________________
activation_2 (Activation)    (None, 1)                 0         
Total params: 681,641
Trainable params: 681,641
Non-trainable params: 0
_________________________________________________________________
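
The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard counts for an LSTM layer (four gates, each with input weights, recurrent weights and a bias) and for a Dense layer; the helper names `lstm_params` and `dense_params` are made up for illustration.

```python
def lstm_params(units, input_dim):
    # 4 gates, each with: input weights (input_dim x units),
    # recurrent weights (units x units) and one bias per unit
    return 4 * (units * (input_dim + units) + units)

def dense_params(inputs, outputs):
    # One weight per input/output pair plus one bias per output
    return inputs * outputs + outputs

lstm = lstm_params(400, 20)       # 20 input columns, 400 LSTM units
dense_a = dense_params(400, 20)   # Dense(20) on top of the LSTM output
dense_b = dense_params(20, 1)     # Dense(1) producing the prediction

print(lstm, dense_a, dense_b)     # 673600 8020 21
print(lstm + dense_a + dense_b)   # 681641
```

Almost all of the parameters sit in the LSTM layer, and the count does not depend on the chunk length of 96 because the same weights are reused at every time step.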


In [0]:
plot_model(model, to_file='/gdrive/My Drive/MLProj/model_plot.png', show_shapes=True, show_layer_names=True)

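The following code loads the weights saved by a previous run from Google Drive and uses them to predict the first 5 validation samples. It assumes a checkpoint already exists at `checkpoint_path`.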

In [0]:
model.load_weights(checkpoint_path)

print(model.predict(test_X[:5]))
print(test_y[:5])

[[0.00927737]
 [1.6255051 ]
 [0.51656115]
 [1.7935371 ]
 [1.380176  ]]
[1.09 1.24 1.36 1.51 1.67]


In [0]:
history = model.fit(train_X, train_y,
                    epochs=25, batch_size=256,
                    validation_data=(test_X, test_y),
                    verbose=1, shuffle=False,
                    callbacks=[cp_callback])
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

#For the following code
We use the trained model to predict the temperature for the first 10,000 validation samples and look at the predicted result.


In [0]:
yhat = model.predict(test_X[:10000])

In [0]:
# Calculate RMSE on the first 10,000 validation samples
rmse = sqrt(mean_squared_error(test_y[:10000], yhat))
print('Test RMSE: %.3f' % rmse)

Test RMSE: 1.699
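
For reference, RMSE is the square root of the mean squared difference between the predicted and the actual values, so it is expressed in the same units as the temperature column. A minimal sketch with plain numpy follows; the function name `rmse` and the toy numbers are made up for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Flatten both inputs so a (n, 1) prediction array lines up
    # with a (n,) array of targets.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values: the errors are 0, 0 and 2
toy_true = [1.0, 2.0, 3.0]
toy_pred = [[1.0], [2.0], [5.0]]   # column shape, like model.predict output

print('Toy RMSE: %.3f' % rmse(toy_true, toy_pred))   # Toy RMSE: 1.155
```

This should give the same number as `sqrt(mean_squared_error(...))` in the cell above when applied to `test_y[:10000]` and `yhat`.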
