# Project: Neural Networks
### by Samuel Sovi

In this project I attempt to use a given day's maximum temperature, amount of precipitation, and wind speed to predict whether its minimum temperature is relatively cold or hot, using a neural network.

**Additional note**: A large portion of the code is adapted from my Oct6 Project, and I used https://pytorch.org/tutorials/index.html as a reference for PyTorch-related code.

##### Imports:
I imported pandas for DataFrame access, numpy for calculations and arrays, tabulate for creating tables, and torch (with its `nn`, `DataLoader`, and related utilities) for building and training the neural network.

In [None]:
import pandas as pd
import numpy as np
from tabulate import tabulate
import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split
import torch.nn.functional as f

I once again load my data from a raw GitHub CSV link. The data itself came from Meteostat's API on RapidAPI last semester.

In [None]:
data_url = "https://raw.githubusercontent.com/samps7/CSPC323_Files/main/weather_cleaned.csv"

data_df = pd.read_csv(data_url)

In [None]:
print("original data")
print(data_df)

original data
           date  tavg  tmin  tmax   prcp   wdir  wspd    pres
0    2021-02-21  52.5  42.1  64.0  0.000  344.0   4.2  1030.0
1    2021-02-22  56.5  45.0  72.0  0.000  354.0   3.6  1025.0
2    2021-02-23  59.5  46.0  78.1  0.000  337.0   5.0  1020.0
3    2021-02-24  57.6  43.0  71.1  0.000  351.0   6.8  1021.7
4    2021-02-25  56.7  46.9  66.9  0.000  321.0   9.2  1023.7
..          ...   ...   ...   ...    ...    ...   ...     ...
360  2022-02-16  57.6  44.6  69.8  0.000  317.0   8.8  1017.4
361  2022-02-17  58.1  48.2  68.0  0.000  295.0   5.3  1024.8
362  2022-02-18  54.3  42.8  68.0  0.000  155.0   4.3  1024.3
363  2022-02-19  56.7  44.6  69.8  0.000  317.0   4.4  1019.8
364  2022-02-20  51.8  42.8  62.6  0.004  330.0   6.5  1015.4

[365 rows x 8 columns]


I start collecting my data from the dataset by obtaining the four columns that I am interested in: maximum temperature, precipitation, wind speed and minimum temperature (of each day).

In [None]:
# getting x and y values into arrays
X_vals = data_df[["tmax", "prcp", "wspd"]].copy().values
y_vals = data_df[["tmin"]].values

colder_threshold = np.median(y_vals) #used later for differentiating "warm" vs cold temperatures
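
As a small illustration of how a median threshold turns minimum temperatures into labels, the sketch below uses only the first five `tmin` values from the printed table (not the full dataset) and Python's `statistics.median` in place of `np.median`. The helper name `is_colder` is hypothetical. It follows the convention used later in this notebook, where a day counts as colder when its minimum temperature is at or below the threshold.

```python
from statistics import median

# First five tmin values from the printed table above (illustrative subset only).
tmin_sample = [42.1, 45.0, 46.0, 43.0, 46.9]

# The median splits the sample into a lower half and an upper half.
threshold = median(tmin_sample)

def is_colder(tmin, threshold):
    """Hypothetical helper: True when the day is at or below the threshold."""
    return tmin <= threshold

labels = [is_colder(t, threshold) for t in tmin_sample]
print(threshold)
print(labels)
```

Because the threshold is the median, roughly half of the days end up in each class, which keeps the two classes balanced.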

I reserve 20% of the days for testing and train on the remaining 80%, splitting the whole set at random.

In [None]:
batch_size = 50

tensor_data = TensorDataset(torch.tensor(X_vals, dtype=torch.float32), torch.tensor(y_vals, dtype=torch.float32))

row_count = X_vals.shape[0]

test_fraction = 0.2 # 20% used for testing

test_size = int(row_count * test_fraction) # size of testing set
train_size = row_count - test_size # size of training set

X_size = X_vals.shape[1]
y_size = y_vals.shape[1]

train_data, test_data = random_split(tensor_data, (train_size, test_size))

train_loader = DataLoader(train_data, batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size*2)

print("training size:", train_size,"testing size:", test_size)



training size: 292 testing size: 73
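
Since `random_split` draws a new random partition on every run, the numbers later in this notebook will vary between runs. One way to make the split repeatable is to pass a seeded `torch.Generator`. The sketch below shows this on a toy dataset that stands in for the weather tensors. The function name `split_with_seed` and the seed value are assumptions made for illustration, not part of my original code.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in for the weather tensors: 10 rows, 3 features, 1 target.
features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
targets = torch.arange(10, dtype=torch.float32).reshape(10, 1)
dataset = TensorDataset(features, targets)

def split_with_seed(ds, test_fraction, seed):
    """Hypothetical helper: split ds into (train, test) with a fixed seed."""
    test_size = int(len(ds) * test_fraction)
    train_size = len(ds) - test_size
    generator = torch.Generator().manual_seed(seed)
    return random_split(ds, [train_size, test_size], generator=generator)

train_a, test_a = split_with_seed(dataset, 0.2, seed=0)
train_b, test_b = split_with_seed(dataset, 0.2, seed=0)

print(len(train_a), len(test_a))
# The same seed yields the same test rows on both calls.
print(test_a.indices == test_b.indices)
```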


My `Model` class subclasses torch's `nn.Module`. It holds a single linear layer and provides helper methods for computing the training and testing loss and for reporting progress each epoch.

In [None]:

class Model(nn.Module):
  def __init__(self):
    super().__init__()
    self.linear = nn.Linear(X_size,y_size)
  
  def forward(self, xb):
    output = self.linear(xb)
    return output
  
  def train_step(self, batch):
    inputs, targets = batch
    output = self(inputs)
    loss = f.l1_loss(output, targets)
    return loss

  def test_step(self, batch):
    inputs, targets = batch
    output = self(inputs)
    loss = f.l1_loss(output, targets)
    return {'loss' : loss.detach()}

  def test_epoch_end(self, outputs):
    batch_loss = [x['loss'] for x in outputs]
    epoch_loss = torch.stack(batch_loss).mean()
    return {'loss': epoch_loss.item()}

  def epoch_end(self, epoch, result, num_epochs):
    if (epoch+1) % 10 == 0 or epoch == num_epochs - 1:
      print("Epoch #{} has loss: {:.4f}".format(epoch, result['loss']))
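
The loss used in `train_step` and `test_step` is `l1_loss`, which with its default reduction is the mean absolute error. That is why the loss values printed during training can be read directly as "average degrees off". The sketch below checks this on two made-up predictions; the numbers are illustrative and do not come from my dataset.

```python
import torch
import torch.nn.functional as F

# Made-up predicted and actual minimum temperatures (illustrative only).
predicted = torch.tensor([[47.0], [50.0]])
actual = torch.tensor([[43.0], [52.0]])

loss = F.l1_loss(predicted, actual)          # default reduction is the mean
manual = (predicted - actual).abs().mean()   # |47-43| = 4, |50-52| = 2, mean = 3

print(loss.item(), manual.item())
```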

  


In [None]:
def show_loss(model, test_loader):
    outputs = [model.test_step(batch) for batch in test_loader]
    return model.test_epoch_end(outputs)
  
def fit(epochs, learn_rate, model, train_loader, test_loader, optimize_ver=torch.optim.SGD):
    hist = []
    optimizer = optimize_ver(model.parameters(), learn_rate)
    for epoch in range(epochs): 
        for batch in train_loader:
            loss = model.train_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        result = show_loss(model, test_loader)
        model.epoch_end(epoch, result,epochs)
        hist.append(result)
    return hist
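
`show_loss` only reads the loss, so gradient tracking is not needed while it runs. A possible variant, sketched below under that assumption, wraps the evaluation in `torch.no_grad()`. The function name `evaluate` and the toy data are hypothetical; like `test_epoch_end`, it reports the mean of the per-batch L1 losses.

```python
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader):
    """Hypothetical variant of show_loss: mean of per-batch L1 losses,
    computed without building a gradient graph."""
    batch_losses = []
    with torch.no_grad():
        for inputs, targets in loader:
            batch_losses.append(F.l1_loss(model(inputs), targets))
    return torch.stack(batch_losses).mean().item()

# Toy data standing in for the weather tensors.
toy_inputs = torch.zeros(4, 3)
toy_targets = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
toy_loader = DataLoader(TensorDataset(toy_inputs, toy_targets), batch_size=2)

# A linear layer with all parameters set to zero always predicts 0.
toy_model = nn.Linear(3, 1)
with torch.no_grad():
    toy_model.weight.zero_()
    toy_model.bias.zero_()

# Batch losses are 1.5 and 3.5, so their mean is 2.5.
print(evaluate(toy_model, toy_loader))
```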


In [None]:
model = Model()
print("Original Loss: \n", show_loss(model, test_loader))
learning_rate = 1e-4  # value suggested in class as a good choice
hist = fit(100, learning_rate, model, train_loader, test_loader)

Original Loss: 
 {'loss': 62.33405303955078}
Epoch #9 has loss: 31.6444
Epoch #19 has loss: 4.7446
Epoch #29 has loss: 4.1981
Epoch #39 has loss: 4.1951
Epoch #49 has loss: 4.2022
Epoch #59 has loss: 4.1982
Epoch #69 has loss: 4.1891
Epoch #79 has loss: 4.1700
Epoch #89 has loss: 4.1648
Epoch #99 has loss: 4.1762


The following cell times the fitting of my model. Because `%%timeit` reruns the cell, it also keeps training the same model on the same data several more times (around 7).

In [None]:
%%timeit
hist = fit(100, learning_rate, model, train_loader, test_loader)

Epoch #9 has loss: 4.1627
Epoch #19 has loss: 4.1592
Epoch #29 has loss: 4.1520
Epoch #39 has loss: 4.1481
Epoch #49 has loss: 4.1456
Epoch #59 has loss: 4.1393
Epoch #69 has loss: 4.1356
Epoch #79 has loss: 4.1173
Epoch #89 has loss: 4.1228
Epoch #99 has loss: 4.1159
Epoch #9 has loss: 4.1210
Epoch #19 has loss: 4.1068
Epoch #29 has loss: 4.1021
Epoch #39 has loss: 4.0978
Epoch #49 has loss: 4.1008
Epoch #59 has loss: 4.0989
Epoch #69 has loss: 4.0888
Epoch #79 has loss: 4.0791
Epoch #89 has loss: 4.0904
Epoch #99 has loss: 4.0815
Epoch #9 has loss: 4.0739
Epoch #19 has loss: 4.0762
Epoch #29 has loss: 4.0685
Epoch #39 has loss: 4.0652
Epoch #49 has loss: 4.0617
Epoch #59 has loss: 4.0655
Epoch #69 has loss: 4.0565
Epoch #79 has loss: 4.0489
Epoch #89 has loss: 4.0498
Epoch #99 has loss: 4.0466
Epoch #9 has loss: 4.0431
Epoch #19 has loss: 4.0385
Epoch #29 has loss: 4.0388
Epoch #39 has loss: 4.0419
Epoch #49 has loss: 4.0331
Epoch #59 has loss: 4.0274
Epoch #69 has loss: 4.0230
Epoch

Prediction helper, used later when building my confusion matrix:

In [None]:
def predict(x, model):
    xb = x.unsqueeze(0)  # add a batch dimension: shape (3,) -> (1, 3)
    return model(xb).item()

In [None]:
# sample of one testing case:

x, target = test_data[0]
pred = predict(x, model)
print("")
print("Inputs (Max Temperature, Precipitation, Wind Speed): ", x)
print("")
print("Actual Target Min Temperature: ", target.item())
print("")
print("Model Prediction:", pred)
print("")


Inputs (Max Temperature, Precipitation, Wind Speed):  tensor([70.0000,  0.0000,  2.1000])

Actual Target Min Temperature:  43.0

Model Prediction: 47.1186637878418



The following is the timing of one prediction:

In [None]:
%%timeit
pred = predict(x, model)

47.1 µs ± 3.83 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


The following loop turns every prediction into a colder/not-colder label and, in the same pass, does the same for every actual value so the two can be compared later:

In [None]:
predictions = []
actual = []

for i in range(len(test_data)):
  x, target = test_data[i]
  pred = predict(x, model)

  if target.item() > colder_threshold:
    actual.append(False)
  else:
    actual.append(True)
  
  if pred > colder_threshold:
    predictions.append(False)
  else:
    predictions.append(True)
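
The same labeling can also be done without an explicit loop, by comparing whole tensors against the threshold at once. The sketch below uses made-up temperatures and a made-up threshold in place of `colder_threshold`; the variable names are assumptions for illustration.

```python
import torch

threshold = 45.0  # made-up value standing in for colder_threshold

# Made-up predicted and actual minimum temperatures (illustrative only).
predicted_tmin = torch.tensor([47.1, 44.0, 45.0])
actual_tmin = torch.tensor([43.0, 46.0, 45.0])

# Element-wise comparison: True means "colder" (at or below the threshold).
predicted_colder = (predicted_tmin <= threshold).tolist()
actual_colder = (actual_tmin <= threshold).tolist()

print(predicted_colder)
print(actual_colder)
```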

In [None]:
print(actual)

[True, True, True, False, False, False, True, True, False, True, True, False, True, False, False, False, False, True, False, False, True, True, False, True, True, False, False, True, False, True, False, True, True, True, True, True, False, False, True, True, True, False, True, True, True, True, True, False, True, True, True, False, True, True, False, True, True, False, False, False, True, False, True, True, False, False, True, True, False, False, False, False, True]


In [None]:
print(predictions)

[True, True, True, False, False, True, True, True, False, True, True, True, True, False, False, False, False, True, False, False, True, True, False, True, True, False, False, True, False, True, False, True, False, True, True, True, False, False, True, True, True, False, True, False, True, False, False, False, True, True, True, False, True, True, False, True, True, False, False, False, True, False, False, True, False, False, True, False, True, False, True, True, True]


Next, I count the number of true/false positives and true/false negatives by iterating through both arrays

In [None]:
test_length = len(test_data)

true_positive = 0
false_positive = 0
true_negative = 0
false_negative = 0

# counting true/false positives and true/false negatives:

for i in range(test_length):
  if actual[i] == False and predictions[i] == False:
    true_negative = true_negative + 1
  elif actual[i] == True and predictions[i] == True:
    true_positive = true_positive + 1
  elif actual[i] == False and predictions[i] == True:
    false_positive = false_positive + 1
  elif actual[i] == True and predictions[i] == False:
    false_negative = false_negative + 1
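
A more compact way to get the same four counts is to pair up the two lists and count each (actual, predicted) combination. The sketch below does this with `collections.Counter` on a short made-up sample; the function name `confusion_counts` is hypothetical.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Hypothetical helper: count each (actual, predicted) pair once."""
    counts = Counter(zip(actual, predicted))
    return {
        "true_positive": counts[(True, True)],
        "false_negative": counts[(True, False)],
        "false_positive": counts[(False, True)],
        "true_negative": counts[(False, False)],
    }

# Made-up labels (illustrative only), True meaning "colder".
sample_actual = [True, True, False, False, True]
sample_predicted = [True, False, False, True, True]

result = confusion_counts(sample_actual, sample_predicted)
print(result)
```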

Tabulating data to make a Confusion Matrix:

In [None]:
print("Confusion Matrix: for size", test_length)

result_table = [["","PREDICTED COLDER = True", "PREDICTED COLDER = False"], ["ACTUAL COLDER = True", true_positive, false_negative],
                ["ACTUAL COLDER = False", false_positive, true_negative]]
print(tabulate(result_table))

Confusion Matrix: for size 73
---------------------  -----------------------  ------------------------
                       PREDICTED COLDER = True  PREDICTED COLDER = False
ACTUAL COLDER = True   35                       6
ACTUAL COLDER = False  5                        27
---------------------  -----------------------  ------------------------


Calculating accuracy:

In [None]:
accuracy = (true_positive + true_negative) / test_length

print("accuracy =", accuracy)

accuracy = 0.8493150684931506


Calculating precision and recall:

In [None]:
precision = true_positive / (true_positive + false_positive)

recall = true_positive / (true_positive + false_negative)

print("precision = ", precision)

print("recall = ", recall)

precision =  0.875
recall =  0.8536585365853658
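
Both ratios divide by a count that could be zero, for example if the model never predicts "colder". The sketch below recomputes precision and recall from the confusion matrix counts above with a guard for that case. The helper name `safe_ratio` is hypothetical, and returning 0.0 on a zero denominator is one possible convention, not the only one.

```python
def safe_ratio(numerator, denominator):
    """Hypothetical helper: return 0.0 instead of raising on a zero denominator."""
    return numerator / denominator if denominator else 0.0

# Counts taken from the confusion matrix printed above.
tp, fp, fn, tn = 35, 5, 6, 27

precision_checked = safe_ratio(tp, tp + fp)
recall_checked = safe_ratio(tp, tp + fn)

print(precision_checked)
print(recall_checked)
```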


Calculating $F_1$:

In [None]:
f_1 = 2 / (1/recall + 1/precision)
print("f_1 =", f_1)

f_1 = 0.8641975308641976
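
As a cross-check, the same value follows directly from the confusion matrix counts ($TP = 35$, $FP = 5$, $FN = 6$), writing $P$ for precision and $R$ for recall:

$$
F_1 = \frac{2}{\frac{1}{R} + \frac{1}{P}} = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN} = \frac{70}{70 + 5 + 6} = \frac{70}{81} \approx 0.8642
$$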
