# **Predicting Sea Surface Temperatures**

---

Group 4: Bennett Blanco, Jenn Hong, Setu Shah

## Data Source

---

**Data Source:** National Oceanic and Atmospheric Administration

The ICOADS dataset contains global marine data from ships (merchant, navy, research) and buoys, each capturing details according to the current weather or ocean conditions (wave height, sea temperature, wind speed, and so on). Each record contains the exact location of the observation which is great for visualizations. The historical depth of the data is quite comprehensive — there are records going back to 1662.

We picked the most recent year of data with good quality data available, which was 2015. We grouped by month and day and picked average values for all variables due to the large amount of missing values in the dataset.

**Access Links:**

[International Comprehensive Ocean-Atmosphere Data Set (ICOADS)](https://console.cloud.google.com/marketplace/product/noaa-public/icoads)

[2015 Big Query Dataset](bigquery-public-data.noaa_icoads.icoads_core_2015)



### **Column Description:**


| Column     | Description |
| ----------- | ----------- |
| month   |   Month of the year when the barometer is read |
| day                       | Day of the year when the barometer is read |
| timestamp                 | Converted UTC timestamp for the actual time of observation at which the barometer is read |
| latitude                 | Position to hundredths of a degree +N or –S and +E or −W |
| longitude                 | Position to hundredths of a degree +N or –S and +E or −W |
| avg_sea_surface_temp      | Sea Surface Temperature (°C)|
| avg_wind_direction_true   | The direction (true) from which wind is blowing (or will blow), stored in whole degrees (range: 1-360°) |
| avg_wind_speed            | Wind speed which is stored in tenths of a meter per second |
| avg_visibility            | Visibility (horizontal visibility at the surface in kilometers) according to WMO Code 4377 |
| avg_sea_level_pressure    | Amount of pressure tendency at station level during the three hours preceding the time of observation in tenths of hPa (i.e., millibars) |
| avg_air_temperature       | Air Temperature (°C) |
| avg_wetbulb_temperature   | Wetbulb Temperature (°C) |
| avg_dewpoint_temperature | Average Dewpoint Temperature (°C) |
| avg_total_cloud_amount | Codes 0 to 9 (WMO Code 2700) show the total fraction of the celestial dome covered by clouds |


|  InvoiceNo     | 6 digit numbers that are being used to keep track of each invoice being generated at the business      |
|  StockCode  | Alphanumeric 5-digit number uniquely assigned to each distinct product|
|     Description  |  Product Name     |
|       Quantity| The quantities of each product (item) per transaction      |
| InvoiceDate      |  The day and time when each transaction was generated     |
| UnitPrice      | Product Price Per Unit (in £)   |
| CustomerID      | 5-digit integral number uniquely assigned to each customer     |
| Country | The name of the country where each customer resides||

## Data Preparation

---

In [1]:
# Download files from Google Drive
!gdown 1I9H_vD1lvNgsQ0dK1JA3pRA-uHm7tmVw
!gdown 1KBZZbtFqRdr7HuQyncVa9Byw7nobEhcT

Downloading...
From: https://drive.google.com/uc?id=1I9H_vD1lvNgsQ0dK1JA3pRA-uHm7tmVw
To: /content/Avg_Variables.csv
100% 30.0M/30.0M [00:00<00:00, 140MB/s]
Downloading...
From: https://drive.google.com/uc?id=1KBZZbtFqRdr7HuQyncVa9Byw7nobEhcT
To: /content/Average_Sea_Surface_Temps.csv
100% 15.6M/15.6M [00:00<00:00, 133MB/s]


In [2]:
import numpy as np
import pandas as pd

# Reading the data
sea_temp = pd.read_csv("Average_Sea_Surface_Temps.csv")
sea_variables = pd.read_csv("Avg_Variables.csv")

In [None]:
# Data overview
sea_temp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 345646 entries, 0 to 345645
Data columns (total 4 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   month                 345646 non-null  int64  
 1   day                   345646 non-null  int64  
 2   timestamp             345646 non-null  object 
 3   avg_sea_surface_temp  345646 non-null  float64
dtypes: float64(1), int64(2), object(1)
memory usage: 10.5+ MB


In [None]:
sea_temp.head(5)

Unnamed: 0,month,day,timestamp,avg_sea_surface_temp
0,1,1,2015-01-01 00:00:00 UTC,17.304762
1,1,1,2015-01-01 00:01:00 UTC,18.765
2,1,1,2015-01-01 00:03:00 UTC,20.3125
3,1,1,2015-01-01 00:04:00 UTC,13.773333
4,1,1,2015-01-01 00:06:00 UTC,12.241818


In [3]:
# Grouping by month and day
sea_temp = sea_temp.groupby(['month', 'day']).agg({'avg_sea_surface_temp':'mean'}).reset_index()

In [None]:
sea_temp.head(3)

Unnamed: 0,month,day,avg_sea_surface_temp
0,1,1,18.23316
1,1,2,18.347285
2,1,3,18.337298


In [None]:
# Check summary statistics
sea_temp.describe()

Unnamed: 0,month,day,avg_sea_surface_temp
count,365.0,365.0,365.0
mean,6.526027,15.720548,20.238121
std,3.452584,8.808321,1.949204
min,1.0,1.0,17.048895
25%,4.0,8.0,18.416053
50%,7.0,16.0,20.030956
75%,10.0,23.0,22.114404
max,12.0,31.0,23.342618


## Univariate MLP

---

As our baseline deep learning model, we chose a univariate multi-layer perceptron model for predicting daily average global sea surface temperatures.

We preprocessed the data by normalizing input features using Min-Max scaler. By doing so, all features will be transformed into the range [0,1].

Next, we created sequences of the data that would be fed into the MLP model for prediction. In other words, we feed data sequenced by n days and the model would predict the temperature for each (n+1)th day.

We chose MSE as our training criterion because we wanted to penalize wrong predictions (outliers) of temperature. We also experimented with MAE for greater interpretability. However, since MAE only calculates the magnitude of the error and does take into account the direction of the error, we stuck with MSE.


In [4]:
# Splitting the data
split_ratio = 0.8
train_size = int(len(sea_temp) * split_ratio)

In [5]:
# Scaling the data

from sklearn.preprocessing import MinMaxScaler

X = sea_temp[['avg_sea_surface_temp']]

scaler = MinMaxScaler()

scaled_data = scaler.fit_transform(X)

# Splitting the data
split_ratio = 0.8
train_size = int(len(sea_temp) * split_ratio)

### Model 1

lr = 0.0001
momentum= 0.9
epochs = 500
sequence_length = 49

In [None]:
# Creating sequences
sequence_length = 49

def create_sequences(data, sequence_length):

  total_sequence=[]
  prices = []

  for day in range(0, (len(data) - sequence_length)):
    start_index = day
    end_index = sequence_length + day
    day_sequence = data[start_index:end_index]
    day_price = data[end_index]

    total_sequence.append(day_sequence)
    prices.append(day_price)
  return np.array(total_sequence), np.array(prices)

sequences = create_sequences(scaled_data, sequence_length)
X = sequences[0]
y = sequences[1]

In [None]:
import torch

# Split the data into training and testing sets
train_size = int(X.shape[0]* split_ratio)
X_train = torch.tensor(X[:train_size]).float()
y_train = torch.tensor(y[:train_size]).float()
X_test = torch.tensor(X[train_size:]).float()
y_test = torch.tensor(y[train_size:]).float()

In [None]:
import torch
import torch.nn as nn

# Define the MLP model
class MLP(nn.Module):
    def __init__(self, input_size):
      super(MLP, self).__init__()
      self.model = nn.Sequential(
          nn.Linear(input_size, 20),
          nn.Linear(20, 10),
          nn.Linear(10, 1)
      )

    def forward(self, x):
      x = x.view(-1, input_size)
      x = self.model(x)
      return x

# Instantiate the model
input_size = sequence_length
model = MLP(input_size)

In [None]:
# Model 1 Parameters

lr = 0.0001
momentum= 0.9
epochs = 500

In [None]:
import torch.optim as optim

criterion_training = nn.MSELoss()
criterion_testing = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

In [None]:
# Putting the y data in the correct format

y_test_actual = torch.tensor(scaler.inverse_transform(y_test))
y_train_actual = torch.tensor(scaler.inverse_transform(y_train))

# Initializing a best model for later use
best_test_error = torch.tensor(float(500))
best_model_state = None

# Train the model
for epoch in range(epochs):

    optimizer.zero_grad()
    model.train()
    pred = model(X_train)

    predictions_train_actual = torch.tensor(scaler.inverse_transform(pred.detach()))
    actual_train_error = criterion_training(predictions_train_actual, y_train_actual)

    loss = criterion_training(pred, y_train)
    # Update model here based on error
    loss.backward()
    optimizer.step()

    model.eval()
    # Evaluate the model on the test data
    with torch.no_grad():
      # Evaluate the model here.
      pred_test = model(X_test)
      loss_test = criterion_testing(pred_test, y_test)

      predictions_test_actual = torch.tensor(scaler.inverse_transform(pred_test.detach()))
      actual_test_error = criterion_testing(predictions_test_actual, y_test_actual)

    #defining a best model
    if actual_test_error < best_test_error:
        best_test_error = actual_test_error
        best_model_state = model.state_dict()

    if epoch %10 == 0:
      print(f"Epoch {epoch}: Training Actual Error= {actual_train_error}, Test Actual Error= {actual_test_error}")

Epoch 0: Training Actual Error= 5.167929815198863, Test Actual Error= 0.5823485687110341
Epoch 10: Training Actual Error= 4.908772276348545, Test Actual Error= 0.5593122137279737
Epoch 20: Training Actual Error= 4.437197723867715, Test Actual Error= 0.5389265145008185
Epoch 30: Training Actual Error= 3.933556464327463, Test Actual Error= 0.5441267352705441
Epoch 40: Training Actual Error= 3.4595329768188496, Test Actual Error= 0.5789099109348933
Epoch 50: Training Actual Error= 3.0331806567092006, Test Actual Error= 0.6407053773307493
Epoch 60: Training Actual Error= 2.6572152302110235, Test Actual Error= 0.725277923926257
Epoch 70: Training Actual Error= 2.329171692020241, Test Actual Error= 0.8282549021466902
Epoch 80: Training Actual Error= 2.044977309195873, Test Actual Error= 0.9455372311170892
Epoch 90: Training Actual Error= 1.8002020874911073, Test Actual Error= 1.0733914912949454
Epoch 100: Training Actual Error= 1.590492988009319, Test Actual Error= 1.208467501025093
Epoch 11

### Model 2

lr = 0.01
momentum= 0.9
epochs = 500
sequence_length = 49

In [None]:
# Creating sequences
sequence_length = 49

def create_sequences(data, sequence_length):

  total_sequence=[]
  prices = []

  for day in range(0, (len(data) - sequence_length)):
    start_index = day
    end_index = sequence_length + day
    day_sequence = data[start_index:end_index]
    day_price = data[end_index]

    total_sequence.append(day_sequence)
    prices.append(day_price)
  return np.array(total_sequence), np.array(prices)

sequences = create_sequences(scaled_data, sequence_length)
X = sequences[0]
y = sequences[1]

In [None]:
import torch

# Split the data into training and testing sets
train_size = int(X.shape[0]* split_ratio)
X_train = torch.tensor(X[:train_size]).float()
y_train = torch.tensor(y[:train_size]).float()
X_test = torch.tensor(X[train_size:]).float()
y_test = torch.tensor(y[train_size:]).float()

In [None]:
import torch
import torch.nn as nn

# Define the MLP model
class MLP(nn.Module):
    def __init__(self, input_size):
      super(MLP, self).__init__()
      self.model = nn.Sequential(
          nn.Linear(input_size, 20),
          nn.Linear(20, 10),
          nn.Linear(10, 1)
      )

    def forward(self, x):
      x = x.view(-1, input_size)
      x = self.model(x)
      return x

# Instantiate the model
input_size = sequence_length
model = MLP(input_size)

In [None]:
# Model 2 Parameters

lr = 0.01
momentum= 0.9
epochs = 500

In [None]:
import torch.optim as optim

criterion_training = nn.MSELoss()
criterion_testing = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

In [None]:
# Putting the y data in the correct format

y_test_actual = torch.tensor(scaler.inverse_transform(y_test))
y_train_actual = torch.tensor(scaler.inverse_transform(y_train))

# Initializing a best model for later use
best_test_error = torch.tensor(float(500))
best_model_state = None

# Train the model
for epoch in range(epochs):

    optimizer.zero_grad()
    model.train()
    pred = model(X_train)

    predictions_train_actual = torch.tensor(scaler.inverse_transform(pred.detach()))
    actual_train_error = criterion_training(predictions_train_actual, y_train_actual)

    loss = criterion_training(pred, y_train)
    # Update model here based on error
    loss.backward()
    optimizer.step()

    model.eval()
    # Evaluate the model on the test data
    with torch.no_grad():
      # Evaluate the model here.
      pred_test = model(X_test)
      loss_test = criterion_testing(pred_test, y_test)

      predictions_test_actual = torch.tensor(scaler.inverse_transform(pred_test.detach()))
      actual_test_error = criterion_testing(predictions_test_actual, y_test_actual)

    #defining a best model
    if actual_test_error < best_test_error:
        best_test_error = actual_test_error
        best_model_state = model.state_dict()

    if epoch %10 == 0:
      print(f"Epoch {epoch}: Training Actual Error= {actual_train_error}, Test Actual Error= {actual_test_error}")

Epoch 0: Training Actual Error= 39.937707457575186, Test Actual Error= 12.406951747605152
Epoch 10: Training Actual Error= 4.033297905749451, Test Actual Error= 11.330390892920786
Epoch 20: Training Actual Error= 2.68662899767544, Test Actual Error= 2.7492317045798753
Epoch 30: Training Actual Error= 0.988305121061186, Test Actual Error= 3.180025163055138
Epoch 40: Training Actual Error= 0.695640214918255, Test Actual Error= 4.602713378066791
Epoch 50: Training Actual Error= 0.4254336050327315, Test Actual Error= 2.4069909456868843
Epoch 60: Training Actual Error= 0.35621673159535466, Test Actual Error= 2.01085774227552
Epoch 70: Training Actual Error= 0.3370610007286515, Test Actual Error= 2.2198621762812003
Epoch 80: Training Actual Error= 0.30310765205507584, Test Actual Error= 1.9457976069409468
Epoch 90: Training Actual Error= 0.28052536754217416, Test Actual Error= 1.6657164298088378
Epoch 100: Training Actual Error= 0.25863354429309243, Test Actual Error= 1.5634845352786924
Epoc

### Model 3

lr = 0.01
momentum= 0.9
epochs = 500
sequence_length = 100

In [None]:
# Creating sequences
sequence_length = 100

def create_sequences(data, sequence_length):

  total_sequence=[]
  prices = []

  for day in range(0, (len(data) - sequence_length)):
    start_index = day
    end_index = sequence_length + day
    day_sequence = data[start_index:end_index]
    day_price = data[end_index]

    total_sequence.append(day_sequence)
    prices.append(day_price)
  return np.array(total_sequence), np.array(prices)

sequences = create_sequences(scaled_data, sequence_length)
X = sequences[0]
y = sequences[1]

In [None]:
import torch

# Split the data into training and testing sets
train_size = int(X.shape[0]* split_ratio)
X_train = torch.tensor(X[:train_size]).float()
y_train = torch.tensor(y[:train_size]).float()
X_test = torch.tensor(X[train_size:]).float()
y_test = torch.tensor(y[train_size:]).float()

In [None]:
import torch
import torch.nn as nn

# Define the MLP model
class MLP(nn.Module):
    def __init__(self, input_size):
      super(MLP, self).__init__()
      self.model = nn.Sequential(
          nn.Linear(input_size, 20),
          nn.Linear(20, 10),
          nn.Linear(10, 1)
      )

    def forward(self, x):
      x = x.view(-1, input_size)
      x = self.model(x)
      return x

# Instantiate the model
input_size = sequence_length
model = MLP(input_size)

In [None]:
# Model 3 Parameters

lr = 0.01
momentum= 0.9
epochs = 500

In [None]:
import torch.optim as optim

criterion_training = nn.MSELoss()
criterion_testing = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

In [None]:
# Putting the y data in the correct format

y_test_actual = torch.tensor(scaler.inverse_transform(y_test))
y_train_actual = torch.tensor(scaler.inverse_transform(y_train))

# Initializing a best model for later use
best_test_error = torch.tensor(float(500))
best_model_state = None

# Train the model
for epoch in range(epochs):

    optimizer.zero_grad()
    model.train()
    pred = model(X_train)

    predictions_train_actual = torch.tensor(scaler.inverse_transform(pred.detach()))
    actual_train_error = criterion_training(predictions_train_actual, y_train_actual)

    loss = criterion_training(pred, y_train)
    # Update model here based on error
    loss.backward()
    optimizer.step()

    model.eval()
    # Evaluate the model on the test data
    with torch.no_grad():
      # Evaluate the model here.
      pred_test = model(X_test)
      loss_test = criterion_testing(pred_test, y_test)

      predictions_test_actual = torch.tensor(scaler.inverse_transform(pred_test.detach()))
      actual_test_error = criterion_testing(predictions_test_actual, y_test_actual)

    #defining a best model
    if actual_test_error < best_test_error:
        best_test_error = actual_test_error
        best_model_state = model.state_dict()

    if epoch %10 == 0:
      print(f"Epoch {epoch}: Training Actual Error= {actual_train_error}, Test Actual Error= {actual_test_error}")

Epoch 0: Training Actual Error= 7.5660515347793105, Test Actual Error= 0.532142219237169
Epoch 10: Training Actual Error= 0.9316664464133589, Test Actual Error= 6.581909437919834
Epoch 20: Training Actual Error= 0.5369087650789158, Test Actual Error= 5.716014176113783
Epoch 30: Training Actual Error= 0.4504496428280754, Test Actual Error= 3.912008688068836
Epoch 40: Training Actual Error= 0.398149157824146, Test Actual Error= 4.165180228542856
Epoch 50: Training Actual Error= 0.34405123140543725, Test Actual Error= 2.7181432746419767
Epoch 60: Training Actual Error= 0.28819804624676193, Test Actual Error= 2.5532513932400156
Epoch 70: Training Actual Error= 0.23711757792883686, Test Actual Error= 1.6150049337722192
Epoch 80: Training Actual Error= 0.19329570203122126, Test Actual Error= 1.3015347596891578
Epoch 90: Training Actual Error= 0.15869615583535177, Test Actual Error= 0.8497572851295566
Epoch 100: Training Actual Error= 0.13323956491594702, Test Actual Error= 0.5715397000472395

So far this is the best model

### Model 4

lr = 0.01
momentum= 0.9
epochs = 500
sequence_length = 100

layers changed:
          nn.Linear(input_size, 40),
          nn.Linear(40, 20),
          nn.Linear(20, 1)

In [None]:
# Creating sequences
sequence_length = 100

def create_sequences(data, sequence_length):

  total_sequence=[]
  prices = []

  for day in range(0, (len(data) - sequence_length)):
    start_index = day
    end_index = sequence_length + day
    day_sequence = data[start_index:end_index]
    day_price = data[end_index]

    total_sequence.append(day_sequence)
    prices.append(day_price)
  return np.array(total_sequence), np.array(prices)

sequences = create_sequences(scaled_data, sequence_length)
X = sequences[0]
y = sequences[1]

In [None]:
import torch

# Split the data into training and testing sets
train_size = int(X.shape[0]* split_ratio)
X_train = torch.tensor(X[:train_size]).float()
y_train = torch.tensor(y[:train_size]).float()
X_test = torch.tensor(X[train_size:]).float()
y_test = torch.tensor(y[train_size:]).float()

In [None]:
import torch
import torch.nn as nn

# Define the MLP model
class MLP(nn.Module):
    def __init__(self, input_size):
      super(MLP, self).__init__()
      self.model = nn.Sequential(
          nn.Linear(input_size, 40),
          nn.Linear(40, 20),
          nn.Linear(20, 1)
      )

    def forward(self, x):
      x = x.view(-1, input_size)
      x = self.model(x)
      return x

# Instantiate the model
input_size = sequence_length
model = MLP(input_size)

In [None]:
# Model 4 Parameters

lr = 0.01
momentum= 0.9
epochs = 500

In [None]:
import torch.optim as optim

criterion_training = nn.MSELoss()
criterion_testing = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

In [None]:
# Putting the y data in the correct format

y_test_actual = torch.tensor(scaler.inverse_transform(y_test))
y_train_actual = torch.tensor(scaler.inverse_transform(y_train))

# Initializing a best model for later use
best_test_error = torch.tensor(float(500))
best_model_state = None

# Train the model
for epoch in range(epochs):

    optimizer.zero_grad()
    model.train()
    pred = model(X_train)

    predictions_train_actual = torch.tensor(scaler.inverse_transform(pred.detach()))
    actual_train_error = criterion_training(predictions_train_actual, y_train_actual)

    loss = criterion_training(pred, y_train)
    # Update model here based on error
    loss.backward()
    optimizer.step()

    model.eval()
    # Evaluate the model on the test data
    with torch.no_grad():
      # Evaluate the model here.
      pred_test = model(X_test)
      loss_test = criterion_testing(pred_test, y_test)

      predictions_test_actual = torch.tensor(scaler.inverse_transform(pred_test.detach()))
      actual_test_error = criterion_testing(predictions_test_actual, y_test_actual)

    #defining a best model
    if actual_test_error < best_test_error:
        best_test_error = actual_test_error
        best_model_state = model.state_dict()

    if epoch %10 == 0:
      print(f"Epoch {epoch}: Training Actual Error= {actual_train_error}, Test Actual Error= {actual_test_error}")

Epoch 0: Training Actual Error= 13.384886755575767, Test Actual Error= 0.5906319692526074
Epoch 10: Training Actual Error= 1.4968146718862017, Test Actual Error= 8.903274573248481
Epoch 20: Training Actual Error= 1.1905304812426722, Test Actual Error= 4.694598788975136
Epoch 30: Training Actual Error= 0.850545754072595, Test Actual Error= 7.507205503053895
Epoch 40: Training Actual Error= 0.6965286319613155, Test Actual Error= 4.430349794452536
Epoch 50: Training Actual Error= 0.558708442992374, Test Actual Error= 5.369325860787195
Epoch 60: Training Actual Error= 0.4548795407009266, Test Actual Error= 3.6390455401232438
Epoch 70: Training Actual Error= 0.3747167497586957, Test Actual Error= 3.4451971224566584
Epoch 80: Training Actual Error= 0.3087268464205595, Test Actual Error= 2.502137808368712
Epoch 90: Training Actual Error= 0.2539792220691466, Test Actual Error= 1.9723863130790171
Epoch 100: Training Actual Error= 0.2089146331178367, Test Actual Error= 1.4264960638281456
Epoch 1

### Model 5

lr = 0.01
momentum= 0.9
epochs = 500
sequence_length = 100

layers changed:
          nn.Linear(input_size, 80),
          nn.Linear(80, 60),
          nn.Linear(60, 1)

In [None]:
# Creating sequences
sequence_length = 100

def create_sequences(data, sequence_length):

  total_sequence=[]
  prices = []

  for day in range(0, (len(data) - sequence_length)):
    start_index = day
    end_index = sequence_length + day
    day_sequence = data[start_index:end_index]
    day_price = data[end_index]

    total_sequence.append(day_sequence)
    prices.append(day_price)
  return np.array(total_sequence), np.array(prices)

sequences = create_sequences(scaled_data, sequence_length)
X = sequences[0]
y = sequences[1]

In [None]:
import torch

# Split the data into training and testing sets
train_size = int(X.shape[0]* split_ratio)
X_train = torch.tensor(X[:train_size]).float()
y_train = torch.tensor(y[:train_size]).float()
X_test = torch.tensor(X[train_size:]).float()
y_test = torch.tensor(y[train_size:]).float()

In [None]:
import torch
import torch.nn as nn

# Define the MLP model
class MLP(nn.Module):
    def __init__(self, input_size):
      super(MLP, self).__init__()
      self.model = nn.Sequential(
          nn.Linear(input_size, 80),
          nn.Linear(80, 60),
          nn.Linear(60, 1)
      )

    def forward(self, x):
      x = x.view(-1, input_size)
      x = self.model(x)
      return x

# Instantiate the model
input_size = sequence_length
model = MLP(input_size)

In [None]:
# Model 4 Parameters

lr = 0.01
momentum= 0.9
epochs = 500

In [None]:
import torch.optim as optim

criterion_training = nn.MSELoss()
criterion_testing = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

In [None]:
# Putting the y data in the correct format

y_test_actual = torch.tensor(scaler.inverse_transform(y_test))
y_train_actual = torch.tensor(scaler.inverse_transform(y_train))

# Initializing a best model for later use
best_test_error = torch.tensor(float(500))
best_model_state = None

# Train the model
for epoch in range(epochs):

    optimizer.zero_grad()
    model.train()
    pred = model(X_train)

    predictions_train_actual = torch.tensor(scaler.inverse_transform(pred.detach()))
    actual_train_error = criterion_training(predictions_train_actual, y_train_actual)

    loss = criterion_training(pred, y_train)
    # Update model here based on error
    loss.backward()
    optimizer.step()

    model.eval()
    # Evaluate the model on the test data
    with torch.no_grad():
      # Evaluate the model here.
      pred_test = model(X_test)
      loss_test = criterion_testing(pred_test, y_test)

      predictions_test_actual = torch.tensor(scaler.inverse_transform(pred_test.detach()))
      actual_test_error = criterion_testing(predictions_test_actual, y_test_actual)

    #defining a best model
    if actual_test_error < best_test_error:
        best_test_error = actual_test_error
        best_model_state = model.state_dict()

    if epoch %10 == 0:
      print(f"Epoch {epoch}: Training Actual Error= {actual_train_error}, Test Actual Error= {actual_test_error}")

Epoch 0: Training Actual Error= 31.72359025699594, Test Actual Error= 0.6606966466368782
Epoch 10: Training Actual Error= 8.18684515660177, Test Actual Error= 0.3527293166058925
Epoch 20: Training Actual Error= 1.1015302833391287, Test Actual Error= 3.4857479730148047
Epoch 30: Training Actual Error= 0.3391865146608457, Test Actual Error= 2.1127063499315906
Epoch 40: Training Actual Error= 0.33115677651736586, Test Actual Error= 1.067613024211882
Epoch 50: Training Actual Error= 0.2996492769592937, Test Actual Error= 1.948866475443947
Epoch 60: Training Actual Error= 0.22976368507327136, Test Actual Error= 0.9219832156950766
Epoch 70: Training Actual Error= 0.16791783731980917, Test Actual Error= 0.9610569849230213
Epoch 80: Training Actual Error= 0.13321203595583123, Test Actual Error= 0.6778540984530278
Epoch 90: Training Actual Error= 0.11341530918671869, Test Actual Error= 0.42226932777109644
Epoch 100: Training Actual Error= 0.09871721116692724, Test Actual Error= 0.32722823353997

### Model 6

lr = 0.01
momentum= 0.9
epochs = 500
sequence_length = 100

layers changed:
          nn.Linear(input_size, 20),
          nn.Linear(20, 10),
          nn.Linear(10, 1)

Adding relu (it performs worse potentially because of dead neurons problem)

In [None]:
# Creating sequences
sequence_length = 100

def create_sequences(data, sequence_length):

  total_sequence=[]
  prices = []

  for day in range(0, (len(data) - sequence_length)):
    start_index = day
    end_index = sequence_length + day
    day_sequence = data[start_index:end_index]
    day_price = data[end_index]

    total_sequence.append(day_sequence)
    prices.append(day_price)
  return np.array(total_sequence), np.array(prices)

sequences = create_sequences(scaled_data, sequence_length)
X = sequences[0]
y = sequences[1]

In [None]:
import torch

# Split the data into training and testing sets
train_size = int(X.shape[0]* split_ratio)
X_train = torch.tensor(X[:train_size]).float()
y_train = torch.tensor(y[:train_size]).float()
X_test = torch.tensor(X[train_size:]).float()
y_test = torch.tensor(y[train_size:]).float()

In [None]:
import torch
import torch.nn as nn

# Define the MLP model
class MLP(nn.Module):
    def __init__(self, input_size):
      super(MLP, self).__init__()
      self.model = nn.Sequential(
          nn.Linear(input_size, 40),
          nn.ReLU(),
          nn.Linear(40, 20),
          nn.ReLU(),
          nn.Linear(20, 1)
      )

    def forward(self, x):
      x = x.view(-1, input_size)
      x = self.model(x)
      return x

# Instantiate the model
input_size = sequence_length
model = MLP(input_size)

In [None]:
# Model 4 Parameters

lr = 0.01
momentum= 0.9
epochs = 500

In [None]:
import torch.optim as optim

criterion_training = nn.MSELoss()
criterion_testing = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

In [None]:
# Putting the y data in the correct format

y_test_actual = torch.tensor(scaler.inverse_transform(y_test))
y_train_actual = torch.tensor(scaler.inverse_transform(y_train))

# Initializing a best model for later use
best_test_error = torch.tensor(float(500))
best_model_state = None

# Train the model
for epoch in range(epochs):

    optimizer.zero_grad()
    model.train()
    pred = model(X_train)

    predictions_train_actual = torch.tensor(scaler.inverse_transform(pred.detach()))
    actual_train_error = criterion_training(predictions_train_actual, y_train_actual)

    loss = criterion_training(pred, y_train)
    # Update model here based on error
    loss.backward()
    optimizer.step()

    model.eval()
    # Evaluate the model on the test data
    with torch.no_grad():
      # Evaluate the model here.
      pred_test = model(X_test)
      loss_test = criterion_testing(pred_test, y_test)

      predictions_test_actual = torch.tensor(scaler.inverse_transform(pred_test.detach()))
      actual_test_error = criterion_testing(predictions_test_actual, y_test_actual)

    #defining a best model
    if actual_test_error < best_test_error:
        best_test_error = actual_test_error
        best_model_state = model.state_dict()

    if epoch %10 == 0:
      print(f"Epoch {epoch}: Training Actual Error= {actual_train_error}, Test Actual Error= {actual_test_error}")

Epoch 0: Training Actual Error= 22.999304338080844, Test Actual Error= 2.786146913342259
Epoch 10: Training Actual Error= 6.229840872154818, Test Actual Error= 25.912361464260933
Epoch 20: Training Actual Error= 2.015951338987597, Test Actual Error= 3.59780542539857
Epoch 30: Training Actual Error= 1.0022704120648214, Test Actual Error= 10.970772315153113
Epoch 40: Training Actual Error= 0.8066110104449932, Test Actual Error= 7.035282108767556
Epoch 50: Training Actual Error= 0.7489957418918355, Test Actual Error= 7.080752632265583
Epoch 60: Training Actual Error= 0.679516689145636, Test Actual Error= 7.344703110862274
Epoch 70: Training Actual Error= 0.6069675475045129, Test Actual Error= 6.095274400634
Epoch 80: Training Actual Error= 0.5442073157486783, Test Actual Error= 6.081179747676663
Epoch 90: Training Actual Error= 0.4884245384745342, Test Actual Error= 5.211223810684511
Epoch 100: Training Actual Error= 0.4368845860999709, Test Actual Error= 4.830818639498538
Epoch 110: Trai

### Model 7

lr = 0.01
momentum= 0.9
epochs = 500
sequence_length = 100

layers changed:
          nn.Linear(input_size, 20),
          nn.Linear(20, 10),
          nn.Linear(10, 1)

Using Adam as optimizer

In [None]:
# Creating sequences
sequence_length = 100

def create_sequences(data, sequence_length):

  total_sequence=[]
  prices = []

  for day in range(0, (len(data) - sequence_length)):
    start_index = day
    end_index = sequence_length + day
    day_sequence = data[start_index:end_index]
    day_price = data[end_index]

    total_sequence.append(day_sequence)
    prices.append(day_price)
  return np.array(total_sequence), np.array(prices)

sequences = create_sequences(scaled_data, sequence_length)
X = sequences[0]
y = sequences[1]

In [None]:
import torch

# Split the data into training and testing sets
train_size = int(X.shape[0]* split_ratio)
X_train = torch.tensor(X[:train_size]).float()
y_train = torch.tensor(y[:train_size]).float()
X_test = torch.tensor(X[train_size:]).float()
y_test = torch.tensor(y[train_size:]).float()

In [None]:
import torch
import torch.nn as nn

# Define the MLP model
class MLP(nn.Module):
    def __init__(self, input_size):
      super(MLP, self).__init__()
      self.model = nn.Sequential(
          nn.Linear(input_size, 40),
          nn.ReLU(),
          nn.Linear(40, 20),
          nn.ReLU(),
          nn.Linear(20, 1)
      )

    def forward(self, x):
      x = x.view(-1, input_size)
      x = self.model(x)
      return x

# Instantiate the model
input_size = sequence_length
model = MLP(input_size)

In [None]:
# Model 4 Parameters

lr = 0.01
momentum= 0.9
epochs = 500

In [None]:
import torch.optim as optim

criterion_training = nn.MSELoss()
criterion_testing = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=lr)

In [None]:
# Putting the y data in the correct format

y_test_actual = torch.tensor(scaler.inverse_transform(y_test))
y_train_actual = torch.tensor(scaler.inverse_transform(y_train))

# Initializing a best model for later use
best_test_error = torch.tensor(float(500))
best_model_state = None

# Train the model
for epoch in range(epochs):

    optimizer.zero_grad()
    model.train()
    pred = model(X_train)

    predictions_train_actual = torch.tensor(scaler.inverse_transform(pred.detach()))
    actual_train_error = criterion_training(predictions_train_actual, y_train_actual)

    loss = criterion_training(pred, y_train)
    # Update model here based on error
    loss.backward()
    optimizer.step()

    model.eval()
    # Evaluate the model on the test data
    with torch.no_grad():
      # Evaluate the model here.
      pred_test = model(X_test)
      loss_test = criterion_testing(pred_test, y_test)

      predictions_test_actual = torch.tensor(scaler.inverse_transform(pred_test.detach()))
      actual_test_error = criterion_testing(predictions_test_actual, y_test_actual)

    #defining a best model
    if actual_test_error < best_test_error:
        best_test_error = actual_test_error
        best_model_state = model.state_dict()

    if epoch %10 == 0:
      print(f"Epoch {epoch}: Training Actual Error= {actual_train_error}, Test Actual Error= {actual_test_error}")

Epoch 0: Training Actual Error= 12.914581475556421, Test Actual Error= 2.832257011320409
Epoch 10: Training Actual Error= 0.7085818598438066, Test Actual Error= 7.881571335411787
Epoch 20: Training Actual Error= 0.7748806010551391, Test Actual Error= 3.169282001145288
Epoch 30: Training Actual Error= 0.3560419983214852, Test Actual Error= 2.205023113767295
Epoch 40: Training Actual Error= 0.1955736256199082, Test Actual Error= 1.3678633930867787
Epoch 50: Training Actual Error= 0.17116201804387787, Test Actual Error= 1.1272666173463348
Epoch 60: Training Actual Error= 0.151878931141413, Test Actual Error= 0.9044436104078185
Epoch 70: Training Actual Error= 0.12999591626268442, Test Actual Error= 0.8199593938560137
Epoch 80: Training Actual Error= 0.11330457942064498, Test Actual Error= 0.745171943019413
Epoch 90: Training Actual Error= 0.10018033260687334, Test Actual Error= 0.6433958715874512
Epoch 100: Training Actual Error= 0.08988993794461901, Test Actual Error= 0.5451162354403402


### Model 8

lr = 0.01
momentum= 0.9
epochs = 500
sequence_length = 7

In [None]:
# Creating sequences
sequence_length = 7

def create_sequences(data, sequence_length):

  total_sequence=[]
  prices = []

  for day in range(0, (len(data) - sequence_length)):
    start_index = day
    end_index = sequence_length + day
    day_sequence = data[start_index:end_index]
    day_price = data[end_index]

    total_sequence.append(day_sequence)
    prices.append(day_price)
  return np.array(total_sequence), np.array(prices)

sequences = create_sequences(scaled_data, sequence_length)
X = sequences[0]
y = sequences[1]

In [None]:
import torch

# Split the data into training and testing sets
train_size = int(X.shape[0]* split_ratio)
X_train = torch.tensor(X[:train_size]).float()
y_train = torch.tensor(y[:train_size]).float()
X_test = torch.tensor(X[train_size:]).float()
y_test = torch.tensor(y[train_size:]).float()

In [None]:
import torch
import torch.nn as nn

# Define the MLP model
class MLP(nn.Module):
    def __init__(self, input_size):
      super(MLP, self).__init__()
      self.model = nn.Sequential(
          nn.Linear(input_size, 80),
          nn.Linear(80, 60),
          nn.Linear(60, 1)
      )

    def forward(self, x):
      x = x.view(-1, input_size)
      x = self.model(x)
      return x

# Instantiate the model
input_size = sequence_length
model = MLP(input_size)

In [None]:
# Model 4 Parameters

lr = 0.01
momentum= 0.9
epochs = 500

In [None]:
import torch.optim as optim

criterion_training = nn.MSELoss()
criterion_testing = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

In [None]:
# Putting the y data in the correct format

y_test_actual = torch.tensor(scaler.inverse_transform(y_test))
y_train_actual = torch.tensor(scaler.inverse_transform(y_train))

# Initializing a best model for later use
best_test_error = torch.tensor(float(500))
best_model_state = None

# Train the model
for epoch in range(epochs):

    optimizer.zero_grad()
    model.train()
    pred = model(X_train)

    predictions_train_actual = torch.tensor(scaler.inverse_transform(pred.detach()))
    actual_train_error = criterion_training(predictions_train_actual, y_train_actual)

    loss = criterion_training(pred, y_train)
    # Update model here based on error
    loss.backward()
    optimizer.step()

    model.eval()
    # Evaluate the model on the test data
    with torch.no_grad():
      # Evaluate the model here.
      pred_test = model(X_test)
      loss_test = criterion_testing(pred_test, y_test)

      predictions_test_actual = torch.tensor(scaler.inverse_transform(pred_test.detach()))
      actual_test_error = criterion_testing(predictions_test_actual, y_test_actual)

    #defining a best model
    if actual_test_error < best_test_error:
        best_test_error = actual_test_error
        best_model_state = model.state_dict()

    if epoch %10 == 0:
      print(f"Epoch {epoch}: Training Actual Error= {actual_train_error}, Test Actual Error= {actual_test_error}")

Epoch 0: Training Actual Error= 23.433751588309434, Test Actual Error= 6.254822939859325
Epoch 10: Training Actual Error= 0.7911931177186886, Test Actual Error= 0.2819495338648579
Epoch 20: Training Actual Error= 0.17487365239838934, Test Actual Error= 0.2750017616931901
Epoch 30: Training Actual Error= 0.10051518898668084, Test Actual Error= 0.09070144642585093
Epoch 40: Training Actual Error= 0.08630989035256505, Test Actual Error= 0.2504438958106248
Epoch 50: Training Actual Error= 0.08773861686879729, Test Actual Error= 0.0859669390246972
Epoch 60: Training Actual Error= 0.08051890715155993, Test Actual Error= 0.18380858043311182
Epoch 70: Training Actual Error= 0.07744471091420901, Test Actual Error= 0.10659977387153205
Epoch 80: Training Actual Error= 0.07543137973705874, Test Actual Error= 0.1472098753921709
Epoch 90: Training Actual Error= 0.07446121061765908, Test Actual Error= 0.11727531199354929
Epoch 100: Training Actual Error= 0.07391772040528019, Test Actual Error= 0.1336

### WandB MLP

---

Next, we wanted to perform a grid search over the hyperparameters and log the results in WandB for tracking the experiments.

[WandB Report Link](https://api.wandb.ai/links/ba-865/x2b0xmxp)

In [None]:
# Install WandB
%%capture
!pip install wandb

In [None]:
import wandb
from wandb.keras import WandbMetricsLogger
import torch
from sklearn.preprocessing import MinMaxScaler
import torch.nn as nn
import torch.optim as optim

# Scaling the data
X = sea_temp[['avg_sea_surface_temp']]

scaler = MinMaxScaler()

scaled_data = scaler.fit_transform(X)

# Splitting the data
split_ratio = 0.8
train_size = int(len(sea_temp) * split_ratio)

# Creating sequences
sequence_length = 7

def create_sequences(data, sequence_length):

  total_sequence=[]
  prices = []

  for day in range(0, (len(data) - sequence_length)):
    start_index = day
    end_index = sequence_length + day
    day_sequence = data[start_index:end_index]
    day_price = data[end_index]

    total_sequence.append(day_sequence)
    prices.append(day_price)
  return np.array(total_sequence), np.array(prices)

sequences = create_sequences(scaled_data, sequence_length)
X = sequences[0]
y = sequences[1]

# Split the data into training and testing sets
train_size = int(X.shape[0]* split_ratio)
X_train = torch.tensor(X[:train_size]).float()
y_train = torch.tensor(y[:train_size]).float()
X_test = torch.tensor(X[train_size:]).float()
y_test = torch.tensor(y[train_size:]).float()


# Step 1: Define your sweep configs here
sweep_configs = {
    "name": "BA 865 Project",
    "method": "grid",
    "metric": {"goal": "minimize", "name": "test_accuracy"},
    "parameters": {
        "learning_rate": {"values": [1e-1, 1e-2, 1e-3]},
        "momentum": {"values": [0.85, 0.9, 0.95]},
        "epochs": {"values": [100, 500, 1000]}},
    }

# Step 2: Write a function that contains the code necessary for running a single experiment.
def main():
    # initialize WandB
  #run = wandb.init()

    # 2.1 Define the model
  class MLP(nn.Module):
    def __init__(self, input_size):
      super(MLP, self).__init__()
      self.model = nn.Sequential(
          nn.Linear(input_size, 80),
          nn.Linear(80, 60),
          nn.Linear(60, 1)
      )

    def forward(self, x):
      x = x.view(-1, input_size)
      x = self.model(x)
      return x

      # Instantiate the model
  input_size = sequence_length
  model = MLP(input_size)

    # 2.2 Compile the model
  def train_model(config=None):
    with wandb.init(config=config):
      config = wandb.config
      criterion_training = nn.MSELoss()
      criterion_testing = nn.MSELoss()
      optimizer = optim.SGD(model.parameters(), lr=config.learning_rate, momentum=config.momentum)
      callback = [WandbMetricsLogger()]

     # Putting the y data in the correct format
      y_test_actual = torch.tensor(scaler.inverse_transform(y_test))
      y_train_actual = torch.tensor(scaler.inverse_transform(y_train))

      # Train the model
      for epoch in range(config.epochs):

          optimizer.zero_grad()
          model.train()
          pred = model(X_train)

          predictions_train_actual = torch.tensor(scaler.inverse_transform(pred.detach()))
          actual_train_error = criterion_training(predictions_train_actual, y_train_actual)

          loss = criterion_training(pred, y_train)
          wandb.log({"train_loss": loss, "train_accuracy": actual_train_error})
          # Update model here based on error
          loss.backward()
          optimizer.step()

          model.eval()
          # Evaluate the model on the test data
          with torch.no_grad():
            # Evaluate the model here.
            pred_test = model(X_test)
            loss_test = criterion_testing(pred_test, y_test)

            predictions_test_actual = torch.tensor(scaler.inverse_transform(pred_test.detach()))
            actual_test_error = criterion_testing(predictions_test_actual, y_test_actual)
            wandb.log({"test_loss": loss_test, "test_accuracy": actual_test_error})

            wandb.run.summary["test_accuracy"] = actual_test_error

# Step 3: Initialize sweep by passing in the config.
  sweep_id = wandb.sweep(sweep_configs, project = 'project test 2')
  wandb.agent(sweep_id, function=train_model)

# Step 4: Start sweep job.
main()

Create sweep with ID: yje34zks
Sweep URL: https://wandb.ai/ba-865/project%20test%202/sweeps/yje34zks


[34m[1mwandb[0m: Agent Starting Run: eb5gqj0k with config:
[34m[1mwandb[0m: 	epochs: 500
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	momentum: 0.85
[34m[1mwandb[0m: Currently logged in as: [33msetushah[0m ([33mba-865[0m). Use [1m`wandb login --relogin`[0m to force relogin


VBox(children=(Label(value='0.001 MB of 0.011 MB uploaded\r'), FloatProgress(value=0.11025938140654247, max=1.…

0,1
test_accuracy,█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_loss,█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_accuracy,█▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_loss,█▇▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.07041
test_loss,0.00178
train_accuracy,0.05451
train_loss,0.00138


[34m[1mwandb[0m: Agent Starting Run: m77sha6p with config:
[34m[1mwandb[0m: 	epochs: 500
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	momentum: 0.9


VBox(children=(Label(value='0.001 MB of 0.002 MB uploaded\r'), FloatProgress(value=0.5399737876802096, max=1.0…

0,1
test_accuracy,██▇▇▇▆▆▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁
test_loss,██▇▇▇▆▆▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁
train_accuracy,██▇▇▆▆▆▅▅▅▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁
train_loss,██▇▇▆▆▆▅▅▅▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.0625
test_loss,0.00158
train_accuracy,0.05294
train_loss,0.00134


[34m[1mwandb[0m: Agent Starting Run: 1hpx8cs3 with config:
[34m[1mwandb[0m: 	epochs: 500
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	momentum: 0.95


VBox(children=(Label(value='0.001 MB of 0.002 MB uploaded\r'), FloatProgress(value=0.5394679459223725, max=1.0…

0,1
test_accuracy,██▇▇▇▆▆▅▅▅▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁
test_loss,██▇▇▇▆▆▅▅▅▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁
train_accuracy,██▇▇▆▆▅▅▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_loss,██▇▇▆▆▅▅▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.06109
test_loss,0.00154
train_accuracy,0.05279
train_loss,0.00133


[34m[1mwandb[0m: Agent Starting Run: 2h7lyoho with config:
[34m[1mwandb[0m: 	epochs: 500
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	momentum: 0.85


VBox(children=(Label(value='0.001 MB of 0.011 MB uploaded\r'), FloatProgress(value=0.10902322283283106, max=1.…

0,1
test_accuracy,████▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁
test_loss,████▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁
train_accuracy,███▇▇▇▇▇▇▆▆▆▆▆▆▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁
train_loss,███▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁

0,1
test_accuracy,0.06108
test_loss,0.00154
train_accuracy,0.05279
train_loss,0.00133


[34m[1mwandb[0m: Agent Starting Run: ofqevxls with config:
[34m[1mwandb[0m: 	epochs: 500
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	momentum: 0.9


VBox(children=(Label(value='0.001 MB of 0.002 MB uploaded\r'), FloatProgress(value=0.539232781168265, max=1.0)…

0,1
test_accuracy,████▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▁▁▁▁
test_loss,████▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▁▁▁▁
train_accuracy,████▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁
train_loss,███▇▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▃▂▂▂▂▂▁▁▁

0,1
test_accuracy,0.06107
test_loss,0.00154
train_accuracy,0.05279
train_loss,0.00133


[34m[1mwandb[0m: Agent Starting Run: zvy4vbj8 with config:
[34m[1mwandb[0m: 	epochs: 500
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	momentum: 0.95


VBox(children=(Label(value='0.001 MB of 0.002 MB uploaded\r'), FloatProgress(value=0.5390318360226777, max=1.0…

0,1
test_accuracy,████▇▇▇▇▇▆▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▁▁▁▁
test_loss,████▇▇▇▇▇▆▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▁▁▁▁
train_accuracy,█████▇▇▇▇▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁
train_loss,████▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁

0,1
test_accuracy,0.06105
test_loss,0.00154
train_accuracy,0.05279
train_loss,0.00133


[34m[1mwandb[0m: Agent Starting Run: cmytkiun with config:
[34m[1mwandb[0m: 	epochs: 500
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	momentum: 0.85


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▆▇▇████████████▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▃▂▂▂▁▁
test_loss,▆▇▇▇██████████▇▇▇▇▆▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▃▂▂▂▂▁
train_accuracy,▇▇▇▆██▆▆▇▇▆▅▅▅▅▃▄▄▅▅▅▅▃▄▃▄▄▄▃▅▂▂▃▃▃▂▂▂▃▁
train_loss,▆▇▇▅██▅▆▇█▅▅▅▅▅▂▄▅▅▅▆▅▃▄▃▃▃▄▃▅▄▃▃▂▄▂▂▃▃▁

0,1
test_accuracy,0.06105
test_loss,0.00154
train_accuracy,0.05279
train_loss,0.00133


[34m[1mwandb[0m: Agent Starting Run: rfxyci0i with config:
[34m[1mwandb[0m: 	epochs: 500
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	momentum: 0.9


VBox(children=(Label(value='0.001 MB of 0.011 MB uploaded\r'), FloatProgress(value=0.10943730615861763, max=1.…

0,1
test_accuracy,██▇▇▆▅▅▅▅▅▄▄▄▄▄▄▄▄▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁
test_loss,██▇▇▆▅▅▅▅▅▄▄▄▄▄▄▄▄▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁
train_accuracy,▇██▇▇▆▆▆▆▆▆▆▅▅▅▅▅▅▅▄▅▄▅▅▃▃▃▅▄▃▂▂▂▂▃▃▂▂▃▁
train_loss,▆█▇▇▇▅▅▅▇▅▆▆▅▅▅▅▅▅▅▄▅▅▅▅▄▂▃▅▅▃▂▂▂▂▃▂▂▁▃▂

0,1
test_accuracy,0.06105
test_loss,0.00154
train_accuracy,0.05279
train_loss,0.00133


[34m[1mwandb[0m: Agent Starting Run: 0tgxhmlz with config:
[34m[1mwandb[0m: 	epochs: 500
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	momentum: 0.95


VBox(children=(Label(value='0.001 MB of 0.011 MB uploaded\r'), FloatProgress(value=0.10957569315262645, max=1.…

0,1
test_accuracy,███████▇▇▇▇▆▆▆▆▆▆▅▅▅▅▅▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▁▁▁
test_loss,███████▇▇▇▇▆▆▆▆▆▆▅▅▅▅▅▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▁▁▁
train_accuracy,█▇████▇▇▇▇▆▆▆▅▆▆▅▅▅▅▅▄▄▄▄▄▄▃▃▂▂▃▃▂▁▁▂▁▁▂
train_loss,█▇▇▇▇▇▇▆▆▆▆▆▆▅▆▅▅▅▅▅▄▄▃▃▄▄▄▃▃▂▃▃▂▂▁▁▁▁▁▁

0,1
test_accuracy,0.06105
test_loss,0.00154
train_accuracy,0.05279
train_loss,0.00133


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Sweep Agent: Exiting.


The WandB grid search experiments did not show improvement over our best baseline model number 5.

## Univariate RNN/LSTM

---

### RNN

----

Next, we implement a more advanced technique called Recurrent Neural Networks. RNNs are specialized neural networks that maintain an internal memory that stores information from past inputs and updates it as new inputs arrive.

RNNs accept sequences as input, with each element representing a time step. They feature recurrent connections that enable updates at each time step based on the current input and the previous hidden state.

During training, RNNs are optimized using backpropagation, which involves unfolding the network over time and updating the weights to minimize prediction errors.







In [9]:
# Creating sequences
sequence_length = 7

def create_sequences(data, sequence_length):

  total_sequence=[]
  prices = []

  for day in range(0, (len(data) - sequence_length)):
    start_index = day
    end_index = sequence_length + day
    day_sequence = data[start_index:end_index]
    day_price = data[end_index]

    total_sequence.append(day_sequence)
    prices.append(day_price)
  return np.array(total_sequence), np.array(prices)

sequences = create_sequences(scaled_data, sequence_length)
X = sequences[0]
y = sequences[1]

# Split the data into training and testing sets
train_size = int(X.shape[0]* split_ratio)
X_train = torch.tensor(X[:train_size]).float()
y_train = torch.tensor(y[:train_size]).float()
X_test = torch.tensor(X[train_size:]).float()
y_test = torch.tensor(y[train_size:]).float()

In [10]:
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)

        # Forward propagate RNN
        out, _ = self.rnn(x, h0)

        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out

# Example usage:
# Define input parameters
input_size = 1  # Number of features in input data (e.g., time series)
hidden_size = 64  # Number of hidden units in the RNN
num_layers = 2  # Number of RNN layers
output_size = 1  # Number of features in output data (e.g., regression target)

# Instantiate the model
model = RNN(input_size, hidden_size, num_layers, output_size)

# Print model architecture
print(model)

RNN(
  (rnn): RNN(1, 64, num_layers=2, batch_first=True)
  (fc): Linear(in_features=64, out_features=1, bias=True)
)


In [11]:
import torch.optim as optim

# Define the loss function
criterion = nn.MSELoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 50
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_train)
    loss = criterion(outputs, y_train)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Calculate training accuracy
    train_accuracy = torch.mean(torch.abs(outputs - y_train)).item()

    # Evaluate on test set
    with torch.no_grad():
        model.eval()  # Set the model to evaluation mode
        test_outputs = model(X_test)
        test_accuracy = torch.mean(torch.abs(test_outputs - y_test)).item()

    # Print progress
    print(f'Epoch [{epoch+1}/{num_epochs}], Training Accuracy (MSE): {train_accuracy:.4f}, Test Accuracy (MSE): {test_accuracy:.4f}')

Epoch [1/50], Training Accuracy (MSE): 0.5085, Test Accuracy (MSE): 0.2251
Epoch [2/50], Training Accuracy (MSE): 0.4084, Test Accuracy (MSE): 0.1516
Epoch [3/50], Training Accuracy (MSE): 0.3391, Test Accuracy (MSE): 0.1450
Epoch [4/50], Training Accuracy (MSE): 0.2884, Test Accuracy (MSE): 0.1776
Epoch [5/50], Training Accuracy (MSE): 0.2588, Test Accuracy (MSE): 0.2493
Epoch [6/50], Training Accuracy (MSE): 0.2483, Test Accuracy (MSE): 0.3387
Epoch [7/50], Training Accuracy (MSE): 0.2582, Test Accuracy (MSE): 0.3713
Epoch [8/50], Training Accuracy (MSE): 0.2657, Test Accuracy (MSE): 0.3552
Epoch [9/50], Training Accuracy (MSE): 0.2566, Test Accuracy (MSE): 0.3104
Epoch [10/50], Training Accuracy (MSE): 0.2391, Test Accuracy (MSE): 0.2538
Epoch [11/50], Training Accuracy (MSE): 0.2249, Test Accuracy (MSE): 0.2010
Epoch [12/50], Training Accuracy (MSE): 0.2179, Test Accuracy (MSE): 0.1679
Epoch [13/50], Training Accuracy (MSE): 0.2166, Test Accuracy (MSE): 0.1460
Epoch [14/50], Traini

This is our best model so far. The RNN model is able to capture the complex relationship between the sequenced data that is fed into it.

### LSTM

----

Long Short-Term Memory (LSTM) networks are a type of RNN designed to address the vanishing gradient problem and capture long-term dependencies in sequential data.

Unlike traditional RNNs, LSTMs have specialized memory cells with self-gating mechanisms that regulate the flow of information over time. These memory cells maintain a constant error flow, allowing them to retain information over long sequences without suffering from vanishing gradients.

By selectively updating and forgetting information, LSTMs can effectively capture dependencies across multiple time steps, making them well-suited for our task involving sequential data for sea temperature prediction.

In [12]:
import torch.optim as optim
import torch.nn as nn

# Define the LSTM model
class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out

# Example usage:
# Define input parameters
input_size = 1  # Number of features in input data (e.g., time series)
hidden_size = 64  # Number of hidden units in the LSTM
num_layers = 2  # Number of LSTM layers
output_size = 1  # Number of features in output data (e.g., regression target)

# Instantiate the model
model = LSTM(input_size, hidden_size, num_layers, output_size)

# Print model architecture
print(model)

LSTM(
  (lstm): LSTM(1, 64, num_layers=2, batch_first=True)
  (fc): Linear(in_features=64, out_features=1, bias=True)
)


In [13]:
# Define the loss function
criterion = nn.MSELoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 50
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_train)
    loss = criterion(outputs, y_train)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Calculate training accuracy
    train_accuracy = torch.mean(torch.abs(outputs - y_train)).item()

    # Evaluate on test set
    with torch.no_grad():
        model.eval()  # Set the model to evaluation mode
        test_outputs = model(X_test)
        test_accuracy = torch.mean(torch.abs(test_outputs - y_test)).item()

    # Print progress
    print(f'Epoch [{epoch+1}/{num_epochs}], Training MSE: {train_accuracy:.4f}, Test MSE: {test_accuracy:.4f}')

Epoch [1/50], Training MSE: 0.5295, Test MSE: 0.3262
Epoch [2/50], Training MSE: 0.5063, Test MSE: 0.3030
Epoch [3/50], Training MSE: 0.4840, Test MSE: 0.2795
Epoch [4/50], Training MSE: 0.4619, Test MSE: 0.2553
Epoch [5/50], Training MSE: 0.4407, Test MSE: 0.2303
Epoch [6/50], Training MSE: 0.4210, Test MSE: 0.2051
Epoch [7/50], Training MSE: 0.4023, Test MSE: 0.1844
Epoch [8/50], Training MSE: 0.3840, Test MSE: 0.1670
Epoch [9/50], Training MSE: 0.3653, Test MSE: 0.1543
Epoch [10/50], Training MSE: 0.3471, Test MSE: 0.1483
Epoch [11/50], Training MSE: 0.3295, Test MSE: 0.1486
Epoch [12/50], Training MSE: 0.3111, Test MSE: 0.1503
Epoch [13/50], Training MSE: 0.2921, Test MSE: 0.1600
Epoch [14/50], Training MSE: 0.2772, Test MSE: 0.1795
Epoch [15/50], Training MSE: 0.2644, Test MSE: 0.2077
Epoch [16/50], Training MSE: 0.2549, Test MSE: 0.2522
Epoch [17/50], Training MSE: 0.2494, Test MSE: 0.3053
Epoch [18/50], Training MSE: 0.2514, Test MSE: 0.3384
Epoch [19/50], Training MSE: 0.2563, 

From the above, we observe that the baseline LSTM model performs worse than the RNN.

Hence, we decided to increase the model complexity by introducing more layers, normalization, and bidirectionality (and dropout layers to prevent overfitting).
Bidirectionality refers to the ability of the LSTM model to process input sequences in both forward and backward directions. By incorporating information from both past and future contexts, we can now capture dependencies that may not be evident in a unidirectional model.

In [14]:
# This code is from ChatGPT

class ComplexLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size, bidirectional=True, dropout=0.0):
        super(ComplexLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.bidirectional = bidirectional

        # LSTM layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=bidirectional)

        # Fully connected layer
        self.fc = nn.Linear(hidden_size * (2 if bidirectional else 1), output_size)

        # Layer normalization
        self.layer_norm = nn.LayerNorm(hidden_size * (2 if bidirectional else 1))

        # Dropout
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Initialize hidden state and cell state
        h0 = torch.zeros(self.num_layers * (2 if self.bidirectional else 1), x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers * (2 if self.bidirectional else 1), x.size(0), self.hidden_size).to(x.device)

        # LSTM layer
        lstm_out, _ = self.lstm(x, (h0, c0))

        # Apply layer normalization
        lstm_out = self.layer_norm(lstm_out)

        # Apply dropout
        lstm_out = self.dropout(lstm_out)

        # Decode the hidden state of the last time step
        out = self.fc(lstm_out[:, -1, :])
        return out

# Define input parameters
input_size = 1  # Number of features in input data
hidden_size = 64  # Number of hidden units in the LSTM
num_layers = 2  # Number of LSTM layers
output_size = 1  # Number of features in output data

# Instantiate the model
model = ComplexLSTM(input_size, hidden_size, num_layers, output_size, bidirectional=True, dropout=0.2)

# Print model architecture
print(model)

ComplexLSTM(
  (lstm): LSTM(1, 64, num_layers=2, batch_first=True, bidirectional=True)
  (fc): Linear(in_features=128, out_features=1, bias=True)
  (layer_norm): LayerNorm((128,), eps=1e-05, elementwise_affine=True)
  (dropout): Dropout(p=0.2, inplace=False)
)


In [15]:
# Define the loss function
criterion = nn.MSELoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 50
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_train)
    loss = criterion(outputs, y_train)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Calculate training accuracy
    train_accuracy = torch.mean(torch.abs(outputs - y_train)).item()

    # Evaluate on test set
    with torch.no_grad():
        model.eval()  # Set the model to evaluation mode
        test_outputs = model(X_test)
        test_accuracy = torch.mean(torch.abs(test_outputs - y_test)).item()

    # Print progress
    print(f'Epoch [{epoch+1}/{num_epochs}], Training MSE: {train_accuracy:.4f}, Test MSE: {test_accuracy:.4f}')

Epoch [1/50], Training MSE: 0.4246, Test MSE: 0.1859
Epoch [2/50], Training MSE: 0.4080, Test MSE: 0.1549
Epoch [3/50], Training MSE: 0.2991, Test MSE: 0.2640
Epoch [4/50], Training MSE: 0.2401, Test MSE: 0.3773
Epoch [5/50], Training MSE: 0.2566, Test MSE: 0.3184
Epoch [6/50], Training MSE: 0.2144, Test MSE: 0.1662
Epoch [7/50], Training MSE: 0.1393, Test MSE: 0.0625
Epoch [8/50], Training MSE: 0.1277, Test MSE: 0.1070
Epoch [9/50], Training MSE: 0.1763, Test MSE: 0.1148
Epoch [10/50], Training MSE: 0.1672, Test MSE: 0.0526
Epoch [11/50], Training MSE: 0.0864, Test MSE: 0.0565
Epoch [12/50], Training MSE: 0.0386, Test MSE: 0.1285
Epoch [13/50], Training MSE: 0.1211, Test MSE: 0.1421
Epoch [14/50], Training MSE: 0.1444, Test MSE: 0.0930
Epoch [15/50], Training MSE: 0.1063, Test MSE: 0.0525
Epoch [16/50], Training MSE: 0.0725, Test MSE: 0.0866
Epoch [17/50], Training MSE: 0.0867, Test MSE: 0.1284
Epoch [18/50], Training MSE: 0.1183, Test MSE: 0.1202
Epoch [19/50], Training MSE: 0.1115, 

Our model performance improves significantly than the baseline. However, we notice that there is some bouncing around during the convergence. Our test MSE in the final epoch is actually higher than some previous epochs.

Notice that this is our best overall model now. We made a marginal improvement over our RNN model in the previous section.

## Random Forest

---

In the next phase, we wanted used Random Forest as a means to get feature importance for the other variables in our dataset. It would also work as a baseline comparison to our MLP model, to verify whether using an MLP (and RNN/LSTM) model is necessary or useful.

In [None]:
# Checking dataframe
sea_variables.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 345646 entries, 0 to 345645
Data columns (total 12 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   month                     345646 non-null  int64  
 1   day                       345646 non-null  int64  
 2   timestamp                 345646 non-null  object 
 3   avg_sea_surface_temp      345646 non-null  float64
 4   avg_wind_direction_true   200954 non-null  float64
 5   avg_wind_speed            200622 non-null  float64
 6   avg_visibility            18775 non-null   float64
 7   avg_sea_level_pressure    198507 non-null  float64
 8   avg_air_temperature       204296 non-null  float64
 9   avg_wetbulb_temperature   15669 non-null   float64
 10  avg_dewpoint_temperature  141826 non-null  float64
 11  avg_total_cloud_amount    11877 non-null   float64
dtypes: float64(9), int64(2), object(1)
memory usage: 31.6+ MB


In [None]:
# Checking for missing values
sea_variables.isna().sum() / len(sea_variables)

month                       0.000000
day                         0.000000
timestamp                   0.000000
avg_sea_surface_temp        0.000000
avg_wind_direction_true     0.418613
avg_wind_speed              0.419574
avg_visibility              0.945681
avg_sea_level_pressure      0.425693
avg_air_temperature         0.408944
avg_wetbulb_temperature     0.954667
avg_dewpoint_temperature    0.589678
avg_total_cloud_amount      0.965638
dtype: float64

In [None]:
# Dropping timestamp column
sea_variables = sea_variables.drop(columns=["timestamp"])

In [None]:
# Aggregating by month and day
day_agg_df = sea_variables.groupby(["month", "day"]).mean().reset_index()

In [None]:
# Rechecking for missing values
day_agg_df.isna().sum() / len(day_agg_df)

month                       0.0
day                         0.0
avg_sea_surface_temp        0.0
avg_wind_direction_true     0.0
avg_wind_speed              0.0
avg_visibility              0.0
avg_sea_level_pressure      0.0
avg_air_temperature         0.0
avg_wetbulb_temperature     0.0
avg_dewpoint_temperature    0.0
avg_total_cloud_amount      0.0
dtype: float64

In [None]:
# Dropping columns for modeling
day_agg_df = day_agg_df.drop(columns=["month", "day"])

In [None]:
# Lagging the data by one day so today's features are predicting tomorrow's temperature
day_agg_df["avg_sea_surface_temp"] = day_agg_df["avg_sea_surface_temp"].shift(periods=1)

In [None]:
# Drop missing values
day_agg_df = day_agg_df.dropna()

In [None]:
from sklearn.model_selection import train_test_split

# Splitting dataset for training and testing
X = day_agg_df.drop("avg_sea_surface_temp", axis=1)
y = day_agg_df["avg_sea_surface_temp"].copy()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_absolute_error

# Preprocessing
std_scaler = StandardScaler()
forest_reg = make_pipeline(std_scaler, RandomForestRegressor(random_state=42))

# Fitting Random Forest model
forest_reg.fit(X_train, y_train)
y_train_predictions = forest_reg.predict(X_train)
forest_mae = mean_absolute_error(y_train, y_train_predictions)

print(f"The training data MAE is {forest_mae} or about {(forest_mae/y_train.mean()*100):.2f}% error")

The training data MAE is 0.17013065637858463 or about 0.84% error


In [None]:
# Testing model
y_test_predictions = forest_reg.predict(X_test)
forest_test_mae = mean_absolute_error(y_test, y_test_predictions)

print(f"The test data MAE is {forest_test_mae} or about {(forest_test_mae/y_test.mean()*100):.2f}% error")

The test data MAE is 0.4447973378657917 or about 2.22% error


In [None]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# Performing Random Search
param_distribs = {'randomforestregressor__max_depth': randint(low=1, high=50),
                  'randomforestregressor__min_samples_leaf': randint(low=1, high=20)}

rnd_search = RandomizedSearchCV(
    forest_reg, param_distributions=param_distribs, n_iter=50, cv=3,
    scoring='neg_mean_absolute_error', random_state=42)

rnd_search.fit(X_train, y_train)

In [None]:
# Print results of random search
rnd_res = pd.DataFrame(rnd_search.cv_results_)
rnd_res.sort_values(by="mean_test_score", ascending=False, inplace=True)
rnd_res.filter(regex = '(^param_|mean_test_score)', axis=1).head(10)

Unnamed: 0,param_randomforestregressor__max_depth,param_randomforestregressor__min_samples_leaf,mean_test_score
15,44,3,-0.451236
5,40,3,-0.451236
46,35,3,-0.451236
6,22,2,-0.45393
20,9,2,-0.454246
42,9,1,-0.454271
40,45,1,-0.455363
25,17,4,-0.455672
4,11,4,-0.455672
28,42,4,-0.455672


We observe that a `max depth` of 35 and `min samples leaf` of 3 gave us the best results from our random search.

A max depth of 40 and 44 also give us the same results, but we choose the simpler (less complex) model to hopefully make it more generalizable and less prone to overfit.

In [None]:
# Get the best estimator from the Random Search
best_pipeline = rnd_search.best_estimator_

# Get the last estimator (RandomForestRegressor) from the pipeline
best_rf = best_pipeline.named_steps['randomforestregressor']

# Get feature importances
feature_importances = best_rf.feature_importances_

# Create a dictionary mapping feature names to importance scores
feature_importance_dict = dict(zip(X.columns, feature_importances))

# Sort the dictionary by importance scores (optional)
sorted_feature_importance = dict(sorted(feature_importance_dict.items(), key=lambda item: item[1], reverse=True))

# Print or visualize feature importance
print(sorted_feature_importance)

{'avg_air_temperature': 0.8208387869343166, 'avg_dewpoint_temperature': 0.10440176338677022, 'avg_wetbulb_temperature': 0.028733797098179973, 'avg_visibility': 0.014904682783444116, 'avg_wind_speed': 0.01101926903110611, 'avg_sea_level_pressure': 0.008243619775776687, 'avg_total_cloud_amount': 0.007772792837108387, 'avg_wind_direction_true': 0.004085288153297883}


From the feature importances above, we observe that only `average air temperature` is a good predictor of average sea temperatures. The other variables in our model add marginal predictive power for our target variable.

## Multivariable RNN and LSTM

---

In our next phase, we wanted to add average air temperature to our model to see whether it would improve our predictive ability.

### RNN

---

In [None]:
# Dropping additional columns based on feature importances
sea_variables.drop(columns = ['avg_wind_direction_true',	'avg_wind_speed',
                        'avg_visibility',	'avg_sea_level_pressure',	'avg_wetbulb_temperature',
                              'avg_dewpoint_temperature',	'avg_total_cloud_amount'], inplace = True)

In [None]:
# Group by month and day
sea_variables = sea_variables.groupby(['month', 'day']).agg('mean').reset_index()

In [None]:
from sklearn.preprocessing import MinMaxScaler
# Scaling the data

X = sea_variables[['avg_sea_surface_temp', 'avg_air_temperature']]

scaler = MinMaxScaler()

scaled_data = scaler.fit_transform(X)

# Splitting the data
split_ratio = 0.8
train_size = int(len(sea_temp) * split_ratio)

In [None]:
# Creating sequences
sequence_length = 7

def create_sequences(data, sequence_length):

  total_sequence=[]
  prices = []

  for day in range(0, (len(data) - sequence_length)):
    start_index = day
    end_index = sequence_length + day
    day_sequence = data[start_index:end_index]
    day_price = data[end_index]

    total_sequence.append(day_sequence)
    prices.append(day_price)
  return np.array(total_sequence), np.array(prices)

sequences = create_sequences(scaled_data, sequence_length)
X = sequences[0]
y = sequences[1]

In [None]:
import torch

# Split the data into training and testing sets
train_size = int(X.shape[0]* split_ratio)
X_train = torch.tensor(X[:train_size]).float()
y_train = torch.tensor(y[:train_size]).float()
X_test = torch.tensor(X[train_size:]).float()
y_test = torch.tensor(y[train_size:]).float()

In [None]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

(torch.Size([286, 7, 2]),
 torch.Size([72, 7, 2]),
 torch.Size([286, 2]),
 torch.Size([72, 2]))

In [None]:
import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)

        # Forward propagate RNN
        out, _ = self.rnn(x, h0)

        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out

# Example usage:
# Define input parameters
input_size = 2  # Number of features in input data (e.g., time series)
hidden_size = 64  # Number of hidden units in the RNN
num_layers = 2 # Number of RNN layers
output_size = 2  # Number of features in output data (e.g., regression target)

# Instantiate the model
model = RNN(input_size, hidden_size, num_layers, output_size)

# Print model architecture
print(model)

RNN(
  (rnn): RNN(2, 64, num_layers=2, batch_first=True)
  (fc): Linear(in_features=64, out_features=2, bias=True)
)


In [None]:
import torch.optim as optim

# Define the loss function
criterion = nn.MSELoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 50
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_train)
    loss = criterion(outputs, y_train)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Calculate training accuracy (MAE)
    train_accuracy = torch.mean(torch.abs(outputs - y_train)).item()

    # Evaluate on test set
    with torch.no_grad():
        model.eval()  # Set the model to evaluation mode
        test_outputs = model(X_test)
        test_accuracy = torch.mean(torch.abs(test_outputs - y_test)).item()

    # Print progress
    print(f'Epoch [{epoch+1}/{num_epochs}], Training Accuracy (MSE): {train_accuracy:.4f}, Test Accuracy (MSE): {test_accuracy:.4f}')

Epoch [1/50], Training Accuracy (MSE): 0.6655, Test Accuracy (MSE): 0.1124
Epoch [2/50], Training Accuracy (MSE): 0.2071, Test Accuracy (MSE): 1.0762
Epoch [3/50], Training Accuracy (MSE): 0.9787, Test Accuracy (MSE): 0.2670
Epoch [4/50], Training Accuracy (MSE): 0.2979, Test Accuracy (MSE): 0.2848
Epoch [5/50], Training Accuracy (MSE): 0.3957, Test Accuracy (MSE): 0.2402
Epoch [6/50], Training Accuracy (MSE): 0.3638, Test Accuracy (MSE): 0.1358
Epoch [7/50], Training Accuracy (MSE): 0.2823, Test Accuracy (MSE): 0.1252
Epoch [8/50], Training Accuracy (MSE): 0.2107, Test Accuracy (MSE): 0.2691
Epoch [9/50], Training Accuracy (MSE): 0.2230, Test Accuracy (MSE): 0.3305
Epoch [10/50], Training Accuracy (MSE): 0.2482, Test Accuracy (MSE): 0.2475
Epoch [11/50], Training Accuracy (MSE): 0.2181, Test Accuracy (MSE): 0.1279
Epoch [12/50], Training Accuracy (MSE): 0.2053, Test Accuracy (MSE): 0.1130
Epoch [13/50], Training Accuracy (MSE): 0.2335, Test Accuracy (MSE): 0.0985
Epoch [14/50], Traini

We observe that our multivariate RNN model is actually slightly worse than our univariate RNN model. We did not find the expected improvement in predictive power.

### LSTM

---

In [None]:
import torch.optim as optim
import torch.nn as nn

# Define the LSTM model
class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out

# Example usage:
# Define input parameters
input_size = 2  # Number of features in input data (e.g., time series)
hidden_size = 64  # Number of hidden units in the LSTM
num_layers = 2  # Number of LSTM layers
output_size = 2  # Number of features in output data (e.g., regression target)

# Instantiate the model
model = LSTM(input_size, hidden_size, num_layers, output_size)

# Print model architecture
print(model)

LSTM(
  (lstm): LSTM(2, 64, num_layers=2, batch_first=True)
  (fc): Linear(in_features=64, out_features=2, bias=True)
)


In [None]:
# Define the loss function
criterion = nn.MSELoss()

# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 50
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_train)
    loss = criterion(outputs, y_train)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Calculate training accuracy
    train_accuracy = torch.mean(torch.abs(outputs - y_train)).item()

    # Evaluate on test set
    with torch.no_grad():
        model.eval()  # Set the model to evaluation mode
        test_outputs = model(X_test)
        test_accuracy = torch.mean(torch.abs(test_outputs - y_test)).item()

    # Print progress
    print(f'Epoch [{epoch+1}/{num_epochs}], Training MSE: {train_accuracy:.4f}, Test MSE: {test_accuracy:.4f}')

Epoch [1/50], Training MSE: 0.6287, Test MSE: 0.3574
Epoch [2/50], Training MSE: 0.4795, Test MSE: 0.1719
Epoch [3/50], Training MSE: 0.2684, Test MSE: 0.7042
Epoch [4/50], Training MSE: 0.6153, Test MSE: 0.2377
Epoch [5/50], Training MSE: 0.2104, Test MSE: 0.1717
Epoch [6/50], Training MSE: 0.2657, Test MSE: 0.2335
Epoch [7/50], Training MSE: 0.3339, Test MSE: 0.2534
Epoch [8/50], Training MSE: 0.3569, Test MSE: 0.2502
Epoch [9/50], Training MSE: 0.3524, Test MSE: 0.2334
Epoch [10/50], Training MSE: 0.3317, Test MSE: 0.2072
Epoch [11/50], Training MSE: 0.3014, Test MSE: 0.1766
Epoch [12/50], Training MSE: 0.2669, Test MSE: 0.1468
Epoch [13/50], Training MSE: 0.2295, Test MSE: 0.1374
Epoch [14/50], Training MSE: 0.2001, Test MSE: 0.1962
Epoch [15/50], Training MSE: 0.1947, Test MSE: 0.2580
Epoch [16/50], Training MSE: 0.2177, Test MSE: 0.2739
Epoch [17/50], Training MSE: 0.2308, Test MSE: 0.2476
Epoch [18/50], Training MSE: 0.2160, Test MSE: 0.1967
Epoch [19/50], Training MSE: 0.1904, 

Similar to the case with RNN, the multivariate LSTM model did not perform better than the univariate model.

These results are interesting and perhaps they would change if we had more data or domain expertise to fine tune the models.

### WandB Multivariable LSTM

---

Next, we wanted to perform a grid search over the hyperparameters and log the results in WandB for tracking the experiments.

[WandB Report](https://api.wandb.ai/links/ba-865/rvp6vcr0)

In [None]:
# Install WandB
%%capture
!pip install wandb

In [None]:
import wandb
from wandb.keras import WandbMetricsLogger
import torch
from sklearn.preprocessing import MinMaxScaler
import torch.nn as nn
import torch.optim as optim

# Scaling the data
X = sea_variables[['avg_sea_surface_temp', 'avg_air_temperature']]

scaler = MinMaxScaler()

scaled_data = scaler.fit_transform(X)

# Splitting the data
split_ratio = 0.8
train_size = int(len(sea_temp) * split_ratio)

# Creating sequences
sequence_length = 7

def create_sequences(data, sequence_length):

  total_sequence=[]
  prices = []

  for day in range(0, (len(data) - sequence_length)):
    start_index = day
    end_index = sequence_length + day
    day_sequence = data[start_index:end_index]
    day_price = data[end_index]

    total_sequence.append(day_sequence)
    prices.append(day_price)
  return np.array(total_sequence), np.array(prices)

sequences = create_sequences(scaled_data, sequence_length)
X = sequences[0]
y = sequences[1]

# Split the data into training and testing sets
train_size = int(X.shape[0]* split_ratio)
X_train = torch.tensor(X[:train_size]).float()
y_train = torch.tensor(y[:train_size]).float()
X_test = torch.tensor(X[train_size:]).float()
y_test = torch.tensor(y[train_size:]).float()


# Step 1: Define your sweep configs here
sweep_configs = {
    "name": "BA 865 Project",
    "method": "grid",
    "metric": {"goal": "minimize", "name": "test_accuracy"},
    "parameters": {
        "learning_rate": {"values": [1e-1, 1e-2, 1e-3]},
        "momentum": {"values": [0.85, 0.9, 0.95]},
        "epochs": {"values": [20, 50, 100]}},
    }

# Step 2: Write a function that contains the code necessary for running a single experiment.
def main():
    # 2.1 Define the model
  class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out

  # Define input parameters
  input_size = 2  # Number of features in input data (e.g., time series)
  hidden_size = 64  # Number of hidden units in the LSTM
  num_layers = 2  # Number of LSTM layers
  output_size = 2  # Number of features in output data (e.g., regression target)

  # Instantiate the model
  model = LSTM(input_size, hidden_size, num_layers, output_size)

    # 2.2 Compile the model
  def train_model(config=None):
    with wandb.init(config=config):
      config = wandb.config
      # Define the loss function
      criterion = nn.MSELoss()

      # Define the optimizer
      optimizer = optim.Adam(model.parameters(), lr=config.learning_rate)

      # Training loop
      for epoch in range(config.epochs):
          # Forward pass
          outputs = model(X_train)
          loss = criterion(outputs, y_train)

          # Backward pass and optimization
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()

          # Calculate training accuracy
          train_accuracy = torch.mean(torch.abs(outputs - y_train)).item()
          wandb.log({"train_loss": loss, "train_accuracy": train_accuracy})


          # Evaluate on test set
          with torch.no_grad():
              model.eval()  # Set the model to evaluation mode
              test_outputs = model(X_test)
              test_loss = criterion(test_outputs, y_test)
              test_accuracy = torch.mean(torch.abs(test_outputs - y_test)).item()
              wandb.log({"test_loss": test_loss, "test_accuracy": test_accuracy})

              wandb.run.summary["test_accuracy"] = test_accuracy

# Step 3: Initialize sweep by passing in the config.
  sweep_id = wandb.sweep(sweep_configs, project = 'project LSTM')
  wandb.agent(sweep_id, function=train_model)

# Step 4: Start sweep job.
main()

<IPython.core.display.Javascript object>

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Create sweep with ID: 99jb6ut9
Sweep URL: https://wandb.ai/ba-865/project%20LSTM/sweeps/99jb6ut9


[34m[1mwandb[0m: Agent Starting Run: metf66z5 with config:
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	momentum: 0.85
[34m[1mwandb[0m: Currently logged in as: [33msetushah[0m ([33mba-865[0m). Use [1m`wandb login --relogin`[0m to force relogin


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,█▁▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁
test_loss,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_accuracy,▂█▁▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁
train_loss,▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.32529
test_loss,0.1259
train_accuracy,0.30349
train_loss,0.14403


[34m[1mwandb[0m: Agent Starting Run: wzj4k2ep with config:
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	momentum: 0.9


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▁▅▅▆██▆▃▁▂▃▄▄▄▄▄▄▃▂▂
test_loss,▁▃▅▇█▇▄▂▁▂▃▅▅▅▄▃▃▂▂▂
train_accuracy,▆▅█▅▂▁▂▂▂▂▃▄▃▂▁▁▁▁▁▂
train_loss,█▅▆▃▃▃▃▃▂▂▂▂▂▂▁▁▁▂▁▁

0,1
test_accuracy,0.14304
test_loss,0.03484
train_accuracy,0.24943
train_loss,0.08253


[34m[1mwandb[0m: Agent Starting Run: 3fyop2q3 with config:
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	momentum: 0.95


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,█▃▁▂▂▁▂▄▅▄▃▂▁▁▁▁▁▂▂▃
test_loss,█▂▁▂▁▁▂▃▄▄▃▂▁▁▁▁▁▁▂▂
train_accuracy,▂█▁▄▇▅▂▁▁▂▂▁▁▂▃▃▂▂▁▁
train_loss,▁█▁▂▅▄▂▁▂▃▃▂▁▁▂▂▂▁▁▁

0,1
test_accuracy,0.21181
test_loss,0.06779
train_accuracy,0.24484
train_loss,0.08577


[34m[1mwandb[0m: Agent Starting Run: rzk9ba6b with config:
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	momentum: 0.85


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,█▅▃▁▁▁▁▁▂▃▃▄▅▅▅▅▅▄▄▃
test_loss,█▅▃▂▁▁▁▂▂▃▄▅▅▆▆▅▅▄▄▃
train_accuracy,▂▁▂▄▆▇██▇▆▄▃▃▂▂▂▂▂▂▃
train_loss,█▄▁▁▂▃▃▃▂▁▁▁▁▁▂▂▂▁▁▁

0,1
test_accuracy,0.1487
test_loss,0.03652
train_accuracy,0.24751
train_loss,0.08128


[34m[1mwandb[0m: Agent Starting Run: 4d997izo with config:
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	momentum: 0.9


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,█▃▂▃▃▂▃▄▄▃▂▁▁▂▃▃▃▃▃▂
test_loss,█▄▂▁▁▃▄▅▆▅▄▃▂▂▂▃▄▄▅▄
train_accuracy,▃█▄▂▁▁▂▄▅▅▅▄▃▂▂▂▂▃▄▄
train_loss,▁█▂▂▅▃▁▁▂▃▂▁▁▂▂▂▁▁▁▂

0,1
test_accuracy,0.14948
test_loss,0.0386
train_accuracy,0.2492
train_loss,0.08128


[34m[1mwandb[0m: Agent Starting Run: 3290n2sn with config:
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	momentum: 0.95


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▃▂▆█▆▄▂▂▁▁▂▃▅▅▅▄▃▂▁▁
test_loss,▁▃▆█▇▅▄▃▂▃▄▅▆▆▆▅▄▃▃▃
train_accuracy,▆▁▃▆█▇▅▃▂▂▃▄▅▆▆▅▄▃▃▃
train_loss,▂█▂▂▅▃▁▁▃▃▂▁▁▂▂▁▁▁▂▂

0,1
test_accuracy,0.14875
test_loss,0.03656
train_accuracy,0.24804
train_loss,0.08127


[34m[1mwandb[0m: Agent Starting Run: fk7qbouk with config:
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	momentum: 0.85


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▂▃▅▇███▇▅▄▂▁▁▁▂▄▅▆▇█
test_loss,▁▄▇██▇▆▄▃▂▁▁▁▂▃▄▅▆▆▆
train_accuracy,▁▄▇█▇▆▅▃▃▃▃▄▅▆▆▇▆▆▅▄
train_loss,█▂▁▃▄▃▂▁▁▂▂▂▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14977
test_loss,0.03774
train_accuracy,0.2485
train_loss,0.0812


[34m[1mwandb[0m: Agent Starting Run: exzwzcjp with config:
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	momentum: 0.9


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▁▃▆██▆▄▃▃▄▅▆▇▇▆▅▄▄▄▄
test_loss,▁▃▆██▆▄▃▃▄▅▆▇▇▆▅▄▄▄▅
train_accuracy,▃█▅▂▁▂▃▅▆▆▅▄▃▂▂▃▄▄▅▅
train_loss,▂█▂▂▄▃▁▁▃▃▂▁▁▂▂▁▁▁▁▂

0,1
test_accuracy,0.1491
test_loss,0.0374
train_accuracy,0.24863
train_loss,0.08119


[34m[1mwandb[0m: Agent Starting Run: vyxp1jwx with config:
[34m[1mwandb[0m: 	epochs: 20
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	momentum: 0.95


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,█▅▂▁▂▃▄▆▆▆▅▄▃▃▃▃▄▄▅▅
test_loss,█▅▂▁▁▃▄▆▆▆▅▄▃▃▂▃▃▄▅▅
train_accuracy,▆▁▄▇█▇▆▄▃▃▃▄▆▆▇▆▆▅▄▄
train_loss,▁█▁▂▅▄▁▁▂▃▂▁▁▁▂▂▁▁▁▁

0,1
test_accuracy,0.14965
test_loss,0.03765
train_accuracy,0.24845
train_loss,0.08119


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: hfwgvtua with config:
[34m[1mwandb[0m: 	epochs: 50
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	momentum: 0.85


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▄▁▅█▄▁▁▂▁▁▂▃▄▃▂▁▁▁▁▁▂▃▃▂▁▁▁▁▂▂▂▂▁▁▁▁▁▂▂▂
test_loss,▄▁▅█▄▂▁▁▁▁▂▃▄▃▂▁▁▁▁▁▂▃▃▂▁▁▁▁▂▂▂▂▂▁▁▁▂▂▂▂
train_accuracy,▁█▂▂▃▁▁▂▄▃▂▁▁▁▁▁▂▂▂▂▁▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁
train_loss,▁█▁▂▄▂▁▂▃▂▁▁▂▂▁▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.1591
test_loss,0.04255
train_accuracy,0.24625
train_loss,0.08194


[34m[1mwandb[0m: Agent Starting Run: z8j4p3zi with config:
[34m[1mwandb[0m: 	epochs: 50
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	momentum: 0.9


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▄▁▅█▄▁▁▂▁▁▃▄▄▂▁▁▁▁▁▂▃▃▂▂▁▁▁▁▂▂▂▂▁▁▁▁▂▂▂▁
test_loss,▄▁▅█▃▁▁▁▁▁▃▄▄▂▂▁▁▁▁▂▃▃▂▂▁▁▁▁▂▂▂▂▁▁▁▁▂▂▂▂
train_accuracy,▁█▂▂▃▁▁▃▃▂▁▁▁▁▁▁▂▂▂▁▁▁▁▁▁▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁
train_loss,▁█▁▃▄▂▁▂▃▂▁▁▂▂▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14947
test_loss,0.03755
train_accuracy,0.24716
train_loss,0.08144


[34m[1mwandb[0m: Agent Starting Run: nw5l5k1b with config:
[34m[1mwandb[0m: 	epochs: 50
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	momentum: 0.95


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▄▁▅█▄▁▁▂▁▁▂▃▄▃▂▁▁▁▁▁▂▃▃▂▁▁▁▁▂▂▂▂▁▁▁▁▁▂▂▂
test_loss,▄▁▅█▄▂▁▁▁▁▂▃▄▃▂▁▁▁▁▁▂▃▃▂▁▁▁▁▂▂▂▂▂▁▁▁▁▂▂▂
train_accuracy,▁█▂▂▃▁▁▂▄▃▂▁▁▁▁▁▂▂▂▂▁▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁
train_loss,▁█▁▂▄▂▁▂▃▂▁▁▂▂▁▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.15927
test_loss,0.04264
train_accuracy,0.24625
train_loss,0.08195


[34m[1mwandb[0m: Agent Starting Run: quv4spgp with config:
[34m[1mwandb[0m: 	epochs: 50
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	momentum: 0.85


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▁▁▄▇█▆▄▃▃▅▆▇▆▅▄▃▅▆▆▆▅▄▄▄▅▆▆▅▄▄▅▅▆▅▅▅▅▅▅▅
test_loss,▁▁▄▇█▆▄▃▃▅▆▇▆▅▄▃▅▆▆▆▅▄▄▄▅▆▆▅▄▄▄▅▅▅▅▅▅▅▅▅
train_accuracy,▁▇█▅▂▂▃▅▆▆▄▃▃▄▄▅▅▅▄▃▄▄▅▅▄▄▄▄▄▅▅▅▄▄▄▄▅▅▄▄
train_loss,█▅▇▂▅▄▁▁▃▂▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14989
test_loss,0.03776
train_accuracy,0.24839
train_loss,0.0812


[34m[1mwandb[0m: Agent Starting Run: 0ydgmyh5 with config:
[34m[1mwandb[0m: 	epochs: 50
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	momentum: 0.9


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▁▃▆█▆▄▃▂▃▄▅▆▆▅▅▄▃▃▄▄▅▅▅▅▄▄▄▄▅▅▅▅▄▄▄▄▅▅▅▄
test_loss,▁▃▇█▆▄▃▂▃▄▅▆▆▅▅▄▃▄▄▅▅▅▅▅▄▄▄▄▅▅▅▅▄▄▄▄▅▅▅▅
train_accuracy,▃█▄▂▁▂▃▅▅▄▃▃▂▂▂▃▄▄▄▄▃▂▂▃▃▄▄▄▃▃▃▃▃▃▃▃▃▃▃▃
train_loss,▁█▁▃▄▂▁▂▃▂▁▁▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14965
test_loss,0.03764
train_accuracy,0.24832
train_loss,0.0812


[34m[1mwandb[0m: Agent Starting Run: ojczo6ek with config:
[34m[1mwandb[0m: 	epochs: 50
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	momentum: 0.95


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▁▃▆█▆▄▃▂▃▄▅▆▆▅▅▄▃▃▄▄▅▅▅▅▄▄▄▄▅▅▅▅▄▄▄▄▄▅▅▅
test_loss,▁▃▇█▆▄▃▂▃▄▅▆▆▅▅▄▃▃▄▄▅▅▅▅▄▄▄▄▅▅▅▅▄▄▄▄▅▅▅▅
train_accuracy,▃█▄▂▁▂▃▅▅▅▄▃▂▂▂▃▄▄▄▄▃▂▂▂▃▄▄▄▃▃▃▃▃▃▃▃▃▃▃▃
train_loss,▁█▁▃▄▂▁▂▃▂▁▁▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14991
test_loss,0.03777
train_accuracy,0.24829
train_loss,0.0812


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 5i94wa8y with config:
[34m[1mwandb[0m: 	epochs: 50
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	momentum: 0.85


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▁▃▆█▆▄▃▃▅▆▇▇▅▄▄▄▆▆▆▆▄▄▄▅▆▆▅▅▅▅▅▅▆▅▅▅▅▅▅▅
test_loss,▁▃▆█▆▄▃▃▅▆▇▇▅▄▄▄▆▆▆▆▅▄▄▅▆▆▅▅▅▅▅▅▆▅▅▅▅▅▅▅
train_accuracy,▁█▆▃▁▃▅▆▅▄▂▂▃▄▅▅▄▃▃▃▄▄▅▄▃▃▃▃▄▄▄▄▃▃▄▄▄▄▃▃
train_loss,▄█▃▁▄▁▁▃▂▁▂▂▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.1493
test_loss,0.03746
train_accuracy,0.24853
train_loss,0.08119


[34m[1mwandb[0m: Agent Starting Run: tmwwtzhg with config:
[34m[1mwandb[0m: 	epochs: 50
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	momentum: 0.9


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▁▄▇█▆▅▄▃▃▄▅▆▆▆▅▅▄▄▄▅▆▆▆▅▅▄▄▄▅▅▅▅▅▅▅▅▅▅▅▅
test_loss,▁▄▇█▆▅▄▃▃▄▅▆▆▆▅▅▄▄▄▅▆▆▆▅▅▄▄▄▅▅▅▅▅▅▅▅▅▅▅▅
train_accuracy,▄█▅▂▁▂▄▅▆▅▄▃▂▂▃▃▅▅▅▄▃▃▃▃▄▄▄▄▄▄▃▃▃▄▄▄▄▄▄▃
train_loss,▁█▁▃▄▂▁▂▃▂▁▁▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14936
test_loss,0.03749
train_accuracy,0.24851
train_loss,0.08119


[34m[1mwandb[0m: Agent Starting Run: 7auxvxd0 with config:
[34m[1mwandb[0m: 	epochs: 50
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	momentum: 0.95


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▁▄▇█▆▅▃▃▄▅▆▆▆▆▅▄▄▄▅▅▆▆▆▅▄▄▄▅▅▅▅▅▅▅▅▅▅▅▅▅
test_loss,▁▄▇█▆▅▃▃▄▅▆▆▆▆▅▄▄▄▅▅▆▆▆▅▄▄▄▅▅▅▅▅▅▅▅▅▅▅▅▅
train_accuracy,▃█▅▂▁▂▄▅▆▅▄▃▂▂▃▄▅▅▅▄▃▃▃▃▄▄▄▄▄▃▃▃▄▄▄▄▄▄▃▃
train_loss,▁█▁▃▄▂▁▂▃▂▁▁▂▂▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.1493
test_loss,0.03746
train_accuracy,0.24852
train_loss,0.08119


[34m[1mwandb[0m: Agent Starting Run: cqj4c77j with config:
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	momentum: 0.85


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▇█▆▁▁▃▆▃▁▁▄▄▁▁▂▃▂▁▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂
test_loss,▇█▆▁▁▃▆▃▁▁▄▄▂▁▂▃▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂
train_accuracy,▁▄▆▂█▂▂▁▃▄▁▁▁▃▂▁▁▂▂▁▁▂▂▁▁▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁
train_loss,▁▂█▁▅▁▃▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14838
test_loss,0.03699
train_accuracy,0.24872
train_loss,0.0812


[34m[1mwandb[0m: Agent Starting Run: ff08wh7n with config:
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	momentum: 0.9


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,█▁▁▃▃▁▁▁▃▂▁▁▂▂▁▁▁▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
test_loss,█▁▁▃▃▁▁▁▂▂▁▁▂▂▁▁▁▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_accuracy,▁▁█▁▃▁▄▃▁▁▂▃▁▁▁▂▂▁▁▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train_loss,▁▂█▁▅▁▃▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.15015
test_loss,0.0379
train_accuracy,0.24834
train_loss,0.0812


[34m[1mwandb[0m: Agent Starting Run: qxhn2duf with config:
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	learning_rate: 0.1
[34m[1mwandb[0m: 	momentum: 0.95


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▇█▆▁▁▃▆▂▁▁▄▄▁▁▂▃▂▁▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂
test_loss,▇█▆▁▁▃▆▃▁▁▄▄▂▁▂▃▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂
train_accuracy,▁▄▆▂█▂▂▁▃▄▁▁▁▃▂▁▁▂▂▁▁▂▂▁▁▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁
train_loss,▁▂█▁▅▁▃▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14842
test_loss,0.03701
train_accuracy,0.24874
train_loss,0.0812


[34m[1mwandb[0m: Agent Starting Run: 5xwzbfl1 with config:
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	momentum: 0.85


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,█▁▂▅▄▂▂▃▄▃▂▂▄▃▂▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
test_loss,█▁▂▅▄▂▂▃▄▃▂▂▄▃▂▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
train_accuracy,▄▂█▃▁▃▆▅▂▂▅▅▃▃▄▄▄▃▄▄▄▃▄▄▄▃▄▄▃▄▄▄▄▄▄▄▄▄▄▄
train_loss,▁▂█▁▅▁▄▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14924
test_loss,0.03743
train_accuracy,0.24854
train_loss,0.08119


[34m[1mwandb[0m: Agent Starting Run: uu2paxig with config:
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	momentum: 0.9


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,█▁▁▄▅▂▁▂▄▄▂▂▃▃▃▂▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
test_loss,█▁▁▄▅▂▁▂▄▄▂▂▃▃▃▂▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
train_accuracy,▄▂█▃▁▃▆▅▂▂▄▅▄▃▃▄▄▃▃▄▄▃▃▄▄▃▄▄▄▃▄▄▄▄▄▄▄▄▄▄
train_loss,▁▂█▁▅▁▃▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14937
test_loss,0.0375
train_accuracy,0.24852
train_loss,0.08119


[34m[1mwandb[0m: Agent Starting Run: 7eq5qoan with config:
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	learning_rate: 0.01
[34m[1mwandb[0m: 	momentum: 0.95


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▁█▇▄▃▆▇▆▄▄▆▆▅▅▅▆▅▅▅▆▅▅▅▆▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅
test_loss,▁█▇▄▄▆▇▆▄▅▆▆▅▅▆▆▅▅▅▆▅▅▅▆▅▅▅▆▅▅▅▅▅▅▅▅▅▅▅▅
train_accuracy,▄▆▁▅█▅▂▃▆▆▄▃▄▅▅▄▄▅▅▄▄▅▅▄▄▅▄▄▄▅▄▄▅▅▄▄▅▄▄▄
train_loss,▁▂█▁▅▁▃▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.1492
test_loss,0.03741
train_accuracy,0.24856
train_loss,0.08119


[34m[1mwandb[0m: Agent Starting Run: oukqwxpz with config:
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	momentum: 0.85


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,█▁▂▅▅▂▂▃▅▄▂▃▄▄▃▃▄▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
test_loss,█▁▂▅▅▂▂▃▅▄▂▃▄▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
train_accuracy,▄▂█▄▁▄▆▅▂▃▅▅▄▃▄▅▄▃▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
train_loss,▁▂█▁▅▁▄▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14928
test_loss,0.03745
train_accuracy,0.24854
train_loss,0.08119


[34m[1mwandb[0m: Agent Starting Run: 9ahyicpc with config:
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	momentum: 0.9


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,█▁▁▅▅▃▁▃▅▄▂▂▄▄▃▃▃▄▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
test_loss,█▁▁▅▅▃▁▃▅▄▂▂▄▄▃▃▃▄▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃
train_accuracy,▄▃█▄▁▃▆▆▃▂▅▅▄▃▄▅▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
train_loss,▁▂█▁▅▁▃▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14929
test_loss,0.03745
train_accuracy,0.24854
train_loss,0.08119


[34m[1mwandb[0m: Agent Starting Run: 8h4nkdin with config:
[34m[1mwandb[0m: 	epochs: 100
[34m[1mwandb[0m: 	learning_rate: 0.001
[34m[1mwandb[0m: 	momentum: 0.95


VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
test_accuracy,▁█▇▄▄▆█▆▄▅▇▇▅▅▆▆▆▅▆▆▆▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆
test_loss,▁█▇▄▄▆█▆▄▅▇▇▅▅▆▆▆▅▆▆▆▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆
train_accuracy,▅▆▁▅█▆▃▃▆▆▄▄▅▆▅▄▄▅▅▄▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅
train_loss,▁▂█▁▅▁▃▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
test_accuracy,0.14927
test_loss,0.03745
train_accuracy,0.24854
train_loss,0.08119


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Sweep Agent: Exiting.


Similar to the case with the previous WandB grid search, we did not improve our predictions as our baseline was already pretty good.