# Simple Model Exploration

This is not meant to be a rigorous training regimen, but rather a simple exploration of how to build a model and plug everything in. More complicated model training will come later.

In [1]:
import torch
import gpytorch
from gpytorch.models import ExactGP
from gpytorch.means import ConstantMean
from gpytorch.kernels import RBFKernel
from gpytorch.distributions import MultivariateNormal

from utils import partition_data

In [2]:
DEVICE=torch.device("cuda" if torch.cuda.is_available() else "cpu")
DTYPE=torch.float32

## Make sure DataLoader Works

Load the data that was processed by `data_explore.ipynb`. That notebook created a snappy.parquet file that stores all the data.


In [3]:
marine_data = '/home/squirt/Documents/data/ncei/marine/marine_data/marine_climate_data.snappy.parquet'

Split into train and test data, holding out 30% for testing.

In [4]:
train_dl, test_dl = partition_data(marine_data, 0.3)

In [5]:
print(len(train_dl))
print(len(test_dl))

34435
14757


We get some samples, which is good enough. Maybe clean up null values by replacing them with 0?
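One way to do that cleanup, as a minimal sketch: assuming missing values show up as NaN in a float tensor, `torch.nan_to_num` replaces them with 0. The tensor `batch_x` below is made-up example data, not the marine dataset.

```python
import torch

# Made-up feature batch with missing values encoded as NaN (assumption).
batch_x = torch.tensor([[1.0, float("nan")], [float("nan"), 4.0]])

# Replace every NaN with 0.0; finite values are left unchanged.
cleaned = torch.nan_to_num(batch_x, nan=0.0)
print(cleaned)
```

Whether 0 is a sensible fill value depends on the feature, so this is worth revisiting once the columns are better understood.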

## Build Gaussian Process Model

Maybe use [the GPyTorch custom kernel notebook](https://github.com/cornellius-gp/gpytorch/blob/master/examples/00_Basic_Usage/Implementing_a_custom_Kernel.ipynb) as an example?

I just used ChatGPT:
<div style="border:1px solid black; padding:10px;">
   <p>I want to use GPytorch to build a gaussian process that takes in a vector of values and then predicts a single value, how do I define this model?</p>
</div>

Define Model

In [6]:
class GaussianMarineModel(ExactGP):
    """Exact GP with a constant mean and an RBF kernel."""

    def __init__(self, likelihood):
        # Initialize with no data; each batch is attached later via set_train_data
        super().__init__(None, None, likelihood)
        self.mean_module = ConstantMean()
        self.covar_module = RBFKernel()

    def set_train_data(self, train_x, train_y):
        # Replaces ExactGP.set_train_data and simply stores the current batch
        self.train_inputs = (train_x,)
        self.train_targets = train_y

    def forward(self, x, print_shape=False):
        mean_x = self.mean_module(x)    # mean at each input point
        covar_x = self.covar_module(x)  # covariance between input points
        if print_shape:
            # Debugging aid for inspecting input, mean, and covariance shapes
            print(x.shape)
            print(mean_x.shape)
            print(covar_x.shape)
        return MultivariateNormal(mean_x, covar_x)
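
To make the shapes in `forward` concrete, here is a minimal plain-PyTorch sketch of what an RBF kernel computes. It is a hand-written illustration, not GPyTorch's implementation, and the `rbf_kernel` helper and the sizes `n` and `d` are hypothetical. For `n` input vectors of dimension `d`, the mean should have shape `(n,)` and the covariance shape `(n, n)`.

```python
import torch


def rbf_kernel(x: torch.Tensor, lengthscale: float = 1.0) -> torch.Tensor:
    """k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * lengthscale^2))."""
    # Pairwise differences between all rows: shape (n, n, d).
    diff = x.unsqueeze(1) - x.unsqueeze(0)
    # Squared Euclidean distance between each pair of rows: shape (n, n).
    sq_dist = diff.pow(2).sum(dim=-1)
    return torch.exp(-sq_dist / (2.0 * lengthscale**2))


n, d = 5, 3                 # hypothetical batch size and feature count
x = torch.randn(n, d)       # one feature vector per row
mean = torch.zeros(n)       # a constant mean, one value per input
covar = rbf_kernel(x)       # one covariance entry per pair of inputs

print(x.shape, mean.shape, covar.shape)
```

Calling the model with `print_shape=True` and comparing against these shapes is a quick way to check that a batch is laid out as expected.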

Init Model

In [7]:
likelihood = gpytorch.likelihoods.GaussianLikelihood()
likelihood = likelihood.to(DEVICE, DTYPE)

model = GaussianMarineModel(likelihood=likelihood)
model = model.to(DEVICE, DTYPE)

mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
mll = mll.to(DEVICE, DTYPE)

Define Optimizer

In [8]:
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

## Try Training Model

In [9]:
num_epochs = 20

Train Model 

In [10]:
# Training loop
model.train()
likelihood.train()

for epoch in range(num_epochs):
    total_loss = 0
    for batch_x, batch_y in train_dl:
        batch_x = batch_x.to(DEVICE, DTYPE)
        batch_y = batch_y.to(DEVICE, DTYPE)

        model.set_train_data(batch_x, batch_y)
        optimizer.zero_grad()
        
        # Forward pass on the batch, then minimize the negative marginal log-likelihood
        output = model(batch_x, print_shape=False)
        loss = -mll(output, batch_y).sum()
        total_loss += loss.item()
        loss.backward()

        optimizer.step()
    print(f"Epoch {epoch+1}/{num_epochs} - Mean Loss: {total_loss / len(train_dl)}")

Epoch 1/20 - Mean Loss: 11240.061572527951
Epoch 2/20 - Mean Loss: 11196.666520255554
Epoch 3/20 - Mean Loss: 11152.942522143168
Epoch 4/20 - Mean Loss: 11105.075686075214
Epoch 5/20 - Mean Loss: 11058.374851168868
Epoch 6/20 - Mean Loss: 11012.612603455786
Epoch 7/20 - Mean Loss: 10966.56452374038
Epoch 8/20 - Mean Loss: 10921.574132423406
Epoch 9/20 - Mean Loss: 10877.908327283287
Epoch 10/20 - Mean Loss: 10830.594954261653
Epoch 11/20 - Mean Loss: 10784.598943661971
Epoch 12/20 - Mean Loss: 10740.36915928561
Epoch 13/20 - Mean Loss: 10696.70766298824
Epoch 14/20 - Mean Loss: 10650.688086249456
Epoch 15/20 - Mean Loss: 10607.041672716712
Epoch 16/20 - Mean Loss: 10561.149048932772
Epoch 17/20 - Mean Loss: 10519.023566139103
Epoch 18/20 - Mean Loss: 10474.184536082474
Epoch 19/20 - Mean Loss: 10426.64941193553
Epoch 20/20 - Mean Loss: 10386.227697110498


## Eval

Did we learn anything?

In [11]:
# Switch model and likelihood to evaluation mode
model.eval()
likelihood.eval()

# Gradients are not needed for evaluation
with torch.no_grad():
    total_loss = 0
    # Clear the stored training batch so the output is not conditioned on it
    model.set_train_data(None, None)

    for batch_x, batch_y in test_dl:
        batch_x = batch_x.to(DEVICE, DTYPE)
        batch_y = batch_y.to(DEVICE, DTYPE)

        # Score each test batch with the same negative marginal log-likelihood
        output = model(batch_x)
        loss = -mll(output, batch_y).sum()
        total_loss += loss.item()

    print(f"Validation Loss: {total_loss / len(test_dl)}")


Validation Loss: 10458.279240360507


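I guess so? The training loss dropped steadily from about 11240 to about 10386, and the validation loss (about 10458) is close to the final training loss. I do think the shapes are messed up, though.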
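
One possible source of shape trouble, offered as a guess rather than a diagnosis: if the targets come out of the loader with shape `(n, 1)` while the model's mean has shape `(n,)`, PyTorch broadcasting silently turns their difference into an `(n, n)` matrix. The sketch below uses made-up tensors to show the effect and a `squeeze(-1)` fix.

```python
import torch

n = 4
mean = torch.zeros(n)        # shape (n,), like the mean of the GP output
targets = torch.ones(n, 1)   # hypothetical targets with a trailing dimension

# Broadcasting expands (n, 1) - (n,) into an (n, n) matrix of differences.
bad_diff = targets - mean
print(bad_diff.shape)        # torch.Size([4, 4])

# Dropping the trailing dimension gives the elementwise (n,) difference.
good_diff = targets.squeeze(-1) - mean
print(good_diff.shape)       # torch.Size([4])
```

Printing `batch_x.shape` and `batch_y.shape` for one batch from `train_dl` would show whether this applies here.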