# Simple Model Exploration

This is not meant to be a rigorous training regimen, but rather a simple exploration of how to build a model and plug everything together. More complicated model training will come later.

In [1]:
import torch

import pandas as pd

import pyro
import pyro.contrib.gp as gp

from utils import partition_df

In [2]:
DEVICE=torch.device('cpu') #torch.device("cuda" if torch.cuda.is_available() else "cpu")
DTYPE=torch.float32

pyro.set_rng_seed(0)

## Load Data 

Load the data that was processed by `data_explore.ipynb`. That notebook wrote all of the data to a single snappy-compressed Parquet file.


In [3]:
marine_data = '/home/squirt/Documents/data/ncei/marine/marine_data/marine_climate_data.snappy.parquet'
marine_df = pd.read_parquet(marine_data)

I originally wanted to subsample the data, but for now I'm just going to load all of it.

In [4]:
train_df, test_df = partition_df(marine_df)

Make training data

In [5]:
X = train_df[['WindSpeed','WetTemp','SeaTemp','CloudAmount']].values
y = train_df[['AirTemp']].values

X = torch.tensor(X).to(DEVICE, dtype=DTYPE)
y = torch.tensor(y).to(DEVICE, dtype=DTYPE)

Make eval data

In [6]:
X_eval = test_df[['WindSpeed','WetTemp','SeaTemp','CloudAmount']].values
y_eval = test_df[['AirTemp']].values

X_eval = torch.tensor(X_eval).to(DEVICE, dtype=DTYPE)
y_eval = torch.tensor(y_eval).to(DEVICE, dtype=DTYPE)

In [7]:
print(f'X shape: {X.shape}\ty shape: {y.shape}')

X shape: torch.Size([34435, 4])	y shape: torch.Size([34435, 1])
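As a rough sanity check on loading everything: an exact GP regression builds a full N x N covariance matrix over the training rows. The sketch below estimates the size of one such matrix from the shape printed above, assuming `float32` storage as set in `DTYPE`. It ignores the Cholesky factor and autograd buffers, so treat it as a lower bound rather than a measurement.

```python
# Back-of-the-envelope memory estimate for one dense N x N kernel matrix.
n_rows = 34435        # training rows, taken from the printed X shape
bytes_per_value = 4   # torch.float32 uses 4 bytes per element

kernel_bytes = n_rows * n_rows * bytes_per_value
kernel_gib = kernel_bytes / 2**30

print(f"One kernel matrix: {kernel_gib:.2f} GiB")
```

That comes to roughly 4.4 GiB for a single copy of the matrix, which is worth keeping in mind if subsampling comes back on the table.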


## Gaussian Model

To dig deeper into Gaussian process models, let's try Pyro. [Link](https://pyro.ai/examples/gp.html)

At some point create one from scratch?

Create a search space for the hyperparameters. The grid runs from 0.001 to 100 in logarithmic steps, since I have no idea yet what scale suits our data.

In [8]:
variance_search_space = torch.logspace(-3, 2, steps=10)  
lengthscale_search_space = torch.logspace(-3, 2, steps=10)  
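To see what those grids contain, here is a small plain-Python sketch that rebuilds the same ten values. It relies on `torch.logspace(-3, 2, steps=10)` spacing the base-10 exponents evenly from -3 to 2 inclusive; the variable names are only for illustration.

```python
# Rebuild the log-spaced grid by hand: 10 exponents evenly spaced in [-3, 2].
steps = 10
exponents = [-3 + 5 * i / (steps - 1) for i in range(steps)]
grid = [10 ** e for e in exponents]

# Neighbouring grid points differ by a constant factor of 10 ** (5 / 9).
ratio = grid[1] / grid[0]

for value in grid:
    print(f"{value:.4g}")
print(f"step factor: {ratio:.3f}")
```

Each step multiplies the previous value by about 3.6, so the 10 x 10 search covers five orders of magnitude with 100 combinations.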

In [9]:
loss_fn = pyro.infer.Trace_ELBO().differentiable_loss
results = {}
# GPRegression expects the last dimension of y to index data points: (N,), not (N, 1)
y = y.squeeze(-1)
for variance in variance_search_space:
    for lengthscale in lengthscale_search_space:
        # Initialize the kernel hyperparameters; Adam then refines them from this starting point
        kernel = gp.kernels.RBF(input_dim=4, variance=variance, lengthscale=lengthscale)
        gpr = gp.models.GPRegression(X, y, kernel, noise=torch.tensor(0.1), jitter=1.0e-4)
        gpr = gpr.to(DEVICE, dtype=DTYPE)

        optimizer = torch.optim.Adam(gpr.parameters(), lr=0.005)
        losses = []
        # Short run of 30 gradient steps per starting point
        for i in range(30):
            optimizer.zero_grad()
            loss = loss_fn(gpr.model, gpr.guide)
            loss.backward()
            optimizer.step()
            losses.append(loss.item())

        # Key by plain floats: tensor keys hash by identity, so they cannot be looked up by value
        results[(variance.item(), lengthscale.item())] = losses
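Once the loop finishes, one simple way to compare starting points is to rank them by the last loss recorded for each. The sketch below is one possible approach, not a fixed part of this notebook: `best_setting` is a hypothetical helper, and `toy_results` holds made-up numbers standing in for the real `results` dict.

```python
def best_setting(results):
    """Return the (variance, lengthscale) key whose final recorded loss is lowest."""
    return min(results, key=lambda key: results[key][-1])


# Made-up stand-in for `results`: key -> list of per-step losses.
toy_results = {
    (0.001, 0.001): [120.0, 118.5, 117.9],
    (1.0, 10.0): [95.0, 80.2, 71.4],
    (100.0, 0.1): [300.0, 290.0, 285.5],
}

best = best_setting(toy_results)
print(f"best start: variance={best[0]}, lengthscale={best[1]}, "
      f"final loss={toy_results[best][-1]}")
```

With only 30 steps per run, the final loss mostly reflects how good the starting point was, so this ranking is a guide for where to search next rather than a final answer.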
        

### Make Training Plots