# DeepSensor QuickStart

This material is adapted from the QuickStart in the DeepSensor documentation on GitHub.

Here we will demonstrate a simple example of training a convolutional conditional neural process (ConvCNP) to spatially interpolate random grid cells of NCEP reanalysis air temperature data over the US. First, pip install the package. In this case we will use the PyTorch backend (note: follow the PyTorch installation instructions if you want GPU support).

Feel free to experiment by changing parameters, trying different datasets, and so on! You can also try out any of the other material that you find in the DeepSensor documentation:

https://alan-turing-institute.github.io/deepsensor/

## Install required package

DeepSensor is a pip-installable package, which makes installation straightforward.

In [1]:
!pip install deepsensor

Defaulting to user installation because normal site-packages is not writeable


## Load required packages

This includes aspects of deepsensor, as well as xarray, pandas, numpy, and tqdm.

In [2]:
import deepsensor.torch
from deepsensor.data import DataProcessor, TaskLoader
from deepsensor.model import ConvNP
from deepsensor.train import Trainer

import xarray as xr
import pandas as pd
import numpy as np
from tqdm import tqdm

## Load context dataset

xarray provides some tutorial datasets, including an air temperature dataset that covers North America. We load it with `xr.tutorial.open_dataset`.

In [3]:
# Load raw data
ds_raw = xr.tutorial.open_dataset("air_temperature")
print(ds_raw)

<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...


## Normalize the data

Next we create an instance of the DataProcessor class, which handles the normalization of xarray and pandas data for DeepSensor models. 

https://alan-turing-institute.github.io/deepsensor/reference/data/processor.html

In [4]:
# Normalise data
data_processor = DataProcessor(x1_name="lat", x2_name="lon")
ds = data_processor(ds_raw)
print(ds)

<xarray.Dataset>
Dimensions:  (x1: 25, time: 2920, x2: 53)
Coordinates:
  * x1       (x1) float32 0.4615 0.4423 0.4231 0.4038 ... 0.03846 0.01923 0.0
  * x2       (x2) float32 0.0 0.01923 0.03846 0.05769 ... 0.9615 0.9808 1.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, x1, x2) float32 -2.456 -2.377 -2.315 ... 0.9159 0.8852
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
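
To make the printed coordinates concrete, here is a minimal NumPy sketch of a normalization that reproduces them. It assumes both coordinates are shifted to start at 0 and divided by the larger of the two spans (which preserves the aspect ratio), and that the data variable is standardized to zero mean and unit standard deviation. These assumptions are inferred from the output above rather than taken from the `DataProcessor` internals, and the `air` array here is synthetic.

```python
import numpy as np

# Raw coordinates matching the printed dataset: 25 latitudes, 53 longitudes.
lat = np.linspace(75.0, 15.0, 25)
lon = np.linspace(200.0, 330.0, 53)

# Assumption: both axes share one scale factor, the larger coordinate span.
span = max(lat.max() - lat.min(), lon.max() - lon.min())  # 130 degrees
x1 = (lat - lat.min()) / span
x2 = (lon - lon.min()) / span

# Synthetic stand-in for the `air` variable (kelvin), shape (time, lat, lon).
rng = np.random.default_rng(0)
air = 280.0 + 10.0 * rng.standard_normal((8, lat.size, lon.size))

# Assumption: data values are standardized with the global mean and std.
air_norm = (air - air.mean()) / air.std()

print(x1[:3].round(4), x2[-3:].round(4))
```

Under these assumptions `x1` tops out at 60/130 and `x2` at 1, in line with the ranges shown in the normalized dataset.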


## Get the task loader ready

Here we create an instance of the TaskLoader class, which provides a suite of sampling methods for generating Task objects for different kinds of predictions, such as spatial interpolation, forecasting, downscaling, or some combination of these. The TaskLoader is defined by its context sets and target sets; here both are the normalized dataset `ds`, each contributing a single variable (`air`).

https://alan-turing-institute.github.io/deepsensor/reference/data/loader.html

In [5]:
# Set up task loader
task_loader = TaskLoader(context=ds, target=ds)
task_loader

TaskLoader(1 context sets, 1 target sets)
Context variable IDs: (('air',),)
Target variable IDs: (('air',),)

Context data dimensions: (1,)
Target data dimensions: (1,)

## Creating the ConvNP model

There are a number of different possible modeling approaches. Here we are using the convolutional neural process regression model:

https://alan-turing-institute.github.io/deepsensor/reference/model/convnp.html

In [6]:
# Set up model
model = ConvNP(data_processor, task_loader)
model

dim_yc inferred from TaskLoader: (1,)
dim_yt inferred from TaskLoader: 1
dim_aux_t inferred from TaskLoader: 0
internal_density inferred from TaskLoader: 52
encoder_scales inferred from TaskLoader: [0.009615384042263031]
decoder_scale inferred from TaskLoader: 0.019230769230769232


<deepsensor.model.convnp.ConvNP at 0x149811969510>
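
The inferred values printed above are consistent with simple arithmetic on the normalized grid spacing. The sketch below reproduces them, assuming that `internal_density` is the reciprocal of the grid spacing, that `decoder_scale` is the reciprocal of `internal_density`, and that the encoder scale is half of that. These relationships are read off the printed numbers, not taken from the ConvNP reference.

```python
# Normalized x2 runs from 0 to 1 over 53 grid points, so there are 52 intervals.
n_lon = 53
grid_spacing = 1.0 / (n_lon - 1)

# Assumed relationships, chosen because they match the printed output.
internal_density = round(1.0 / grid_spacing)
decoder_scale = 1.0 / internal_density
encoder_scale = 0.5 / internal_density

print(internal_density, decoder_scale, encoder_scale)
```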

## Generate training tasks

For each training task, the context set consists of values from a random number (between 0 and 99) of randomly chosen grid cells, and the target set is the full grid. We generate one task per week over the date range indicated below, using the `task_loader` that we created above.

In [7]:
# Generate weekly training tasks with a random number (0-99) of grid cells as
#   context and all grid cells as targets
train_tasks = []
for date in pd.date_range("2013-01-01", "2014-11-30")[::7]:
    N_context = np.random.randint(0, 100)
    task = task_loader(date, context_sampling=N_context, target_sampling="all")
    train_tasks.append(task)
    
train_tasks

[time: Timestamp/2013-01-01 00:00:00
 ops: []
 X_c: ['ndarray/float32/(2, 29)']
 Y_c: ['ndarray/float32/(1, 29)']
 X_t: [('ndarray/float32/(1, 25)', 'ndarray/float32/(1, 53)')]
 Y_t: ['ndarray/float32/(1, 25, 53)'],
 time: Timestamp/2013-01-08 00:00:00
 ops: []
 X_c: ['ndarray/float32/(2, 5)']
 Y_c: ['ndarray/float32/(1, 5)']
 X_t: [('ndarray/float32/(1, 25)', 'ndarray/float32/(1, 53)')]
 Y_t: ['ndarray/float32/(1, 25, 53)'],
 time: Timestamp/2013-01-15 00:00:00
 ops: []
 X_c: ['ndarray/float32/(2, 60)']
 Y_c: ['ndarray/float32/(1, 60)']
 X_t: [('ndarray/float32/(1, 25)', 'ndarray/float32/(1, 53)')]
 Y_t: ['ndarray/float32/(1, 25, 53)'],
 time: Timestamp/2013-01-22 00:00:00
 ops: []
 X_c: ['ndarray/float32/(2, 54)']
 Y_c: ['ndarray/float32/(1, 54)']
 X_t: [('ndarray/float32/(1, 25)', 'ndarray/float32/(1, 53)')]
 Y_t: ['ndarray/float32/(1, 25, 53)'],
 time: Timestamp/2013-01-29 00:00:00
 ops: []
 X_c: ['ndarray/float32/(2, 80)']
 Y_c: ['ndarray/float32/(1, 80)']
 X_t: [('ndarray/float32
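
The shapes in the output above (`X_c` of shape `(2, N)` and `Y_c` of shape `(1, N)`) can be illustrated with a small NumPy sketch of random grid-cell sampling. `sample_context` is a hypothetical helper written for illustration, not part of DeepSensor; it assumes cells are drawn uniformly without replacement, and the field is synthetic.

```python
import numpy as np
import pandas as pd

# Taking every 7th day of the range gives one task date per week.
dates = pd.date_range("2013-01-01", "2014-11-30")[::7]

def sample_context(field, x1, x2, n, rng):
    """Hypothetical helper: pick n distinct grid cells uniformly at random."""
    flat_idx = rng.choice(field.size, size=n, replace=False)
    i1, i2 = np.unravel_index(flat_idx, field.shape)
    X_c = np.stack([x1[i1], x2[i2]])  # coordinates, shape (2, n)
    Y_c = field[i1, i2][np.newaxis]   # values, shape (1, n)
    return X_c, Y_c

rng = np.random.default_rng(0)
x1 = np.linspace(60 / 130, 0.0, 25)  # normalized latitude
x2 = np.linspace(0.0, 1.0, 53)       # normalized longitude
field = rng.standard_normal((x1.size, x2.size))  # synthetic normalized data

X_c, Y_c = sample_context(field, x1, x2, n=29, rng=rng)
print(len(dates), X_c.shape, Y_c.shape)
```

With the date range used here, the weekly stride yields 100 task dates.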

## Train the model 

The Trainer class carries out the training of the ConvNP model, using the tasks generated above. Note that the hyperparameter `lr` is the learning rate, not a regularization parameter.

https://alan-turing-institute.github.io/deepsensor/reference/train/train.html

In [8]:
# Train model
trainer = Trainer(model, lr=5e-5)
for epoch in tqdm(range(10)):
    batch_losses = trainer(train_tasks)

100%|██████████| 10/10 [04:18<00:00, 25.81s/it]
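
The loop above does not use `batch_losses` after computing it. A common pattern is to record the mean loss per epoch to check that training is converging. The sketch below shows only that bookkeeping: `stand_in_trainer` is a hypothetical replacement for the real `trainer`, its loss values are synthetic, and it assumes the trainer call returns one loss value per batch.

```python
import numpy as np

def stand_in_trainer(tasks, epoch):
    """Hypothetical stand-in for `trainer(train_tasks)`: one loss per task."""
    rng = np.random.default_rng(epoch)
    return [1.0 / (epoch + 1) + 0.01 * rng.random() for _ in tasks]

train_tasks = list(range(100))  # placeholder objects, one per weekly task

epoch_losses = []
for epoch in range(10):
    batch_losses = stand_in_trainer(train_tasks, epoch)
    # Average over batches to get a single number per epoch.
    epoch_losses.append(float(np.mean(batch_losses)))

best_epoch = int(np.argmin(epoch_losses))
print(best_epoch, epoch_losses[best_epoch])
```

In the real loop, the same two lines (`append` and `np.mean`) would sit directly after `batch_losses = trainer(train_tasks)`.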


## Make a prediction using the fitted ConvNP model

Finally, we can create a testing/prediction task to predict the air temperature on a date that was excluded from the training data. Note that the model outputs mean and standard deviation.

https://alan-turing-institute.github.io/deepsensor/reference/model/pred.html

In [9]:
# Predict on new task with 50 context points and a dense grid of target points
test_task = task_loader("2014-12-31", context_sampling=50)
pred = model.predict(test_task, X_t=ds_raw)
pred["air"]
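
Since the model returns a mean and a standard deviation for each target location, an approximate 95% interval can be formed under a Gaussian assumption. The arrays below are synthetic stand-ins for the predicted mean and standard deviation on the 25 x 53 grid, not real model output.

```python
import numpy as np

rng = np.random.default_rng(0)
mean = 280.0 + 10.0 * rng.standard_normal((25, 53))  # synthetic mean, kelvin
std = 0.5 + rng.random((25, 53))                     # synthetic std, kelvin

# Under a Gaussian assumption, about 95% of the mass lies within 1.96 std.
z = 1.96
lower = mean - z * std
upper = mean + z * std

# Interval width is a simple per-cell summary of predictive uncertainty.
width = upper - lower
print(width.shape, float(width.min()), float(width.max()))
```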