<a href="https://colab.research.google.com/github/mariomorvan/nam21-astro-ts-physics-dl/blob/main/NAM2021_workshop_astro_ts_physics_dl.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Modelling astrophysical time series with physics-based deep learning
---
### NAM 2021: *Machine Learning Methods for Research in Astrophysics*
- *author*: Mario Morvan
- *collaborators*: Nikolaos Nikolaou, Angelos Tsiaras, Ingo Waldmann, Gordon Yip, ... & UCL exo group
- *contact*: mario.morvan.18@ucl.ac.uk 

To run this notebook with a GPU, go to **Runtime** > **Change runtime type** and select **GPU**.


In [None]:
# Imports
import numbers
import warnings
import tqdm
import numpy as np
import matplotlib.pylab as plt
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import DataLoader

# device-agnostic notebook
if torch.cuda.is_available():
  device = 'cuda'
  print('Device name used:', torch.cuda.get_device_name())
else:
  device = 'cpu'

In [None]:
# Matplotlib default params 
plt.rcParams["figure.figsize"] = (12, 6)
plt.rcParams["font.size"] = 14

## I) Modelling Astronomical Time Series with RNN

This section presents and experiments with a simple tool: an LSTM architecture slightly tweaked to impute missing values in a dataset of time series, which opens up various applications for datasets of astronomical time series. 

Let's first define a dummy dataset made of the (additive) composition of a random walk and a sine with random offset and period.

In [None]:
class DummyDataset(torch.utils.data.Dataset):
  """A simple torch dataset combining random walk and sine processes"""
  def __init__(self, seq_length, size=100, seed=None):
    """Define a DummyDataset object
    
    Args:
      seq_length: int
        length of the time series to generate
      size: int
        number of samples for the dataset
      seed: int
        manual seed to define for reproducibility (default None)
    """
    super().__init__()
    self.seq_length = seq_length
    self.size = size
    if seed is not None:
      torch.manual_seed(seed)
    
  def __len__(self):
    return self.size
    
  def __getitem__(self, index):
    sine = torch.sin(torch.linspace(0, 50 * torch.rand(1).item(), self.seq_length) 
                      + torch.rand(1).item()*np.pi)
    random_walk = (torch.rand(self.seq_length) - 0.5).cumsum(0)/3
    gaussian_noise = 0  # torch.randn(self.seq_length) / 10
    out = (sine + random_walk + gaussian_noise).unsqueeze(-1)
    #return (out - out.mean(0, keepdims=True)) / 2 / (out.std(0, keepdims=True)[0] - out.min(0, keepdims=True)[0])
    # Shift the series so that it starts at 0.5 and rescale it so that values stay within [0, 1]
    return (out - out[:1].repeat(self.seq_length, 1)) / 2 / (out.max(0, keepdims=True)[0] - out.min(0, keepdims=True)[0]) + 0.5

item = DummyDataset(300, size=100)[0]     
plt.figure()   
plt.plot(item)
plt.xlabel('time index')
plt.ylabel('value')
plt.title('Random sample from our DummyDataset')

In [None]:
# Create train and test dummy datasets
seq_length = 200
batch_size = 64

dataset = DummyDataset(seq_length, size=256, seed=0)
dataset_test = DummyDataset(seq_length, size=64, seed=1)

loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
loader_test = DataLoader(dataset_test, batch_size=len(dataset_test), shuffle=False)
batch_test = next(iter(loader_test)).to(device)

#### Simple forecasting LSTM

Let's first train a basic stacked LSTM to forecast the next step in this dataset.

In [None]:
hidden_size = 32
num_layers = 2

model = nn.LSTM(input_size=1, hidden_size=hidden_size, num_layers=num_layers, 
                batch_first=True)


model = model.to(device)
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = lambda y, pred: F.mse_loss(y, pred)

def train_forecaster(model, loader, optimiser, criterion, epochs=1, device=None):
  """Train a basic forecasting pytorch RNN"""
  model.train()
  losses = []

  for epoch in tqdm.tqdm(range(1, 1+epochs)):
    epoch_loss = 0
    for x in loader:
      x = x.to(device)
      optimiser.zero_grad()
      with torch.enable_grad():
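        pred, _ = model(x)
        # pred has hidden_size channels while x has one: mse_loss broadcasts x
        # over them (with a warning), so every hidden unit is trained as a forecast
        loss = criterion(x[:, 1:], pred[:, :-1])   # x_{t+1} ~ f(x_t)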
      loss.backward()
      optimiser.step()
      epoch_loss += loss.item()
    losses.append(epoch_loss / len(loader))
  return losses  


In [None]:
losses = train_forecaster(model, loader, optimiser, criterion, epochs=200, 
                          device=device)
plt.plot(losses)
plt.yscale('log')

In [None]:
model.eval()
pred, (h_n, c_n) = model(batch_test)

i = np.random.randint(len(pred))
plt.title('Prediction example on test set')
plt.xlabel('Time')
plt.plot(batch_test[i,1:,0].detach().cpu().T)
plt.plot(pred[i,:-1,0].detach().cpu().T)
plt.show()

plt.scatter(batch_test[:, 1:,0].cpu().detach(), pred[:,:-1,0].detach().cpu(), s=5)
plt.plot([batch_test.min().item(), batch_test.max().item()], 
         [batch_test.min().item(), batch_test.max().item()], color='red',)
plt.ylabel('Test predictions')
plt.xlabel('Test targets')
plt.show()

#### Applications:


- forecasting
- anomaly detection
- encoding latent representation
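
As a minimal sketch of the anomaly detection use case, one option (an assumption, not something done in this notebook) is to flag time steps whose one-step forecast residual is large compared with a robust estimate of the residual scatter. The function name `flag_anomalies` and the threshold `n_sigma=5` are hypothetical choices for illustration.

```python
import torch

def flag_anomalies(x, pred, n_sigma=5.0):
    """Flag time steps whose one-step forecast residual is unusually large.

    x, pred: tensors of shape (batch, time), with pred[:, t] forecasting x[:, t + 1].
    Returns a boolean tensor of shape (batch, time - 1).
    """
    residuals = x[:, 1:] - pred[:, :-1]
    # Robust scale per series: median absolute deviation (MAD), rescaled to
    # match the standard deviation of a Gaussian
    median = residuals.median(dim=1, keepdim=True).values
    mad = (residuals - median).abs().median(dim=1, keepdim=True).values
    scale = 1.4826 * mad + 1e-12
    return (residuals - median).abs() > n_sigma * scale

# Toy check: a persistence forecast of a smooth series, plus one injected outlier
torch.manual_seed(0)
x = torch.sin(torch.linspace(0, 6, 100)).repeat(2, 1) + 0.01 * torch.randn(2, 100)
pred = x.clone()          # stand-in for a model output: predicts x[t + 1] = x[t]
x[0, 50] += 1.0           # outlier injected after the forecast was computed
flags = flag_anomalies(x, pred)
```

With a trained forecaster, `pred` would be replaced by the channel of the model output used as the forecast.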

#### Improvements and ideas to explore together:

- loss
- window predictions
- dropout
- visualisation of latent representation with TSNE
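
To make the dropout idea concrete, here is a minimal sketch assuming a small wrapper module (the name `ForecastLSTM` and its default values are hypothetical). It uses the built-in `dropout` argument of `nn.LSTM` between stacked layers and adds a linear read-out, so that the prediction has the same dimensionality as the input.

```python
import torch
from torch import nn

class ForecastLSTM(nn.Module):
    """Stacked LSTM with inter-layer dropout and a linear read-out (illustrative)."""
    def __init__(self, input_size=1, hidden_size=32, num_layers=2, dropout=0.1):
        super().__init__()
        # nn.LSTM applies dropout to the outputs of every layer except the last
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                            batch_first=True, dropout=dropout)
        self.head = nn.Linear(hidden_size, input_size)

    def forward(self, x):
        hidden, _ = self.lstm(x)      # (batch, time, hidden_size)
        return self.head(hidden)      # (batch, time, input_size)

forecaster = ForecastLSTM()
x = torch.randn(4, 50, 1)
pred = forecaster(x)
loss = nn.functional.mse_loss(pred[:, :-1], x[:, 1:])  # pred[:, t] targets x[:, t + 1]
```

Dropout is only active in training mode, so `forecaster.eval()` should be called before evaluating on the test set.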

### Imputing LSTM

In [None]:
class LSTMI(nn.Module):
    def __init__(self, input_size, hidden_size, output_size=None, num_layers=1, dropout=0.):
        """Define an LSTM Imputer network


        Args:
          input_size: dimensionality of the input sequences
          hidden_size: Number of units for the LSTM cells
          output_size: dimensionality of the output sequences.
                       If default (None) will be set as input_size.
          num_layers: number of LSTM layers
          dropout: dropout rate to apply after all-but-last layers
        Returns:
          pytorch module 

        """
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size if output_size is not None else input_size
        self.num_layers = num_layers
        self.dropout = float(dropout)

        self.lstm_cells = nn.ModuleList([nn.LSTMCell(input_size=self.input_size, hidden_size=self.hidden_size)])
        self.lstm_cells.extend([nn.LSTMCell(input_size=self.hidden_size, hidden_size=self.hidden_size)
                                for _ in range(self.num_layers - 1)])
        if dropout != 0:
            self.dropout_layer = nn.Dropout(self.dropout)

        self.fc = nn.Linear(self.hidden_size, self.output_size)

        self.h_t = None
        self.c_t = None
        self.out_t = None

    def init_state(self, batch_size, device=None, dtype=None):
      """Initialise the network's states"""
      self.h_t = [torch.randn(batch_size, self.hidden_size, device=device, dtype=dtype)
                  for _ in range(self.num_layers)]
      self.c_t = [torch.randn(batch_size, self.hidden_size, device=device, dtype=dtype)
                  for _ in range(self.num_layers)]
      self.out_t = torch.randn(batch_size, self.output_size, device=device, dtype=dtype)

    def init_state_like(self, x):
      """Initialise the network's states to match the batch size, device and dtype of a given input"""
      self.init_state(len(x), x.device, x.dtype)

    def forward(self, x, z=None, m=None, mask_nans=True):
      """Performs a forward pass

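      :param x: Input tensor of shape (batch, time, dim)
      :param z: Optional covariates of shape (batch, time, zdim), concatenated to the input
      :param m: Imputation mask - 1/True for keeping input and 0/False for forcing dynamic imputation
      :param mask_nans: If True, NaN values in x are also imputed
      :return: Output tensor of one-step predictions, used as imputed values where m is 0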
      """

      # Checking - casting inputs
      if m is not None:
        if x.shape != m.shape:
          mess = f'x and m must have the same shape ({x.shape} != {m.shape})'
          raise RuntimeError(mess)
      else:
        m = torch.ones_like(x)

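      if mask_nans:
        is_valid = (x == x)  # False where x is NaN
        m = m * is_valid
        x = torch.where(is_valid, x, torch.zeros_like(x))  # NaN * 0 would stay NaN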

      if len(x.shape) == 2:
        warnings.warn('input has only 2 dims')
        x = x.unsqueeze(-1)
        m = m.unsqueeze(-1)
      elif len(x.shape) != 3:
        raise ValueError(f'expected an input with 2 or 3 dims, got shape {tuple(x.shape)}')

      batch_size, len_seq, n_dim = x.shape

      if z is not None:
        zdim = z.shape[-1]
      else:
        zdim = 0
      if n_dim + zdim != self.input_size:
        mess = f'input (dim={n_dim}) and covariate (dim={zdim}) dimensions must ' \
                + f'add to input_size param ({self.input_size})'
        raise RuntimeError(mess)

      m = m.type(x.dtype)
      inverted_m = (torch.ones_like(m, device=x.device) - m)

      # Preparing input
      if z is not None:
        input_masked = torch.cat((x * m, z), dim=-1)
      else:
        input_masked = x * m
      outputs = []

      for t in range(len_seq):
        pred_t = self.out_t
        input_t = input_masked[:, t] + F.pad(pred_t * inverted_m[:, t], (0, zdim))
        for k, lstm_cell in enumerate(self.lstm_cells):
          if k == 0:
            self.h_t[0], self.c_t[0] = lstm_cell(input_t, (self.h_t[0], self.c_t[0]))
          else:
            self.h_t[k], self.c_t[k] = lstm_cell(self.h_t[k - 1], (self.h_t[k], self.c_t[k]))
          if self.dropout != 0 and k < self.num_layers - 1:
            self.h_t[k] = self.dropout_layer(self.h_t[k])
            self.c_t[k] = self.dropout_layer(self.c_t[k])
        self.out_t = self.fc(self.h_t[-1])
        outputs.append(self.out_t.unsqueeze(1))
      return torch.cat(outputs, 1)

def train_lstmi_batch(model, loader, optimiser, criterion, epochs=1, device=None):
  """Train a LSTMI module"""
  model.train()
  losses = []

  for epoch in tqdm.tqdm(range(1, 1+epochs)):
    epoch_loss = 0
    for x in loader:
      optimiser.zero_grad()
      x = x.to(device)
      model.init_state_like(x)
      with torch.enable_grad():
        pred = model(x)
        loss = criterion(x[:, 1:], pred[:, :-1])   # x_{t+1} ~ f(x_t)
      loss.backward()
      optimiser.step()
      epoch_loss += loss.item()
    losses.append(epoch_loss / len(loader))
  return losses  

In [None]:
model = LSTMI(1, hidden_size=hidden_size, num_layers=num_layers).to(device)
optimiser = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = lambda y, pred: F.mse_loss(y, pred)
losses = train_lstmi_batch(model, loader, optimiser, criterion, epochs=200, device=device)

In [None]:
# learning curve
plt.plot(losses)
plt.yscale('log')
plt.title('learning curve')
plt.xlabel('Epochs')
plt.ylabel('MSE')
plt.show()

In [None]:
# Check of predictions bias
model.eval()
model.init_state_like(batch_test.to(device))
pred = model(batch_test.to(device)).cpu()

i = np.random.randint(len(pred))
plt.plot(batch_test[i,1:,0].detach().cpu().T, label='data')
plt.plot(pred[i,:-1,0].detach().T, label='predictions')
plt.scatter(np.where((batch_test[i,:-1,0]==0).detach().to('cpu'))[0],
            pred[i,:-1,0][batch_test[i,:-1,0]==0].detach(), color='red',
            label='imputed')
plt.legend()

plt.show()

plt.scatter(batch_test[:, 1:,0].detach().cpu(), pred[:,:-1,0].detach().cpu(), s=5)
plt.plot([batch_test.min().item(), batch_test.max().item()], 
         [batch_test.min().item(), batch_test.max().item()], color='red')
plt.ylabel('Test predictions')
plt.xlabel('Test targets')
plt.show()

### Applications
- Anomaly detection
- Imputing
- Modelling and fitting gaps
- More flexible learning (include in the training loss)
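
As a sketch of how gaps could be simulated and scored, the helpers below build a mask with one contiguous gap per series, following the convention of the `m` argument above (1 keeps the input, 0 forces imputation), and compute an error restricted to the imputed steps. The names `make_gap_mask` and `gap_mse` are hypothetical, and a single gap of fixed length per series is an assumption made for simplicity.

```python
import torch

def make_gap_mask(batch_size, seq_length, gap_length, generator=None):
    """Return a (batch, time, 1) float mask with one contiguous gap of zeros per series.

    Convention: 1 keeps the observed input, 0 forces the model to impute.
    The first time step is always kept.
    """
    mask = torch.ones(batch_size, seq_length, 1)
    starts = torch.randint(1, seq_length - gap_length + 1, (batch_size,),
                           generator=generator)
    for b, start in enumerate(starts.tolist()):
        mask[b, start:start + gap_length] = 0.
    return mask

def gap_mse(x, pred, mask):
    """MSE of the one-step predictions restricted to the gap (masked) time steps."""
    in_gap = 1 - mask[:, 1:]                       # targets are x[:, 1:]
    squared_error = (x[:, 1:] - pred[:, :-1]) ** 2
    return (squared_error * in_gap).sum() / in_gap.sum().clamp(min=1)

mask = make_gap_mask(batch_size=8, seq_length=200, gap_length=20)
```

With the `LSTMI` module defined above, the mask would be passed as `model(x, m=mask.to(x.device))`, and `gap_mse(x, pred, mask)` then measures how well the hidden values are recovered.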


### Suggestions:
- Gaussian Loss 
- Gap imputing metric
- See improvements with growing dataset
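
For the Gaussian loss suggestion, a minimal sketch is to predict a mean and a variance per time step and to minimise the Gaussian negative log-likelihood with `F.gaussian_nll_loss` (available in recent PyTorch versions). The module name `GaussianHead` and the log-variance parametrisation are assumptions made for illustration, and the random tensors stand in for LSTM hidden states and targets.

```python
import torch
from torch import nn
from torch.nn import functional as F

class GaussianHead(nn.Module):
    """Map hidden features to a mean and a positive variance per time step."""
    def __init__(self, hidden_size, output_size=1):
        super().__init__()
        self.mean = nn.Linear(hidden_size, output_size)
        self.log_var = nn.Linear(hidden_size, output_size)

    def forward(self, h):
        # Exponentiating the second output guarantees a positive variance
        return self.mean(h), self.log_var(h).exp()

torch.manual_seed(0)
head = GaussianHead(hidden_size=32)
h = torch.randn(4, 50, 32)          # stand-in for LSTM hidden states
target = torch.randn(4, 50, 1)      # stand-in for the next-step targets
mean, var = head(h)
# Negative log-likelihood of the targets under N(mean, var)
loss = F.gaussian_nll_loss(mean, target, var)
```

Compared with the MSE, this loss lets the network report larger uncertainties where the series is harder to predict, for instance inside long gaps.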

## II) Embed differentiable physics model in DL framework



Why hard-code the physics model in a DL framework? 
- Computational efficiency, with automatic differentiation and GPU acceleration
- Combine with NNs

Physics model requirements:
- implemented in a DL framework (here PyTorch)
- vectorised to process batches of samples

DL & Differential Programming
- TensorFlow and the GradientTape API 
- PyTorch and autograd 
- Julia & $\partial{P}$
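
To illustrate what automatic differentiation provides, the sketch below fits the amplitude and width of a toy Gaussian pulse by gradient descent with PyTorch autograd. The model `gaussian_pulse` and all numerical values are assumptions chosen for illustration; this is not the flare model used in this notebook.

```python
import torch

def gaussian_pulse(time, amplitude, width, t_0=50.):
    """Toy differentiable physics model: a Gaussian pulse centred on t_0."""
    return amplitude * torch.exp(-0.5 * ((time - t_0) / width) ** 2)

torch.manual_seed(0)
time = torch.linspace(0, 100, 200)
observed = gaussian_pulse(time, torch.tensor(2.0), torch.tensor(5.0))
observed = observed + 0.01 * torch.randn_like(observed)

# Free parameters, started away from the values used to simulate the data
amplitude = torch.tensor(1.0, requires_grad=True)
width = torch.tensor(8.0, requires_grad=True)
optimiser = torch.optim.Adam([amplitude, width], lr=0.05)

for step in range(500):
    optimiser.zero_grad()
    loss = torch.mean((gaussian_pulse(time, amplitude, width) - observed) ** 2)
    loss.backward()   # autograd differentiates through the physics model
    optimiser.step()
```

No derivative was written by hand: because the model is composed of differentiable tensor operations, the same mechanism can propagate gradients from a physics-based loss back into a neural network.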

Let's define a simple flare model in PyTorch: 

In [None]:
# Define your physics model and dataset class

def compute_flare(time, a_0, tau_g, tau_e, t_0=50):
  """Compute a flare model following https://academic.oup.com/mnras/article/445/3/2268/2907951

  Args:
    time: tensor
    a_0: tensor
    tau_g: tensor
    tau_e: tensor
    t_0: float or tensor
  return:
    flare intensity as a function of time
  """
  batch_size = len(a_0)
  time = time[None,:].repeat(batch_size, 1)
  a_0 = a_0.reshape(batch_size,1)
  tau_g = tau_g.reshape(batch_size,1)
  tau_e = tau_e.reshape(batch_size,1)
  if isinstance(t_0, torch.Tensor):
    t_0 = t_0.reshape(batch_size,1)

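  # Note: `/ 2*tau_g` evaluates as `(... / 2) * tau_g`, so tau_g multiplies the
  # rise-side exponent while tau_e divides the decay-side one
  return a_0 * torch.exp(-((time - t_0) * ((time < t_0) / 2*tau_g 
                                           + (time > t_0) / tau_e))**2)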


class PhysicsDataset(torch.utils.data.Dataset):
  """torch dataset subclass for simulated physics time series"""
  def __init__(self, physics_model, bounds, target_params, 
               seq_length=200, size=100, noise_level=1e-4, seed=0):
    """Create an instance of PhysicsDataset class

    Args:
      physics_model: function
        function taking time tensor as first arg and returning a torch time series
      bounds: dict
        dictionary of uniform bounds for physics_model params
      target_params: list
        list of params to produce as outputs
      seq_length: int
        time series length (default 200)
      size: int
        size of the dataset (default 100)
      noise_level: float
        Gaussian standard deviation for the additive noise (default 1e-4)
      seed: int
        random seed (default 0)
    """
    super().__init__()
    self.physics_model = physics_model
    self.bounds = bounds
    self.target_params = target_params
    self.seq_length = seq_length
    self.size = size
    self.noise_level = noise_level
    if seed is not None:
        torch.manual_seed(seed)
    
    self.params = dict()
    self._sample_priors()
    self.time = torch.linspace(0, 100, self.seq_length)
    self.noise = torch.randn(self.size, self.seq_length) * self.noise_level
    self.physics = self.physics_model(self.time, **self.params)

  def _sample_priors(self):
    """Sample parameters inside object's defined bounds with uniform distributions"""
    for par in self.bounds:
      self.params[par] = torch.rand(self.size) * (self.bounds[par][1] - self.bounds[par][0]) + self.bounds[par][0]

  def __len__(self):
      return self.size
  
  def __getitem__(self, index):
    x = self.physics[index] + self.noise[index]
    target = torch.tensor([self.params[par][index] for par in self.target_params])
    return x, target #, additional_params


In [None]:
# define example and plot samples

dataset = PhysicsDataset(compute_flare, 
                         bounds={'a_0': [1, 5], 'tau_g':[4, 6], 'tau_e':[1, 5]}, 
                         target_params=['a_0', 'tau_g', 'tau_e'], 
                         noise_level=0.05)

plt.plot(compute_flare(dataset.time, **dataset.params).detach().cpu().T)
plt.xlabel('time')
pass

In [None]:
# Instantiating datasets and loaders

device = 'cpu'  # cpu can sometimes be faster than cuda

# Datasets parameters
target_params = ['a_0', 'tau_g', 'tau_e']
bounds_train = {'a_0': [1, 2], 'tau_g':[4, 6], 'tau_e':[2, 5]}
bounds_val_2 = {'a_0': [0, 3], 'tau_g':[3, 8], 'tau_e':[1, 6]}  # extending the bounds
noise_train = 0.01

# General params
seq_length = 100
size_train = 256  # limited data scenario
batch_size = 64
size_val = 512
size_test = 1024

# Train dataset
dataset_train = PhysicsDataset(compute_flare, bounds_train, target_params, 
                               seq_length=seq_length, size=size_train, 
                               noise_level=noise_train, seed=0)

loader_train = DataLoader(dataset_train, batch_size=batch_size, shuffle=True)

# Val dataset 1 (same parameter space as training)
dataset_val_1 = PhysicsDataset(compute_flare, bounds_train, target_params, 
                                seq_length=seq_length, size=size_val, 
                                noise_level=noise_train, seed=1)
loader_val_1 = DataLoader(dataset_val_1, batch_size=len(dataset_val_1))
batch_val_1 = next(iter(loader_val_1))

# Val dataset 2 (larger parameter space)
dataset_val_2 = PhysicsDataset(compute_flare, bounds_val_2, target_params, 
                                seq_length=seq_length, size=size_val, 
                                noise_level=noise_train, seed=2)
loader_val_2 = DataLoader(dataset_val_2, batch_size=len(dataset_val_2))
batch_val_2 = next(iter(loader_val_2))

# Test dataset 1 (same parameter space as training)
dataset_test_1 = PhysicsDataset(compute_flare, bounds_train, target_params, 
                                seq_length=seq_length, size=size_test, 
                                noise_level=noise_train, seed=3)
loader_test_1 = DataLoader(dataset_test_1, batch_size=len(dataset_test_1))
x_test_1, target_test_1 = next(iter(loader_test_1))

# Test dataset 2 (larger parameter space)
dataset_test_2 = PhysicsDataset(compute_flare, bounds_val_2, target_params, 
                                seq_length=seq_length, size=size_test, 
                                noise_level=noise_train, seed=4)

loader_test_2 = DataLoader(dataset_test_2, batch_size=len(dataset_test_2))
x_test_2, target_test_2 = next(iter(loader_test_2))

Let's define a simple MLP network to solve the inverse problem:

In [None]:
# A simple network to solve the inverse problem data -> params

class Network(nn.Module):
  """Define a simple MLP in pytorch"""
  def __init__(self, seq_length, out_dim):
    super().__init__()
    self.fc1 = nn.Linear(seq_length, 512)
    self.fc2 = nn.Linear(512, 128)
    self.fc3 = nn.Linear(128, 16)
    self.fc4 = nn.Linear(16, out_dim)
      
  def forward(self, x):
    out = F.relu(self.fc1(x))
    out = F.relu(self.fc2(out))
    out = F.relu(self.fc3(out))
    out = self.fc4(out)
    return out


def train_network(model, loader, optimiser, criterion,
                  epochs=1, batch_val={}, metric_val=None, eval_epochs=[]):
  """Train a pytorch module"""
  losses = {'train':[]}
  losses.update({name: [] for name in batch_val})

  for epoch in tqdm.tqdm(range(1, 1+epochs)):
    model.train()
    epoch_loss = 0
    for x, target in loader:
      optimiser.zero_grad()
      x = x.to(device)
      target = target.to(device)
      with torch.enable_grad():
        pred = model(x)
        loss = criterion(x, target, pred)
      loss.backward()
      optimiser.step()
      epoch_loss += loss.item()
    losses['train'].append(epoch_loss / len(loader))
    if metric_val is not None and batch_val is not None and epoch in eval_epochs:
      model.eval()
      with torch.no_grad():  # no gradients needed for validation
        for name, batch in batch_val.items():
          x_val, target_val = batch[0].to(device), batch[1].to(device)
          pred_val = model(x_val)
          # store a Python float so that the history can be plotted directly
          losses[name].append(metric_val(x_val, target_val, pred_val).item())
  return losses  

In [None]:
# Define two different losses

# Classic MSE regression loss
def naive_loss(x, target, pred):
  """Wrapper around pytorch mse_loss to add inputs to signature

  Args: 
    x: torch.Tensor 
      input time series of shape (batch_size, T) or (T,). (unused argument)
    target: torch.Tensor
      output targets of shape (batch_size, dim) or (dim,)
    pred: torch.Tensor
      predicted targets of shape (batch_size, dim) or (dim,)
  Return: torch.Tensor
    mean squared error value between target and predictions
  """
  return F.mse_loss(target, pred)

# Hybrid regression and physics reconstruction
def hybrid_loss(dataset, beta=1):
  """Define a hybrid regression & reconstruction loss function

  Args:
    dataset: PhysicsDataset
      torch dataset with attributes target_params, time and physics_model
    beta: 
      weight parameter between regression and reconstruction terms: 
      loss = regression + beta * reconstruction
  Return: function
    loss function associated with provided dataset and beta parameter
  """
  def loss_function(x, target, pred):
    """Compute the hybrid loss

    Args: 
    x: torch.Tensor 
      input time series of shape (batch_size, T), compared with its physics reconstruction
    target: torch.Tensor
      output targets of shape (batch_size, dim)
    pred: torch.Tensor
      predicted targets of shape (batch_size, dim)
    Return: torch.Tensor
      mean squared error between target and predictions, plus beta times the mean
      squared error between x and the time series reconstructed from pred
    """
    pred_dict = {dataset.target_params[i]: pred[:,i] for i in range(len(dataset.target_params))}  # Iterate over feature dimension to produce dict of outputs
    reconstructed_time_series = dataset.physics_model(dataset.time.to(device), **pred_dict)
    physics_reconstruction_term = F.mse_loss(x, reconstructed_time_series)
    return naive_loss(x, target, pred) + beta * physics_reconstruction_term
  return loss_function


In [None]:
# Create 2 scenarios with identical networks but different losses
scenarios = ['naive', 'hybrid']
network = {scenario: Network(seq_length, len(target_params)).to(device) for scenario in scenarios}
optimiser = {scenario: torch.optim.Adam(network[scenario].parameters(), lr=0.001) for scenario in scenarios}
loss = {'naive': naive_loss,
        'hybrid': hybrid_loss(dataset_train,beta=0.1)}

In [None]:
# Running experiment for two scenarios
loss_history = dict()
epochs = 2000
eval_epochs = list(range(1, 1+epochs, 10))
for scenario in scenarios:
  loss_history[scenario] = train_network(network[scenario], loader_train, optimiser[scenario], loss[scenario], epochs=epochs, 
                                         batch_val={'val_1': batch_val_1, 'val_2': batch_val_2}, metric_val=naive_loss, eval_epochs=eval_epochs)

In [None]:
# Validation loss curve n°1

linestyle = {'naive': 'solid', 'hybrid': 'dotted'}
alpha = {'naive': 0.7, 'hybrid': 1}
for scenario in scenarios:
  plt.plot(eval_epochs, loss_history[scenario]['val_1'], label=f'Val 1 ({scenario})', 
           c='blue', linestyle=linestyle[scenario], alpha=alpha[scenario])
plt.legend()
plt.yscale('log')
plt.xlabel('Epochs')
plt.ylabel('MSE')

In [None]:
# Validation loss curve n°2

for scenario in scenarios:

  plt.plot(eval_epochs, loss_history[scenario]['val_2'], label=f'Val 2 ({scenario})', 
           c='green', linestyle=linestyle[scenario], alpha=alpha[scenario])
plt.legend()
plt.yscale('log')
plt.xlabel('Epochs')
plt.ylabel('MSE')

In [None]:
# Evaluation on test sets

scores_1 = dict()
scores_2 = dict()

for scenario in scenarios:
  network[scenario].eval()
  with torch.no_grad():
    # naive_loss signature is (x, target, pred)
    scores_1[scenario] = naive_loss(x_test_1, target_test_1, network[scenario](x_test_1)).item()
    scores_2[scenario] = naive_loss(x_test_2, target_test_2, network[scenario](x_test_2)).item()
print(f'scores_1: {scores_1}')
print(f'scores_2: {scores_2}')

Pros:
- accuracy
- stability
- generalisability

Cons:
- need the physics model implemented in DL framework
- the physics model is liable to add its own complexity to the network's! 
- training may require further tuning to accommodate the different loss terms

Improvements:
- hyperoptimisation for the loss weight $\beta$ 
- hyperparameter optimisation for the learning rates in both cases

To go further:
- use your own physics model! 
- design transfer learning and meta-learning experiments to assess generalisability
- Combine LSTM detrending simultaneously with differentiable physics model
- Gaussian constraints on param space with VAE
- SVI for gaussian inference

### Different examples of hybrid architectures

Example in physics of exoplanetary transits

![alt text](https://raw.githubusercontent.com/mariomorvan/nam21-astro-ts-physics-dl/main/hybrid_architectures_transit_NN.png)

Figure from [*PyLightcurve-torch: a transit modelling package for deep learning applications in PyTorch*](https://arxiv.org/abs/2011.02030)