<div style="text-align: right"> 
<a href = "https://nbviewer.jupyter.org/github/siddharthchaini/Improving-Feature-Extraction-RESSPECT/tree/main/"> 
    View online on<br>
    <img src="https://nbviewer.jupyter.org/static/img/nav_logo.svg"
         alt="nbviewer link" style="height: 30px;"/> 
</a>
</div>

# 2. GP-VAE Approach for RESSPECT
<a href="http://cosmostatistics-initiative.org/resspect/">
<img src="https://cosmostatistics-initiative.org/wp-content/uploads/2019/04/coin_desc_3.png" alt="Logo" style="width: 250px;"/>
</a>

**Authors**: Siddharth Chaini, Johann Cohen-Tanugi

This is the second of two Jupyter notebooks describing the feature extraction methods explored by Siddharth and Johann in June and July 2021.

For RESSPECT, a parametric function is fit to each light curve, and the best-fit parameters are then used as features for training in the RESSPECT pipeline. You can read more about this here: https://arxiv.org/pdf/2010.05941.pdf
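As a rough illustration of this fit-then-use-parameters idea, the sketch below fits a Bazin-style rise-and-decline function to a synthetic single-band light curve with `scipy.optimize.curve_fit` and treats the five best-fit parameters as the feature vector. The functional form, the synthetic data, and all names (`bazin`, `true_params`, `features`) are assumptions made for this example and are not taken from the RESSPECT code.

```python
import numpy as np
from scipy.optimize import curve_fit


def bazin(t, a, t0, t_fall, t_rise, b):
    """Bazin-style rise-and-decline curve (assumed form, for illustration only)."""
    return a * np.exp(-(t - t0) / t_fall) / (1.0 + np.exp(-(t - t0) / t_rise)) + b


# Synthetic single-band light curve (hypothetical values, not RESSPECT data)
rng = np.random.default_rng(0)
t = np.linspace(-30.0, 120.0, 60)
true_params = (100.0, 0.0, 30.0, 3.0, 0.0)
noise_sigma = 1.0
flux = bazin(t, *true_params) + rng.normal(0.0, noise_sigma, t.size)

# Rough initial guess derived from the data itself
p0 = (flux.max(), t[np.argmax(flux)], 20.0, 5.0, 0.0)

# The best-fit parameters play the role of the light-curve features
features, _ = curve_fit(bazin, t, flux, p0=p0, maxfev=10000)

# Root-mean-square residual as a quick sanity check on the fit
rms = np.sqrt(np.mean((flux - bazin(t, *features)) ** 2))
print("features:", features)
print("rms residual:", rms)
```

In a multi-band setting, one such fit per passband would give a concatenated feature vector; how the bands are actually combined is not specified here.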

In this notebook, we will look at a deep-learning-based approach for feature extraction, GP-VAE.

In [1]:
import datetime
print('Last Updated On:', datetime.datetime.now().strftime("%d %B, %Y"))

Last Updated On: 17 July, 2021


In [2]:
# Some prerequisites

import numpy as np
np.random.seed(42)
np.set_printoptions(suppress=True)
from scipy.optimize import least_squares, curve_fit
from matplotlib import pyplot as plt  # pyplot is the recommended interface; pylab is discouraged
import pandas as pd
import glob
import time
from tqdm.notebook import tqdm
import os

import warnings
from scipy.optimize import OptimizeWarning
warnings.simplefilter("error", OptimizeWarning)

import seaborn as sns
sns.set()

### 1. About GP-VAE

The Gaussian Process Variational Autoencoder (GP-VAE; [Fortuin et al. 2020](https://arxiv.org/pdf/1907.04155.pdf)) is an approach developed for dimensionality reduction and data imputation. It combines Variational Autoencoders (VAEs) with Gaussian Processes (GPs) in the following way:

- Deep VAEs are used to map the original time series data with missing values into a latent space.
- A GP then operates on these latent representations to capture the temporal correlations in the time series.

A detailed description of this is available in [Fortuin et al. (2020)](https://arxiv.org/pdf/1907.04155.pdf).
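To give a feel for the second ingredient, the minimal NumPy sketch below draws latent trajectories from a GP prior over time, so that neighbouring time steps receive correlated latent values. It does not use the GP-VAE code base: the Cauchy kernel is an illustrative choice, and the parameter values and names (`cauchy_kernel`, `latent_dim`, `z`) are assumptions made for this example.

```python
import numpy as np


def cauchy_kernel(t, sigma=1.0, length_scale=2.0):
    """Cauchy kernel k(t, t') = sigma^2 / (1 + (t - t')^2 / length_scale^2)."""
    diff = t[:, None] - t[None, :]
    return sigma**2 / (1.0 + (diff / length_scale) ** 2)


rng = np.random.default_rng(0)
t = np.arange(20, dtype=float)  # 20 evenly spaced time steps (assumed)

# Covariance between all pairs of time steps, with a small jitter for stability
K = cauchy_kernel(t) + 1e-6 * np.eye(len(t))

# Each latent dimension is an independent GP draw over time
latent_dim = 3
z = rng.multivariate_normal(np.zeros(len(t)), K, size=latent_dim).T

print(z.shape)  # (time steps, latent dimensions)
```

In a VAE without this prior, each time step's latent vector would be treated independently; the covariance matrix `K` is what ties them together along the time axis.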

For our use case, we wish to use the latent space of the GP-VAE as features for the light curves in 6 passbands. We use code provided by the authors on [GitHub](https://github.com/ratschlab/GP-VAE)<sup>[*](#myfootnote1)</sup>.

<a name="myfootnote1">*</a> Note: The original code was based on TensorFlow 1.15, so we made small modifications on a [fork](https://github.com/siddharthchaini/GP-VAE) using [TensorFlow's Upgrade Script](https://www.tensorflow.org/guide/upgrade) to make it compatible with TensorFlow 2+.

### 2. Data & Preprocessing

PLAsTiCC/RESSPECT light curves are stored as CSV files in which each row is a single observation, giving the MJD of the observation, the passband in which it was taken, and the flux and flux error.

For example:

In [3]:
df = pd.read_csv("sample_data/plasticc/130779836_uLens.csv")
df

Unnamed: 0,mjd,band,flux,fluxerr,detected_bool
0,59710.4130,i,571.036438,38.534260,1
1,59715.3319,Y,749.963562,32.526573,1
2,59722.3088,z,811.698425,33.418995,1
3,59728.4313,z,791.284973,46.716145,1
4,59729.2258,i,834.375305,28.976799,1
...,...,...,...,...,...
113,60555.9838,z,-39.881969,46.477093,0
114,60560.0459,g,14.894439,18.947685,0
115,60571.0225,Y,30.593130,50.695290,0
116,60585.9974,z,-23.471439,44.819859,0


This table must be converted into a conventional time series before it can be fed to the GP-VAE as input. #ToDo.
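One possible way to do this conversion, shown only as a sketch and not as the preprocessing actually used for RESSPECT, is to pivot the long-format table into one column per passband on a regular time grid and keep a mask of which entries were observed. The one-day binning, the band ordering, the zero fill value, and the names `long_df`, `wide`, `mask` and `series` are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

# A few rows in the same long format as the sample table above
long_df = pd.DataFrame({
    "mjd": [59710.4130, 59715.3319, 59722.3088, 59728.4313, 59729.2258],
    "band": ["i", "Y", "z", "z", "i"],
    "flux": [571.036438, 749.963562, 811.698425, 791.284973, 834.375305],
})

bands = ["u", "g", "r", "i", "z", "Y"]  # six passbands; this ordering is an assumption

# Assumption: bin observations onto whole days since the first observation
long_df["day"] = np.floor(long_df["mjd"] - long_df["mjd"].min()).astype(int)

# One row per day, one column per band; unobserved entries become NaN
wide = (
    long_df.pivot_table(index="day", columns="band", values="flux", aggfunc="mean")
    .reindex(index=range(long_df["day"].max() + 1), columns=bands)
)

mask = wide.notna().to_numpy()        # True where a band was observed on that day
series = wide.fillna(0.0).to_numpy()  # shape: (n_days, n_bands)

print(series.shape, int(mask.sum()))
```

Most entries of `series` are placeholders rather than measurements, which is why the mask is kept alongside it: a model that handles missing values needs to know which entries are real.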


---
The code for the above preprocessing and centering can be found in the following GitHub repositories:
- https://github.com/siddharthchaini/centering-perfect-sims-resspect
- https://github.com/siddharthchaini/centering-time-series-plasticc

**(NOTE: These will be ported over to the current repository soon. ToDo)**