# Emulating a DRP `Object` Catalog with a Simple Analytic Model

_Ji Won Park, Phil Marshall_

Created: July 19, 2019 at the LSST DESC hack day

Last run: 2019-07-19

The goals for this demo notebook are to:

* Show what the `Analytic` model class does, and 
* Check that its outputs are sensible. 

We'll do this by making an emulated `Object` catalog using a very simple analytic emulator, and making the same plots that we use to evaluate BNN emulator performance. The idea is that the analytic model can serve as the baseline for any ML-based emulator.

### Requirements

To run this notebook to completion, you will need a copy of the test object dataset and the dependencies installed.

In [None]:
! pip install -r requirements.txt

In [None]:
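# The link below opens a Google Drive viewer page rather than the raw CSV.
# Download the file from that page and save it as raw_data/obj_master_tract4850.csv, the path read below.
# ! curl -o obj_master_tract4850.csv "https://drive.google.com/file/d/1bEnSJ6YnkWyhXNaQdyjRWE8x3SS6XtVV/view?usp=sharing"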

### Setting up

We start with some standard imports, then fix the random seeds and choose the device that `torch` will use.

In [None]:
import torch
import numpy as np
import json
import matplotlib.pyplot as plt

%load_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
np.random.seed(2809)
torch.manual_seed(2809)
torch.cuda.manual_seed(2809)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if device.type == 'cuda':  # compare the device type string, not the torch.device object
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
else:
    torch.set_default_tensor_type('torch.FloatTensor')
print("device: ", device)

## Emulating an `Object` Catalog

The `Analytic` model has the same interface as the `BNN` models, so we first follow the same steps to get the data into shape.

In [None]:
with open("args.txt") as f:
    args = json.load(f)

############
# Data I/O #
############

from derp_data import DerpData
import itertools

# X base columns
truth_cols = list('ugrizy') + ['ra_truth', 'dec_truth', 'redshift', 'star',]
truth_cols += ['mag_true_%s_lsst' %bp for bp in 'ugrizy']
truth_cols += ['size_bulge_true', 'size_minor_bulge_true', 'ellipticity_1_bulge_true', 'ellipticity_2_bulge_true', 'bulge_to_total_ratio_i']
truth_cols += ['size_disk_true', 'size_minor_disk_true', 'ellipticity_1_disk_true', 'ellipticity_2_disk_true',]
opsim_cols = ['m5_flux', 'PSF_sigma2', 'filtSkyBrightness_flux', 'airmass', 'n_obs']
# Y base columns
drp_cols = ['extendedness', 'ra_obs', 'dec_obs', 'Ixx', 'Ixy', 'Iyy', 'IxxPSF', 'IxyPSF', 'IyyPSF', ]
drp_cols_prefix = ['cModelFlux_', 'psFlux_']
drp_cols_suffix = []
#drp_cols_suffix = ['_ext_photometryKron_KronFlux_instFlux', '_base_CircularApertureFlux_70_0_instFlux', 
drp_cols += [t[0] + t[1] for t in list(itertools.product(drp_cols_prefix, list('ugrizy')))]
drp_cols += [t[1] + t[0] for t in list(itertools.product(drp_cols_suffix, list('ugrizy')))]


# Define dataset
data = DerpData(data_path='raw_data/obj_master_tract4850.csv',
    data_path2=None,
    X_base_cols=truth_cols + opsim_cols, 
    Y_base_cols=drp_cols, 
    args=args, ignore_null_rows=True, save_to_disk=True)
if not args['data_already_processed']:
    data.export_metadata_for_eval(device_type=device.type)
# Read metadata if reading processed data from disk:
with open("data_meta.txt") as f:
    data_meta = json.load(f)

X_cols = data_meta['X_cols']
Y_cols = data_meta['Y_cols']
train_indices = data_meta['train_indices']
val_indices = data_meta['val_indices']
X_dim = data_meta['X_dim']
Y_dim = data_meta['Y_dim']

from torch.utils.data.sampler import SubsetRandomSampler
from torch.utils.data import DataLoader

# Split train vs. val
train_sampler = SubsetRandomSampler(train_indices)
val_sampler = SubsetRandomSampler(val_indices)

# Define dataloader
kwargs = {'num_workers': 1, 'pin_memory': True} if device.type == 'cuda' else {}
train_loader = DataLoader(data, batch_size=args['batch_size'], sampler=train_sampler, **kwargs)
val_loader = DataLoader(data, batch_size=args['batch_size'], sampler=val_sampler, **kwargs)
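The column-name construction in the cell above can be checked in isolation. The sketch below reuses the same prefixes and bands and needs neither `DerpData` nor the dataset; the variable name `flux_cols` is introduced here only for illustration.

```python
import itertools

bands = list('ugrizy')
drp_cols_prefix = ['cModelFlux_', 'psFlux_']
drp_cols_suffix = []  # empty in this notebook, so it contributes no columns

# itertools.product varies the last argument fastest: all six bands for the
# first prefix, then all six bands for the second prefix.
flux_cols = [prefix + band for prefix, band in itertools.product(drp_cols_prefix, bands)]

# With an empty suffix list the product is empty, so nothing is appended.
suffix_cols = [band + suffix for suffix, band in itertools.product(drp_cols_suffix, bands)]

print(flux_cols)
print(suffix_cols)
```

Together with the nine base columns (`extendedness`, positions, and moments), these twelve flux columns are what `Y_base_cols` receives.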

Let's take a look at the output columns of the catalog we are aiming to emulate:

In [None]:
data_meta['Y_cols']

Now we instantiate the simple `Analytic` model, and have it predict the output catalog. Note that there is no training: the analytic model has a hard-coded astronomy model for the DRP object properties and their errors. So, we just compute the predicted mean properties and the log variances on them, and pass them both to a sampling function to make the emulated table.

In [None]:
import pandas as pd
import models
import solver

X_val = data.X[val_indices, :]

analytic = models.Analytic()
params = analytic(X_val, data_meta)
sample = solver.sample(**params)
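The sampling step can be pictured with a small standalone example. This is a minimal sketch, not the implementation of `solver.sample`: it assumes independent Gaussian errors on each quantity, and the names `sample_gaussian`, `mean`, and `logvar` are hypothetical.

```python
import math
import random

def sample_gaussian(mean, logvar, rng):
    """Draw one noisy value per entry: mean + sigma * eps, with sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, logvar)]

rng = random.Random(2809)

# Toy predicted means and log variances (sigma = 0.1, 1 and 5 respectively).
mean = [1.0, 10.0, 100.0]
logvar = [math.log(0.01), math.log(1.0), math.log(25.0)]

draw = sample_gaussian(mean, logvar, rng)
print(draw)
```

Working with the log variance rather than the variance keeps the standard deviation positive for any real-valued prediction, which is why both quantities are passed to the sampler together.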

The `Analytic` model is very simple: it just adds Gaussian noise to the true parameters according to simple formulae for the photometric, astrometric, and other errors. The class docstring contains a brief summary of the formulae used.

In [None]:
help(models.Analytic)
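The docstring is the authoritative source for the formulae. As a purely illustrative stand-in, the sketch below shows how a flux log variance could be derived from a 5-sigma limiting flux. It assumes a background-limited point source, for which the flux error is roughly the limiting flux divided by 5, and it assumes that `m5_flux` holds that limiting flux. Neither assumption, nor the function names `flux_logvar_from_m5` and `signal_to_noise`, is taken from `models.Analytic`.

```python
import math

def flux_logvar_from_m5(m5_flux):
    """Log variance of the flux error, ASSUMING sigma_flux = m5_flux / 5."""
    sigma_flux = m5_flux / 5.0
    return 2.0 * math.log(sigma_flux)

def signal_to_noise(flux, m5_flux):
    """Signal-to-noise ratio under the same background-limited assumption."""
    return flux / (m5_flux / 5.0)

m5_flux = 10.0  # toy 5-sigma limiting flux, arbitrary units
logvar = flux_logvar_from_m5(m5_flux)
print("sigma_flux:", math.exp(0.5 * logvar))
print("S/N at the limiting flux:", signal_to_noise(m5_flux, m5_flux))
```

By construction, a source exactly at the limiting flux comes out with a signal-to-noise ratio of 5, which is a quick sanity check on any depth-based error model.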

## Visualizing the Emulated Catalog

Let's compare the observed quantities from the DRP `Object` table with our simple emulations of them. Both sets of quantities are noisy, and what we would like is for them to have similar noise properties. Simple scatter plots of `x_emulated` vs `x_observed` may not be very illuminating; plotting the mean and stdev of `x`, in bins of `x`, for both `x_emulated` and `x_observed` should give more insight.
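The binned comparison can be sketched on toy data as follows. The helper `binned_mean_std` and the toy arrays are hypothetical, and the choice to bin both catalogs on a shared noiseless reference value (`x_ref`) is an assumption made here so that the two sets of statistics are directly comparable.

```python
import math
import random

def binned_mean_std(x, y, edges):
    """Mean and standard deviation of y in bins of x.

    Returns one (mean, std) tuple per bin, or None for an empty bin.
    Bins are half-open: edges[b] <= x < edges[b + 1].
    """
    n_bins = len(edges) - 1
    groups = [[] for _ in range(n_bins)]
    for xi, yi in zip(x, y):
        for b in range(n_bins):
            if edges[b] <= xi < edges[b + 1]:
                groups[b].append(yi)
                break
    stats = []
    for g in groups:
        if not g:
            stats.append(None)
            continue
        mean = sum(g) / len(g)
        std = math.sqrt(sum((v - mean) ** 2 for v in g) / len(g))
        stats.append((mean, std))
    return stats

# Toy data: two noisy versions of the same reference values, same noise level.
rng = random.Random(2809)
x_ref = [rng.uniform(0.0, 10.0) for _ in range(5000)]
x_observed = [v + rng.gauss(0.0, 0.5) for v in x_ref]
x_emulated = [v + rng.gauss(0.0, 0.5) for v in x_ref]

edges = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
obs_stats = binned_mean_std(x_ref, x_observed, edges)
emu_stats = binned_mean_std(x_ref, x_emulated, edges)

for lo, hi, o, e in zip(edges[:-1], edges[1:], obs_stats, emu_stats):
    print(f"[{lo:4.1f}, {hi:4.1f})  observed: {o[0]:.2f} +/- {o[1]:.2f}"
          f"  emulated: {e[0]:.2f} +/- {e[1]:.2f}")
```

If the emulator reproduces the noise properties, the per-bin means and standard deviations of the two catalogs should agree to within their sampling scatter, as they do for this toy example.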