# SNCosmo Chi-Squared Bug

This notebook identifies a bug in `sncosmo` where returned chi-squared values are much larger than expected. This behavior arises from the fact that the `sncosmo` chi-squared calculation will incorporate values that fall outside the range of the model. We outline a workaround for this bug and ensure that the chi-squared calculation in our pipeline does not suffer from the same problem.

#### Table of Contents:
1. <a href='#outlining_behavior'>Outlining sncosmo's Behavior </a>: Outlines the bug mentioned above
1. <a href='#sn91bg'>Fitting with the SN91bg Model</a>: Checks that our handling of this bug is compatible with the SN91bg model we use in our analysis.
1. <a href='#out_of_range'>Out of Range Bands</a>: Demonstrates why out of range wavelengths are not a problem even though out of range phases are.


In [1]:
import sys

import numpy as np
import sncosmo
from astropy.table import Table
from matplotlib import pyplot as plt
from sndata.csp import dr3

sys.path.insert(0, '../')
from phot_class import utils
from phot_class import models
from phot_class.fit_funcs import simple_fit

models.register_sources(force=True)
dr3.download_module_data()
dr3.register_filters(force=True)


## Outlining `sncosmo`'s Behavior  <a id='outlining_behavior'></a>

We start by fitting a light curve with Salt2 and visually inspecting the result. Using the fitted model, we then tabulate the modeled flux value at each observed epoch. Note that we are using the `simple_fit` function from our analysis pipeline, which is a wrapper for `sncosmo.fit_lc`. We do this to avoid a separate bug where `sncosmo.fit_lc` mutates its arguments.


In [None]:
test_id = '2005ke'
data = dr3.get_data_for_id(test_id, format_sncosmo=True)

# Remove bands outside our model's wavelength range
# Failing to do this will raise an error when fitting
# We will consider how this affects chi-squared in a later section
data = data[data['band'] != 'csp_dr3_Y']
data = data[data['band'] != 'csp_dr3_Ydw']
data = data[data['band'] != 'csp_dr3_J']
data = data[data['band'] != 'csp_dr3_H']

salt2 = sncosmo.Model('salt2')
salt2.set(z=data.meta['redshift'])

result, fitted_model = simple_fit(data, salt2, ['t0', 'x0', 'x1', 'c'])
sncosmo.plot_lc(data, fitted_model)


Note in the above figure that the flux drops to zero at 50 days. This occurs because the range of the salt2 model only extends to 50 days. We can see this behavior when tabulating the modeled flux values.

In [None]:
data['model_flux'] = fitted_model.bandflux(data['band'], data['time'])

# Show data past a phase of 50 days
t0 = fitted_model.parameters[1]
data[data['time'] - t0 >= 50]
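One caveat on the cut above: `data['time'] - t0` is an observer-frame interval, whereas a model's phase limits are usually defined in the rest frame. The sketch below shows one way the two can be related, assuming the limits stretch by a factor of `(1 + z)`. The helper name, the assumed phase range of -20 to +50 days, the times, and the redshift are all illustrative and are not taken from the pipeline or from the fit.

```python
def in_model_time_range(times, t0, z, min_phase=-20.0, max_phase=50.0):
    """Flag observer-frame times that fall inside an assumed rest-frame phase range.

    min_phase and max_phase are assumptions made for this sketch.
    """
    # Rest-frame phase limits are stretched by (1 + z) in the observer frame
    min_time = t0 + (1.0 + z) * min_phase
    max_time = t0 + (1.0 + z) * max_phase
    return [min_time <= t <= max_time for t in times]


# Illustrative observation times (days) around a hypothetical t0
times = [53660.0, 53700.0, 53740.0, 53760.0]
mask = in_model_time_range(times, t0=53700.0, z=0.005)
print(mask)
```

For a nearby target the `(1 + z)` factor changes the cutoff by only a fraction of a day, so the simpler cut used above is a reasonable approximation here.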

Keeping the above table in mind, we calculate the chi-squared value in four different ways:

1. We manually calculate the chi-squared over the entire data table (including phases > 50 days)
1. We manually calculate the chi-squared using only data where the modeled flux > 0
1. Using the `sncosmo.chisq` function
1. Using the `utils.calc_model_chisq` function from our analysis pipeline

We will see that the first and third calculations are (almost) the same, indicating that `sncosmo` does not take the model range into account when calculating chi-squared. We will also see that the second and fourth values are close, meaning our analysis pipeline does not suffer from the same bug.
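To make the size of the effect concrete, here is a toy version of the first two calculations that does not depend on `sncosmo`. The light curve, the linear model, and the 50 day limit are invented for illustration. The only behavior borrowed from the discussion above is that the model flux is zero outside the model's phase range.

```python
# Toy light curve: (phase in days, observed flux, flux error); values are illustrative
observations = [
    (0.0, 100.0, 10.0),
    (20.0, 60.0, 10.0),
    (40.0, 30.0, 10.0),
    (60.0, 20.0, 10.0),  # beyond the toy model's phase range
    (80.0, 10.0, 10.0),  # beyond the toy model's phase range
]

MAX_PHASE = 50.0  # hypothetical upper phase limit of the toy model


def toy_model_flux(phase):
    """Linear decline inside the phase range, zero outside it."""
    if phase > MAX_PHASE:
        return 0.0
    return 100.0 - 2.0 * phase


def chisq(points):
    """Sum of squared, error-weighted residuals against the toy model."""
    return sum(((flux - toy_model_flux(phase)) / err) ** 2
               for phase, flux, err in points)


# Naive: every observation counts, including those the model cannot describe
naive = chisq(observations)

# Bounded: keep only observations where the model flux is positive
bounded = chisq([p for p in observations if toy_model_flux(p[0]) > 0])

print('naive  :', naive)
print('bounded:', bounded)
```

The two out-of-range points contribute most of the naive total even though the model makes no prediction for them, which is the same pattern we expect between the first and second values in the next cell.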

In [None]:
positive_model_flux = data['model_flux'] > 0
bounded_data = data[positive_model_flux]

naive_chisq = sum(((data['flux'] - data['model_flux']) / data['fluxerr']) ** 2)
bounded_chisq = sum(((bounded_data['flux'] - bounded_data['model_flux']) / bounded_data['fluxerr']) ** 2)
sncosmo_chisq = sncosmo.chisq(data, fitted_model)
pipeline_chisq, pipeline_dof = utils.calc_model_chisq(data, result, fitted_model)

print('naive_chisq   :', naive_chisq)
print('bounded_chisq :', bounded_chisq)
print('sncosmo_chisq :', sncosmo_chisq)
print('pipeline_chisq:', pipeline_chisq)


## Fitting with the SN91bg Model  <a id='sn91bg'></a>

In addressing the above bug, we rely on the modeled flux being zero outside the range of the model. We quickly check this is a valid assumption for our custom 91bg model.

In [None]:
bg_id = '2005ke'
bg_data = dr3.get_data_for_id(bg_id, format_sncosmo=True)
bg_data = bg_data[bg_data['band'] != 'csp_dr3_Y']
bg_data = bg_data[bg_data['band'] != 'csp_dr3_Ydw']
bg_data = bg_data[bg_data['band'] != 'csp_dr3_J']
bg_data = bg_data[bg_data['band'] != 'csp_dr3_H']

sn91bg = sncosmo.Model('sn91bg')
sn91bg.set(z=bg_data.meta['redshift'])
bg_result, bg_fit = simple_fit(bg_data, sn91bg, ['t0', 'x0', 'x1', 'c'])
sncosmo.plot_lc(bg_data, bg_fit)


We can see in the above plot a visual drop in the modeled flux to zero at 50 days. This means our handling of the chi-squared bug will work for our custom 91bg model. We also calculate the chi-squared values below and note that our fit with the 91bg model performs better than with the salt2 model. This is good news since the target we are using is a 91bg!

In [None]:
bg_data['model_flux'] = bg_fit.bandflux(bg_data['band'], bg_data['time'])
bounded_bg_data = bg_data[bg_data['model_flux'] > 0]

naive_bg_chisq = sum(((bg_data['flux'] - bg_data['model_flux']) / bg_data['fluxerr']) ** 2)
bounded_bg_chisq = sum(((bounded_bg_data['flux'] - bounded_bg_data['model_flux']) / bounded_bg_data['fluxerr']) ** 2)
sncosmo_bg_chisq = sncosmo.chisq(bg_data, bg_fit)
pipeline_bg_chisq, pipeline_dof = utils.calc_model_chisq(bg_data, bg_result, bg_fit)

print('naive_chisq   :', naive_bg_chisq)
print('bounded_chisq :', bounded_bg_chisq)
print('sncosmo_chisq :', sncosmo_bg_chisq)
print('pipeline_chisq:', pipeline_bg_chisq)


## Out of Range Bands <a id='out_of_range'></a>

We have seen above that the limited phase range of a model affects the chi-squared significantly. However, the same is not true for observations in a bandpass that falls outside the model's wavelength range: an error is raised for such bands, so they can never be silently included in the calculation. We demonstrate this below:


In [None]:
all_data = dr3.get_data_for_id('2005kc', format_sncosmo=True)

# Get data that is within the model range
out_of_range_bands = ['csp_dr3_Y', 'csp_dr3_Ydw', 'csp_dr3_J', 'csp_dr3_H']
is_out_range = np.any([all_data['band'] == b for b in out_of_range_bands], axis=0)
in_range_data = all_data[~is_out_range]

salt2 = sncosmo.Model('salt2')
salt2.set(z=all_data.meta['redshift'])

# Fit using just the in range data
result, fit = simple_fit(in_range_data, salt2, ['t0', 'x0', 'x1', 'c'])
in_range_chisq = sncosmo.chisq(in_range_data, fit)
# Expected to raise an error: all_data includes bands outside the model's wavelength range
all_chisq = sncosmo.chisq(all_data, fit)
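The difference between the two cases can be pictured with a toy flux function that refuses to evaluate a band extending past the model's wavelength range. This is a sketch of the general idea only, not `sncosmo`'s implementation: the function name, the wavelength limits, the band edges, and the choice of `ValueError` are all assumptions made for illustration.

```python
# Illustrative rest-frame wavelength limits in Angstroms (assumed, not read from sncosmo)
MODEL_MIN_WAVE = 2000.0
MODEL_MAX_WAVE = 9200.0


def toy_bandflux(band_min_wave, band_max_wave, z):
    """Return a dummy flux, or raise if the band leaves the model's range."""
    # Shift the model limits into the observer frame
    obs_min = MODEL_MIN_WAVE * (1.0 + z)
    obs_max = MODEL_MAX_WAVE * (1.0 + z)
    if band_min_wave < obs_min or band_max_wave > obs_max:
        raise ValueError('bandpass outside the spectral range of the model')
    return 1.0


# A hypothetical optical band that sits inside the model range
optical_flux = toy_bandflux(4000.0, 5500.0, z=0.01)

# A hypothetical near-infrared band that extends past the model range
try:
    toy_bandflux(14000.0, 18000.0, z=0.01)
    out_of_range_raised = False
except ValueError:
    out_of_range_raised = True

print(optical_flux, out_of_range_raised)
```

Because the failure is loud, out-of-range bands have to be removed before a chi-squared can be computed at all. Out-of-range phases instead return zero flux without complaint and so need the explicit masking described earlier.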


In [25]:
model = sncosmo.Model('salt2')
t0 = model.parameters[1]
zp_system = 'ab'
zero_point = 25
band_name = 'sdssg'
flux_offset = 100
flux_err_coeff = .1

phase = np.arange(-10, 30) + t0
model_flux = model.bandflux(band_name, phase, zero_point, zp_system) 
flux = model_flux + flux_offset
flux_err = flux * flux_err_coeff
band = np.full(len(phase), band_name)
zpsys = np.full(len(phase), zp_system)
zp = np.full(len(phase), zero_point)

data = Table(
    [phase, band, flux, flux_err, zp, zpsys],
    names=['time', 'band', 'flux', 'fluxerr', 'zp', 'zpsys']
)
result = sncosmo.utils.Result(
    {'vparam_names': ['t0', 'x0', 'x1', 'c']})


In [26]:
data

time,band,flux,fluxerr,zp,zpsys
float64,str5,float64,float64,int64,str2
-10.0,sdssg,252700.5215313203,25270.05215313203,25,ab
-9.0,sdssg,306512.20076980506,30651.220076980506,25,ab
-8.0,sdssg,364897.5753339071,36489.75753339071,25,ab
-7.0,sdssg,420500.3139591927,42050.03139591927,25,ab
-6.0,sdssg,469418.09053297486,46941.80905329749,25,ab
-5.0,sdssg,511037.3156453642,51103.73156453642,25,ab
-4.0,sdssg,544407.7945448775,54440.779454487754,25,ab
-3.0,sdssg,569650.9774658966,56965.097746589665,25,ab
-2.0,sdssg,587865.4809583761,58786.548095837614,25,ab
-1.0,sdssg,598212.86782432,59821.286782432,25,ab


In [27]:
expected_chisq = np.sum(((flux - model_flux) / flux_err) ** 2)
chisq, dof = utils.calc_model_chisq(data, result, model)

expected_chisq == chisq, expected_chisq, chisq

(False, 0.0010807669032814051, 3999.5629168679006)

In [32]:
sncosmo.chisq(data, model)

0.0010807669032814051

In [35]:
dof, len(data) - 4

(36, 36)