In [2]:
%load_ext autoreload
%autoreload 2

*Authors: Ema Donev and dr. Ivezić*

In this notebook we analyze the stars in our sample that are classified as RR Lyrae. We conduct extensive light curve analysis, starting with period determination using the Lomb-Scargle periodogram. We also calculate the ratio of the periods measured from LINEAR and ZTF, as well as the mean period used later in the analysis. From the periodogram we search for local peaks that indicate the Blazhko effect (BE). We also fit the light curves using their periods and save statistical metrics such as chi2 and rms. We use this data later for BE candidate analysis.

In [3]:
#| code-fold: false
# IMPORTING LIBRARIES
# --------------------

# AstroML
from astroML.datasets import fetch_LINEAR_sample
from astropy.coordinates import SkyCoord
import astropy.units as u

# Basic libraries
import pickle
import os
import sys
from tqdm import tqdm

# Plotting
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib import ticker
import matplotlib.colors as mcolors
from matplotlib.font_manager import FontProperties

# DataFrame analysis
import pandas as pd

# Math libraries
import numpy as np
import scipy as sc
from scipy.stats import norm

# CONFIGURATION
# -------------
sns.set_theme() # setting the theme for plotting
sys.path.insert(0,'../src/')
np.random.seed(42)

# configuring plotting colors
colors = ['#5F6372', '#79A8A4', '#B2AD8F', '#92A186', '#AD8082']
blue = '#5F6372'
turqoise = '#79A8A4'
light_green = '#B2AD8F'
green = '#92A186'
pink = '#AD8082'
muted_colors = sns.color_palette(colors) # keep the palette object; sns.set_palette itself returns None
sns.set_palette(muted_colors)

# configuring fonts for plotting
font = FontProperties()
font.set_family('avenir')
font.set_name('Big Caslon')
font.set_style('normal')
font.set_size('xx-large')


%matplotlib inline  

In [4]:
sys.path.insert(0,'../src/')
import blazhko_analysis
import BE_plotting
import helper
import RR_lyrae_params
import data_access

In [5]:
#| code-fold: false
# LINEAR DATA
dataL = fetch_LINEAR_sample(data_home='../inputs') # fetching the data from astroML data library

# Methods used for analysis of RR Lyrae light curves

### Period determination

The Lomb-Scargle method determines the period of a variable star by fitting sinusoidal models to the light curve over a grid of trial periods. The trial period with the highest periodogram power, that is, the best fit, is taken as the period of the star.

This is the formula for the Lomb-Scargle periodogram:

$$P(\omega) = \frac{1}{2\sigma^2}\left(\frac{\left[\sum_k y_k \cos\omega(t_k-\tau)\right]^2}{\sum_k \cos^2\omega(t_k-\tau)} + \frac{\left[\sum_k y_k \sin\omega(t_k-\tau)\right]^2}{\sum_k \sin^2\omega(t_k-\tau)}\right)$$

We use the Lomb-Scargle periodogram because it is a standard, widely used tool for determining the periods of periodic variable stars. It combines several methods of analysis, with its core in Fourier analysis, and it works with non-uniformly sampled data, which is typical of light curves.
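
As an illustration of the formula above, the following minimal sketch evaluates it directly with NumPy on a synthetic, irregularly sampled sine wave. The function name `lomb_scargle_power`, the synthetic light curve, and the trial-period grid are assumptions made for this example and are not the notebook's own implementation. The time offset $\tau$ uses the standard definition $\tan(2\omega\tau) = \sum_k \sin 2\omega t_k / \sum_k \cos 2\omega t_k$, and the data are mean-subtracted first.

```python
import numpy as np

def lomb_scargle_power(t, y, omegas):
    """Evaluate the classical Lomb-Scargle power at each angular frequency (hypothetical helper)."""
    y = y - np.mean(y)              # the formula assumes zero-mean data
    var = np.var(y)                 # sigma^2 in the normalization
    power = np.empty(len(omegas))
    for i, w in enumerate(omegas):
        # tau makes the sine and cosine terms orthogonal at this frequency
        tau = np.arctan2(np.sum(np.sin(2 * w * t)), np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[i] = ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s)) / (2 * var)
    return power

# Synthetic light curve (assumed values): 300 irregular epochs over 100 days
rng = np.random.default_rng(0)
true_period = 0.6
t = np.sort(rng.uniform(0, 100, 300))
y = 0.3 * np.sin(2 * np.pi * t / true_period) + rng.normal(0, 0.05, t.size)

# Scan trial periods and keep the one with the highest power
trial_periods = np.linspace(0.3, 1.0, 5000)
power = lomb_scargle_power(t, y, 2 * np.pi / trial_periods)
best_period = trial_periods[np.argmax(power)]
print(f"best period: {best_period:.4f} d")
```

In practice a library implementation is usually preferable to this explicit loop, since it handles measurement errors and frequency grids more carefully.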

### Light curve folding

**Light curve phasing or folding** condenses the entire time series of the light curve into one *phase*, or one cycle of variability. The phase is defined using the period of the light curve: $$\mathrm{phase} = \frac{\mathrm{time}}{\mathrm{period}} - \mathrm{int}\left(\frac{\mathrm{time}}{\mathrm{period}}\right)$$ This function maps each observation time to a phase between 0 and 1, describing how far the star is into its variability cycle. We then use this function to convert the `time` array to phase.
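
The phase formula above can be sketched in NumPy as follows. The function name `fold_time` and the example values are hypothetical. `np.floor` is used in place of `int` so that the result stays in [0, 1) even for negative times, which is a choice made for this sketch.

```python
import numpy as np

def fold_time(time, period):
    """Convert observation times to phases in [0, 1) for a given period (hypothetical helper)."""
    cycles = np.asarray(time, dtype=float) / period
    return cycles - np.floor(cycles)   # fractional part of the elapsed cycles

# Example: four epochs folded with a period of 0.5 days
time = np.array([0.0, 0.25, 0.5, 1.125])
phase = fold_time(time, 0.5)
print(phase)
```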

### Fitting light curves
**Light curve fitting** uses a function based on Fourier analysis to fit a curve through the light curve `time`, `mag` and `magerr` data. We can choose how many terms the fit has. *The more terms we use, the more complicated shapes we can model*. Fitting is an essential part of light curve analysis, because it lets us calculate the goodness of fit and monitor how the fit changes over long time spans, as between `LINEAR` and `ZTF`.
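
A minimal sketch of such a fit, assuming a Fourier series in phase solved by error-weighted linear least squares. The helper names `fourier_design` and `fit_fourier` and the synthetic data are hypothetical, and the notebook's own fitting routine may differ.

```python
import numpy as np

def fourier_design(phase, n_terms):
    """Design matrix with a constant plus sin/cos pairs for each harmonic."""
    cols = [np.ones_like(phase)]
    for k in range(1, n_terms + 1):
        cols.append(np.sin(2 * np.pi * k * phase))
        cols.append(np.cos(2 * np.pi * k * phase))
    return np.column_stack(cols)

def fit_fourier(phase, mag, magerr, n_terms=3):
    """Weighted least-squares Fourier coefficients (each row scaled by 1/magerr)."""
    A = fourier_design(phase, n_terms)
    w = 1.0 / magerr
    coeffs, *_ = np.linalg.lstsq(A * w[:, None], mag * w, rcond=None)
    return coeffs

# Synthetic folded light curve (assumed values) containing two harmonics
rng = np.random.default_rng(1)
phase = rng.uniform(0, 1, 200)
magerr = np.full(phase.size, 0.02)
true_mag = 15.0 + 0.3 * np.sin(2 * np.pi * phase) + 0.1 * np.cos(4 * np.pi * phase)
mag = true_mag + rng.normal(0, 0.02, phase.size)

coeffs = fit_fourier(phase, mag, magerr, n_terms=3)
model = fourier_design(phase, 3) @ coeffs   # fitted magnitudes at the observed phases
```

Raising `n_terms` lets the model follow sharper features, at the cost of fitting noise when the data are sparse.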

### `Chi-2 test` and `RMS scatter`

#### `RMS scatter`
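RMS scatter is a statistical metric that, in this case, measures how closely the fit matches the actual data. It is calculated as: $$RMS = \sqrt{\frac{1}{n}\sum^n_{i=1} x_i^2}$$ where $x_i$ is the difference between the $i$-th observed magnitude and the fitted value, and $n$ is the number of data points.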
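
A short sketch of this calculation, assuming that $x_i$ is the residual between the observed and fitted magnitude. The function name `rms_scatter` and the numbers are hypothetical.

```python
import numpy as np

def rms_scatter(mag, model_mag):
    """Root-mean-square of the residuals between data and fit (hypothetical helper)."""
    residuals = np.asarray(mag) - np.asarray(model_mag)
    return np.sqrt(np.mean(residuals ** 2))

mag = np.array([15.03, 14.96, 15.00, 15.05])        # observed magnitudes
model_mag = np.array([15.00, 15.00, 15.00, 15.00])  # fitted magnitudes
rms = rms_scatter(mag, model_mag)
print(f"rms = {rms:.4f} mag")
```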

#### `Chi-2 test`
This test assesses the "goodness of fit" of light curves by comparing the fitted values with the actual magnitude values of the light curve. The lower the chi-2 value, the better the agreement between the observed and expected values (based on some model), and the higher the chi-2 value, the poorer the agreement. Which values count as too high or too low is decided by the analyst, based on the degrees of freedom and the nature of the data. The chi-2 value is summed over all points of a light curve, with $O_i$ the observed and $E_i$ the expected (fitted) value: $$\chi^2 = \sum{\frac{(O_i - E_i)^2}{E_i}}$$
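
The sketch below evaluates the chi-2 formula exactly as written above. For photometry with per-point uncertainties, a commonly used alternative divides each squared residual by the squared measurement error instead. That variant is included here as an assumption for comparison, and it is not confirmed to be what this notebook computes. Both function names and all numbers are hypothetical.

```python
import numpy as np

def chi2_as_written(observed, expected):
    """Chi-2 following the formula in the text: sum of (O - E)^2 / E."""
    observed, expected = np.asarray(observed), np.asarray(expected)
    return np.sum((observed - expected) ** 2 / expected)

def chi2_error_weighted(observed, expected, err):
    """Assumed alternative: sum of ((O - E) / err)^2 using measurement errors."""
    observed, expected, err = map(np.asarray, (observed, expected, err))
    return np.sum(((observed - expected) / err) ** 2)

observed = np.array([10.0, 12.0])
expected = np.array([10.0, 10.0])
err = np.array([1.0, 2.0])

chi2_text = chi2_as_written(observed, expected)
chi2_weighted = chi2_error_weighted(observed, expected, err)
```

Dividing either value by the number of degrees of freedom gives a reduced chi-2, which is easier to compare between light curves with different numbers of points.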

### Periodogram analysis

As a final step of the RR Lyrae analysis process, we analyze the periodograms which the Lomb-Scargle method produces. A periodogram is a plot of the power at each frequency, telling us which frequency, or period, fits the light curve best. We can use the periodogram to look for signs of the Blazhko effect: if there are two additional local peaks close to the maximum, the star is likely a BE star.

We determine the existence of local peaks by folding the periodogram about the maximum peak and multiplying the folded parts together, which amplifies the signals so they are easier to detect. We then search for the next strongest peak. Our program also calculates the 1-year anomaly, to make sure that we find the correct frequencies.
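
One possible reading of the folding step is sketched below: the periodogram is mirrored about its highest peak, the two sides are multiplied, and the strongest remaining peak gives the frequency offset of the side peaks. The function `folded_sideband_search`, the `exclude` parameter, the uniform frequency grid, and the synthetic periodogram are all assumptions made for illustration. The sketch is not the notebook's own routine and it omits the 1-year check.

```python
import numpy as np

def folded_sideband_search(freq, power, exclude=0.005):
    """Fold a uniformly sampled periodogram about its maximum and locate side peaks.

    Returns the main frequency and the offset at which the product of the
    left and right sides is largest, ignoring offsets below `exclude`.
    """
    i0 = np.argmax(power)
    n = min(i0, len(freq) - 1 - i0)          # samples available on both sides
    offsets = freq[i0 + 1:i0 + 1 + n] - freq[i0]
    right = power[i0 + 1:i0 + 1 + n]
    left = power[i0 - n:i0][::-1]            # mirrored so index j matches offsets[j]
    product = left * right                   # symmetric peaks reinforce each other
    keep = offsets > exclude                 # skip the wings of the main peak
    j = np.argmax(product[keep])
    return freq[i0], offsets[keep][j]

# Synthetic periodogram (assumed): main peak at 1.8 1/d, side peaks at +/- 0.02 1/d
freq = np.linspace(1.5, 2.1, 6001)
def gauss(center, height):
    return height * np.exp(-0.5 * ((freq - center) / 0.001) ** 2)
power = gauss(1.8, 1.0) + gauss(1.78, 0.2) + gauss(1.82, 0.2)

f_main, f_offset = folded_sideband_search(freq, power)
blazhko_period = 1.0 / f_offset              # modulation period implied by the offset
```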

# Analysis of RR Lyrae stars

### Importing data

In [6]:
Lrrlyr = pd.read_csv('../outputs/Lrrlyr_unprocessed.csv')
ztfdata = np.load('../outputs/ZTF_dataset_FINAL_FINAL_FULL.npy', allow_pickle=True)

In [7]:
#| code-fold: false
# CONFIG BEFORE BIG RUN

Lids = Lrrlyr['ID'].to_numpy()
lc_type = Lrrlyr['LCtype'].to_numpy()
rectascension = Lrrlyr['ra'].to_numpy() # array of right ascension coordinates
declination = Lrrlyr['dec'].to_numpy() # array of declination coordinates

end = 'FINAL_FINAL_FULL'

In [10]:
test = len(Lids)
ZTF_data_best = []
periodograms = []
lc_analysis = {}
fits = []

if os.path.isfile('../outputs/LC_ANALYSIS_FINALFINAL_FINAL_FULL.csv'):
    print('Loading data!')
    lightcurve_analysis = pd.read_csv('../outputs/LC_ANALYSIS_FINALFINAL_FINAL_FULL.csv')
    # use a context manager so the pickle file is closed after loading
    with open("../outputs/periodograms_"+end+".pkl", "rb") as f:
        periodogr = pickle.load(f)
    fits = np.load("../outputs/fits_"+end+".npy", allow_pickle=True)
    ztfdata = np.load("../outputs/ztf_data_filter"+end+".npy", allow_pickle=True)
else:
    for i in tqdm(range(test)):
        lc_analysis, periodograms, fits, ZTF_data_best = RR_lyrae_params.RR_lyrae_analysis(end, i, Lids, ztfdata, lc_analysis, ZTF_data_best, fits, periodograms)
        
        if i%10==0:
            # save dictionary to pkl file
            with open('../outputs/lc_analysis_'+end+'.pkl', 'wb') as fp:
                pickle.dump(lc_analysis, fp)
            with open("../outputs/periodograms_"+end+".pkl", "wb") as x:
                pickle.dump(periodograms, x)
            FITS = np.array(fits, dtype=object)
            np.save("../outputs/fits_"+end+".npy", FITS, allow_pickle=True)
            ZTF_data_lca = np.array(ZTF_data_best, dtype=object)
            np.save("../outputs/ztf_data_filter"+end+".npy", ZTF_data_lca, allow_pickle=True)
    # save dictionary to pkl file
    with open('../outputs/lc_analysis_'+end+'.pkl', 'wb') as fp:
        pickle.dump(lc_analysis, fp)
    with open("../outputs/periodograms_"+end+".pkl", "wb") as x:
        pickle.dump(periodograms, x)
    FITS = np.array(fits, dtype=object)
    np.save("../outputs/fits_"+end+".npy", FITS, allow_pickle=True)
    ZTF_data_lca = np.array(ZTF_data_best, dtype=object)
    np.save("../outputs/ztf_data_filter"+end+".npy", ZTF_data_lca, allow_pickle=True)

Loading data!


# Statistical analysis of data

(conduct basic statistical analysis for this part of the notebook)