# Calculation of RR Lyrae light curve periods

*Authors: Ema Donev and dr. Ivezić*

In this notebook we focus on the subset of our stars that are RR Lyrae variables. We use a fine-grid Lomb-Scargle periodogram to calculate the periods of their `LINEAR` and `ZTF` light curves, which we will use in later analysis.

In [1]:
# IMPORTING LIBRARIES
# --------------------

# AstroML & Astropy
from astroML.datasets import fetch_LINEAR_sample
from astropy.timeseries import LombScargle
from astroML.datasets import fetch_LINEAR_geneva
from astropy.timeseries import TimeSeries
from astropy.table import Table

# ZTF
from ztfquery import lightcurve

# Basic libraries
import random
import pickle
import os
import sys
from tqdm import tqdm

# Plotting
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib import ticker
import matplotlib.colors as mcolors
from matplotlib.font_manager import FontProperties

# DataFrame analysis
import pandas as pd
import dask.dataframe as dd 

# Math libraries
import numpy as np
import scipy as sc
from scipy.stats import norm

# Multithreading/multiprocessing libraries
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor
import threading

In [2]:
# CONFIG
sns.set_theme() # setting the theme for plotting
np.random.seed(42)

colors = ['#1A090D', '#D8C99B', '#D8973C', '#BD632F', '#273E47']
cmap = mcolors.ListedColormap(colors)

font = FontProperties()
font.set_family('avenir')
font.set_name('Avenir')
font.set_style('normal')
font.set_size('xx-large')

%matplotlib inline  

In [17]:
# Importing custom libraries
# ----------------------------
sys.path.insert(0,'../src/')
from ZTF_data import *
from config import *
from descriptive_stats import *
from plots import *
from selection import *
from period_calc import *

In [4]:
# DATA
data = fetch_LINEAR_sample(data_home='../inputs') # fetching the data from astroML data library
ZTF_data = data_ztf('ZTF_light_curves.npy')

Loading the data!


# Methods

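The Lomb-Scargle method estimates the period of a variable star by fitting sinusoidal waves to its light curve over a grid of trial frequencies. For each trial frequency it computes a power that measures how well the model fits the data, and the frequency with the highest power (the most prominent peak of the periodogram) gives the best estimate of the period. The basic method fits a single sinusoid. The multi-term version adds higher harmonics of the trial frequency, which helps with non-sinusoidal light curves such as those of RR Lyrae stars.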

This is the formula for the Lomb-Scargle periodogram:

$$P(\omega) = \frac{1}{2\sigma^2}\left(\frac{\left[\sum_k y_k \cos\omega(t_k-\tau)\right]^2}{\sum_k \cos^2\omega(t_k-\tau)} + \frac{\left[\sum_k y_k \sin\omega(t_k-\tau)\right]^2}{\sum_k \sin^2\omega(t_k-\tau)}\right)$$

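- $\omega$ is the trial angular frequency ($\omega = 2\pi f$) that we test against the light curve.
- $\sigma^2$ is the variance of the data. Dividing by it normalizes the power, so that it does not depend on the overall scale of the measurements.
- $\sum_k y_k \cos\omega(t_k-\tau)$ and $\sum_k y_k \sin\omega(t_k-\tau)$ measure how well cosine and sine waves of frequency $\omega$ align with the data. $t_k$ is the time of the $k$-th observation and $y_k$ is the corresponding measurement with the mean subtracted.
- $\tau$ is a time offset defined by $\tan(2\omega\tau) = \sum_k \sin 2\omega t_k \,/\, \sum_k \cos 2\omega t_k$. It makes the sine and cosine terms orthogonal for the given sampling, so the power does not change when all times are shifted by a constant, and the periodogram is equivalent to a least-squares fit of a sinusoid at each frequency.
- $\sum_k \cos^2\omega(t_k-\tau)$ and $\sum_k \sin^2\omega(t_k-\tau)$ normalize the two terms for the uneven sampling. For evenly spaced data both are close to $N/2$, and the formula reduces to the classical Fourier periodogram.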
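As a check on the definitions above, the formula can be transcribed directly into numpy. This is a minimal sketch for illustration only, assuming mean-subtracted data and angular trial frequencies; the function name `lomb_scargle_classic` and the synthetic light curve are hypothetical, and a tested implementation such as `astropy.timeseries.LombScargle` is the better choice for real data.

```python
import numpy as np

def lomb_scargle_classic(t, y, omegas):
    """Classical Lomb-Scargle power P(omega), written term by term from the formula."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                  # the formula assumes mean-subtracted data
    var = y.var()                     # sigma^2 in the normalization
    power = np.empty(len(omegas))
    for i, w in enumerate(omegas):
        # tau from tan(2*w*tau) = sum(sin 2wt) / sum(cos 2wt)
        tau = np.arctan2(np.sum(np.sin(2 * w * t)), np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[i] = ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s)) / (2 * var)
    return power

# Synthetic, unevenly sampled sinusoid with a known period (hypothetical values, in days)
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 100, 300))
true_period = 0.55
y = 0.3 * np.sin(2 * np.pi * t / true_period) + rng.normal(0, 0.05, t.size)

freqs = np.linspace(0.5, 4.0, 20000)             # trial frequencies in cycles per day
power = lomb_scargle_classic(t, y, 2 * np.pi * freqs)
best_period = 1 / freqs[np.argmax(power)]
print(f"recovered period: {best_period:.4f} d")
```

The explicit loop over frequencies keeps the code close to the formula; a production implementation would vectorize it or use a fast approximation.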

We use the Lomb-Scargle periodogram because it is one of the most widely used tools for determining the periods of variable stars. It is rooted in Fourier analysis, but unlike the standard discrete Fourier transform it works with unevenly spaced data, which is typical of astronomical light curves.

# Selecting RR Lyrae stars

## `LINEAR` database

In order to select the RR Lyrae stars, we need their classification type and need to know which stars have enough data points for a reliable period calculation. This information comes from the **Geneva catalog** of `LINEAR` variable stars, which can be accessed with `fetch_LINEAR_geneva()` and contains additional information about the light curves.

In [5]:
L = select_LINEAR('LINEAR_Periods_nterm3')

In [7]:
L.head()

| Unnamed: 0 | ID | Porig | Pnew | ra | dec | ug | gi | iK | JK | logP | Ampl | skew | kurt | magMed | nObs | LCtype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29848 | 0.557009 | 0.557019 | 119.526443 | 46.96212 | 1.17 | 0.37 | 1.02 | 0.27 | -0.254138 | 0.62 | -0.31 | -0.57 | 16.37 | 301 | 1 |
| 1 | 32086 | 0.569258 | 0.569266 | 119.324013 | 47.095505 | 1.36 | 0.52 | 1.17 | 0.31 | -0.244691 | 0.71 | -0.49 | -1.0 | 15.02 | 289 | 1 |
| 2 | 50402 | 0.643293 | 0.643286 | 119.712975 | 52.149574 | 1.18 | 0.39 | 1.1 | 0.2 | -0.191591 | 0.49 | -0.29 | -0.88 | 16.46 | 284 | 1 |
| 3 | 61011 | 0.662369 | 0.662376 | 118.491257 | 53.168125 | 0.81 | 0.55 | 1.62 | 0.2 | -0.1789 | 0.69 | -0.03 | -1.06 | 14.08 | 274 | 1 |
| 4 | 62892 | 0.530772 | 0.530764 | 119.187241 | 53.379295 | 1.12 | 0.21 | 1.07 | 0.21 | -0.275092 | 0.62 | -0.55 | -0.32 | 16.54 | 276 | 1 |


We have now selected all of the "good" `LINEAR` IDs; next, we need to select the RR Lyrae stars.

We first select "1 dip stars" using the ratio between the original (correct) period and our calculated period. Because periods of 1 dip stars are easy to calculate, the two should agree very well (a ratio between 0.99 and 1.01). Eclipsing binaries, in contrast, have two dips per orbit, so our calculated period is often only half of the correct one (a ratio of about 2). This ratio is the easiest way to differentiate the two.
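To see why eclipsing binaries give a period ratio near 2, the sketch below builds a toy eclipsing binary with two equal eclipses per orbit and fits a single sinusoid (plus a constant) over a grid of trial frequencies. Because the light curve repeats every half orbit, the best-fitting period is half the orbital period. Everything here is hypothetical: the function names `sinusoid_power` and `eclipse`, the orbital period, and the dip depth and width. Real binaries with eclipses of unequal depth can weaken the effect.

```python
import numpy as np

def sinusoid_power(t, y, freqs):
    """Fraction of variance explained by a least-squares sinusoid (+ constant) at each frequency."""
    y = y - y.mean()
    chi2_ref = np.sum(y ** 2)                       # chi^2 of a constant model
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        X = np.column_stack([np.ones_like(t),
                             np.cos(2 * np.pi * f * t),
                             np.sin(2 * np.pi * f * t)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        power[i] = 1.0 - np.sum((y - X @ coef) ** 2) / chi2_ref
    return power

def eclipse(phase, center, depth=0.5, width=0.04):
    """Gaussian dip in magnitude (an eclipse makes the star fainter, i.e. the magnitude larger)."""
    d = np.abs(phase - center)
    d = np.minimum(d, 1.0 - d)                      # phase distance, wrapped around 1
    return depth * np.exp(-0.5 * (d / width) ** 2)

# Hypothetical eclipsing binary with two equal eclipses per orbit
rng = np.random.default_rng(1)
P_orbit = 0.6                                       # days
t = np.sort(rng.uniform(0, 100, 300))
phase = (t / P_orbit) % 1.0
mag = 15.0 + eclipse(phase, 0.0) + eclipse(phase, 0.5) + rng.normal(0, 0.02, t.size)

freqs = np.linspace(1.0, 4.0, 3000)                 # cycles per day
P_found = 1 / freqs[np.argmax(sinusoid_power(t, mag, freqs))]
print(f"found {P_found:.4f} d, ratio P_orbit / P_found = {P_orbit / P_found:.2f}")
```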

In [8]:
# SELECTING 1 dip STARS
# -------------------------

P_ratio = L['Porig']/L['Pnew']

L_1 = L[(P_ratio>0.99)&(P_ratio<1.01)] # where the ratio between the original (correct) and currently calculated periods is 1:1 (within 1%)
L_1.head()

| Unnamed: 0 | ID | Porig | Pnew | ra | dec | ug | gi | iK | JK | logP | Ampl | skew | kurt | magMed | nObs | LCtype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29848 | 0.557009 | 0.557019 | 119.526443 | 46.96212 | 1.17 | 0.37 | 1.02 | 0.27 | -0.254138 | 0.62 | -0.31 | -0.57 | 16.37 | 301 | 1 |
| 1 | 32086 | 0.569258 | 0.569266 | 119.324013 | 47.095505 | 1.36 | 0.52 | 1.17 | 0.31 | -0.244691 | 0.71 | -0.49 | -1.0 | 15.02 | 289 | 1 |
| 2 | 50402 | 0.643293 | 0.643286 | 119.712975 | 52.149574 | 1.18 | 0.39 | 1.1 | 0.2 | -0.191591 | 0.49 | -0.29 | -0.88 | 16.46 | 284 | 1 |
| 3 | 61011 | 0.662369 | 0.662376 | 118.491257 | 53.168125 | 0.81 | 0.55 | 1.62 | 0.2 | -0.1789 | 0.69 | -0.03 | -1.06 | 14.08 | 274 | 1 |
| 4 | 62892 | 0.530772 | 0.530764 | 119.187241 | 53.379295 | 1.12 | 0.21 | 1.07 | 0.21 | -0.275092 | 0.62 | -0.55 | -0.32 | 16.54 | 276 | 1 |


Now, we select the *RR Lyrae* stars using the $g-i$ color (the `gi` column, kept between -0.5 and 0.4) and the light curve classification `LCtype` (values 1 and 2, which correspond to the RRab and RRc classes).

In [10]:
Lrrlyr = L_1[(L_1['gi']>-0.5)&(L_1['gi']<0.4)&(L_1['LCtype']>0)&(L_1['LCtype']<3)]
print(len(Lrrlyr))
Lrrlyr.head()

2710


| Unnamed: 0 | ID | Porig | Pnew | ra | dec | ug | gi | iK | JK | logP | Ampl | skew | kurt | magMed | nObs | LCtype |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 29848 | 0.557009 | 0.557019 | 119.526443 | 46.96212 | 1.17 | 0.37 | 1.02 | 0.27 | -0.254138 | 0.62 | -0.31 | -0.57 | 16.37 | 301 | 1 |
| 2 | 50402 | 0.643293 | 0.643286 | 119.712975 | 52.149574 | 1.18 | 0.39 | 1.1 | 0.2 | -0.191591 | 0.49 | -0.29 | -0.88 | 16.46 | 284 | 1 |
| 4 | 62892 | 0.530772 | 0.530764 | 119.187241 | 53.379295 | 1.12 | 0.21 | 1.07 | 0.21 | -0.275092 | 0.62 | -0.55 | -0.32 | 16.54 | 276 | 1 |
| 5 | 91437 | 0.674728 | 0.674711 | 120.29496 | 40.932457 | 1.18 | 0.24 | 1.09 | 0.37 | -0.170871 | 0.75 | -0.12 | -0.93 | 15.39 | 177 | 1 |
| 6 | 95250 | 0.313869 | 0.313869 | 120.124542 | 40.65662 | 1.18 | -0.14 | 0.83 | 0.28 | -0.503252 | 0.55 | 0.14 | -0.65 | 16.98 | 222 | 2 |


We now have 2710 RR Lyrae stars from the `LINEAR` database. They can be split further into *RRab* and *RRc* types later, as long as we keep the `LCtype` column. The next step is to match the IDs of the RR Lyrae stars to the `ZTF` IDs.

## `ZTF` database

In [11]:
L_ids = list(data.ids) # LINEAR ids
Z_ids = list(range(len(L_ids))) # ZTF ids: position of each light curve in the sample (7010 stars)

In [12]:
matches = list(zip(L_ids, Z_ids)) # list of (LINEAR id, ZTF id) tuples
print(matches[:5])

[(10003298, 0), (10004892, 1), (10013411, 2), (10021274, 3), (10022663, 4)]


In [13]:
ZTF_rrlyrae = [] # list of RR Lyrae ids

for i in Lrrlyr['ID']: # for every id in the table Lrrlyr of LINEAR RR Lyrae stars
    for j in matches: # for every (LINEAR id, ZTF id) pair in matches
        if i == j[0]: # if the LINEAR ids match
            ZTF_rrlyrae.append(j) # we found the ZTF id, which we append to our master list

ZTF_rrlyrae[:5]

[(29848, 4898), (50402, 5523), (62892, 5921), (91437, 6749), (95250, 6852)]
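The nested loop in the cell above compares every RR Lyrae ID with every pair in `matches`, roughly 2710 × 7010 ≈ 19 million comparisons. A dictionary keyed on the LINEAR ID needs only one lookup per star. The sketch below uses small hypothetical ID lists in place of `data.ids` and `Lrrlyr['ID']`, and assumes that LINEAR IDs are unique and that the ZTF index is a light curve's position in the sample, as in `matches`.

```python
# Hypothetical stand-ins for data.ids and Lrrlyr['ID'] (the real values come from astroML)
linear_ids = [10003298, 10004892, 10013411, 29848, 50402]
rrlyrae_ids = [29848, 50402, 99999999]          # 99999999 has no counterpart

# Build the lookup once: LINEAR id -> ZTF index (its position in the sample)
linear_to_ztf = {lid: zid for zid, lid in enumerate(linear_ids)}

# One dictionary lookup per RR Lyrae star; ids without a match are skipped
ztf_rrlyrae = [(lid, linear_to_ztf[lid]) for lid in rrlyrae_ids if lid in linear_to_ztf]
print(ztf_rrlyrae)  # [(29848, 3), (50402, 4)]
```

In the notebook, the same lookup can be built from the existing list with `dict(matches)`.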

Now that all the IDs are ready, we can calculate the periods of the RR Lyrae stars in detail.

# Calculating the periods of RR Lyrae stars

In [26]:
# CALCULATING THE PERIODS
# ---------------------------

num = NUM_STARS # number of stars to process (defined in the custom modules imported above)
#num = 2 # uncomment for a quick test run
PERIODS = {}
PERIODOGRAMS = []


def save_progress():
    # save the periods dictionary to a pkl file and the periodograms to an npy file
    with open('../outputs/periods.pkl', 'wb') as fp:
        pickle.dump(PERIODS, fp)
    np.save('../outputs/periodogram.npy', np.array(PERIODOGRAMS, dtype=object), allow_pickle=True)


for n in tqdm(range(num)):
    Lid, Zid = ZTF_rrlyrae[n] # LINEAR id and the matching ZTF index

    # LINEAR CALCULATION
    t, mag, magerr = data.get_light_curve(Lid).T
    Plinear, fL, pL = doPeriods(t, mag, magerr, 3, lsPS=True)

    # ZTF CALCULATION
    ZTFdata = ZTF_data[Zid][1]
    Pztf, fZ, pZ = getZTFperiod(ZTFdata, 3, ZTFbands=['zg', 'zr', 'zi'], lsPS=True)

    P_mean = (Plinear + Pztf) / 2
    P_ratio = Pztf / Plinear

    # Saving everything
    PERIODS[Lid] = [Zid, Plinear, Pztf, P_mean, P_ratio]
    PERIODOGRAMS.append(((fL, pL), (fZ, pZ))) # (LINEAR periodogram, ZTF periodogram)

    # Fail-safe saving every 100 stars
    if n % 100 == 0:
        save_progress()

# Final save, so the stars processed after the last checkpoint are not lost
save_progress()

 50%|█████     | 1/2 [00:00<00:00,  2.28it/s]

failed for band zi Ndata= 0


100%|██████████| 2/2 [00:01<00:00,  1.71it/s]
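The helpers `doPeriods` and `getZTFperiod` come from the custom modules imported from `../src/` and are not shown here. The `3` passed to both, together with the file name `LINEAR_Periods_nterm3`, suggests a three-term (three-harmonic) Lomb-Scargle model. The sketch below is a minimal numpy-only illustration of such a multi-term periodogram on a fine frequency grid; it is an assumption about what the helpers compute, not their actual code, and the function name `multiterm_power` and the toy light curve are hypothetical.

```python
import numpy as np

def multiterm_power(t, y, dy, freqs, nterms=3):
    """Power of a weighted least-squares fit with `nterms` harmonics at each trial frequency.

    power = 1 - chi2(best-fit model) / chi2(constant model), so 0 <= power <= 1.
    """
    w = 1.0 / dy ** 2                                  # inverse-variance weights
    ybar = np.sum(w * y) / np.sum(w)                   # weighted mean magnitude
    chi2_ref = np.sum(w * (y - ybar) ** 2)             # chi^2 of a constant model
    sw = np.sqrt(w)
    power = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        # Design matrix: constant + cos/sin of the first `nterms` harmonics of f
        cols = [np.ones_like(t)]
        for n in range(1, nterms + 1):
            cols += [np.cos(2 * np.pi * n * f * t), np.sin(2 * np.pi * n * f * t)]
        X = np.column_stack(cols)
        # Weighted least squares: scale each row by 1/dy before solving
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        chi2 = np.sum(w * (y - X @ coef) ** 2)
        power[i] = 1.0 - chi2 / chi2_ref
    return power

# Hypothetical non-sinusoidal light curve: fundamental plus two harmonics
rng = np.random.default_rng(2)
P_true = 0.55                                          # days
t = np.sort(rng.uniform(0, 100, 300))
dy = np.full(t.size, 0.03)
phi = 2 * np.pi * t / P_true
mag = (15.0 + 0.30 * np.sin(phi) + 0.15 * np.sin(2 * phi + 0.5)
       + 0.08 * np.sin(3 * phi + 1.0) + rng.normal(0, 0.03, t.size))

freqs = np.linspace(0.5, 3.0, 10000)                   # fine grid, cycles per day
power = multiterm_power(t, mag, dy, freqs, nterms=3)
P_best = 1 / freqs[np.argmax(power)]
print(f"best period: {P_best:.4f} d")
```

`astropy.timeseries.LombScargle` offers the same kind of model through its `nterms` argument, which is likely preferable for production use.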
