In [None]:
# Some standard definitions and some new:
import numpy as np
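# LombScargle lives in astropy.timeseries in current astropy releases (formerly astropy.stats)
from astropy.timeseries import LombScargle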
import matplotlib.pyplot as plt
import pandas as pd

# fancy plot layouts! see http://www.futurile.net/2016/02/27/matplotlib-beautiful-plots-with-style/
# print(plt.style.available) to see the list.
# Note: in matplotlib 3.6 and later this style is named 'seaborn-v0_8-whitegrid'.
plt.style.use('seaborn-whitegrid')
%matplotlib inline

# Getting Started with Lomb-Scargle Periodograms in Python

This notebook exercise is meant to familiarize you with applying periodogram analyses to real data. You will need to complete cells and add comments as necessary, then submit your final notebook on Moodle.

For deeper reading and more information on periodogram analysis, see: 

_Understanding the Lomb-Scargle Periodogram_, by Jake VanderPlas  
https://arxiv.org/abs/1703.09824

(Notebook material adapted from Jupyter notebooks by J. VanderPlas, available on GitHub)

### Due Monday, April 2nd at 12 pm on Moodle

## Group Names:

In [None]:
# Create a set of synthetic data with sinusoidal variation over time

rand = np.random.RandomState(42)
t = 100 * rand.rand(100)
y = np.sin(2 * np.pi * 2.0 * t) + 0.1 * rand.randn(100)

# generate some errorbars
dy = 0.1 * (1 + rand.rand(100))

## In the cell below, plot the synthetic data set ```(t, y)``` with errorbars ```(dy)```:

In [None]:
# Your code here:



### Q: Can you see a periodic signal within this dataset? What signal might we expect based on how the synthetic data were created?

### Answer:

(your answer here)


### The periodogram analysis can be as simple as one step (!), below. 

You can read the documentation here: https://docs.astropy.org/en/stable/timeseries/lombscargle.html

In [None]:
# Do Lomb-Scargle!

f, p = LombScargle(t, y, dy).autopower()

### What do _f_ and _p_ correspond to? (check the documentation for more info)
### Answer:
(your answer here)

## In the cell below, plot the periodogram:

In [None]:
# Your code here:



### Q: What is the dominant frequency of periodicity within this dataset? How does it compare with the frequency we used to create this synthetic dataset at the beginning?

### Answer:

(your answer here)

### We can now determine the best frequency by finding where the power is largest:

In [None]:
best_frequency = f[np.argmax(p)]
best_frequency

### Following the approach from our phase-folding worksheet a few weeks ago, calculate the phase and the standard phase below:

(Hint: What is the best period corresponding to our frequency estimate from LS?)
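
As a reminder of the folding arithmetic, here is a minimal sketch on made-up times. It assumes the worksheet's convention that the phase is the time elapsed since a reference time divided by the period, and that the standard phase is its fractional part (between 0 and 1). The helper `fold` and the `demo_*` names are hypothetical and not part of this exercise.

```python
import numpy as np

def fold(times, period, t_ref=0.0):
    """Return (phase, standard phase) for the given times (hypothetical helper)."""
    # Number of cycles elapsed since the reference time
    phase = (np.asarray(times, dtype=float) - t_ref) / period
    # Keep only the fractional part of each cycle count
    standard_phase = phase % 1.0
    return phase, standard_phase

demo_times = np.array([0.0, 0.25, 1.5, 3.75])   # made-up observation times
demo_phase, demo_standard = fold(demo_times, period=2.0)
print(demo_phase)
print(demo_standard)
```

The same two lines of arithmetic apply to the synthetic data once you have a period estimate.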

In [None]:
# Your code here:

best_period = 
phase = 

In [None]:
# Your code here:

stnd_phase = 

## Plot the corresponding standard phase diagram below:

In [None]:
# Your code here:



### Now we can see how our Lomb-Scargle model fits the data.

### First we have to calculate the model, evaluated on an evenly spaced grid of times spanning the data:

In [None]:
t_fit = np.linspace(t.min(),t.max(),100)

In [None]:
y_fit = LombScargle(t, y, dy).model(t_fit, best_frequency)

## Now phase-fold the fit results for the Lomb-Scargle model, based on the best period:

In [None]:
# Your code here:
# Hint: the "reference time" in this case should be t[0]; otherwise the phase will be incorrect.

phase_LS = 
stnd_phase_LS = 

## Finally, plot the phased data and the models below. 

### For readability, draw the model as a black line and the synthetic dataset as point symbols (```fmt="."```). Be sure to add labels to the datasets and a legend.

In [None]:
fig, ax = plt.subplots(figsize=(12,6))

# Your code here: 



***

## Now that we've seen how the approach can be used on (simple!) synthetic data, let's work with some real data, from the LINEAR survey. 



The <a href="https://en.wikipedia.org/wiki/Lincoln_Near-Earth_Asteroid_Research">LINEAR Survey</a> is actually a program designed to find/track potentially-hazardous asteroids. 

However, they also get some great stellar variability data along the way! In fact, they've <a href="https://arxiv.org/abs/1505.02082">cataloged >7200 variable stars</a>.

## We'll start by reading in data and examining it:

We're using the pandas library in Python to read in this data table. The variable "data" will hold the full data table, known as a "dataframe":


In [None]:
data = pd.read_csv('LINEAR_11375941.csv')

In the cells below, use data.head() and data.tail() to examine parts of the table/dataframe:

In [None]:
# Your code here:


In [None]:
# Your code here:


**Reminder:**   
As with NumPy arrays, you can select and work with a single column of a pandas dataframe by using the variable name and the column label, e.g.,

```data.magerr```

Try it out below...
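
For instance, on a small made-up table (the values are invented; the column names `t`, `mag`, and `magerr` mirror the ones used in this notebook), attribute access and bracket access return the same column:

```python
import pandas as pd

# A tiny, made-up table; the real file has many more rows.
demo = pd.DataFrame({
    "t": [0.0, 1.5, 3.0],
    "mag": [15.2, 15.6, 15.1],
    "magerr": [0.05, 0.07, 0.04],
})

col_attr = demo.magerr        # attribute-style access
col_item = demo["magerr"]     # bracket-style access, equivalent
print(col_attr.equals(col_item))
print(demo.mag.mean())        # columns support the usual NumPy-like reductions
```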

In [None]:
# Your code here:


### What is the total timespan of the observations, in years?
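
A minimal sketch of the unit conversion on made-up timestamps. It assumes the time column is in days (as the period calculation later in this notebook does) and takes a year to be 365.25 days; `demo_t` is a hypothetical array, not the LINEAR data.

```python
import numpy as np

demo_t = np.array([100.0, 465.25, 830.5])   # made-up observation times, in days
span_days = demo_t.max() - demo_t.min()      # total baseline in days
span_years = span_days / 365.25              # convert days to years
print(span_years)
```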

In [None]:
# Your code here - make sure to check that your answer is reasonable!

# Hint: note that you can use data.t.min() to get the minimum value of a pandas data column, "t"



### In the cell below, plot the raw data. Recall that the Y-axis is in magnitudes (which direction should it go?)

In [None]:
# Your code here:



## Set up the Lomb-Scargle periodogram

In [None]:
# This is similar to the step performed with our synthetic data, but creates a new variable
# corresponding to the Lomb-Scargle analysis that we can reuse.

lmscgl = LombScargle(data.t, data.mag, data.magerr)

In [None]:
frequency, power = lmscgl.autopower(nyquist_factor=500, minimum_frequency = 0.2)

## Q: In the cell above, what do the Nyquist factor and minimum frequency correspond to, and why are they important?
(hint: see the documentation again: https://docs.astropy.org/en/stable/timeseries/lombscargle.html)

### Answer:

(your answer here)

## Calculate the best estimate of period from LS, and phase the data accordingly:

In [None]:
period_days = 1./frequency
best_frequency = frequency[np.argmax(power)]
best_period = period_days[np.argmax(power)] # units are still in days...
print('Best period: ', best_period*24, ' hours')

In [None]:
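# Calculate the standard phase of this dataset and store it in a variable named `phase`;
# the final plotting cell of this notebook uses that name. (A reference time of 0 keeps
# the folded data aligned with the model computed there.)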

# Your code here:



## Plot the periodogram of Lomb-Scargle power as a function of the period in days. 

### Note: This can be somewhat intensive to plot, so add ```rasterized=True``` to your plot command so it loads more quickly.
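
A minimal sketch of that keyword on made-up data (the arrays `demo_period` and `demo_power` are placeholders, not the LINEAR periodogram). The keyword matters most when a dense line is drawn or saved in a vector format.

```python
import numpy as np
import matplotlib.pyplot as plt

demo_period = np.linspace(0.1, 5.0, 200_000)                     # made-up dense period grid
demo_power = np.exp(-0.5 * ((demo_period - 2.0) / 0.01) ** 2)    # made-up narrow peak

fig_demo, ax_demo = plt.subplots(figsize=(8, 3))
# rasterized=True asks matplotlib to draw this line as a bitmap rather than as vector paths
(line_demo,) = ax_demo.plot(demo_period, demo_power, 'k-', rasterized=True)
ax_demo.set_xlabel('Period (days)')
ax_demo.set_ylabel('Power')
```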

In [None]:
# Your code here:



### Does the dominant period in the plotted periodogram agree with the peak estimate? Note that it's difficult to discern the period in days. Maybe a change of units will help. 

## Below, replot the periodogram, but this time in units of _hours_.

In [None]:
# Zoom in and change the units!

# Your code here:


## We can now plot the phase-folded data, based upon the best period found from the periodogram analysis. 

In [None]:
# Your code here:



## And finally, we can plot the data and the model of variation from the LS best-frequency estimate. Describe what the two code cells below define and do. What is the L-S method calculating, and how?


## Answer:
(Your answer here)

In [None]:
# Generate the corresponding model from the LS results:
phase_model = np.linspace(-0.5, 1.5, 100)
mag_model = lmscgl.model(phase_model / best_frequency, best_frequency)

In [None]:
fig, ax = plt.subplots(figsize=(10,3))

plt.plot(phase_model, mag_model, 'k-')

# Remember that the 0-to-1 standard phase is just a snapshot of the full variation. 
# Therefore, we can extend the plots in phase a bit in either direction, to show the overlap:

plt.errorbar(phase - 1.0, data.mag, data.magerr, fmt='.', color='gray', ecolor='0.5')
plt.errorbar(phase + 0, data.mag, data.magerr, fmt='.', color='gray', ecolor='0.5')
plt.errorbar(phase + 1.0, data.mag, data.magerr, fmt='.', color='gray', ecolor='0.5')

plt.xlim(-0.5, 1.5)
plt.xlabel('Phase')
plt.ylabel('Magnitude')
plt.gca().invert_yaxis()