# Lesson 1: Timeseries Data and Exoplanets


## Learning Goals: 
- Understand how missions like TESS and Kepler look for repeated changes in brightness to detect planets.
- Learn to plot a light curve using three distinct methods: 
    - light curve data
    - target pixel data
    - a cutout from a full frame image (FFI).
- Become familiar with other common uses of timeseries data, such as stellar astrophysics and asteroseismology.


### Nomenclature

It is often useful to compare exoplanets and stars to our own solar system. However, it very quickly gets tedious to say things like "1.1 Earth masses", so astronomers use some specialized vocabulary for these things.

* The symbol ☉ means "Solar" or "Sun". For example, $M_☉$ would mean "Solar Mass". 
* The symbol ⊕ means "Earth". $R_⊕$, for example means "Earth radius".
* AU, or astronomical unit, is the (average) distance at which the Earth orbits the Sun.
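
To make these units concrete, here is a minimal sketch that converts a few values into everyday units. The reference constants are approximate values assumed for illustration; they are not taken from this lesson.

```python
# Approximate reference values, assumed for illustration (not from this lesson)
M_SUN_KG = 1.989e30     # one solar mass in kilograms
R_SUN_KM = 695_700.0    # one solar radius in kilometres
M_EARTH_KG = 5.972e24   # one Earth mass in kilograms
R_EARTH_KM = 6_371.0    # one (mean) Earth radius in kilometres
AU_KM = 1.496e8         # one astronomical unit in kilometres

# "1.1 Earth masses" expressed in kilograms
mass_kg = 1.1 * M_EARTH_KG

# How many solar radii fit into one astronomical unit?
au_in_solar_radii = AU_KM / R_SUN_KM

print(f"1.1 Earth masses = {mass_kg:.3e} kg")
print(f"1 AU = {au_in_solar_radii:.0f} solar radii")
```

Writing quantities relative to the Sun or the Earth keeps the numbers close to 1, which is why the shorthand is so common.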

## How are exoplanets discovered?
### What is a transit? 

<img src="https://upload.wikimedia.org/wikipedia/commons/8/88/Exoplanet_transit_method.gif" width="500">

A transit occurs when a planet passes between a star and its observer. Transits reveal an exoplanet not because we directly see it from many light-years away, but because the planet passing in front of its star ever so slightly dims the star's light. This dimming can be seen in a light curve: a plot of flux (the light received) versus time, which is the type of plot we will be making today. When the exoplanet passes in front of the star, the light curve shows a dip in brightness.
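
As a rough illustration of what such a dip looks like in data, the sketch below builds a synthetic light curve with box-shaped transits. The function name `box_transit` and every parameter value are made up for this example; real transits have rounded edges and noise.

```python
import numpy as np

def box_transit(time, period, t0, duration, depth):
    """Relative flux: 1 outside transit, 1 - depth inside (idealised box shape)."""
    # Time since the nearest mid-transit, wrapped into [-period/2, period/2)
    dt = (time - t0 + 0.5 * period) % period - 0.5 * period
    flux = np.ones_like(time, dtype=float)
    flux[np.abs(dt) < 0.5 * duration] -= depth
    return flux

# Hypothetical planet: one transit every 4 days, each dimming the star by 1%
time = np.linspace(0.0, 10.0, 1001)  # days
flux = box_transit(time, period=4.0, t0=1.0, duration=0.2, depth=0.01)
```

Plotting `flux` against `time` would show three evenly spaced dips (at days 1, 5, and 9), which is the repeating signature that transit surveys search for.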

Beyond exoplanets, light curves can be used to discover and analyze a variety of astronomical systems and signals, including binary stars, asteroseismic oscillations, and much more. We will explore these other applications more in Lesson 2 and beyond. Stay tuned!

## Plotting a Light Curve

We will be exploring the exoplanetary system around the star HD 21749. In 2019, TESS observations led to the discovery of two exoplanets around HD 21749, one of which is Earth-sized. HD 21749 is a K-type main sequence star with an estimated mass of 0.73 $M_☉$, a radius of 0.70 $R_☉$, and a luminosity of 0.20 $L_☉$.

The inner planet, HD 21749 c, orbits at a distance of 0.08 AU with a period of just 7.8 days. It has a radius of 1.1 $R_⊕$ and was the first Earth-sized planet found by TESS. The outer planet, HD 21749 b, orbits the star at a distance of 0.21 AU with a period of 35.6 days.
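
The radii quoted above give a feel for how small the signal is. A common first approximation, which ignores limb darkening and grazing geometries, is that the fractional dip equals the ratio of the planet's disk area to the star's disk area, $(R_p / R_s)^2$. The sketch below applies it to HD 21749 c; the Earth and Sun radii in kilometres are approximate values assumed for the example.

```python
# Approximate reference radii, assumed for illustration
R_SUN_KM = 695_700.0
R_EARTH_KM = 6_371.0

def transit_depth(planet_radius_km, star_radius_km):
    """Fractional dimming from an opaque disk crossing a uniformly bright star."""
    return (planet_radius_km / star_radius_km) ** 2

# Values from the text: planet radius 1.1 Earth radii, star radius 0.70 solar radii
depth = transit_depth(1.1 * R_EARTH_KM, 0.70 * R_SUN_KM)
depth_ppm = depth * 1e6

print(f"Expected transit depth: about {depth_ppm:.0f} parts per million")
```

The result is roughly 200 parts per million, so the star dims by about 0.02% during a transit. This is why precise, stable photometry is needed to find Earth-sized planets.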

### Imports

The following cell holds the imported packages. These packages are necessary for running the rest of the cells in this notebook, and you can expect to use some of them almost every time you do astronomical research. A description of each import is as follows:

* `numpy` to handle array mathematics
* `fits` from astropy.io for accessing FITS files
* `Table` from astropy.table for creating tidy tables of the data
* `matplotlib.pyplot` for plotting data and images
* `Observations` from astroquery.mast for querying data and observations from the MAST archive
* `Tesscut` from astroquery.mast for cutting out target pixels from FFIs

In [None]:
from astropy.io import fits
from astropy.table import Table

from astroquery.mast import Observations
from astroquery.mast import Tesscut

import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

Don't forget to enable cloud data access!

In [None]:
Observations.enable_cloud_dataset()

### Query TESS data from MAST

We will be using the Observations class in the astroquery.mast subpackage from Astroquery. Visit [Lesson 0](../00-the-cloud/00-the-cloud.ipynb) if you want a refresher on querying for TESS data.

Let's get started by searching for TESS time series Observations of HD 21749.

In [None]:
# Query for TESS time series Observations of our target
TESS_table = Observations.query_criteria(objectname="HD 21749"
                                         , obs_collection="TESS"
                                         , dataproduct_type='timeseries'
                                         ) 

# Get associated science products for each Observation
data_products = Observations.get_product_list(TESS_table) 

# Keep only the science products
filtered = Observations.filter_products(data_products, productType="SCIENCE")

# Be selective about the columns we display
cols = ['obs_id', 'description', 'productSubGroupDescription']

# Look at the first five rows of the filtered results
filtered[0:5][cols]

We'd like to plot a light curve for this star, but we've returned a few different [TESS Data Products](https://outerspace.stsci.edu/display/TESS/2.0+-+Data+Product+Overview). 

A Target Pixel file (TPF) is a "postage stamp": a tiny subsection of TESS's full field of view that is focused around a specific target. TPFs are essentially a collection of images, allowing us to precisely measure the brightness of a particular patch of the sky.

A Light Curve (LC) is a more processed version of a TPF, with target brightness already extracted. This makes it easy to create a plot, but we lose the ability to select which pixels belong to the target, and which pixels to the background.

Let's explore these two file types in more detail by choosing an example Observation.

In [None]:
# We're fixing our example by providing the observation id
ex_id = "tess2018206045859-s0001-0000000279741379-0120-s"

# Get the light curve files
lc_prod = Observations.filter_products(data_products
                                      , obs_id = ex_id    # choosing a reproducible example
                                      , productSubGroupDescription = "LC")

# Get the target pixel files
tp_prod = Observations.filter_products(data_products
                                       , obs_id = ex_id    # Using the same example as above
                                       , productSubGroupDescription = "TP")

# Display the lc file as an example
lc_prod[cols]

We've selected the files we want, so now we need to access them.

### Download the FITS files

In [None]:
# Download the lc file
lc_file = Observations.download_products(lc_prod)['Local Path'][0]

#Take a peek at the FITS file we downloaded
fits.info(lc_file)
lc_fits = fits.open(lc_file)

# Adding a line break
print("\n")

# Now download the TP file
tp_file = Observations.download_products(tp_prod)['Local Path'][0]

#Take a peek at the FITS file we downloaded
fits.info(tp_file)
tp_fits = fits.open(tp_file)

We'll hold off on looking at the TPF for the moment. First, let's examine the contents of the LIGHTCURVE extension of our light curve FITS file. We can use this data to create a plot of the light curve.

In [None]:
lc = lc_fits[1].data
lc.columns

We can see that the LIGHTCURVE extension has columns for TIME and two different fluxes: SAP_FLUX and PDCSAP_FLUX. Let's plot both of them and compare. 
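
The TIME values are stored as BJD - 2457000 (the offset shown on the plot axes in this notebook), often abbreviated BTJD. The sketch below converts such a value back to a full BJD and to an approximate calendar date. It treats the barycentric time as if it were UTC, which is off by up to several minutes, so take it as an orientation aid rather than a precise conversion. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

BTJD_OFFSET = 2457000.0
# Julian Date 2457000.0 falls on 2014-12-08 at 12:00
BTJD_ZERO = datetime(2014, 12, 8, 12, 0, tzinfo=timezone.utc)

def btjd_to_bjd(btjd):
    """Add the TESS offset back to get a full Barycentric Julian Date."""
    return btjd + BTJD_OFFSET

def btjd_to_approx_datetime(btjd):
    """Approximate calendar date; ignores the barycentric and leap-second corrections."""
    return BTJD_ZERO + timedelta(days=btjd)

# Example: a time stamp of 1325.0 on the x-axis
print(btjd_to_bjd(1325.0))
print(btjd_to_approx_datetime(1325.0).date())
```

A value near 1325 corresponds to late July 2018, which matches the `2018206` (year 2018, day 206) stamp in the example file name used in this lesson.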

### Plot the Light Curve

In [None]:
sapflux = lc['SAP_FLUX'] #SAP flux column
pdcflux = lc['PDCSAP_FLUX'] #PDCSAP flux column
time_lc = lc['TIME'] #time column

fig = plt.figure(figsize = (11,4))

fig.add_subplot(211)
plt.plot(time_lc, sapflux,'.', label = 'SAP', color = "gold")
plt.legend(loc = 'lower left')
plt.ylabel("FLUX (e-/s)")

fig.add_subplot(212)
plt.plot(time_lc, pdcflux,'.', label = 'PDC', color = "red")
plt.legend(loc = 'lower left')
plt.ylabel("FLUX (e-/s)")
plt.xlabel('TIME  (BJD-2457000)')

We notice some interesting things about this data immediately.

SAP stands for Simple Aperture Photometry. It is the "raw data", so it's just a sum of all the pixels in the target aperture.

The Pre-search Data Conditioning (PDC) light curve removes any long-term trends from the data. It is intended to enhance transiting planet signals and may not be suitable if you're looking for other astrophysical phenomena.
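
Because the two flux columns can sit at different absolute levels, it is often easier to compare them after dividing each by its own median. The sketch below uses small synthetic arrays as stand-ins for `SAP_FLUX` and `PDCSAP_FLUX`. The `normalize` helper is hypothetical, and `np.nanmedian` is used because real TESS flux columns typically contain NaN values.

```python
import numpy as np

def normalize(flux):
    """Divide by the median (ignoring NaNs) so the baseline sits near 1."""
    flux = np.asarray(flux, dtype=float)
    return flux / np.nanmedian(flux)

# Synthetic stand-ins for the two flux columns, each with one missing cadence
sap_demo = np.array([1000.0, 1002.0, np.nan, 998.0, 1000.0])
pdc_demo = np.array([1100.0, 1102.2, np.nan, 1097.8, 1100.0])

sap_norm = normalize(sap_demo)
pdc_norm = normalize(pdc_demo)
```

With the real data, `normalize(sapflux)` and `normalize(pdcflux)` could be plotted on the same axes to see where the PDC processing changed the shape of the light curve.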

### Creating a Light Curve from the Target Pixel File

Now, we will use the other FITS file we have downloaded to extract the light curve from the pixels. Afterwards, we can compare these two light curves.

Let's start by examining the contents of our Target Pixel FITS file. 

In [None]:
fits.info(tp_file)

#### Determining the Optimal Aperture from the APERTURE extension
Each pixel in the aperture extension image is an integer that represents a set of binary flags. The entire set of flags and what they mean can be found in the [TESS Archive Manual](https://outerspace.stsci.edu/display/TESS/TESS+Archive+Manual). Of interest to us in this exercise is which pixels belong to the "optimal aperture" used to create this target's light curve.

In [None]:
aperture_image = tp_fits[2].data
aperture = np.bitwise_and(aperture_image, 2) / float(2)
aperture
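
To see what the bitwise operation is doing, here is a small sketch on a made-up 3x3 flag image. The meaning assigned to the value 1 (pixel collected by the spacecraft) is an assumption based on the TESS Archive Manual, so check the manual before relying on it. The value 2 for the optimal aperture matches the cell above.

```python
import numpy as np

# Flag values: 2 matches the cell above; 1 is assumed from the TESS Archive Manual
COLLECTED = 1          # pixel was collected by the spacecraft
OPTIMAL_APERTURE = 2   # pixel belongs to the optimal photometric aperture

# Made-up flag image: 1 = collected only, 3 = collected AND in the optimal aperture
demo_flags = np.array([[1, 1, 1],
                       [1, 3, 3],
                       [1, 3, 1]], dtype=np.int32)

# A pixel is in the aperture when the "2" bit is set, whatever other bits are set
in_aperture = (demo_flags & OPTIMAL_APERTURE) != 0
print(in_aperture)
```

The result is a boolean mask rather than an array of 0.0 and 1.0. Either form works with a test such as `aperture == 1`, because `True` compares equal to 1.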

### Get the Flux and Time for the Optimal Aperture
We can see that there are some values of "1" in our optimal aperture array; these indicate the pixels that we will use to create the target's light curve. Now, to create a time series we need to sum the FLUX values for each of the pixels in the optimal aperture.

In [None]:
#This is a basic function for getting the aperture photometry
def aperture_phot(image,aperture):
    flux = np.sum(image[aperture==1])
    return flux

In [None]:
#get the data for the PIXELS extension of our fits file
pix = tp_fits[1].data

#Use the map lambda function to sum all the flux in each pixel in our defined aperture
flux = np.array(list (map (lambda x: aperture_phot(x,aperture), pix['FLUX'])))

#get the time from the PIXELS extension of our fits file
time = pix["TIME"]
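
The `map`/`lambda` call above sums one image at a time. An equivalent approach, sketched below on a tiny made-up data cube, applies the mask to every frame at once with NumPy indexing. The function name `aperture_phot_cube` is hypothetical, and the sketch assumes the flux array has the shape (time, row, column).

```python
import numpy as np

def aperture_phot_cube(cube, mask):
    """Sum the masked pixels of every frame in a (time, row, column) cube."""
    mask = np.asarray(mask, dtype=bool)   # accepts 0/1 integers or floats too
    return cube[:, mask].sum(axis=1)

# Made-up cube: 2 frames of 3x3 pixels, and a plus-shaped aperture
cube = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
mask = np.array([[0, 1, 0],
                 [1, 1, 1],
                 [0, 1, 0]])

fluxes = aperture_phot_cube(cube, mask)

# The frame-by-frame version gives the same numbers
looped = np.array([frame[mask == 1].sum() for frame in cube])
```

Both versions give one flux value per frame. The vectorized form mainly pays off for long time series, where looping in Python becomes slow.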

### Plot the Light Curve


In [None]:
fig = plt.figure(figsize = (11,4))
fig.add_subplot(212)
plt.plot(time, flux,'.', color = "green")
plt.ylabel("FLUX (e-/s)")
plt.xlabel('TIME  (BJD-2457000)')

### Compare Light Curves
Now that we have a SAP_FLUX light curve and the light curve we calculated with the target pixel files, we can overplot them.

In [None]:
fig = plt.figure(figsize = (11,4))
fig.add_subplot(211)
plt.plot(time_lc, sapflux, ".", color = "gold")
plt.plot(time, flux,'.', color = "green")
plt.ylabel("FLUX (e-/s)")
plt.xlabel('TIME  (BJD-2457000)')

Success! The light curve that we calculated from our Target Pixel file overlaps exactly the SAP_FLUX light curve we got from the Light Curve file.
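
A plot can hide small offsets, so it can be useful to check the agreement numerically as well. The sketch below is one way to do that, assuming the two arrays have the same length and sample the same cadences (this should hold for files from the same observation, but it is an assumption). The helper name is hypothetical and the demo arrays are synthetic.

```python
import numpy as np

def max_abs_difference(a, b):
    """Largest absolute difference over cadences where both values are finite."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    both = np.isfinite(a) & np.isfinite(b)
    if not both.any():
        return np.nan   # nothing to compare
    return float(np.max(np.abs(a[both] - b[both])))

# Synthetic example: the second cadence is skipped because one value is NaN
demo_a = np.array([1.0, np.nan, 3.0])
demo_b = np.array([1.0, 2.0, 3.5])
print(max_abs_difference(demo_a, demo_b))
```

With the arrays from this notebook, `max_abs_difference(flux, sapflux)` would give the largest disagreement in e-/s, which can then be judged against the typical flux level.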

## Create a Light Curve from an FFI
Lastly, we will explore how to get a light curve by cutting out an FFI (Full Frame Image). This time, we will be using observations of a known exoplanetary system around the star TOI-778, also known as HD 115447.

TOI-778b is a large exoplanet, with a mass of 2.8 and a radius of 1.37 times that of Jupiter, orbiting with a period of 4.63 days at a distance of about 0.06 AU.   

In [None]:
TESS_table = Observations.query_criteria(objectname="HD 115447"
                                         , obs_collection="TESS"
                                         , target_name="TESS FFI")

TESS_table

We can see that there are multiple FFIs for HD 115447 in the TESS database. Great!

### Use Astrocut to get pixel timeseries cutout from TESS FFIs

Now we use `Tesscut`, the astroquery interface to MAST's Astrocut-based cutout service, to cut out the pixels in our FFI around our target's coordinates. This will give us a list of HDUList objects. An HDUList is the same thing you get back when you download a FITS file and then run `astropy.io.fits.open(FITS_file_name)`, which is what we did earlier.

The format of the data now is exactly the same as a Target Pixel File, so we will perform the same steps as before to plot the light curve. 


In [None]:
# Get the FFI cutout
hdulist = Tesscut.get_cutouts(objectname = "HD 115447"
                              , size=10    # return a 10x10 grid of pixels
                              , sector=10  # get only data from sector 10
                             )

In [None]:
# Look at the file headers
hdulist[0].info()

# Load the first results
hdu1 = hdulist[0]

Now, we will follow the same steps as earlier for using a Target Pixel File to plot a light curve, reusing the `aperture_phot` function we defined earlier. However, this time we will use all the pixels in the cutout instead of finding an optimal aperture. The cutout's aperture image (its third extension) has a value of 1 for the pixels we want, and `aperture_phot` sums every pixel where the aperture equals 1, so we can pass that image in directly as our 2D aperture array.

In [None]:
# Use all pixels in our aperture
aperture_ffi = hdu1[2].data

# get the data for the PIXELS extension of our fits file
pix_ffi = hdu1[1].data

# Use the map lambda function to sum all the flux in each pixel in our defined aperture
flux_ffi = np.array(list (map (lambda x: aperture_phot(x, aperture_ffi), pix_ffi['FLUX'])))

# Get the time array so we have an x-axis to plot
time_ffi = hdu1[1].data['TIME']

### Plot the Light Curve

As before, we'll create a flux vs. time plot of the data.

In [None]:
# Create the figure/subplot
fig = plt.figure(figsize = (11,6))
fig.add_subplot(212)

# Add the data and axis labels
plt.plot(time_ffi, flux_ffi,'.', color = "dodgerblue")
plt.ylabel("FLUX (e-/s)")
plt.xlabel('TIME  (BJD-2457000)')

Wait! This looks very different from our plots before. It turns out that behind the scenes, our TPFs and LCs were benefitting from additional calibration and background subtraction. The two spikes we see are effects of TESS's orbit and stray light from the Earth or the Moon.

We could do background subtraction ourselves, but that topic is addressed in later Notebooks (e.g. 09). For now, let's just "zoom in" around the level of flux and see if we can spot the transits.

In [None]:
# Create the same plot as before
fig = plt.figure(figsize = (11,6))
fig.add_subplot(212)
plt.plot(time_ffi, flux_ffi,'.', color = "dodgerblue")
plt.ylabel("FLUX (e-/s)")
plt.xlabel('TIME  (BJD-2457000)')

# Zoom in on the y-axis
plt.ylim(75000,78000)

Now, it is clear that there are four transits of TOI-778b in this light curve. Despite our sloppy data processing, the transits are easily visible since it is a large planet orbiting close to its host star. These large exoplanets that orbit near their host stars are known as "hot Jupiters". You can learn more about hot Jupiters by visiting NASA's [Exoplanet Exploration page](https://exoplanets.nasa.gov/resources/1040/hot-jupiter/).
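
One common way to check that dips like these repeat at the planet's period is to "fold" the light curve: convert each time stamp into an orbital phase so that all transits land on top of each other. The sketch below does this on synthetic data using the 4.63 day period quoted earlier. The mid-transit time `t0`, the dip width, and the dip depth are made-up values, not measurements.

```python
import numpy as np

def phase_fold(time, period, t0=0.0):
    """Orbital phase in [-0.5, 0.5), with mid-transit at phase 0."""
    return ((time - t0) / period + 0.5) % 1.0 - 0.5

period = 4.63   # days, from the text
t0 = 1.0        # hypothetical time of the first mid-transit

# Synthetic light curve: a 1% dip lasting 0.2 days at every transit
time = np.arange(0.0, 25.0, 0.02)
flux = np.ones_like(time)
for centre in t0 + period * np.arange(6):
    flux[np.abs(time - centre) < 0.1] = 0.99

phase = phase_fold(time, period, t0)
in_transit = flux < 1.0
```

Plotting `flux` against `phase` would stack every dip near phase 0. With the real FFI light curve, `phase_fold(time_ffi, 4.63, t0)` could be used the same way once a mid-transit time has been read off the plot.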

## More applications for timeseries data
We call light curves "timeseries" data because they show the light from an object (or system) in the **time domain**, as opposed to the wavelength domain (which would be called a spectrum) or just in the sky (which would be called an image).

As hinted at the beginning of this lesson, light curves can be used to analyse a wide variety of astronomical systems. These include, but are not limited to, transient events such as supernovae or gamma-ray bursts, periodic variations such as eclipsing binary stars or radio pulsars, and aperiodic variations such as black holes with bright accretion disks. For more information on the many applications of light curves in astronomy, check out both the [basic](https://imagine.gsfc.nasa.gov/science/toolbox/timing1.html) and [advanced](https://imagine.gsfc.nasa.gov/science/toolbox/timing2.html) pages on timing analysis from NASA.

## Homework
<!-- Geometry puzzle. Assuming a normal distribution of orbital inclinations, what fraction of planets with radius R$_E$ would we observe fully transiting the disk of their sun-like (R = R☉) host stars?  -->

Choose an exoplanet from the [TESS Objects of Interest (TOI)](https://tev.mit.edu/data/collection/193/) table, perform a TESSCut, and plot a light curve from the FFI. Can you identify any dips in the light curve that correspond to a transit of this planet?

## Additional Resources
Can't get enough? Here are some links to more information!

Here is the paper which announced TESS's [discovery of the exoplanets around HD 21749](https://iopscience.iop.org/article/10.3847/2041-8213/ab12ed/meta). 

Here is a link to the [SIMBAD page](https://simbad.u-strasbg.fr/simbad/sim-id?Ident=TOI-778&submit=submit+id) on TOI-778.

NASA's [Exoplanet Exploration](https://exoplanets.nasa.gov/resources/1040/hot-jupiter/) page on Hot Jupiters. 

NASA's [basic](https://imagine.gsfc.nasa.gov/science/toolbox/timing1.html) and [advanced](https://imagine.gsfc.nasa.gov/science/toolbox/timing2.html) pages on lightcurves and timing analysis.

Here are a few resources on Transit Geometry and Probability:
- NASA's [About Transits](https://www.nasa.gov/kepler/overview/abouttransits) page
- Astrobites article on [Transit Probability](https://astrobites.org/2013/05/23/transit-probabilities-not-as-simple-as-they-seem/)
- [Textbook chapter](https://ethz.ch/content/dam/ethz/special-interest/phys/particle-physics/quanz-group-dam/documents-old-s-and-p/Courses/ExtrasolarPlanetsFS2016/exop2016_chapter3_part2.pdf) on Transits from ETH Zurich

For more info on TESS data products, visit the [Data Product Overview Page](https://outerspace.stsci.edu/display/TESS/2.0+-+Data+Product+Overview).

## What's next?
You may have noticed that there is some messy signal in our light curves. Next week we will learn what a data quality flag is and how to remove data with bad quality flags in order to clean up our plots. Additionally, we will start to look into other types of systems that can be identified using the transit method and light curves.

## Acknowledgements

This notebook includes data collected with the TESS mission, obtained from the MAST data archive at the Space Telescope Science Institute (STScI). Funding for the TESS mission is provided by the NASA Explorer Program. STScI is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5–26555.

Any published work that uses Astroquery should include a citation, which can be found at [this link](https://github.com/astropy/astroquery/blob/main/astroquery/CITATION). The BibTeX entry is also available from the package itself with `astroquery.__citation__`.

### Notebook Information:
Author: Emma Lieb

Last Updated: 07/20/2023