# NASA Exoplanet Archive Tutorial
## Week 3, Intro-to-Astro 2020

-1. What is the [Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu/)?

In this tutorial, together we will: 

0. See an example of an exoplanet mass-radius diagram using data from the Exoplanet Archive

1. Learn how to **download** data from the Exoplanet Archive ourselves

2. **Visualize** the radius distribution of transiting exoplanets discovered by the *Kepler* space mission

On your own, you will:

3. Create an **orbital period vs radius** diagram for the transiting exoplanets from *Kepler*. Comment on any features that you see.

4. Create one more plot of **any two parameters** that you'd like. Write a few sentences explaining why you plotted the parameters you chose. Comment on any features that appear.

For your plots in 3 and 4, make sure you **label your axes with units** and **choose useful axis scales**!

#### Updated June 2020, Joey Murphy

In [None]:
# It's always nice to have a cell at the top of your Jupyter notebook for housing all of your import statements
# and "magic commands" (lines that start with %), if any.

# For bonus points: add a comment to your future self on why you're importing each module

# File/directory handling
import os 

# Data handling
import pandas as pd
pd.set_option('display.max_columns', None) # Display all of the columns of a DataFrame

# For math
import numpy as np

# Plotting
import matplotlib.pyplot as plt

# "Magic command" to make the plots appear *inline* in the notebook
%matplotlib inline

# 0. Mass-radius diagram example

In [None]:
# Create a variable for the path to the file containing our example's data
data_dir = 'data'
data_fname = 'confirmed_example.csv'
data_path = os.path.join(data_dir, data_fname) # os.path.join creates a valid path out of the directory and filename
print(f'Data will be loaded from: {data_path}')

In [None]:
# Load the data that's stored in the .csv file from the cell above (csv == comma separated values).
# Load this data into a Pandas DataFrame object using the Pandas function read_csv().

# Use the comment keyword argument of read_csv() to skip the header lines that begin with '#' (more on this in a minute)
example_data = pd.read_csv(data_path, comment='#')
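
To see what `comment='#'` does, here is a minimal sketch that feeds `read_csv()` a tiny, made-up CSV string (wrapped in `io.StringIO` so it behaves like a file) whose first lines start with `#`, loosely mimicking the archive's header. The header text and values are illustrative only.

```python
import io
import pandas as pd

# Made-up stand-in for an archive .csv file: header lines start with '#'
csv_text = """# Illustrative header line, not copied from a real archive file
# COLUMN pl_bmassj: Planet Mass [Jupiter mass]
# COLUMN pl_radj: Planet Radius [Jupiter radii]
pl_bmassj,pl_radj
0.5,1.1
1.2,1.0
"""

# comment='#' tells read_csv to ignore the rest of a line after '#';
# lines that START with '#' are skipped entirely
df = pd.read_csv(io.StringIO(csv_text), comment='#')
print(df.shape)  # (2, 2)
```

Without `comment='#'`, pandas would try to parse the header lines as data and the read would fail or produce garbled columns.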

In [None]:
... # What sort of data is contained in this pandas.DataFrame?

In [None]:
# What are the dimensions of this DataFrame?
print(example_data.shape) # prints (# of rows, # of columns)

What do all of these columns mean? Let's check out the .csv file itself for more information!

...

So we saw that this table was generated from the Exoplanet Archive, and whoever generated it enforced some constraints: the mass and radius (in units of Jupiter mass and radius) must **not** be *null*, i.e., we only want to download planets that have both a radius **and** a mass measurement.

We also saw that there were a **ton** of columns that we didn't really mention in that table, and some are more useful than others. That's okay; for now, let's just load *all* of the data into a Pandas DataFrame, and then we can choose which columns we actually want to use.
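
Both ideas, keeping only the columns you want and requiring non-null mass and radius, can also be done in pandas after loading. A minimal sketch on a small made-up DataFrame that reuses the archive column names `pl_name`, `pl_bmassj` and `pl_radj`; the planet names, values and `extra_col` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a wide archive table (NaN marks a null entry);
# 'extra_col' is a hypothetical column we don't need
planets = pd.DataFrame({
    'pl_name':   ['Planet A', 'Planet B', 'Planet C', 'Planet D'],
    'pl_bmassj': [0.5, np.nan, 1.2, 0.02],
    'pl_radj':   [1.1, 0.2, np.nan, 0.15],
    'extra_col': [1, 2, 3, 4],
})

# 1) Keep only the columns we want (a list of column names inside the brackets)
mass_radius = planets[['pl_name', 'pl_bmassj', 'pl_radj']]

# 2) Apply the "not null" constraint locally: keep rows where BOTH values exist
has_both = mass_radius['pl_bmassj'].notna() & mass_radius['pl_radj'].notna()
with_both = mass_radius[has_both]

# dropna(subset=...) is an equivalent one-liner
with_both_alt = mass_radius.dropna(subset=['pl_bmassj', 'pl_radj'])

print(with_both['pl_name'].tolist())  # ['Planet A', 'Planet D']
```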

In [None]:
# Make a quick mass-radius diagram!

# Plot the data with mass (in units of Jupiter masses) on the x-axis and radius 
# (in units of Jupiter radii) on the y-axis.

# '.'   --> use small unconnected dots to plot the data points
# alpha --> governs the transparency of each datapoint: 0 = completely transparent, 1 = completely opaque.
# (alpha = 0.3 will let us see the density of the data points more easily)
plt.plot(example_data['pl_bmassj'], example_data['pl_radj'], '.', alpha=0.3)

# At the end, show the plot
plt.show()

The plot above is alright, but a lot of the data is bunched up at low masses, and the large range of the data makes it hard to see the finer structure... let's replot things with **log axes**. Let's also add **axis labels** so we actually know what is being plotted. As a bonus, we'll add some text to the plot to show **how many** exoplanets are being included.
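
Why do log axes help? A minimal sketch on made-up masses spanning about four orders of magnitude: on a linear axis most points crowd into the leftmost sliver of the plot, while on a log axis each decade gets an equal share of the width.

```python
import numpy as np

# Made-up masses in Jupiter masses, spread over several orders of magnitude
masses = np.array([0.002, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0])

# How many orders of magnitude (decades) the data span
n_decades = np.log10(masses.max() / masses.min())

# Fraction of points falling in the leftmost 10% of a LINEAR x-axis
lin_cut = masses.min() + 0.1 * (masses.max() - masses.min())
frac_linear = np.mean(masses < lin_cut)

# Fraction of points falling in the leftmost 10% of a LOG x-axis
log_lo, log_hi = np.log10(masses.min()), np.log10(masses.max())
log_cut = 10 ** (log_lo + 0.1 * (log_hi - log_lo))
frac_log = np.mean(masses < log_cut)

print(f'{n_decades:.1f} decades; linear: {frac_linear:.2f}, log: {frac_log:.2f}')
# 4.2 decades; linear: 0.78, log: 0.11
```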

In [None]:
# Plot the data with mass (in units of Jupiter masses) on the x-axis
plt.plot(example_data['pl_bmassj'], example_data['pl_radj'], '.', alpha=0.3)

# Label your axes with units!
plt.xlabel('Mass [M$_J$]', fontsize=14)   # You can use LaTeX formatting to make your plots look more professional...
plt.ylabel('Radius [R$_J$]', fontsize=14) # more on LaTeX later in the summer! $_J$ creates a subscript J for Jupiter

# Add some text to the plot so we know how many data points there are
n_planets = len(example_data) # Number of planets being plotted

# plt.text(x-coordinate, y-coordinate, text string, **kwargs) # x and y coordinates are in data units
plt.text(1e-3, 1, f'N = {n_planets}', fontsize=14)

# Set the x and y-axis scales to be log so we can see structure more easily
...
...

# At the end, show the plot
plt.show()
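
Because `plt.text` places text in data coordinates here, the label's position depends on the data range and axis scales, so `(1e-3, 1)` may need adjusting whenever the data change. One alternative, sketched below with made-up values, passes `transform=ax.transAxes` so the coordinates are fractions of the axes box instead. This sketch uses matplotlib's object-oriented interface (`fig, ax = plt.subplots()`) rather than the `plt.` calls above; both are standard matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up mass/radius values standing in for example_data
mass = np.array([0.005, 0.02, 0.1, 0.5, 1.0, 2.0, 8.0])    # [M_J]
radius = np.array([0.15, 0.3, 0.6, 1.1, 1.2, 1.1, 1.05])   # [R_J]

fig, ax = plt.subplots()
ax.plot(mass, radius, '.', alpha=0.3)
ax.set_xlabel('Mass [M$_J$]', fontsize=14)
ax.set_ylabel('Radius [R$_J$]', fontsize=14)

# With transform=ax.transAxes, (0, 0) is the lower-left corner of the axes and
# (1, 1) the upper-right, regardless of the data values or axis scales
n_label = ax.text(0.05, 0.9, f'N = {len(mass)}', fontsize=14,
                  transform=ax.transAxes)

# In the notebook the figure is displayed at the end of the cell;
# in a standalone script, call plt.show()
```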

# 1. Downloading a sample of exoplanets subject to constraints

Now that we've plotted some of the data from the Exoplanet Archive, how do we download it ourselves? How did I get that complicated-looking .csv file with all of the data in the first place?

Our mission: 

1. Go to the [Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu/) website and navigate to the **Confirmed Planets** table after clicking the **Data** tab at the top of the home page.
2. This time, let's **add columns** so we have **planet mass and radius** in **units of Earth mass and Earth radius**
3. Enter constraints into the query boxes at the top of the columns to get all of the planets **discovered by *Kepler*** via the **transit method**. Make sure to **exclude** any rows that might have a **null radius measurement** for one reason or another.
4. **Download** the table as a **.csv** and **save** it to a **subdirectory** named **"data"**. Be sure to give your .csv file an informative name.
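
The steps above use the archive's web interface. The archive also offers a programmatic Table Access Protocol (TAP) service that returns query results as CSV. The sketch below assumes the TAP `sync` endpoint and the column names of the archive's current `pscomppars` table (`disc_facility`, `discoverymethod`, `pl_rade`, `pl_orbper`); treat these as assumptions to verify against the archive's documentation, since some names differ from the 2020-era Confirmed Planets table used in this tutorial. The block only builds the query URL, so it runs without internet access.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint and table/column names should be checked against the
# archive's TAP documentation before use
TAP_URL = 'https://exoplanetarchive.ipac.caltech.edu/TAP/sync'

# ADQL query mirroring steps 2-3: Kepler discoveries, transit method, non-null radius
adql = (
    "select pl_name, pl_rade, pl_orbper from pscomppars "
    "where disc_facility = 'Kepler' "
    "and discoverymethod = 'Transit' "
    "and pl_rade is not null"
)

# urlencode escapes spaces, quotes and '=' so the query is URL-safe
query_url = f"{TAP_URL}?{urlencode({'query': adql, 'format': 'csv'})}"
print(query_url)

# With internet access, pandas can read the CSV straight from the URL:
# kepler_tap = pd.read_csv(query_url)
```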

# 2. Visualize the radius distribution of transiting *Kepler* planets

In [None]:
# Like before, let's first create the path to our new .csv file
data_fname = 'confirmed_kepler_transiting.csv'
data_path = os.path.join(data_dir, data_fname)
print(f'Data will be loaded from: {data_path}')

In [None]:
# Load the data into a Pandas DataFrame object
kepler_data = pd.read_csv(data_path, comment='#')

# How many rows and columns do we have this time?
print(...) # Notice that we have 2 more columns than we did before

Any idea why we have so many more rows of planets this time than before?

*Hint: The confirmed_example.csv file was generated with a constraint on a different planet parameter that we didn't include here.*

In [None]:
kepler_data.head()

In [None]:
# Make a histogram of planet radii in units of Earth radius
plt.hist(kepler_data[...]) # Plot the histogram

# Label the axes
plt.xlabel(r'Radius [R$_\oplus$]', fontsize=14) # raw string (r'...') so Python doesn't treat \o as an escape sequence
plt.ylabel('N', fontsize=14)

# Show the plot
plt.show()

Well, this histogram isn't very informative... What about using different **bins** for the histogram?

In [None]:
# Create bins that are uniformly-sized in log space
log_bins = np.logspace(np.log10(0.3), np.log10(20), 50)
plt.hist(kepler_data['pl_rade'], bins=log_bins)

# Log scale on the x-axis
plt.xscale('log')

# Label the axes
plt.xlabel(r'Radius [R$_\oplus$]', fontsize=14) # More LaTeX. \oplus gives us the symbol for Earth
plt.ylabel('N', fontsize=14)

# Show the plot
plt.show()
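
What does `np.logspace(np.log10(0.3), np.log10(20), 50)` actually produce? A minimal sketch: the edges are evenly spaced in log10(radius), which `np.geomspace` builds directly from the endpoints, and neighbouring edges share a constant *ratio* rather than a constant difference, so every bin covers the same fractional range of radius.

```python
import numpy as np

# The bins from the cell above: 50 edges evenly spaced in log10(radius)
log_bins = np.logspace(np.log10(0.3), np.log10(20), 50)

# np.geomspace builds the same edges directly from the endpoints
same_bins = np.geomspace(0.3, 20, 50)

# Ratio between neighbouring edges is constant; absolute widths grow with radius
ratios = log_bins[1:] / log_bins[:-1]
widths = np.diff(log_bins)

print(np.allclose(log_bins, same_bins))  # True
print(np.allclose(ratios, ratios[0]))    # True
print(f'each edge is {ratios[0]:.3f}x the previous one')
```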

It looks like the distribution of planets has some sort of **bimodality** between about 1 and 2 Earth radii. Let's zoom in on that area by **restricting our x-axis limits**.

In [None]:
plt.hist(kepler_data['pl_rade'], bins=log_bins) # Using the same bins as above

# Let's zoom in on the distribution of planet radii in this region
plt.xlim([1., 5]) # Units of Earth radius

# Same plot housekeeping as above
# -------------------------------
# Log scale on the x-axis
plt.xscale('log')

# Label the axes
plt.xlabel(r'Radius [R$_\oplus$]', fontsize=14) # More LaTeX. \oplus gives us the symbol for Earth
plt.ylabel('N', fontsize=14)

# Show the plot
plt.show()
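
Note that `plt.xlim` only changes the visible window; the histogram still uses `log_bins`, so the resolution inside the window is unchanged. A minimal sketch that counts how many of the original edges fall between 1 and 5 Earth radii, and builds an illustrative finer set of edges covering only that window (the choice of 30 edges is arbitrary, not from the original notebook):

```python
import numpy as np

# The bins used above: 50 log-spaced edges from 0.3 to 20 Earth radii
log_bins = np.logspace(np.log10(0.3), np.log10(20), 50)

# How many of the original edges land inside the 1 to 5 Earth-radius window?
in_window = (log_bins >= 1) & (log_bins <= 5)
n_edges_in_window = int(np.sum(in_window))

# Illustrative alternative: 30 log-spaced edges covering ONLY the window,
# giving narrower bins (finer resolution) where the bimodality appears
zoom_bins = np.logspace(np.log10(1), np.log10(5), 30)

print(n_edges_in_window, len(zoom_bins))  # 18 30
```

In the notebook you could then pass `bins=zoom_bins` to `plt.hist(kepler_data['pl_rade'], ...)`, which makes the `plt.xlim` call optional.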

Are there **any notable features** in this distribution of planets by radius?

# 3. Orbital period vs Radius for transiting *Kepler* planets

Now it's your turn! Make a plot of the transiting planets discovered by *Kepler* with orbital period on the x-axis and planet radius (in units of Earth radius) on the y-axis. 

Be sure to label your axes with units! Refer to the header in the .csv file for more information. 

# 4. Make your own plot of whatever two parameters you want!

Now make a plot visualizing the relationship (or lack thereof) of any two parameters that you want. Go back to the [Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu/) to input constraints (if any) in the Confirmed Planets table that are relevant to your parameters and download your new data. Spend some time looking at all of the columns available in the Exoplanet Archive data table to see what interests you. (We didn't even get a chance to make use of any of the stellar parameters!)

Make sure you include axis labels and use appropriate axis scales so that you can see features of the data across a wide range of values. Why did you plot what you plotted? Do you notice any interesting features?