# Habitable Zone Project Notebook



In [None]:
# Imports
import numpy as np # this does math
import pandas as pd # this is pandas, which we're going to use to manage our spreadsheet of exoplanet systems
import matplotlib.pyplot as plt # this lets us plot things


## Exoplanet Data -- Practice with Pandas

The NASA Exoplanet Archive (https://exoplanetarchive.ipac.caltech.edu/) keeps track of all of the known exoplanet systems. We'll start by reading the data from the exoplanet archive into something called a data frame (basically just a spreadsheet that we can use to organize the planet systems and their properties).

First, go to the exoplanet archive and click on the 'data' tab and then 'planetary systems'. You should see a large table. Each row is a planetary system, and the columns list some of the properties for each system. The file 'Exo_Archive.csv' is just a downloaded version of the exoplanet archive table.

To work with the table of exoplanet data, we're going to be using a tool called 'pandas' (https://www.w3schools.com/python/pandas/pandas_intro.asp). The first part of this notebook will give a tutorial on how to use pandas. 

Skills: Using pandas to work with data tables, plotting with matplotlib 

In [None]:
# First, let's read in the data and practice working with a data table
data = pd.read_csv('Exo_Archive.csv', index_col='pl_name')
data #This displays the table. You can just think of it as a regular spreadsheet, with row and column names. 

In [None]:
# Practice retrieving data from the data frame
# The data frame has an index (the planet names) and different columns. 
# The full list of column names and what they mean can be found here: https://exoplanetarchive.ipac.caltech.edu/docs/API_PS_columns.html, but we'll point out some of the most important columns in this tutorial

# We can pull out a single row from the table by specifying an index label (planet name)
# When doing a row index, we have to put ".loc" to let pandas know we're giving it an index label
print(data.loc['51 Peg b'])

In [None]:
# We can also get a column by specifying a column name (no .loc needed). For example, let's get the list of all masses:
print(data['pl_bmasse']) # pl stands for 'planet' and bmasse refers to the 'best mass' estimate in Earth masses

In [None]:
# Finally, we can combine these to retrieve a value from the table by specifying an index and column label. 
# For example, to get the mass for the planet 51 Peg b we can write:
print(data.loc['51 Peg b', 'pl_bmasse']) # a single .loc call takes the row label first, then the column label

In [None]:
# We can also use pandas to quickly search through our data. For example, we can get the largest and smallest values in a column by using .min() and .max()
print(data['pl_bmasse'].max()) #This is the largest planet mass (in Earth masses)
print(data['pl_bmasse'].min()) #This is the smallest planet mass
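
The lookups above can be tried on a tiny table without loading the full archive file. The sketch below is only an illustration: the table `toy` and its approximate values are made up here, and only the column names `pl_bmasse` and `pl_orbper` follow the archive.

```python
import pandas as pd

# Hypothetical mini-table (not the archive file): approximate masses in Earth
# masses and orbital periods in days for three solar system planets
toy = pd.DataFrame(
    {"pl_bmasse": [1.0, 317.8, 17.1], "pl_orbper": [365.25, 4332.6, 60190.0]},
    index=pd.Index(["Earth", "Jupiter", "Neptune"], name="pl_name"),
)

row = toy.loc["Jupiter"]                 # one row, returned as a pandas Series
column = toy["pl_bmasse"]                # one column, also a Series
value = toy.loc["Jupiter", "pl_bmasse"]  # one cell: row label, then column label
heaviest = toy["pl_bmasse"].idxmax()     # index label of the row with the largest mass

print(value, heaviest)
```

`.idxmax()` is handy next to `.max()`: it tells you which planet has the largest value rather than what the value is.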

As we keep working with the data table you'll become more familiar with pandas. For now, let's try plotting some of our planet properties. First, plot the orbital period of each planet ('pl_orbper') against its radius ('pl_rade'). Remember to label your axes! Also, some of the planet properties will have a really large range of values. If a plot looks funky or all bunched up to one side, try making the axis log scale instead. If you feel comfortable, try out plotting for yourself! Or go to the next cell for the solution code.

In [None]:
# First, we'll plot the orbital period of each planet ('pl_orbper') against its radius ('pl_rade')
plt.scatter(data['pl_orbper'], data['pl_rade']) # plot the data
# Label the axes
plt.xlabel('Orbital Period (days)')
plt.ylabel('Planet Radius (Earth radii)')
# Change to log scales (because these numbers will cover a pretty large range)
plt.xscale('log')
plt.yscale('log')

This plot already tells us some interesting things. First, where does Earth fall on the plot? How do most exoplanet radii and periods compare to Earth?

In [None]:
# This time, try plotting mass against radius. You can check the list of column names here: https://exoplanetarchive.ipac.caltech.edu/docs/API_PS_columns.html. 

## Defining a function -- Habitable Zone Equation
Go back to the project guide and work through making an equation to calculate the temperature of a planet based on its distance from the host star. Once you have your two equations we're going to set up a function for each equation. 

Skill : Writing functions

In [None]:
# Functions Tutorial
# Functions take in some parameters, use those parameters to do something, and then return the results. The structure for a function looks like this:
#    def function(input):
#        # do some stuff
#        return output
# For example, let's make a function to add numbers. We'll want to input two numbers, add them, and then return the sum
def add(a, b):
    total = a + b # avoid naming a variable 'sum', which would hide Python's built-in sum()
    return total

# Now we can use this function whenever we want to add! 
a = 7
b = 14
total = add(a, b)
print(total)

In [None]:
# As practice, let's make a function to turn orbital periods into the distance of the planet from the star.
# a (semi-major axis, in AU) = ( M * T^2 )^(1/3) ; where M is the mass of the host star in solar masses and T is the period in years
# The data table gives periods in days ('pl_orbper') and star masses in solar masses ('st_mass'), so we need to read in those values, convert the period to years, do the math in the equation above, and return the distance a
# Have a shot at writing out the function

def calculate_a(M, T):
    return 

# Now test your function by applying it to the data table!
data['pl_a'] = 
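
If you get stuck, here is one possible sketch rather than the only answer. It assumes the period comes in days (as in the archive's `pl_orbper` column) and uses 365.25 days per year; the name `semi_major_axis_au` is made up for this example.

```python
def semi_major_axis_au(m_star_msun, period_days):
    """Kepler's third law, a^3 = M * T^2, with a in AU, M in solar masses, T in years.

    Assumes the planet's mass is negligible compared with the star's.
    """
    period_years = period_days / 365.25  # convert days to years
    return (m_star_msun * period_years**2) ** (1.0 / 3.0)

# Quick checks against the solar system (star mass = 1 solar mass)
a_earth = semi_major_axis_au(1.0, 365.25)     # Earth's period in days
a_jupiter = semi_major_axis_au(1.0, 4332.59)  # Jupiter's period in days
print(a_earth, a_jupiter)
```

Because the function only uses division, multiplication, and powers, it should also accept whole pandas columns, for example `semi_major_axis_au(data['st_mass'], data['pl_orbper'])`, assuming those columns are present in your file.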


In [None]:
# Let's make your equations into functions. For each, figure out what inputs you need, write out the equation for the temperature or distance to the planet, and then return the quantity 
# It is convenient to output distances from the star in AU, so check the units of your equation and make sure to add a conversion in your function if needed!
# It's good practice to note down the units of the inputs to your functions. You can do that through comments

def T_planet():
    # enter your equation here
    return

def d_planet():

    return 
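
For comparison with your own equations, here is a minimal sketch under one common assumption: the planet is a perfect blackbody that absorbs all the starlight hitting it and re-radiates it evenly over its whole surface. Under that assumption the planet's radius cancels out and T_planet = T_star * sqrt(R_star / (2 d)). The project guide may use a different form, and the function names and the AU conversion constant below are choices made for this example.

```python
import numpy as np

AU_IN_CM = 1.496e13  # one astronomical unit in centimeters

def equilibrium_temperature(t_star_k, r_star_cm, d_au):
    """Planet temperature in K (assumed blackbody planet, no atmosphere or reflection).

    Inputs: star temperature in K, star radius in cm, star-planet distance in AU.
    """
    d_cm = d_au * AU_IN_CM
    return t_star_k * np.sqrt(r_star_cm / (2.0 * d_cm))

def distance_for_temperature(t_star_k, r_star_cm, t_planet_k):
    """Distance in AU at which the planet would have temperature t_planet_k.

    This is the same equation solved for d: d = (R_star / 2) * (T_star / T_planet)^2.
    """
    d_cm = 0.5 * r_star_cm * (t_star_k / t_planet_k) ** 2
    return d_cm / AU_IN_CM

# Round trip: the temperature at 2 AU should map back to a distance of 2 AU
t_check = equilibrium_temperature(5800.0, 7e10, 2.0)
d_check = distance_for_temperature(5800.0, 7e10, t_check)
print(t_check, d_check)
```

The round trip at the end is a cheap way to catch unit mistakes: if the two functions disagree about centimeters and AU, `d_check` will not come back as 2.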

In [None]:
# Now let's test your functions! We'll use the Sun for this example. The temperature of the Sun is 5800 K and its radius is 7*10^10 cm.
# What temperature would an Earth-like planet have? (Earth has a radius of 6400 km and is at 1 AU. Remember to convert your units!)
T_earth = 
# The actual temperature of the Earth is around 300 K. How does your answer compare? What might cause any differences between your calculation and the real value?

# Given the temperature constraints you came up with for the habitable zone, use your calculator to get the inner and outer radii of the habitable zone
d_inner = 
d_outer = 
# How do these distances compare with the solar system planets? 
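
As a rough cross-check for your own numbers, the sketch below plugs the Sun's values from the comments above into the simple blackbody relation T_planet = T_star * sqrt(R_star / (2 d)). The habitable zone limits of 373 K and 273 K (water boiling and freezing) are an assumption made for this example, so use whatever limits you decided on in the project guide.

```python
import numpy as np

AU_IN_CM = 1.496e13  # one astronomical unit in centimeters
T_SUN_K = 5800.0     # temperature of the Sun, from the text above
R_SUN_CM = 7e10      # radius of the Sun, from the text above

# Temperature of a blackbody planet at 1 AU from the Sun
t_earth_estimate = T_SUN_K * np.sqrt(R_SUN_CM / (2.0 * 1.0 * AU_IN_CM))

def hz_distance_au(t_planet_k):
    """Distance from the Sun (AU) where a blackbody planet reaches t_planet_k."""
    return 0.5 * R_SUN_CM * (T_SUN_K / t_planet_k) ** 2 / AU_IN_CM

d_inner_estimate = hz_distance_au(373.0)  # assumed hot limit: water boils
d_outer_estimate = hz_distance_au(273.0)  # assumed cold limit: water freezes
print(t_earth_estimate, d_inner_estimate, d_outer_estimate)
```

With these inputs the estimate for Earth comes out near 280 K, a little cooler than the real value, and the zone runs from a bit past half an AU to just beyond Earth's orbit. The gap between the estimate and the real Earth is a good starting point for the question about what the simple model leaves out.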

## Apply your calculator to the Exoplanet Database! 
Let's go back to the data table we were looking at before! We're going to apply your calculator to every entry in the table. To make sure this will work, double check the column names you'll need to use and the units that each of those columns uses. You'll want the units to match those you wrote down in your calculator.  https://exoplanetarchive.ipac.caltech.edu/docs/API_PS_columns.html Ask a TA if you need any help! 

Skill: Applying functions to columns in a data table

In [None]:
# First, let's make two new columns in the data table, one with the inner distance of the habitable zone and one with the outer. For now we'll set both columns to 0
data['HZ_inner'] = 0
data['HZ_outer'] = 0

# While we're at it, let's make one more column that we'll use to mark down any planets within the habitable zone
data['planet_in_HZ'] = 0

In [None]:
# Now we're going to use our function to fill in our two new columns! 
# We need to tell python that we want the column 'HZ_inner' to be equal to the output of your d_planet function, using the appropriate columns as input. Which temperature for the planet do we want to input? 
# Have a stab at writing this down as code, and ask for help if you get stuck! 


# Then do the same thing for the 'HZ_outer' column. Which temperature do we want to use in this case? 
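
If you want to see the pattern before writing your own version, here is a sketch on a made-up two-star table. It assumes the blackbody relation d = (R_star / 2) * (T_star / T_planet)^2, stellar radii in solar radii (as in the archive's `st_rad` column), stellar temperatures in K (`st_teff`), and habitable zone limits of 373 K and 273 K. The function name `d_planet_sketch` and the table `toy` are hypothetical.

```python
import pandas as pd

AU_IN_CM = 1.496e13    # one astronomical unit in centimeters
RSUN_IN_CM = 6.957e10  # one solar radius in centimeters

def d_planet_sketch(t_star_k, r_star_rsun, t_planet_k):
    """Distance in AU where a blackbody planet would have temperature t_planet_k."""
    r_star_cm = r_star_rsun * RSUN_IN_CM  # convert solar radii to cm
    return 0.5 * r_star_cm * (t_star_k / t_planet_k) ** 2 / AU_IN_CM

# Hypothetical stand-in for the archive table: one Sun-like star, one small cool star
toy = pd.DataFrame(
    {"st_teff": [5800.0, 3500.0], "st_rad": [1.0, 0.4]},
    index=["Sun-like star", "M dwarf"],
)

# Passing whole columns applies the function to every row at once
toy["HZ_inner"] = d_planet_sketch(toy["st_teff"], toy["st_rad"], 373.0)  # hot limit
toy["HZ_outer"] = d_planet_sketch(toy["st_teff"], toy["st_rad"], 273.0)  # cold limit
print(toy)
```

The hotter temperature limit gives the inner edge, because a planet has to sit closer to the star to be that warm.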




## Check for habitable zone planets using a For loop 
Now we want to go through our table and, for each row, check whether the measured distance between the planet and star (semi-major axis) is within our inner and outer habitable zone bounds.

There are fancy ways to do this, but for now we're just going to look at each row in turn using a for loop

The structure of a for loop looks like this: 

    for i in [a list]: 
        # do some things
    

So for each value of i, we will do some stuff, move to the next value of i and do more stuff, and so on until our list of i values runs out

Skill : working with for loops 

In [None]:
# In this cell, we want to write down code that will go through each line of the data table and check if the semi-major axis of the planet in that line is greater than our inner HZ bound and smaller than the outer bound
# If it is, set the column 'planet_in_HZ' to 1 for that row
# The structure of the code is outlined below, try to fill in the rest!

for pl in data.index: #this will loop through every planet in the index column of our data table
    if (   ):
        data.loc[pl,'planet_in_HZ'] = 1
    else:
        data.loc[pl,'planet_in_HZ'] = 0
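
Here is a sketch of the same loop on a made-up three-planet table, followed by one of the "fancy" ways mentioned above. It assumes the semi-major axis is stored in a column called `pl_orbsmax` (the archive's name for it, in AU) and that every index label appears only once. The table `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical table: three planets around the same star, distances in AU
toy = pd.DataFrame(
    {
        "pl_orbsmax": [0.05, 1.0, 5.2],
        "HZ_inner": [0.57, 0.57, 0.57],
        "HZ_outer": [1.06, 1.06, 1.06],
    },
    index=["hot planet", "temperate planet", "cold planet"],
)
toy["planet_in_HZ"] = 0

# Loop version: look at one row at a time
for pl in toy.index:
    a = toy.loc[pl, "pl_orbsmax"]
    if toy.loc[pl, "HZ_inner"] < a < toy.loc[pl, "HZ_outer"]:
        toy.loc[pl, "planet_in_HZ"] = 1

# Column version: compare whole columns at once, then turn True/False into 1/0
in_hz = (
    (toy["pl_orbsmax"] > toy["HZ_inner"]) & (toy["pl_orbsmax"] < toy["HZ_outer"])
).astype(int)

print(toy["planet_in_HZ"].tolist(), in_hz.tolist())
```

Both versions should give the same answer. The column version is usually much faster on a table with thousands of rows, but the loop makes the logic easier to follow.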

In [None]:
# Let's take a look at the planets you found! 
# First, we'll just collect the planets that were in the calculated habitable zones by selecting all the rows of the data table where 'planet_in_HZ' was set to 1:
data_HZ = data.loc[data['planet_in_HZ']==1]

# Now you can plot the planet properties in the data_HZ table! Take a look at the masses, radii, and periods like we did at the beginning of this notebook. 


## Where should we look for habitable zone planets?
As you've seen, the habitable zone depends on the properties of the host star. We classify stars into different types based on properties like temperature and color (https://www.britannica.com/science/stellar-classification). Using your data table, plot the range of the habitable zone against different stellar properties. To show the inner and outer limits of each star's zone, you can use the function plt.vlines(x, ymin, ymax), where ymin and ymax are the inner and outer radii of the zone and x is the value of the stellar property. (The similarly named plt.axvline also draws vertical lines, but its ymin and ymax are fractions of the plot height rather than data values, so it is not what we want here.) Try a few different properties (temperature, radius, mass, luminosity). You can find a full list at this link, in the stellar properties table: https://exoplanetarchive.ipac.caltech.edu/docs/API_PS_columns.html. How do the habitable zones change with stellar properties?
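
As a starting point, here is a small sketch of drawing one vertical bar per star with `vlines`, which takes its lower and upper limits in data units (unlike `axvline`, whose limits are fractions of the plot height). The three stars and their habitable zone limits are made-up numbers for illustration, not values from the archive.

```python
import matplotlib.pyplot as plt

# Hypothetical values: stellar temperatures (K) and habitable zone limits (AU)
teff = [3500.0, 5800.0, 7000.0]
hz_inner = [0.08, 0.57, 1.2]
hz_outer = [0.15, 1.06, 2.3]

fig, ax = plt.subplots()
# One vertical line per star, running from the inner to the outer limit
segments = ax.vlines(teff, hz_inner, hz_outer)
ax.set_xlabel("Stellar Temperature (K)")
ax.set_ylabel("Habitable Zone Distance (AU)")
ax.set_yscale("log")  # the distances cover a wide range
```

With your own table, the three lists would be replaced by the stellar property column and your 'HZ_inner' and 'HZ_outer' columns.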

Let's think about which habitable zones are the easiest to observe. You can either read about different exoplanet detection methods here, or talk to the exoplanet transit group. Which planets are easiest to detect? Based on this, what types of stars should we be targeting for habitable zone planet searches? 