# Exoplanets
Written by Blythe Davis<br>
Created: July 23, 2021
Updated: July 23, 2021

## Exoplanet Discovery
For the past 25+ years, NASA has used ground- and space-based methods to [identify exoplanets](https://exoplanets.nasa.gov/exep/about/missions-instruments) (planets outside of our solar system). In the past ten years in particular, missions like Kepler, K2, and TESS have produced an explosion of results. To date, approximately 4,400 exoplanets have been identified, and over 3,000 potential exoplanet candidates have been discovered. In this notebook, we will use HoloViews and Panel together with Astropy to make a dashboard visualizing the discovery of confirmed and candidate exoplanets over the years. We'll also include a scatterplot in our dashboard that reveals the relationship between the mass and radius of exoplanets, as well as controls to filter the data based on whether the planets could support life and, if so, whether chemical rockets could be used to escape the planet.

In [None]:
import pandas as pd
import holoviews as hv
import panel as pn
from colorcet import fire
import hvplot.pandas # noqa

pn.extension()
hv.extension('bokeh',width=100)

## Loading data
For this notebook, we will be loading our exoplanet data from three different CSV files: [stars](data/stars.csv), a [dataset of 257,000 stars](https://www.kaggle.com/solorzano/257k-gaia-dr2-stars?select=257k-gaiadr2-sources-with-photometry.csv) identified by the European Gaia space mission; [exoplanets](data/exoplanets.csv), a collection of 480 exoplanets obtained from the [NASA Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu/); and [candidates](data/candidates.csv), a collection of approximately 3,000 candidate exoplanets collated from the [Kepler](http://exoplanets.org/table?datasets=kepler) and [TESS](https://exofop.ipac.caltech.edu/tess/view_toi.php) campaigns. Since these datasets are fairly large, we'll cache the data using Panel:

In [None]:
stars = pn.state.as_cached('stars', lambda: pd.read_csv("data/stars.csv"))
stars.head()

In [None]:
exoplanets = pn.state.as_cached('exoplanets', lambda: pd.read_csv("data/exoplanets.csv"))
exoplanets.head()

In [None]:
candidates = pn.state.as_cached('candidates', lambda: pd.read_csv("data/candidates.csv"))
candidates.head()

## Converting coordinates
Because our goal is to generate a map of the exoplanets and stars, we need a standardized coordinate system for all three of our dataframes. Here, we'll use the [Astropy](https://www.astropy.org/) package to perform coordinate transformations. The original datasets use an equatorial coordinate system, given by ``ra`` (right ascension) and ``dec`` (declination), but the specific notation varies among the datasets. We will convert to galactic coordinates, a spherical coordinate system centered on the sun whose reference plane is the plane of the Milky Way, which makes it easy to see where objects lie relative to our galaxy's disk. Points in the galactic coordinate system are represented by two values: longitude (abbreviated "l"), measured from the direction of the galactic center, and latitude (abbreviated "b"), measured from the galactic plane.

Using the Astropy ``SkyCoord`` class, we define two functions, ``eqtogalL`` and ``eqtogalB``, which convert equatorial coordinates to galactic coordinates. Each takes a row of a dataframe with ``ra`` and ``dec`` columns (in degrees); ``eqtogalL`` returns the galactic longitude ``l``, and ``eqtogalB`` returns the galactic latitude ``b``.

If we were to use the entire ``stars`` dataframe, this program would run very slowly. Thus, we keep only stars whose mean G-band magnitude (``phot_g_mean_mag``) is greater than 11 and, of those, randomly sample 10% using ``.sample(frac=0.1)``. Note that the magnitude scale is inverted: larger magnitudes correspond to fainter stars, so this cut keeps the fainter stars in the catalog.

After converting, we create new columns for longitude and latitude in each of our dataframes and then cache the results. Because the conversion runs row by row, it can take a minute or two on datasets of this size. To speed up the process, you can sample a fraction of any of the dataframes.

In [None]:
from astropy import units as u
from astropy.coordinates import SkyCoord

def eqtogalL(df):
    "Convert a row's right ascension and declination (degrees, ICRS) to galactic longitude l (degrees)"
    coord = SkyCoord(ra=float(df.ra)*u.degree, dec=float(df.dec)*u.degree, frame='icrs')
    return float(coord.galactic.l.degree)

def eqtogalB(df):
    "Convert a row's right ascension and declination (degrees, ICRS) to galactic latitude b (degrees)"
    coord = SkyCoord(ra=float(df.ra)*u.degree, dec=float(df.dec)*u.degree, frame='icrs')
    return float(coord.galactic.b.degree)
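For intuition about what ``SkyCoord`` does inside these functions, the rotation from equatorial to galactic coordinates can be written with standard spherical trigonometry. The sketch below is not part of the original notebook and is not a replacement for Astropy: the helper name ``equatorial_to_galactic`` is hypothetical, and it assumes the commonly quoted J2000 constants for the north galactic pole (right ascension 192.85948°, declination 27.12825°) and for the galactic longitude of the north celestial pole (122.93192°), so its output should agree closely, but not exactly, with Astropy's.

```python
import math

# Assumed J2000 constants (degrees): north galactic pole position and the
# galactic longitude of the north celestial pole.
RA_NGP = 192.85948
DEC_NGP = 27.12825
L_NCP = 122.93192


def equatorial_to_galactic(ra_deg, dec_deg):
    """Hypothetical helper: convert (ra, dec) in degrees to galactic (l, b) in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    dec_ngp = math.radians(DEC_NGP)
    d_ra = ra - math.radians(RA_NGP)

    # Galactic latitude b from the spherical law of cosines.
    sin_b = (math.sin(dec) * math.sin(dec_ngp)
             + math.cos(dec) * math.cos(dec_ngp) * math.cos(d_ra))
    b = math.asin(max(-1.0, min(1.0, sin_b)))  # clamp against rounding error

    # Galactic longitude l, measured from the direction of the galactic center.
    y = math.cos(dec) * math.sin(d_ra)
    x = (math.sin(dec) * math.cos(dec_ngp)
         - math.cos(dec) * math.sin(dec_ngp) * math.cos(d_ra))
    l = (L_NCP - math.degrees(math.atan2(y, x))) % 360.0
    return l, math.degrees(b)


# The J2000 position of the galactic center should map to roughly (0, 0).
l, b = equatorial_to_galactic(266.405, -28.936)
print(f"l = {l:.3f} deg, b = {b:.3f} deg")
```

Because it uses only the ``math`` module, this version processes one point at a time, just like the row-wise ``apply`` calls in this notebook.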

In [None]:
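# A fixed random_state keeps the sample identical across runs, so the cached
# 'stars_l' and 'stars_b' values below line up with the same rows.
stars = stars[stars["phot_g_mean_mag"] > 11].sample(frac=0.1, random_state=42)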
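Because the magnitude scale is logarithmic and inverted, a cut like ``phot_g_mean_mag > 11`` keeps stars fainter than magnitude 11. As a quick illustration (not part of the original notebook; the helper name ``flux_ratio`` is hypothetical), the standard relation between two magnitudes and their flux ratio, $F_1/F_2 = 10^{-0.4(m_1 - m_2)}$, shows how steeply brightness falls off:

```python
def flux_ratio(m1, m2):
    """Hypothetical helper: flux of an object of magnitude m1 relative to one of magnitude m2."""
    # Every 5 magnitudes higher means a factor of 100 less flux.
    return 10 ** (-0.4 * (m1 - m2))


# A magnitude-16 star compared with a magnitude-11 star
print(flux_ratio(16, 11))
```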

In [None]:
stars['l'] = pn.state.as_cached('stars_l', lambda: stars.apply(eqtogalL, axis=1))
stars['b'] = pn.state.as_cached('stars_b', lambda: stars.apply(eqtogalB, axis=1))
stars.reset_index(inplace=True, drop=True)
stars.head()

In [None]:
exoplanets['l'] = pn.state.as_cached('exoplanets_l', lambda: exoplanets.apply(eqtogalL, axis=1))
exoplanets['b'] = pn.state.as_cached('exoplanets_b', lambda: exoplanets.apply(eqtogalB, axis=1))
exoplanets.head()

In [None]:
candidates['l'] = pn.state.as_cached('candidates_l', lambda: candidates.apply(eqtogalL, axis=1))
candidates['b'] = pn.state.as_cached('candidates_b', lambda: candidates.apply(eqtogalB, axis=1))
candidates.reset_index(inplace=True, drop=True)
candidates.head()

## The Goldilocks zone and the Tsiolkovsky rocket equation
One of the methods used to determine which exoplanets could potentially support life is to check whether liquid water could exist there. For water to be present on the planet as a liquid, the planet's temperature must be within a fairly narrow range, and therefore the planet must lie within a certain range of distances from its host star. Exoplanets within this range are said to be in the "Goldilocks zone."

If intelligent life were to exist on one of these planets, would it be capable of space travel? If the hypothetical life forms used methods similar to ours, for example chemical rockets powered by hydrogen and oxygen, would they even be able to leave their planet? Carrying more fuel adds mass, and that extra mass takes still more fuel to accelerate, so the fuel a rocket needs grows exponentially with the change in velocity it must achieve. The Tsiolkovsky rocket equation makes this precise:

$$\Delta v = v_e\ln\left(\frac{m_0}{m_f}\right),$$

where $\Delta v$ is the change in velocity (equivalently, the [impulse per unit mass](https://en.wikipedia.org/wiki/Impulse_(physics))) required for the rocket to travel its course, $v_e$ is the [effective exhaust velocity](https://en.wikipedia.org/wiki/Specific_impulse#Specific_impulse_as_effective_exhaust_velocity), $m_0$ is the initial mass of the rocket, and $m_f$ is its final mass (here, $m_0$ minus the mass of the fuel spent on the flight). To see the rocket equation in action, consider a planet with the same density as Earth whose radius $R$ is double Earth's; since mass scales with volume, its mass $M$ is eight times Earth's.
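Solving the rocket equation for the mass ratio gives $m_0/m_f = e^{\Delta v / v_e}$, and the fraction of the initial mass that must be fuel is $1 - m_f/m_0$. The short sketch below (hypothetical helper names, not from the original notebook) evaluates both; plugging in the values used in the worked example, $\Delta v \approx 22407\frac{\text{m}}{\text{s}}$ and $v_e = 5320\frac{\text{m}}{\text{s}}$, gives a ratio of about 67.5.

```python
import math


def mass_ratio(delta_v, exhaust_velocity):
    """Hypothetical helper: initial-to-final mass ratio m0/mf from the rocket equation."""
    return math.exp(delta_v / exhaust_velocity)


def fuel_fraction(delta_v, exhaust_velocity):
    """Hypothetical helper: fraction of the initial mass that must be propellant."""
    return 1.0 - 1.0 / mass_ratio(delta_v, exhaust_velocity)


ratio = mass_ratio(22407, 5320)
fraction = fuel_fraction(22407, 5320)
print(f"m0/mf = {ratio:.1f}, fuel fraction = {fraction:.1%}")
```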
<p></p>

<details><summary>Computation details</summary>
    
For the purposes of this example, we'll take $\Delta v$ to be the planet's escape velocity, $$\Delta v = \sqrt{\frac{2GM}{R}},$$ where $G\approx 6.67\cdot 10^{-11}\ \text{m}^3\,\text{kg}^{-1}\,\text{s}^{-2}$ is the gravitational constant, $M\approx 4.78\cdot10^{25}$ kg is the planet's mass, and $R\approx 1.27\cdot10^{7}$ m is its radius (in reality, factors such as atmospheric drag complicate the picture, but this formula works as an approximation for a launch from close to the surface$^*$). Then

$$\Delta v = \sqrt{\frac{2\cdot 6.67\cdot 10^{-11}\cdot 4.78\cdot10^{25}}{1.27\cdot10^7}}\approx 22407 \frac{\text{m}}{\text{s}}.$$

Using the [highest recorded exhaust velocity of a chemical rocket](https://en.wikipedia.org/wiki/Tripropellant_rocket#:~:text=In%20the%201960s%2C%20Rocketdyne%20fired,for%20a%20chemical%20rocket%20motor.), $5320\frac{\text{m}}{\text{s}},$ we can calculate the approximate fraction of the rocket's mass that would have to be fuel for the rocket to escape the planet starting from $250$ m above its surface$^*$:

$$22407= 5320 \ln\left(\frac{m_0}{m_f}\right),$$

so

$$\frac{m_0}{m_f} = e^{22407/5320}\approx 67.5.$$

$^*$We won't go into detail here, but the escape-velocity formula follows from the [vis-viva equation](https://en.wikipedia.org/wiki/Vis-viva_equation) by letting the orbit's semi-major axis go to infinity; starting $250$ m above the surface changes the distance from the planet's center by a negligible amount compared with $R$.</details>

To escape this planet from $250$ m above its surface, about $98.5\%$ of the rocket's initial mass would need to be fuel, since $1 - 1/67.5 \approx 0.985$. For comparison, the rocket with the highest initial-to-final mass ratio ever built was the [Soyuz-FG](https://en.wikipedia.org/wiki/Soyuz-FG) rocket, which was $91\%$ fuel by mass. Moreover, we were very generous with the conditions used to compute the mass ratio needed to escape our imaginary planet. The exhaust velocity we used has only ever been recorded for a highly corrosive, dangerous, and expensive propellant that, with the current state of technology, is not feasible for use in space travel.

## Filtering by feasibility of space travel

We can use the rocket equation to get an idea of which exoplanets might be the right size to allow for space travel. Let's assume that the hypothetical life forms on an exoplanet can make a chemical rocket with an exhaust velocity of at most $5320\frac{\text{m}}{\text{s}}.$ Let's also say that they've figured out how to make rockets that are up to $95\%$ fuel by mass (so $\frac{m_0}{m_f}=20$). These two assumptions let us make an educated guess about whether the mass and radius of an exoplanet would allow for space travel with these rockets:

$$\sqrt{\frac{2GM}{R}}\approx \Delta v \leq 5320\ln{20}\approx 15937\ \frac{\text{m}}{\text{s}}.$$

We can now define a function ``deltav`` that approximates $\Delta v$ for each exoplanet and returns ``True`` only if the planet is in the potentially habitable zone and that value is small enough. Because the ``mass`` and ``radius`` columns are given in Earth masses and Earth radii, the function first converts them to kilograms and meters. We'll then add a corresponding column ``escapable`` to our dataframe and cache it.

In [None]:
import math

# Approximate physical constants in SI units
G = 6.67e-11            # gravitational constant, m^3 kg^-1 s^-2
EARTH_MASS = 5.972e24   # kg
EARTH_RADIUS = 6.371e6  # m

def deltav(exoplanet):
    "Return True if the planet is potentially habitable and escapable with the assumed chemical rocket"
    m = exoplanet.mass * EARTH_MASS      # Earth masses -> kg
    r = exoplanet.radius * EARTH_RADIUS  # Earth radii -> m
    escape_velocity = math.sqrt(2 * G * m / r)  # NaN inputs give NaN, which fails the test below
    return bool(exoplanet.habitable == True and escape_velocity <= 5320 * math.log(20))

exoplanets['escapable'] = pn.state.as_cached('escapable', lambda: exoplanets.apply(deltav, axis=1))
exoplanets.head()
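To get a feel for what this filter allows, the sketch below (an illustration under stated assumptions, not part of the original notebook) computes the largest radius an Earth-density planet could have and still pass the test. It assumes the ``mass`` and ``radius`` columns are in Earth masses and Earth radii, uses the escape-velocity form $\sqrt{2GM/R}$ (the form that reproduces the worked example's $22407\frac{\text{m}}{\text{s}}$), and uses approximate values for Earth's mass and radius. For constant density, $M \propto R^3$, so escape velocity grows linearly with radius.

```python
import math

G = 6.67e-11            # gravitational constant, m^3 kg^-1 s^-2
EARTH_MASS = 5.972e24   # kg (approximate)
EARTH_RADIUS = 6.371e6  # m (approximate)


def escape_velocity(mass_earths, radius_earths):
    """Escape velocity in m/s for a planet given in Earth masses and Earth radii."""
    m = mass_earths * EARTH_MASS
    r = radius_earths * EARTH_RADIUS
    return math.sqrt(2 * G * m / r)


# Largest delta-v allowed by the assumed rocket: v_e = 5320 m/s, m0/mf = 20
max_delta_v = 5320 * math.log(20)

# With Earth's density, escape velocity scales linearly with radius,
# so the cutoff radius is a simple ratio.
cutoff_radius = max_delta_v / escape_velocity(1, 1)
print(f"max delta-v = {max_delta_v:.0f} m/s, cutoff radius = {cutoff_radius:.2f} Earth radii")
```

Under these assumptions, an Earth-density planet passes only if its radius is at most roughly 1.4 times Earth's. Denser planets of the same size would fail sooner.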

## Defining widgets

We will use Panel to define widgets for our dashboard: a slider for the range of discovery years, a checkbox that shows unconfirmed (candidate) exoplanets, a second checkbox that restricts the plot to planets in the potentially habitable zone, a third checkbox that marks habitable-zone planets that could be escaped with a chemical rocket, and two dropdown menus that choose what the size and color of the points represent.

In [None]:
# step=1 keeps the selected discovery years whole numbers
year_range = pn.widgets.RangeSlider(name='Discovery year range', start=1996, end=2021, step=1)
checkbox_candidates = pn.widgets.Checkbox(name='Show unconfirmed exoplanets')
checkbox_habitable = pn.widgets.Checkbox(name='Show only planets in potentially habitable zone')
checkbox_escapable = pn.widgets.Checkbox(name='Mark planets in habitable zone that could be escaped with a chemical rocket')
select_size = pn.widgets.Select(name='Size points by:', options={"Earth radius":"radius", "Earth mass":"mass",
                                                                 "Temperature": "temperature"})
select_color = pn.widgets.Select(name='Color points by:', options={"Earth radius":"radius", "Earth mass":"mass",
                                                                   "Temperature": "temperature"})
widgets = [year_range, checkbox_candidates, checkbox_habitable, checkbox_escapable, select_size, select_color]

We'll also create a point representing the sun to orient users. Technically, it should be right at the origin, $(l, b) = (0,0),$ but because longitude is plotted on the vertical axis starting at 0, only half of the circle would be visible there, so we'll move it up to $(l, b) = (10,0).$

In [None]:
d = {'l':[10],'b':[0]}
origin = pd.DataFrame(data=d)

## Filtering and plotting points
To generate our plot, we'll need a function ``gal_map`` that takes the values of our widgets as input, uses them to filter the data, and outputs a plot of the relative positions of the exoplanets (and candidates, if the corresponding checkbox is selected), with the data points from ``stars`` as the background and a yellow point ``sun`` near the origin representing the sun. To keep ``gal_map`` simple, we will also define ``filter_year``, ``filter_candidates``, ``filter_habitable``, and ``show_escapable`` to filter the data. For ``stars``, we'll pass ``datashade=True`` to use [Datashader](https://datashader.org/), which renders large datasets like the ``stars`` dataframe more accurately by aggregating the points into an image instead of drawing each one individually.

Note that when "mass" or "temperature" is selected to determine the size of the points, we scale the sizes to 0.5% of the actual value (``size_scale = 0.005``); this way, planets with large masses or high temperatures do not overwhelm the plot, but the relative sizes of the points retain their meaning. Also, since complete data are not available for all planets, those missing the selected variable will be colored grey.

In [None]:
def filter_year(e, year_range):
    exo_lower = e.disc_year>=year_range[0]
    exo_upper = e.disc_year<=year_range[1]
    exo_filter = exo_lower & exo_upper
    return e[exo_filter].drop_duplicates()
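The same inclusive year filter can be written more compactly with pandas' ``Series.between``, which includes both endpoints by default. This is an alternative sketch rather than the notebook's code; the function name ``filter_year_between`` is hypothetical, and the tiny dataframe below is made-up illustration data, not rows from the exoplanet dataset.

```python
import pandas as pd


def filter_year_between(e, year_range):
    """Hypothetical alternative to filter_year: keep rows with disc_year in year_range (inclusive)."""
    lower, upper = year_range
    return e[e["disc_year"].between(lower, upper)].drop_duplicates()


# Made-up example rows (not real exoplanet data)
demo = pd.DataFrame({"name": ["a", "b", "c", "d"], "disc_year": [1995, 2000, 2010, 2021]})
print(filter_year_between(demo, (2000, 2010)))
```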

In [None]:
def filter_candidates(c, year_range):
    can_lower = c.year>=year_range[0]
    can_upper = c.year<=year_range[1]
    can_mask = can_lower & can_upper
    return c[can_mask]

In [None]:
def filter_habitable(e):
    hab = e.habitable == True
    exo_filter = hab
    return e[exo_filter].drop_duplicates()

In [None]:
def show_escapable(e):
    return e[e['escapable']==True]

In [None]:
# Latitude b is plotted on the x-axis and longitude l on the y-axis
opts = {"x":"b","y":"l","xlabel":"latitude (deg)","ylabel":"longitude (deg)"}

In [None]:
def gal_map(year_range, cands, hab, esc, size, color):
    filtered_exoplanets = filter_year(exoplanets,year_range)
    if hab:
        filtered_exoplanets = filter_habitable(filtered_exoplanets)
   
    star_background = (stars.hvplot.scatter(**opts,datashade=True,
                                               color="phot_g_mean_mag",cmap=fire,
                                               colorbar=True))
    overlay_points = (filtered_exoplanets.hvplot.scatter(**opts,
                                               color=color, size=size,
                                               clabel=color).opts(cmap='blues'))

    size_scale = 1 if size == "radius" else 0.005
    overlay_points.opts(size = size_scale*hv.dim(size))
    sun = origin.hvplot.scatter(**opts,size=60,color="yellow")
   
    layers = [star_background, overlay_points,sun]
    if cands:
        filtered_candidates = filter_candidates(candidates, year_range)
        candidate_points = (filtered_candidates.hvplot.scatter(x='b',y='l',
                                               size=30,color="green",alpha=0.5))
        layers.append(candidate_points)
    
    if esc:
        escapable = show_escapable(filtered_exoplanets)
        escapable_points = (escapable.hvplot.scatter(**opts,color="red",clabel=color)
                                            .opts(cmap='blues',size=12,alpha=0.5))
        layers.append(escapable_points)
    return hv.Overlay(layers).collate().opts(bgcolor="black",title='Map of exoplanets')

In [None]:
bound_gm = pn.bind(gal_map,*widgets)

pn.Column(pn.panel(bound_gm))    

We'll also include a scatterplot of two variables chosen by the user, with points colored according to habitability. First, we'll define a function ``radius_mass`` that outputs the scatterplot with the chosen variables on their respective axes.

In [None]:
def radius_mass(x_axis,y_axis):
    habitable = exoplanets[exoplanets['habitable']==True].dropna(subset=[x_axis, y_axis])
    uninhabitable = exoplanets[exoplanets['habitable']==False].dropna(subset=[x_axis, y_axis])
    habitable_points = habitable.hvplot.scatter(x=x_axis,y=y_axis,color="red",
                                                label="Potentially habitable",size=30).opts(legend_position='top_right')
    uninhabitable_points = uninhabitable.hvplot.scatter(x=x_axis,y=y_axis,
                                                        color="blue",alpha=0.5,
                                                        label="Uninhabitable",size=10).opts(legend_position='top_right')
    # Apply the title to the combined overlay rather than to one of its layers
    return (uninhabitable_points*habitable_points).opts(title=f'Scatterplot of {x_axis} and {y_axis}')

Next, we'll define two dropdown menus to choose the axis variables.

In [None]:
x_axis = pn.widgets.Select(name='x-axis:', options={"Earth radius":"radius", "Earth mass":"mass",
                                                                   "Temperature": "temperature"})
y_axis = pn.widgets.Select(name='y-axis:', options={"Earth mass":"mass","Earth radius":"radius",
                                                                   "Temperature": "temperature"})

We can use ``pn.bind`` to bind the user's axis selections to the plot output.

In [None]:
bound_rm = pn.bind(radius_mass,x_axis=x_axis,y_axis=y_axis)
pn.Column(bound_rm)

In the plot above, we can see that most planets have a mass under 2000 times Earth's and a radius under 20 times Earth's, but there are a few outliers. The exoplanet with the largest mass in our dataset is the gas giant [HR 2562-b](https://exoplanets.nasa.gov/exoplanet-catalog/7229/hr-2562-b/), whose mass is over 9000 times Earth's but whose radius is only about 12 times Earth's.

[GQ Lupi b](https://exoplanets.nasa.gov/exoplanet-catalog/7029/gq-lupi-b/) has the largest radius in our dataset, dwarfing Earth by a factor of 33, while its mass is about 6000 times Earth's. At the other end of the scale, the planet with the smallest radius is [Kepler-62 c](https://exoplanets.nasa.gov/exoplanet-catalog/373/kepler-62-c/), whose radius is about half of Earth's but whose mass is quadruple Earth's.

The hottest planet in our dataset is the gas giant [KELT-9b](https://exoplanets.nasa.gov/exoplanet-catalog/3508/kelt-9-b/), whose surface temperature is 4050 Kelvin, and the coldest is [OGLE-2005-BLG-390L b](https://exoplanets.nasa.gov/exoplanet-catalog/6081/ogle-2005-blg-390l-b/), at only 50 Kelvin. Temperature and radius appear to have a positive correlation.

In terms of habitability, all the potentially habitable exoplanets have a radius less than five times Earth's, a mass less than forty times Earth's, and a surface temperature between about 200 and 400 Kelvin. For comparison, Earth's average surface temperature is about 288 Kelvin.

## Putting it all together
Finally, we create a panel from our widgets and plots to display the final dashboard.

In [None]:
filtered_view = pn.Row(
    pn.Column(*widgets,pn.Column(bound_gm, width=800),
                pn.Row(pn.Column(x_axis,y_axis),
                pn.Column(bound_rm,width=400))))

filtered_view.servable()