# Categorization of exoplanets

---

*Author: Ema Donev, 2022*

In [2]:
# Basic libraries
import pickle
import os
import sys
from tqdm import tqdm
import gc

# Plotting
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib import ticker
import matplotlib.colors as mcolors
from matplotlib.font_manager import FontProperties

# DataFrame analysis
import pandas as pd

# Math libraries
import numpy as np
import scipy as sc
from scipy.stats import norm

In [3]:
# CONFIGURATION
# -------------
sns.set_theme(style='white') # setting the theme for plotting
sys.path.insert(0,'../src/')
np.random.seed(42)

# configuring plotting colors
clr = ['#465BBB', '#3F8FCE', '#7ABBCE', '#3A3865', '#A82F43', '#612A37', '#DC5433', '#F29457']
b1 = '#465BBB'
b2 = '#3F8FCE'
b3 = '#7ABBCE'
b4 = '#3A3865'
black1 = '#22212A'
black2 = '#2D1E21'
or1 = '#A82F43'
or2 = '#612A37'
or3 = '#DC5433'
or4 = '#F29457'
clrs = sns.color_palette(clr)  # keep the palette itself; sns.set_palette() returns None
sns.set_palette(clrs)

# configuring fonts for plotting
font = FontProperties()
font.set_family('serif')
font.set_name('Andale Mono')
font.set_style('normal')

%matplotlib inline 

In [4]:
exo_data = pd.read_csv('../DATA/exoplanets.csv')

## Types of exoplanets
---

### Gas giants

Gas giants are planets similar to Jupiter and Saturn. They have a mass between 60 and 10,000 times that of Earth. Their radius ranges from 10 to 44 times Earth's radius. They are composed of gases like hydrogen and helium and have low densities. Gas giants are the easiest to detect because they are both massive and large.

### Hot Jupiters/Hot Neptunes

Hot Jupiters are a subtype of gas giants, called "hot" because they orbit very close to their stars: the orbital period of a hot Jupiter is less than 10 days. Their discovery upset existing models of planetary system formation, which did not anticipate a gas giant so close to its star. Current theories suggest that hot Jupiters formed farther from their stars, where dust, hydrogen, and helium are plentiful, and then migrated inward until they settled in their present orbits. In some hot Jupiters the temperature is so high that the atmosphere is slowly evaporating. Hot Neptunes are the Neptune-sized counterpart: Neptune-like planets on very short orbits (this notebook uses an orbital period under 4 days).

### Neptune-likes

Neptune-like planets are, as the name suggests, similar to Neptune. Their mass ranges from 6 to 60 times that of Earth, and their radius ranges from 2.5 to 10 times Earth's radius. The dense atmospheres of Neptune-like planets are most likely composed of hydrogen, helium, and a little methane. Their density is similar to that of gas giants, as they are made of the same materials.

### Super-Earths

Super-Earths are larger than Earth, as the name suggests. However, the name says nothing about the composition of the exoplanet. The mass of a super-Earth ranges from 1 to 6 times Earth's mass, and the radius ranges from 1.6 to 2.5 times Earth's radius. Super-Earths can have different compositions and may or may not have an atmosphere; the density of the planet is the main clue to which case applies. Super-Earths are of great interest because they could potentially support life.

### Terrestrial planets

Terrestrial planets are of particular interest to astrobiologists because they are the most suitable for life. The mass of these planets ranges from 0.05 to 1 times Earth's mass, and their radius ranges from 0.1 to 1.6 times Earth's radius. Terrestrial planets are usually composed of rock and metal and may or may not have an atmosphere. These planets are the most similar in composition and mass to Earth and are very important in the search for life. They are also very small, which makes them difficult to detect with current exoplanet detection methods.

---

## Classification via mass/radius

In [5]:
for row in tqdm(range(exo_data.shape[0])):
    r = exo_data.loc[row,'pl_rade']      # access radius
    m = exo_data.loc[row,'pl_bmasse']    # access mass

    # ==== NO MASS, BUT A RADIUS: classify by radius =====
    if np.isnan(m) and not np.isnan(r):
        if (r < 1.6):
            exo_data.loc[row, 'exo_class'] = "Terrestrial"
        elif (r >= 1.6) and (r < 2.5):
            exo_data.loc[row, 'exo_class'] = "Super-Earth"
        elif (r >= 2.5) and (r < 10):
            exo_data.loc[row, 'exo_class'] = "Neptune-like"
        elif (r >= 10):
            exo_data.loc[row, 'exo_class'] = "Gas giant"
    # ==== HAS A MASS: classify by mass =====
    elif not np.isnan(m):
        if (m < 1):
            exo_data.loc[row, 'exo_class'] = "Terrestrial"
        elif (m >= 1) and (m < 6):
            exo_data.loc[row, 'exo_class'] = "Super-Earth"
        elif (m >= 6) and (m < 60):
            exo_data.loc[row, 'exo_class'] = "Neptune-like"
        elif (m >= 60):
            exo_data.loc[row, 'exo_class'] = "Gas giant"

del m,r,row
gc.collect()

100%|██████████| 4940/4940 [00:00<00:00, 27816.08it/s]


0
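The row-by-row loop above can also be written without iterating. The following is a minimal sketch under stated assumptions, not the method used in this notebook: the helper name `classify_by_mass_radius` and the demo values are hypothetical, while the column names and thresholds are taken from the cell above. It bins mass and radius with `pd.cut` and falls back to the radius only where the mass is missing.

```python
import numpy as np
import pandas as pd

LABELS = ["Terrestrial", "Super-Earth", "Neptune-like", "Gas giant"]

def classify_by_mass_radius(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: mass-based class, radius-based class as fallback."""
    # right=False gives half-open bins [lo, hi), matching the >= / < tests in the loop
    mass_bins = [-np.inf, 1, 6, 60, np.inf]
    radius_bins = [-np.inf, 1.6, 2.5, 10, np.inf]
    by_mass = pd.cut(df["pl_bmasse"], bins=mass_bins, labels=LABELS, right=False).astype(object)
    by_radius = pd.cut(df["pl_rade"], bins=radius_bins, labels=LABELS, right=False).astype(object)
    # Keep the mass-based class where it exists, otherwise use the radius-based one
    return by_mass.where(by_mass.notna(), by_radius)

# Made-up demo rows: mass only, radius only, both, neither, mass on a bin edge
demo = pd.DataFrame({
    "pl_bmasse": [0.5, np.nan, 100.0, np.nan, 6.0],
    "pl_rade": [np.nan, 2.0, 1.0, np.nan, np.nan],
})
result = classify_by_mass_radius(demo).tolist()
print(result)
```

Rows with neither a mass nor a radius stay NaN, as in the loop.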

In [6]:
exo_data.exo_class.value_counts()

Super-Earth     1707
Neptune-like    1614
Gas giant       1436
Terrestrial      178
Name: exo_class, dtype: int64

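> Planets with neither a mass nor a radius cannot be classified, so I leave their `exo_class` as NaN. The cell below counts them.

In [7]:
print(exo_data.exo_class.isnull().sum())  # number of unclassified planets
exo_data['exo_class'] = exo_data.exo_class.fillna(np.nan)  # no-op: missing classes remain NaN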

5


## Classification via distance

In [8]:
exo_data['exo_class_ext'] = np.nan # fill a column with NaN values
for row in tqdm(range(exo_data.shape[0])):
    p = exo_data.loc[row,'pl_orbper']      # access orbital period
    c = exo_data.loc[row,'exo_class'] # access exoplanet class
    if (p > 0) and (p < 4) and (c == "Neptune-like"): # if the period is within range and the planet is neptune-like
        exo_data.loc[row, 'exo_class_ext'] = "Hot Neptune" # assign a new value
    elif (p > 0) and (p < 10) and (c == "Gas giant"): # if the period is within range and the planet is jupiter-like
        exo_data.loc[row, 'exo_class_ext'] = "Hot Jupiter"
del p,c,row
gc.collect()

100%|██████████| 4940/4940 [00:00<00:00, 62717.33it/s]


0
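The same period-based rule can be sketched with boolean masks instead of a loop. This is an illustrative sketch only: the function name `extend_with_hot_classes` and the demo rows are hypothetical, and the 4-day and 10-day limits are copied from the cell above. Unlike that cell, the sketch starts from a copy of `exo_class`, so it also covers the later fill-in step in one pass.

```python
import numpy as np
import pandas as pd

def extend_with_hot_classes(df: pd.DataFrame,
                            neptune_max_days: float = 4.0,
                            jupiter_max_days: float = 10.0) -> pd.Series:
    """Hypothetical helper: relabel short-period Neptune-likes and gas giants."""
    period = df["pl_orbper"]
    # Comparisons with NaN are False, so planets without a period keep their class
    hot_neptune = (df["exo_class"] == "Neptune-like") & (period > 0) & (period < neptune_max_days)
    hot_jupiter = (df["exo_class"] == "Gas giant") & (period > 0) & (period < jupiter_max_days)
    ext = df["exo_class"].copy()
    ext = ext.mask(hot_neptune, "Hot Neptune")
    ext = ext.mask(hot_jupiter, "Hot Jupiter")
    return ext

# Made-up demo rows
demo = pd.DataFrame({
    "exo_class": ["Neptune-like", "Neptune-like", "Gas giant", "Gas giant", "Super-Earth", np.nan],
    "pl_orbper": [3.0, 5.0, 9.9, np.nan, 1.0, 2.0],
})
ext = extend_with_hot_classes(demo).tolist()
print(ext)
```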

In [9]:
test = pd.crosstab(exo_data['exo_class'],exo_data['exo_class_ext'])
test.head()

| exo_class \ exo_class_ext | Hot Jupiter | Hot Neptune |
|---|---|---|
| Gas giant | 572 | 0 |
| Neptune-like | 0 | 133 |


In [10]:
exo_data['exo_class_ext'] = exo_data['exo_class_ext'].fillna(exo_data['exo_class'])
exo_data.head()

| | pl_name | hostname | sy_snum | sy_pnum | pl_orbper | pl_orbsmax | pl_rade | pl_bmasse | pl_dens | pl_orbeccen | st_teff | st_mass | st_met | sy_dist | exo_class | exo_class_ext |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11 Com b | 11 Com | 2 | 1 | 326.03 | 1.29 | 12.1 | 6165.6 | 19.1 | 0.231 | 4742.0 | 2.7 | -0.35 | 93.1846 | Gas giant | Gas giant |
| 1 | 11 UMi b | 11 UMi | 1 | 1 | 516.21997 | 1.53 | 12.3 | 4684.8142 | 13.8 | 0.08 | 4213.0 | 2.78 | -0.02 | 125.321 | Gas giant | Gas giant |
| 2 | 14 And b | 14 And | 1 | 1 | 185.84 | 0.83 | 12.9 | 1525.5 | 3.9 | 0.0 | 4813.0 | 2.2 | -0.24 | 75.4392 | Gas giant | Gas giant |
| 3 | 14 Her b | 14 Her | 1 | 2 | 1773.40002 | 2.93 | 12.9 | 1481.0878 | 3.79 | 0.37 | 5338.0 | 0.9 | 0.41 | 17.9323 | Gas giant | Gas giant |
| 4 | 16 Cyg B b | 16 Cyg B | 3 | 1 | 798.5 | 1.66 | 13.5 | 565.7374 | 1.26 | 0.68 | 5750.0 | 1.08 | 0.06 | 21.1397 | Gas giant | Gas giant |


We have now successfully classified the exoplanets into 6 types, using their mass/radius and their distance from the star (measured through the orbital period).

In [11]:
exo_data.to_csv("../DATA/exoplanets_categorized.csv", index=False)

## Statistics of categorization

In this small section we analyze the categorization, by looking at the distribution of types across planetary systems, general properties, etc.

---

In [12]:
pl_num = pd.crosstab(exo_data['exo_class_ext'],exo_data['sy_pnum'])
# pd.crosstab already returns a DataFrame, so no conversion is needed
pl_num.head(10)

| exo_class_ext \ sy_pnum | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Gas giant | 572 | 211 | 44 | 25 | 6 | 4 | 0 | 2 |
| Hot Jupiter | 526 | 33 | 9 | 2 | 2 | 0 | 0 | 0 |
| Hot Neptune | 85 | 22 | 14 | 8 | 1 | 3 | 0 | 0 |
| Neptune-like | 732 | 367 | 222 | 84 | 49 | 24 | 0 | 3 |
| Super-Earth | 886 | 396 | 224 | 117 | 60 | 17 | 4 | 3 |
| Terrestrial | 93 | 39 | 24 | 12 | 7 | 0 | 3 | 0 |


The table shows the composition of systems with 1 to 8 planets. Only one system in the table has 8 planets, which suggests that, among the systems known so far, one as populous as our solar system is rare. That 8-planet system is diverse, containing three types of exoplanets: 2 gas giants, 3 Neptune-like planets, and 3 super-Earths. There is also only one system with 7 planets, made up of 4 super-Earths and 3 terrestrial planets.

Single-planet systems are the most common in the table, and the sole planet is most often a super-Earth. This may indicate that forming several planets is harder than forming just one, although the table alone cannot rule out additional planets that have not been detected yet. In general, planetary systems are diverse, with different combinations of planets. Systems with only one planet are probably quite similar to each other, but this cannot be confirmed without knowing all the other parameters of these exoplanets.

In [13]:
st_num = pd.crosstab(exo_data['exo_class_ext'],exo_data['sy_snum'])
# pd.crosstab already returns a DataFrame, so no conversion is needed
st_num.head(10)

| exo_class_ext \ sy_snum | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Gas giant | 704 | 136 | 22 | 2 |
| Hot Jupiter | 463 | 94 | 15 | 0 |
| Hot Neptune | 124 | 9 | 0 | 0 |
| Neptune-like | 1395 | 83 | 3 | 0 |
| Super-Earth | 1633 | 65 | 9 | 0 |
| Terrestrial | 161 | 12 | 5 | 0 |


The table includes planets in systems with 4 stars: 2 gas giants and no other type. This is expected, as it is very difficult to form planets around multiple stars. The most common systems are those with 1 star, where planet formation is relatively simple. As systems become more complex, with a greater number of stars, terrestrial planets and super-Earths make up a smaller share of the planets, and gas giants a larger one.
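Because single-star systems dominate the raw counts, the trend is easier to read as shares per column. The sketch below is illustrative: it re-enters the counts from the table above by hand (the variable names `counts` and `shares` are hypothetical) and divides each column by its total. On the full data frame, `pd.crosstab(..., normalize='columns')` should give the same shares directly.

```python
import pandas as pd

# Counts copied from the sy_snum crosstab above
counts = pd.DataFrame(
    {
        1: [704, 463, 124, 1395, 1633, 161],
        2: [136, 94, 9, 83, 65, 12],
        3: [22, 15, 0, 3, 9, 5],
        4: [2, 0, 0, 0, 0, 0],
    },
    index=["Gas giant", "Hot Jupiter", "Hot Neptune",
           "Neptune-like", "Super-Earth", "Terrestrial"],
)

# Share of each class among all planets with the same number of stars
shares = counts / counts.sum(axis=0)
print(shares.round(3))
```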

In [14]:
# Median of each property per class
# (variable names are Croatian: mase = mass, gustoce = density, radijusa = radius, eks = eccentricity)
medijan_mase = exo_data.groupby('exo_class_ext')['pl_bmasse'].agg('median').to_dict()
medijan_gustoce = exo_data.groupby('exo_class_ext')['pl_dens'].agg('median').to_dict()
medijan_radijusa = exo_data.groupby('exo_class_ext')['pl_rade'].agg('median').to_dict()
medijan_eks = exo_data.groupby('exo_class_ext')['pl_orbeccen'].agg('median').to_dict()

summary = pd.DataFrame(index=exo_data['exo_class_ext'].unique())
summary['Mass'] = summary.index.map(medijan_mase)
summary['Radius'] = summary.index.map(medijan_radijusa)
summary['Density'] = summary.index.map(medijan_gustoce)
summary['Eccentricity'] = summary.index.map(medijan_eks)
summary

| | Mass | Radius | Density | Eccentricity |
|---|---|---|---|---|
| Gas giant | 723.92175 | 13.1 | 1.78 | 0.1825 |
| Hot Jupiter | 299.07803 | 13.6 | 0.665 | 0.0 |
| Neptune-like | 9.6 | 2.93 | 1.95 | 0.0 |
| Hot Neptune | 8.91 | 2.83 | 2.09 | 0.0 |
| Super-Earth | 3.29 | 1.62 | 4.18 | 0.0 |
| Terrestrial | 0.566 | 0.86 | 4.89 | 0.0 |
| NaN | NaN | NaN | NaN | NaN |
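The per-column dictionaries above can be collapsed into a single grouped call. This is a sketch of an alternative, not the cell that produced the table: the demo frame is made up, and only the column names are taken from the notebook. `groupby(...).median()` skips missing values within each group.

```python
import numpy as np
import pandas as pd

# Made-up demo rows with the notebook's column names
demo = pd.DataFrame({
    "exo_class_ext": ["Gas giant", "Gas giant", "Terrestrial", "Terrestrial", "Terrestrial"],
    "pl_bmasse": [300.0, 900.0, 0.4, 0.6, np.nan],
    "pl_rade": [12.0, 14.0, 0.8, 0.9, 1.0],
})

# One call returns a DataFrame indexed by class, one column per property
summary_demo = demo.groupby("exo_class_ext")[["pl_bmasse", "pl_rade"]].median()
print(summary_demo)
```

Rows whose class is NaN are dropped by `groupby` by default, so the all-NaN row in the table above would not appear with this approach.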


This table shows the median mass, radius, density, and eccentricity for each category of exoplanet. Why use the median instead of the mean? Because some exoplanets in the table have very large values that are not precise. The mean is pulled toward these values, so the result would change noticeably if they were excluded from the calculation. The median is the middle value of the sorted list, so a few very large values have little effect on it.
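A small numerical illustration of this point, using made-up numbers rather than catalog values: one extreme entry shifts the mean by two orders of magnitude, while the median barely moves.

```python
import numpy as np

# Made-up values; the last one plays the role of an imprecise outlier
masses = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])

mean_with = masses.mean()               # 202.0
median_with = np.median(masses)         # 3.0

mean_without = masses[:-1].mean()       # 2.5
median_without = np.median(masses[:-1]) # 2.5

print(mean_with, median_with, mean_without, median_without)
```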

Gas giants, the largest planets, have the highest median eccentricity, while the median for every other category is 0, which corresponds to a circular orbit. The median density of gas giants is consistent with a composition of hydrogen and helium, while super-Earths and terrestrial planets have densities typical of rocky materials. The median masses and radii all fall within the ranges defined for each type of exoplanet, indicating that the categorization is consistent, though with some scatter.

In [15]:
def annotate_countplot(sp, df: pd.DataFrame, perc_height: float, font_size: int = 10):
    """Write each bar's height above it; the vertical offset is perc_height * len(df)."""
    for p in sp.patches:
        height = p.get_height()

        sp.text(p.get_x() + p.get_width()/2.,
                height + len(df) * perc_height, height,
                ha = 'center', fontsize = font_size)
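A possible way to call the helper is sketched below. The notebook does not show how it is used, so everything here is an assumption: the demo data is made up, a plain matplotlib bar chart stands in for a seaborn count plot, and the non-interactive `Agg` backend is selected so the sketch runs without a display. The function is repeated so the block is self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available
import matplotlib.pyplot as plt
import pandas as pd

def annotate_countplot(sp, df: pd.DataFrame, perc_height: float, font_size: int = 10):
    """Write each bar's height above it; the vertical offset is perc_height * len(df)."""
    for p in sp.patches:
        height = p.get_height()
        sp.text(p.get_x() + p.get_width() / 2.,
                height + len(df) * perc_height, height,
                ha='center', fontsize=font_size)

# Made-up demo data: six planets in three classes
demo = pd.DataFrame({"exo_class_ext": ["Gas giant", "Gas giant", "Gas giant",
                                       "Super-Earth", "Super-Earth", "Terrestrial"]})
counts = demo["exo_class_ext"].value_counts()

fig, ax = plt.subplots()
ax.bar(counts.index, counts.values)             # one bar (patch) per class
annotate_countplot(ax, demo, perc_height=0.01)  # one text label per bar
n_labels = len(ax.texts)
```

Scaling the offset by `len(df)` keeps the labels at a similar relative distance above the bars regardless of how many rows are plotted.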

---

# Next...

The next step is to classify the stars in every system into their respective spectral types, and finally to form planetary systems!