# Exoplanet Discovery Method Analysis

> Author: James Holmes<br>Version: 1.0.0

Exploring how an exoplanet's discovery method influences which of its attributes can be measured.

This notebook uses a dataset of discovered exoplanets provided by the [NASA Exoplanet Archive](https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=PS). Given how an exoplanet was found, which attributes does that discovery method reveal, and is the pattern consistent across other planets found by the same method?

The dataset is first cleaned and loaded into a data frame so that it is easier to examine how discovery methods influence the traits recorded for a planet. Many planet attributes are missing from the data. If an attribute is consistently missing for planets found by a given discovery method, that method likely cannot measure it. Conversely, an attribute that is consistently present for a method is likely one that the method can measure.
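
As a rough illustration of the idea above, the share of missing values per attribute can be computed for each discovery method. This is a minimal sketch on a made-up toy table, not the notebook's own analysis; the attribute columns `pl_rade` and `pl_bmasse` and all values are assumptions chosen for illustration.

```python
import pandas as pd

# Toy stand-in for the archive table; all values are made up for illustration.
toy = pd.DataFrame({
    "discoverymethod": ["Transit", "Transit", "Radial Velocity", "Radial Velocity"],
    "pl_rade": [1.2, 2.3, None, None],        # assumed radius column
    "pl_bmasse": [None, 5.0, 300.0, 410.0],   # assumed mass column
})

# isna() marks missing cells; the mean of those booleans within each
# discovery method is the fraction of rows where the attribute is missing.
missing_share = (
    toy.drop(columns="discoverymethod")
       .isna()
       .groupby(toy["discoverymethod"])
       .mean()
)
print(missing_share)
```

A share close to 1.0 for a method would suggest the attribute is rarely or never obtained by that method, while a share close to 0.0 would suggest it is routinely obtained.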


In [165]:
from timeit import timeit

## Cleaning Dataset
---
### Preparing CSV File to Read Dataset
Import the path to the CSV file, then read the file's commented header to map column names to their descriptions and to count the rows to skip when loading the data.

In [166]:
from constants import EXOPLANET_DATASET

# INFO: Read the commented header to map each column name to its description

# INFO: key_map stores every documented column name and its description
key_map: dict[str, str] = {}
# INFO: disc_flag_map stores only the detection-flag columns and their descriptions
disc_flag_map: dict[str, str] = {}
# INFO: rows_to_skip counts the leading comment rows to skip when loading the CSV
rows_to_skip: int = 0

with open(EXOPLANET_DATASET, 'r') as f:
    while (line := f.readline()).startswith('#'):
        rows_to_skip += 1
        if line.startswith('# COLUMN'):
            # INFO: split on the first colon only, in case a description contains one
            key, name = line.split(':', 1)
            key = key.strip()[9:]  # drop the leading '# COLUMN ' prefix
            key_map[key] = name.strip()
            if 'Detected' in name:
                disc_flag_map[key] = name.strip()

disc_flag_map

{'rv_flag': 'Detected by Radial Velocity Variations',
 'pul_flag': 'Detected by Pulsar Timing Variations',
 'ptv_flag': 'Detected by Pulsation Timing Variations',
 'tran_flag': 'Detected by Transits',
 'ast_flag': 'Detected by Astrometric Variations',
 'obm_flag': 'Detected by Orbital Brightness Modulations',
 'micro_flag': 'Detected by Microlensing',
 'etv_flag': 'Detected by Eclipse Timing Variations',
 'ima_flag': 'Detected by Imaging',
 'dkin_flag': 'Detected by Disk Kinematics'}
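
To make the header parsing easier to follow, here is a small standalone sketch that runs the same idea on an in-memory sample instead of the real file. The sample header lines are hypothetical and only mimic the `# COLUMN name: description` layout implied by the code above; the variable names are prefixed with `sample_` so they do not overwrite the notebook's own.

```python
import io

# Hypothetical header lines followed by the CSV header row.
sample_file = io.StringIO(
    "# This file was produced by the archive\n"
    "# COLUMN pl_name:        Planet Name\n"
    "# COLUMN rv_flag:        Detected by Radial Velocity Variations\n"
    "pl_name,rv_flag\n"
)

sample_key_map = {}
sample_flag_map = {}
sample_rows_to_skip = 0
prefix = "# COLUMN "

for line in sample_file:
    if not line.startswith("#"):
        break  # first non-comment line marks the start of the data
    sample_rows_to_skip += 1
    if line.startswith(prefix):
        # Split on the first colon only so descriptions may contain colons.
        key, description = line[len(prefix):].split(":", 1)
        sample_key_map[key.strip()] = description.strip()
        if "Detected" in description:
            sample_flag_map[key.strip()] = description.strip()

print(sample_rows_to_skip, sample_key_map, sample_flag_map)
```

Slicing with `len(prefix)` instead of a hard-coded offset keeps the prefix and the slice in sync if the prefix ever changes.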

### Generate Data Frame
Load every exoplanet record into a data frame, skipping the commented header rows counted above.

In [167]:
import pandas as pd

pl_df: pd.DataFrame = pd.read_csv(EXOPLANET_DATASET, skiprows=rows_to_skip, low_memory=False)

pl_df

Unnamed: 0,rowid,pl_name,hostname,pl_letter,hd_name,hip_name,tic_id,gaia_id,default_flag,sy_snum,...,sy_kepmagerr2,rowupdate,pl_pubdate,releasedate,pl_nnotes,st_nphot,st_nrvc,st_nspec,pl_nespec,pl_ntranspec
0,1,11 Com b,11 Com,b,HD 107383,HIP 60202,TIC 72437047,Gaia DR2 3946945413106333696,0,2,...,,2014-07-23,2011-08,2014-07-23,2,1,2,0,0,0
1,2,11 Com b,11 Com,b,HD 107383,HIP 60202,TIC 72437047,Gaia DR2 3946945413106333696,0,2,...,,2014-05-14,2008-01,2014-05-14,2,1,2,0,0,0
2,3,11 Com b,11 Com,b,HD 107383,HIP 60202,TIC 72437047,Gaia DR2 3946945413106333696,1,2,...,,2023-09-19,2023-08,2023-09-19,2,1,2,0,0,0
3,4,11 UMi b,11 UMi,b,HD 136726,HIP 74793,TIC 230061010,Gaia DR2 1696798367260229376,1,1,...,,2018-09-04,2017-03,2018-09-06,0,1,1,0,0,0
4,5,11 UMi b,11 UMi,b,HD 136726,HIP 74793,TIC 230061010,Gaia DR2 1696798367260229376,0,1,...,,2018-04-25,2011-08,2014-07-23,0,1,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35195,35196,ups And d,ups And,d,HD 9826,HIP 7513,TIC 189576919,Gaia DR2 348020448377061376,0,2,...,,2014-08-21,2004-01,2014-08-21,5,1,10,1,0,0
35196,35197,ups Leo b,ups Leo,b,,,TIC 49430557,Gaia DR2 3794167001116433152,1,1,...,,2022-01-10,2021-12,2022-01-10,0,0,0,0,0,0
35197,35198,xi Aql b,xi Aql,b,HD 188310,HIP 97938,TIC 375464367,Gaia DR2 4298361114750843904,0,1,...,,2014-07-23,2011-08,2014-07-23,1,1,1,0,0,0
35198,35199,xi Aql b,xi Aql,b,HD 188310,HIP 97938,TIC 375464367,Gaia DR2 4298361114750843904,0,1,...,,2014-05-14,2008-06,2014-05-14,1,1,1,0,0,0


### Tweaking Data
Drop columns that are entirely empty, the row ID column, and columns that hold extra identifiers or discovery reference details.

In [168]:
pl_df = pl_df.dropna(axis=1, how='all')
columns_to_drop = [
    'rowid',
    'hd_name',
    'hip_name',
    'tic_id',
    'gaia_id',
    'default_flag',
    'disc_refname',
    'disc_pubdate',
    'disc_locale',
    'disc_facility'
]
pl_df = pl_df.drop(columns_to_drop, axis=1)

# flag_columns = [col for col in pl_df.columns if 'flag' in col.lower()]
# pl_df[flag_columns] = pl_df[flag_columns].astype(bool)

pl_df

Unnamed: 0,pl_name,hostname,pl_letter,sy_snum,sy_pnum,sy_mnum,cb_flag,discoverymethod,disc_year,disc_telescope,...,sy_kepmag,rowupdate,pl_pubdate,releasedate,pl_nnotes,st_nphot,st_nrvc,st_nspec,pl_nespec,pl_ntranspec
0,11 Com b,11 Com,b,2,1,0,0,Radial Velocity,2007,2.16 m Telescope,...,,2014-07-23,2011-08,2014-07-23,2,1,2,0,0,0
1,11 Com b,11 Com,b,2,1,0,0,Radial Velocity,2007,2.16 m Telescope,...,,2014-05-14,2008-01,2014-05-14,2,1,2,0,0,0
2,11 Com b,11 Com,b,2,1,0,0,Radial Velocity,2007,2.16 m Telescope,...,,2023-09-19,2023-08,2023-09-19,2,1,2,0,0,0
3,11 UMi b,11 UMi,b,1,1,0,0,Radial Velocity,2009,2.0 m Alfred Jensch Telescope,...,,2018-09-04,2017-03,2018-09-06,0,1,1,0,0,0
4,11 UMi b,11 UMi,b,1,1,0,0,Radial Velocity,2009,2.0 m Alfred Jensch Telescope,...,,2018-04-25,2011-08,2014-07-23,0,1,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35195,ups And d,ups And,d,2,3,0,0,Radial Velocity,1999,Multiple Telescopes,...,,2014-08-21,2004-01,2014-08-21,5,1,10,1,0,0
35196,ups Leo b,ups Leo,b,1,1,0,0,Radial Velocity,2021,1.88 m Telescope,...,,2022-01-10,2021-12,2022-01-10,0,0,0,0,0,0
35197,xi Aql b,xi Aql,b,1,1,0,0,Radial Velocity,2007,1.88 m Telescope,...,,2014-07-23,2011-08,2014-07-23,1,1,1,0,0,0
35198,xi Aql b,xi Aql,b,1,1,0,0,Radial Velocity,2007,1.88 m Telescope,...,,2014-05-14,2008-06,2014-05-14,1,1,1,0,0,0
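
The behaviour of `dropna(axis=1, how='all')` is easy to misread, so the following sketch shows it on a toy frame: only columns with no values at all are removed, while partly filled columns stay. The frame and the column `empty_col` are made up for illustration. The sketch also shows `errors="ignore"`, an optional `DataFrame.drop` argument that the notebook does not use, which skips labels that are not present.

```python
import pandas as pd

# Toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "pl_name": ["11 Com b", "11 UMi b"],
    "sy_kepmag": [None, 12.5],    # partly missing: kept
    "empty_col": [None, None],    # entirely missing: dropped
})

# how="all" removes a column only when every value in it is missing.
trimmed = toy.dropna(axis=1, how="all")

# errors="ignore" makes drop() skip labels that do not exist
# instead of raising a KeyError.
trimmed = trimmed.drop(columns=["rowid"], errors="ignore")

print(list(trimmed.columns))
```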


### Confirming Discovery Method Consistency

Check that each row's stated discovery method agrees with its detection flag, and drop any rows where the two disagree.

In [170]:
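# Start with every row marked as consistent.
mask = pd.Series(True, index=pl_df.index)
for key, value in disc_flag_map.items():
    # Mark a row inconsistent when its stated method equals this description
    # but the matching flag is 0.
    # NOTE: `discoverymethod` holds short labels such as 'Radial Velocity',
    # while the descriptions in disc_flag_map read 'Detected by ...', so this
    # exact match only fires where the two strings are identical.
    mask &= ~((pl_df['discoverymethod'] == value) & (pl_df[key] == 0))

# Keep only the consistent rows and renumber the index.
pl_df = pl_df[mask].reset_index(drop=True)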

pl_df

Unnamed: 0,pl_name,hostname,pl_letter,sy_snum,sy_pnum,sy_mnum,cb_flag,discoverymethod,disc_year,disc_telescope,...,sy_kepmag,rowupdate,pl_pubdate,releasedate,pl_nnotes,st_nphot,st_nrvc,st_nspec,pl_nespec,pl_ntranspec
0,11 Com b,11 Com,b,2,1,0,0,Radial Velocity,2007,2.16 m Telescope,...,,2014-07-23,2011-08,2014-07-23,2,1,2,0,0,0
1,11 Com b,11 Com,b,2,1,0,0,Radial Velocity,2007,2.16 m Telescope,...,,2014-05-14,2008-01,2014-05-14,2,1,2,0,0,0
2,11 Com b,11 Com,b,2,1,0,0,Radial Velocity,2007,2.16 m Telescope,...,,2023-09-19,2023-08,2023-09-19,2,1,2,0,0,0
3,11 UMi b,11 UMi,b,1,1,0,0,Radial Velocity,2009,2.0 m Alfred Jensch Telescope,...,,2018-09-04,2017-03,2018-09-06,0,1,1,0,0,0
4,11 UMi b,11 UMi,b,1,1,0,0,Radial Velocity,2009,2.0 m Alfred Jensch Telescope,...,,2018-04-25,2011-08,2014-07-23,0,1,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35195,ups And d,ups And,d,2,3,0,0,Radial Velocity,1999,Multiple Telescopes,...,,2014-08-21,2004-01,2014-08-21,5,1,10,1,0,0
35196,ups Leo b,ups Leo,b,1,1,0,0,Radial Velocity,2021,1.88 m Telescope,...,,2022-01-10,2021-12,2022-01-10,0,0,0,0,0,0
35197,xi Aql b,xi Aql,b,1,1,0,0,Radial Velocity,2007,1.88 m Telescope,...,,2014-07-23,2011-08,2014-07-23,1,1,1,0,0,0
35198,xi Aql b,xi Aql,b,1,1,0,0,Radial Velocity,2007,1.88 m Telescope,...,,2014-05-14,2008-06,2014-05-14,1,1,1,0,0,0
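
The `discoverymethod` column shown above holds short labels such as `Radial Velocity`, whereas the flag descriptions read `Detected by ...`. One possible way to compare the two is an explicit mapping from method label to flag column. The sketch below runs on a made-up toy frame; the mapping `method_to_flag` is hypothetical, and the `Transit` label in particular is an assumption, since only `Radial Velocity` appears in the output above.

```python
import pandas as pd

# Hypothetical mapping from `discoverymethod` labels to flag columns.
# Only 'Radial Velocity' is confirmed by the output above; 'Transit' is assumed.
method_to_flag = {"Radial Velocity": "rv_flag", "Transit": "tran_flag"}

# Toy frame; planet names and values are made up for illustration.
toy = pd.DataFrame({
    "pl_name": ["A b", "B b", "C b"],
    "discoverymethod": ["Radial Velocity", "Transit", "Transit"],
    "rv_flag": [1, 0, 0],
    "tran_flag": [0, 1, 0],   # the last row contradicts its stated method
})

# A row is inconsistent when its stated method's own flag is 0.
consistent = pd.Series(True, index=toy.index)
for method, flag in method_to_flag.items():
    consistent &= ~((toy["discoverymethod"] == method) & (toy[flag] == 0))

checked = toy[consistent].reset_index(drop=True)
print(checked["pl_name"].tolist())
```

Methods that are absent from the mapping are left untouched by this check, so the mapping would need an entry for every label that should be validated.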
