# Predicting Exoplanet Candidates with Machine Learning

In this project, we build a machine learning model that predicts whether an exoplanet candidate is a confirmed planet. We use publicly available data to train a classification model that automates the identification of confirmed planets. This document walks through the entire workflow, from data exploration and preparation to model building and evaluation.

The data used in this project can be found at the following link:

[Dataset Link](https://drive.google.com/file/d/1Ui9lIx8LeKaV6UuXT5VNbyz6XplGO2N5/view)

The dataset contains public information about exoplanets provided by the California Institute of Technology (Caltech) and the United States National Aeronautics and Space Administration (NASA).

For a detailed description of the variables included in the dataset, refer to the following link:

[Variable Description](https://exoplanetarchive.ipac.caltech.edu/docs/API_kepcandidate_columns.html)



# Identification and Data Preparation

In this section, we identify and prepare the data for a machine learning model that detects confirmed exoplanets. We explore the variables and clean the data so that the resulting dataset is ready for modeling.

```python
import pandas as pd

# Load the dataset (the file is semicolon-delimited)
df = pd.read_csv("exoplanetas.csv", delimiter=';')
# Display the first few rows to understand the structure of the dataset
df.head()
```

| | koi_disposition | koi_period | koi_period_err1 | koi_period_err2 | koi_time0bk | koi_time0bk_err1 | koi_time0bk_err2 | koi_impact | koi_impact_err1 | koi_impact_err2 | ... | koi_slogg | koi_slogg_err1 | koi_slogg_err2 | koi_srad | koi_srad_err1 | koi_srad_err2 | ra | dec | deg_sc | koi_kepmag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CONFIRMED | 544183827 | 2479.0 | -2479.0 | 16251384 | 352 | -352 | 586 | 59 | -443 | ... | 4467 | 64 | -96 | 927 | 105 | -61 | 29193423 | 48141651 | | 15347 |
| 1 | FALSE POSITIVE | 1989913995 | 0.00149 | -0.00149 | 175850252 | 581 | -581 | 969 | 5126 | -77 | ... | 4544 | 44 | -176 | 868 | 233 | -78 | 29700482 | 48134129 | | 15436 |
| 2 | FALSE POSITIVE | 1736952453 | 2.63e-05 | -2.63e-05 | 170307565 | 115 | -115 | 1276 | 115 | -92 | ... | 4564 | 53 | -168 | 791 | 201 | -67 | 28553461 | 4828521 | | 15597 |
| 3 | CONFIRMED | 2525591777 | 0.000376 | -0.000376 | 17159555 | 113 | -113 | 701 | 235 | -478 | ... | 4438 | 7 | -21 | 1046 | 334 | -133 | 28875488 | 482262 | | 15509 |
| 4 | CONFIRMED | 413443512 | 0.00105 | -0.00105 | 17297937 | 19 | -19 | 762 | 139 | -532 | ... | 4486 | 54 | -229 | 972 | 315 | -105 | 29628613 | 4822467 | | 15714 |


```python
# Display the column names and the number of columns.
# Two separate print calls avoid the stray (None, 38) tuple that
# `print(df.columns), df.shape[1]` leaves as the cell result.
print(df.columns)
print(df.shape[1])
```

```
Index(['koi_disposition', 'koi_period', 'koi_period_err1', 'koi_period_err2',
       'koi_time0bk', 'koi_time0bk_err1', 'koi_time0bk_err2', 'koi_impact',
       'koi_impact_err1', 'koi_impact_err2', 'koi_duration',
       'koi_duration_err1', 'koi_duration_err2', 'koi_depth', 'koi_depth_err1',
       'koi_depth_err2', 'koi_prad', 'koi_prad_err1', 'koi_prad_err2',
       'koi_teq', 'koi_insol', 'koi_insol_err1', 'koi_insol_err2',
       'koi_model_snr', 'koi_tce_plnt_num', 'koi_steff', 'koi_steff_err1',
       'koi_steff_err2', 'koi_slogg', 'koi_slogg_err1', 'koi_slogg_err2',
       'koi_srad', 'koi_srad_err1', 'koi_srad_err2', 'ra', 'dec', 'deg_sc',
       'koi_kepmag'],
      dtype='object')
38
```

The dataset contains various features related to exoplanet candidates, such as their orbital period, radius, temperature, and other stellar properties.
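As a first exploratory check, a per-column summary of data types and missing-value shares can flag columns that carry no information. The sketch below runs on a small made-up frame (`toy`, with invented values) rather than on the real `df`; the names `summary` and `empty_cols` are illustrative and do not come from the original notebook.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration only.
toy = pd.DataFrame({
    "koi_disposition": ["CONFIRMED", "FALSE POSITIVE", "CANDIDATE"],
    "koi_period": [10.0, float("nan"), 3.5],
    "deg_sc": [float("nan"), float("nan"), float("nan")],
})

# One row per column: its dtype and the share of missing values.
summary = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "missing_share": toy.isna().mean(),
})

# Columns that are entirely missing are candidates for removal.
empty_cols = summary.index[summary["missing_share"] == 1.0].tolist()
print(summary)
print(empty_cols)
```

Applied to the real data, the same two steps would be `df.dtypes` and `df.isna().mean()`; whether any column turns out to be entirely empty depends on the actual file.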

### Description of Columns

- **koi_disposition**: The final classification of the candidate, which can be CONFIRMED, FALSE POSITIVE, or CANDIDATE.
- **koi_period**: Orbital period of the candidate (in days).
- **koi_period_err1 / koi_period_err2**: Upper and lower uncertainty in the orbital period.
- **koi_time0bk**: Time corresponding to the center of the first detected transit, expressed as Barycentric Julian Day (BJD) minus an offset of 2,454,833.0 days.
- **koi_time0bk_err1 / koi_time0bk_err2**: Upper and lower uncertainty in the transit epoch.
- **koi_impact**: Impact parameter, representing the sky-projected distance between the center of the stellar disc and the center of the planet disc at conjunction.
- **koi_impact_err1 / koi_impact_err2**: Upper and lower uncertainty in the impact parameter.
- **koi_duration**: Duration of the observed transits (in hours).
- **koi_duration_err1 / koi_duration_err2**: Upper and lower uncertainty in the transit duration.
- **koi_depth**: Transit depth, representing the fraction of stellar flux lost at the minimum of the planetary transit (in parts per million).
- **koi_depth_err1 / koi_depth_err2**: Upper and lower uncertainty in the transit depth.
- **koi_prad**: Planetary radius (in Earth radii).
- **koi_prad_err1 / koi_prad_err2**: Upper and lower uncertainty in the planetary radius.
- **koi_teq**: Estimated equilibrium temperature of the planet (in Kelvin).
- **koi_insol**: Insolation flux received by the planet, relative to Earth.
- **koi_insol_err1 / koi_insol_err2**: Upper and lower uncertainty in the insolation flux.
- **koi_model_snr**: Signal-to-noise ratio of the transit.
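- **koi_tce_plnt_num**: Threshold Crossing Event (TCE) planet number associated with the candidate. This is the index of the signal within its target star, not the total number of planets in the system.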
- **koi_steff**: Stellar effective temperature (in Kelvin).
- **koi_steff_err1 / koi_steff_err2**: Upper and lower uncertainty in the stellar effective temperature.
- **koi_slogg**: Stellar surface gravity, given as the base-10 logarithm of the surface acceleration in cm/s².
- **koi_slogg_err1 / koi_slogg_err2**: Upper and lower uncertainty in the stellar surface gravity.
- **koi_srad**: Stellar radius (in solar radii).
- **koi_srad_err1 / koi_srad_err2**: Upper and lower uncertainty in the stellar radius.
- **ra**: Right ascension of the target star (in degrees).
- **dec**: Declination of the target star (in degrees).
- **deg_sc**: Sky coordinate.
- **koi_kepmag**: Kepler magnitude of the target star.
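Many measurements in the list above come as a triple: the value plus an upper (`_err1`) and a lower (`_err2`) uncertainty. A minimal sketch of how such triples could be grouped programmatically follows. It assumes the suffix convention seen in the column names and that `_err2` is stored as a negative number, as in the sample rows. The helper `uncertainty_pairs`, the `_width` columns, and the toy values are all hypothetical.

```python
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration only.
toy = pd.DataFrame({
    "koi_period": [10.0, 20.0],
    "koi_period_err1": [0.5, 0.2],    # upper uncertainty (positive)
    "koi_period_err2": [-0.5, -0.4],  # lower uncertainty (negative)
    "koi_teq": [500.0, 800.0],        # has no uncertainty columns
})


def uncertainty_pairs(columns):
    """Map each base column to its (err1, err2) pair, if both exist."""
    available = set(columns)
    pairs = {}
    for col in columns:
        if col.endswith("_err1"):
            base = col[: -len("_err1")]
            lower = base + "_err2"
            if base in available and lower in available:
                pairs[base] = (col, lower)
    return pairs


pairs = uncertainty_pairs(list(toy.columns))

# Total width of the uncertainty interval: upper minus (negative) lower.
for base, (upper, lower) in pairs.items():
    toy[base + "_width"] = toy[upper] - toy[lower]

print(pairs)
print(toy)
```

Grouping the columns this way makes it easy to treat each measurement and its uncertainties as a unit later, for example when deciding whether to keep or drop the error columns.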

## 1) Identifying the Target Variable

The target variable for our model is `koi_disposition`, which takes one of three values for each exoplanet candidate: CONFIRMED, FALSE POSITIVE, or CANDIDATE. We keep all three classes so that the model captures the full disposition of each candidate.
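Before modeling, it helps to check how the classes are distributed and to turn the text labels into integer codes. The sketch below shows one way to do that. The label values are invented stand-ins for `df["koi_disposition"]`, and the mapping in `label_map` (including the order of the codes) is an arbitrary assumption, not something prescribed by the dataset.

```python
import pandas as pd

# Hypothetical labels; in the notebook this would be df["koi_disposition"].
target = pd.Series(
    ["CONFIRMED", "FALSE POSITIVE", "FALSE POSITIVE",
     "CONFIRMED", "CANDIDATE", "FALSE POSITIVE"],
    name="koi_disposition",
)

# Share of each class, to spot class imbalance early.
class_share = target.value_counts(normalize=True)

# Explicit mapping so the integer codes are reproducible (assumed ordering).
label_map = {"FALSE POSITIVE": 0, "CANDIDATE": 1, "CONFIRMED": 2}
y = target.map(label_map)

# Labels missing from the mapping would become NaN, so fail loudly.
if y.isna().any():
    raise ValueError("Unexpected label in koi_disposition")

print(class_share)
print(y.tolist())
```

An explicit dictionary is used instead of automatic encoding so that the meaning of each code stays visible when reading evaluation results.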