
# EXOPLANET PREDICTION MODEL
**GOAL: Predict whether astronomical objects are exoplanets, using data collected by the Kepler space telescope.**

- Source: https://www.kaggle.com/datasets/nasa/kepler-exoplanet-search-results
- Data: https://raw.githubusercontent.com/joehawkens/MachineLearning/main/exoplanets.csv

# DATA CLEANING

### Checking the columns:
- Total Rows: 9,564

In [20]:
import pandas as pd

# Load the Kepler Objects of Interest (KOI) table
exoplanet_data = pd.read_csv('https://raw.githubusercontent.com/joehawkens/MachineLearning/main/exoplanets.csv')
exoplanet_data.head()

# Non-null count and dtype of each column (uncomment the prints to inspect them)
row_counts = exoplanet_data.count()
data_types = exoplanet_data.dtypes
# print(row_counts)
# print(data_types)

# Distribution of the four binary false-positive flags
for flag in ['koi_fpflag_nt', 'koi_fpflag_ss', 'koi_fpflag_co', 'koi_fpflag_ec']:
    print(exoplanet_data[flag].value_counts())

0    7764
1    1800
Name: koi_fpflag_nt, dtype: int64
0    7349
1    2215
Name: koi_fpflag_ss, dtype: int64
0    7700
1    1864
Name: koi_fpflag_co, dtype: int64
0    8416
1    1148
Name: koi_fpflag_ec, dtype: int64
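
The column check can also be expressed as a per-column summary of missing values. The sketch below is a minimal illustration on a small made-up DataFrame: the values and the helper name `missing_summary` are hypothetical and not taken from the Kepler file. The same function could be applied to `exoplanet_data`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing count, and missing fraction for each column."""
    summary = pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'n_missing': df.isna().sum(),
        'frac_missing': df.isna().mean(),
    })
    # Columns with the most missing values come first
    return summary.sort_values('n_missing', ascending=False)


# Toy stand-in for the real table (made-up values)
toy = pd.DataFrame({
    'koi_fpflag_nt': [0, 1, 0, 0],
    'koi_score': [1.0, np.nan, 0.5, np.nan],
    'koi_teq_err1': [np.nan] * 4,
})

summary = missing_summary(toy)
print(summary)
```

A column whose `frac_missing` is 1.0 holds no data at all, which is one way to spot candidates for removal.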


### Dropping unnecessary columns:
(KOI = Kepler Object of Interest)
- rowid - Identifier
- kepid - Identifier
- kepoi_name - Identifier
- kepler_name - Identifier
- koi_teq_err1 - no data
- koi_teq_err2 - no data
- koi_disposition - disposition label, also dropped in the code below
- koi_pdisposition - near-duplicate of koi_disposition
- koi_tce_plnt_num - Identifier
- koi_tce_delivname - Identifier

In [12]:
# Drop identifier, empty, and disposition columns
columns_to_drop = ['rowid', 'kepid', 'kepoi_name', 'kepler_name', 'koi_disposition',
                   'koi_pdisposition', 'koi_teq_err1', 'koi_teq_err2',
                   'koi_tce_plnt_num', 'koi_tce_delivname']

exoplanets = exoplanet_data.drop(columns=columns_to_drop)
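
The "no data" reasoning in the list above can be checked programmatically rather than by eye. The following is a rough sketch on a made-up table that only borrows the column naming style of the Kepler file. The values, and the deliberately missing name `not_a_column`, are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw table (made-up values)
raw = pd.DataFrame({
    'rowid': [1, 2, 3, 4],
    'koi_period': [9.49, 54.42, 19.90, 1.74],
    'koi_teq': [793.0, 443.0, 638.0, np.nan],
    'koi_teq_err1': [np.nan, np.nan, np.nan, np.nan],
    'koi_teq_err2': [np.nan, np.nan, np.nan, np.nan],
})

# Columns in which every value is missing carry no information
empty_columns = raw.columns[raw.isna().all()].tolist()

# errors='ignore' skips names that are absent instead of raising a KeyError
cleaned = raw.drop(columns=empty_columns + ['rowid', 'not_a_column'], errors='ignore')

print(empty_columns)
print(cleaned.columns.tolist())
```

Note that `koi_teq` survives here because only some of its values are missing. Partially filled columns need a separate decision, such as imputing or dropping rows.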




In [14]:
print(exoplanets.count())

koi_score            8054
koi_fpflag_nt        9564
koi_fpflag_ss        9564
koi_fpflag_co        9564
koi_fpflag_ec        9564
koi_period           9564
koi_period_err1      9110
koi_period_err2      9110
koi_time0bk          9564
koi_time0bk_err1     9110
koi_time0bk_err2     9110
koi_impact           9201
koi_impact_err1      9110
koi_impact_err2      9110
koi_duration         9564
koi_duration_err1    9110
koi_duration_err2    9110
koi_depth            9201
koi_depth_err1       9110
koi_depth_err2       9110
koi_prad             9201
koi_prad_err1        9201
koi_prad_err2        9201
koi_teq              9201
koi_insol            9243
koi_insol_err1       9243
koi_insol_err2       9243
koi_model_snr        9201
koi_steff            9201
koi_steff_err1       9096
koi_steff_err2       9081
koi_slogg            9201
koi_slogg_err1       9096
koi_slogg_err2       9096
koi_srad             9201
koi_srad_err1        9096
koi_srad_err2        9096
ra                   9564
dec         

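## TARGET FEATURE: koi_disposition

Note: koi_disposition is among the columns dropped above, so the model in this notebook is trained to predict koi_score instead.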
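
If koi_disposition were used as the target, it would first need to be turned into a numeric label. The sketch below shows one possible encoding on made-up rows. It assumes the column holds the labels CONFIRMED, FALSE POSITIVE, and CANDIDATE, and it treats CANDIDATE as undecided and leaves those rows out, which is a design choice rather than the notebook's method. The column name `is_exoplanet` is hypothetical.

```python
import pandas as pd

# Toy stand-in for the raw table; the three labels are the ones
# assumed to appear in koi_disposition
raw = pd.DataFrame({
    'koi_disposition': ['CONFIRMED', 'FALSE POSITIVE', 'CANDIDATE', 'CONFIRMED'],
    'koi_period': [9.49, 19.90, 54.42, 2.53],
})

# Keep only rows whose status is settled, then map the label to 0/1
label_map = {'CONFIRMED': 1, 'FALSE POSITIVE': 0}
labeled = raw[raw['koi_disposition'].isin(list(label_map))].copy()
labeled['is_exoplanet'] = labeled['koi_disposition'].map(label_map)

print(labeled[['koi_disposition', 'is_exoplanet']])
```

With a 0/1 target like this, a classifier would be the natural model choice, while a regressor fits a continuous target such as koi_score.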

# DATA EXPLORATION

In [None]:
import altair as alt

# Pairwise correlations of the cleaned 'exoplanets' DataFrame, reshaped to long
# format (one row per pair of columns) so that Altair can plot them
correlation_matrix = exoplanets.corr().reset_index().melt('index', var_name='column')

heatmap = alt.Chart(correlation_matrix).mark_rect().encode(
    x='index:O',
    y='column:O',
    color=alt.Color('value:Q', scale=alt.Scale(scheme='blueorange'), legend=alt.Legend(title='Correlation'))
).properties(
    width=400,
    height=400,
    title='Correlation Heatmap'
)

heatmap


In [None]:
# Correlation of every feature with koi_score, most positive first
correlation_scores = exoplanets.corr()['koi_score'].drop('koi_score')
correlation_scores = correlation_scores.sort_values(ascending=False)
print(correlation_scores)


Positive correlation with koi_score:
- koi_steff_err2       0.333595 - The lower (negative) uncertainty on the effective temperature of the star.
- koi_slogg_err2       0.228382 - The lower (negative) uncertainty on the base-10 logarithm of the surface gravity of the star.

Negative correlation with koi_score:
- koi_depth           -0.301010 - The transit depth of the exoplanet.
- koi_teq             -0.302279 - The equilibrium temperature of the exoplanet.
- koi_steff_err1      -0.372432 - The upper (positive) uncertainty on the effective temperature of the star.
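
Sorting by signed value puts the strongest negative correlations at the bottom of the list. One alternative is to rank by magnitude, as in the sketch below. It runs on synthetic data, and the feature names `feature_up`, `feature_down`, and `feature_noise` are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
score = rng.uniform(0, 1, n)

# Synthetic features: one rises with the score, one falls, one is unrelated
toy = pd.DataFrame({
    'koi_score': score,
    'feature_up': score + rng.normal(0, 0.1, n),
    'feature_down': -score + rng.normal(0, 0.1, n),
    'feature_noise': rng.normal(0, 1, n),
})

correlations = toy.corr()['koi_score'].drop('koi_score')

# Order by absolute value so strong negative correlations rank alongside
# strong positive ones, while keeping the original sign in the output
order = correlations.abs().sort_values(ascending=False).index
ranked = correlations.reindex(order)
print(ranked)
```

Applied to the notebook's data, the same two lines would be used on `correlation_scores` in place of `correlations`.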

# MODEL

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Drop rows with missing values first: koi_score and several features contain
# NaNs, and the regressor cannot be fit on a target that contains NaN
model_data = exoplanets.dropna()

# Split the data into features (X) and target variable (y)
X = model_data.drop(columns=['koi_score'])
y = model_data['koi_score']

# Hold out 10% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

# Train the decision tree model
model = DecisionTreeRegressor(random_state=42)
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R2 Score:", r2)
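
To make the two printed metrics concrete, here is a small sketch that computes them by hand with NumPy, using the standard definitions: mean squared error is the average squared residual, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The helper name `mse_and_r2` and the sample values are made up for illustration.

```python
import numpy as np


def mse_and_r2(y_true, y_pred):
    """Compute mean squared error and R2 from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual_ss = np.sum((y_true - y_pred) ** 2)      # squared prediction errors
    total_ss = np.sum((y_true - y_true.mean()) ** 2)  # spread around the mean
    mse = residual_ss / y_true.size
    r2 = 1.0 - residual_ss / total_ss
    return mse, r2


# Made-up scores and predictions
y_true = [0.0, 1.0, 1.0, 0.0]
y_pred = [0.1, 0.9, 0.8, 0.0]

mse, r2 = mse_and_r2(y_true, y_pred)
print("Mean Squared Error:", mse)
print("R2 Score:", r2)
```

A perfect prediction gives an R2 of 1, and always predicting the mean of the true values gives 0, so R2 is easier to compare across targets than the raw error.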