Exoplanet Explorer Machine Learning Models

Background

Over nine years in deep space, NASA's Kepler space telescope carried out a planet-hunting mission to discover planets outside of our solar system.

To help process this data, I created multiple machine learning models capable of classifying candidate exoplanets from the raw dataset.

Exploring and Selecting the Data

This dataset is a cumulative record of all observed Kepler "objects of interest" and comes with an extensive data directory. Its columns fall into the following groups:

Exoplanet Archive Information: The disposition or label in the literature for the exoplanet candidate. One of CANDIDATE, FALSE POSITIVE, NOT DISPOSITIONED or CONFIRMED. (koi_disposition)

Project Disposition Columns: NASA flags used to identify or assign the foreign body.

Transit Properties: Calculated parameters of the object such as Orbital Period, Transit Epoch, Planet-Star Radius Ratio, Planet-Star Distance over Star Radius and Impact Parameter. Transit properties contain uncertainty values, identified with the suffix err. The margin of error is NOT included in the model.

Stellar Parameters: Observational data used to determine stellar physics. These include effective temperature, surface gravity, metallicity, radius, mass, and age. Stellar properties contain uncertainty values, identified with the suffix err. The margin of error is NOT included in the model.

KIC Parameters: Physical properties and target identifier.

Full Directory of Data Columns Definitions
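Excluding the margin-of-error columns can be sketched with pandas as below. This is a minimal illustration, not the notebook's actual code: the tiny DataFrame and its column names (koi_period, koi_period_err1, koi_period_err2) are assumptions standing in for the real CSV, and the filter simply assumes every uncertainty column has "err" in its name.

```python
import pandas as pd

# Tiny hypothetical frame; the real CSV has many more columns and
# its exact column names may differ from these.
df = pd.DataFrame({
    "koi_disposition": ["CONFIRMED", "FALSE POSITIVE"],
    "koi_period": [9.49, 19.90],
    "koi_period_err1": [2.8e-05, 1.5e-05],
    "koi_period_err2": [-2.8e-05, -1.5e-05],
})

# Keep every column whose name does not contain "err".
features = df.loc[:, ~df.columns.str.contains("err")]
print(list(features.columns))
```

A name-based filter like this is easy to audit, but it would also drop any unrelated column that happens to contain "err", so it is worth checking the resulting column list.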

Pipeline

1. Preprocessing the Data

The data was preprocessed before fitting the models:
Removed unnecessary features and performed feature selection.
Scaled the numerical data with MinMaxScaler.
Applied LabelEncoder to the categorical labels (koi_disposition) for use in the neural network.
Split the data into training and testing subsets.
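The preprocessing steps above might look roughly like the sketch below. It uses synthetic data in place of the selected features, so the array shapes, random seed, and variable names are all assumptions. It also splits before scaling so that the scaler is fitted on the training subset only; that ordering is a choice made for this sketch and may not match the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Synthetic stand-in for the selected features and koi_disposition labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = rng.choice(["CANDIDATE", "CONFIRMED", "FALSE POSITIVE"], size=200)

# Hold out 25% of the rows (the scikit-learn default) for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Encode the string labels as integers 0..n_classes-1.
encoder = LabelEncoder().fit(y_train)
y_train_encoded = encoder.transform(y_train)
y_test_encoded = encoder.transform(y_test)
```

Fitting the scaler on the training subset alone keeps information from the test subset out of the model inputs.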

2. Create Models

Support Vector Machine Linear Classifier from sklearn.svm library
Logistic Regression from sklearn.linear_model library
Random Forest from sklearn.ensemble library
kNN Model from sklearn.neighbors library
Neural Networks from tensorflow.keras.models library
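As a rough illustration of how the scikit-learn models listed above can be created and compared, the sketch below fits each one on a synthetic three-class dataset. The data, the hyperparameter values (max_iter, n_estimators, n_neighbors), and the dictionary layout are assumptions made for the example; the Keras neural network is left out to keep the sketch free of a TensorFlow dependency.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic three-class data standing in for the scaled Kepler features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One untuned instance of each scikit-learn model named in the list above.
models = {
    "SVM - Linear": SVC(kernel="linear"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model and record its accuracy on the held-out test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```

The scores printed here describe the synthetic data only and say nothing about performance on the Kepler dataset.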

3. Hypertune Model

Identified tunable parameters using .get_params().keys()
Used grid search (GridSearchCV) to tune model parameters.
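The two tuning steps can be sketched as follows for the linear SVM. The synthetic data, the candidate values for C, and the 3-fold cross-validation setting are assumptions chosen for illustration, not the grid used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic binary data standing in for the preprocessed training set.
X, y = make_classification(n_samples=200, n_features=6, random_state=1)

model = SVC(kernel="linear")

# List the parameter names that a grid search could tune.
print(sorted(model.get_params().keys()))

# Hypothetical grid: try three values of the regularization strength C.
param_grid = {"C": [0.1, 1, 10]}
grid = GridSearchCV(model, param_grid, cv=3)
grid.fit(X, y)

print(grid.best_params_, round(grid.best_score_, 3))
```

Note that best_score_ is a mean cross-validated score on the data passed to fit, so it is not directly comparable to a score measured on a separate test subset.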

Model Summary

Model Name	Test Score	Hypertuned Score
SVM - Linear	0.798	0.820
Logistic Regression	0.804	0.818
Random Forest	0.908	0.905
kNN	0.797	0.802
Neural Network / Deep Learning	0.805	0.817

After running several models, the algorithm with the highest accuracy for classifying exoplanets is Random Forest, with a test accuracy of about 91% (0.908). The other four models performed similarly to one another, each scoring around 0.80 before tuning. While I did hypertune many of these models, more parameters can be assessed in order to improve the accuracy. In addition, the dataset includes "candidate" exoplanets, which provide little to no insight since a candidate can turn out to be either a false positive or a confirmed exoplanet. The model may be improved by removing all candidate exoplanets and training only on confirmed exoplanets and non-exoplanets. I ran a confusion matrix to better understand the Random Forest model.

Overall, the accuracy of this model is quite high. The model is extremely accurate at classifying false positives, with an F1 score of 0.99. In other words, this model is best at predicting objects of interest that are NOT exoplanets.
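A confusion matrix and per-class F1 scores like the ones discussed above can be produced along the lines of the sketch below. It runs on synthetic three-class data, so the class-to-index mapping, the random seeds, and the resulting numbers are assumptions and will not reproduce the scores reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic three-class data; classes 0, 1, 2 stand in for the encoded labels.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Assumed label order, matching alphabetical LabelEncoder output.
names = ["CANDIDATE", "CONFIRMED", "FALSE POSITIVE"]
labels = [0, 1, 2]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(cm)

# Precision, recall, and F1 score for each class.
print(classification_report(y_test, y_pred, labels=labels, target_names=names))
```

Reading the matrix row by row shows which true classes are confused with which predictions, which is how a weak class such as CANDIDATE would show up.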

Data Source

Exoplanet Data Source

