# Predicting Whether a Planet Has a Shorter Year than Earth
Using the Open Exoplanet Catalogue database: https://github.com/OpenExoplanetCatalogue/open_exoplanet_catalogue/

## Data License
Copyright (C) 2012 Hanno Rein

Permission is hereby granted, free of charge, to any person obtaining a copy of this database and associated scripts (the "Database"), to deal in the Database without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Database, and to permit persons to whom the Database is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Database. A reference to the Database shall be included in all scientific publications that make use of the Database.

THE DATABASE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATABASE OR THE USE OR OTHER DEALINGS IN THE DATABASE.

## Setup

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

planets = pd.read_csv('../../lab_09/data/planets.csv')
planets.head()

## EDA

In [None]:
fig = plt.figure(figsize=(7, 7))
sns.heatmap(
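    # keep numeric columns only; corr() raises on string columns in recent pandas versions
    planets.drop(columns='discoveryyear').select_dtypes('number').corr(),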
    center=0, vmin=-1, vmax=1, square=True, annot=True,
    cbar_kws={'shrink': 0.8}
)

In [None]:
planets[['period', 'semimajoraxis', 'eccentricity', 'mass']].info()

In [None]:
planets[['period', 'semimajoraxis', 'eccentricity', 'mass']].describe()

## Creating the `shorter_year_than_earth` column

In [None]:
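# Earth's orbital period, looked up by name, is the threshold for the label
earth_period = planets.query('name == "Earth"').period.iat[0]
planets['shorter_year_than_earth'] = planets.period < earth_period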
planets.shorter_year_than_earth.value_counts()
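
The comparison above can be checked on a small frame first. The sketch below uses hypothetical toy rows (not taken from `planets.csv`) and assumes periods are given in days. It also shows one behavior worth knowing: a missing period compares as `False`, so such rows are labeled "not shorter" rather than left as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; periods assumed to be in days
toy = pd.DataFrame({
    'name': ['Mercury', 'Earth', 'Mars', 'Unknown'],
    'period': [87.97, 365.25, 686.98, np.nan],
})

# Look up Earth's period once, then compare every row against it
earth_period = toy.loc[toy['name'] == 'Earth', 'period'].iat[0]
toy['shorter_year_than_earth'] = toy['period'] < earth_period

# NaN < number evaluates to False, so the 'Unknown' row is labeled False
print(toy)
```

If rows with an unknown period should be excluded instead, one option is to call `dropna(subset=['period'])` before creating the column.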

## Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

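# keep only the target and the three features, dropping rows with missing values
data = planets[['shorter_year_than_earth', 'semimajoraxis', 'mass', 'eccentricity']].dropna()
y = data.pop('shorter_year_than_earth')  # pop() removes the target, leaving only the features
X = data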

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

lm = LogisticRegression(random_state=0).fit(X_train, y_train)
lm.score(X_test, y_test) 
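
`lm.score()` returns accuracy, which is easier to interpret next to a majority-class baseline if the classes turn out to be imbalanced. A minimal sketch on synthetic labels (hypothetical, not the planets data), using scikit-learn's `DummyClassifier`:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Synthetic, imbalanced labels (hypothetical): 90% True, 10% False
y_toy = np.array([True] * 90 + [False] * 10)
X_toy = np.zeros((len(y_toy), 1))  # this baseline ignores the features

# Always predict the most frequent class seen during fit
baseline = DummyClassifier(strategy='most_frequent').fit(X_toy, y_toy)
baseline_accuracy = baseline.score(X_toy, y_toy)
print(baseline_accuracy)
```

With the real data, fitting the baseline on `X_train, y_train` and scoring it on `X_test, y_test` would give the number the logistic regression needs to beat.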

## Evaluation
Make predictions on the test set:

In [None]:
preds = lm.predict(X_test)

Get precision, recall, and F1 score for each class:

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test, preds))

In [None]:
from ml_utils.classification import plot_roc

plot_roc(y_test, lm.predict_proba(X_test)[:,1])
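
`plot_roc()` comes from the external `ml_utils` package. As a rough equivalent for readers without it, the points of an ROC curve and the area under it can be computed with scikit-learn directly. The labels and scores below are hypothetical toy values, not model output:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted probabilities for the positive class
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# False positive rate and true positive rate at each score threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the curve: probability that a random positive outranks a random negative
auc = roc_auc_score(y_true, y_score)
print(auc)
```

For the model above, the inputs would be `y_test` and `lm.predict_proba(X_test)[:, 1]`, and `plt.plot(fpr, tpr)` would draw the curve.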

In [None]:
from ml_utils.classification import confusion_matrix_visual

confusion_matrix_visual(y_test, preds, ['>=', 'shorter'])
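
The label order `['>=', 'shorter']` matches how scikit-learn sorts boolean classes: `False` first, then `True`. The sketch below shows that ordering with `sklearn.metrics.confusion_matrix` on hypothetical toy values; it assumes `confusion_matrix_visual()` follows the same convention.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: True means "shorter year than Earth"
y_true = np.array([False, False, True, True, True])
y_pred = np.array([False, True, True, True, False])

# Rows are actual classes, columns are predicted classes, both ordered [False, True]
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

Reading the result: the first row counts planets whose year is not shorter, and the second row counts those whose year is shorter, with correct predictions on the diagonal.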
