# Predicting Star Temperature with Elastic Net Linear Regression
Using the Open Exoplanet Catalogue database: https://github.com/OpenExoplanetCatalogue/open_exoplanet_catalogue/

## Data License
Copyright (C) 2012 Hanno Rein

Permission is hereby granted, free of charge, to any person obtaining a copy of this database and associated scripts (the "Database"), to deal in the Database without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Database, and to permit persons to whom the Database is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Database. A reference to the Database shall be included in all scientific publications that make use of the Database.

THE DATABASE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATABASE OR THE USE OR OTHER DEALINGS IN THE DATABASE.

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

stars = pd.read_csv('../../lab_10/data/stars.csv')
stars.head()

## EDA

In [None]:
stars.info()

In [None]:
stars.describe()

In [None]:
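# restrict to numeric columns: pandas 2.0+ raises if corr() meets non-numeric data
sns.heatmap(
    stars.select_dtypes(include='number').corr(),
    vmin=-1, vmax=1, center=0, annot=True, fmt='.1f'
)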

## Train-test split

In [None]:
from sklearn.model_selection import train_test_split

data = stars[[
    'metallicity', 'temperature', 'magJ', 'radius', 
    'magB', 'magH', 'magK', 'mass', 'planets'
]].dropna()

y = data.pop('temperature')
X = data

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

## Grid search for best hyperparameters in elastic net pipeline

In [None]:
%%capture 
# don't show warning messages or output for this cell
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

pipeline = Pipeline([
    ('scale', MinMaxScaler()), 
    ('net', ElasticNet(random_state=0))
])

search_space = {
    'net__alpha': [0.1, 0.5, 1, 1.5, 2, 5],
    'net__l1_ratio': np.linspace(0, 1, num=10),
    'net__fit_intercept': [True, False]
}

elastic_net = GridSearchCV(pipeline, search_space, cv=5).fit(X_train, y_train)
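
For reference, the objective that scikit-learn documents for `ElasticNet` is written out below. Here $\alpha$ corresponds to `net__alpha`, $\rho$ to `net__l1_ratio`, $n$ is the number of training samples, and $w$ is the coefficient vector. The symbols are chosen for this note and do not appear elsewhere in the notebook.

$$
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
$$

With $\rho = 1$ the penalty reduces to the L1 (lasso) term, and with $\rho = 0$ only the squared L2 (ridge-style) term remains, so the grid above spans both extremes and the mixes in between.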

Check the best hyperparameters:

In [None]:
elastic_net.best_params_

## R<sup>2</sup>

In [None]:
elastic_net.score(X_test, y_test) # R-squared
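
As a quick illustration of what `score()` reports for a regressor, the sketch below computes R<sup>2</sup> by hand and compares it to `sklearn.metrics.r2_score`. The arrays are hypothetical values made up for this example, not taken from the stars data.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical values, for illustration only (not from the stars data)
y_true = np.array([5000.0, 5500.0, 6000.0, 6500.0])
y_pred = np.array([5100.0, 5400.0, 6100.0, 6400.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

# the hand-computed value matches scikit-learn's metric
assert np.isclose(r2_manual, r2_score(y_true, y_pred))
```

A value near 1 means the predictions explain most of the variance in the target; a value of 0 is what always predicting the mean would achieve.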

## Model equation

In [None]:
list(zip(
    elastic_net.best_estimator_.named_steps['net'].coef_, X_train.columns
))

In [None]:
elastic_net.best_estimator_.named_steps['net'].intercept_
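
Note that these coefficients and the intercept apply to the min-max scaled features, not to the original units. The sketch below shows one way to rewrite such an equation in the original units. It runs on synthetic data with a fixed `alpha` and `l1_ratio` chosen for illustration, and the `*_demo` names are hypothetical and not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data for illustration only (not the stars data)
rng = np.random.default_rng(0)
X_demo = rng.uniform(low=[0, 10, -5], high=[1, 50, 5], size=(200, 3))
y_demo = 3.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + 2.0 * X_demo[:, 2] + 7.0

demo_pipeline = Pipeline([
    ('scale', MinMaxScaler()),
    ('net', ElasticNet(alpha=0.1, l1_ratio=0.5, random_state=0))
]).fit(X_demo, y_demo)

scaler = demo_pipeline.named_steps['scale']
net = demo_pipeline.named_steps['net']

# MinMaxScaler computes X_scaled = X * scale_ + min_, so substitute that into
# y = intercept_ + X_scaled @ coef_ and collect terms.
coef_original = net.coef_ * scaler.scale_
intercept_original = net.intercept_ + np.dot(net.coef_, scaler.min_)

# The rewritten equation reproduces the pipeline's predictions
manual_predictions = X_demo @ coef_original + intercept_original
assert np.allclose(manual_predictions, demo_pipeline.predict(X_demo))
```

The same two lines should carry over to `elastic_net.best_estimator_`, since it is a pipeline with the same `'scale'` and `'net'` step names.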

## Residuals

In [None]:
from ml_utils.regression import plot_residuals
plot_residuals(y_test, elastic_net.predict(X_test))
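
`plot_residuals` comes from the separate `ml_utils` package, whose source is not shown here. For readers without that package, the following is a rough stand-in written with matplotlib only. The function name `sketch_residuals` and the two-panel layout are assumptions for this example and may differ from what `ml_utils` actually draws.

```python
import matplotlib.pyplot as plt
import numpy as np

def sketch_residuals(y_true, y_pred):
    """Hypothetical stand-in for a residuals plot; not the ml_utils implementation."""
    residuals = np.asarray(y_true) - np.asarray(y_pred)
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    # Left: residuals against predictions; look for a patternless band around zero
    axes[0].scatter(y_pred, residuals, alpha=0.5)
    axes[0].axhline(0, color='gray', linestyle='--')
    axes[0].set(xlabel='predicted', ylabel='residual (actual - predicted)')
    # Right: distribution of the residuals; ideally centered on zero
    axes[1].hist(residuals, bins=20)
    axes[1].set(xlabel='residual', ylabel='count')
    return residuals, axes

# Synthetic values for illustration only (not the stars data)
rng = np.random.default_rng(0)
y_true_demo = rng.normal(5500, 500, size=100)
y_pred_demo = y_true_demo + rng.normal(0, 100, size=100)
residuals_demo, axes_demo = sketch_residuals(y_true_demo, y_pred_demo)
```

With the fitted model, the equivalent call would be `sketch_residuals(y_test, elastic_net.predict(X_test))`.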
