# Exercise 2: Part A (sklearn)

This file shows how to solve the first part (A) with scikit-learn instead of statsmodels. scikit-learn is useful, but it lacks many of the inference tools that statsmodels provides (for example, summary tables with standard errors, p-values, and confidence intervals), which is why statsmodels is the recommended way to solve the assignments.

In [0]:
from sklearn.linear_model import LinearRegression, LogisticRegression
import numpy as np
import pandas as pd
from scipy import stats


In [0]:
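def fstatistic(lm: LinearRegression, X: np.ndarray, y: np.ndarray, decimal_places=4):
    """Overall F-test of a fitted linear model (all slopes zero vs. at least one nonzero)."""
    n, p = X.shape  # number of observations, number of predictors
    rsq = lm.score(X, y)
    f = (rsq / (1 - rsq)) * ((n - p - 1) / p)
    # Upper-tail probability of the F distribution with (p, n - p - 1) degrees of freedom
    p_value = stats.f.sf(f, p, n - p - 1)

    f = np.round(f, decimal_places)
    p_value = np.round(p_value, decimal_places)

    return pd.DataFrame({"F-statistic": [f], "Prob (F-statistic)": [p_value]})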
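
As a sanity check on the formula: with a single predictor, the overall F-statistic equals the squared t-statistic of the slope, and both tests give the same p-value. The sketch below checks this on synthetic data. The data, the seed, and the helper name `overall_f_test` are made up for illustration and are not part of the exercise; `scipy.stats.linregress` is used only as an independent reference.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression


def overall_f_test(X, y):
    """Overall F-test for a least-squares fit with an intercept (hypothetical helper)."""
    n, p = X.shape
    rsq = LinearRegression().fit(X, y).score(X, y)
    f = (rsq / (1 - rsq)) * ((n - p - 1) / p)
    # Numerator df = p, denominator df = n - p - 1
    p_value = stats.f.sf(f, p, n - p - 1)
    return f, p_value


# Synthetic data with a weak linear signal (assumption, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 1))
y_demo = 0.2 * X_demo[:, 0] + rng.normal(size=200)

f_demo, p_demo = overall_f_test(X_demo, y_demo)

# Reference: t-test on the slope from scipy
ref = stats.linregress(X_demo[:, 0], y_demo)
t_squared = (ref.slope / ref.stderr) ** 2

print("F = {:.4f}, t^2 = {:.4f}".format(f_demo, t_squared))
print("p (F-test) = {:.6f}, p (t-test) = {:.6f}".format(p_demo, ref.pvalue))
```

If the two pairs of numbers agree, the degrees of freedom in the F-test are set up consistently.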

In [0]:
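def task1_data():
    # index_col=0 makes the first CSV column the index; the response is read from it below
    auto = pd.read_csv("../data/Auto.csv", index_col=0)
    # Keep only rows whose horsepower entry is a numeric string
    auto = auto.loc[auto["horsepower"].str.isnumeric()]
    auto_X = auto["horsepower"].astype(float).values  # predictor
    auto_y = auto.index.values  # response (mpg), assumed to be the first CSV column
    return auto_X, auto_y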

In [0]:
# 8a)
## Use the lm() function to perform a simple linear regression with mpg as the response and horsepower as the predictor. Use the summary() function to print the results.
X, y = task1_data()
X = X.reshape(-1, 1)
lm = LinearRegression().fit(X, y)

## i. Is there a relationship between the predictor and the response?
### calculate the F-statistic; if its p-value is ~0 => relationship
f = fstatistic(lm, X, y)
print(f)

In [0]:
## ii. How strong is the relationship between the predictor and the response?
### calculate R2 = proportion of the variance in the response explained by the predictor
print("R2: {:.2%}".format(lm.score(X, y)))
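
To make explicit what `lm.score` returns for a regressor, here is a minimal sketch that recomputes R2 by hand as 1 - RSS/TSS and, since there is only one predictor, as the squared correlation between predictor and response. The synthetic data and all `*_demo` names are assumptions for illustration and do not come from Auto.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for (horsepower, mpg); values are made up for illustration
rng = np.random.default_rng(1)
x_demo = rng.uniform(50, 230, size=150)
y_demo = 40.0 - 0.15 * x_demo + rng.normal(scale=5.0, size=150)

X_demo = x_demo.reshape(-1, 1)
lm_demo = LinearRegression().fit(X_demo, y_demo)

# R2 as reported by scikit-learn
r2_score = lm_demo.score(X_demo, y_demo)

# R2 by hand: 1 - residual sum of squares / total sum of squares
rss = np.sum((y_demo - lm_demo.predict(X_demo)) ** 2)
tss = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1.0 - rss / tss

# For simple linear regression, R2 equals the squared Pearson correlation
r2_corr = np.corrcoef(x_demo, y_demo)[0, 1] ** 2

print("score: {:.6f}, manual: {:.6f}, corr^2: {:.6f}".format(r2_score, r2_manual, r2_corr))
```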

In [0]:
## iii. Is the relationship between the predictor and the response positive or negative?
### the sign of the coefficient gives the direction of the relationship
print(pd.DataFrame({"intercept": [lm.intercept_], "coefficient": lm.coef_}))

In [0]:
## iv. What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?
print("Prediction: {}".format(lm.predict([[98]])[0]))
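
scikit-learn's `LinearRegression` returns only the point prediction, so the 95% confidence and prediction intervals have to be computed by hand. The sketch below uses the textbook formulas for simple linear regression: with residual standard error s on n - 2 degrees of freedom, the confidence interval half-width is t * s * sqrt(1/n + (x0 - mean(x))^2 / Sxx), and the prediction interval adds 1 under the square root. The helper name `simple_ols_intervals` and the synthetic data are assumptions for illustration; on the real data you would pass the horsepower and mpg arrays instead.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression


def simple_ols_intervals(x, y, x0, level=0.95):
    """Prediction at x0 with confidence and prediction intervals (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    X = x.reshape(-1, 1)

    lm = LinearRegression().fit(X, y)
    y0 = lm.predict(np.array([[x0]]))[0]

    # Residual standard error with n - 2 degrees of freedom
    s = np.sqrt(np.sum((y - lm.predict(X)) ** 2) / (n - 2))
    sxx = np.sum((x - x.mean()) ** 2)
    leverage = 1.0 / n + (x0 - x.mean()) ** 2 / sxx

    t_crit = stats.t.ppf(0.5 + level / 2.0, df=n - 2)
    ci_half = t_crit * s * np.sqrt(leverage)        # uncertainty of the mean response
    pi_half = t_crit * s * np.sqrt(1.0 + leverage)  # adds the noise of a new observation

    return y0, (y0 - ci_half, y0 + ci_half), (y0 - pi_half, y0 + pi_half)


# Synthetic stand-in for (horsepower, mpg); values are made up for illustration
rng = np.random.default_rng(2)
hp_demo = rng.uniform(50, 230, size=150)
mpg_demo = 40.0 - 0.15 * hp_demo + rng.normal(scale=5.0, size=150)

pred, ci, pi = simple_ols_intervals(hp_demo, mpg_demo, x0=98)
print("Prediction: {:.3f}".format(pred))
print("95% confidence interval: ({:.3f}, {:.3f})".format(*ci))
print("95% prediction interval: ({:.3f}, {:.3f})".format(*pi))
```

The prediction interval is always wider than the confidence interval, because it accounts for the scatter of an individual observation around the regression line in addition to the uncertainty of the fitted line itself.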