<div style="padding:30px 0px;">
    <h1 align="center" style="padding:50px">Tuning Linear Regression Models</h1>
    <p align="center" style="font-size:small;">Seth Pruitt<br>spruitt@norstal.com<br>www.github.com/faradical</p>
</div>

## Importing Dependencies

In [55]:
import pandas as pd
import numpy as np

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

## Generating synthetic data because I am too lazy to source and clean a proper data set

In [56]:
# Generate a random regression dataset
X, y = make_regression(n_samples=1000, n_features=5, noise=10, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
pd.DataFrame(X_train).head()

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | -1.86654 | -0.600424 | 1.321304 | 1.007514 | 1.419603 |
| 1 | -1.245739 | 0.214094 | -0.446515 | 0.173181 | 0.856399 |
| 2 | -0.856852 | -2.854627 | -0.626315 | -1.126054 | 0.494064 |
| 3 | -0.436764 | 0.826047 | -0.369527 | -1.606577 | -0.131473 |
| 4 | -0.783754 | -0.288549 | -1.371674 | 1.734937 | -0.709441 |


## Creating a regular Linear Regression Model

In [57]:
# Create a linear regression object
lin_reg = LinearRegression()

# Fit the model to the data
lin_reg.fit(X_train, y_train)

# Get the R-squared score of the model on the held-out test set
lin_reg.score(X_test, y_test)

0.9729771410342374
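
The value returned by `score` for a regressor is the coefficient of determination, R-squared, computed on whatever data is passed in (here the held-out test set). As a minimal sketch, the same quantity can be computed by hand. The helper name `r_squared` and the toy numbers below are illustrative and not part of the notebook.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


# Toy values chosen only for illustration
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]
print(r_squared(y_true, y_pred))
```

A score of 1.0 means perfect predictions, and a score of 0.0 means the model does no better than predicting the mean of `y_true`.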

## Creating a GridSearchCV optimizer object to build a tuned model

[Linear Regression Documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)

In [58]:
# Define a dictionary of hyperparameters to search over
param_grid = {
    'copy_X': [True, False],  # only controls whether X is copied before fitting; it does not change the fitted model
    'fit_intercept': [True, False]
}

# Perform a grid search with cross-validation
grid = GridSearchCV(LinearRegression(), param_grid, cv=5)
grid.fit(X_train, y_train)
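
To make the cost of the search concrete, the sketch below enumerates the candidates a grid like this one produces: every combination of the listed values, each fitted once per cross-validation fold. It uses only the standard library and is an illustration of the idea, not scikit-learn's internal code. The names `candidates` and `total_fits` are introduced here for the example.

```python
from itertools import product

# Same grid and fold count as in the cell above
param_grid = {
    'copy_X': [True, False],
    'fit_intercept': [True, False],
}
cv = 5

# Build one dict per combination of parameter values (Cartesian product)
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

for candidate in candidates:
    print(candidate)

# Each candidate is fitted once per fold
total_fits = len(candidates) * cv
print("Candidates:", len(candidates), "Cross-validation fits:", total_fits)
```

With the default `refit=True`, `GridSearchCV` then fits the best candidate one more time on the full training set, which is the model used by `grid.score` and `grid.predict`.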

## Evaluate the tuned model's performance

In [59]:
# Print the best hyperparameters and their mean cross-validated R-squared score
print("Best Hyperparameters:", grid.best_params_)
print("Best R-squared Score:", grid.best_score_)

Best Hyperparameters: {'copy_X': True, 'fit_intercept': True}
Best R-squared Score: 0.9747222453785562
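
`best_score_` is the mean validation score across the 5 folds of the training data, which is why it differs slightly from a score computed on the test set. The sketch below shows how an unshuffled 5-fold split partitions the 750 training rows (1000 samples minus the default 25% test split). The function `k_fold_indices` is a hypothetical helper written for illustration, assuming contiguous, unshuffled folds. It is not scikit-learn's implementation.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, validation_indices) for contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1

    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


# 750 rows is the size of X_train under the assumptions stated above
for fold, (train, validation) in enumerate(k_fold_indices(750, 5)):
    print(f"Fold {fold}: train on {len(train)} rows, validate on {len(validation)} rows")
```

Each row is used for validation exactly once, and the reported score averages the five validation scores.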


In [60]:
grid.score(X_test, y_test)

0.9729771410342374

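The grid search selected the default settings (`copy_X=True`, `fit_intercept=True`), so the tuned model's test score (0.9730) is identical to the untuned model's. Ordinary linear regression has few settings that affect the fit and so gains little from tuning, but now that we know the process, we can apply it to other models.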
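
As one hedged illustration of applying the same process to another model, the sketch below swaps in `Ridge`, whose `alpha` regularization strength does change the fitted coefficients. The choice of `Ridge` and the `alpha` values are assumptions made for this example and are not taken from the notebook above. No output is shown because it was not run as part of the original notebook.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Recreate the same synthetic data and split used earlier
X, y = make_regression(n_samples=1000, n_features=5, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Candidate regularization strengths (arbitrary values chosen for illustration)
ridge_param_grid = {'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}

# Same pattern as before: estimator, grid, 5-fold cross-validation
ridge_grid = GridSearchCV(Ridge(), ridge_param_grid, cv=5)
ridge_grid.fit(X_train, y_train)

print("Best Hyperparameters:", ridge_grid.best_params_)
print("Best cross-validated R-squared:", ridge_grid.best_score_)
print("Test R-squared:", ridge_grid.score(X_test, y_test))
```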