# Model Tutorial: Linear Regression

This notebook demonstrates how to train the linear regression models used in this project and how to generate predictions from them. We first show the basic code, then reproduce the results with a custom class, `LM`, which keeps the code consistent across multiple models.

## Model Description
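
A minimal description, inferred from the code in this notebook rather than stated by the authors: the model regresses the response `fm` on the two predictors `Ed` and `Ew` used in the data split below. The coefficient symbols $\beta_0, \beta_1, \beta_2$ and the error term $\varepsilon_i$ are notation introduced here for illustration.

$$
\mathrm{fm}_i = \beta_0 + \beta_1\,\mathrm{Ed}_i + \beta_2\,\mathrm{Ew}_i + \varepsilon_i
$$

scikit-learn's `LinearRegression` estimates the coefficients by ordinary least squares, that is, by minimizing the sum of squared residuals over the $n$ training observations:

$$
\hat{\beta} = \arg\min_{\beta_0,\beta_1,\beta_2} \sum_{i=1}^{n} \left(\mathrm{fm}_i - \beta_0 - \beta_1\,\mathrm{Ed}_i - \beta_2\,\mathrm{Ew}_i\right)^2
$$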



## Setup

In [None]:
import sys
sys.path.append('../src')
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
# Local modules
from fmda_models import LM
import reproducibility

## Data Read and Split

In [None]:
df = pd.read_pickle("../data/rocky_2023_06-08.pkl")

In [None]:
# Set seed for reproducibility
reproducibility.set_seed(123)

# Split predictors (Ed, Ew) and response (fm) into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(df[["Ed", "Ew"]], df["fm"], test_size=0.2)
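
The split above is only repeatable if `reproducibility.set_seed` seeds NumPy's global random state, which `train_test_split` falls back on when no `random_state` is given. Whether the project helper does so is an assumption here. As a sketch of an alternative, the seed can be passed to `train_test_split` directly. The data frame `demo` below is synthetic and stands in for the project data; its values are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project data (values are illustrative only)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Ed": rng.uniform(5, 30, size=100),
    "Ew": rng.uniform(5, 30, size=100),
})
demo["fm"] = 0.5 * demo["Ed"] + 0.4 * demo["Ew"] + rng.normal(0, 1, size=100)

def split(seed):
    # random_state pins the shuffle independently of any global seed
    return train_test_split(
        demo[["Ed", "Ew"]], demo["fm"], test_size=0.2, random_state=seed
    )

# Two calls with the same seed should give identical partitions
Xa_train, Xa_test, ya_train, ya_test = split(123)
Xb_train, Xb_test, yb_train, yb_test = split(123)

print("train/test shapes:", Xa_train.shape, Xa_test.shape)
print("same test rows:", Xa_test.index.equals(Xb_test.index))
```

Passing `random_state` makes the split self-documenting, at the cost of repeating the seed wherever randomness is used.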

## Manually Code Linear Regression

In [None]:
# create model instance
lm = LinearRegression()
# fit model
lm.fit(X_train, y_train)
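
After fitting, a `LinearRegression` object exposes the estimated intercept and slopes as `intercept_` and `coef_`. The sketch below illustrates this on noise-free synthetic data, where the true coefficients (chosen arbitrarily here) should be recovered. The names `X_demo`, `y_demo`, and `lm_demo` are hypothetical and not part of the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data with known coefficients (illustrative values)
rng = np.random.default_rng(0)
X_demo = rng.uniform(5, 30, size=(50, 2))
y_demo = 1.0 + 0.5 * X_demo[:, 0] + 0.4 * X_demo[:, 1]

# Fit by ordinary least squares and inspect the estimates
lm_demo = LinearRegression().fit(X_demo, y_demo)
print("intercept:", lm_demo.intercept_)
print("coefficients:", lm_demo.coef_)
```

For the model fitted in this notebook, the same attributes are available as `lm.intercept_` and `lm.coef_`, with coefficients ordered like the columns of `X_train`.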

In [None]:
# In-sample predictions on the training data
fitted = lm.predict(X_train)
# Out-of-sample predictions on the held-out test data
preds = lm.predict(X_test)

In [None]:
# Calculate RMSE for the training data
rmse_train = np.sqrt(mean_squared_error(y_train, fitted))

# Calculate R-squared for the training data
r2_train = r2_score(y_train, fitted)

# Calculate RMSE for the test data
rmse_test = np.sqrt(mean_squared_error(y_test, preds))

# Calculate R-squared for the test data
r2_test = r2_score(y_test, preds)

print("RMSE for training data:", rmse_train)
print("R-squared for training data:", r2_train)
print("RMSE for test data:", rmse_test)
print("R-squared for test data:", r2_test)

## Reproduce Using the LM Class

We now reproduce the results above with the `LM` class. The class gives different machine learning models the same methods (`fit` and `predict`), which keeps the code concise and consistent across models.

The `LM` class uses default settings with no hyperparameter tuning.
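
To make the idea of a shared interface concrete, here is a minimal sketch of what such a wrapper could look like. `LMSketch` is a hypothetical stand-in written for this tutorial; it is not the project's `LM` implementation in `fmda_models`, whose internals are not shown here. Returning `self` from `fit` is a design choice of the sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

class LMSketch:
    """Hypothetical wrapper exposing fit/predict; not the project's LM class."""

    def __init__(self, **params):
        # Defaults are used unless keyword arguments are supplied
        self.model = LinearRegression(**params)

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict(self, X):
        return self.model.predict(X)

# Synthetic data (illustrative values) to compare wrapper and direct use
rng = np.random.default_rng(0)
X_demo = rng.uniform(5, 30, size=(50, 2))
y_demo = 1.0 + 0.5 * X_demo[:, 0] + 0.4 * X_demo[:, 1] + rng.normal(0, 1, size=50)

wrapped = LMSketch().fit(X_demo, y_demo)
direct = LinearRegression().fit(X_demo, y_demo)
same = np.allclose(wrapped.predict(X_demo), direct.predict(X_demo))
print("wrapper matches direct fit:", same)
```

Other model types could follow the same pattern, so downstream code only ever calls `fit` and `predict`.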

In [None]:
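# Create the wrapper with default settings and fit it to the training data
model = LM()
model.fit(X_train, y_train)
# In-sample and out-of-sample predictions, as in the manual version
fitted = model.predict(X_train)
preds = model.predict(X_test)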

In [None]:
# Calculate RMSE for the training data
rmse_train = np.sqrt(mean_squared_error(y_train, fitted))

# Calculate R-squared for the training data
r2_train = r2_score(y_train, fitted)

# Calculate RMSE for the test data
rmse_test = np.sqrt(mean_squared_error(y_test, preds))

# Calculate R-squared for the test data
r2_test = r2_score(y_test, preds)

print("RMSE for training data:", rmse_train)
print("R-squared for training data:", r2_train)
print("RMSE for test data:", rmse_test)
print("R-squared for test data:", r2_test)
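
The evaluation cell appears twice in this notebook with identical logic. One way to avoid the repetition is a small helper, sketched below. `report_metrics` is a hypothetical name, not a function from the project, and the demo arrays are made-up values used only to exercise it.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def report_metrics(y_true, y_pred, label):
    """Hypothetical helper: print and return RMSE and R-squared for one split."""
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    r2 = r2_score(y_true, y_pred)
    print(f"RMSE for {label} data:", rmse)
    print(f"R-squared for {label} data:", r2)
    return rmse, r2

# Tiny illustrative example with one prediction off by 1
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.0, 2.0, 3.0, 5.0])
rmse_demo, r2_demo = report_metrics(y_true_demo, y_pred_demo, "demo")
```

With the notebook's variables, this could be called as `report_metrics(y_train, fitted, "training")` and `report_metrics(y_test, preds, "test")`.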