# 01 – Linear and Regularized Models
     Interpretable Regression Under Structural Assumptions

## Objective

This notebook provides a rigorous treatment of **linear regression and its regularized variants**, covering:

- Ordinary Least Squares (OLS)
- Ridge (L2) regression
- Lasso (L1) regression
- Elastic Net
- Bias–variance tradeoff
- Coefficient stability and interpretability

It answers:

    When do linear models succeed, when do they fail, and how does regularization control complexity?


## Why Linear Models Still Matter

Despite the popularity of modern ensemble methods, linear models remain important because they:

- Are highly interpretable
- Provide stable baselines
- Scale well to large datasets
- Support causal and policy analysis
- Form the backbone of many production systems

Regularization trades a small amount of bias for lower variance, which makes coefficient estimates more stable when features are correlated or numerous.


## Imports and Dataset



In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns


df = pd.read_csv("D:/GitHub/Data-Science-Techniques/datasets/Superviased-regression/synthetic_customer_ltv_regression_complete.csv")
df.head()

Unnamed: 0,customer_id,signup_year,signup_month,days_since_signup,tenure_months,avg_monthly_spend,purchase_frequency,discount_sensitivity,returns_rate,email_open_rate,ad_click_rate,loyalty_score,support_tickets,churn_risk_score,credit_score_proxy,customer_lifetime_value
0,1,2022,8,899.094991,29,123.916907,3,0.401322,0.043396,0.042156,0.023647,0.123574,1,0.959716,671.029435,2691.193107
1,2,2019,9,2017.615223,66,204.814055,5,0.26684,0.338968,0.540674,0.180153,0.323954,1,0.78927,746.074773,11690.801889
2,3,2020,3,1720.937794,57,218.905816,3,0.028719,0.041845,0.517227,0.173583,0.26843,2,0.53341,601.164043,13094.093874
3,4,2022,3,1001.962036,33,188.02806,4,0.421602,0.140611,0.512366,0.277571,0.498941,3,0.699054,722.688139,6251.644013
4,5,2018,4,2522.620983,84,142.413565,6,0.192419,0.051116,0.462827,0.123844,0.500634,2,0.439348,659.860235,16474.610236


## Step 1 – Define Target and Features


In [2]:
target = "customer_lifetime_value"

X = df.drop(columns=[target, "customer_id"])
y = df[target]


## Step 2 – Train/Test Split


In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


## Step 3 – Feature Scaling

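Regularized models **require scaled inputs**: the penalty acts on coefficient size, and coefficient size depends on each feature's units. OLS predictions do not change under rescaling, but standardizing still makes coefficients comparable across features.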


In [5]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
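The effect of feature units on a penalized model can be checked directly. The sketch below uses small synthetic data (not the customer dataset) and hypothetical names such as `X_demo` and `max_prediction_shift`; it refits OLS and Ridge after expressing one feature in units 1000 times larger and measures how much the fitted values move.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Synthetic data for illustration only: two informative features.
X_demo = rng.normal(size=(200, 2))
y_demo = 3.0 * X_demo[:, 0] + 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

# Same information, but the first feature is expressed in larger units
# (for example dollars -> thousands of dollars), so its values shrink by 1000.
X_rescaled = X_demo.copy()
X_rescaled[:, 0] /= 1000.0


def max_prediction_shift(model_factory):
    """Largest change in fitted values caused purely by the change of units."""
    pred_a = model_factory().fit(X_demo, y_demo).predict(X_demo)
    pred_b = model_factory().fit(X_rescaled, y_demo).predict(X_rescaled)
    return float(np.max(np.abs(pred_a - pred_b)))


ols_shift = max_prediction_shift(LinearRegression)
ridge_shift = max_prediction_shift(lambda: Ridge(alpha=1.0))

print(f"OLS prediction shift:   {ols_shift:.2e}")
print(f"Ridge prediction shift: {ridge_shift:.2e}")
```

OLS absorbs the change of units into the coefficient, so its predictions stay the same up to floating-point error. Ridge penalizes the now much larger coefficient and its predictions change, which is why standardizing before fitting a penalized model matters.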


## Step 4 – Ordinary Least Squares (OLS)

OLS minimizes squared error without any complexity penalty.


In [6]:
from sklearn.linear_model import LinearRegression

ols = LinearRegression()
ols.fit(X_train_scaled, y_train)

ols_pred = ols.predict(X_test_scaled)
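As a cross-check on what `LinearRegression` computes, the following sketch solves the least-squares problem $\min_\beta \lVert y - X\beta \rVert_2^2$ directly with NumPy on synthetic data and compares the result with scikit-learn. The data and the names `X_demo`, `beta_hat`, and `ols_demo` are illustrative assumptions, not part of the notebook's dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Synthetic data for illustration: three features plus an intercept of 4.0.
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the intercept is estimated like any other coefficient.
X_design = np.column_stack([np.ones(len(X_demo)), X_demo])

# Least-squares solution of X_design @ beta = y_demo.
beta_hat, *_ = np.linalg.lstsq(X_design, y_demo, rcond=None)

ols_demo = LinearRegression().fit(X_demo, y_demo)

print("intercepts match:", np.allclose(beta_hat[0], ols_demo.intercept_))
print("coefficients match:", np.allclose(beta_hat[1:], ols_demo.coef_))
```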


### Evaluation

In [7]:
from sklearn.metrics import mean_squared_error, r2_score

rmse_ols = np.sqrt(mean_squared_error(y_test, ols_pred))  # squared=False was removed in scikit-learn 1.6
r2_ols = r2_score(y_test, ols_pred)

rmse_ols, r2_ols




(np.float64(2580.158044811771), 0.8372608981288542)

## OLS Assumptions

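- Linearity in the parameters
- No perfect multicollinearity
- Homoscedastic errors
- Independent observations
- Normally distributed residuals (for inference)

Violations do not all matter equally: non-linearity biases predictions, strong multicollinearity inflates coefficient variance, and heteroscedastic or dependent errors mainly invalidate standard errors and confidence intervals.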
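One common way to check for multicollinearity is the variance inflation factor (VIF): regress each feature on all the others and compute $1 / (1 - R^2)$. The helper below is a minimal sketch with a hypothetical name, `variance_inflation_factors`, demonstrated on synthetic columns rather than the customer data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def variance_inflation_factors(features):
    """Return one VIF per column: 1 / (1 - R^2) of that column regressed on the rest."""
    features = np.asarray(features, dtype=float)
    vifs = []
    for j in range(features.shape[1]):
        others = np.delete(features, j, axis=1)
        r2 = LinearRegression().fit(others, features[:, j]).score(others, features[:, j])
        # Guard against division by zero when a column is an exact linear combination.
        vifs.append(1.0 / max(1.0 - r2, 1e-12))
    return np.array(vifs)


rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + rng.normal(scale=0.05, size=500)  # nearly a copy of `a`

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(dict(zip(["a", "b", "c"], np.round(vifs, 1))))
```

A VIF near 1 means a feature is close to uncorrelated with the rest. Values above roughly 5 to 10 are a common rule-of-thumb warning sign, not a hard threshold.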


## Step 5 – Ridge Regression (L2)

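Ridge adds a penalty proportional to the sum of squared coefficients (the squared L2 norm). This shrinks coefficients toward zero but does not set them exactly to zero.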


In [8]:
from sklearn.linear_model import Ridge

ridge = Ridge(alpha=1.0)
ridge.fit(X_train_scaled, y_train)

ridge_pred = ridge.predict(X_test_scaled)
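The Ridge objective $\min_\beta \lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2$ has a closed-form solution, $\hat\beta = (X^\top X + \alpha I)^{-1} X^\top y$, once `X` and `y` are centered so that the intercept is not penalized. The sketch below checks that formula against `Ridge` on synthetic data and shows how the coefficient norm shrinks as `alpha` grows. The names `X_demo`, `w_closed`, and `coef_norms` are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)

# Synthetic data for illustration only.
X_demo = rng.normal(size=(150, 4))
y_demo = X_demo @ np.array([2.0, 0.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=150)
alpha = 1.0

# Center X and y: with fit_intercept=True the intercept is left unpenalized.
Xc = X_demo - X_demo.mean(axis=0)
yc = y_demo - y_demo.mean()

# Closed-form ridge solution (X^T X + alpha * I)^-1 X^T y.
w_closed = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]), Xc.T @ yc)

ridge_demo = Ridge(alpha=alpha).fit(X_demo, y_demo)
print("closed form matches Ridge:", np.allclose(w_closed, ridge_demo.coef_))

# Larger alpha means stronger shrinkage of the coefficient vector.
coef_norms = [
    float(np.linalg.norm(Ridge(alpha=a).fit(X_demo, y_demo).coef_))
    for a in (0.01, 1.0, 100.0, 10_000.0)
]
print("coefficient norms:", np.round(coef_norms, 4))
```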


## Step 6 – Lasso Regression (L1)

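Lasso penalizes the sum of absolute coefficient values (the L1 norm), which can shrink some coefficients exactly to zero and so acts as feature selection. With a penalty as small as `alpha=0.01` on a target measured in the thousands, expect little or no sparsity, and the coordinate-descent solver may warn that it did not converge within the default number of iterations.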


In [9]:
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.01)
lasso.fit(X_train_scaled, y_train)

lasso_pred = lasso.predict(X_test_scaled)
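How much sparsity Lasso produces depends on `alpha` relative to the scale of the target. The sketch below uses synthetic data where only three of ten features carry signal and counts the non-zero coefficients at several penalty strengths. The data, the chosen `alpha` values, and the name `nonzero_counts` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n_samples, n_features = 300, 10

# Synthetic, standardized features; only the first three influence the target.
X_demo = StandardScaler().fit_transform(rng.normal(size=(n_samples, n_features)))
true_coef = np.array([50.0, -30.0, 20.0] + [0.0] * 7)
y_demo = X_demo @ true_coef + rng.normal(scale=5.0, size=n_samples)

nonzero_counts = {}
for alpha in (0.01, 1.0, 10.0):
    # A generous max_iter helps coordinate descent converge for small alpha.
    lasso_demo = Lasso(alpha=alpha, max_iter=10_000).fit(X_demo, y_demo)
    nonzero_counts[alpha] = int(np.sum(lasso_demo.coef_ != 0))
    print(f"alpha={alpha:<5} non-zero coefficients: {nonzero_counts[alpha]}")
```

In practice `alpha` is usually chosen by cross-validation rather than fixed by hand.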




## Step 7 – Elastic Net

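Elastic Net combines the L1 and L2 penalties. In scikit-learn, `alpha` sets the overall penalty strength and `l1_ratio` sets the mix: `l1_ratio=1.0` is equivalent to Lasso, and `l1_ratio=0.0` leaves only the L2 term.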


In [10]:
from sklearn.linear_model import ElasticNet

enet = ElasticNet(alpha=0.01, l1_ratio=0.5)
enet.fit(X_train_scaled, y_train)

enet_pred = enet.predict(X_test_scaled)
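The role of `l1_ratio` can be verified on a small example. The sketch below, on synthetic data with illustrative names, confirms that `ElasticNet(l1_ratio=1.0)` reproduces `Lasso` and then fits a 50/50 mix for comparison.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(3)

# Synthetic data for illustration: three of six features are irrelevant.
X_demo = rng.normal(size=(200, 6))
y_demo = X_demo @ np.array([4.0, -3.0, 0.0, 0.0, 2.0, 0.0]) + rng.normal(scale=0.5, size=200)

alpha = 0.5
solver_kwargs = {"tol": 1e-10, "max_iter": 100_000}

lasso_demo = Lasso(alpha=alpha, **solver_kwargs).fit(X_demo, y_demo)
enet_as_lasso = ElasticNet(alpha=alpha, l1_ratio=1.0, **solver_kwargs).fit(X_demo, y_demo)
same_as_lasso = np.allclose(lasso_demo.coef_, enet_as_lasso.coef_)
print("l1_ratio=1.0 matches Lasso:", same_as_lasso)

# An even mix of L1 and L2 penalties.
enet_mixed = ElasticNet(alpha=alpha, l1_ratio=0.5, **solver_kwargs).fit(X_demo, y_demo)
print("Lasso coefficients:     ", np.round(lasso_demo.coef_, 3))
print("ElasticNet coefficients:", np.round(enet_mixed.coef_, 3))
```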




## Step 8 – Model Comparison


In [11]:
results = pd.DataFrame({
    "Model": ["OLS", "Ridge", "Lasso", "ElasticNet"],
    "RMSE": [
        rmse_ols,
        np.sqrt(mean_squared_error(y_test, ridge_pred)),
        np.sqrt(mean_squared_error(y_test, lasso_pred)),
        np.sqrt(mean_squared_error(y_test, enet_pred))
    ],
    "R2": [
        r2_ols,
        r2_score(y_test, ridge_pred),
        r2_score(y_test, lasso_pred),
        r2_score(y_test, enet_pred)
    ]
})

results




Unnamed: 0,Model,RMSE,R2
0,OLS,2580.158045,0.837261
1,Ridge,2580.170296,0.837259
2,Lasso,2580.230937,0.837252
3,ElasticNet,2580.085171,0.83727
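The four models score almost identically here, which suggests the fixed penalties are too small relative to the scale of the target to change the fit. A more informative comparison selects `alpha` by cross-validation. The sketch below shows one way to do that with `RidgeCV` and `LassoCV` on synthetic data; the alpha grid, the dataset, and the names `ridge_cv` and `lasso_cv` are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem: 20 features, 5 of them informative.
X_demo, y_demo = make_regression(
    n_samples=400, n_features=20, n_informative=5, noise=25.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Candidate penalties spanning several orders of magnitude.
alphas = np.logspace(-3, 3, 25)

ridge_cv = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas))
lasso_cv = make_pipeline(StandardScaler(), LassoCV(alphas=alphas, cv=5, max_iter=10_000))

ridge_cv.fit(X_tr, y_tr)
lasso_cv.fit(X_tr, y_tr)

best_ridge_alpha = ridge_cv.named_steps["ridgecv"].alpha_
best_lasso_alpha = lasso_cv.named_steps["lassocv"].alpha_
ridge_r2 = ridge_cv.score(X_te, y_te)
lasso_r2 = lasso_cv.score(X_te, y_te)

print(f"Ridge: alpha={best_ridge_alpha:.4g}, test R2={ridge_r2:.3f}")
print(f"Lasso: alpha={best_lasso_alpha:.4g}, test R2={lasso_r2:.3f}")
```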


## Step 9 – Coefficient Analysis


In [15]:
coef_df = pd.DataFrame({
    "Feature": X.columns,
    "OLS": ols.coef_,
    "Ridge": ridge.coef_,
    "Lasso": lasso.coef_,
    "ElasticNet": enet.coef_
})

coef_df.sort_values(['OLS'])


Unnamed: 0,Feature,OLS,Ridge,Lasso,ElasticNet
12,churn_risk_score,-1247.743595,-1247.254314,-1248.579065,-1234.276958
7,returns_rate,-246.911576,-246.878954,-246.545317,-246.137625
11,support_tickets,-202.962153,-202.898191,-202.64252,-201.843991
0,signup_year,-114.766101,-196.763464,-199.339143,-864.924611
13,credit_score_proxy,-36.008474,-36.049951,-36.427692,-35.851587
6,discount_sensitivity,-21.346922,-21.323822,-19.841733,-22.271354
9,ad_click_rate,1.813289,1.859323,1.528734,2.560185
8,email_open_rate,8.658371,8.637977,7.589587,10.666559
1,signup_month,37.176239,27.011677,27.253624,-56.488812
10,loyalty_score,87.392779,87.812792,87.437033,97.440703
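In the table above, most coefficients barely move between models, but `signup_year` ranges from about -115 under OLS to about -865 under Elastic Net and `signup_month` changes sign. That pattern is typical of correlated features. One way to quantify coefficient stability is to refit on bootstrap resamples and look at the spread. The sketch below does this on synthetic, nearly collinear data; `bootstrap_coef_std` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.utils import resample

rng = np.random.default_rng(0)
n_samples = 200

# Two nearly collinear features, both with a true coefficient of 3.0.
x1 = rng.normal(size=n_samples)
x2 = x1 + rng.normal(scale=0.05, size=n_samples)
X_demo = np.column_stack([x1, x2])
y_demo = 3.0 * x1 + 3.0 * x2 + rng.normal(scale=1.0, size=n_samples)


def bootstrap_coef_std(model, n_boot=200, seed=0):
    """Standard deviation of each coefficient across bootstrap refits."""
    coefs = []
    for b in range(n_boot):
        X_b, y_b = resample(X_demo, y_demo, random_state=seed + b)
        coefs.append(model.fit(X_b, y_b).coef_.copy())
    return np.std(coefs, axis=0)


ols_std = bootstrap_coef_std(LinearRegression())
ridge_std = bootstrap_coef_std(Ridge(alpha=10.0))

print("OLS coefficient std:  ", np.round(ols_std, 3))
print("Ridge coefficient std:", np.round(ridge_std, 3))
```

The data cannot tell the two features apart, so OLS splits their combined effect almost arbitrarily between them. The Ridge penalty favors an even split, which is why its coefficients vary less from resample to resample.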


## Bias–Variance Tradeoff

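| Model | Bias | Variance |
|-----|------|----------|
| OLS | Low | High |
| Ridge | Medium | Lower |
| Lasso | Medium | Lower |
| ElasticNet | Medium | Lower |

These are relative tendencies compared with OLS; the actual balance depends on the penalty strength and on the data.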
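The tradeoff in the table can be estimated by simulation: draw many training sets from a known linear model, refit each time, and split the prediction error at fixed test points into squared bias and variance. The sketch below compares OLS with Ridge when there are few samples relative to features. All sizes, the noise level, and `alpha=10.0` are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n_features, n_train, n_test, n_reps = 30, 40, 200, 300

# Known ground truth, so bias can be measured against the noiseless signal.
beta_true = rng.normal(size=n_features)
X_eval = rng.normal(size=(n_test, n_features))
f_eval = X_eval @ beta_true


def bias2_and_variance(make_model):
    """Average squared bias and variance of predictions over repeated training sets."""
    preds = np.empty((n_reps, n_test))
    for r in range(n_reps):
        X_tr = rng.normal(size=(n_train, n_features))
        y_tr = X_tr @ beta_true + rng.normal(scale=3.0, size=n_train)
        preds[r] = make_model().fit(X_tr, y_tr).predict(X_eval)
    bias2 = float(np.mean((preds.mean(axis=0) - f_eval) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias2, variance


ols_bias2, ols_var = bias2_and_variance(LinearRegression)
ridge_bias2, ridge_var = bias2_and_variance(lambda: Ridge(alpha=10.0))

print(f"OLS:   bias^2={ols_bias2:.2f}, variance={ols_var:.2f}")
print(f"Ridge: bias^2={ridge_bias2:.2f}, variance={ridge_var:.2f}")
```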


## Step 10 – Pipelines (Correct Implementation)


In [13]:
from sklearn.pipeline import Pipeline

ridge_pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("model", Ridge(alpha=1.0))
])

ridge_pipeline.fit(X_train, y_train)
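A pipeline fitted on the training split gives the same predictions as scaling by hand, but it can also be passed to cross-validation, where the scaler is refit inside each fold so that no statistics leak from held-out data. The sketch below illustrates both points on synthetic data; names such as `pipe_demo` and `scaler_demo` are assumptions chosen to avoid clashing with the notebook's variables.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only.
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

pipe_demo = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("model", Ridge(alpha=1.0)),
])
pipe_demo.fit(X_tr, y_tr)

# Manual equivalent: fit the scaler on training data only, then reuse it on test data.
scaler_demo = StandardScaler()
manual_model = Ridge(alpha=1.0).fit(scaler_demo.fit_transform(X_tr), y_tr)
manual_pred = manual_model.predict(scaler_demo.transform(X_te))
matches_manual = np.allclose(pipe_demo.predict(X_te), manual_pred)
print("pipeline matches manual scaling:", matches_manual)

# Cross-validation clones the pipeline, so scaling is re-estimated per fold.
cv_r2 = cross_val_score(pipe_demo, X_tr, y_tr, cv=5, scoring="r2")
print("5-fold R2:", np.round(cv_r2, 3))
```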


## Common Mistakes (Avoided)

- No scaling
- Interpreting raw coefficients without standardization
- Ignoring multicollinearity
- Using OLS on high-dimensional data


## Summary Table

| Model | Strength |
|-----|---------|
| OLS | Interpretability |
| Ridge | Stability |
| Lasso | Feature selection |
| ElasticNet | Robust compromise |


## Key Takeaways

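- Linear models reward careful preprocessing and assumption checks
- Regularization matters most when features are correlated or numerous
- Scale features before fitting penalized models
- Standardized coefficients are comparable across features and easier to interpret
- Pipelines keep preprocessing inside the training data, which prevents leakage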


## Next Notebook

04_Supervised_Learning/

└── [02_tree_based_models.ipynb](02_tree_based_regression.ipynb)





# Complete: [Data Science Techniques](https://github.com/lei-soares/Data-Science-Techniques)

- [00_Data_Generation_and_Simulation](https://github.com/lei-soares/Data-Science-Techniques/tree/main/00_Data_Generation_and_Simulation)


- [01_Exploratory_Data_Analysis_(EDA)](https://github.com/lei-soares/Data-Science-Techniques/tree/main/01_Exploratory_Data_Analysis_(EDA))


- [02_Data_Preprocessing](https://github.com/lei-soares/Data-Science-Techniques/tree/main/02_Data_Preprocessing)


- [03_Feature_Engineering](https://github.com/lei-soares/Data-Science-Techniques/tree/main/03_Feature_Engineering)


- [04_Supervised_Learning](https://github.com/lei-soares/Data-Science-Techniques/tree/main/04_Supervised_Learning)


- [05_Unsupervised_Learning](https://github.com/lei-soares/Data-Science-Techniques/tree/main/05_Unsupervised_Learning)


- [06_Model_Evaluation_and_Validation](https://github.com/lei-soares/Data-Science-Techniques/tree/main/06_Model_Evaluation_and_Validation)


- [07_Model_Tuning_and_Optimization](https://github.com/lei-soares/Data-Science-Techniques/tree/main/07_Model_Tuning_and_Optimization)


- [08_Interpretability_and_Explainability](https://github.com/lei-soares/Data-Science-Techniques/tree/main/08_Interpretability_and_Explainability)


- [09_Pipelines_and_Workflows](https://github.com/lei-soares/Data-Science-Techniques/tree/main/09_Pipelines_and_Workflows)


- [10_Natural_Language_Processing_(NLP)](https://github.com/lei-soares/Data-Science-Techniques/tree/main/10_Natural_Language_Processing_(NLP))


- [11_Time_Series](https://github.com/lei-soares/Data-Science-Techniques/tree/main/11_Time_Series)


- [12_Anomaly_and_Fraud_Detection](https://github.com/lei-soares/Data-Science-Techniques/tree/main/12_Anomaly_and_Fraud_Detection)


- [13_Imbalanced_Learning](https://github.com/lei-soares/Data-Science-Techniques/tree/main/13_Imbalanced_Learning)


- [14_Deployment_and_Production_Concepts](https://github.com/lei-soares/Data-Science-Techniques/tree/main/14_Deployment_and_Production_Concepts)


- [15_Business_and_Experimental_Design](https://github.com/lei-soares/Data-Science-Techniques/tree/main/15_Business_and_Experimental_Design)





[Pantufo Dados](www.pantufodados.com)


[Pantufo Dados - YouTube Channel](https://www.youtube.com/@pantufodados)