# models.py Usage

This file documents the key methods of the `OLS`, `GLS`, and `Ridge` regression models and shows, by example, how to fit each model, make predictions, and obtain variance estimates.

---

### `OLS.fit(X, y, use_gradient_descent=False, max_iter=1000, alpha=0.01, tol=1e-6)`
Fits the Ordinary Least Squares (OLS) model to the dataset.

#### Parameters:
- **X** (`numpy.ndarray` or `pandas.DataFrame`): Feature matrix of shape `(n_samples, n_features)`.
- **y** (`numpy.ndarray` or `pandas.Series`): Response vector of length `n_samples`.
- **use_gradient_descent** (`bool`, optional): Whether to use gradient descent for optimization (default: False).
- **max_iter** (`int`, optional): Maximum iterations for gradient descent (default: 1000).
- **alpha** (`float`, optional): Learning rate for gradient descent (default: 0.01).
- **tol** (`float`, optional): Convergence tolerance for gradient descent (default: 1e-6).

#### Returns:
None. Fits the model and stores coefficients in `self.beta`.
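
As a rough illustration of the two fitting strategies named in the parameters above, the following sketch computes least-squares coefficients once with a direct solver and once with batch gradient descent. The function names, the zero initialization, and the stopping rule based on the size of the update are assumptions made for this sketch; they are not taken from `models.py`, whose internals may differ.

```python
import numpy as np

def ols_closed_form(X, y):
    """Least-squares coefficients from a direct solver (hypothetical helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def ols_gradient_descent(X, y, max_iter=1000, alpha=0.01, tol=1e-6):
    """Batch gradient descent on the mean squared error (hypothetical helper)."""
    n_samples, n_features = X.shape
    beta = np.zeros(n_features)  # assumed starting point
    for _ in range(max_iter):
        # Gradient of (1/n) * ||X beta - y||^2 with respect to beta.
        gradient = (2.0 / n_samples) * X.T @ (X @ beta - y)
        new_beta = beta - alpha * gradient
        # Assumed stopping rule: the update is smaller than the tolerance.
        converged = np.linalg.norm(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    return beta

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y_demo = X_demo @ beta_true + 0.1 * rng.standard_normal(200)

beta_exact = ols_closed_form(X_demo, y_demo)
beta_gd = ols_gradient_descent(X_demo, y_demo, max_iter=5000, alpha=0.1)
print("Direct solver:   ", beta_exact)
print("Gradient descent:", beta_gd)
```

On well-conditioned data like this, both approaches should agree to several decimal places; gradient descent trades exactness for a cheaper cost per iteration.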

---

### `OLS.predict(X)`
Predicts response values using the fitted OLS model.

#### Parameters:
- **X** (`numpy.ndarray` or `pandas.DataFrame`): Feature matrix of shape `(n_samples, n_features)`.

#### Returns:
- **predictions** (`numpy.ndarray`): Predicted values for each sample.
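
Prediction for a linear model amounts to a matrix-vector product. The sketch below assumes that, when an intercept is included, it is stored as the first entry of the coefficient vector and that a column of ones is prepended to `X`. This ordering is an assumption for illustration, not something confirmed by the module; the helper name is hypothetical.

```python
import numpy as np

def predict_with_intercept(X, beta):
    """Return predictions for a model whose first coefficient is the intercept.

    Hypothetical helper: the intercept-first layout is an assumption.
    """
    X = np.asarray(X, dtype=float)
    # Prepend a column of ones so the intercept is handled by the same product.
    X_aug = np.column_stack([np.ones(X.shape[0]), X])
    return X_aug @ beta

X_new = np.array([[1.0, 2.0], [3.0, 4.0]])
beta_demo = np.array([0.5, 2.0, -1.0])  # intercept, then two slopes
preds = predict_with_intercept(X_new, beta_demo)
print(preds)
```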

---

### `OLS.estimate_variance(X, y)`
Estimates the variance-covariance matrix of the coefficients.

#### Parameters:
- **X** (`numpy.ndarray` or `pandas.DataFrame`): Feature matrix of shape `(n_samples, n_features)`.
- **y** (`numpy.ndarray` or `pandas.Series`): Response vector of length `n_samples`.

#### Returns:
- **variance_matrix** (`numpy.ndarray`): Variance-covariance matrix of the coefficients.
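
For reference, the textbook estimator of the coefficient covariance in OLS is $\hat{\sigma}^2 (X^\top X)^{-1}$, with $\hat{\sigma}^2$ the residual sum of squares divided by $n - p$. The sketch below implements that textbook formula under the assumption that `X` already contains any intercept column. It is not necessarily what `estimate_variance` computes, so its numbers need not match the module's output.

```python
import numpy as np

def ols_coefficient_covariance(X, y):
    """Textbook estimate sigma^2 * (X^T X)^{-1} (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # Unbiased residual variance; only meaningful when n > p.
    sigma2_hat = residuals @ residuals / (n - p)
    return sigma2_hat * np.linalg.inv(X.T @ X)

rng = np.random.default_rng(1)
X_demo = rng.standard_normal((100, 3))
y_demo = X_demo @ np.array([1.0, 0.0, -1.0]) + 0.5 * rng.standard_normal(100)

cov_demo = ols_coefficient_covariance(X_demo, y_demo)
print("Standard errors:", np.sqrt(np.diag(cov_demo)))
```

The square roots of the diagonal entries are the usual standard errors of the coefficients.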

---

### `GLS.fit(X, y, sigma)`
Fits the Generalized Least Squares (GLS) model to the dataset.

#### Parameters:
- **X** (`numpy.ndarray` or `pandas.DataFrame`): Feature matrix of shape `(n_samples, n_features)`.
- **y** (`numpy.ndarray` or `pandas.Series`): Response vector of length `n_samples`.
- **sigma** (`numpy.ndarray`): Covariance matrix of the errors, of shape `(n_samples, n_samples)`.

#### Returns:
None. Fits the model and stores coefficients in `self.beta`.
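
The standard GLS estimator is $\hat{\beta} = (X^\top \Sigma^{-1} X)^{-1} X^\top \Sigma^{-1} y$. The following sketch evaluates that formula with linear solves instead of explicit inverses and checks that it coincides with ordinary least squares when $\Sigma$ is the identity. The helper name is hypothetical, and the module's own implementation may handle the intercept and the inversion differently.

```python
import numpy as np

def gls_coefficients(X, y, sigma):
    """Standard GLS estimator, written with solves (hypothetical helper)."""
    sigma_inv_X = np.linalg.solve(sigma, X)  # Sigma^{-1} X
    sigma_inv_y = np.linalg.solve(sigma, y)  # Sigma^{-1} y
    return np.linalg.solve(X.T @ sigma_inv_X, X.T @ sigma_inv_y)

rng = np.random.default_rng(2)
n = 60
X_demo = rng.standard_normal((n, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.standard_normal(n)

# Heteroscedastic errors: a diagonal covariance with unequal variances.
sigma_diag = np.diag(rng.uniform(0.5, 1.5, size=n))
beta_gls = gls_coefficients(X_demo, y_demo, sigma_diag)

# With the identity covariance, GLS coincides with OLS.
beta_gls_identity = gls_coefficients(X_demo, y_demo, np.eye(n))
beta_ols, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
print("GLS (diagonal sigma):", beta_gls)
print("GLS (identity sigma):", beta_gls_identity)
print("OLS:                 ", beta_ols)
```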

---

### `Ridge.fit(X, y)`
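Fits the Ridge regression model to the dataset. The regularization strength is not an argument of `fit`; it is set in the constructor via `alpha`, as in `Ridge(alpha=1.0, include_intercept=True)`. This `alpha` is unrelated to the learning rate `alpha` of `OLS.fit`.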

#### Parameters:
- **X** (`numpy.ndarray` or `pandas.DataFrame`): Feature matrix of shape `(n_samples, n_features)`.
- **y** (`numpy.ndarray` or `pandas.Series`): Response vector of length `n_samples`.

#### Returns:
None. Fits the model and stores coefficients in `self.beta`.
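
The usual closed-form Ridge solution is $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$. The sketch below applies that formula directly, penalizing every column of `X` equally. Whether `models.py` penalizes the intercept is not stated here, so treat this as an illustration under that assumption; the helper and the `penalty` argument are hypothetical names.

```python
import numpy as np

def ridge_coefficients(X, y, penalty=1.0):
    """Closed-form Ridge solution (hypothetical helper, all columns penalized)."""
    n_features = X.shape[1]
    regularized_gram = X.T @ X + penalty * np.eye(n_features)
    return np.linalg.solve(regularized_gram, X.T @ y)

rng = np.random.default_rng(3)
X_demo = rng.standard_normal((80, 4))
y_demo = X_demo @ np.array([1.5, -0.5, 0.0, 2.0]) + 0.3 * rng.standard_normal(80)

beta_ols, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)
beta_ridge_zero = ridge_coefficients(X_demo, y_demo, penalty=0.0)
beta_ridge = ridge_coefficients(X_demo, y_demo, penalty=10.0)
print("OLS norm:  ", np.linalg.norm(beta_ols))
print("Ridge norm:", np.linalg.norm(beta_ridge))
```

With a zero penalty the result matches OLS, and a positive penalty shrinks the coefficient vector toward zero.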

---

### `Ridge.estimate_variance(X, y)`
Estimates the variance-covariance matrix of the coefficients.

#### Parameters:
- **X** (`numpy.ndarray` or `pandas.DataFrame`): Feature matrix of shape `(n_samples, n_features)`.
- **y** (`numpy.ndarray` or `pandas.Series`): Response vector of length `n_samples`.

#### Returns:
- **variance_matrix** (`numpy.ndarray`): Variance-covariance matrix of the coefficients.
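
One common expression for the covariance of Ridge coefficients, for a fixed penalty $\lambda$ and error variance $\sigma^2$, is the sandwich form $\sigma^2 W X^\top X W$ with $W = (X^\top X + \lambda I)^{-1}$. The sketch below evaluates it for a given $\sigma^2$. How `estimate_variance` estimates $\sigma^2$, and whether it uses this form at all, is not documented here, so the helper is only an assumption-laden illustration.

```python
import numpy as np

def ridge_coefficient_covariance(X, sigma2, penalty=1.0):
    """Sandwich covariance sigma^2 * W X^T X W (hypothetical helper)."""
    n_features = X.shape[1]
    gram = X.T @ X
    W = np.linalg.inv(gram + penalty * np.eye(n_features))
    return sigma2 * W @ gram @ W

rng = np.random.default_rng(4)
X_demo = rng.standard_normal((50, 4))
sigma2_demo = 0.25  # error variance, assumed known for this sketch

cov_ridge = ridge_coefficient_covariance(X_demo, sigma2_demo, penalty=5.0)
cov_ols = sigma2_demo * np.linalg.inv(X_demo.T @ X_demo)
print("Ridge variances:", np.diag(cov_ridge))
print("OLS variances:  ", np.diag(cov_ols))
```

For the same $\sigma^2$, the total variance (the trace) under Ridge is smaller than under OLS, which is the variance side of the bias-variance trade-off.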

---

## Example

The example below demonstrates the `OLS`, `GLS`, and `Ridge` regression models on synthetic datasets.


In [2]:
%cd ..
from stats_module.models import OLS, GLS, Ridge, summary
import numpy as np
import time

# Generate synthetic data
np.random.seed(42)
n_samples, n_features = 100, 5
X = np.random.randn(n_samples, n_features)
true_beta = np.random.randn(n_features)
y = X @ true_beta + np.random.normal(0, 0.5, n_samples)



In [3]:
# OLS Example
ols_model = OLS(include_intercept=True)
ols_model.fit(X, y)
ols_predictions = ols_model.predict(X)
ols_variance = ols_model.estimate_variance(X, y)
print("OLS Coefficients:", ols_model.beta)
print("OLS Variance Matrix:\n", ols_variance)

# Get the model summary
ols_summary = summary(ols_model, X, y)
print("OLS Summary:\n", ols_summary)

OLS Coefficients: [-0.04722303  0.91280318  1.88309814 -1.35978426  0.5945449  -0.53310417]
OLS Variance Matrix:
 [[ 2.56521028e-01  3.83891918e-02 -1.70855572e-02  2.33032373e-02
   2.53067798e-02]
 [ 3.83891918e-02  2.11354611e-01 -2.56234838e-02 -6.56783237e-03
  -5.44148650e-03]
 [-1.70855572e-02 -2.56234838e-02  2.05558025e-01  5.90663881e-04
  -9.11745231e-03]
 [ 2.33032373e-02 -6.56783237e-03  5.90663881e-04  2.12604044e-01
   2.20167249e-05]
 [ 2.53067798e-02 -5.44148650e-03 -9.11745231e-03  2.20167249e-05
   1.74128093e-01]]
OLS Summary:
 {'coefficients': array([-0.04722303,  0.91280318,  1.88309814, -1.35978426,  0.5945449 ,
       -0.53310417]), 'r_squared': np.float64(0.966360359561293)}


In [4]:
# Now using the gradient descent method
gd_model = OLS(include_intercept=True)
gd_model.fit(X, y, use_gradient_descent=True, max_iter=1000, alpha=0.1, tol=1e-6)
gd_predictions = gd_model.predict(X)
gd_variance = gd_model.estimate_variance(X, y)
print("GD Coefficients:", gd_model.beta)

gd_summary = summary(gd_model, X, y)
print("GD Summary:\n", gd_summary)


Converged after 50 iterations
GD Coefficients: [-0.04689984  0.91160579  1.88231535 -1.35919347  0.59415345 -0.53332083]
GD Summary:
 {'coefficients': array([-0.04689984,  0.91160579,  1.88231535, -1.35919347,  0.59415345,
       -0.53332083]), 'r_squared': np.float64(0.9663600385784428)}


In [5]:
# GLS Example
sigma = np.diag(np.random.uniform(0.5, 1.5, size=n_samples))
gls_model = GLS(include_intercept=True)
gls_model.fit(X, y, sigma)
gls_predictions = gls_model.predict(X)
print("GLS Coefficients:", gls_model.beta)

gls_summary = summary(gls_model, X, y)
print("GLS Summary:\n", gls_summary)

GLS Coefficients: [-0.07718822  0.90604494  1.8879837  -1.35086407  0.59440973 -0.51890699]
GLS Summary:
 {'coefficients': array([-0.07718822,  0.90604494,  1.8879837 , -1.35086407,  0.59440973,
       -0.51890699]), 'r_squared': np.float64(0.9661133093962327)}


In [6]:
# We can see that this reduces to OLS when sigma is the identity matrix
gls_model = GLS(include_intercept=True)
gls_model.fit(X, y, np.eye(n_samples))
gls_predictions = gls_model.predict(X)
gls_variance = gls_model.estimate_variance(X, y)
print("GLS Coefficients:", gls_model.beta)
print("OLS Coefficients:", ols_model.beta)

GLS Coefficients: [-0.04722303  0.91280318  1.88309814 -1.35978426  0.5945449  -0.53310417]
OLS Coefficients: [-0.04722303  0.91280318  1.88309814 -1.35978426  0.5945449  -0.53310417]


In [7]:
# Ridge Example
ridge_model = Ridge(alpha=1.0, include_intercept=True)
ridge_model.fit(X, y)
ridge_variance = ridge_model.estimate_variance(X, y)
print("Ridge Coefficients:", ridge_model.beta)
print("Ridge Variance Matrix:\n", ridge_variance)

ridge_summary = summary(ridge_model, X, y)
print("Ridge Summary:\n", ridge_summary)

Ridge Coefficients: [-0.04342327  0.89623975  1.85912983 -1.34226342  0.58723277 -0.52952223]
Ridge Variance Matrix:
 [[ 2.50812226e-01  3.67571136e-02 -1.62575127e-02  2.23438619e-02
   2.43379561e-02]
 [ 3.67571136e-02  2.07622273e-01 -2.46116995e-02 -6.40530344e-03
  -5.37450180e-03]
 [-1.62575127e-02 -2.46116995e-02  2.02155997e-01  5.90937969e-04
  -8.78351079e-03]
 [ 2.23438619e-02 -6.40530344e-03  5.90937969e-04  2.08982812e-01
  -3.97043760e-05]
 [ 2.43379561e-02 -5.37450180e-03 -8.78351079e-03 -3.97043760e-05
   1.71806002e-01]]
Ridge Summary:
 {'coefficients': array([-0.04342327,  0.89623975,  1.85912983, -1.34226342,  0.58723277,
       -0.52952223]), 'r_squared': np.float64(0.9661983879234871)}


In [12]:
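# Now let's generate data with p > n and see how Ridge performs compared to OLS.

# With p > n, X^T X is singular, so the OLS variance estimates are enormous (on the order of 1e18), indicating that the solution is unstable.
# Ridge, on the other hand, gives estimates that are small in magnitude, i.e. a more stable solution.
# Note that several entries in both outputs are negative, which no true variance can be, so with p > n these values should not be read as usable variance estimates.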

n_samples, n_features = 20, 50
X = np.random.randn(n_samples, n_features)
true_beta = np.random.randn(n_features)
y = X @ true_beta + np.random.normal(0, 0.5, n_samples)

ols_model = OLS(include_intercept=True)
ols_model.fit(X, y)
ols_predictions = ols_model.predict(X)
ols_variance = ols_model.estimate_variance(X, y)
print("OLS Variance Diagonal:\n", np.diag(ols_variance))

ridge_model = Ridge(alpha=1.0, include_intercept=True)
ridge_model.fit(X, y)
ridge_variance = ridge_model.estimate_variance(X, y)
print("Ridge Variance Diagonal:\n", np.diag(ridge_variance))

# Compare summaries
ols_summary = summary(ols_model, X, y)
ridge_summary = summary(ridge_model, X, y)


OLS Variance Diagonal:
 [ 1.20834216e+18  2.04717669e+18  2.95280200e+18 -4.08602343e+18
 -1.12891927e+18  2.95315824e+18 -4.51184942e+17 -9.46710441e+17
 -3.40234448e+18 -1.81641217e+18 -5.09228360e+18  7.89927873e+17
  4.37104358e+18  8.75074285e+17  3.48451179e+18  7.26130854e+18
 -2.99334628e+18 -2.96987808e+18  5.40259765e+18  2.62568932e+18
  2.39206883e+18  6.71112101e+17  7.32080565e+18  1.47293260e+19
  6.23603708e+18  9.17333989e+18 -1.85607765e+18 -3.00453863e+17
 -3.36391200e+17 -7.09685654e+18 -1.13617535e+18  1.43262352e+19
 -5.01195161e+17  2.72436257e+18 -5.15799590e+17 -1.90049454e+18
  6.81042997e+18  3.10140907e+18  9.14651282e+17  4.95704364e+18
 -2.14570719e+18 -1.13790555e+18  1.20277698e+18 -3.37915505e+17
 -2.11316373e+18  9.29032094e+17  4.55253840e+18 -1.27497646e+18
  1.37278408e+18 -1.12966011e+18]
Ridge Variance Diagonal:
 [-0.01482019 -0.00811928 -0.00476724 -0.00728944 -0.00480565 -0.00383843
 -0.00669914 -0.01030419 -0.00338391 -0.00517571 -0.00774725 -0
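
The instability in the p > n case can be checked directly: with 20 samples and 50 features, the design matrix has rank at most 20, so the 50 x 50 matrix $X^\top X$ cannot be inverted reliably, while adding a positive multiple of the identity bounds its smallest eigenvalue away from zero. The sketch below illustrates this on freshly generated data; it does not reproduce the run above, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50
X_wide = rng.standard_normal((n, p))

# rank(X^T X) equals rank(X), which cannot exceed min(n, p) = 20.
rank_X = np.linalg.matrix_rank(X_wide)

gram = X_wide.T @ X_wide
ridge_gram = gram + 1.0 * np.eye(p)  # penalty of 1.0, matching alpha=1.0 above

# Every eigenvalue of the regularized matrix is at least the penalty.
min_eig_ridge = np.linalg.eigvalsh(ridge_gram).min()
cond_ridge = np.linalg.cond(ridge_gram)

print("rank of X:", rank_X, "out of", p, "columns")
print("smallest eigenvalue of X^T X + I:", min_eig_ridge)
print("condition number of X^T X + I:", cond_ridge)
```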

In [11]:
# Now let's compare the time taken to fit the models with a large number of features

n_samples, n_features = 20000, 10000
X = np.random.randn(n_samples, n_features)
true_beta = np.random.randn(n_features)
y = X @ true_beta + np.random.normal(0, 0.5, n_samples)

start = time.time()
ols_model = OLS(include_intercept=True)
ols_model.fit(X, y)
ols_predictions = ols_model.predict(X)
end = time.time()
print("OLS Time:", end - start)


start = time.time()
gd_model = OLS(include_intercept=True)
gd_model.fit(X, y, use_gradient_descent=True, max_iter=1000, alpha=0.1, tol=1e-6)
gd_predictions = gd_model.predict(X)
end = time.time()
print("GD Time:", end - start)


OLS Time: 153.27701091766357
Converged after 341 iterations
GD Time: 71.31898999214172
