# Generalized Linear Models (GLMs)

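GLMs extend linear regression to responses that are not Normally distributed (e.g., Binomial, Poisson) by relating the mean of the response to a linear predictor through a link function.  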
Core components:  
- **Linear Predictor**: $ \eta_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi} $.  
- **Link Function**: $ g(\mu_i) = \eta_i $, where $ \mu_i = E[Y_i] $.  
- **Variance Function**: $ \text{Var}(Y_i) = \phi V(\mu_i) $, where $ \phi $ is the dispersion parameter.  
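
The three components can be traced numerically for a Poisson model with a log link. This is a minimal sketch: the coefficients, the design matrix, and the choice $\phi = 1$ are hypothetical values picked for illustration.

```python
import numpy as np

# Hypothetical coefficients and design matrix (intercept column plus one predictor)
beta = np.array([0.5, 1.0])
X = np.array([[1.0, -1.0],
              [1.0,  0.0],
              [1.0,  2.0]])

eta = X @ beta      # linear predictor: eta_i = beta_0 + beta_1 * x_i
mu = np.exp(eta)    # inverse of the log link: mu_i = g^{-1}(eta_i)
phi = 1.0           # dispersion, assumed fixed at 1 as in a plain Poisson model
var_y = phi * mu    # variance function V(mu) = mu

print("eta:", eta)
print("mu: ", mu)
print("var:", var_y)
```

Swapping the link or the variance function changes only the `mu` and `var_y` lines; the linear predictor is computed the same way for every GLM.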

| **Distribution**     | **Link Function $g(\mu)$**                                      | **Variance Function $V(\mu)$**         | **Typical Use Case**                   |
|-----------------------|---------------------------------------------------------------|----------------------------------------|----------------------------------------|
| **Normal**            | $g(\mu) = \mu$ (Identity)                                     | $V(\mu) = 1$                           | Continuous data with constant variance |
| **Binomial**          | $g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$ (Logit)         | $V(\mu) = \mu(1 - \mu)$               | Binary or proportion data              |
|                       | $g(\mu) = \Phi^{-1}(\mu)$ (Probit)                            | $V(\mu) = \mu(1 - \mu)$               | Binary data, alternative link          |
|                       | $g(\mu) = \log(-\log(1-\mu))$ (Complementary Log-Log)         | $V(\mu) = \mu(1 - \mu)$               | Binary data, asymmetric relationships  |
| **Poisson**           | $g(\mu) = \log(\mu)$ (Log)                                    | $V(\mu) = \mu$                        | Count data                             |
| **Gamma**             | $g(\mu) = \frac{1}{\mu}$ (Inverse)                           | $V(\mu) = \mu^2$                      | Positive continuous data               |
|                       | $g(\mu) = \log(\mu)$ (Log)                                    | $V(\mu) = \mu^2$                      | Alternative link for skewed data       |
| **Inverse Gaussian**  | $g(\mu) = \frac{1}{\mu^2}$ (Inverse Squared)                  | $V(\mu) = \mu^3$                      | Skewed positive continuous data        |
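
As a quick sanity check on the table, each link can be paired with its inverse and verified by a round trip $g^{-1}(g(\mu)) = \mu$. The sketch below uses only the Python standard library. The dictionary names `LINKS` and `VARIANCE` are illustrative and not part of any library.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard Normal, used for the probit link

# Each entry maps a link name to (g, g_inverse)
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_std_normal.inv_cdf, _std_normal.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1 - mu)),
                lambda eta: 1 - math.exp(-math.exp(eta))),
    "log": (math.log, math.exp),
    "inverse": (lambda mu: 1 / mu, lambda eta: 1 / eta),
    "inverse_squared": (lambda mu: 1 / mu**2, lambda eta: 1 / math.sqrt(eta)),
}

# Variance functions V(mu) from the table
VARIANCE = {
    "normal": lambda mu: 1.0,
    "binomial": lambda mu: mu * (1 - mu),
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu**2,
    "inverse_gaussian": lambda mu: mu**3,
}

mu = 0.3  # a mean that is valid for every link above
round_trips = {name: g_inv(g(mu)) for name, (g, g_inv) in LINKS.items()}
for name, value in round_trips.items():
    print(f"{name:16s} g^-1(g(mu)) = {value:.6f}")
```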

## Notes:
- The first link listed for each distribution in the table is its **canonical link function**. Other link functions may be used, but the canonical link often has desirable statistical properties, such as simpler likelihood equations.
- The variance functions describe how the variance of the response changes with its mean; each one is determined by the corresponding exponential family distribution.
- When no standard distribution fits, quasi-likelihood methods need only the mean-variance relationship, which allows more flexibility in specifying $V(\mu)$.
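
In quasi-likelihood fits the dispersion $\phi$ is typically estimated rather than fixed. One common estimator divides the Pearson chi-square statistic by the residual degrees of freedom. The sketch below assumes fitted means are already available; the helper `pearson_dispersion` and the toy numbers are hypothetical.

```python
import numpy as np

def pearson_dispersion(y, mu, variance_fn, n_params):
    """Estimate phi as the Pearson chi-square over residual degrees of freedom."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    pearson_chi2 = np.sum((y - mu) ** 2 / variance_fn(mu))
    return pearson_chi2 / (y.size - n_params)

# Hypothetical observed counts and fitted means from a two-parameter model
y = np.array([0, 3, 1, 8, 2, 12])
mu = np.array([1.0, 2.0, 2.0, 5.0, 3.0, 8.0])

# Poisson-type variance function V(mu) = mu
phi_hat = pearson_dispersion(y, mu, variance_fn=lambda m: m, n_params=2)
print("Estimated dispersion:", phi_hat)
```

An estimate well above 1 with $V(\mu) = \mu$ would suggest that the counts are more variable than a plain Poisson model allows.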



In [1]:
import numpy as np
import statsmodels.api as sm

# Generate synthetic data
np.random.seed(42)  # For reproducibility
n_samples = 100
x = np.random.uniform(-10, 10, n_samples)  # Predictor
beta_0 = 2.5  # Intercept
beta_1 = 0.7  # Slope
y = beta_0 + beta_1 * x + np.random.normal(0, 2, n_samples)  # Response with noise

# Add constant for the intercept
X = sm.add_constant(x)

# Fit a Gaussian GLM
gaussian_model = sm.GLM(y, X, family=sm.families.Gaussian())
gaussian_results = gaussian_model.fit()

# Print the summary of results
print(gaussian_results.summary())

                 Generalized Linear Model Regression Results                  
Dep. Variable:                      y   No. Observations:                  100
Model:                            GLM   Df Residuals:                       98
Model Family:                Gaussian   Df Model:                            1
Link Function:               Identity   Scale:                          3.2922
Method:                          IRLS   Log-Likelihood:                -200.46
Date:                Sun, 12 Jan 2025   Deviance:                       322.63
Time:                        18:37:51   Pearson chi2:                     323.
No. Iterations:                     3   Pseudo R-squ. (CS):             0.9895
Covariance Type:            nonrobust                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.4704      0.182     13.547      0.0
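
With the Gaussian family and identity link, maximizing the likelihood reduces to ordinary least squares, so the fit above can be cross-checked without statsmodels. The sketch below regenerates the same synthetic data and solves the least-squares problem directly. The `scale` it computes is the residual sum of squares over the residual degrees of freedom, which is consistent with the summary, where Deviance / Df Residuals = 322.63 / 98 ≈ 3.292.

```python
import numpy as np

# Regenerate the synthetic data with the same seed and settings as the cell above
np.random.seed(42)
n_samples = 100
x = np.random.uniform(-10, 10, n_samples)
y = 2.5 + 0.7 * x + np.random.normal(0, 2, n_samples)

# Design matrix with an explicit intercept column
X = np.column_stack([np.ones(n_samples), x])

# Least-squares solution of X @ beta = y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Scale estimate: residual sum of squares over residual degrees of freedom
residuals = y - X @ beta_hat
scale = residuals @ residuals / (n_samples - X.shape[1])

print("coefficients:", beta_hat)
print("scale:", scale)
```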

In [7]:
import torch
import torch.nn as nn
import torch.optim as optim
from tabulate import tabulate
import statsmodels.api as sm

# 1. Generate synthetic Poisson data
torch.manual_seed(42)
n_samples = 100
X = torch.cat([torch.ones(n_samples, 1), torch.randn(n_samples, 1)], dim=1)  # Add intercept
beta_true = torch.tensor([0.5, 1.0])
eta = X @ beta_true  # Linear predictor
y = torch.poisson(torch.exp(eta))  # Poisson distributed response

# Convert to numpy for statsmodels
X_np = X.numpy()
y_np = y.numpy()

# 2. Define PyTorch Poisson GLM model
class PoissonGLM(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(n_features))  # Initialize coefficients

    def forward(self, X):
        eta = X @ self.beta  # Linear predictor
        mu = torch.exp(eta)  # Mean (inverse of log link)
        return mu

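    def log_likelihood(self, X, y):
        mu = self.forward(X)
        # Poisson log-likelihood up to the additive constant -sum(log(y!)),
        # which does not depend on beta and so does not affect the fit
        return torch.sum(y * torch.log(mu) - mu)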

# 3. Initialize PyTorch model and optimizer
model = PoissonGLM(n_features=X.shape[1])
optimizer = optim.Adam(model.parameters(), lr=0.01)

# 4. Train PyTorch model
for epoch in range(10000):
    optimizer.zero_grad()
    neg_log_likelihood = -model.log_likelihood(X, y)  # Negative log-likelihood
    neg_log_likelihood.backward()  # Compute gradients
    optimizer.step()  # Update parameters

    if epoch % 1000 == 0:
        print(f"Epoch {epoch}: Negative Log-Likelihood = {neg_log_likelihood.item()}")

# 5. Compute standard errors from the Fisher information X^T diag(mu) X
# (with the log link this equals the Hessian of the negative log-likelihood)
beta_hat = model.beta.detach()
mu = model.forward(X).detach()
hessian = X.T @ (mu.unsqueeze(1) * X)  # Vectorized sum of mu_i * x_i x_i^T

hessian_inv = torch.inverse(hessian)
stderr = torch.sqrt(torch.diag(hessian_inv))  # Standard errors

# Calculate Wald z-statistics, two-sided p-values, and 95% confidence intervals
z_values = beta_hat / stderr
p_values = 2 * (1 - torch.distributions.Normal(0, 1).cdf(torch.abs(z_values)))
confidence_intervals = torch.stack([beta_hat - 1.96 * stderr, beta_hat + 1.96 * stderr], dim=1)

# 6. Statsmodels Poisson GLM
sm_model = sm.GLM(y_np, X_np, family=sm.families.Poisson())
sm_results = sm_model.fit()

# 7. Create a combined table comparing PyTorch and Statsmodels coefficients and standard errors
table = []
for i in range(len(beta_hat)):
    table.append([
        f"Beta_{i}",  # Coefficient name
        beta_hat[i].item(),  # PyTorch coefficient estimate
        stderr[i].item(),  # PyTorch standard error
        sm_results.params[i],  # Statsmodels coefficient estimate
        sm_results.bse[i],  # Statsmodels standard error
    ])

# Print the comparison table
print("\nComparison of Coefficients and Standard Errors:")
print(tabulate(table, headers=["Coefficient", "PT Coeff", "PT StdErr", "SM Coeff", "SM StdErr"]))



Epoch 0: Negative Log-Likelihood = 100.0
Epoch 1000: Negative Log-Likelihood = -128.48660278320312
Epoch 2000: Negative Log-Likelihood = -128.48660278320312
Epoch 3000: Negative Log-Likelihood = -128.48660278320312
Epoch 4000: Negative Log-Likelihood = -128.48660278320312
Epoch 5000: Negative Log-Likelihood = -128.48660278320312
Epoch 6000: Negative Log-Likelihood = -128.48660278320312
Epoch 7000: Negative Log-Likelihood = -128.48660278320312
Epoch 8000: Negative Log-Likelihood = -128.48660278320312
Epoch 9000: Negative Log-Likelihood = -128.48660278320312

Comparison of Coefficients and Standard Errors:
Coefficient      PT Coeff    PT StdErr    SM Coeff    SM StdErr
-------------  ----------  -----------  ----------  -----------
Beta_0           0.413635    0.0935555    0.413635    0.0935555
Beta_1           1.05245     0.0706126    1.05245     0.0706126
