# Regression: Interactions & Polynomials

## Setup

Load the packages and configure the environment.

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

## Interaction Terms

### Advertising Data

Using the Advertising data from ISL.

In [None]:
# download the data set directly from the web using pandas
url = "https://raw.githubusercontent.com/olearydj/INSY7120/refs/heads/main/notebooks/data/Advertising.csv"
data = pd.read_csv(url)

In [None]:
# recall that we need to drop the duplicated row numbers in the first column
sales = data.drop(data.columns[0], axis=1)
sales.head()

If we are interested in a model based on radio, TV and their interaction, first get the **main effects**:

In [None]:
# get the predictors of interest
X = sales[['radio', 'TV']]
y = sales[['sales']]

Then use `PolynomialFeatures` from scikit-learn to transform the features before fitting the model. In this case:

- `degree=2` limits the terms to two-way interactions (products of two variables) between features
- `interaction_only=True` generates only the interaction terms (e.g., $radio \times TV$), without the squared terms (e.g., $radio^2$)
- `include_bias=False` lets LinearRegression compute the intercept

The cell below first specifies the transformation and then applies it with the `fit_transform` method.

In [None]:
# generate interaction terms
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_interact = poly.fit_transform(X)

# inspect result - no head method for numpy, slice
X_interact[:5]

We can see that the first two columns are the original values for radio and TV and the third is their product.

To confirm the features created, use `poly.get_feature_names_out()`:

In [None]:
poly.get_feature_names_out()

After transforming the input features, we can continue fitting the model and evaluating the results, as before.

In [None]:
mlr_interact = LinearRegression()

# use the transformed predictors!
mlr_interact.fit(X_interact, y)

In [None]:
# look at the estimated model parameters
print(f"Model Coefficients: {mlr_interact.coef_}")
print(f"Model Intercept: {mlr_interact.intercept_}")

In [None]:
# Make predictions with interaction data!
y_pred = mlr_interact.predict(X_interact)

# Evaluate the model
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y, y_pred)

print("Multiple Linear Regression Model, with Interaction Terms:")
print(f"Mean Squared Error: {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R² Score: {r2:.4f}")

Compare without interaction.

In [None]:
mlr = LinearRegression()
mlr.fit(X, y)

# for just r2, use score method of fitted model
# this generates predictions implicitly
# for other metrics you need to predict first
r2 = mlr.score(X, y)
print(f"R² Score: {r2:.4f}")
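As a quick check that `score` is a shortcut for predicting and then calling `r2_score`, the sketch below fits a model on small synthetic data (generated here purely for illustration) and compares the two routes:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# small synthetic regression problem, for illustration only
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 2))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=30)

model_demo = LinearRegression().fit(X_demo, y_demo)

# route 1: score() predicts internally and returns R²
r2_from_score = model_demo.score(X_demo, y_demo)

# route 2: predict explicitly, then compute R² from the predictions
r2_from_metric = r2_score(y_demo, model_demo.predict(X_demo))

print(np.isclose(r2_from_score, r2_from_metric))
```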

Define a function to simplify.

In [None]:
def quick_fit(X, y):
    model = LinearRegression()
    model.fit(X, y)
    r2 = model.score(X, y)
    print(f"R² Score: {r2:.4f}")

Compare with SLR using radio.

In [None]:
X = sales[['radio']]
quick_fit(X, y)

Compare with SLR using TV.

In [None]:
X = sales[['TV']]
quick_fit(X, y)

R² comparison: SLR radio (0.332) < SLR TV (0.612) < MLR radio + TV (0.897) < MLR radio + TV + radio × TV (0.968)

### Credit Data

Use `Credit` dataset from ISL.

In [None]:
# download the data set directly from the web using pandas
url = "https://raw.githubusercontent.com/olearydj/INSY7120/refs/heads/main/notebooks/data/Credit.csv"
credit = pd.read_csv(url)

In [None]:
credit.columns = credit.columns.str.lower()
credit = pd.get_dummies(credit, drop_first=True, dtype=int)
credit.head()

Predict `balance` from `income` (quantitative) and `student` (qualitative).

In [None]:
# get the predictors of interest
X = credit[['income', 'student_Yes']]
y = credit[['balance']]

In [None]:
poly = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_interact = poly.fit_transform(X)

# inspect result - no head method for numpy, slice
X_interact[:5]

In [None]:
poly.get_feature_names_out()

In [None]:
quick_fit(X_interact, y)

In [None]:
quick_fit(X, y)

In [None]:
# Create a figure with two subplots side by side
fig, axes = plt.subplots(1, 2, figsize=(16, 7))
fig.suptitle('Income vs Balance by Student Status: Without vs With Interaction', fontsize=16)

# Get student and non-student data
students = credit[credit['student_Yes'] == 1]
non_students = credit[credit['student_Yes'] == 0]

# Common x range for prediction lines
x_range = np.linspace(credit['income'].min(), credit['income'].max(), 100)

# ------ Left plot: Model without interaction (X) ------
# Scatter all data points
axes[0].scatter(non_students['income'], non_students['balance'], alpha=0.5, color='blue', label='Non-Student')
axes[0].scatter(students['income'], students['balance'], alpha=0.5, color='red', label='Student')

# Fit model without interaction
model_no_interact = LinearRegression().fit(X, y)

# Predict for non-students and students - use DataFrames to match training data
X_pred_non = pd.DataFrame({'income': x_range, 'student_Yes': np.zeros(100)})
X_pred_stu = pd.DataFrame({'income': x_range, 'student_Yes': np.ones(100)})
y_pred_non = model_no_interact.predict(X_pred_non)
y_pred_stu = model_no_interact.predict(X_pred_stu)

# Plot regression lines
axes[0].plot(x_range, y_pred_non, 'b-', linewidth=2, label='Non-Student Line')
axes[0].plot(x_range, y_pred_stu, 'r-', linewidth=2, label='Student Line')
axes[0].set_title('Without Interaction (Main Effects Only)')
axes[0].set_xlabel('Income')
axes[0].set_ylabel('Balance')
axes[0].legend()
axes[0].grid(alpha=0.3)

# ------ Right plot: Model with interaction (X_interact) ------
# Scatter all data points
axes[1].scatter(non_students['income'], non_students['balance'], alpha=0.5, color='blue', label='Non-Student')
axes[1].scatter(students['income'], students['balance'], alpha=0.5, color='red', label='Student')

# Fit model with interaction
model_interact = LinearRegression().fit(X_interact, y)

# Prepare prediction data for interaction model
X_interact_pred_non = poly.transform(X_pred_non)  # Transform with interaction for non-students
X_interact_pred_stu = poly.transform(X_pred_stu)  # Transform with interaction for students
y_interact_pred_non = model_interact.predict(X_interact_pred_non)
y_interact_pred_stu = model_interact.predict(X_interact_pred_stu)

# Plot regression lines
axes[1].plot(x_range, y_interact_pred_non, 'b-', linewidth=2, label='Non-Student Line')
axes[1].plot(x_range, y_interact_pred_stu, 'r-', linewidth=2, label='Student Line') 
axes[1].set_title('With Interaction')
axes[1].set_xlabel('Income')
axes[1].set_ylabel('Balance')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.subplots_adjust(top=0.9)
plt.show()

**Left (Without Interaction):** The lines are *parallel*. The model says students carry a fixed amount more balance than non-students at every income level (the student coefficient, $\beta_2$), but the *rate* at which balance increases with income is identical for both groups. This is the "main effects only" model:

$$\text{balance} = \beta_0 + \beta_1 \cdot \text{income} + \beta_2 \cdot \text{student}$$

**Right (With Interaction):** The lines have *different slopes*. For non-students, balance increases more steeply with income. For students, the slope is flatter - their balance increases more slowly as income rises. The interaction term captures this:

$$\text{balance} = \beta_0 + \beta_1 \cdot \text{income} + \beta_2 \cdot \text{student} + \beta_3 \cdot (\text{income} \times \text{student})$$

Without the interaction, the model *forces* the relationship between income and balance to be the same for both groups. The interaction term *allows* that relationship to differ - which better matches the data pattern where high-income students don't accumulate balance as quickly as high-income non-students.
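To see why the interaction changes the slope, substitute the two possible values of the dummy variable into the interaction model (same symbols as the equation above):

$$
\begin{aligned}
\text{student} = 0:&\quad \text{balance} = \beta_0 + \beta_1 \cdot \text{income} \\
\text{student} = 1:&\quad \text{balance} = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \cdot \text{income}
\end{aligned}
$$

So $\beta_2$ shifts the intercept for students and $\beta_3$ is the difference in slopes between the two groups; a flatter student line corresponds to a negative $\beta_3$. Setting $\beta_3 = 0$ recovers the parallel lines of the main-effects model.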

## Polynomial Terms

Use `Auto` dataset.

In [None]:
# download the data set directly from the web using pandas
url = "https://raw.githubusercontent.com/olearydj/INSY7120/refs/heads/main/notebooks/data/Auto.csv"
cars = pd.read_csv(url)

In [None]:
cars.head()

The `Auto` data includes question marks for some horsepower values (5 rows). For this example we'll simply convert them to `NaN` and drop those rows.

In [None]:
cars['horsepower'] = pd.to_numeric(cars['horsepower'], errors='coerce')
cars_clean = cars.dropna(subset=['horsepower'])
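The effect of `errors='coerce'` is easiest to see on a tiny example. The frame below is made up for illustration (it is not the Auto data): the `?` entry becomes `NaN`, the column becomes numeric, and `dropna` removes the affected row.

```python
import pandas as pd

# made-up rows; the "?" mimics the placeholder used for missing horsepower
raw = pd.DataFrame({"horsepower": ["130", "?", "95"], "mpg": [18.0, 25.0, 24.0]})
print(raw["horsepower"].dtype)  # text column before conversion

# unparseable strings are turned into NaN instead of raising an error
raw["horsepower"] = pd.to_numeric(raw["horsepower"], errors="coerce")
n_missing = int(raw["horsepower"].isna().sum())

# drop only the rows where horsepower is missing
clean = raw.dropna(subset=["horsepower"])

print(f"missing after coercion: {n_missing}")
print(clean)
```

An alternative, if you prefer to handle this at load time, is to pass `na_values='?'` to `pd.read_csv` so the placeholder is read as missing from the start.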

Same procedure as before, except `interaction_only=False` (the default).

In [None]:
# get the predictors of interest
X = cars_clean[['horsepower']]
y = cars_clean[['mpg']]

In [None]:
# generate polynomial terms
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
X_interact = poly.fit_transform(X)

# inspect result - no head method for numpy, slice
X_interact[:5]

In [None]:
poly.get_feature_names_out()

In [None]:
cars_power = LinearRegression()
cars_power.fit(X_interact, y)

In [None]:
# look at the estimated model parameters
print(f"Model Coefficients: {cars_power.coef_}")
print(f"Model Intercept: {cars_power.intercept_}")

In [None]:
# Make predictions with the polynomial features!
y_pred = cars_power.predict(X_interact)

# Evaluate the model
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y, y_pred)

print("Multiple Linear Regression Model, with Polynomial Terms:")
print(f"Mean Squared Error: {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R² Score: {r2:.4f}")
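The transform-then-fit steps can also be bundled so the same transformation is applied automatically at prediction time. The following is a minimal sketch using scikit-learn's `make_pipeline`; the data are simulated with invented coefficients for a curved horsepower/mpg relationship, not taken from the Auto data, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# simulated data with an invented quadratic relationship (not the Auto data)
rng = np.random.default_rng(0)
hp = rng.uniform(50, 230, size=200)
mpg = 57 - 0.47 * hp + 0.0012 * hp**2 + rng.normal(0, 4, size=200)
X_demo = hp.reshape(-1, 1)

# straight-line fit for comparison
linear = LinearRegression().fit(X_demo, mpg)

# the pipeline expands the features and fits the regression in one object
quad = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(X_demo, mpg)

r2_linear = linear.score(X_demo, mpg)
r2_quad = quad.score(X_demo, mpg)  # raw X goes in; the pipeline transforms it
print(f"R² linear: {r2_linear:.4f}, R² quadratic: {r2_quad:.4f}")
```

Since the straight-line model is a special case of the quadratic one, the in-sample R² of the quadratic fit cannot be lower; whether the improvement is meaningful is a separate question.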

## Many Terms: The Curse of Dimensionality

What happens when we use multiple predictors with polynomial and interaction terms? The number of features can explode quickly.

Let's use all quantitative predictors from the Auto dataset. First, inspect the columns.

In [None]:
cars_clean.dtypes

Use `loc` to select a range of columns by name. Here we grab the quantitative predictors from `cylinders` through `year`, excluding `mpg` (our target), `origin` (categorical), and `name` (text).

In [None]:
# Select quantitative predictors using loc with column range
X_multi = cars_clean.loc[:, 'cylinders':'year']
y = cars_clean[['mpg']]

print(f"Original predictors: {X_multi.shape[1]}")
X_multi.head()

**Alternative selection methods:** There are several ways to select columns in pandas:

In [None]:
# Method 1: loc with column range (used above)
X1 = cars_clean.loc[:, 'cylinders':'year']

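# Method 2: select_dtypes - all numeric columns, then drop the target
# (also drop origin if it was read as a numeric code; errors='ignore' skips it otherwise)
X2 = cars_clean.select_dtypes(include='number').drop(columns=['mpg', 'origin'], errors='ignore')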

# Method 3: iloc with positional indices (columns 1-6)
X3 = cars_clean.iloc[:, 1:7]

# Method 4: explicit column list
cols = ['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year']
X4 = cars_clean[cols]

# Verify all methods give same result
print(f"All methods equivalent: {X1.equals(X2) and X2.equals(X3) and X3.equals(X4)}")

Now apply `PolynomialFeatures` with `degree=3` and `interaction_only=False` to generate all polynomial and interaction terms.

In [None]:
poly_full = PolynomialFeatures(degree=3, interaction_only=False, include_bias=False)
X_poly = poly_full.fit_transform(X_multi)

print(f"Original features: {X_multi.shape[1]}")
print(f"After degree=3 polynomial expansion: {X_poly.shape[1]}")

From 6 predictors to 83 features! This includes:
- 6 original terms (degree 1)
- 21 degree-2 terms (6 squared + 15 two-way interactions)
- 56 degree-3 terms (6 cubed + various combinations)
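These counts follow from a standard combinatorial identity: the number of distinct monomials of exactly degree k in n variables is C(n + k - 1, k), and the number of monomials of degree 1 through d is C(n + d, d) - 1. The helper names below are hypothetical, written only to check the arithmetic:

```python
from math import comb


def n_terms_of_degree(n_features: int, k: int) -> int:
    """Number of distinct monomials of exactly degree k in n_features variables."""
    return comb(n_features + k - 1, k)


def n_poly_features(n_features: int, degree: int, include_bias: bool = False) -> int:
    """Total columns for a full polynomial expansion up to the given degree."""
    total = comb(n_features + degree, degree)  # includes the constant term
    return total if include_bias else total - 1


per_degree = [n_terms_of_degree(6, k) for k in (1, 2, 3)]
print(per_degree)             # terms of degree 1, 2, 3
print(n_poly_features(6, 3))  # total without the bias column
```

For 6 predictors this gives 6, 21, and 56 terms, matching the breakdown above and summing to 83.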

In [None]:
# View a sample of the generated feature names
feature_names = poly_full.get_feature_names_out()
print(f"First 10: {list(feature_names[:10])}")
print(f"Last 10: {list(feature_names[-10:])}")

In [None]:
# Fit and evaluate
model_poly = LinearRegression()
model_poly.fit(X_poly, y)

r2_poly = r2_score(y, model_poly.predict(X_poly))
print(f"R² with 83 features: {r2_poly:.4f}")

The R² is very high, but with 83 features and only 392 observations, we're at serious risk of overfitting. The model may be fitting noise rather than signal. This is the **curse of dimensionality** - more features require exponentially more data to estimate reliably.

In practice, we'd use techniques like cross-validation (next lecture) and regularization to manage this complexity.
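One simple way to see the overfitting risk is to hold out part of the data and compare R² on the rows used for fitting with R² on the rows held back. The sketch below is an assumption-laden illustration, not the procedure from the lecture: it uses simulated data with the same shape as the cleaned Auto predictors (392 rows, 6 columns) and a purely linear true relationship, so the degree-3 terms can only fit noise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# simulated data: 392 rows, 6 predictors, linear signal plus noise (not the Auto data)
rng = np.random.default_rng(0)
X_sim = rng.normal(size=(392, 6))
true_coefs = np.array([1.0, -2.0, 0.5, 0.0, 3.0, -1.0])  # invented for illustration
y_sim = X_sim @ true_coefs + rng.normal(scale=2.0, size=392)

# hold out 30% of the rows; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X_sim, y_sim, test_size=0.3, random_state=0
)

pipe = make_pipeline(PolynomialFeatures(degree=3, include_bias=False), LinearRegression())
pipe.fit(X_train, y_train)

n_features = pipe[0].n_output_features_
r2_train = pipe.score(X_train, y_train)
r2_test = pipe.score(X_test, y_test)

print(f"features after expansion: {n_features}")
print(f"R² on training rows: {r2_train:.4f}")
print(f"R² on held-out rows: {r2_test:.4f}")
```

A large gap between the two scores is a warning sign that the extra terms are fitting noise. Cross-validation repeats this idea over several splits.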

## PolynomialFeatures Reference

### Terms Generated by Configuration

The tables below assume `include_bias=False` (see the note on `include_bias` later in this reference). Predictors are X₁ and X₂, except in the first table, which has only X₁:

#### SLR (1 predictor: X₁)

| degree | Terms |
|--------|-------|
| 1 | X₁ |
| 2 | X₁, X₁² |
| 3 | X₁, X₁², X₁³ |

#### MLR (2 predictors: X₁, X₂) — no PolynomialFeatures

| degree | Terms |
|--------|-------|
| 1 | X₁, X₂ |

#### PolynomialFeatures with `interaction_only=True`

| degree | Terms | Notes |
|--------|-------|-------|
| 1 | X₁, X₂ | main effects only |
| 2 | X₁, X₂, X₁X₂ | adds cross-term |
| 3 | X₁, X₂, X₁X₂ | same as degree=2* |

*With only 2 features, degree=3 adds nothing new for `interaction_only=True` since a 3-way interaction requires 3 distinct features.

#### PolynomialFeatures with `interaction_only=False` (default)

| degree | Terms |
|--------|-------|
| 1 | X₁, X₂ |
| 2 | X₁, X₂, X₁², X₂², X₁X₂ |
| 3 | X₁, X₂, X₁², X₂², X₁X₂, X₁³, X₂³, X₁²X₂, X₁X₂² |

#### Summary

What each setting excludes/includes

| Setting | Squared terms (X₁²) | Cross-terms (X₁X₂) | Higher powers (X₁³) | Mixed powers (X₁²X₂) |
|---------|---------------------|--------------------|--------------------|----------------------|
| `interaction_only=True` | ❌ | ✅ | ❌ | ❌ |
| `interaction_only=False` | ✅ | ✅ | ✅ | ✅ |

### The `include_bias` Parameter

`include_bias=True` adds a column of 1s to the feature matrix; its coefficient is the constant term, β₀.

| `include_bias` | Terms for degree=2, interaction_only=False |
|----------------|-------------------------------------------|
| `False` | X₁, X₂, X₁², X₂², X₁X₂ |
| `True` | **1**, X₁, X₂, X₁², X₂², X₁X₂ |

#### Why it matters

| Scenario | `include_bias` | Why |
|----------|----------------|-----|
| Using with `LinearRegression()` | `False` | LR adds its own intercept via `fit_intercept=True` (default) |
| Manual matrix math (X'X)⁻¹X'y | `True` | You need the 1s column to estimate β₀ |
| Using a model with no intercept | `True` | Must provide the constant term yourself |
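To illustrate the "manual matrix math" row, the sketch below solves the normal equations with the 1s column supplied by `include_bias=True` and compares the result with `LinearRegression` fitted on the same terms without that column. The data are simulated with invented coefficients, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# simulated data with invented coefficients, for illustration only
rng = np.random.default_rng(42)
X_raw = rng.normal(size=(40, 2))
y_sim = (
    1.0
    + 2.0 * X_raw[:, 0]
    - 3.0 * X_raw[:, 1]
    + 0.5 * X_raw[:, 0] * X_raw[:, 1]
    + rng.normal(scale=0.1, size=40)
)

# manual route: the design matrix needs the column of 1s to estimate the intercept
A = PolynomialFeatures(degree=2, include_bias=True).fit_transform(X_raw)
beta = np.linalg.solve(A.T @ A, A.T @ y_sim)  # solves (X'X) beta = X'y

# sklearn route: no 1s column; LinearRegression adds the intercept itself
B = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_raw)
model_demo = LinearRegression().fit(B, y_sim)

print(np.allclose(beta[0], model_demo.intercept_, atol=1e-6))  # intercepts agree
print(np.allclose(beta[1:], model_demo.coef_, atol=1e-6))      # slopes agree
```

Using `np.linalg.solve` avoids forming the explicit inverse, which is generally the numerically safer way to evaluate (X'X)⁻¹X'y.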

#### The danger of getting it wrong

```python
# BAD: redundant intercept → multicollinearity
poly = PolynomialFeatures(degree=2, include_bias=True)  # adds 1s column
model = LinearRegression(fit_intercept=True)             # adds another intercept

# GOOD: let LinearRegression handle it
poly = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression()  # fit_intercept=True by default
```

#### Bottom line

For typical sklearn workflows, always use `include_bias=False` and let the regression model handle the intercept.