# Polynomial Regression

When you apply `PolynomialFeatures` from scikit-learn to a dataset with, say, $n = 9$ features $(X_1, X_2, \ldots, X_9)$, it generates new features consisting of all polynomial and interaction terms up to the specified `degree`.

### How `PolynomialFeatures` Works
- **Input**: An example dataset with 9 numerical features: $[X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8, X_9]$.
- **Parameters**:
  - `degree`: The maximum total degree of the generated terms (e.g., `degree=2` generates terms up to $X_i^2$ and $X_i X_j$).
  - `interaction_only`: 
    - If `False` (default), includes all polynomial terms and interactions. 
    - If `True`, only includes interaction terms (e.g., $X_1 \cdot X_2$, but not $X_1^2$).
  - `include_bias`: If `True` (default), includes a constant term (1) as a feature.
- **Output**: A new feature matrix containing:
  - The original features.
  - Polynomial terms (e.g., $X_1^2, X_1^3, \ldots$) for each feature, up to `degree`.
  - Interaction terms (e.g., $X_1 \cdot X_2, X_1 \cdot X_3, \ldots$) for all feature combinations, up to `degree`.
  - A bias term (if `include_bias=True`).

### Number of Features Generated
The number of output features depends on `degree`, `interaction_only`, and `include_bias`. With `interaction_only=False`, every output column is a monomial $X_1^{a_1} X_2^{a_2} \cdots X_n^{a_n}$ whose exponents are non-negative integers with $0 \leq a_1 + a_2 + \cdots + a_n \leq d$, where $d$ is the `degree`. Counting these exponent combinations for $n$ features (here, $n = 9$) gives the total number of output features, including the bias term (the monomial with all $a_i = 0$):

$$\binom{n + d}{d}$$

For example, $n = 9$ and $d = 4$ give $\binom{13}{4} = 715$.

If `include_bias=False`, subtract 1 from the result. 
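
As a sanity check, the closed form can be compared with the `n_output_features_` attribute exposed by a fitted `PolynomialFeatures`. This sketch uses random placeholder data with 9 columns, since only the number of columns affects the count; the helper `n_poly_features` is a hypothetical name introduced here.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_features(n, d, include_bias=True):
    """Hypothetical helper: number of monomials of total degree <= d in n variables."""
    count = comb(n + d, d)
    return count if include_bias else count - 1


n = 9
X = np.random.rand(5, n)  # placeholder data; only the number of columns matters

for d in (1, 2, 3, 4):
    for include_bias in (True, False):
        poly = PolynomialFeatures(degree=d, include_bias=include_bias).fit(X)
        expected = n_poly_features(n, d, include_bias)
        # The fitted transformer reports how many columns transform() will produce
        assert poly.n_output_features_ == expected
        print(f"degree={d}, include_bias={include_bias}: {expected} features")
```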

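If `interaction_only=True`, each output column is a product of distinct features (no feature appears with an exponent above 1), so the count becomes the number of ways to choose at most $d$ of the $n$ features: $\sum_{k=0}^{\min(n, d)} \binom{n}{k}$, including the bias (again, subtract 1 if `include_bias=False`).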
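
With `interaction_only=True`, each output column multiplies $k$ distinct features with $0 \le k \le \min(n, d)$, so the count including the bias is $\sum_{k=0}^{\min(n,d)} \binom{n}{k}$. A sketch that checks this against scikit-learn on placeholder data (the helper `n_interaction_features` is hypothetical):

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_interaction_features(n, d, include_bias=True):
    """Hypothetical helper: count products of k distinct features for k = 0..min(n, d)."""
    count = sum(comb(n, k) for k in range(min(n, d) + 1))
    return count if include_bias else count - 1


n = 9
X = np.random.rand(5, n)  # placeholder data; only the number of columns matters

for d in (1, 2, 3, 4):
    poly = PolynomialFeatures(degree=d, interaction_only=True).fit(X)
    assert poly.n_output_features_ == n_interaction_features(n, d)
    print(f"degree={d}: {poly.n_output_features_} features")
```

For 9 features this gives 46 columns at `degree=2` and 256 at `degree=4`, far fewer than the 55 and 715 of the full expansion.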

### Recommendations
- **Scaling**: Apply `StandardScaler` or `MinMaxScaler` **before** `PolynomialFeatures` so that features are on comparable scales and the higher powers of large-valued features do not dominate.
- **Feature Selection**: With 9 features and `degree=4`, the output already has 715 columns (including the bias), so consider dimensionality reduction (e.g., PCA) or feature selection (e.g., `SelectKBest`) after `PolynomialFeatures` to limit overfitting and computational cost.
- **Sparsity**: If your data is sparse, pass it in as a SciPy sparse matrix (CSR is preferred); `PolynomialFeatures` accepts sparse input directly and returns a sparse result, so sparsity is preserved.
- **Validation**: Test lower degrees (e.g., 2 or 3) first, as `degree=4` with 9 features may overfit unless you have a large dataset.
- **`include_bias=False`**: Use this when the downstream model already fits an intercept, as `LinearRegression(fit_intercept=True)` (the default) does, to avoid a redundant constant column.
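
The recommendations above can be combined in one scikit-learn `Pipeline`. This is a sketch on synthetic stand-in data (9 random features and a made-up target with one squared term and one interaction), not a tuned model: it scales first, expands without a bias column because `Ridge` fits its own intercept, and lets 5-fold cross-validation choose among low degrees. The last lines check that a SciPy sparse input stays sparse.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data: 9 features, target uses X1^2 and X2*X3 plus small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 9))
y = X[:, 0] ** 2 + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=500)

pipe = Pipeline([
    ("scale", StandardScaler()),                       # scale before expanding
    ("poly", PolynomialFeatures(include_bias=False)),  # Ridge fits its own intercept
    ("model", Ridge(alpha=1.0)),
])

# Try low degrees only; GridSearchCV scores each with 5-fold cross-validated R^2
search = GridSearchCV(pipe, {"poly__degree": [1, 2, 3]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))

# Sparse input: a CSR matrix goes in, a sparse matrix comes out
X_sparse = sparse.random(100, 9, density=0.05, format="csr", random_state=0)
X_sparse_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_sparse)
print(sparse.issparse(X_sparse_poly), X_sparse_poly.shape)
```

Using `Ridge` rather than plain `LinearRegression` is a design choice here: the penalty keeps the many extra columns from `degree=3` in check.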


```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Simulated data: 2 samples, 3 features
np.random.seed(282)
X = np.random.rand(2, 3)
print("X")
print(X)
```

```
X
[[0.36842629 0.39172236 0.12117024]
 [0.30199636 0.92621505 0.99207497]]
```


```python
# degree=1, include_bias=False, interaction_only=False
# Yields exactly the original features, so the output equals X
poly1FF = PolynomialFeatures(degree=1, include_bias=False, interaction_only=False)
X_poly1 = poly1FF.fit_transform(X)
features = poly1FF.get_feature_names_out(input_features=['X1', 'X2', 'X3'])
print("XPoly")
print(features)
print(X_poly1.round(3))
```

```
XPoly
['X1' 'X2' 'X3']
[[0.368 0.392 0.121]
 [0.302 0.926 0.992]]
```


```python
# Confirm the degree-1 output is identical to the input
np.allclose(X, X_poly1)
```

```
True
```

```python
# degree=1, include_bias=True, interaction_only=False
# Prepends a constant column of 1s (named '1') to the original features
poly1TF = PolynomialFeatures(degree=1, include_bias=True, interaction_only=False)
X_poly2 = poly1TF.fit_transform(X)
features = poly1TF.get_feature_names_out(input_features=['X1', 'X2', 'X3'])
print("XPoly2")
print(features)
print(X_poly2.round(3))
```

```
XPoly2
['1' 'X1' 'X2' 'X3']
[[1.    0.368 0.392 0.121]
 [1.    0.302 0.926 0.992]]
```


```python
# degree=1, include_bias=False, interaction_only=True
# With degree=1 there are no products to form, so the output again equals X
poly1FT = PolynomialFeatures(degree=1, include_bias=False, interaction_only=True)
X_poly1FT = poly1FT.fit_transform(X)
features = poly1FT.get_feature_names_out(input_features=['X1', 'X2', 'X3'])
print("XPoly1FT")
print(features)
print(X_poly1FT.round(3))
```

```
XPoly1FT
['X1' 'X2' 'X3']
[[0.368 0.392 0.121]
 [0.302 0.926 0.992]]
```


```python
# degree=3, include_bias=False, interaction_only=False
# 3 features -> C(3+3, 3) - 1 = 19 columns (no bias)
poly3FF = PolynomialFeatures(degree=3, include_bias=False, interaction_only=False)
X_poly3FF = poly3FF.fit_transform(X)
features = poly3FF.get_feature_names_out(input_features=['X1', 'X2', 'X3'])
print("XPoly3FF")
print(features)
print(X_poly3FF.round(3))
```

```
XPoly3FF
['X1' 'X2' 'X3' 'X1^2' 'X1 X2' 'X1 X3' 'X2^2' 'X2 X3' 'X3^2' 'X1^3'
 'X1^2 X2' 'X1^2 X3' 'X1 X2^2' 'X1 X2 X3' 'X1 X3^2' 'X2^3' 'X2^2 X3'
 'X2 X3^2' 'X3^3']
[[0.368 0.392 0.121 0.136 0.144 0.045 0.153 0.047 0.015 0.05  0.053 0.016
  0.057 0.017 0.005 0.06  0.019 0.006 0.002]
 [0.302 0.926 0.992 0.091 0.28  0.3   0.858 0.919 0.984 0.028 0.084 0.09
  0.259 0.277 0.297 0.795 0.851 0.912 0.976]]
```
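
Each output column is a product of the inputs raised to the exponents stored in the fitted `powers_` attribute (one row per output column, one column per input feature). This sketch rebuilds the degree-3 matrix from `powers_` with NumPy broadcasting, using the same seed as the cells above, to show that the transform is nothing more than these products:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(282)
X = np.random.rand(2, 3)

poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(X)

# powers_[j, i] is the exponent of input feature i in output column j
print(poly.powers_)

# Broadcast (2, 1, 3) ** (1, 19, 3) -> (2, 19, 3), then multiply across features
X_manual = np.prod(X[:, None, :] ** poly.powers_[None, :, :], axis=2)
print(np.allclose(X_poly, X_manual))
```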


```python
# Same settings on 4 features: C(4+3, 3) - 1 = 34 columns; only the names are printed
poly3FF = PolynomialFeatures(degree=3, include_bias=False, interaction_only=False)
X1 = np.random.rand(1, 4)
X_poly3FF = poly3FF.fit_transform(X1)
features = poly3FF.get_feature_names_out(input_features=['X1', 'X2', 'X3', 'X4'])
print(features)
```

```
['X1' 'X2' 'X3' 'X4' 'X1^2' 'X1 X2' 'X1 X3' 'X1 X4' 'X2^2' 'X2 X3' 'X2 X4'
 'X3^2' 'X3 X4' 'X4^2' 'X1^3' 'X1^2 X2' 'X1^2 X3' 'X1^2 X4' 'X1 X2^2'
 'X1 X2 X3' 'X1 X2 X4' 'X1 X3^2' 'X1 X3 X4' 'X1 X4^2' 'X2^3' 'X2^2 X3'
 'X2^2 X4' 'X2 X3^2' 'X2 X3 X4' 'X2 X4^2' 'X3^3' 'X3^2 X4' 'X3 X4^2'
 'X4^3']
```


### When to apply Polynomial Transformation
Polynomial regression (using `PolynomialFeatures` with a linear model such as `LinearRegression` or `Ridge`) is appropriate when:

1. **Non-linear Relationships:** The relationship between the features and the target (here, house value) is non-linear (e.g., house value may rise quadratically with income up to a point).
2. **Feature Interactions:** Products of features (e.g., `Latitude * Longitude`) capture combined effects that matter for the target.
3. **Sufficient Data:** The dataset (~20,640 samples) is large enough to support the additional features without severe overfitting, especially with regularization.
4. **Domain Knowledge:** Features like income or location are known to have non-linear or interactive effects on house prices.
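
To illustrate point 1, here is a sketch on synthetic data (not the housing data) where the target rises and then falls with a single feature. A plain linear fit cannot follow the curve, while a degree-2 pipeline can; the exact scores depend on the random seed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: y peaks at x = 5 and falls off on both sides, plus Gaussian noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=(300, 1))
y = -0.5 * (x[:, 0] - 5) ** 2 + 10 + rng.normal(scale=1.0, size=300)

linear = LinearRegression()
quadratic = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())

# Mean 5-fold cross-validated R^2 for each model
r2_linear = cross_val_score(linear, x, y, cv=5).mean()
r2_quadratic = cross_val_score(quadratic, x, y, cv=5).mean()
print(f"linear R^2: {r2_linear:.3f}, quadratic R^2: {r2_quadratic:.3f}")
```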

### When to Avoid Polynomial Regression

1. **Noisy or Weak Features:** Powers of noisy or weakly predictive features amplify noise; for example, squaring `Population` magnifies its outliers.
2. **High Dimensionality:** Applying high-degree polynomials to all features causes a feature explosion (e.g., `degree=2` for 8 features already yields 45 columns, including the bias), increasing the risk of overfitting and the computational cost.
3. **Categorical Features:** Powers of one-hot encoded columns such as the `ocean_proximity` categories add nothing: squaring a 0/1 column reproduces it, and the product of two columns from the same variable is always zero.