<a href="https://colab.research.google.com/github/harunpirim/IME775/blob/main/week-10/notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---

# Week 10: Principles of Nonlinear Feature Engineering
**IME775: Data Driven Modeling and Optimization**
📖 **Reference**: Watt, Borhani, & Katsaggelos (2020). *Machine Learning Refined* (2nd ed.), **Chapter 10**
---
## Learning Objectives
- Understand limitations of linear models
- Apply polynomial feature transformation
- Implement nonlinear regression and classification
- Connect feature engineering to model capacity


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

## Introduction (Section 10.1)
### The Limitation of Linear Models
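Linear models have the form $f(x) = w^T \tilde{x}$, where $\tilde{x}$ is the input $x$ with a 1 prepended for the bias term.
They can only represent **linear** relationships between input and output.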
### The Solution: Feature Engineering
Transform the input features nonlinearly, then apply a linear model:
$$f(x) = w^T \phi(x)$$
where $\phi(x)$ is a nonlinear feature transformation. The model is nonlinear in $x$ but remains linear in the weights $w$.
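
As a minimal sketch of this idea, the snippet below builds a feature map by hand and fits the weights with least squares. The helper name `phi` and the choice $\phi(x) = [1, x, x^2]$ are assumptions made for illustration, not something prescribed by the text.

```python
import numpy as np

def phi(x):
    """Hypothetical feature map for illustration: phi(x) = [1, x, x^2]."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x**2])

# Noise-free quadratic target, so the fit can be checked by eye
x = np.linspace(-3, 3, 100)
y = x**2

Phi = phi(x)                                  # design matrix, shape (100, 3)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # least-squares weights
f = Phi @ w                                   # model output f(x) = w^T phi(x)
print("weights:", np.round(w, 6))
```

Because the target here is exactly quadratic and noise-free, the recovered weights should be close to $[0, 0, 1]$.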


In [None]:
# Quadratic data with Gaussian noise: a straight line cannot fit it well
np.random.seed(42)
x = np.linspace(-3, 3, 100)
y = x**2 + 0.5 * np.random.randn(100)
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Linear fit
ax1 = axes[0]
coeffs_linear = np.polyfit(x, y, 1)
y_linear = np.polyval(coeffs_linear, x)
ax1.scatter(x, y, alpha=0.7, s=30)
ax1.plot(x, y_linear, 'r-', linewidth=2, label='Linear fit')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.set_title('Linear Model: Poor Fit')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Polynomial fit
ax2 = axes[1]
coeffs_poly = np.polyfit(x, y, 2)
y_poly = np.polyval(coeffs_poly, x)
ax2.scatter(x, y, alpha=0.7, s=30)
ax2.plot(x, y_poly, 'g-', linewidth=2, label='Polynomial (degree 2)')
ax2.set_xlabel('x')
ax2.set_ylabel('y')
ax2.set_title('Polynomial Features: Good Fit')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
fig

## Nonlinear Regression (Section 10.2)
### Polynomial Features
Transform $x$ into $\phi(x) = [1, x, x^2, \ldots, x^D]$
The model becomes:
$$f(x) = w_0 + w_1 x + w_2 x^2 + \cdots + w_D x^D$$
**Still linear in parameters** $w$, so we can use least squares!
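
One way to see why least squares still applies is to write out the cost. Assume $P$ training pairs $(x_p, y_p)$; the symbols $P$, $g$, and $\Phi$ are notation introduced here for illustration. The cost is quadratic in $w$:

$$g(w) = \frac{1}{P} \sum_{p=1}^{P} \left( w^T \phi(x_p) - y_p \right)^2$$

Setting its gradient to zero gives a linear system in $w$:

$$\Phi^T \Phi \, w = \Phi^T y$$

where row $p$ of $\Phi$ is $\phi(x_p)^T$ and $y$ stacks the outputs $y_p$. Only the entries of $\Phi$ depend nonlinearly on $x$; the system itself is linear.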
### Multi-Dimensional Polynomials
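For $x \in \mathbb{R}^n$, include all monomials up to total degree $D$. For example, with $n = 2$:
- Degree 1: $x_1, x_2$
- Degree 2: $x_1^2, x_1 x_2, x_2^2$
- Degree 3: $x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3$, and so on up to degree $D$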
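
The monomials in the list above can be enumerated programmatically. The sketch below is one possible approach using only the standard library; the function name `monomials` is hypothetical. Each monomial is represented by its tuple of exponents, so `(1, 1)` stands for $x_1 x_2$.

```python
from itertools import combinations_with_replacement
from math import comb

def monomials(n, D):
    """Exponent tuples of all monomials in n variables with total degree <= D."""
    terms = []
    for d in range(D + 1):
        # Choose d variable indices with repetition; count how often each appears
        for combo in combinations_with_replacement(range(n), d):
            exps = [0] * n
            for i in combo:
                exps[i] += 1
            terms.append(tuple(exps))
    return terms

terms = monomials(2, 2)
print(terms)
print("count:", len(terms), "binomial check:", comb(2 + 2, 2))
```

Including the constant term, the number of monomials is $\binom{n + D}{D}$, so the feature count grows quickly with both $n$ and $D$.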


In [None]:
# Polynomial regression example
np.random.seed(42)
n = 50
X_train = np.sort(np.random.uniform(-3, 3, n)).reshape(-1, 1)
y_train = np.sin(X_train.ravel()) + 0.3 * np.random.randn(n)
X_test = np.linspace(-3, 3, 200).reshape(-1, 1)
fig2, ax2 = plt.subplots(figsize=(10, 6))
ax2.scatter(X_train, y_train, alpha=0.7, label='Training data')
for degree in [1, 3, 9]:
    poly = PolynomialFeatures(degree=degree)
    X_poly_train = poly.fit_transform(X_train)
    X_poly_test = poly.transform(X_test)
    model = LinearRegression()
    model.fit(X_poly_train, y_train)
    y_pred = model.predict(X_poly_test)
    ax2.plot(X_test, y_pred, linewidth=2, label=f'Degree {degree}')
ax2.plot(X_test, np.sin(X_test), 'k--', linewidth=1, alpha=0.5, label='True function')
ax2.set_xlabel('x')
ax2.set_ylabel('y')
ax2.set_title('Polynomial Regression with Different Degrees (ML Refined, Section 10.2)')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_ylim(-2, 2)
fig2

## Nonlinear Two-Class Classification (Section 10.4)
### The Idea
Apply the same feature transformation, then a linear classifier:
$$f(x) = \text{sign}(w^T \phi(x))$$
### Example: XOR Problem
The XOR labeling is not linearly separable in $\mathbb{R}^2$, but it becomes linearly separable once the product feature $x_1 x_2$ is added: the sign of $x_1 x_2$ identifies the class.
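
A minimal sketch of this claim on the four noise-free XOR corner points follows. The labels are encoded as $\pm 1$ to match the sign classifier (an assumption; the notebook cell uses 0 and 1), and the weight vector `w` is chosen by hand rather than learned.

```python
import numpy as np

# Four XOR corner points; same-sign corners form one class
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])

def phi(X):
    """Hypothetical feature map: [1, x1, x2, x1*x2]."""
    return np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])

# Hand-picked weights: only the product feature matters
w = np.array([0.0, 0.0, 0.0, 1.0])
pred = np.sign(phi(X) @ w).astype(int)
print("predictions:", pred)
print("labels:     ", y)
```

On noisy samples around these corners, the classes overlap, so the same classifier will misclassify some points near the axes.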


In [None]:
# XOR problem
np.random.seed(42)
n_per_class = 50
# Generate XOR data
X_xor = np.vstack([
    np.random.randn(n_per_class, 2) + [1, 1],
    np.random.randn(n_per_class, 2) + [-1, -1],
    np.random.randn(n_per_class, 2) + [1, -1],
    np.random.randn(n_per_class, 2) + [-1, 1]
])
y_xor = np.array([0]*2*n_per_class + [1]*2*n_per_class)
fig3, axes = plt.subplots(1, 2, figsize=(14, 5))
# Original space
ax1 = axes[0]
ax1.scatter(X_xor[y_xor==0, 0], X_xor[y_xor==0, 1], c='blue', s=30, alpha=0.7)
ax1.scatter(X_xor[y_xor==1, 0], X_xor[y_xor==1, 1], c='red', s=30, alpha=0.7)
ax1.set_xlabel('$x_1$')
ax1.set_ylabel('$x_2$')
ax1.set_title('XOR Problem: Not Linearly Separable')
ax1.grid(True, alpha=0.3)
# Add feature x1*x2
X_xor_poly = np.column_stack([X_xor, X_xor[:, 0] * X_xor[:, 1]])
ax2 = axes[1]
ax2.scatter(X_xor_poly[y_xor==0, 0], X_xor_poly[y_xor==0, 2], c='blue', s=30, alpha=0.7)
ax2.scatter(X_xor_poly[y_xor==1, 0], X_xor_poly[y_xor==1, 2], c='red', s=30, alpha=0.7)
ax2.axhline(0, color='black', linewidth=2, linestyle='--')
ax2.set_xlabel('$x_1$')
ax2.set_ylabel('$x_1 \\cdot x_2$')
ax2.set_title('With Feature $x_1 x_2$: Nearly Linearly Separable')
ax2.grid(True, alpha=0.3)
plt.tight_layout()
fig3

## The Bias-Variance Trade-off
### Model Complexity
| Aspect | Low Complexity | High Complexity |
|--------|----------------|-----------------|
| Features | Few | Many |
| Degree | Low | High |
| Bias | High | Low |
| Variance | Low | High |
| Training error | High | Low |
| Test error | High (underfitting) | High (overfitting) |
### The Goal
Find the **sweet spot** where total error (bias² + variance) is minimized.
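
One rough way to look for that sweet spot is to compare training and test error across polynomial degrees. The sketch below assumes the same noisy sine setup as the regression example, with a different random seed and `np.polyfit` swapped in for the scikit-learn pipeline. The dictionary names `train_mse` and `test_mse` are hypothetical, and test error is measured against the noise-free sine.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x_train = np.sort(rng.uniform(-3, 3, n))
y_train = np.sin(x_train) + 0.3 * rng.standard_normal(n)

# Test targets are the noise-free function on a dense grid
x_test = np.linspace(-3, 3, 200)
y_test = np.sin(x_test)

train_mse, test_mse = {}, {}
for degree in [1, 3, 9]:
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    train_mse[degree] = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse[degree] = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse[degree]:.4f}, "
          f"test MSE = {test_mse[degree]:.4f}")
```

Training error can only decrease as the degree grows, because each higher-degree model contains the lower-degree ones. Test error need not follow, which is why a held-out set is used to choose the degree.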


## Summary
| Concept | Key Idea |
|---------|----------|
| **Nonlinear features** | Transform inputs, keep linear model |
| **Polynomial features** | $\phi(x) = [1, x, x^2, \ldots]$ |
| **Model capacity** | Controlled by feature complexity |
| **Trade-off** | Balance bias and variance |
---
## References
- **Primary**: Watt, J., Borhani, R., & Katsaggelos, A. K. (2020). *Machine Learning Refined* (2nd ed.), Chapter 10.
- **Supplementary**: Bishop, C. M. (2006). *Pattern Recognition and Machine Learning*, Chapter 3.
## Next Week
**Principles of Feature Learning & Cross-Validation** (Chapter 11): Automatic feature learning and model selection.
