<a href="https://colab.research.google.com/github/lubaochuan/ml_python/blob/main/ISLP_chapter3_assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Understanding Linear Regression

This guided lab builds intuition for linear regression through **hands-on experiments**.
You will explore:
- Model fitting and interpretation
- R² and model evaluation
- Train/test splits and generalization
- Bias–variance intuition with interactive sliders
- Failure cases: nonlinearity and outliers

👉**Student name:**

👉**Date:**

## Step 1: Setup and Data Generation

In [None]:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from ipywidgets import interact, IntSlider

np.random.seed(42)

X = np.linspace(0, 10, 40)
y = 3 * X + 5 + np.random.normal(0, 3, size=len(X))

X = X.reshape(-1, 1)

## Step 2: Visualize the Data

In [None]:

plt.scatter(X, y)
plt.xlabel("X")
plt.ylabel("y")
plt.title("Observed Data")
plt.show()


## Step 3: Fit a Simple Linear Regression Model

In [None]:

model = LinearRegression()
model.fit(X, y)

y_pred = model.predict(X)

print("Intercept:", model.intercept_)
print("Slope:", model.coef_[0])


## Step 4: R² (Goodness of Fit)

In [None]:

r2 = r2_score(y, y_pred)
print("R² score:", r2)



**Interpretation of R²:**
- Measures the fraction of the variation in y that the model explains
- R² = 1 means the model explains all of the variation (every prediction matches the observed value)
- High R² does NOT guarantee good predictions on new data
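
To make the definition concrete, the sketch below computes R² by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, then compares it with `r2_score`. The names `X_demo`, `y_demo`, and `r2_manual` are illustrative and not part of the lab; the data is regenerated with a local random generator so the cell runs on its own, which means the numbers will differ slightly from Step 4.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Assumption: regenerate data shaped like the lab's (y = 3x + 5 + noise)
# with a local generator, so this cell does not depend on earlier cells.
rng = np.random.default_rng(42)
X_demo = np.linspace(0, 10, 40).reshape(-1, 1)
y_demo = 3 * X_demo.ravel() + 5 + rng.normal(0, 3, size=40)

y_hat = LinearRegression().fit(X_demo, y_demo).predict(X_demo)

# Residual sum of squares: variation the model leaves unexplained.
ss_res = np.sum((y_demo - y_hat) ** 2)
# Total sum of squares: variation of y around its mean.
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot

print("Manual R²: ", r2_manual)
print("sklearn R²:", r2_score(y_demo, y_hat))
```

The two printed values should agree up to floating-point rounding, since `r2_score` uses the same ratio.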


## Step 5: Train/Test Split (Generalization)

In [None]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model_tt = LinearRegression()
model_tt.fit(X_train, y_train)

train_r2 = r2_score(y_train, model_tt.predict(X_train))
test_r2 = r2_score(y_test, model_tt.predict(X_test))

print("Training R²:", train_r2)
print("Test R²:", test_r2)
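
A single split gives only one estimate of test performance, and with 40 points that estimate can move noticeably depending on which points land in the test set. The sketch below repeats the split with different `random_state` values and summarizes the spread of test R². The names `X_demo`, `y_demo`, and `test_scores`, and the choice of 20 repetitions, are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Assumption: data shaped like the lab's, regenerated locally.
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 10, 40).reshape(-1, 1)
y_demo = 3 * X_demo.ravel() + 5 + rng.normal(0, 3, size=40)

test_scores = []
for seed in range(20):  # 20 different random splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_demo, y_demo, test_size=0.3, random_state=seed
    )
    fitted = LinearRegression().fit(X_tr, y_tr)
    test_scores.append(r2_score(y_te, fitted.predict(X_te)))

print("Test R² min: ", min(test_scores))
print("Test R² mean:", np.mean(test_scores))
print("Test R² max: ", max(test_scores))
```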



**Questions:**
1. Why is test R² usually lower?
2. What does this say about generalization?

👉**Your answer:**

## Step 6: Bias–Variance Intuition with Model Complexity

In [None]:

def polynomial_fit(degree):
    coeffs = np.polyfit(X.flatten(), y, degree)
    y_hat = np.polyval(coeffs, X)

    plt.scatter(X, y, label="Data")
    plt.plot(X, y_hat, label=f"Degree {degree}")
    plt.legend()
    plt.title("Bias–Variance Tradeoff")
    plt.show()

interact(polynomial_fit, degree=IntSlider(min=1, max=10, step=1, value=1))



**Interpretation:**
- Low degree → high bias (underfitting)
- High degree → high variance (overfitting)
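
The slider shows the fitted curve but not how each degree scores on held-out data. As a rough numerical companion, the sketch below fits polynomials of several degrees on a training split and reports training and test R² for each. The degrees tried (1, 2, 4, 8) and the names `x_demo`, `y_demo`, and `results` are assumptions for illustration, and the data is regenerated locally.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Assumption: data shaped like the lab's (truly linear plus noise).
rng = np.random.default_rng(42)
x_demo = np.linspace(0, 10, 40)
y_demo = 3 * x_demo + 5 + rng.normal(0, 3, size=x_demo.size)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.3, random_state=1
)

results = {}  # degree -> (train R², test R²)
for degree in (1, 2, 4, 8):
    coeffs = np.polyfit(x_tr, y_tr, degree)  # fit on training data only
    train_r2 = r2_score(y_tr, np.polyval(coeffs, x_tr))
    test_r2 = r2_score(y_te, np.polyval(coeffs, x_te))
    results[degree] = (train_r2, test_r2)
    print(f"degree={degree}: train R²={train_r2:.3f}, test R²={test_r2:.3f}")
```

Training R² cannot decrease as the degree grows, because each higher-degree model contains the lower-degree ones. Test R² carries no such guarantee, and a widening gap between the two is the usual sign of overfitting.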


## Step 7: Failure Case – Nonlinear Relationship

In [None]:

X_nl = np.linspace(0, 10, 40)
y_nl = X_nl**2 + np.random.normal(0, 5, size=len(X_nl))

X_nl = X_nl.reshape(-1, 1)

model_nl = LinearRegression()
model_nl.fit(X_nl, y_nl)

plt.scatter(X_nl, y_nl)
plt.plot(X_nl, model_nl.predict(X_nl))
plt.title("Linear Model on Nonlinear Data")
plt.show()



**Key idea:** Linear models struggle when the true relationship is nonlinear.
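
One common remedy, sketched here under the assumption that the curvature is roughly quadratic, is to keep the linear model but give it a squared feature. The model stays linear in its coefficients while the fitted curve bends. The names `x_nl`, `X_quad`, `r2_lin`, and `r2_quad` are illustrative, and the data is regenerated locally.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Assumption: quadratic data shaped like Step 7's, regenerated locally.
rng = np.random.default_rng(0)
x_nl = np.linspace(0, 10, 40)
y_nl = x_nl**2 + rng.normal(0, 5, size=x_nl.size)

X_lin = x_nl.reshape(-1, 1)                # feature: x
X_quad = np.column_stack([x_nl, x_nl**2])  # features: x and x²

lin = LinearRegression().fit(X_lin, y_nl)
quad = LinearRegression().fit(X_quad, y_nl)

r2_lin = r2_score(y_nl, lin.predict(X_lin))
r2_quad = r2_score(y_nl, quad.predict(X_quad))

print("R² with x only:  ", r2_lin)
print("R² with x and x²:", r2_quad)
```

The straight-line fit can still report a fairly high R² on this data, which is one reason to plot the fit or the residuals rather than rely on R² alone.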


## Step 8: Failure Case – Outliers

In [None]:

X_out = X.copy()
y_out = y.copy()

# Add extreme outliers
y_out[0] += 30
y_out[-1] -= 25

model_out = LinearRegression()
model_out.fit(X_out, y_out)

plt.scatter(X_out, y_out)
plt.plot(X_out, model_out.predict(X_out))
plt.title("Effect of Outliers on Linear Regression")
plt.show()



**Key idea:** Least squares is sensitive to extreme outliers.
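
For comparison, the sketch below fits the same kind of contaminated data with ordinary least squares and with scikit-learn's `HuberRegressor`, which penalizes large residuals linearly instead of quadratically. This is not part of the original lab. `HuberRegressor` is used with its default settings, and the names `x_demo`, `y_demo`, `ols_slope`, and `huber_slope` are illustrative. With outliers like those in Step 8, the Huber slope should typically land closer to the true slope of 3.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# Assumption: data shaped like the lab's, with the same two outliers added.
rng = np.random.default_rng(42)
x_demo = np.linspace(0, 10, 40)
y_demo = 3 * x_demo + 5 + rng.normal(0, 3, size=x_demo.size)
y_demo[0] += 30   # pull the left end up
y_demo[-1] -= 25  # pull the right end down

X_demo = x_demo.reshape(-1, 1)

ols = LinearRegression().fit(X_demo, y_demo)
huber = HuberRegressor().fit(X_demo, y_demo)  # default epsilon and alpha

ols_slope = ols.coef_[0]
huber_slope = huber.coef_[0]

print("True slope:  3")
print("OLS slope:  ", ols_slope)
print("Huber slope:", huber_slope)
```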


## Questions


1. What does R² measure, and what does it not measure?
2. Why is test performance more important than training performance?
3. How does model complexity relate to bias and variance?
4. Why does linear regression fail on nonlinear data?
5. Why are outliers especially problematic for linear regression?

👉**Your answer:**

## Key Takeaways


- Linear regression is powerful but limited
- R² must be interpreted carefully
- Generalization matters more than fit
- Bias–variance tradeoff guides model choice
- Understanding failure cases is essential
