# Unit 3: Polynomial Regression

This notebook fits a degree-2 polynomial regression to model the non-linear relationship between engine volume and CO₂ emissions, then evaluates the fit with R² on a held-out test set.
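A degree-2 polynomial regression is still a linear regression, but on an expanded feature set: the model is CO2 ≈ b0 + b1·Volume + b2·Volume². The sketch below shows what that expansion looks like with `PolynomialFeatures`. The `volumes` array is a hypothetical toy input, not data from `cars.csv`, and the names `poly_demo` and `expanded` are used only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy engine volumes (hypothetical values, not taken from cars.csv)
volumes = np.array([[1.0], [2.0], [3.0]])

# Expand the single column x into the columns [1, x, x^2]
poly_demo = PolynomialFeatures(degree=2)
expanded = poly_demo.fit_transform(volumes)

# The leading column of ones is the bias term that PolynomialFeatures adds by default
print(expanded)
```

Each input row `x` becomes `[1, x, x²]`, so `LinearRegression` can fit a curve in `x` while remaining linear in its coefficients.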

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

In [None]:
# Load dataset (make sure cars.csv is in the same directory)
df = pd.read_csv('cars.csv')
df[['Volume', 'CO2']].head()

In [None]:
# Select feature and target
X = df[['Volume']].values
y = df['CO2'].values

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Create polynomial features (degree 2)
poly = PolynomialFeatures(degree=2)
X_poly_train = poly.fit_transform(X_train)
X_poly_test = poly.transform(X_test)

# Fit the model
model = LinearRegression()
model.fit(X_poly_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_poly_test)
print("R² Score:", r2_score(y_test, y_pred))

In [None]:
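# Visualize results: test points, their predictions, and the fitted curve
# evaluated on an evenly spaced grid over the observed volume range
X_grid = np.linspace(X.min(), X.max(), 200).reshape(-1, 1)
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.scatter(X_test, y_pred, color='red', label='Predicted')
plt.plot(X_grid, model.predict(poly.transform(X_grid)), color='red', alpha=0.5, label='Fitted curve')
plt.xlabel('Engine Volume')
plt.ylabel('CO2 Emissions')
plt.title('Polynomial Regression (Degree 2)')
plt.legend()
plt.grid(True)
plt.show()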

## 🧠 Reflection

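- **What?** Fitted a degree-2 polynomial regression to capture curvature in the relationship between engine volume and CO₂ emissions, and scored it with R² on a 30% held-out test set.
- **So What?** Adding polynomial terms lets a linear model fit real-world data that does not follow a straight-line trend; the model stays linear in its coefficients, so it is fitted exactly like ordinary linear regression.
- **What Next?** Experiment with higher-degree polynomials, and use regularization or cross-validation to keep the extra flexibility from overfitting.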
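One possible way to run the "What Next?" experiment is sketched below: it compares several polynomial degrees under 5-fold cross-validation, with a Ridge (L2) penalty standing in for "regularization". This is a minimal sketch under stated assumptions, not a tuned setup. `X_demo` and `y_demo` are synthetic stand-ins generated here, not the real `cars.csv` columns, and the degrees tried and `alpha=1.0` are arbitrary starting points.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the Volume/CO2 columns (assumption: NOT the real data)
rng = np.random.default_rng(0)
X_demo = rng.uniform(1.0, 2.5, size=(60, 1))
y_demo = 80 + 10 * X_demo[:, 0] + 4 * X_demo[:, 0] ** 2 + rng.normal(0, 3, size=60)

cv_scores = {}
for degree in (1, 2, 5, 9):
    pipeline = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),   # put the polynomial terms on a comparable scale
        Ridge(alpha=1.0),   # L2 penalty; alpha=1.0 is an arbitrary starting value
    )
    # Mean R² across 5 folds; each fold is scored on data the model did not see
    scores = cross_val_score(pipeline, X_demo, y_demo, cv=5, scoring='r2')
    cv_scores[degree] = scores.mean()
    print(f"degree={degree}: mean CV R² = {cv_scores[degree]:.3f}")
```

To try this on the notebook's data, replace `X_demo` and `y_demo` with `X` and `y` from the cells above. Scaling inside the pipeline matters because high powers of the feature can differ by orders of magnitude, which would otherwise make the Ridge penalty act unevenly across terms.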