# Notebook 5 — Linear Regression (Simple & Multiple)

**Dataset:** House Prices — Advanced Regression Techniques (Kaggle)

**Purpose:** Demonstrate simple linear regression and multiple linear regression with feature selection, metrics, and residual analysis.

## Setup & Data note
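Download `train.csv` from the Kaggle House Prices competition and place it in the working directory. Another house-prices CSV also works, provided it uses the same column names (`SalePrice`, `GrLivArea`, and the features selected below).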

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

CSV = 'train.csv'
try:
    df = pd.read_csv(CSV)
    print('Loaded house prices dataset:', df.shape)
    display(df[['SalePrice']].head())
except Exception as e:
    print('Could not load train.csv — please download and place it in the working directory.\n', e)


## Simple Linear Regression example
Predict `SalePrice` from a single feature, here `GrLivArea` (above-ground living area in square feet).


In [None]:
try:
    X = df[['GrLivArea']].fillna(df['GrLivArea'].median())
    y = df['SalePrice']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    lr = LinearRegression()
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_test)
    # RMSE is the square root of MSE; this avoids the squared=False argument, deprecated in scikit-learn 1.4
    print('RMSE:', mean_squared_error(y_test, y_pred) ** 0.5)
    print('R2:', r2_score(y_test, y_pred))
except Exception as e:
    print('Simple regression failed — check column names.\n', e)
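
To interpret a fitted simple regression, the slope and intercept can be read directly off the model. The sketch below is a minimal illustration on a small synthetic stand-in for `GrLivArea` and `SalePrice`; the numbers and the names `toy` and `toy_model` are made up for this example and are not taken from the Kaggle data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption): price = 50_000 + 100 * area, with no noise.
area = np.array([800, 1200, 1500, 2000, 2500], dtype=float)
toy = pd.DataFrame({'GrLivArea': area, 'SalePrice': 50_000 + 100 * area})

toy_model = LinearRegression()
toy_model.fit(toy[['GrLivArea']], toy['SalePrice'])

slope = toy_model.coef_[0]        # change in predicted price per one-unit increase in GrLivArea
intercept = toy_model.intercept_  # predicted price at GrLivArea = 0 (an extrapolation)
print(f'slope: {slope:.2f}, intercept: {intercept:.2f}')
```

On the real dataset the same two attributes are available on `lr` after `lr.fit(...)`; the intercept is usually not meaningful on its own because no house has zero living area.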


## Multiple Regression example (basic feature selection)
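The feature list used in this section is chosen by hand. One simple way to arrive at such a list is to rank numeric columns by their absolute Pearson correlation with the target. The sketch below shows that idea on synthetic data; the helper `rank_by_correlation`, the `Unrelated` column, and all values are hypothetical and only stand in for the real dataset.

```python
import numpy as np
import pandas as pd

def rank_by_correlation(frame, target, top_n=5):
    """Return the top_n numeric columns ranked by |Pearson correlation| with target."""
    numeric = frame.select_dtypes(include='number')  # ignore string/categorical columns
    corr = numeric.corr()[target].drop(target)       # correlation of each feature with the target
    return corr.abs().sort_values(ascending=False).head(top_n)

# Synthetic stand-in data (assumption): price depends on quality and area, not on 'Unrelated'.
rng = np.random.default_rng(0)
n = 200
qual = rng.integers(1, 11, size=n).astype(float)
area = rng.normal(1500, 400, size=n)
price = 20_000 * qual + 80 * area + rng.normal(0, 10_000, size=n)
toy_houses = pd.DataFrame({
    'OverallQual': qual,
    'GrLivArea': area,
    'Unrelated': rng.normal(0, 1, size=n),
    'Street': ['Pave'] * n,   # non-numeric column, skipped by the helper
    'SalePrice': price,
})

ranking = rank_by_correlation(toy_houses, 'SalePrice', top_n=3)
print(ranking)
```

Correlation only captures linear, one-feature-at-a-time relationships, so a ranking like this is a starting point rather than a full feature-selection method.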

In [None]:
try:
    features = ['OverallQual','GrLivArea','GarageCars','TotalBsmtSF','FullBath']
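    # Take medians of the selected numeric features only; df.median() raises on text columns in pandas 2.x
    X = df[features].fillna(df[features].median())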
    y = df['SalePrice']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    lr = LinearRegression()
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_test)
    print('RMSE:', mean_squared_error(y_test, y_pred, squared=False))
    print('R2:', r2_score(y_test, y_pred))
except Exception as e:
    print('Multiple regression failed — check that the selected features exist.\n', e)
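
The purpose statement mentions residual analysis. A minimal sketch of one way to do it follows: compute residuals (observed minus predicted) on the test set and summarize them. The helper `residual_summary` and the synthetic data are assumptions made for illustration; with the real dataset, `y_test` and `y_pred` from the cell above could be passed in instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def residual_summary(y_true, y_pred):
    """Summarize residuals, defined as observed minus predicted."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    return {
        'mean': residuals.mean(),            # near 0 suggests no systematic over/under-prediction
        'std': residuals.std(ddof=1),        # typical size of an error
        'max_abs': np.abs(residuals).max(),  # worst single miss
        # Correlation with predictions; values far from 0 hint at a trend the model missed.
        'corr_with_pred': np.corrcoef(residuals, y_pred)[0, 1],
    }

# Synthetic stand-in data (assumption): a linear signal plus Gaussian noise with scale 0.5.
rng = np.random.default_rng(42)
toy_X = rng.normal(size=(500, 2))
toy_y = 3.0 * toy_X[:, 0] - 2.0 * toy_X[:, 1] + rng.normal(scale=0.5, size=500)

tX_train, tX_test, ty_train, ty_test = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=42)
toy_fit = LinearRegression().fit(tX_train, ty_train)

summary = residual_summary(ty_test, toy_fit.predict(tX_test))
print(summary)
```

A scatter plot of residuals against predictions is the usual visual companion to these numbers; a funnel shape or a curve in that plot suggests the linear model is missing structure in the data.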
