# House Pricing — Notebook Upload Template

This **template** notebook shows the expected *format* of a typical “House Pricing Data Analytics” lab submission.

✅ What graders usually expect:
- The file is a **.ipynb** notebook
- **All cells are executed** (outputs are visible)
- Your code uses the dataset provided in your lab (CSV / DataFrame)
- Plots render correctly

> Replace the `TODO` sections with your own code from the lab and run **Kernel → Restart & Run All** before downloading and uploading.


## 0) Setup

In [None]:
# TODO: import the required libraries
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
# seaborn is needed for the boxplot (1.4) and regplot (1.5) cells; drop it if your lab does not use them:
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures


## 1) Load dataset

In [None]:
# TODO: load the dataset used in your lab
# Example (change the path/filename to your dataset):
# df = pd.read_csv("kc_house_data.csv")

# Sanity check: preview the first five rows to confirm the file loaded as expected
# display(df.head())


## 1.1) Display data types (dtypes)

In [None]:
# TODO: show data types
# df.dtypes


## 1.2) Drop columns and run describe()

In [None]:
# TODO: drop columns (adjust names based on your dataset), then describe
# df.drop(["id", "Unnamed: 0"], axis=1, inplace=True, errors="ignore")
# df.describe()
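
As a rough illustration of steps 1.1 and 1.2, the sketch below runs the same calls on a tiny made-up DataFrame. The `toy` frame and its values are assumptions for demonstration only and are not lab data; your own cells should operate on the `df` loaded in step 1.

```python
import pandas as pd

# Hypothetical toy data standing in for the lab dataset (values are made up).
toy = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "price": [221900.0, 538000.0, 180000.0, 604000.0],
    "bedrooms": [3, 3, 2, 4],
    "floors": [1.0, 2.0, 1.0, 1.0],
})

# dtypes lists one data type per column.
print(toy.dtypes)

# errors="ignore" skips names that are not present ("Unnamed: 0" here),
# so the same call works whether or not the index column was exported.
trimmed = toy.drop(columns=["id", "Unnamed: 0"], errors="ignore")

# describe() reports count, mean, std, min, quartiles, and max
# for each remaining numeric column.
summary = trimmed.describe()
print(summary)
```

This version returns a new frame instead of using `inplace=True`; either style should satisfy the step, so follow whichever your lab instructions show.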


## 1.3) value_counts on floors and convert to DataFrame

In [None]:
# TODO: value_counts for floors, then to_frame()
# df["floors"].value_counts().to_frame()
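
To show what this step produces, here is a minimal sketch on a hypothetical `floors` Series (the values are invented, not taken from the lab dataset). It assumes only pandas.

```python
import pandas as pd

# Hypothetical floor values; the real column comes from the lab dataset.
floors = pd.Series([1.0, 2.0, 1.0, 1.5, 1.0, 2.0], name="floors")

# value_counts() returns a Series indexed by the unique values,
# sorted by frequency (most common first).
# to_frame() wraps that Series in a one-column DataFrame.
floor_counts = floors.value_counts().to_frame()
print(floor_counts)
```

The label of the resulting column depends on the pandas version (pandas 2.x names it `count`), so the printed header may differ from screenshots in older lab instructions.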


## 1.4) Boxplot: price vs waterfront

In [None]:
# TODO: boxplot comparing the price distribution (and its outliers) of waterfront vs. non-waterfront houses
# sns.boxplot(x="waterfront", y="price", data=df)
# plt.show()


## 1.5) regplot: sqft_above vs price

In [None]:
# TODO: regplot to check whether sqft_above is positively or negatively correlated with price
# sns.regplot(x="sqft_above", y="price", data=df, scatter_kws={"alpha":0.3})
# plt.show()


## 1.6) Simple Linear Regression: sqft_living → price (R²)

In [None]:
# TODO: fit a linear regression on the full dataset and compute R^2 (in-sample, no train/test split yet)
# X = df[["sqft_living"]]
# y = df["price"]
# lm = LinearRegression()
# lm.fit(X, y)
# r2 = lm.score(X, y)
# r2
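
The sketch below walks through the same fit-and-score pattern on synthetic data, since the lab dataset is not part of this template. The generated `sqft_living` and `price` values, the noise level, and the `toy` name are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): price is roughly linear in size.
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, size=200)
price = 150.0 * sqft + 50_000 + rng.normal(0, 40_000, size=200)
toy = pd.DataFrame({"sqft_living": sqft, "price": price})

X = toy[["sqft_living"]]  # double brackets keep X two-dimensional
y = toy["price"]          # one-dimensional target

lm = LinearRegression().fit(X, y)
r2 = lm.score(X, y)       # R^2 on the same data used for fitting

# With a single feature, R^2 equals the squared Pearson correlation.
r2_from_corr = toy["sqft_living"].corr(toy["price"]) ** 2
print(r2, r2_from_corr)
```

Note the double brackets in `toy[["sqft_living"]]`: scikit-learn expects a 2-D feature matrix, and passing a 1-D Series as `X` raises an error.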


## 1.7) Multiple Linear Regression (multiple features → price)

In [None]:
# TODO: use multiple features (choose the ones required by your lab)
# features = ["sqft_living", "bedrooms", "bathrooms", "floors", "grade"]
# X = df[features]
# y = df["price"]
# lm = LinearRegression()
# lm.fit(X, y)
# lm.score(X, y)


## 1.8) Pipeline: scaling + linear regression (R²)

In [None]:
# TODO: pipeline with StandardScaler + LinearRegression
# features = ["sqft_living", "bedrooms", "bathrooms", "floors", "grade"]
# X = df[features]
# y = df["price"]

# pipe = Pipeline([
#     ("scale", StandardScaler()),
#     ("model", LinearRegression())
# ])
# pipe.fit(X, y)
# pipe.score(X, y)
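
One thing worth knowing when you compare this score with the previous step: standardizing the features does not change the R² of an ordinary least-squares fit, because the model simply rescales its coefficients. The sketch below checks this on synthetic data; the two feature columns are made-up stand-ins, not lab data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (assumed stand-ins for
# sqft_living and bedrooms).
rng = np.random.default_rng(1)
X = np.column_stack([
    rng.uniform(500, 4000, size=300),
    rng.integers(1, 6, size=300),
])
y = 120.0 * X[:, 0] + 15_000 * X[:, 1] + rng.normal(0, 30_000, size=300)

# Plain linear regression on the raw features.
plain = LinearRegression().fit(X, y)

# Same model, but with standardization as the first pipeline step.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
]).fit(X, y)

r2_plain = plain.score(X, y)
r2_pipe = pipe.score(X, y)
print(r2_plain, r2_pipe)  # the two values agree up to floating-point error
```

So if your pipeline R² matches the multiple-regression R² from 1.7 on the same features, that is expected rather than a sign of a mistake. Scaling starts to matter once regularization (such as Ridge) is involved.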


## 1.9) Train/Test split + Ridge (alpha=0.1) — Test R²

In [None]:
# TODO: split, fit Ridge, and score on the test set
# features = ["sqft_living", "bedrooms", "bathrooms", "floors", "grade"]
# X = df[features]
# y = df["price"]

# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=1)

# ridge = Ridge(alpha=0.1)
# ridge.fit(X_train, y_train)
# ridge.score(X_test, y_test)
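
A minimal sketch of the split, fit, and score sequence on synthetic data follows. The feature matrix, coefficients, and noise level are assumptions for illustration; only the `test_size`, `random_state`, and `alpha` values mirror the template above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic regression problem (made-up coefficients, not lab data).
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(0, 0.5, size=200)

# Hold out 15% of the rows; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=1
)

# Fit on the training rows only, then score on rows the model has not seen.
ridge = Ridge(alpha=0.1).fit(X_train, y_train)
train_r2 = ridge.score(X_train, y_train)
test_r2 = ridge.score(X_test, y_test)
print(len(X_train), len(X_test), train_r2, test_r2)
```

The key point is that `fit` only ever sees `X_train` and `y_train`; the test R² is the number this step asks for.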


## 1.10) Polynomial (degree=2) + Ridge (alpha=0.1) — Test R²

In [None]:
# TODO: polynomial transform + ridge in a pipeline and score on the test set
# features = ["sqft_living", "bedrooms", "bathrooms", "floors", "grade"]
# X = df[features]
# y = df["price"]

# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=1)

# poly_ridge = Pipeline([
#     ("poly", PolynomialFeatures(degree=2, include_bias=False)),
#     ("scale", StandardScaler(with_mean=False)),
#     ("ridge", Ridge(alpha=0.1))
# ])
# poly_ridge.fit(X_train, y_train)
# poly_ridge.score(X_test, y_test)
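
To illustrate why the polynomial step can help, the sketch below builds a synthetic target that depends on squared and interaction terms, then compares a plain Ridge model with the polynomial pipeline on the held-out rows. The data-generating formula is an assumption chosen to make the effect visible; real housing data will show a smaller gap.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic target with purely second-order structure (made up for this demo).
rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 2))
y = X[:, 0] ** 2 + X[:, 0] * X[:, 1] + rng.normal(0, 0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=1
)

# Baseline: Ridge on the raw features cannot represent squares or products.
linear_ridge = Ridge(alpha=0.1).fit(X_train, y_train)

# Pipeline: expand to degree-2 terms, scale them, then fit Ridge.
poly_ridge = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scale", StandardScaler(with_mean=False)),
    ("ridge", Ridge(alpha=0.1)),
]).fit(X_train, y_train)

linear_test_r2 = linear_ridge.score(X_test, y_test)
poly_test_r2 = poly_ridge.score(X_test, y_test)

# Two inputs expand to x0, x1, x0^2, x0*x1, x1^2.
n_poly_features = poly_ridge.named_steps["poly"].n_output_features_
print(n_poly_features, linear_test_r2, poly_test_r2)
```

Because the transform lives inside the pipeline, it is fitted on the training rows only and then reapplied to the test rows when `score` is called.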


## Final checklist before upload
1. **Kernel → Restart & Run All** (no red errors)
2. All figures/outputs visible
3. **File → Download** the notebook as **.ipynb**
4. Upload that file (must be **< 10 MB**)
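
If you want to double-check the first and last items programmatically, the sketch below reads the downloaded `.ipynb` file (which is plain JSON) and reports unexecuted code cells, stored error outputs, and an oversized file. The `check_notebook` helper is hypothetical and not part of Jupyter, and it assumes the size limit is counted in units of 1024 × 1024 bytes.

```python
import json
import os


def check_notebook(path, max_mb=10):
    """Return a list of problems found in the .ipynb file at `path`."""
    problems = []

    # Size check (assumes 1 MB = 1024 * 1024 bytes).
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb >= max_mb:
        problems.append(f"file is {size_mb:.1f} MB (limit {max_mb} MB)")

    with open(path, encoding="utf-8") as fh:
        nb = json.load(fh)

    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be a string or a list of strings; join handles both.
        source = "".join(cell.get("source", []))
        # A non-empty code cell that was never run has execution_count null.
        if source.strip() and cell.get("execution_count") is None:
            problems.append(f"code cell {i} was not executed")
        # Errors raised during execution are stored as outputs of type "error".
        for out in cell.get("outputs", []):
            if out.get("output_type") == "error":
                problems.append(f"code cell {i} raised {out.get('ename')}")

    return problems
```

Calling `check_notebook("my_submission.ipynb")` (a placeholder filename) returns an empty list when nothing was flagged. It does not confirm that plots look right, so a visual pass over the notebook is still needed.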
