
# 🏠 House Prices - Advanced Regression Techniques

This notebook is a baseline for the Kaggle competition of the same name, which uses the Ames Housing dataset.  
It covers:
- Data loading and a quick shape check
- Basic data cleaning (mean imputation of missing values)
- Feature selection (numeric columns only)
- Model training with RandomForestRegressor
- Submission file generation


In [5]:

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.impute import SimpleImputer

train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")


print("Train shape:", train_df.shape)
print("Test shape:", test_df.shape)


Train shape: (1460, 81)
Test shape: (1459, 80)
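
The shape check is the only exploration done here. As a minimal sketch of a slightly fuller first look, the snippet below counts missing values per column. The helper name `missing_summary` is hypothetical, and the small `toy` frame with made-up values stands in for `train_df` so the example runs without the competition files.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only the columns that actually contain gaps.
    summary = summary[summary["missing"] > 0]
    return summary.sort_values("missing", ascending=False)


# Toy stand-in for train_df (values are made up for illustration).
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 68.0, np.nan],
    "LotArea": [8450, 9600, 11250, 9550],
    "GarageYrBlt": [2003.0, 1976.0, np.nan, 1998.0],
})
summary = missing_summary(toy)
print(summary)
```

On the real data, `missing_summary(train_df)` would show which columns the later imputation step actually affects.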


In [6]:

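# Use only numeric columns as features; drop the row identifier and the target.
numerical_cols = train_df.select_dtypes(include=['int64', 'float64']).columns.tolist()
numerical_cols.remove('Id')
numerical_cols.remove('SalePrice')

X = train_df[numerical_cols]
y = train_df['SalePrice']
X_test = test_df[numerical_cols]

# Learn column means from the training data only, then reuse them for the test set.
# Note: SimpleImputer returns NumPy arrays, so column names are dropped from here on.
imputer = SimpleImputer(strategy='mean')
X = imputer.fit_transform(X)
X_test = imputer.transform(X_test)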
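
To make the fit/transform split concrete, here is a small sketch on made-up arrays (not the Ames data). It shows that `fit_transform` learns the column means from the training rows and that `transform` fills test gaps with those same training means rather than with anything computed from the test set.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Tiny made-up arrays: two feature columns, with gaps marked as NaN.
train = np.array([[1.0, 10.0],
                  [3.0, np.nan],
                  [np.nan, 30.0]])
test = np.array([[np.nan, np.nan]])

imputer_demo = SimpleImputer(strategy="mean")
train_filled = imputer_demo.fit_transform(train)  # learns means from train
test_filled = imputer_demo.transform(test)        # reuses the train means

print(imputer_demo.statistics_)  # per-column means learned from train
print(test_filled)
```

Fitting only on the training data keeps test-set information out of preprocessing, which is the same pattern the cell above follows.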


In [7]:

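# Fit a random forest on the full training set (fixed seed for reproducibility).
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X, y)

# Predict test-set sale prices and write them in the Id,SalePrice submission format.
predictions = model.predict(X_test)
submission = pd.DataFrame({
    "Id": test_df["Id"],
    "SalePrice": predictions
})
submission.to_csv("submission.csv", index=False)
print("✅ submission.csv created.")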


✅ submission.csv created.
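
The notebook imports `train_test_split` and `mean_squared_error` but never uses them, so the model is submitted without a local score. A minimal sketch of a hold-out check follows. It assumes, as I understand the competition's evaluation, that scoring is RMSE on the logarithm of prices, and it uses synthetic data (`X_demo`, `y_demo`, both hypothetical names) so it runs without the competition files.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the imputed X and y (made-up data, not Ames).
rng = np.random.default_rng(42)
X_demo = rng.uniform(500, 3000, size=(300, 3))
y_demo = 50_000 + 80 * X_demo[:, 0] + rng.normal(0, 5_000, size=300)

# Hold out 20% of the rows for validation.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

model_demo = RandomForestRegressor(n_estimators=100, random_state=42)
model_demo.fit(X_tr, y_tr)
val_pred = model_demo.predict(X_val)

# RMSE between log prices (assumed to mirror the competition metric).
rmse_log = np.sqrt(mean_squared_error(np.log(y_val), np.log(val_pred)))
print(f"Validation RMSE (log prices): {rmse_log:.4f}")
```

With the real data, the same pattern would split `X` and `y` from the cells above, and the final model could still be refit on all training rows before predicting `X_test`.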
