# House Prices Prediction with Linear Regression
This notebook demonstrates a basic machine learning pipeline for predicting house prices using linear regression.

## Step 1: Import necessary libraries
We use pandas and numpy for data manipulation, matplotlib and seaborn for visualization, and sklearn for building the model.

In [None]:
# Data manipulation
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

import nbtest

## Step 2: Load the dataset
We load the training dataset containing house features and sale prices.

In [None]:
# Load training data
import os

if os.path.exists("./input/train.csv"):
    train_data_path = "./input/train.csv"
elif os.path.exists("../input/train.csv"):
    train_data_path = "../input/train.csv"
else:
    raise FileNotFoundError("train.csv not found in ./input/ or ../input/")

data = pd.read_csv(train_data_path)

# Preview the first few rows
data.head()

## Step 3: Prepare the data
We remove the 'Id' column, which is only a row identifier and carries no predictive information. Missing values are not dropped here; they are filled in during feature selection in Step 4.

In [None]:
# Remove ID column
data = data.drop('Id', axis=1)

# Preview cleaned data
data.head()

## Step 4: Feature selection
We use only numeric features for simplicity, fill missing values with the mean of each column, and separate the target column 'SalePrice' from the features.

In [None]:
# Use only numeric features
numeric_data = data.select_dtypes(include=[np.number])

# Fill missing values with the mean of each column
numeric_data = numeric_data.fillna(numeric_data.mean())

# Separate features and target variable
X = numeric_data.drop("SalePrice", axis=1)
y = numeric_data["SalePrice"]
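To make the imputation step concrete, here is a minimal sketch on a toy DataFrame. The column names are borrowed from the housing data only for flavor, and every value is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the housing data (all values are invented)
toy = pd.DataFrame({
    "LotFrontage": [60.0, np.nan, 80.0],
    "GarageYrBlt": [2000.0, 1990.0, np.nan],
    "Street": ["Pave", "Grvl", "Pave"],  # non-numeric, removed by select_dtypes
    "SalePrice": [200000, 150000, 250000],
})

# Keep only numeric columns, as in the cell above
toy_numeric = toy.select_dtypes(include=[np.number])

# Count missing values per column before imputing
missing_before = toy_numeric.isna().sum()

# Replace each NaN with the mean of its own column
toy_filled = toy_numeric.fillna(toy_numeric.mean())

print(missing_before)
print(toy_filled)
```

Note that computing the means on the full dataset before splitting lets test rows influence the imputed values. A stricter variant would compute the means on the training split only and reuse them for the test split.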

## Step 5: Train-test split
We split the data into training and testing sets to evaluate our model.

In [None]:
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
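The effect of `test_size=0.2` and `random_state=42` can be checked on a small synthetic array. This is only a sketch; `X_toy` and `y_toy` are hypothetical stand-ins for the real features and target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten synthetic samples with two features each (hypothetical stand-in data)
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# 20% of 10 rows goes to the test set, the remaining 80% to training
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)

# Reusing the same random_state reproduces the same split
_, X_te_again, _, _ = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)
print(np.array_equal(X_te, X_te_again))
```

Fixing `random_state` makes the split reproducible, so the reported metrics stay comparable between runs.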

## Step 6: Train a Linear Regression model
We fit an ordinary least squares linear regression on the training set.

In [None]:
# Initialize the model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)
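After fitting, the learned weights are available as `coef_` and `intercept_`. The sketch below fits a separate model on noise-free synthetic data where the true weights are known, so the recovered values can be compared directly. The names `X_syn`, `y_syn`, and `syn_model` are hypothetical and not part of the notebook's pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known linear relationship and no noise
rng = np.random.default_rng(0)
X_syn = rng.uniform(0, 10, size=(50, 2))
y_syn = 3.0 * X_syn[:, 0] - 2.0 * X_syn[:, 1] + 5.0

# Fit ordinary least squares and inspect the learned parameters
syn_model = LinearRegression().fit(X_syn, y_syn)
print("Coefficients:", syn_model.coef_)
print("Intercept:", syn_model.intercept_)
```

On the housing data, the same attributes on `model` could be paired with `X_train.columns` to see which features carry the largest weights, keeping in mind that unscaled features make the magnitudes hard to compare.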

## Step 7: Evaluate the model
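We evaluate how well the model performs on the test data using Mean Squared Error (MSE), the average of the squared differences between predicted and actual sale prices. Lower values indicate a better fit.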

In [None]:
# Predict on test data
predictions = model.predict(X_test)

# Calculate mean squared error
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)
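Because MSE is expressed in squared price units, it can be hard to interpret. One possible complement is the root mean squared error (RMSE), which is in the same units as the sale price, together with the R² score. The sketch below uses made-up values; `y_true` and `y_pred` are hypothetical stand-ins for `y_test` and `predictions`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted prices; every prediction is off by 10,000
y_true = np.array([200000.0, 150000.0, 250000.0, 300000.0])
y_pred = np.array([210000.0, 140000.0, 240000.0, 310000.0])

# MSE averages the squared errors
mse_toy = mean_squared_error(y_true, y_pred)

# RMSE takes the square root, returning to the original price units
rmse_toy = np.sqrt(mse_toy)

# R^2 measures the share of variance in y_true explained by the predictions
r2_toy = r2_score(y_true, y_pred)

print("MSE:", mse_toy)
print("RMSE:", rmse_toy)
print("R^2:", r2_toy)
```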

## Step 8: Visualize predictions
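We plot predicted sale prices against actual sale prices. The closer a point lies to the line where predicted equals actual, the more accurate that prediction is.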

In [None]:
# Plot actual vs predicted prices
plt.scatter(y_test, predictions)
plt.xlabel("Actual Sale Price")
plt.ylabel("Predicted Sale Price")
plt.title("Actual vs Predicted House Prices")
plt.show()
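A diagonal reference line can make this kind of scatter plot easier to read, since points on the line correspond to perfect predictions. The following is a minimal sketch with made-up values; `actual` and `predicted` are hypothetical stand-ins for `y_test` and `predictions`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up values standing in for y_test and predictions
actual = np.array([120000.0, 180000.0, 210000.0, 260000.0, 320000.0])
predicted = np.array([130000.0, 170000.0, 225000.0, 250000.0, 300000.0])

fig, ax = plt.subplots()
ax.scatter(actual, predicted, alpha=0.6)

# Draw the line predicted == actual across the full range of the data
lo = min(actual.min(), predicted.min())
hi = max(actual.max(), predicted.max())
ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray", label="Perfect prediction")

ax.set_xlabel("Actual Sale Price")
ax.set_ylabel("Predicted Sale Price")
ax.set_title("Actual vs Predicted House Prices")
ax.legend()
```

In a notebook the figure is rendered at the end of the cell; in a script, `plt.show()` would be called afterwards. Points above the dashed line are overestimates and points below it are underestimates.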