# House Price Prediction with Single-Variable Linear Regression

This notebook builds a simple linear regression model that uses a single numerical feature, house area, to predict the target, house price. The project focuses on understanding the full machine learning workflow, from data inspection to model evaluation.

**Goal:** Predict house prices using single-variable linear regression  
**Tools:** Python, pandas, scikit-learn, matplotlib


### Import Necessary Tools & Libraries

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

### Load the Dataset

In [None]:
df = pd.read_csv("Housing_Price_Data.csv")

### Inspect Data

In [None]:
df.info()

In [None]:
df.head()

### Clean & Select Relevant Data

In [None]:
df = df[['price','area']]
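
The cell above selects the two columns but does not check them for missing values. A minimal sketch of such a check follows. It runs on a small made-up DataFrame (`toy`) rather than on `Housing_Price_Data.csv`, and whether the real dataset needs any cleaning is an assumption to verify against the `df.info()` output.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "price": [4_200_000, 3_500_000, None, 5_100_000],
    "area": [3000, 2400, 1800, None],
})

# Count missing values per column.
missing_counts = toy.isna().sum()
print(missing_counts)

# Drop rows where either column is missing.
toy_clean = toy.dropna(subset=["price", "area"])
print(len(toy), "rows before,", len(toy_clean), "rows after")
```

If the real data has no missing values in `price` or `area`, the `dropna` step is a no-op and can be skipped.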

### Split Train/Test Data

In [None]:
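X = df[['area']]   # feature matrix: double brackets keep it a 2-D DataFrame
y = df['price']    # target: a 1-D Series

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)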
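
To see what the split produces, the sketch below applies the same `train_test_split` call to a made-up 10-row DataFrame and prints the resulting shapes. The toy values and the names `X_tr`, `X_te`, `y_tr`, `y_te` are illustrative assumptions, not part of the notebook's data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows, values made up for illustration.
toy = pd.DataFrame({
    "area": [1000 + 100 * i for i in range(10)],
    "price": [200_000 + 15_000 * i for i in range(10)],
})

X_toy = toy[["area"]]
y_toy = toy["price"]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

# With test_size=0.2, 2 of the 10 rows are held out and 8 are used for training.
print(X_tr.shape, X_te.shape)

# Features and target stay aligned because both keep the original row index.
print(X_te.index.equals(y_te.index))
```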

### Create the Model

In [None]:
model = LinearRegression()

### Train the Model (Fit it)

In [None]:
model.fit(X_train, y_train)
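
After fitting, a single-variable model is fully described by a slope and an intercept, which scikit-learn exposes as `coef_` and `intercept_`. The sketch below fits a separate model (`toy_model`) on made-up data that follows `price = 100 * area + 50_000` exactly, so the recovered values are easy to check. The data and variable names are assumptions for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data that follows price = 100 * area + 50_000 exactly.
toy = pd.DataFrame({"area": [1000, 1500, 2000, 2500, 3000]})
toy["price"] = 100 * toy["area"] + 50_000

toy_model = LinearRegression()
toy_model.fit(toy[["area"]], toy["price"])

slope = toy_model.coef_[0]          # change in predicted price per unit of area
intercept = toy_model.intercept_    # predicted price when area is 0

print(f"slope: {slope:.2f}, intercept: {intercept:.2f}")
```

In this notebook, the same attributes on `model` give the fitted price change per unit of area and the baseline price.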

### Make Predictions

In [None]:
y_pred = model.predict(X_test)

### Visualize Actual Data and Regression Line

In [None]:


# Scatter plot of actual data (test set)
plt.scatter(X_test, y_test, color='red', label='Actual Prices')

# Sort X values for a clean regression line
X_line = X_test.sort_values(by='area')
y_line = model.predict(X_line)

# Plot regression line
plt.plot(X_line, y_line, color='blue', label='Regression Line')

# Labels & title
plt.xlabel('Area (sq ft)')
plt.ylabel('Price')
plt.title('House Price Prediction with Linear Regression')
plt.legend()

plt.show()


### Create a Predictions Dataframe

In [None]:
predictions = pd.DataFrame({
    'area': X_test['area'],
    'actual_price': y_test,
    'predicted_price': y_pred
})

### Save to CSV

In [None]:
predictions.to_csv('single_variable_prediction.csv', index=False)

### Evaluate the Model

In [None]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
print("R² Score:", r2)

## Model Evaluation

### Mean Squared Error (MSE)

The Mean Squared Error for this model is **3.68 × 10¹²**.  
Because house prices are large numerical values, the MSE is expected to be large since it represents squared price differences.

MSE is expressed in squared price units, so its magnitude is hard to read directly; taking the square root (RMSE) returns the error to the same units as price. The large value here is consistent with the relatively low R² score and with the model relying on only a single feature (area).

For this reason, MSE is best interpreted **relative to other models** rather than in isolation.
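
As a rough way to put the reported MSE on the same scale as price, the sketch below takes its square root to get the root mean squared error (RMSE). It hard-codes the rounded value quoted above. In the notebook itself, the `mse` variable from the evaluation cell would be used instead.

```python
import math

mse = 3.68e12  # MSE reported above (rounded)

# RMSE is the square root of MSE, so it is in the same units as price.
rmse = math.sqrt(mse)
print(f"RMSE: {rmse:,.0f}")
```

The result is roughly 1.9 million price units, which gives a more intuitive sense of the typical error size than the squared figure.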


### R² Score

The R² score for this model on the test set is **0.27**, meaning that approximately **27% of the variance in house prices** is explained by house area alone.

This indicates a **weak to moderate linear relationship** between area and price. While house size contributes to price, a large portion of the variation is influenced by other factors such as location, number of bedrooms, condition, and amenities.

Because this model uses only a **single feature**, the R² score serves as a **baseline**. Including additional features in a multiple linear regression model would likely improve performance.
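
To make the R² figure concrete: R² is defined as 1 − SS_res / SS_tot, so a score of 0.27 means the model's squared error is still 73% of what would result from always predicting the mean price. The sketch below computes R² by hand on made-up numbers. The helper `r2_manual` and all values are illustrative assumptions, not results from this dataset.

```python
def r2_manual(y_true, y_pred):
    """Compute R² = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot

# Made-up values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_good = [2.5, 5.5, 7.0, 9.0]   # predictions close to the truth
y_mean = [6.0, 6.0, 6.0, 6.0]   # always predict the mean of y_true

r2_good = r2_manual(y_true, y_good)
r2_mean = r2_manual(y_true, y_mean)
print(r2_good, r2_mean)
```

Predictions close to the truth score near 1, while always predicting the mean scores 0, which is the reference point against which the 0.27 baseline should be read.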
