# Task 1: Predict House Prices using Linear Regression
This notebook is part of the Machine Learning Internship – Week 1.  
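The goal is to predict the median house value of California districts (census block groups) using the California Housing dataset and a Linear Regression model. In this dataset the target is expressed in hundreds of thousands of dollars ($100,000).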


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


In [None]:
# Load the California Housing dataset
data = fetch_california_housing()

# Convert to DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['MedHouseVal'] = data.target  # Target column (median house value)


In [None]:
# Preview the dataset
print("First 5 rows:")
print(df.head())

# Check data structure
print("\nData Information:")
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()

# Show basic statistics
print("\nSummary Statistics:")
print(df.describe())
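
`info()` and `describe()` give an overview, but two further checks are often useful before fitting a linear model: counting missing values per column, and seeing how strongly each feature is linearly correlated with the target. The sketch below wraps both in a hypothetical helper, `quick_checks`, and demonstrates it on a tiny synthetic DataFrame with made-up numbers (not the California Housing values). In the notebook you would call it as `quick_checks(df, 'MedHouseVal')`.

```python
import numpy as np
import pandas as pd

def quick_checks(frame: pd.DataFrame, target: str):
    """Return (missing-value count per column, Pearson correlation of each feature with the target)."""
    missing = frame.isna().sum()
    # Correlation of every column with the target, dropping the target's correlation with itself
    corr = frame.corr()[target].drop(target)
    # Order by absolute strength so the most linearly related features come first
    corr = corr.reindex(corr.abs().sort_values(ascending=False).index)
    return missing, corr

# Synthetic demo data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
demo = pd.DataFrame({"MedInc": rng.normal(4, 1, 200), "HouseAge": rng.uniform(1, 52, 200)})
demo["MedHouseVal"] = 0.4 * demo["MedInc"] + rng.normal(0, 0.2, 200)

missing, corr = quick_checks(demo, "MedHouseVal")
print(missing)
print(corr)
```

A strong correlation hints that a feature will carry a large share of a linear model's predictive power, although correlations alone ignore interactions between features.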


In [None]:
# Define features and target variable
X = df.drop('MedHouseVal', axis=1)
y = df['MedHouseVal']


In [None]:
# Split the dataset into training and testing (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
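
Conceptually, an 80/20 random split shuffles the row positions and cuts them into two disjoint groups. The sketch below shows that idea with NumPy through a hypothetical helper, `random_split_indices`. It uses the same rounding rule as `train_test_split` for the test size (round up), but it does not reproduce scikit-learn's exact shuffle, so the selected rows will differ from those chosen with `random_state=42`. The value 20640 is the number of rows in the California Housing dataset.

```python
import numpy as np

def random_split_indices(n_samples: int, test_size: float = 0.2, seed: int = 42):
    """Shuffle row positions and return (train_indices, test_indices)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)              # random ordering of 0..n_samples-1
    n_test = int(np.ceil(test_size * n_samples))   # test set size, rounded up
    return perm[n_test:], perm[:n_test]

train_idx, test_idx = random_split_indices(20640)
print(len(train_idx), len(test_idx))
```

Fixing the seed (like `random_state=42` above) makes the split reproducible, so evaluation numbers do not change between runs.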


In [None]:
# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)
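
`LinearRegression` fits ordinary least squares: it chooses an intercept and one coefficient per feature so that the sum of squared differences between predictions and training targets is as small as possible. As a sanity check, the sketch below fits the same problem on synthetic data (made-up coefficients, not the housing data) both with scikit-learn and with NumPy's least-squares solver on a design matrix that has a leading column of ones for the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 3 + 2*x0 - 1*x1 + 0.5*x2 + small noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = 3.0 + X_demo @ true_coef + rng.normal(scale=0.1, size=500)

# scikit-learn's ordinary least squares fit
sk = LinearRegression().fit(X_demo, y_demo)

# The same least-squares problem solved directly:
# prepend a column of ones so beta[0] plays the role of the intercept
A = np.column_stack([np.ones(len(X_demo)), X_demo])
beta, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

print("sklearn:", sk.intercept_, sk.coef_)
print("lstsq:  ", beta[0], beta[1:])
```

In the notebook itself, `pd.Series(model.coef_, index=X.columns)` lists the fitted coefficient for each housing feature, and `model.intercept_` holds the intercept.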


In [None]:
# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
print("R² Score:", r2)
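
Both metrics are simple to compute by hand, which makes them easier to interpret. MSE is the mean of the squared errors; its square root (RMSE) is back in the target's units, so with this dataset an RMSE of 0.5 would mean a typical error of roughly $50,000. R² compares the squared error to the variation of the actual values around their mean: 1 is a perfect fit, 0 is no better than always predicting the mean, and on test data it can even be negative. The hypothetical helper below computes all three on a few made-up values and can be checked against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def regression_metrics(y_true, y_hat):
    """Return (MSE, RMSE, R^2) computed directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    residuals = y_true - y_hat
    mse = np.mean(residuals ** 2)                    # average squared error
    rmse = np.sqrt(mse)                              # same units as the target
    ss_res = np.sum(residuals ** 2)                  # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2

y_true_demo = [3.0, 1.5, 2.0, 4.5]
y_hat_demo = [2.5, 1.0, 2.5, 4.0]
mse_demo, rmse_demo, r2_demo = regression_metrics(y_true_demo, y_hat_demo)
print(mse_demo, rmse_demo, r2_demo)
```

In the notebook, `np.sqrt(mse)` gives the RMSE of the trained model on the test set.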


In [None]:
# Plot actual vs predicted values
plt.figure(figsize=(8, 5))
plt.scatter(y_test, y_pred, alpha=0.5, color='blue')
plt.xlabel("Actual Median House Value ($100k)")
plt.ylabel("Predicted Median House Value ($100k)")
plt.title("Actual vs Predicted House Prices")
plt.grid(True)
plt.show()
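
An actual-vs-predicted scatter is easier to read with a diagonal reference line: points on the line are perfect predictions, points above it are over-predictions, and points below it are under-predictions. The sketch below is a hypothetical helper, `plot_actual_vs_predicted`, that draws the same scatter plus a dashed y = x line, shown here on three made-up values.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_actual_vs_predicted(y_true, y_hat):
    """Scatter actual vs predicted values and add a y = x reference line."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.scatter(y_true, y_hat, alpha=0.5)
    # The diagonal spans the full range of both axes; points on it are perfect predictions
    lo = min(y_true.min(), y_hat.min())
    hi = max(y_true.max(), y_hat.max())
    ax.plot([lo, hi], [lo, hi], "r--", label="Perfect prediction (y = x)")
    ax.set_xlabel("Actual Median House Value ($100k)")
    ax.set_ylabel("Predicted Median House Value ($100k)")
    ax.set_title("Actual vs Predicted House Prices")
    ax.legend()
    ax.grid(True)
    return fig, ax

fig, ax = plot_actual_vs_predicted([1.0, 2.0, 3.0], [1.2, 1.8, 3.3])
```

In the notebook, `plot_actual_vs_predicted(y_test, y_pred)` followed by `plt.show()` would replace the plot above.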


In [None]:
# Plot prediction error distribution
errors = y_test - y_pred
plt.figure(figsize=(8, 5))
plt.hist(errors, bins=50, color='green', edgecolor='black')
plt.title("Prediction Error Distribution")
plt.xlabel("Prediction Error (Actual - Predicted)")
plt.ylabel("Frequency")
plt.grid(True)
plt.show()
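
The histogram can be summarized with a few numbers. With errors defined as actual minus predicted, a mean close to 0 suggests no systematic bias, the standard deviation describes the typical spread, and the share of positive errors shows how often the model under-predicts. The hypothetical helper below computes these on made-up values; in the notebook it would be called as `residual_summary(y_test, y_pred)`.

```python
import numpy as np

def residual_summary(y_true, y_hat):
    """Summarize errors defined as actual minus predicted."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_hat, dtype=float)
    return {
        "mean": residuals.mean(),                        # near 0 means no systematic bias
        "std": residuals.std(ddof=1),                    # sample standard deviation of the errors
        "share_underpredicted": np.mean(residuals > 0),  # fraction where actual > predicted
    }

summary = residual_summary([3.0, 1.5, 2.0, 4.5], [2.5, 1.0, 2.5, 4.0])
print(summary)
```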


## Summary
This week I learned the fundamentals of regression with supervised learning in Python.  
I used the California Housing dataset to build a Linear Regression model that predicts the median house value of each district  
from features such as median income, house age, average number of rooms, and location (latitude and longitude).  
I used scikit-learn to train the model on 80% of the data and evaluate it on the remaining 20%, and I visualized both the predicted versus actual values  
and the distribution of prediction errors. The R² score on the test set measured how much of the variance in house values the model explains on unseen data.  
Overall, I gained hands-on experience applying machine learning to a real-world regression task.
