**Importing Libraries**

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error



*NumPy and Pandas are imported for numerical operations and data manipulation, respectively.

The train_test_split function from Scikit-learn is imported to split the dataset into training and testing sets.

The LinearRegression model from Scikit-learn is imported for linear regression.

The mean_squared_error function from Scikit-learn is imported to evaluate the model's performance.

The fetch_california_housing function, which loads the California housing dataset, is imported separately in the next cell.*

**Loading the California Housing Dataset**

In [2]:
from sklearn.datasets import fetch_california_housing


# Fetch the California housing dataset
california = fetch_california_housing()

# Create a DataFrame for the feature data
X = pd.DataFrame(california.data, columns=california.feature_names)

# Create a Series for the target data
y = pd.Series(california.target, name='target')




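*The California housing dataset is loaded into the variable california, a dictionary-like object that exposes the feature matrix (data), the target vector (target), and the column names (feature_names).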

The feature data is converted into a DataFrame with appropriate column names using Pandas.

The target data is converted into a Series using Pandas.*

In [3]:
X

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.023810 | 322.0 | 2.555556 | 37.88 | -122.23 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.971880 | 2401.0 | 2.109842 | 37.86 | -122.22 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.802260 | 37.85 | -122.24 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 20635 | 1.5603 | 25.0 | 5.045455 | 1.133333 | 845.0 | 2.560606 | 39.48 | -121.09 |
| 20636 | 2.5568 | 18.0 | 6.114035 | 1.315789 | 356.0 | 3.122807 | 39.49 | -121.21 |
| 20637 | 1.7000 | 17.0 | 5.205543 | 1.120092 | 1007.0 | 2.325635 | 39.43 | -121.22 |
| 20638 | 1.8672 | 18.0 | 5.329513 | 1.171920 | 741.0 | 2.123209 | 39.43 | -121.32 |


In [4]:
y

0        4.526
1        3.585
2        3.521
3        3.413
4        3.422
         ...  
20635    0.781
20636    0.771
20637    0.923
20638    0.847
20639    0.894
Name: target, Length: 20640, dtype: float64

**Splitting the Dataset**

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


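*The dataset is split into training and testing sets, with 80% of the data used for training and 20% for testing.

Setting random_state fixes the seed of the shuffle that train_test_split applies before splitting.

**Why use random_state=42**:

Reproducibility: The same rows land in the training and testing sets on every run.

Convention: The value 42 has no special meaning. Any fixed integer works; 42 is simply a popular choice.

Debugging: With a fixed split, a change in the results can be traced to a change in the code or model rather than to a different split.*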
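*The reproducibility claim can be checked with a minimal sketch. The arrays X_toy and y_toy are made-up stand-ins, not the housing data; the point is only that two calls with the same random_state return identical splits and that test_size=0.2 yields an 80/20 division.*

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each.
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)

# Split twice with the same seed.
split_a = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)
split_b = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

# Every returned array (X_train, X_test, y_train, y_test) should match.
same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))

X_train_toy, X_test_toy, y_train_toy, y_test_toy = split_a
print(same_split)
print(len(X_train_toy), len(X_test_toy))
```

*Omitting random_state, or changing its value, would generally produce a different assignment of rows to the two sets.*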

**Training a Model**

In [6]:
model = LinearRegression()
model.fit(X_train, y_train)


*A linear regression model is instantiated and trained using the training data.*
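*To illustrate what fitting produces, the sketch below trains the same estimator on synthetic data generated from a known linear rule. The data, the coefficients 3 and -2, and the intercept 1 are assumptions chosen for illustration; because the data is noise-free, the fitted coef_ and intercept_ should recover them up to floating-point error.*

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data following y = 3*x1 - 2*x2 + 1 exactly.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 2))
y_toy = 3.0 * X_toy[:, 0] - 2.0 * X_toy[:, 1] + 1.0

# Same estimator and fit call as in the notebook cell above.
toy_model = LinearRegression()
toy_model.fit(X_toy, y_toy)

# The learned parameters: one coefficient per feature plus an intercept.
print(toy_model.coef_)
print(toy_model.intercept_)
```

*On the housing data, model.coef_ holds one coefficient per feature column of X, in the same order as X.columns.*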

**Making Predictions**

In [7]:
y_pred = model.predict(X_test)


*The trained model makes predictions on the test data.*

**Evaluating the Model**


In [8]:
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')


Mean Squared Error: 0.5558915986952442


*The model is evaluated with the mean squared error (MSE), the average squared difference between predicted and actual values. Lower is better. Because the target is the median house value in units of $100,000, the MSE is expressed in the square of those units.*
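*The metric can be reproduced by hand, as in the sketch below. The helper mean_squared_error_manual and the toy lists are hypothetical and exist only for illustration. The last lines take the square root of the MSE reported above to get the root mean squared error (RMSE), which is in the same units as the target.*

```python
import math

def mean_squared_error_manual(actual, predicted):
    """Average of the squared differences between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical toy values: squared errors are 0.25, 0.25 and 0.0.
toy_actual = [3.0, 0.5, 2.0]
toy_predicted = [2.5, 0.0, 2.0]
toy_mse = mean_squared_error_manual(toy_actual, toy_predicted)
print(toy_mse)

# RMSE for the MSE printed by the notebook cell above.
reported_mse = 0.5558915986952442
rmse = math.sqrt(reported_mse)
print(round(rmse, 4))
```

*An RMSE of roughly 0.75 on a target measured in units of $100,000 corresponds to a typical prediction error on the order of $75,000.*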






**Comparing Test Set with the Model's Prediction**

In [9]:
print(y_test.head())


20046    0.47700
3024     0.45800
15663    5.00001
20484    2.18600
9814     2.78000
Name: target, dtype: float64


*This prints the first five actual median house values from the test set, in units of $100,000. These are the values that the model's predictions (y_pred) are compared against.*
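*To see actual and predicted values next to each other, one option is a small table like the sketch below. The helper compare_predictions and the toy inputs are hypothetical, not part of the original notebook; in the notebook the call would be compare_predictions(y_test, y_pred).*

```python
import numpy as np
import pandas as pd

def compare_predictions(y_true, y_pred, n=5):
    """Return the first n actual values next to their predictions and errors."""
    # Keep the original row labels when y_true is a pandas Series.
    index = y_true.index if isinstance(y_true, pd.Series) else None
    table = pd.DataFrame(
        {
            "actual": np.asarray(y_true, dtype=float),
            "predicted": np.asarray(y_pred, dtype=float),
        },
        index=index,
    )
    # Positive error means the model over-predicted.
    table["error"] = table["predicted"] - table["actual"]
    return table.head(n)

# Hypothetical toy stand-ins for y_test (a Series) and y_pred (an array).
actual = pd.Series([1.0, 2.0, 3.0], index=[10, 20, 30], name="target")
predicted = np.array([1.5, 1.5, 3.0])

comparison = compare_predictions(actual, predicted)
print(comparison)
```

*Converting both inputs to arrays before building the table pairs values by position, which matters because y_test keeps its shuffled row labels while y_pred is a plain NumPy array.*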





