#### Importing necessary libraries

In this block, we import the libraries used throughout the notebook: pandas for data manipulation, NumPy for numerical computation, Matplotlib and seaborn for visualization, scikit-learn (sklearn) for dataset loading, preprocessing, modeling, and metrics, and pickle for model serialization.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import pickle

#### Loading the Boston House Pricing Dataset

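In this block, we load the Boston House Pricing Dataset using the load_boston() function from the sklearn.datasets module. Note that load_boston() was deprecated in scikit-learn 1.0 and removed in version 1.2, so this cell (and the matching import above) only runs on scikit-learn versions older than 1.2.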

In [None]:
boston = load_boston()
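
Because load_boston() is no longer present from scikit-learn 1.2 onward, it can help to check the installed version before running the cell above. The following is a minimal sketch; the helper name boston_loader_available is hypothetical and not part of scikit-learn.

```python
import sklearn


def boston_loader_available(version: str) -> bool:
    """Return True if this scikit-learn version still ships load_boston().

    Hypothetical helper: it only compares the major and minor version numbers.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (1, 2)


if not boston_loader_available(sklearn.__version__):
    print("load_boston() is not available in scikit-learn", sklearn.__version__)
```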

#### Checking the keys and description of the dataset

In this block, we inspect the loaded dataset object: the keys() method lists the available fields, and the DESCR attribute holds a text description of the dataset.

In [None]:
print(boston.keys())
print(boston.DESCR)

#### Creating a DataFrame from the dataset

In this block, we create a DataFrame from the dataset using pandas, where the data is stored in the boston.data attribute and the feature names are used as column names. We also add a 'Price' column to the DataFrame, which contains the target variable (house prices).

In [None]:
dataset = pd.DataFrame(boston.data, columns=boston.feature_names)
dataset['Price'] = boston.target

#### Exploratory Data Analysis (EDA)

In this block, we perform exploratory data analysis (EDA) on the dataset. The info() method reports the column data types and non-null counts, and describe() returns summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numeric column.

In [None]:
dataset.info()
dataset.describe()

#### Checking for missing values in the dataset

In this block, we check for missing values in the dataset using the isnull() method of the DataFrame, followed by sum() to count the number of missing values in each column.

In [None]:
dataset.isnull().sum()
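
To see what this output looks like when values are actually missing, here is a small sketch on a toy DataFrame. The column names mirror the dataset, but the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data with deliberately missing entries (values are illustrative only).
demo = pd.DataFrame({
    "RM": [6.5, np.nan, 5.9, 7.1],
    "AGE": [65.2, 78.9, np.nan, np.nan],
    "Price": [24.0, 21.6, 34.7, 33.4],
})

# isnull() marks each missing cell as True; sum() counts them per column.
missing_per_column = demo.isnull().sum()
print(missing_per_column)

# Total number of missing cells across the whole frame.
total_missing = int(missing_per_column.sum())
```

A column with a non-zero count would need to be imputed or dropped before fitting the model.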

#### Checking correlation between features

In this block, we compute the pairwise correlation between all columns using the corr() method of the DataFrame, which uses the Pearson correlation coefficient by default. This helps us identify linear relationships among the features and between each feature and the target variable.

In [None]:
dataset.corr()
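
The full correlation matrix can be hard to scan, so one option is to rank the features by the absolute value of their correlation with the target. The sketch below uses synthetic data with hypothetical column names (strong_feature, weak_feature); the same expression should work on the notebook's dataset with its 'Price' column.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
strong = rng.normal(size=n)
weak = rng.normal(size=n)

# Synthetic target that depends mostly on one feature (illustrative only).
demo = pd.DataFrame({
    "strong_feature": strong,
    "weak_feature": weak,
    "Price": 3.0 * strong + 0.1 * weak + rng.normal(scale=0.5, size=n),
})

# Take the 'Price' column of the correlation matrix, drop its self-correlation,
# and sort by absolute strength of the linear relationship.
target_corr = demo.corr()["Price"].drop("Price")
ranked = target_corr.abs().sort_values(ascending=False)
print(ranked)
```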

#### Visualizing the correlation using pairplot

In this block, we create a pairplot using seaborn, which draws a scatterplot for every pair of columns and shows the distribution of each column on the diagonal. With 13 features plus the 'Price' column this is a 14 x 14 grid, so the cell can take a while to render.

In [None]:
sns.pairplot(dataset)

#### Splitting the dataset into independent and dependent features

In this block, we split the dataset into the independent features (X), which are all columns except the last, and the dependent feature (y), which is the last column ('Price') and serves as the target variable.

In [None]:
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]

#### Performing train-test split

In this block, we perform a train-test split using the train_test_split() function from the sklearn.model_selection module. We set the test size to 0.3 (30% of the data) and fix random_state so that the same split is produced on every run.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
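
As a quick sanity check on the split sizes, the sketch below applies the same call to placeholder arrays with the shape of the Boston data (506 rows, 13 features). The arrays and the demo_ names are stand-ins, not the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the Boston data (506 x 13).
demo_X = np.arange(506 * 13).reshape(506, 13)
demo_y = np.arange(506)

X_tr, X_te, y_tr, y_te = train_test_split(
    demo_X, demo_y, test_size=0.3, random_state=42
)
print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state reproduces the same split.
_, X_te_again, _, _ = train_test_split(
    demo_X, demo_y, test_size=0.3, random_state=42
)
same_split = np.array_equal(X_te, X_te_again)
```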

#### Standardizing the dataset

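We standardize the independent features with the StandardScaler class from the sklearn.preprocessing module, which rescales each feature to zero mean and unit variance. The scaler is fit on X_train only and then applied to X_test, so no information from the test set leaks into the preprocessing. For ordinary least squares, scaling does not change the predictions, but it puts the coefficients on a comparable scale and matters for regularized or distance-based models.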

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
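
The effect of fitting on the training data only can be checked numerically. This is a minimal sketch on random data (the demo_ arrays are made up): the scaled training columns have mean 0 and standard deviation 1, while the test columns are transformed with the training statistics and so are only approximately centered.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
demo_train = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
demo_test = rng.normal(loc=50.0, scale=10.0, size=(40, 3))

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(demo_train)  # learns mean/std from train
test_scaled = demo_scaler.transform(demo_test)        # reuses the train mean/std

print("train mean:", train_scaled.mean(axis=0).round(6))
print("train std: ", train_scaled.std(axis=0).round(6))
print("test mean: ", test_scaled.mean(axis=0).round(3))

# The test transform is exactly (x - train_mean) / train_std.
expected_test = (demo_test - demo_train.mean(axis=0)) / demo_train.std(axis=0)
```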

#### Training the Linear Regression model

In this block, we instantiate a LinearRegression model and fit it to the training data using the fit() method. This trains the model to learn the relationship between the independent features and the target variable.

In [None]:
regressor = LinearRegression()
regressor.fit(X_train, y_train)
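
After fitting, the learned relationship is stored in the model's coef_ and intercept_ attributes. The sketch below fits a model to noise-free synthetic data generated from known coefficients (chosen arbitrarily for illustration) and shows that they are recovered.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
demo_X = rng.normal(size=(50, 2))

# Noise-free target: y = 3*x0 - 2*x1 + 5 (coefficients chosen for illustration).
demo_y = 3.0 * demo_X[:, 0] - 2.0 * demo_X[:, 1] + 5.0

demo_model = LinearRegression().fit(demo_X, demo_y)

# One coefficient per feature, plus a single intercept term.
print("coefficients:", demo_model.coef_)
print("intercept:   ", demo_model.intercept_)
```

On the notebook's standardized features, each coefficient can be read as the change in predicted price for a one standard deviation change in that feature.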

#### Making predictions on the test data

In this block, we use the trained Linear Regression model to make predictions on the test data using the predict() method. This generates predicted values for the target variable based on the learned model.

In [None]:
y_pred = regressor.predict(X_test)

#### Evaluating the model performance

In this block, we evaluate the performance of the Linear Regression model using common regression metrics. We calculate the Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) score using the respective functions from sklearn.metrics module. These metrics provide information about how well the model is predicting the target variable.

In [None]:
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error (MSE):", mse)
print("Mean Absolute Error (MAE):", mae)
print("R-squared (R2) Score:", r2)
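
To make the definitions of these metrics concrete, the sketch below computes them by hand with NumPy on a tiny made-up example. It also computes the root mean squared error (RMSE), which is not part of the cell above but is expressed in the same units as the target, unlike MSE.

```python
import numpy as np

# Tiny illustrative example (values are made up).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 11.0])

residuals = y_true - y_hat

mse_manual = np.mean(residuals ** 2)      # average squared error
mae_manual = np.mean(np.abs(residuals))   # average absolute error
rmse_manual = np.sqrt(mse_manual)         # same units as the target

# R2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mse_manual, mae_manual, rmse_manual, r2_manual)
```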

#### Saving the trained model to a file

In this block, we save the trained Linear Regression model to a file using the pickle.dump() function from pickle module. This allows us to serialize the model and store it for future use without retraining.

In [None]:
with open('linear_regression_model.pkl', 'wb') as file:
    pickle.dump(regressor, file)

#### Loading the trained model from the file

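In this block, we load the trained Linear Regression model from the saved file using the pickle.load() function. This lets us make predictions without retraining. Because the model was trained on standardized features, new inputs must be transformed with the same fitted scaler before being passed to loaded_model.predict().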

In [None]:
with open('linear_regression_model.pkl', 'rb') as file:
    loaded_model = pickle.load(file)
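
Since the model was trained on standardized inputs, the fitted scaler is needed at prediction time as well. One possible approach, sketched below on synthetic data, is to pickle the scaler and the model together. The dictionary layout and the demo_ names are assumptions for illustration, and the sketch serializes in memory with pickle.dumps() rather than writing a file.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
raw_X = rng.normal(loc=10.0, scale=4.0, size=(60, 3))
raw_y = raw_X @ np.array([1.5, -0.5, 2.0]) + 7.0  # synthetic, noise-free target

demo_scaler = StandardScaler().fit(raw_X)
demo_model = LinearRegression().fit(demo_scaler.transform(raw_X), raw_y)

# Serialize both objects together so they cannot get out of sync.
payload = pickle.dumps({"scaler": demo_scaler, "model": demo_model})
restored = pickle.loads(payload)

# New, unscaled rows go through the restored scaler before prediction.
new_rows = raw_X[:5]
restored_pred = restored["model"].predict(restored["scaler"].transform(new_rows))
original_pred = demo_model.predict(demo_scaler.transform(new_rows))
print(restored_pred)
```

As with any pickle file, only load files from a source you trust, since unpickling can execute arbitrary code.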