# **Multiple Linear Regression in Machine Learning**
This notebook demonstrates how to implement Multiple Linear Regression using Python and the `scikit-learn` library. We will go through:
- Data preprocessing
- Model training
- Model evaluation
- Visualization of results

## **1. Importing Necessary Libraries**
We begin by importing the essential Python libraries required for data handling, visualization, and building a regression model.

In [None]:

import numpy as np  # For numerical operations
import pandas as pd  # For data manipulation
import matplotlib.pyplot as plt  # For visualization
import seaborn as sns  # For statistical visualizations
from sklearn.model_selection import train_test_split  # To split data into training and testing sets
from sklearn.linear_model import LinearRegression  # Multiple Linear Regression model
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score  # Performance evaluation metrics


## **2. Load and Explore Dataset**
We read the dataset into a Pandas DataFrame and perform some basic exploratory analysis.

In [None]:

# Load the dataset (update the file path if necessary)
file_path = "sample_data.csv"
df = pd.read_csv(file_path)

# Display the first few rows of the dataset
df.head()

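Beyond `df.head()`, a few more calls give a quick picture of the data. The sketch below uses a small synthetic DataFrame named `df_demo` as a stand-in for `sample_data.csv`. The column names are the notebook's placeholders and the values are randomly generated, so the printed numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for sample_data.csv (values are made up for illustration)
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(
    rng.normal(size=(100, 3)), columns=["Feature1", "Feature2", "Feature3"]
)
noise = rng.normal(scale=0.1, size=100)
df_demo["Target"] = 2.0 * df_demo["Feature1"] - df_demo["Feature2"] + noise

print(df_demo.shape)   # (number of rows, number of columns)
print(df_demo.dtypes)  # data type of each column

# Count, mean, standard deviation, min, quartiles, and max per column
summary = df_demo.describe()
print(summary)

# Pairwise Pearson correlations; the "Target" column shows how strongly
# each feature is linearly related to the target
corr = df_demo.corr()
print(corr["Target"].sort_values(ascending=False))
```

On real data, replacing `df_demo` with `df` gives the same overview for the loaded dataset.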

### **Checking for Missing Values**
Before proceeding, we count the missing values in each column. scikit-learn's `LinearRegression` raises an error if the input contains NaN values, so any gaps must be handled before training.

In [None]:

df.isnull().sum()

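If the check reports missing values, two common options are to drop the affected rows or to fill the gaps with a summary statistic. The following is a minimal sketch on a toy DataFrame (`df_demo`, with hypothetical values); which option is appropriate depends on the actual dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps (hypothetical values, not taken from sample_data.csv)
df_demo = pd.DataFrame({
    "Feature1": [1.0, 2.0, np.nan, 4.0],
    "Feature2": [10.0, np.nan, 30.0, 40.0],
    "Target": [5.0, 6.0, 7.0, 8.0],
})

# Number of missing values per column
missing_counts = df_demo.isnull().sum()
print(missing_counts)

# Option 1: drop every row that contains at least one missing value
df_dropped = df_demo.dropna()

# Option 2: fill each numeric column's gaps with that column's median
df_filled = df_demo.fillna(df_demo.median(numeric_only=True))
print(df_filled)
```

In a full workflow, the fill value would normally be computed on the training split only, so that no information from the test set leaks into training.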
## **3. Data Preprocessing**
We define the independent variables (X) and the dependent variable (y). In Multiple Linear Regression, several independent variables are used together to predict a single dependent variable.

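For reference, the model being fitted has the standard form below, where $x_1, \dots, x_p$ are the independent variables (three in this notebook's placeholder example), $\beta_0$ is the intercept, $\beta_1, \dots, \beta_p$ are the coefficients, and $\varepsilon$ is the error term:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
$$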
In [None]:

# Define Independent (X) and Dependent (y) Variables
X = df[['Feature1', 'Feature2', 'Feature3']]  # Replace with actual feature column names
y = df['Target']  # Replace with actual target column name


### **Splitting Data into Training and Testing Sets**
We split the dataset into **80% training** and **20% testing** so that the model is evaluated on data it has not seen during training. Setting `random_state` makes the split reproducible.

In [None]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

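A quick way to sanity-check a split is to look at the resulting shapes. The sketch below uses synthetic stand-in data (`X_demo`, `y_demo`, both hypothetical) with 100 rows, so an 80/20 split should give 80 training rows and 20 test rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features (illustrative only)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.normal(size=(100, 3)), columns=["Feature1", "Feature2", "Feature3"]
)
y_demo = pd.Series(rng.normal(size=100), name="Target")

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state reproduces the same split
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
same_split = X_te.index.equals(X_te2.index)
print("Reproducible split:", same_split)
```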

## **4. Training the Multiple Linear Regression Model**
We create an instance of the `LinearRegression` model and train it using the training data.

In [None]:

# Initialize and Train the Model
model = LinearRegression()
model.fit(X_train, y_train)


### **Model Coefficients & Intercept**
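Each coefficient is the expected change in the target variable for a one-unit increase in the corresponding predictor, holding the other predictors constant. The intercept is the predicted target value when all predictors are zero.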

In [None]:

print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)

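The raw `coef_` array is easier to read when each value is paired with its feature name. The sketch below assumes synthetic data generated from a known relationship (`Target = 4 + 3*Feature1 - 2*Feature2`, with `Feature3` unused), so the fitted values can be compared against the ones used to generate the data. The names `X_demo`, `y_demo`, `demo_model`, and `coef_table` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data with a known relationship (illustrative values):
# Target = 4 + 3*Feature1 - 2*Feature2 + 0*Feature3
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.normal(size=(200, 3)), columns=["Feature1", "Feature2", "Feature3"]
)
y_demo = 4.0 + 3.0 * X_demo["Feature1"] - 2.0 * X_demo["Feature2"]

demo_model = LinearRegression().fit(X_demo, y_demo)

# Pair each coefficient with its column name
coef_table = pd.Series(demo_model.coef_, index=X_demo.columns, name="coefficient")
print("Intercept:", round(float(demo_model.intercept_), 3))
print(coef_table.round(3))
```

With the notebook's own model, the same pairing would be `pd.Series(model.coef_, index=X.columns)`.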

## **5. Model Evaluation**
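We evaluate the model on the test set with the following metrics:
- **Mean Absolute Error (MAE)**: Average of the absolute differences between actual and predicted values.
- **Mean Squared Error (MSE)**: Average of the squared differences, which penalizes large errors more heavily.
- **Root Mean Squared Error (RMSE)**: Square root of the MSE, expressed in the same units as the target.
- **R-squared Score (R²)**: Proportion of the variance in the dependent variable that the independent variables explain (1.0 is a perfect fit).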

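Written out, with $y_i$ the actual values, $\hat{y}_i$ the predicted values, $\bar{y}$ the mean of the actual values, and $n$ the number of test samples, the standard definitions of these metrics are:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2
$$

$$
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}
$$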
In [None]:

# Predicting on Test Data
y_pred = model.predict(X_test)

# Calculating Evaluation Metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", rmse)
print("R-squared Score (R²):", r2)

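To make the metrics less of a black box, they can be recomputed by hand with NumPy and compared against the scikit-learn functions. The arrays `y_true` and `y_hat` below are small hypothetical examples, not output from the notebook's model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Small hand-made example (hypothetical values)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 8.0, 9.5])

errors = y_true - y_hat

mae_manual = np.mean(np.abs(errors))   # average absolute error
mse_manual = np.mean(errors ** 2)      # average squared error
rmse_manual = np.sqrt(mse_manual)      # back in the units of the target

ss_res = np.sum(errors ** 2)                      # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print("MAE :", mae_manual, "| sklearn:", mean_absolute_error(y_true, y_hat))
print("MSE :", mse_manual, "| sklearn:", mean_squared_error(y_true, y_hat))
print("RMSE:", rmse_manual)
print("R²  :", r2_manual, "| sklearn:", r2_score(y_true, y_hat))
```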

## **6. Visualization of Predictions**
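A scatter plot compares actual and predicted values. The closer the points lie to the diagonal where the predicted value equals the actual value, the more accurate the predictions.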

In [None]:

plt.figure(figsize=(8,5))
sns.scatterplot(x=y_test, y=y_pred)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs Predicted Values")
plt.show()

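Drawing the diagonal explicitly can make the plot easier to read. The sketch below is one possible variant that uses plain Matplotlib and hypothetical values (`y_true`, `y_hat`) in place of the notebook's `y_test` and `y_pred`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical actual and predicted values, for illustration only
rng = np.random.default_rng(0)
y_true = rng.uniform(0, 100, size=50)
y_hat = y_true + rng.normal(scale=8.0, size=50)

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(y_true, y_hat, alpha=0.7, label="Predictions")

# Diagonal y = x: a perfect model would place every point on this line
low = min(y_true.min(), y_hat.min())
high = max(y_true.max(), y_hat.max())
ax.plot([low, high], [low, high], color="red", linestyle="--",
        label="Perfect prediction")

ax.set_xlabel("Actual Values")
ax.set_ylabel("Predicted Values")
ax.set_title("Actual vs Predicted Values")
ax.legend()
# Call plt.show() here when running interactively
```

Points above the dashed line are over-predictions and points below it are under-predictions.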

## **Conclusion**
- We implemented Multiple Linear Regression with `scikit-learn`.
- We trained the model using multiple predictors.
- We evaluated its performance using MAE, MSE, RMSE, and R².
- We visualized actual vs predicted values.

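**Next Steps:** You can experiment with different feature sets or with techniques such as polynomial regression to improve accuracy. Feature scaling does not change the predictions of an ordinary least-squares model, but it makes the coefficients easier to compare across features.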
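
One possible way to try these ideas is a scikit-learn pipeline that expands the features to degree-2 polynomial terms, scales them, and then fits a linear model. The sketch below assumes synthetic data with a deliberately curved relationship (`X_demo`, `y_demo`, both hypothetical), so the gain over a plain linear fit is specific to this example and will differ on real data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with a curved relationship (illustrative only)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(300, 2))
noise = rng.normal(scale=0.5, size=300)
y_demo = 1.0 + 2.0 * X_demo[:, 0] ** 2 - X_demo[:, 1] + noise

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Baseline: plain multiple linear regression
linear = LinearRegression().fit(X_tr, y_tr)

# Degree-2 polynomial features, then scaling, then linear regression
poly = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    LinearRegression(),
).fit(X_tr, y_tr)

r2_linear = r2_score(y_te, linear.predict(X_te))
r2_poly = r2_score(y_te, poly.predict(X_te))
print("R² linear:", round(r2_linear, 3))
print("R² degree-2 polynomial:", round(r2_poly, 3))
```

Wrapping the steps in a pipeline means the polynomial expansion and the scaler are fitted on the training split only and then reused unchanged on the test split.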