# Linear Regression in Machine Learning

## Introduction
This notebook demonstrates the implementation of **Linear Regression**, a fundamental supervised learning algorithm used for predicting continuous values.
We will:
- Load a dataset
- Perform **Exploratory Data Analysis (EDA)**
- Preprocess the data
- Train a **Linear Regression Model**
- Evaluate the model's performance
- Visualize the results


In [None]:
# Import necessary libraries
import numpy as np  # For numerical computations
import pandas as pd  # For handling datasets
import matplotlib.pyplot as plt  # For data visualization
import seaborn as sns  # For advanced visualization
from sklearn.model_selection import train_test_split  # For splitting data
from sklearn.linear_model import LinearRegression  # Linear Regression Model
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score  # Model evaluation metrics

## Load Dataset
We load the dataset from a CSV file. The later cells expect columns named `Experience` and `Salary`, so update `file_path` to point to a file that contains them.

In [2]:
# Load dataset
file_path = "sample_data.csv"  # Update this path accordingly
df = pd.read_csv(file_path)
df.head()  # Display first five rows

## Data Exploration
### Checking for missing values
Identify missing values before modeling: scikit-learn's `LinearRegression` raises an error if the input contains NaN.

In [3]:
df.isnull().sum()  # Count missing values in each column
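
If the check above reports missing values, they need to be handled before training. The sketch below shows two common options, dropping rows or filling with the column median. It uses a small made-up DataFrame (`toy`) rather than the real dataset, and which option suits the actual data is a judgment call this notebook does not settle.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real CSV; the values are made up.
toy = pd.DataFrame({
    "Experience": [1.0, 2.0, np.nan, 4.0, 5.0],
    "Salary": [40000.0, 45000.0, 50000.0, np.nan, 60000.0],
})

# Count missing values per column, as in the cell above.
missing_counts = toy.isnull().sum()

# Option 1: drop every row that has at least one missing value.
dropped = toy.dropna()

# Option 2: fill each missing value with the median of its column.
filled = toy.fillna(toy.median(numeric_only=True))

print(missing_counts)
print(len(dropped), "rows remain after dropna()")
print(filled)
```

Dropping rows is simpler but discards data; filling keeps every row at the cost of introducing estimated values.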

### Summary statistics
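`df.describe()` reports the count, mean, standard deviation, minimum, quartiles, and maximum of each numeric column, which gives a quick view of the data's range and spread.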

In [4]:
df.describe()

## Data Visualization
A scatter plot of **Experience** against **Salary** shows whether the relationship looks roughly linear, which is what a linear regression model assumes.

In [5]:
plt.figure(figsize=(8,5))
sns.scatterplot(x=df['Experience'], y=df['Salary'])
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.title("Experience vs Salary")
plt.show()

## Data Preprocessing
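We separate the data into the **independent variable (X)**, `Experience`, and the **dependent variable (y)**, `Salary`. `X` is selected with double brackets so that it stays a two-dimensional DataFrame, which is the shape scikit-learn expects for features.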

In [6]:
X = df[['Experience']]
y = df['Salary']
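
The difference between single and double brackets is easy to miss, so here is a minimal sketch of the resulting shapes. The DataFrame `toy` and its values are hypothetical stand-ins for the real dataset.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration.
toy = pd.DataFrame({"Experience": [1, 2, 3], "Salary": [40000, 45000, 50000]})

X_2d = toy[["Experience"]]  # double brackets -> DataFrame with one column
x_1d = toy["Experience"]    # single brackets -> one-dimensional Series
y_toy = toy["Salary"]       # the target is fine as a Series

print(X_2d.shape)   # two-dimensional: (rows, columns)
print(x_1d.shape)   # one-dimensional: (rows,)
print(y_toy.shape)
```

Passing the one-dimensional `x_1d` as the feature matrix to `fit` would raise an error, whereas `X_2d` works as is.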

## Splitting Data
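We divide the dataset into **80% training** and **20% testing** data. The model is fit on the training set only, and the held-out test set is used to evaluate how well it performs on unseen data. Setting `random_state=42` fixes the shuffle so the split is reproducible.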

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
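
To see what `test_size` and `random_state` do, the following sketch splits ten made-up samples twice with the same seed. The arrays `X_toy` and `y_toy` are hypothetical and only serve to make the sizes easy to check.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: ten samples with a single feature.
X_toy = np.arange(10).reshape(-1, 1)
y_toy = np.arange(10) * 2.0

# Same call as in the notebook, applied to the toy arrays.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

# Repeating the split with the same seed selects the same rows.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

print(len(X_tr), len(X_te))         # training and test set sizes
print(np.array_equal(X_te, X_te2))  # whether the two test sets match
```

Without a fixed `random_state`, each run would shuffle differently and the evaluation metrics would vary from run to run.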

## Model Training
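We fit a **Linear Regression** model on the training set. `fit` estimates the intercept and slope that minimize the sum of squared differences between the actual and predicted salaries.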

In [8]:
model = LinearRegression()
model.fit(X_train, y_train)
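
As a rough illustration of what `fit` computes, the sketch below solves the same ordinary least squares problem directly with NumPy and compares the result with `LinearRegression`. The synthetic data (`x`, `y`), the noise level, and the seed are all assumptions made for the example, not values from the dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: a noisy linear relationship.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 30000 + 5000 * x + rng.normal(0, 2000, size=50)

# Fit with scikit-learn (features must be two-dimensional).
toy_model = LinearRegression().fit(x.reshape(-1, 1), y)

# Solve the same least squares problem by hand:
# a column of ones models the intercept, the second column is the feature.
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print("lstsq intercept, slope:  ", beta[0], beta[1])
print("sklearn intercept, slope:", toy_model.intercept_, toy_model.coef_[0])
```

Both approaches should agree up to floating-point error, since they minimize the same squared-error objective.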

## Model Coefficients
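The intercept and coefficient define the fitted line: predicted Salary = intercept + coefficient × Experience. The intercept is the predicted salary at zero years of experience, and the coefficient is the predicted change in salary for each additional year.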

In [9]:
print("Intercept:", model.intercept_)
print("Coefficient:", model.coef_)
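
To make the role of the two numbers concrete, this sketch computes a prediction by hand from `intercept_` and `coef_` and checks it against `predict`. The toy data is hypothetical and exactly linear (salary = 30000 + 5000 × experience) so the result is easy to verify.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, exactly linear toy data: salary = 30000 + 5000 * experience.
experience = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
salary = 30000 + 5000 * experience.ravel()

toy_model = LinearRegression().fit(experience, salary)

# Predict for six years of experience in two ways.
new_experience = 6.0
manual = toy_model.intercept_ + toy_model.coef_[0] * new_experience
from_predict = toy_model.predict(np.array([[new_experience]]))[0]

print("Manual prediction: ", manual)
print("model.predict():   ", from_predict)
```

`coef_` is an array with one entry per feature, which is why the example indexes it with `[0]`.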

## Model Evaluation
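We evaluate the model on the test set with four metrics. MAE is the average absolute error, MSE is the average squared error, and RMSE is the square root of MSE, which puts the error back in salary units. R² is the proportion of variance in the test salaries that the model explains, with 1.0 being a perfect fit.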

In [10]:
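y_pred = model.predict(X_test)  # Predict salaries for the held-out test set
mae = mean_absolute_error(y_test, y_pred)  # Average absolute error
mse = mean_squared_error(y_test, y_pred)  # Average squared error
rmse = np.sqrt(mse)  # Error in the same units as Salary
r2 = r2_score(y_test, y_pred)  # Proportion of variance explained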

print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", rmse)
print("R-squared Score (R²):", r2)
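
The metrics above can also be computed by hand, which helps when interpreting them. The sketch below does so with NumPy on four made-up values (`y_true`, `y_hat`) and compares the results with the scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values, chosen for easy arithmetic.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 8.0, 8.5])

errors = y_true - y_hat

mae_manual = np.mean(np.abs(errors))  # mean of absolute errors
mse_manual = np.mean(errors ** 2)     # mean of squared errors
rmse_manual = np.sqrt(mse_manual)     # back in the units of y

# R² = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mae_manual, mean_absolute_error(y_true, y_hat))
print(mse_manual, mean_squared_error(y_true, y_hat))
print(rmse_manual)
print(r2_manual, r2_score(y_true, y_hat))
```

Because MSE squares each error, a single large miss raises it (and RMSE) far more than it raises MAE.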

## Visualization of Predictions
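We compare **actual vs predicted** salaries on the test set. The closer the points lie to the diagonal where predicted equals actual, the better the model's predictions.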

In [11]:
plt.figure(figsize=(8,5))
sns.scatterplot(x=y_test, y=y_pred)
plt.xlabel("Actual Salary")
plt.ylabel("Predicted Salary")
plt.title("Actual vs Predicted Salary")
plt.show()