# Linear Regression - Complete Lecture (Theory + Practical)
## 📘 Introduction to Linear Regression
Linear regression is a statistical technique used in machine learning for predictive modeling.
It models the relationship between a dependent variable (target) and one or more independent variables (features) as a linear function: a straight line for a single feature, and a plane or hyperplane for several features.

The equation for simple linear regression is:
`y = β₀ + β₁x + ε`
- **y** is the dependent variable
- **x** is the independent variable
- **β₀** is the intercept
- **β₁** is the slope/coefficient
- **ε** is the error term

In practice, we use training data to estimate the values of β₀ and β₁ that minimize the sum of squared differences between the actual and predicted values of y. This criterion is known as ordinary least squares (OLS).
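
For a single feature, the least-squares estimates have a closed form: β₁ is the covariance of x and y divided by the variance of x, and β₀ = ȳ − β₁x̄. The sketch below illustrates this on made-up data (the generating values 2 and 3 and the noise level are assumptions chosen for illustration, not part of the lecture dataset) and cross-checks the result against `np.polyfit`.

```python
import numpy as np

# Made-up illustrative data: y is roughly 2 + 3x plus small Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)

# Closed-form ordinary least squares estimates for one feature:
#   beta1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   beta0 = y_mean - beta1 * x_mean
x_mean, y_mean = x.mean(), y.mean()
beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
beta0 = y_mean - beta1 * x_mean

# Cross-check: a degree-1 polynomial fit returns [slope, intercept].
slope_np, intercept_np = np.polyfit(x, y, deg=1)

print(f"closed form: beta0 = {beta0:.3f}, beta1 = {beta1:.3f}")
print(f"np.polyfit : beta0 = {intercept_np:.3f}, beta1 = {slope_np:.3f}")
```

Both lines should print the same estimates, close to the values used to generate the data.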

In [None]:
# Step 1: Import Required Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.datasets import load_diabetes

In [None]:
# Step 2: Load and Explore Dataset
data = load_diabetes()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
df.head()

In [None]:
# Step 3: Simple Linear Regression (Single Feature)
X = df[['bmi']]
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
model.intercept_, model.coef_

In [None]:
# Step 4: Visualize Regression Line
plt.figure(figsize=(8, 6))
plt.scatter(X_test, y_test, color='blue', label='Actual values')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Regression Line')
plt.xlabel('BMI')
plt.ylabel('Disease Progression')
plt.title('Simple Linear Regression using BMI')
plt.legend()
plt.show()

In [None]:
# Step 5: Evaluate the Model
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
mse, rmse, r2
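
To make the three metrics concrete, the sketch below computes them by hand with NumPy. The arrays `y_true` and `y_hat` are hypothetical values chosen only to keep the arithmetic easy to follow; they are not taken from the diabetes dataset.

```python
import numpy as np

# Hypothetical actual and predicted values (every prediction is off by 0.5).
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.5, 9.5])

errors = y_true - y_hat

# MSE: mean of the squared errors.
mse_manual = np.mean(errors ** 2)

# RMSE: square root of the MSE, expressed in the same units as the target.
rmse_manual = np.sqrt(mse_manual)

# R^2: 1 - (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(f"MSE={mse_manual:.2f}, RMSE={rmse_manual:.2f}, R2={r2_manual:.2f}")
# prints: MSE=0.25, RMSE=0.50, R2=0.95
```

These are the same definitions that `mean_squared_error` and `r2_score` implement, so applying the formulas to `y_test` and `y_pred` should reproduce the values from Step 5.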

In [None]:
# Step 6: Multiple Linear Regression (All Features)
X_multi = df.drop('target', axis=1)
y_multi = df['target']
X_train_m, X_test_m, y_train_m, y_test_m = train_test_split(X_multi, y_multi, test_size=0.2, random_state=42)
model_multi = LinearRegression()
model_multi.fit(X_train_m, y_train_m)
y_pred_m = model_multi.predict(X_test_m)
mse_m = mean_squared_error(y_test_m, y_pred_m)
rmse_m = np.sqrt(mse_m)
r2_m = r2_score(y_test_m, y_pred_m)
mse_m, rmse_m, r2_m
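
With several features, the model becomes y = Xβ + ε, and the least-squares coefficients can be computed directly from the design matrix. The following is a minimal sketch on synthetic data; the names `X_demo`, `y_demo`, `true_coef`, and `true_intercept` are assumptions introduced for illustration. It uses `np.linalg.lstsq` and is not a description of how `LinearRegression` is implemented internally.

```python
import numpy as np

# Synthetic data with known coefficients, so the estimates can be checked.
rng = np.random.default_rng(1)
n_samples = 200
X_demo = rng.normal(size=(n_samples, 3))
true_coef = np.array([1.5, -2.0, 0.5])
true_intercept = 4.0
y_demo = true_intercept + X_demo @ true_coef + rng.normal(scale=0.1, size=n_samples)

# Prepend a column of ones so the intercept is estimated as the first coefficient.
X_design = np.column_stack([np.ones(n_samples), X_demo])

# Least-squares solution of X_design @ beta ≈ y_demo.
beta_hat, *_ = np.linalg.lstsq(X_design, y_demo, rcond=None)
intercept_hat, coef_hat = beta_hat[0], beta_hat[1:]

print("intercept:", round(float(intercept_hat), 3))
print("coefficients:", np.round(coef_hat, 3))
```

The printed estimates should be close to the intercept 4.0 and the coefficients 1.5, -2.0, and 0.5 used to generate the data.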

In [None]:
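# Step 7: Coefficients and Feature Importance
# The load_diabetes features are already mean-centered and scaled, so coefficient magnitudes are roughly comparable.
# The sign gives the direction of the effect; the absolute value gives its strength.
coeff_df = pd.DataFrame(model_multi.coef_, X_multi.columns, columns=['Coefficient'])
coeff_df.sort_values(by='Coefficient', ascending=False)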

In [None]:
# Step 8: Residual Analysis
# Residuals of the simple (BMI-only) model from Step 3, computed on the test set
residuals = y_test - y_pred
plt.figure(figsize=(8, 5))
sns.histplot(residuals, kde=True, bins=30)
plt.title('Distribution of Residuals')
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.show()
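
Alongside the histogram, two numeric checks are often useful. For a least-squares fit with an intercept, the residuals on the training data average to zero and are uncorrelated with the feature; on held-out data these quantities should only be approximately zero. The sketch below shows the training-data property on synthetic data (`x_res` and `y_res` are hypothetical names, and the fit uses `np.polyfit` rather than the lecture's model).

```python
import numpy as np

# Synthetic single-feature data for illustration.
rng = np.random.default_rng(2)
x_res = rng.uniform(0.0, 5.0, size=100)
y_res = 1.0 + 2.0 * x_res + rng.normal(scale=0.3, size=100)

# Fit a straight line by least squares and compute residuals on the same data.
slope, intercept = np.polyfit(x_res, y_res, deg=1)
fitted = intercept + slope * x_res
resid = y_res - fitted

# Both quantities should be zero up to floating-point error.
resid_mean = resid.mean()
resid_x_corr = np.corrcoef(x_res, resid)[0, 1]

print(f"mean residual: {resid_mean:.2e}")
print(f"correlation between feature and residuals: {resid_x_corr:.2e}")
```

A residual mean far from zero or a visible trend against the feature on test data would suggest the linear model is missing some structure.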

In [None]:
# Step 9: Actual vs Predicted Values Plot
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, color='green')
plt.xlabel('Actual Target Values')
plt.ylabel('Predicted Values')
plt.title('Actual vs Predicted Values')
# Dashed diagonal marks perfect predictions (predicted == actual)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'k--', lw=2)
plt.show()

## 🏁 Summary
This lecture covered:
- Theoretical understanding of Linear Regression
- Simple and Multiple Linear Regression
- Visualization and evaluation
- Coefficients and residual analysis
- Practical applications using a real-world dataset

**Assignment:** Use another dataset (e.g., Housing Prices) and replicate the above process for hands-on learning.
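
As a possible starting point, the sketch below wraps the split, fit, and evaluate steps in one helper. The function name `fit_and_score` is hypothetical, and the data comes from scikit-learn's `make_regression` as a synthetic stand-in; replace `X_new` and `y_new` with the features and target of the dataset you choose.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, test_size=0.2, random_state=42):
    """Hypothetical helper: split the data, fit a linear model, return test metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model = LinearRegression().fit(X_tr, y_tr)
    y_hat = model.predict(X_te)
    mse = mean_squared_error(y_te, y_hat)
    return {"mse": mse, "rmse": float(np.sqrt(mse)), "r2": r2_score(y_te, y_hat)}


# Synthetic stand-in data; swap in your own dataset here.
X_new, y_new = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
scores = fit_and_score(X_new, y_new)
print(scores)
```

Keeping the steps in one function makes it easy to compare a single-feature model against an all-features model by passing different column subsets.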