

## 📘 1. **Understanding Simple Linear Regression Equations**

---

### ✅ **Definition**:

Simple Linear Regression is a way to **predict a value (Y)** based on **one input (X)**.

### 🧠 **Equation**:

$$
\hat{Y} = mX + b
$$

Where:

* $\hat{Y}$ = predicted value
* $X$ = input feature (independent variable)
* $m$ = slope (how much $\hat{Y}$ changes when X increases by one unit)
* $b$ = intercept (the predicted value $\hat{Y}$ when X = 0)
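
To make the roles of $m$ and $b$ concrete, here is a minimal sketch of the equation as code. The function name `predict` and the values `m=2`, `b=1` are made up for illustration only.

```python
def predict(x, m, b):
    """Return the predicted value y_hat = m * x + b for a single input x."""
    return m * x + b


# Hypothetical line with slope 2 and intercept 1 (illustrative values)
y_hat_at_0 = predict(0, m=2, b=1)  # at x = 0 the prediction equals the intercept
y_hat_at_1 = predict(1, m=2, b=1)

# Moving x up by one unit changes the prediction by exactly the slope
slope_check = y_hat_at_1 - y_hat_at_0

print(y_hat_at_0, y_hat_at_1, slope_check)  # 1 3 2
```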

### 🎯 **Goal**:

Find the **best-fitting line**: the values of $m$ and $b$ that bring the predictions $\hat{Y}$ as close as possible to the observed values of Y.
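
"As close as possible" is usually made precise with the least-squares criterion, which is also what the formulas and the `LinearRegression` model used later in this walkthrough rely on. Writing the $n$ data points as $(X_i, Y_i)$, notation introduced here for illustration, the line is chosen to minimize the sum of squared errors:

$$
\text{SSE}(m, b) = \sum_{i=1}^{n} \left( Y_i - (mX_i + b) \right)^2
$$

Each term is the squared gap between an actual value and the line's prediction, so a smaller total means a tighter fit.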

---

### 🧪 **Example**:

Imagine you're predicting a person's **salary (Y)** based on their **years of experience (X)**.

| Years of Experience (X) | Salary (Y) |
| ----------------------- | ---------- |
| 1                       | 40,000     |
| 2                       | 50,000     |
| 3                       | 60,000     |

Linear regression tries to fit a line like:

$$
\hat{Y} = 10{,}000 \cdot X + 30{,}000
$$

So:

* For X = 1 → $\hat{Y} = 40{,}000$
* For X = 2 → $\hat{Y} = 50{,}000$
* For X = 3 → $\hat{Y} = 60{,}000$

Perfect fit! (Real data is noisy, so the points usually scatter around the line instead of landing exactly on it.)
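
The check above can be sketched in plain Python by summing the squared gaps between actual and predicted salaries. The helper name `sse` and the alternative slope of 9,000 are assumptions chosen for illustration.

```python
# Data from the table above
years = [1, 2, 3]
salary = [40_000, 50_000, 60_000]


def sse(m, b):
    """Sum of squared differences between actual and predicted salaries."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(years, salary))


fit_error = sse(10_000, 30_000)   # the line from the example
other_error = sse(9_000, 30_000)  # hypothetical flatter line, for comparison

print(fit_error)    # 0
print(other_error)  # 14000000
```

The example line has zero error on these three points, while the flatter line misses by 1,000, 2,000 and 3,000, which is why it scores worse.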


Awesome! Let's start with a **simple Python implementation** of **Simple Linear Regression** using both:

1. **Manual calculation** (to understand the math)
2. **Using scikit-learn** (to see how it's done in practice)

---

## 🧮 1. Manual Linear Regression (using NumPy)

Let’s implement the equation:

$$
\hat{Y} = mX + b
$$

We’ll use a small dataset that extends the table above to five points:

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample data: Experience (X) vs Salary (Y)
X = np.array([1, 2, 3, 4, 5])
Y = np.array([40000, 50000, 60000, 70000, 80000])

# Step 1: Mean of X and Y
mean_X = np.mean(X)
mean_Y = np.mean(Y)

# Step 2: Calculate the least-squares slope m and intercept b
numerator = np.sum((X - mean_X) * (Y - mean_Y))
denominator = np.sum((X - mean_X)**2)
m = numerator / denominator
b = mean_Y - m * mean_X

print(f"Slope (m): {m}")
print(f"Intercept (b): {b}")

# Step 3: Predict
Y_pred = m * X + b

# Plot
plt.scatter(X, Y, color='blue', label='Actual')
plt.plot(X, Y_pred, color='red', label='Predicted Line')
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.title("Simple Linear Regression")
plt.legend()
plt.grid(True)
plt.show()
```

🔍 **Explanation**:

* We compute the means of X and Y.
* We then apply the least-squares formulas: the slope `m` is `numerator / denominator`, and the intercept `b` is `mean_Y - m * mean_X`.
* Finally, we plot the original points and the fitted regression line.
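
Written out, the formulas that the code computes are the standard least-squares estimates, with $\bar{X}$ and $\bar{Y}$ denoting the means (`mean_X` and `mean_Y` in the code):

$$
m = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad b = \bar{Y} - m\bar{X}
$$

For the five-point dataset, $\bar{X} = 3$ and $\bar{Y} = 60{,}000$. The numerator is $40{,}000 + 10{,}000 + 0 + 10{,}000 + 40{,}000 = 100{,}000$ and the denominator is $4 + 1 + 0 + 1 + 4 = 10$, so:

$$
m = \frac{100{,}000}{10} = 10{,}000, \qquad b = 60{,}000 - 10{,}000 \cdot 3 = 30{,}000
$$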

---

## ⚙️ 2. Using `scikit-learn` (Fast & Professional Way)

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# sklearn expects X as a 2D array of shape (n_samples, n_features)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
Y = np.array([40000, 50000, 60000, 70000, 80000])

# Model
model = LinearRegression()
model.fit(X, Y)

# Results
print("Slope (m):", model.coef_[0])
print("Intercept (b):", model.intercept_)

# Predict
Y_pred = model.predict(X)

# Plot
plt.scatter(X, Y, color='blue', label='Actual')
plt.plot(X, Y_pred, color='red', label='Predicted Line')
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.title("Simple Linear Regression with sklearn")
plt.legend()
plt.grid(True)
plt.show()
```
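
Once the model is fitted, it can also predict for inputs it has not seen and report how well the line fits. The sketch below assumes the same five-point dataset; the choice of 6 years is illustrative and lies outside the observed range, so treat the prediction as an extrapolation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
Y = np.array([40000, 50000, 60000, 70000, 80000])

model = LinearRegression().fit(X, Y)

# predict() needs a 2D input, even for a single value
new_X = np.array([[6]])
predicted_salary = model.predict(new_X)[0]

# score() returns R^2, the share of variance in Y explained by the line
r_squared = model.score(X, Y)

print(f"Predicted salary at 6 years: {predicted_salary:.0f}")
print(f"R^2 on the training data: {r_squared:.3f}")
```

Because this toy data lies exactly on a line, R² comes out at essentially 1; on real data it will be lower.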

---

Let me know if you want:

* A small **coding exercise**
* To move on to **Topic 2: Cost Function**
* Or an explanation of **how sklearn computes the line** behind the scenes
