# Simple Linear Regression: Explanation, Equation, and Example

## What is Simple Linear Regression?

Simple linear regression is a statistical method used to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable (independent variable, usually denoted as `x`), and the other is considered to be a dependent variable (usually denoted as `y`).

---

## Equation of Simple Linear Regression

The equation for a simple linear regression line is:

\[
y = \beta_0 + \beta_1 x + \epsilon
\]

Where:
- \( y \) = Dependent variable (what you want to predict)
- \( x \) = Independent variable (the predictor)
- \( \beta_0 \) = Intercept (value of \( y \) when \( x = 0 \))
- \( \beta_1 \) = Slope (change in \( y \) for a one-unit change in \( x \))
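- \( \epsilon \) = Error term (the part of \( y \) that the line does not explain; after fitting, it is estimated by the residual, the difference between the observed and predicted values)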
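
In practice \( \beta_0 \) and \( \beta_1 \) are unknown and are estimated from data. The code later in this document uses the ordinary least squares estimates, which choose the line that minimizes the sum of squared differences between observed and predicted values. The hats, the index \( i \), the sample size \( n \), and the bars for sample means are notation introduced here for illustration:

\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]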

---

## Example Dataset

Suppose we have data on the number of hours studied (`x`) and the corresponding exam scores (`y`) for five students:

| Hours Studied (x) | Exam Score (y) |
|-------------------|----------------|
|        2          |      65        |
|        3          |      70        |
|        5          |      75        |
|        7          |      80        |
|        9          |      85        |

We want to fit a simple linear regression model to predict exam scores based on hours studied.

---


```python
# Jupyter magic: uncomment in a notebook if the packages are not installed yet
# %pip install numpy matplotlib

# Import required libraries
import numpy as np
import matplotlib.pyplot as plt

# Dataset
x = np.array([2, 3, 5, 7, 9])
y = np.array([65, 70, 75, 80, 85])

# Step 1: Calculate means
x_mean = np.mean(x)
y_mean = np.mean(y)

# Step 2: Calculate slope (beta_1) as the sum of cross-deviations
# divided by the sum of squared deviations of x
numerator = np.sum((x - x_mean) * (y - y_mean))
denominator = np.sum((x - x_mean) ** 2)
beta_1 = numerator / denominator

# Step 3: Calculate intercept (beta_0) so the line passes through (x_mean, y_mean)
beta_0 = y_mean - beta_1 * x_mean

# Print results
print(f"Mean of x = {x_mean:.2f}, Mean of y = {y_mean:.2f}")
print(f"Slope (β1) = {beta_1:.2f}")
print(f"Intercept (β0) = {beta_0:.2f}")
print(f"Regression Equation: y = {beta_0:.2f} + {beta_1:.2f}x")
```
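
For readers who want to follow Steps 1 to 3 by hand, the same arithmetic on the five data points can be written out as below. The bars and hats are notation introduced here; the numbers come directly from the table above.

\[
\bar{x} = \frac{2 + 3 + 5 + 7 + 9}{5} = 5.2,
\qquad
\bar{y} = \frac{65 + 70 + 75 + 80 + 85}{5} = 75
\]

\[
\sum (x_i - \bar{x})(y_i - \bar{y}) = (-3.2)(-10) + (-2.2)(-5) + (-0.2)(0) + (1.8)(5) + (3.8)(10) = 90
\]

\[
\sum (x_i - \bar{x})^2 = 10.24 + 4.84 + 0.04 + 3.24 + 14.44 = 32.8
\]

\[
\hat{\beta}_1 = \frac{90}{32.8} \approx 2.74,
\qquad
\hat{\beta}_0 = 75 - \frac{90}{32.8} \times 5.2 \approx 60.73
\]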


### Interpretation

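- The intercept (β0 ≈ 60.73) is the predicted exam score when hours studied is 0. No student in the data studied fewer than 2 hours, so this value is an extrapolation.
- The slope (β1 ≈ 2.74) means that each additional hour studied raises the predicted exam score by about 2.74 points.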
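
To make the interpretation concrete, here is a minimal sketch that uses the fitted line to predict a score. The helper name `predict_score` and the choice of 6 hours are illustrative assumptions, not part of the original notebook; the coefficients are recomputed so the block runs on its own.

```python
import numpy as np

# Same dataset as in the example above
x = np.array([2, 3, 5, 7, 9])
y = np.array([65, 70, 75, 80, 85])

# Least-squares estimates, computed the same way as in Steps 1-3
beta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta_0 = y.mean() - beta_1 * x.mean()


def predict_score(hours):
    """Hypothetical helper: predicted exam score for a given number of hours."""
    return beta_0 + beta_1 * hours


# 6 hours lies inside the observed range (2 to 9 hours), so this is interpolation
print(f"Predicted score for 6 hours: {predict_score(6):.2f}")

# Moving from 6 to 7 hours changes the prediction by exactly the slope
print(f"Change for one extra hour: {predict_score(7) - predict_score(6):.2f}")
```

Predictions far outside the 2 to 9 hour range should be treated with caution, since the data say nothing about how scores behave there.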


```python
# Step 4: Visualization
# Uses x, y, beta_0, beta_1 and plt from the previous cell
plt.figure(figsize=(8, 6))

# Scatter plot of actual data
plt.scatter(x, y, color='blue', label='Data Points')

# Regression line
y_pred = beta_0 + beta_1 * x
plt.plot(x, y_pred, color='red', label=f'Regression Line: y={beta_0:.2f}+{beta_1:.2f}x')

# Labels and title
plt.xlabel("Hours Studied")
plt.ylabel("Exam Score")
plt.title("Simple Linear Regression: Hours Studied vs Exam Score")
plt.legend()
plt.grid(True)
plt.show()
```
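
The vertical gaps between the data points and the regression line in the plot are the residuals, the sample counterpart of the error term \( \epsilon \). The sketch below lists them for each student; it recomputes the coefficients so it can run independently, and the variable name `residuals` is an illustrative choice.

```python
import numpy as np

# Same dataset as in the example above
x = np.array([2, 3, 5, 7, 9])
y = np.array([65, 70, 75, 80, 85])

# Least-squares estimates, computed the same way as in Steps 1-3
beta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta_0 = y.mean() - beta_1 * x.mean()

# Residual = observed score minus the score predicted by the line
y_pred = beta_0 + beta_1 * x
residuals = y - y_pred

for hours, observed, fitted, res in zip(x, y, y_pred, residuals):
    print(f"x={hours}: observed={observed}, predicted={fitted:.2f}, residual={res:+.2f}")

# For a least-squares line with an intercept, the residuals sum to zero
# (up to floating-point rounding)
print(f"Sum of residuals: {residuals.sum():.6f}")
```

A residual near zero means the line predicts that student's score closely; a large positive or negative residual marks a point the line fits poorly.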


## Final Notes

- Simple linear regression finds the **best straight line** (in the least-squares sense) that describes how one variable (y) changes with another variable (x).
- In this example:
  - Regression Equation = **y = 60.73 + 2.74x**
  - Interpretation: Each extra hour studied is associated with an increase of about **2.74 points** in the predicted exam score.
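
As a quick cross-check of the fitted coefficients, NumPy's `np.polyfit` can fit the same straight line. This is an alternative route to the manual formulas, not the method used in the notebook; with degree 1 it returns the coefficients from highest power to lowest, so the slope comes first.

```python
import numpy as np

# Same dataset as in the example above
x = np.array([2, 3, 5, 7, 9])
y = np.array([65, 70, 75, 80, 85])

# Degree-1 polynomial fit by least squares: returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

print(f"Slope = {slope:.2f}, Intercept = {intercept:.2f}")
```

If the two approaches disagree beyond rounding, the manual calculation is the first place to look for a mistake.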