# Supervised Learning: Regression Models and Performance Metrics

**Assignment Code:** D-AG-008  
This notebook contains detailed answers (theory + coding) to all questions, with real‑world examples wherever applicable.  
All question numbers are kept **exactly the same** as in the assignment.


## Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.

**Simple Linear Regression (SLR)** is a statistical and machine learning technique used to model the relationship between **one independent variable (X)** and **one dependent variable (Y)** by fitting a straight line to the observed data.

The main purpose of SLR is:
- To understand how changes in the independent variable affect the dependent variable
- To predict the value of the dependent variable for a given value of the independent variable
- To quantify the strength and direction (positive or negative) of the relationship

**In simple terms:** SLR tries to answer questions like *"If X increases by one unit, how much does Y change on average?"*

**Example:**
- Predicting house price (Y) based on house size (X)
- Predicting exam score (Y) based on number of study hours (X)


## Question 2: What are the key assumptions of Simple Linear Regression?

Simple Linear Regression relies on the following key assumptions:

1. **Linearity**  
   The relationship between X and Y is linear.

2. **Independence of errors**  
   The residuals (errors) are independent of each other.

3. **Homoscedasticity**  
   The variance of errors is constant for all values of X.

4. **Normality of errors**  
   The residuals are normally distributed. This matters mainly for confidence intervals and hypothesis tests on the coefficients.

5. **No significant outliers**  
   Extreme values can pull the fitted line toward themselves and distort the estimated slope and intercept.

**Real‑life intuition:** If these assumptions are violated, predictions and coefficient estimates become unreliable, much like trying to lay a straight road across hilly terrain.


## Question 3: Write the mathematical equation for a simple linear regression model and explain each term.

The mathematical equation of Simple Linear Regression is:

$$ y = \beta_0 + \beta_1 x + \epsilon $$

**Explanation of terms:**
- **y**: Dependent variable (target)
- **x**: Independent variable (feature)
- **β₀ (Intercept)**: Value of y when x = 0
- **β₁ (Slope)**: Change in y for a one‑unit change in x
- **ε (Error term)**: Represents noise or unexplained variation

**Example:**  
If y = exam score and x = hours studied, then β₁ tells us how many extra marks are gained per additional study hour.
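
To make the role of each term concrete, the sketch below generates exam scores directly from the equation above. The coefficient values (β₀ = 40, β₁ = 5), the noise level, and the function name `simulate_slr` are illustrative assumptions, not estimates from real data.

```python
import random

def simulate_slr(x_values, beta_0, beta_1, noise_sd, seed=0):
    """Generate y = beta_0 + beta_1 * x + epsilon for each x.

    epsilon is drawn from a normal distribution with mean 0 and
    standard deviation noise_sd (an assumed noise model).
    """
    rng = random.Random(seed)
    return [beta_0 + beta_1 * x + rng.gauss(0.0, noise_sd) for x in x_values]

hours = [1, 2, 3, 4, 5]

# Hypothetical coefficients: 40 marks at zero hours, 5 extra marks per hour.
scores_noiseless = simulate_slr(hours, beta_0=40.0, beta_1=5.0, noise_sd=0.0)
scores_noisy = simulate_slr(hours, beta_0=40.0, beta_1=5.0, noise_sd=2.0)

print("Without error term:", scores_noiseless)  # points lie exactly on the line
print("With error term:   ", scores_noisy)      # points scatter around the line
```

With `noise_sd=0.0` every point falls exactly on the line; the error term is what makes real observations scatter around it.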


## Question 4: Provide a real-world example where simple linear regression can be applied.

**Real‑world example: Salary Prediction**

- **Independent Variable (X):** Years of experience
- **Dependent Variable (Y):** Salary

A company may use simple linear regression to estimate an employee’s salary based on their years of experience.

**Other examples:**
- Predicting fuel consumption based on distance traveled
- Predicting electricity bill based on units consumed
- Predicting crop yield based on rainfall


## Question 5: What is the method of least squares in linear regression?

The **method of least squares** is used to estimate the regression coefficients by minimizing the **sum of squared errors** between actual and predicted values.

$$ \text{Minimize } \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

**Why squared errors?**
- Penalizes large errors more heavily
- Makes the optimization mathematically tractable

**Intuition:**  
The best‑fit line is the one for which the total squared vertical distance between the data points and the line is as small as possible.
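
For a single feature, the minimization has a well-known closed-form solution: slope = S_xy / S_xx and intercept = ȳ − slope · x̄. The sketch below applies it in plain Python so that each step of the arithmetic is visible; the helper names `fit_least_squares` and `sum_squared_errors` are hypothetical, and the data is the small sample used in the code for Question 9.

```python
def fit_least_squares(x, y):
    """Return (slope, intercept) that minimize the sum of squared errors."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # S_xy: co-variation of x and y; S_xx: variation of x.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def sum_squared_errors(x, y, slope, intercept):
    """Sum of squared differences between actual and predicted y."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_least_squares(x, y)
best_sse = sum_squared_errors(x, y, slope, intercept)

# Nudging the slope away from the least-squares value increases the error.
nudged_sse = sum_squared_errors(x, y, slope + 0.1, intercept)

print("Slope:", round(slope, 4))
print("Intercept:", round(intercept, 4))
print("SSE at best fit:", round(best_sse, 4))
print("SSE with nudged slope:", round(nudged_sse, 4))
```

Any other choice of slope or intercept gives a larger sum of squared errors than the least-squares pair, which is what the nudged line illustrates.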


## Question 6: What is Logistic Regression? How does it differ from Linear Regression?

**Logistic Regression** is a classification algorithm that models the probability of a class. It is most commonly used when the target variable is **binary** (0 or 1).

| Feature | Linear Regression | Logistic Regression |
|-------|----------------|------------------|
| Output | Continuous | Probability (0–1) |
| Use case | Regression | Classification |
| Function | Straight line | Sigmoid curve |

**Real‑life example:**
- Linear Regression → Predicting house price
- Logistic Regression → Predicting whether an email is spam (Yes/No)
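
The sigmoid curve in the table can be sketched in a few lines. The example below passes a linear score through the sigmoid to obtain a probability and then applies a 0.5 threshold to obtain a class label. The feature (number of links in an email), the coefficients, and the function names are hypothetical values chosen for illustration, not a fitted model.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_spam_probability(num_links, beta_0=-3.0, beta_1=1.0):
    """Probability of spam under assumed (not fitted) coefficients."""
    return sigmoid(beta_0 + beta_1 * num_links)

for links in [0, 3, 6]:
    probability = predict_spam_probability(links)
    label = int(probability >= 0.5)  # 1 = spam, 0 = not spam
    print(f"links={links}  P(spam)={probability:.3f}  predicted class={label}")
```

Linear regression would return the raw score `beta_0 + beta_1 * num_links`, which is unbounded; the sigmoid is what confines the output to a valid probability.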


## Question 7: Name and briefly describe three common evaluation metrics for regression models.

1. **Mean Absolute Error (MAE)**  
   Average of absolute differences between actual and predicted values.

2. **Mean Squared Error (MSE)**  
   Average of squared differences; penalizes large errors more heavily than MAE and is expressed in squared units of the target.

3. **Root Mean Squared Error (RMSE)**  
   Square root of MSE; interpretable in original units.

**Example:** RMSE is commonly used in weather and demand forecasting.
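
The three metrics can be computed by hand as a quick check of the definitions. This is a minimal sketch in plain Python; the function name `regression_metrics` and the actual/predicted values are made up for illustration.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for paired actual and predicted values."""
    errors = [actual - predicted for actual, predicted in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    return mae, mse, rmse

# Illustrative values: three small errors and one large error of 3.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 12.0]

mae, mse, rmse = regression_metrics(y_true, y_pred)
print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", round(rmse, 4))
```

Here the RMSE is larger than the MAE because squaring gives the single large error more weight.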


## Question 8: What is the purpose of the R-squared metric in regression analysis?

**R-squared (R²)** measures how well the regression model explains the variability of the dependent variable.

$$ R^2 = 1 - \frac{SS_{res}}{SS_{tot}} $$

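- R² = 1 → Perfect fit
- R² = 0 → The model does no better than always predicting the mean of Y
- R² < 0 → Possible (for example, on held‑out data) when the model fits worse than the mean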

**Interpretation:**  
An R² of 0.85 means 85% of the variation in Y is explained by X.
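
The formula can be checked with a short calculation. In the sketch below, the function name `r_squared` is hypothetical, the data is the sample from the code for Question 9, and the predictions come from the line ŷ = 2.2 + 0.6x, which is the least‑squares fit for that sample.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)             # total variation
    return 1 - ss_res / ss_tot

y_true = [2, 4, 5, 4, 5]

# Predictions from the line y_hat = 2.2 + 0.6 * x for x = 1..5.
y_pred_line = [2.8, 3.4, 4.0, 4.6, 5.2]

# Baseline: always predict the mean of y (which is 4 for this sample).
y_pred_mean = [4, 4, 4, 4, 4]

r2_line = r_squared(y_true, y_pred_line)
r2_mean = r_squared(y_true, y_pred_mean)

print("R^2 of fitted line:", round(r2_line, 4))
print("R^2 of mean baseline:", r2_mean)
```

The fitted line leaves SS_res = 2.4 out of SS_tot = 6, so it explains 60% of the variation, while the mean baseline explains none of it.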


## Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.


```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Model
model = LinearRegression()
model.fit(X, y)

print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
```

**Output Explanation:**  
- Slope represents change in y per unit change in x (approximately 0.6 for this sample)
- Intercept represents y when x = 0 (approximately 2.2 for this sample)


## Question 10: How do you interpret the coefficients in a simple linear regression model?

- **Intercept (β₀):** Expected value of Y when X = 0; meaningful only if X = 0 lies within or near the range of the observed data
- **Slope (β₁):** Average change in Y for a one‑unit increase in X

**Example:**  
If slope = 3, then for every additional hour studied, the exam score increases by 3 marks on average.
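
The interpretation can be verified numerically: the difference between predictions one unit apart equals the slope, and the prediction at X = 0 equals the intercept. The intercept of 50 and the function name `predict_score` are assumptions made for this sketch; only the slope of 3 comes from the example above.

```python
def predict_score(hours, intercept=50.0, slope=3.0):
    """Predicted exam score for a given number of study hours."""
    return intercept + slope * hours

# Prediction at zero hours is the intercept.
print("Score at 0 hours:", predict_score(0))

# One extra hour changes the prediction by exactly the slope.
gain_per_hour = predict_score(5) - predict_score(4)
print("Gain from hour 4 to hour 5:", gain_per_hour)
```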
