# Supervised Learning: Regression Models and Performance Metrics — Completed Assignment (Colab Notebook)

**Student:** _(Add your name)_  
**Assignment Code:** DA-AG-008  
**Generated on:** 2025-11-02 10:50:01

---


## Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.

**Answer:**

Simple Linear Regression (SLR) models the relationship between two variables, one independent variable (predictor, X) and one dependent variable (response, Y), by fitting a straight line. Its purpose is to predict Y for a given value of X and to quantify the strength and direction of the linear relationship between them.

SLR assumes the relationship is linear and uses training data to estimate the best-fit line, usually the one that minimizes the sum of squared differences between observed and predicted Y values (least squares).

Common uses: trend estimation, forecasting, and assessing effect size for a single predictor.
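A minimal sketch of both purposes on a small hypothetical dataset (the `hours` and `score` values are invented for illustration). `np.polyfit` with `deg=1` fits the line used for prediction, and the Pearson correlation `r` from `np.corrcoef` summarizes the direction (its sign) and strength (how close |r| is to 1) of the linear relationship.

```python
import numpy as np

# Hypothetical data: hours studied (X) and exam score (Y)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
score = np.array([52.0, 55.0, 61.0, 64.0, 70.0, 73.0])

# Purpose 1: prediction. np.polyfit with deg=1 returns [slope, intercept]
slope, intercept = np.polyfit(hours, score, deg=1)
predicted = intercept + slope * 7.0  # predicted score for 7 hours of study
print(f"Fitted line: score = {intercept:.2f} + {slope:.2f} * hours")
print(f"Predicted score for 7 hours: {predicted:.2f}")

# Purpose 2: strength and direction of the linear relationship
r = np.corrcoef(hours, score)[0, 1]
print(f"Pearson r = {r:.3f}")  # sign gives direction, |r| gives strength
```

Predicting at 7 hours goes slightly beyond the observed range (1 to 6 hours), so it relies on the linear trend continuing.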


## Question 2: What are the key assumptions of Simple Linear Regression?

**Answer:**

Key assumptions of SLR:
1. **Linearity:** The relationship between X and Y is linear.
2. **Independence:** Observations (and errors) are independent.
3. **Homoscedasticity:** Constant variance of residuals across values of X.
4. **Normality of errors:** Residuals are approximately normally distributed (important for inference).
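5. **Variability in X:** The predictor must take at least two distinct values; otherwise the slope cannot be estimated. (This is the single-predictor counterpart of the "no perfect multicollinearity" assumption in multiple regression.)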
6. **No influential outliers:** Extreme points can unduly affect the fitted line.

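Violations undermine the validity of predictions and statistical tests. Diagnostics such as residual-vs-fitted plots, Q-Q plots of the residuals, and formal tests (e.g., Breusch-Pagan for heteroscedasticity, Shapiro-Wilk for normality, Durbin-Watson for autocorrelation) help check them.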
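A rough, numpy-only sketch of residual diagnostics on simulated data that satisfies the assumptions (the generating line y = 1 + 2x and the noise level are assumptions for illustration). It checks that residuals average to zero, compares residual spread for low vs high X as a crude homoscedasticity check, computes the lag-1 autocorrelation of residuals as a crude independence check, and reports skewness as a rough normality hint. These are heuristics only; plots and the formal tests named above are the usual tools.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated data meeting the SLR assumptions (assumed true line: y = 1 + 2x)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, size=200)

# Fit by least squares and compute residuals
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted

# 1) With an intercept, OLS residuals average to (numerically) zero
print("Mean residual:", residuals.mean())

# 2) Crude homoscedasticity check: residual spread for low x vs high x
order = np.argsort(x)
low, high = np.array_split(residuals[order], 2)
print("Residual std (low x, high x):", low.std(ddof=1), high.std(ddof=1))

# 3) Crude independence check: lag-1 autocorrelation of residuals in data order
lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
print("Lag-1 residual autocorrelation:", lag1)

# 4) Rough normality hint: skewness near 0 is consistent with (not proof of) normal errors
z = (residuals - residuals.mean()) / residuals.std()
skewness = np.mean(z ** 3)
print("Residual skewness:", skewness)
```

With real data, a residual spread that grows with X or a clearly nonzero lag-1 autocorrelation would point to a violated assumption.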


## Question 3: Write the mathematical equation for a simple linear regression model and explain each term.

**Answer:**

Mathematical equation:

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

Where:
- \(y\): Dependent (response) variable.
- \(x\): Independent (predictor) variable.
- \(\beta_0\): Intercept — the expected value of \(y\) when \(x = 0\).
- \(\beta_1\): Slope (coefficient) — the expected change in \(y\) for a one-unit increase in \(x\).
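- \(\varepsilon\): Random error term capturing the deviation of \(y\) from the true line due to noise or unobserved factors; it is assumed to have mean zero. Its estimate from a fitted model, \(y_i - \hat{y}_i\), is called the residual.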

The goal of regression is to estimate \(\beta_0\) and \(\beta_1\) from data (commonly via least squares).
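To make the roles of \(\beta_0\), \(\beta_1\) and \(\varepsilon\) concrete, the sketch below simulates the model with assumed values \(\beta_0 = 2\), \(\beta_1 = 0.5\) and \(\varepsilon \sim N(0, 1)\) (illustrative numbers, not from the text). Averaging many simulated \(y\) values at \(x = 0\) and \(x = 1\) recovers \(\beta_0\) and \(\beta_1\), matching their interpretation as an expected value and an expected change.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 2.0, 0.5   # assumed true coefficients
n = 100_000               # many draws so the averages are stable

def simulate_y(x, size):
    """Draw y = beta0 + beta1 * x + eps, with eps ~ Normal(0, 1)."""
    eps = rng.normal(0.0, 1.0, size=size)
    return beta0 + beta1 * x + eps

y_at_0 = simulate_y(0.0, n)
y_at_1 = simulate_y(1.0, n)

print("Average y at x=0 (approx. beta0):", y_at_0.mean())
print("Average change from x=0 to x=1 (approx. beta1):", y_at_1.mean() - y_at_0.mean())
```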


## Question 4: Provide a real-world example where simple linear regression can be applied.

**Answer:**

Example: Predicting house price (Y) from house size (X).
- X: Size in square feet.
- Y: Sale price in USD.

SLR can estimate how much the price increases per additional square foot; e.g., \(\beta_1 = 150\) means the price increases by \$150, on average, for each additional square foot.

Other examples: predicting sales from advertising spend, estimating weight from calorie intake, and estimating temperature from altitude (if the relationship is roughly linear over the range of interest).
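A worked version of the house-price example, using made-up sizes and prices that lie exactly on the line price = 50,000 + 150 × size (both numbers are hypothetical). The fit recovers \(\beta_1 = 150\), and the slope converts directly into a price difference: a house 200 sq ft larger is predicted to cost 150 × 200 = 30,000 USD more.

```python
import numpy as np

# Hypothetical data lying exactly on price = 50_000 + 150 * size
size_sqft = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
price_usd = 50_000 + 150 * size_sqft

# np.polyfit with deg=1 returns [slope, intercept]
beta1, beta0 = np.polyfit(size_sqft, price_usd, deg=1)
print(f"Estimated slope: {beta1:.2f} USD per sq ft")
print(f"Estimated intercept: {beta0:.2f} USD")

# Predicted price difference between two houses 200 sq ft apart
extra_sqft = 200
print(f"Predicted extra price for {extra_sqft} sq ft: {beta1 * extra_sqft:,.0f} USD")
```

The intercept here (50,000 USD for a 0 sq ft house) lies far outside the data and has no practical meaning; only the slope is interpretable in this example.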


## Question 5: What is the method of least squares in linear regression?

**Answer:**

The method of least squares estimates regression coefficients by minimizing the sum of squared residuals:

\[ \text{SSE} = \sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2. \]

The best-fitting line has coefficients \(\hat{\beta}_0, \hat{\beta}_1\) that minimize SSE. Closed-form solutions exist for SLR:

\[ \hat{\beta}_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. \]

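Under the Gauss–Markov assumptions (linearity, zero-mean errors, constant error variance, uncorrelated errors), the least-squares estimators are BLUE: they have the smallest variance among all linear unbiased estimators.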
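A direct translation of the closed-form formulas into numpy, checked against `np.polyfit` (which solves the same least-squares problem) on simulated data with an assumed true line y = 3 + 1.5x. The last lines nudge each coefficient to illustrate that moving away from \(\hat{\beta}_0, \hat{\beta}_1\) increases the SSE.

```python
import numpy as np

def least_squares_slr(x, y):
    """Closed-form least-squares estimates (beta0_hat, beta1_hat) for SLR."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

def sse(x, y, b0, b1):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

rng = np.random.default_rng(2)
x = rng.uniform(0, 5, size=50)
y = 3.0 + 1.5 * x + rng.normal(0, 0.5, size=50)  # assumed true line for the demo

b0, b1 = least_squares_slr(x, y)
print("Closed form: beta0 =", b0, "beta1 =", b1)
print("np.polyfit (intercept, slope):", np.polyfit(x, y, deg=1)[::-1])

best = sse(x, y, b0, b1)
print("SSE at estimates:", best)
print("SSE with slope + 0.1:", sse(x, y, b0, b1 + 0.1))
print("SSE with intercept - 0.1:", sse(x, y, b0 - 0.1, b1))
```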


## Question 6: What is Logistic Regression? How does it differ from Linear Regression?

**Answer:**

**Logistic Regression** is a classification algorithm used for binary (or multi-class via extensions) outcomes. It models the probability that the dependent variable belongs to a class using the logistic (sigmoid) function applied to a linear combination of inputs.

Key differences from Linear Regression:
- **Outcome type:** Logistic handles a categorical outcome and outputs class probabilities in (0, 1); Linear handles a continuous outcome and its predictions are unbounded.
- **Modeling target:** Logistic models \(P(Y=1 \mid X) = \sigma(\beta_0 + \beta_1 X)\), where \(\sigma(z) = 1/(1 + e^{-z})\) is the sigmoid; Linear predicts numeric values directly.
- **Loss function:** Logistic uses log-loss (cross-entropy); Linear uses squared error (least squares).
- **Interpretation:** In logistic regression, a one-unit increase in X changes the log-odds \(\log\frac{P}{1-P}\) by \(\beta_1\), i.e., multiplies the odds by \(e^{\beta_1}\); in linear regression, it changes the predicted response by \(\beta_1\).

Logistic regression is used when you want class probabilities and decision boundaries; linear regression is for predicting continuous quantities.
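A small numpy sketch of the logistic model with hypothetical coefficients \(\beta_0 = -3\) and \(\beta_1 = 0.8\) (chosen for illustration, not fitted to any data). It shows the sigmoid turning the linear score into a probability in (0, 1), checks that a one-unit increase in X multiplies the odds by \(e^{\beta_1}\), and computes log-loss against a few made-up 0/1 labels.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def odds(prob):
    """Odds p / (1 - p) for a probability p."""
    return prob / (1.0 - prob)

def log_loss(y_true, p):
    """Average binary cross-entropy; p is clipped to avoid log(0)."""
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

beta0, beta1 = -3.0, 0.8   # hypothetical coefficients

x = np.array([0.0, 2.0, 4.0, 6.0])
p = sigmoid(beta0 + beta1 * x)   # P(Y=1 | X=x)
print("P(Y=1|x):", np.round(p, 3))

# Odds ratio for a one-unit increase in x (here from x=2 to x=3) equals exp(beta1)
ratio = odds(sigmoid(beta0 + beta1 * 3.0)) / odds(sigmoid(beta0 + beta1 * 2.0))
print("Odds ratio:", ratio, "exp(beta1):", np.exp(beta1))

# Log-loss for some hypothetical labels
y = np.array([0, 0, 1, 1])
print("Log-loss:", log_loss(y, p))
```

In practice the coefficients would be estimated by minimizing this log-loss, for example with scikit-learn's `LogisticRegression`.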


## Question 7: Name and briefly describe three common evaluation metrics for regression models.

**Answer:**

Three common regression metrics:
1. **Mean Squared Error (MSE):** Average of squared differences between actual and predicted values. Penalizes larger errors more.
   \[ \text{MSE} = \dfrac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2. \]
2. **Root Mean Squared Error (RMSE):** Square root of MSE — has same units as the target, easier to interpret.
   \[ \text{RMSE} = \sqrt{\text{MSE}}. \]
3. **Mean Absolute Error (MAE):** Average absolute difference between actual and predicted. Less sensitive to outliers than MSE.
   \[ \text{MAE} = \dfrac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|. \]

Other metrics: R-squared, Mean Absolute Percentage Error (MAPE), adjusted R-squared.
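Computing the three metrics by hand on a tiny example makes their differences visible. The values are made up for illustration; scikit-learn offers equivalents in `sklearn.metrics` (`mean_squared_error`, `mean_absolute_error`), and RMSE is the square root of the MSE.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # illustrative actual values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])    # illustrative predictions

errors = y_true - y_pred
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(errors))

print(f"MSE  = {mse:.4f}")   # 0.3750
print(f"RMSE = {rmse:.4f}")  # 0.6124
print(f"MAE  = {mae:.4f}")   # 0.5000
```

The single error of size 1.0 contributes two-thirds of the total squared error (1.0 of 1.5) but only half of the total absolute error (1.0 of 2.0), which is why MSE and RMSE react more strongly to large errors than MAE.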


## Question 8: What is the purpose of the R-squared metric in regression analysis?

**Answer:**

R-squared (\(R^2\)) measures the proportion of variance in the dependent variable that is explained by the model.

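\[ R^2 = 1 - \dfrac{\text{SSE}}{\text{SST}}, \]
where \(\text{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2\) is the total sum of squares and \(\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\) is the error sum of squares. For an ordinary least-squares fit with an intercept, evaluated on its training data, SST = SSR + SSE, so equivalently \(R^2 = \text{SSR}/\text{SST}\), where \(\text{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2\) is the regression sum of squares.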

Interpretation: \(R^2 = 0.7\) means 70% of the variability in Y is explained by the model. A high \(R^2\) does not imply causation, and adding predictors never decreases the training \(R^2\), even when they are irrelevant, so use adjusted \(R^2\) to compare models with different numbers of predictors.
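A numpy sketch of \(R^2\) and adjusted \(R^2\) for a least-squares line fitted to simulated data (the generating model y = 5 + 0.8x plus noise is an assumption). Because this is an OLS fit with an intercept evaluated on its own training data, 1 - SSE/SST and SSR/SST agree. For other predictions, such as on a test set, only the 1 - SSE/SST form applies (it is what `sklearn.metrics.r2_score` computes), and it can even be negative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 60, 1                       # sample size and number of predictors
x = rng.uniform(0, 10, size=n)
y = 5.0 + 0.8 * x + rng.normal(0, 2.0, size=n)   # assumed generating model

slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
sse = np.sum((y - y_hat) ** 2)          # error (residual) sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares

r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print("R^2 (1 - SSE/SST):", r2)
print("R^2 (SSR/SST):    ", ssr / sst)
print("Adjusted R^2:     ", adj_r2)
```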


## Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

(Include your Python code and output in the code box below.)

**Answer (Code):**

The code below generates a small synthetic dataset (true intercept 4.0, true slope 3.5), fits an SLR with scikit-learn, prints the estimated slope and intercept, and plots the data with the fitted line. Run it in Colab.


```python
# Question 9: Fit a Simple Linear Regression with scikit-learn
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic example data (true intercept 4.0, true slope 3.5)
np.random.seed(42)
X = 2.5 * np.random.randn(100, 1) + 1.5   # predictor; 2D shape (100, 1) as scikit-learn expects
y = 4.0 + 3.5 * X.ravel() + np.random.randn(100) * 2.0  # response with Gaussian noise

# Fit model by ordinary least squares
model = LinearRegression()
model.fit(X, y)
slope = model.coef_[0]        # coef_ holds one coefficient per feature
intercept = model.intercept_
print('Slope (beta_1):', slope)          # expected to be near 3.5
print('Intercept (beta_0):', intercept)  # expected to be near 4.0

# Plot data and fitted line
plt.figure(figsize=(6,4))
plt.scatter(X, y, label='Data')
x_line = np.linspace(X.min(), X.max(), 100).reshape(-1,1)
plt.plot(x_line, model.predict(x_line), color='orange', linewidth=2, label='Fitted line')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Simple Linear Regression Fit')
plt.legend()
plt.show()
```

## Question 10: How do you interpret the coefficients in a simple linear regression model?

**Answer:**

- **Intercept (\(\beta_0\))**: Expected value of the response \(y\) when the predictor \(x = 0\), i.e., where the regression line crosses the y-axis. It is only meaningful if \(x = 0\) is plausible and close to the observed data; otherwise it is an extrapolation that simply anchors the line.
- **Slope (\(\beta_1\))**: Expected change in \(y\) for a one-unit increase in \(x\). A positive slope means \(y\) tends to increase with \(x\); a negative slope means it tends to decrease. On its own, this describes an association, not a causal effect.

Interpretation must account for units of measurement and practical significance. A t-test of \(H_0: \beta = 0\) for each coefficient, together with its confidence interval, indicates whether the coefficient differs significantly from zero.
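A sketch of the classical t-test and 95% confidence interval for the slope, computed from the textbook formulas with numpy and `scipy.stats.t` on simulated data (the generating model y = 2 + 1.2x plus noise is an assumption). It presumes the SLR assumptions from Question 2 hold. statsmodels' `OLS` summary reports the same standard errors, t statistics, p-values and intervals if you prefer a library.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 40
x = rng.uniform(0, 10, size=n)
y = 2.0 + 1.2 * x + rng.normal(0, 1.5, size=n)   # assumed generating model

# Least-squares estimates
x_bar, y_bar = x.mean(), y.mean()
sxx = np.sum((x - x_bar) ** 2)
b1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
b0 = y_bar - b1 * x_bar

# Residual variance estimate with n - 2 degrees of freedom
resid = y - (b0 + b1 * x)
df = n - 2
s2 = np.sum(resid ** 2) / df

# Standard errors, t statistics, and a two-sided p-value for the slope
se_b1 = np.sqrt(s2 / sxx)
se_b0 = np.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
t_b1, t_b0 = b1 / se_b1, b0 / se_b0
p_b1 = 2 * stats.t.sf(abs(t_b1), df)

# 95% confidence interval for the slope
t_crit = stats.t.ppf(0.975, df)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

print(f"Slope b1 = {b1:.3f}, SE = {se_b1:.3f}, t = {t_b1:.2f}, p = {p_b1:.2g}")
print(f"95% CI for slope: ({ci_b1[0]:.3f}, {ci_b1[1]:.3f})")
print(f"Intercept b0 = {b0:.3f}, SE = {se_b0:.3f}, t = {t_b0:.2f}")
```

A 95% interval that excludes 0 corresponds to rejecting \(H_0: \beta_1 = 0\) at the 5% level with the two-sided t-test.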


----

### Submission & Usage
- Each question appears immediately before its answer as requested.
- To run the code cell (Q9) and see outputs, open this notebook in **Google Colab** and run all cells.
- After running, download the executed notebook via **File → Download → Download .ipynb** or save a copy to Google Drive.

