## Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.

**Answer:** Simple Linear Regression (SLR) models the linear relationship between a single independent variable (X) and a dependent variable (Y). Its purpose is twofold: to predict Y from X (prediction), and to quantify the strength and direction of their linear relationship (inference).

---

## Question 2: What are the key assumptions of Simple Linear Regression?

**Answer:**

- **Linearity:** The relationship between X and the expected value of Y is linear.
- **Independence:** Observations (and their errors) are independent.
- **Homoscedasticity:** The errors have constant variance at every value of X.
- **Normality of errors:** The errors are approximately normally distributed (this matters mainly for inference, such as confidence intervals and hypothesis tests).
- **Reliable measurement and no unduly influential points:** X is measured without substantial error, and no extreme outliers or high-leverage points dominate the fit.

Note: multicollinearity is a multi-predictor concern and not applicable to SLR with only one predictor.
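Some of these assumptions can be checked informally from the residuals of a fitted line. The sketch below is a rough illustration under stated assumptions: the data are a hypothetical five-point sample, `np.polyfit` stands in for any OLS fitter, and comparing the residual spread in the lower and upper halves of X is only a crude stand-in for a proper homoscedasticity diagnostic such as a residual plot.

```python
import numpy as np

# Hypothetical sample data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.6, 4.8, 6.8, 7.9, 9.5])

# Fit by ordinary least squares; for deg=1, np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# With an intercept in the model, OLS residuals sum to (numerically) zero and are
# uncorrelated with x. These are sanity checks on the fit, not assumption tests.
resid_sum = residuals.sum()
resid_x_cov = np.sum((x - x.mean()) * residuals)

# Crude homoscedasticity check: compare residual spread for low and high x
lower = residuals[x <= np.median(x)]
upper = residuals[x > np.median(x)]

print("sum of residuals:", resid_sum)
print("residual spread (low x, high x):", lower.std(), upper.std())
```

With so few points the spread comparison says very little; on real data a plot of residuals against fitted values is usually more informative.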

---

## Question 3: Write the mathematical equation for a simple linear regression model and explain each term.

**Answer:**

$$y = \beta_0 + \beta_1 x + \varepsilon$$

- $y$: dependent (response) variable.
- $x$: independent (predictor) variable.
- $\beta_0$: intercept (expected value of $y$ when $x=0$).
- $\beta_1$: slope (expected change in $y$ for a one-unit increase in $x$).
- $\varepsilon$: random error term (captures variation not explained by the linear part).

---

## Question 4: Provide a real-world example where simple linear regression can be applied.

**Answer:** Predicting a person's salary from their years of experience (one predictor: years of experience; one response: salary).

---

## Question 5: What is the method of least squares in linear regression?

**Answer:** The method of least squares chooses coefficients $(\beta_0,\beta_1)$ that minimize the sum of squared residuals:

$$\min_{\beta_0,\beta_1} \sum_{i=1}^n (y_i - (\beta_0 + \beta_1 x_i))^2.$$ 

Closed-form solutions for SLR are:

$$\hat{\beta}_1 = \dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$
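These estimates follow from setting the partial derivatives of the objective to zero. A short derivation, writing $S(\beta_0,\beta_1)$ for the sum of squared residuals (a symbol introduced here for convenience):

$$\frac{\partial S}{\partial \beta_0} = -2\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) = 0 \;\Longrightarrow\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x},$$

$$\frac{\partial S}{\partial \beta_1} = -2\sum_{i=1}^n x_i\,(y_i - \beta_0 - \beta_1 x_i) = 0.$$

Substituting $\hat{\beta}_0$ into the second equation gives

$$\sum_{i=1}^n x_i\big((y_i-\bar{y}) - \hat{\beta}_1 (x_i-\bar{x})\big) = 0 \;\Longrightarrow\; \hat{\beta}_1 = \dfrac{\sum x_i (y_i-\bar{y})}{\sum x_i (x_i-\bar{x})} = \dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^2}.$$

The last equality holds because $\sum (y_i-\bar{y}) = 0$ and $\sum (x_i-\bar{x}) = 0$, so subtracting $\bar{x}$ times either sum changes nothing.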

---

## Question 6: What is Logistic Regression? How does it differ from Linear Regression?

**Answer:** Logistic Regression is a classification model used when the dependent variable is categorical (commonly binary). It models the log-odds (logit) of the probability of class 1 as a linear function of predictors and uses the logistic (sigmoid) function to map predictions to probabilities between 0 and 1. Differences from linear regression:

- Output: Logistic predicts probabilities (0–1) and class labels; linear predicts continuous values.
- Link & loss: Logistic uses the logit link and optimizes cross-entropy (log-loss); linear uses identity link and least squares.
- Interpretation: Logistic coefficients give the change in log-odds per one-unit increase in a predictor; linear coefficients give the change in the response itself.
- Range: Predictions from linear regression can fall outside [0,1], which makes them unsuitable as probabilities.
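The link between log-odds, probabilities, and class labels can be sketched in a few lines. This is a minimal illustration, not a fitted model: the coefficients `beta0` and `beta1`, the inputs, and the 0.5 decision threshold are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and inputs, chosen only for illustration
beta0, beta1 = -3.0, 1.5
x = np.array([0.0, 2.0, 4.0])

log_odds = beta0 + beta1 * x         # linear in x and unbounded
prob = sigmoid(log_odds)             # strictly between 0 and 1
labels = (prob >= 0.5).astype(int)   # class label at an assumed 0.5 threshold

print("log-odds:", log_odds)
print("probabilities:", prob)
print("labels:", labels)
```

A log-odds of 0 corresponds to a probability of exactly 0.5, which is why the middle input sits on the decision boundary.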

---

## Question 7: Name and briefly describe three common evaluation metrics for regression models.

**Answer:**

- **Mean Squared Error (MSE):** Average of squared residuals; penalizes larger errors more.
- **Root Mean Squared Error (RMSE):** Square root of MSE; has same units as the response and is easier to interpret.
- **Mean Absolute Error (MAE):** Average of absolute residuals; less sensitive to outliers than MSE because errors are not squared.
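The three metrics can be computed directly with NumPy. The values below are hypothetical observations and predictions, picked so that one large error shows how squaring affects MSE and RMSE more than MAE.

```python
import numpy as np

# Hypothetical observed values and model predictions (illustrative only)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 12.0])

errors = y_true - y_pred          # residuals: 0.5, 0.0, -1.0, -3.0
mse = np.mean(errors ** 2)        # squared errors weight the -3.0 residual heavily
rmse = np.sqrt(mse)               # back in the units of the response
mae = np.mean(np.abs(errors))     # each residual counts in proportion to its size

print("MSE: ", mse)
print("RMSE:", rmse)
print("MAE: ", mae)
```

RMSE is never smaller than MAE on the same residuals; a large gap between the two is a hint that a few large errors dominate.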

---

## Question 8: What is the purpose of the R-squared metric in regression analysis?

**Answer:** R-squared measures the proportion of variance in the dependent variable explained by the model: 

$$R^2 = 1 - \dfrac{\text{SS}_\text{res}}{\text{SS}_\text{tot}}.$$ 

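For a least-squares fit with an intercept, evaluated on the training data, it ranges from 0 to 1 (higher is generally better) and indicates goodness of fit. It can be negative when computed on new data or for a model without an intercept. R-squared does not imply causation, and it never decreases when predictors are added, so use adjusted R² to penalize extra variables.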
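The formula can be evaluated by hand to see what each sum of squares measures. The sketch below uses a hypothetical five-point sample and `np.polyfit` as the fitter; the constant-zero predictor at the end is an artificial bad model, included only to show that the same formula goes negative when predictions are worse than simply using the mean of Y.

```python
import numpy as np

# Hypothetical sample data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.6, 4.8, 6.8, 7.9, 9.5])

def r_squared(y_obs, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_obs - y_hat) ** 2)          # variation left unexplained
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

# OLS fit with an intercept: R^2 lies between 0 and 1
slope, intercept = np.polyfit(x, y, deg=1)
r2_ols = r_squared(y, intercept + slope * x)

# Artificial bad model (always predicts 0): R^2 is negative
r2_bad = r_squared(y, np.zeros_like(y))

print("R^2 (OLS fit):", r2_ols)
print("R^2 (constant-zero predictor):", r2_bad)
```

In simple linear regression with an intercept, R² also equals the squared Pearson correlation between X and Y.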

---

## Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

**Answer (code + output):** The code cell below fits a simple linear regression model. If scikit-learn is not available in the environment, the same cell falls back to a closed-form OLS solution.

Run the cell to fit and print `slope (β1)` and `intercept (β0)`.

```python
# Fit a simple linear regression with scikit-learn, falling back to
# closed-form OLS if scikit-learn is not installed.
import numpy as np

# Sample data: y = 2.0 + 1.5 * x plus small fixed noise
X = np.array([1, 2, 3, 4, 5], dtype=float)
y = 2.0 + 1.5 * X + np.array([0.1, -0.2, 0.3, -0.1, 0.0])

try:
    from sklearn.linear_model import LinearRegression
    model = LinearRegression()
    model.fit(X.reshape(-1, 1), y)  # scikit-learn expects a 2-D feature matrix
    slope = float(model.coef_[0])
    intercept = float(model.intercept_)
    print("Method: scikit-learn LinearRegression")
except ImportError:
    # Fallback: closed-form OLS estimates (only when the import fails)
    slope = np.sum((X - X.mean()) * (y - y.mean())) / np.sum((X - X.mean())**2)
    intercept = y.mean() - slope * X.mean()
    print("Method: fallback closed-form OLS (scikit-learn unavailable)")

print("Slope (β1):", slope)
print("Intercept (β0):", intercept)
```

Output:

```
Method: scikit-learn LinearRegression
Slope (β1): 1.4900000000000002
Intercept (β0): 2.05
```


---

## Question 10: How do you interpret the coefficients in a simple linear regression model?

**Answer:**

- **Slope (β1):** The expected change in the dependent variable Y for a one-unit increase in X, expressed in units of Y per unit of X. A positive slope means Y increases with X; a negative slope means Y decreases as X increases.
- **Intercept (β0):** The expected value of Y when X = 0. This interpretation is meaningful only if X = 0 lies within (or close to) the observed range of the data.
- **Caveats:** Consider statistical significance, confidence intervals, and whether the linear model assumptions hold before making strong claims from coefficients.
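As a rough illustration of these points, the sketch below fits a line to a hypothetical five-point sample and reports the slope together with its standard error and t-statistic. The standard-error formula used here, $\sqrt{\hat{\sigma}^2 / \sum (x_i-\bar{x})^2}$ with $\hat{\sigma}^2 = \text{SS}_\text{res}/(n-2)$, is the usual one under the SLR assumptions; the data and variable names are assumptions made for the example.

```python
import numpy as np

# Hypothetical sample data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.6, 4.8, 6.8, 7.9, 9.5])
n = len(x)

# Closed-form OLS estimates
sxx = np.sum((x - x.mean()) ** 2)
slope = np.sum((x - x.mean()) * (y - y.mean())) / sxx
intercept = y.mean() - slope * x.mean()

# Residual variance with n - 2 degrees of freedom (two estimated coefficients)
residuals = y - (intercept + slope * x)
sigma2_hat = np.sum(residuals ** 2) / (n - 2)

# Standard error and t-statistic of the slope
se_slope = np.sqrt(sigma2_hat / sxx)
t_stat = slope / se_slope

print(f"Each one-unit increase in x is associated with an estimated change of {slope:.2f} in y")
print(f"Estimated y at x = 0: {intercept:.2f} (x = 0 is outside the observed range 1 to 5)")
print(f"Slope standard error: {se_slope:.3f}, t-statistic: {t_stat:.1f}")
```

A confidence interval for the slope would additionally need a critical value from the t-distribution with n - 2 degrees of freedom, which is left out here to keep the example NumPy-only.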

---