# Logistic Regression  

---




## Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.

**Answer:**

Simple Linear Regression (SLR) is a statistical technique used to model the relationship between two variables: one independent variable (X) and one dependent variable (Y). It assumes a linear relationship between X and Y, meaning that each one-unit change in X is associated with the same constant change in the expected value of Y.

**Purpose:**
The main purpose of SLR is to predict the value of the dependent variable from the independent variable and to understand the strength and direction of their relationship. It is widely used for forecasting and for quantifying how much Y changes, on average, as X changes.

## Question 2: What are the key assumptions of Simple Linear Regression?

**Answer:**

The key assumptions of Simple Linear Regression are:

1. **Linearity:** The relationship between the independent and dependent variable is linear.
2. **Independence:** The observations are independent of each other.
3. **Homoscedasticity:** The variance of residuals (errors) is constant across all levels of X.
4. **Normality of Errors:** The residuals should be normally distributed.
5. **No Multicollinearity:** Applies only to multiple regression, where predictors should not be highly correlated with one another. With a single predictor, as in SLR, the issue does not arise.
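
Some of these assumptions can be checked informally from the residuals of a fitted line. The sketch below is only illustrative: the data are simulated (the true intercept 1.0, slope 2.0, noise level and seed are all assumptions made for the example), and the comparison of residual spread in the lower and upper halves of X is a crude stand-in for a proper residual plot or formal test.

```python
import numpy as np

# Simulated data; every number here is an assumption made for illustration
rng = np.random.default_rng(42)
X = np.linspace(0, 10, 200)
y = 1.0 + 2.0 * X + rng.normal(0.0, 1.0, size=X.shape)

# Fit a straight line; np.polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(X, y, deg=1)
residuals = y - (intercept + slope * X)

# With an intercept in the model, least-squares residuals average to ~0
print(f"mean residual: {residuals.mean():.6f}")

# Rough homoscedasticity check: residual spread for small X vs. large X
low, high = residuals[X < 5], residuals[X >= 5]
print(f"residual std (X < 5):  {low.std():.3f}")
print(f"residual std (X >= 5): {high.std():.3f}")
```

If the two standard deviations differ widely, the constant-variance assumption is doubtful. In practice a residual-versus-X plot and a normal Q-Q plot are the usual next steps.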

## Question 3: Write the mathematical equation for a simple linear regression model and explain each term.

**Answer:**

The mathematical equation is:

    Y = β0 + β1X + ε

Where:
- **Y:** Dependent variable (the outcome to be predicted)
- **X:** Independent variable (predictor)
- **β0:** Intercept (expected value of Y when X = 0)
- **β1:** Slope (expected change in Y for a one-unit increase in X)
- **ε:** Random error term (captures unexplained variation)
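
To make the roles of the terms concrete, the following sketch generates data from the equation itself. The values of β0 and β1, the noise scale, and the random seed are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

beta0, beta1 = 2.0, 0.5         # assumed intercept and slope
X = np.linspace(0, 10, 50)      # independent variable
eps = rng.normal(loc=0.0, scale=1.0, size=X.shape)  # random error term

# The model: Y = beta0 + beta1 * X + eps
Y = beta0 + beta1 * X + eps

# Removing the error term leaves points that lie exactly on the line
Y_line = beta0 + beta1 * X
print(np.allclose(Y - eps, Y_line))
```

The deterministic part `beta0 + beta1 * X` is the line itself; `eps` is what scatters the observed points around it.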

## Question 4: Provide a real-world example where simple linear regression can be applied.

**Answer:**

**Example:** Predicting house prices based on square footage.

If we collect data on house sizes (X) and their prices (Y), we can use simple linear regression to model how house price changes with size. The model helps estimate the average increase in price for every additional square foot.

## Question 5: What is the method of least squares in linear regression?

**Answer:**

The **method of least squares** is a mathematical approach used to find the best-fitting line for a set of data points. It works by minimizing the **sum of the squares of the residuals** (the vertical distances between observed and predicted values).

In other words, it finds coefficients (β0 and β1) that make the predictions as close as possible to the actual data.
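
For simple linear regression the minimizing coefficients have a closed form: β1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and β0 = ȳ − β1·x̄. A minimal NumPy sketch of that calculation, using the same five points as the scikit-learn example in Question 9:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

x_mean, y_mean = X.mean(), y.mean()

# Closed-form least-squares estimates for one predictor
beta1 = np.sum((X - x_mean) * (y - y_mean)) / np.sum((X - x_mean) ** 2)
beta0 = y_mean - beta1 * x_mean

# Sum of squared residuals for the fitted line
residuals = y - (beta0 + beta1 * X)
sse = np.sum(residuals ** 2)

print(f"beta1 = {beta1:.2f}, beta0 = {beta0:.2f}, SSE = {sse:.2f}")
# prints: beta1 = 0.60, beta0 = 2.20, SSE = 2.40
```

Any other choice of slope or intercept gives a larger sum of squared residuals on these points.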

## Question 6: What is Logistic Regression? How does it differ from Linear Regression?

**Answer:**

**Logistic Regression** is a classification algorithm used when the dependent variable is categorical (usually binary — e.g., 0 or 1). It estimates the probability that a given input belongs to a particular category using the **logistic (sigmoid) function**.

**Differences from Linear Regression:**

1. **Nature of Output:** Linear regression predicts continuous values, while logistic regression predicts probabilities between 0 and 1.
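2. **Equation:** Linear regression models Y directly as a straight line in X. Logistic regression models the log-odds (logit) of the outcome as a linear function of X and passes that value through the sigmoid to obtain a probability.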
3. **Error Function:** Linear regression minimizes Mean Squared Error; logistic regression uses Log-Loss (cross-entropy).
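4. **Purpose:** Linear regression is used to predict a continuous quantity; logistic regression is used for classification, typically by thresholding the predicted probability (for example, at 0.5).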
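
The pieces named above (sigmoid, log-odds, log-loss) can be sketched in a few lines of NumPy. The coefficients and the tiny dataset are hypothetical and chosen by hand for illustration; nothing is fitted here.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, p):
    """Binary cross-entropy averaged over the observations."""
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Hypothetical coefficients and data (assumptions, not fitted values)
beta0, beta1 = -4.0, 1.0
X = np.array([1.0, 3.0, 4.0, 5.0, 7.0])
y = np.array([0, 0, 1, 1, 1])

z = beta0 + beta1 * X           # linear predictor = log-odds
p = sigmoid(z)                  # predicted probability of class 1
labels = (p >= 0.5).astype(int) # classify by thresholding at 0.5

print("probabilities:", np.round(p, 3))
print("labels:", labels)
print("log-loss:", log_loss(y, p))
```

Applying the logit `log(p / (1 - p))` to the predicted probabilities recovers the linear predictor `z`, which is the sense in which logistic regression is linear in the log-odds.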

## Question 7: Name and briefly describe three common evaluation metrics for regression models.

**Answer:**

1. **Mean Absolute Error (MAE):** Average of absolute differences between actual and predicted values.
   - Formula: MAE = (1/n) * Σ|y_i - ŷ_i|
   - Measures average prediction error in original units.

2. **Mean Squared Error (MSE):** Average of squared differences between actual and predicted values.
   - Formula: MSE = (1/n) * Σ(y_i - ŷ_i)²
   - Penalizes larger errors more heavily.

3. **Root Mean Squared Error (RMSE):** Square root of MSE.
   - Formula: RMSE = √MSE
   - Easier to interpret since it’s in the same units as the target variable.
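
The three formulas translate directly into NumPy. The actual and predicted values below are made-up numbers used only to show the arithmetic.

```python
import numpy as np

# Made-up values for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 12.0])

errors = y_true - y_pred          # 1, 0, -1, -3

mae = np.mean(np.abs(errors))     # (1 + 0 + 1 + 3) / 4
mse = np.mean(errors ** 2)        # (1 + 0 + 1 + 9) / 4
rmse = np.sqrt(mse)

print(f"MAE = {mae:.2f}, MSE = {mse:.2f}, RMSE = {rmse:.2f}")
# prints: MAE = 1.25, MSE = 2.75, RMSE = 1.66
```

The single error of 3 contributes 9 of the 11 units of squared error, which is why MSE and RMSE react more strongly to large misses than MAE does.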

## Question 8: What is the purpose of the R-squared metric in regression analysis?

**Answer:**

**R-squared (R²)**, also known as the **coefficient of determination**, measures the proportion of variance in the dependent variable that is explained by the independent variable(s).

**Formula:** R² = 1 - (SS_res / SS_tot)

Where:
- SS_res = Sum of squared residuals
- SS_tot = Total sum of squares

**Purpose:**
- Indicates goodness of fit of the model.
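- R² = 1 → perfect fit; R² = 0 → the model explains no more variance than always predicting the mean of Y.
- R² can be negative when a model is evaluated on data it was not fitted to and performs worse than that mean baseline.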
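
The formula can be evaluated by hand in NumPy. The observed values and predictions below are illustrative: the predictions come from the line ŷ = 2.2 + 0.6x evaluated at x = 1, …, 5.

```python
import numpy as np

y_true = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
y_pred = np.array([2.8, 3.4, 4.0, 4.6, 5.2])  # from y_hat = 2.2 + 0.6 * x

ss_res = np.sum((y_true - y_pred) ** 2)         # sum of squared residuals
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot

print(f"SS_res = {ss_res:.2f}, SS_tot = {ss_tot:.2f}, R^2 = {r2:.2f}")
# prints: SS_res = 2.40, SS_tot = 6.00, R^2 = 0.60

# Baseline: always predicting the mean gives SS_res = SS_tot, so R^2 = 0
baseline = np.full_like(y_true, y_true.mean())
r2_baseline = 1 - np.sum((y_true - baseline) ** 2) / ss_tot
print(f"baseline R^2 = {r2_baseline:.2f}")
```

Here the line explains 60% of the variance in Y, while the mean-only baseline explains none of it.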

## Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

**Answer (code + explanation):**

Below is an example code cell that fits a simple linear regression model using scikit-learn on a small sample dataset. The cell prints the learned slope and intercept.



```python
# Question 9: Python code to fit a simple linear regression model and print slope & intercept
from sklearn.linear_model import LinearRegression
import numpy as np

# Example dataset
X = np.array([[1], [2], [3], [4], [5]])  # independent variable
y = np.array([2, 4, 5, 4, 5])            # dependent variable

# Create model and fit
model = LinearRegression()
model.fit(X, y)

# Get slope and intercept
slope = model.coef_[0]
intercept = model.intercept_

print(f"Slope: {slope}")
print(f"Intercept: {intercept}")
```

Output:

```
Slope: 0.6
Intercept: 2.2
```


## Question 10: How do you interpret the coefficients in a simple linear regression model?

**Answer:**

- **Intercept (β0):** The predicted value of Y when X = 0.
- **Slope (β1):** The expected change in Y for a one-unit increase in X.

**Example:** If β1 = 2, it means that for every 1-unit increase in X, the model predicts Y will increase by 2 units on average.
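
A short sketch of this interpretation, using β1 = 2 from the example and an assumed intercept of β0 = 1 (the intercept and the helper `predict` are hypothetical, introduced only for illustration):

```python
beta0, beta1 = 1.0, 2.0  # beta0 is an assumed value; beta1 matches the example

def predict(x):
    """Predicted Y for a given X under the fitted line."""
    return beta0 + beta1 * x

print(predict(0.0))                 # the intercept: prints 1.0
print(predict(4.0) - predict(3.0))  # effect of a one-unit increase: prints 2.0
```

The difference between predictions one unit apart equals the slope no matter where on the X axis the comparison is made, which is what a constant slope means.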