## Logistic Regression Questions/Answers

### 1) What is Simple Linear Regression (SLR)? Explain its purpose.
#### > Simple Linear Regression (SLR) is a statistical method used to model and understand the relationship between two variables: one independent variable (X) and one dependent variable (Y).
#### **Purpose**
#### > **Prediction** : To predict the value of the dependent variable based on the independent variable.
#### **Example** : Predicting a student's exam score (Y) based on study hours (X).
#### > **Understanding Relationships** : To understand how changes in X are associated with changes in Y.
#### **Example** : Determining if advertising spend (X) affects sales revenue (Y).
#### > **Trend Analysis** : To identify and visualize trends in data, such as upward or downward patterns.
#### > **Quantifying Influence** : To measure the strength and direction of the relationship between two variables.

### 2) What are the key assumptions of Simple Linear Regression?
#### > The key assumptions of Simple Linear Regression (SLR) ensure that the model's estimates are valid, unbiased, and reliable.
#### **Assumptions**
#### > **Linearity** : The relationship between the independent variable (X) and the dependent variable (Y) is linear.
- This means Y changes at a constant rate with X.
- Check: Use a scatter plot of X vs. Y — the points should roughly form a straight line.
#### > **Independence of Errors** : The residuals (errors) should be independent of each other.
- In other words, one observation's error should not influence another's.
- Check: Plot residuals vs. time/order; there should be no visible pattern or trend.
#### > **Homoscedasticity (Constant Variance of Errors)** : The variance of the residuals should be constant across all values of X.
- If variance increases or decreases with X (a “fan” shape), the assumption is violated.
- Check: Residual plot — points should appear randomly scattered with equal spread.
#### > **Normality of Errors** : The residuals should be approximately normally distributed.
- This assumption is important for hypothesis testing and confidence intervals.
- Check: Use a histogram or Q-Q plot of residuals.
#### > **No or Minimal Multicollinearity (not relevant for simple regression but critical in multiple regression)** : In SLR, there's only one predictor, so this assumption is automatically satisfied.
#### > **No Measurement Error in X** : The independent variable (X) is measured accurately and without significant error.
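Some of the checks above can also be done numerically. The following is a minimal sketch, assuming hypothetical toy data and NumPy's `np.polyfit` for the fit; the half-split spread comparison and the lag-1 correlation are rough illustrative diagnostics, not formal statistical tests.

```python
import numpy as np

# Hypothetical toy data, chosen only for illustration (x is sorted in increasing order).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

# Fit a straight line; for degree 1, np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# With an intercept in the model, least-squares residuals average to zero.
mean_residual = residuals.mean()

# Rough homoscedasticity check: compare residual spread for low X and high X.
half = len(x) // 2
spread_low = residuals[:half].std()
spread_high = residuals[half:].std()

# Rough independence check: correlation between consecutive residuals.
lag1_corr = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

print("Mean residual:", round(mean_residual, 6))
print("Residual spread (low X, high X):", spread_low, spread_high)
print("Lag-1 residual correlation:", lag1_corr)
```

A large gap between the two spreads, or a lag-1 correlation far from zero, would suggest looking more closely at the residual plots described above.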

### 3) Write the mathematical equation for a simple linear regression model and explain each term.
#### > The mathematical equation for a Simple Linear Regression (SLR) model is: **Y=b₀+b₁X+ε**
#### > **Y - Dependent Variable (Target)** : The outcome or response we want to predict or explain.
- Example: House Price, Exam Score, Sales Revenue.
#### > **X - Independent Variable (Predictor)** : The input or explanatory variable used to predict Y.
- Example: Square Footage, Hours Studied, Advertising Spend.
#### > **b₀ - Intercept (Constant Term)** : The predicted value of Y when X = 0. It represents where the regression line crosses the Y-axis.
#### > **b₁ - Slope (Regression Coefficient)** : The amount by which Y changes for a one-unit increase in X. It shows the strength and direction of the relationship.
#### > **ε (epsilon) - Error Term** : The difference between the actual value of Y and the value given by the line b₀+b₁X. It represents factors not captured by X (random noise). Its estimate from a fitted model is called the residual.

### 4) Provide a real-world example where simple linear regression can be applied.
#### > **Predicting House Prices Based on Size**
#### **Scenario** : A real estate analyst wants to predict the price of a house (Y) based on its size in square feet (X).
#### **Why SLR is Suitable** : There are two variables: one dependent (Price) and one independent (Size).
- The relationship between house size and price is typically linear: larger houses tend to have higher prices.
#### **Model Equation**: Price=b₀+b₁(Size)+ε
#### > where,
- Price (Y): Dependent variable — what we want to predict.
- Size (X): Independent variable — predictor.
- b₀: Intercept (base price when size = 0).
- b₁: Slope (increase in price per additional square foot).
- ε: Random error (factors like location, amenities, etc.).
#### > **Output** : After analyzing the data, the regression equation might be: Price=50,000+200*(Size)
#### > **Interpretation** :
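- The intercept of 50,000 dollars is the model's predicted price at size = 0. It acts as a baseline for the line rather than a realistic price, since no house has zero size.
- For every additional square foot, the predicted price increases by 200 dollars.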
#### > **Use Case** : A buyer or real estate agent can predict house prices based on size:
- For a 1,000 sq. ft. house: Price=50,000+200(1000)=250,000
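The use case above can be written as a small function. This is only a sketch: the name `predict_price` is hypothetical, and the coefficients are the illustrative values from the example equation, not estimates from real data.

```python
def predict_price(size_sqft, intercept=50_000, slope=200):
    """Predicted price from the example equation Price = 50,000 + 200 * Size."""
    return intercept + slope * size_sqft


# Predictions for a few hypothetical house sizes (in square feet).
for size in (1_000, 1_500, 2_000):
    print(f"{size} sq. ft. -> {predict_price(size)}")
```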

### 5) What is the method of least squares in linear regression?
#### > The method of least squares is the most common technique used to estimate the coefficients (b₀ and b₁) in a Simple Linear Regression (SLR) model. It finds the best-fitting line through the data by minimizing the sum of squared differences between the actual and predicted values.
#### Equation : Y=b₀+b₁X+ε where,
- Y = actual dependent variable
- X = independent variable
- b₀,b₁= coefficients to be estimated
- ε = error (residual)
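For a single predictor, minimizing the sum of squared errors has a closed-form solution: b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and b₀ = ȳ − b₁x̄. The sketch below applies these formulas with NumPy to the same toy data used in question 9; the variable names are illustrative.

```python
import numpy as np

# Toy data (same values as the scikit-learn example in question 9).
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

x_mean, y_mean = x.mean(), y.mean()

# Slope: co-variation of x and y divided by the variation of x.
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# Intercept: forces the fitted line through the point (x_mean, y_mean).
b0 = y_mean - b1 * x_mean

# The quantity that least squares minimizes: the sum of squared errors.
sse = np.sum((y - (b0 + b1 * x)) ** 2)

print(f"b1 (slope): {b1:.2f}")      # 0.60
print(f"b0 (intercept): {b0:.2f}")  # 2.20
print(f"SSE: {sse:.2f}")            # 2.40
```

Any other choice of b₀ and b₁ would give a larger SSE on this data.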

### 6) What is Logistic Regression? How does it differ from Linear Regression?
#### > Logistic Regression is a statistical and machine learning technique used for classification problems, especially binary classification, where the target variable has two possible outcomes. It models the probability that a given input belongs to a particular class using the logistic (sigmoid) function. It predicts a value between 0 and 1, which represents the probability of belonging to the positive class.
#### The difference from Linear Regression is in the following ways:
#### > **Logistic Regression**
- The purpose is to predict a categorical outcome (example: 0 or 1)
- The output range is between 0 and 1 (interpreted as probability)
- Uses Sigmoid (logistic) function as activation function
- Uses Log Loss (Cross-Entropy Loss) as error function
- Classification is the type of problem
#### > **Linear Regression**
- The purpose is to predict a continuous value (example: house price, temperature)
- The output range is any real number (−∞ to +∞)
- Activation function is none (direct linear output)
- Uses Mean Squared Error (MSE) as error function
- Regression is the type of problem
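The difference in output range can be seen by passing a linear score through the sigmoid function. The sketch below uses only the standard library; the coefficients `b0` and `b1`, the 0.5 threshold, and the helper names are hypothetical values chosen for illustration, not fitted results.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for a one-feature logistic model.
b0, b1 = -4.0, 1.0


def predict_proba(x):
    """Probability of the positive class for input x."""
    return sigmoid(b0 + b1 * x)


def predict_class(x, threshold=0.5):
    """Turn the probability into a 0/1 label using a threshold."""
    return int(predict_proba(x) >= threshold)


for x in (2, 4, 6):
    linear_score = b0 + b1 * x  # what a linear model would output directly
    print(x, linear_score, round(predict_proba(x), 3), predict_class(x))
```

The linear score can be any real number, while the sigmoid output always stays strictly between 0 and 1.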

### 7) Name and briefly describe three common evaluation metrics for regression models.
#### > The three common evaluation metrics used to assess regression models are:
#### > **Mean Absolute Error (MAE)** : It measures the average absolute difference between the predicted and actual values.
#### **Interpretation** : Lower MAE means better accuracy. It gives an idea of the average prediction error in the same units as the target variable.
#### > **Mean Squared Error (MSE)** : It measures the average of squared differences between predicted and actual values.
#### **Interpretation** : Penalizes larger errors more heavily due to squaring. A lower MSE indicates a better model fit.
#### > **R-squared (Coefficient of Determination)** : Indicates the proportion of variance in the dependent variable that is explained by the model.
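#### **Interpretation** : R² typically ranges from 0 to 1. A higher value means the model explains more variability in the data. It can be negative when the model fits worse than simply predicting the mean, for example on held-out data.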
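MAE and MSE can be computed directly from their definitions. The following is a small sketch with NumPy on hypothetical actual and predicted values; one deliberately large error shows how squaring weights it more heavily in MSE than in MAE.

```python
import numpy as np

# Hypothetical actual and predicted values; the last prediction is off by 3.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 12.0])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))  # average absolute error, in the units of y
mse = np.mean(errors ** 2)     # average squared error, in squared units of y

print("MAE:", mae)  # 1.25
print("MSE:", mse)  # 2.75
```

Here the single error of 3 contributes 9 of the 11 total squared error, but only 3 of the 5 total absolute error.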

### 8) What is the purpose of the R-squared metric in regression analysis?
#### > The purpose of the R-squared (R²) metric in regression analysis is to measure how well the independent variables explain the variability of the dependent variable in a model. It represents the proportion of variance in the target (dependent) variable that is explained by the regression model.
#### > **Formula** : R² = 1 - SSres/SStot
#### where
- SSres = Sum of squared residuals (errors)
- SStot = Total sum of squares (variation in actual data)
#### > **Interpretation** :
- R² = 1 (or 100%) - Perfect fit; model explains all variability in the data.
- R² = 0 - Model explains none of the variability; it performs no better than using the mean of the target variable.
- Higher R² values indicate a better fit, but not necessarily a better model (overfitting may occur).
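The formula and the two reference points in the interpretation can be checked with a short sketch. The function name `r_squared` and the data are hypothetical; the computation follows R² = 1 - SSres/SStot as defined above.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R² computed as 1 - SSres / SStot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


y_true = [3, 5, 7, 9]

r2_perfect = r_squared(y_true, y_true)         # predictions equal the actual values
r2_baseline = r_squared(y_true, [6, 6, 6, 6])  # always predict the mean of y_true
r2_model = r_squared(y_true, [2, 5, 8, 12])    # hypothetical imperfect model

print("Perfect fit:", r2_perfect)        # 1.0
print("Mean-only baseline:", r2_baseline)  # 0.0
print("Imperfect model:", round(r2_model, 2))
```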



### 9) Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

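```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The feature matrix must be 2-D: one row per sample, one column per feature
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Fit the model by ordinary least squares
model = LinearRegression()
model.fit(X, y)

# coef_ holds one slope per feature; intercept_ is the constant term
print("Slope (Coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)
```

Output:

```
Slope (Coefficient): 0.6
Intercept: 2.2
```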


### 10) How do you interpret the coefficients in a simple linear regression model?
#### > In Simple Linear Regression, the model predicts a dependent variable Y from a single independent variable X using the equation: Y=b₀+b₁X
#### > **Intercept (b₀)** : The predicted value of Y when X = 0.
#### > **Interpretation** :
- If X = 0, then the predicted value of Y is exactly b₀.
- It sets the baseline of the regression line.
#### **Example** : If b₀ = 2, then when X = 0, the predicted Y = 2.
#### > **Slope / Coefficient (b₁)** : The amount by which Y changes for a one-unit increase in X.
#### > **Interpretation** :
- Positive b₁ : Y increases as X increases.
- Negative b₁ : Y decreases as X increases.
#### **Example** : If b₁ = 0.5, then for every 1-unit increase in X, Y increases by 0.5 units.
#### The slope tells you the direction and size of the change in Y per unit of X, while the intercept anchors the line on the Y-axis.