# **Logistic Regression Assignment:**


## Question 1: **What is Simple Linear Regression (SLR)? Explain its purpose.**

**Answer:**

**Simple Linear Regression (SLR)** is a statistical method used to study the relationship between two variables —
one independent variable (X), and one dependent variable (Y).

It tries to fit a straight line (called the regression line) through the data points so that it best predicts the value of Y based on X.

Equation of SLR: **Y=a+bX**

Where:

Y → Dependent variable (the one we want to predict)

X → Independent variable (the predictor)

a → Intercept (value of Y when X = 0)

b → Slope (change in Y for each unit change in X)

**Purpose of Simple Linear Regression:**

- **Prediction:** To predict the value of one variable (Y) based on another variable (X).
Example: Predicting house price (Y) based on size of the house (X).

- **Understanding Relationships:** To understand how strongly two variables are related and whether the relationship is positive or negative.

- **Trend Analysis:** To find patterns or trends in data — for example, how sales increase with advertising spend.

## Question 2: **What are the key assumptions of Simple Linear Regression?**

**Answer:**



- **Linearity:** Relationship between 𝑋 and 𝑌 is linear.

- **Independence:** Observations are independent of each other.

- **Homoscedasticity:** Constant variance of residuals across all values of 𝑋.

- **Normality of Errors:** Residuals (errors) are normally distributed.

- **No Multicollinearity:** Not applicable in simple regression (only one predictor).
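
Some of these assumptions can be checked roughly from the residuals of a fitted line. The sketch below shows one way to do that with NumPy on synthetic data invented for illustration (the intercept 2, slope 3, and noise level are assumptions, not values from this assignment). The split-half variance ratio and the Durbin-Watson statistic are rough heuristics here, not formal tests.

```python
import numpy as np

# Synthetic data, made up for illustration: y = 2 + 3x + noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=x.size)

# Fit a straight line by least squares; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Residuals of a least-squares line with an intercept average to (numerically) zero
print("Mean residual:", residuals.mean())

# Homoscedasticity (rough check): compare residual variance for low vs. high x
half = x.size // 2
variance_ratio = residuals[half:].var() / residuals[:half].var()
print("Variance ratio (upper half / lower half):", variance_ratio)

# Independence (rough check): Durbin-Watson statistic, near 2 when
# consecutive residuals are uncorrelated
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print("Durbin-Watson:", durbin_watson)
```

A variance ratio far from 1 or a Durbin-Watson value far from 2 would be a hint to look more closely, for example with a residual plot.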

## Question 3: **Write the mathematical equation for a simple linear regression model and explain each term.**

**Answer:**

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

| **Term**          | **Meaning**                        | **Explanation**                                                                                                                             |
| ----------------- | ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| $Y_i$ | **Dependent Variable** | The actual (observed) value of the variable we want to predict, e.g., marks, price, salary. |
| $X_i$ | **Independent Variable** | The predictor variable used to estimate or explain changes in $Y_i$. |
| $\beta_0$ | **Intercept** | The value of $Y$ when $X = 0$. It represents where the regression line crosses the Y-axis. |
| $\beta_1$ | **Slope (Regression Coefficient)** | Shows how much $Y$ changes, on average, when $X$ increases by one unit. Its sign gives the direction of the relationship and its magnitude gives the rate of change. |
| $\varepsilon_i$ | **Error Term** | The difference between the actual value $Y_i$ and the value on the true regression line, $\beta_0 + \beta_1 X_i$. It accounts for random noise or unexplained variation. Its estimate from a fitted model is the residual, $Y_i - \hat{Y}_i$. |
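
To make the role of each term concrete, the sketch below builds every observation from its parts. The parameter values (intercept 5, slope 2), the X values, and the random errors are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration
beta_0 = 5.0   # intercept
beta_1 = 2.0   # slope

rng = np.random.default_rng(42)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])                 # independent variable X_i
epsilon = rng.normal(loc=0.0, scale=0.5, size=X.size)   # error term for each observation

# Each observation is the systematic part (the line) plus its error
Y = beta_0 + beta_1 * X + epsilon

for x_i, e_i, y_i in zip(X, epsilon, Y):
    line_value = beta_0 + beta_1 * x_i
    print(f"X_i = {x_i:.0f}  line = {line_value:.1f}  error = {e_i:+.3f}  Y_i = {y_i:.3f}")
```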


## Question 4: **Provide a real-world example where simple linear regression can be applied.**

**Answer:**

**Real-World Example of Simple Linear Regression (SLR):**

**Example: Predicting House Prices Based on Size**
A real estate analyst wants to predict the price of a house (Y) based on its size in square feet (X).

He collects the following data:

| House Size (sq.ft) | Price (₹ in lakhs) |
| ------------------ | ------------------ |
| 1000               | 50                 |
| 1500               | 65                 |
| 2000               | 80                 |
| 2500               | 95                 |
| 3000               | 110                |


Applying Simple Linear Regression:

We assume a linear relationship:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

Fitting this data in Python gives the model below. The five points lie exactly on a straight line, so the fit is exact:

$$Y = 20 + 0.03X$$

**Interpretation:**
- **Intercept** (𝛽0=20) → Even if the house size were 0 sq.ft, the base price starts at ₹20 lakh (fixed cost, land value, etc.).

- **Slope** (𝛽1=0.03) → For every 1 sq.ft increase in house size, the price increases by ₹0.03 lakh (₹3,000).
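
As a quick check, the table above can be fitted directly. The sketch below uses `numpy.polyfit`, which is an assumed choice here; any least-squares routine would do. The fit should recover an intercept of about 20 and a slope of about 0.03, up to floating-point rounding. The 1,800 sq.ft house is a hypothetical input added for illustration.

```python
import numpy as np

# Data from the table above
size_sqft = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
price_lakh = np.array([50, 65, 80, 95, 110], dtype=float)

# Degree-1 polynomial fit; polyfit returns coefficients highest degree first
slope, intercept = np.polyfit(size_sqft, price_lakh, 1)
print(f"Intercept: {intercept:.2f}")
print(f"Slope: {slope:.4f}")

# Predicted price for a hypothetical 1,800 sq.ft house
predicted = intercept + slope * 1800
print(f"Predicted price for 1800 sq.ft: {predicted:.2f} lakh")
```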

## Question 5: **What is the method of least squares in linear regression?**

**Answer:**

The method of least squares is a mathematical technique used to find the best-fitting line in linear regression.
It works by minimizing the sum of the squares of the errors (residuals) between the observed values and the predicted values.

In simple words: it finds the line that makes the total squared prediction error as small as possible.
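
For simple linear regression, minimizing the sum of squared residuals $\sum_i (Y_i - \beta_0 - \beta_1 X_i)^2$ has a well-known closed-form solution: $\hat{\beta}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$. A minimal sketch of that calculation in plain Python follows, using the same five data points as the Question 9 code. The helper name `sse` is introduced here for illustration.

```python
# Same five data points as the Question 9 example
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.0, 8.1, 10.2]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form least-squares estimates for one predictor
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean


def sse(b0, b1):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


print(f"Slope: {slope:.4f}, Intercept: {intercept:.4f}")
print(f"SSE at the least-squares line: {sse(intercept, slope):.4f}")
print(f"SSE with a slightly steeper line: {sse(intercept, slope + 0.1):.4f}")
```

Any other choice of slope or intercept gives a larger sum of squared errors, which is what "best-fitting" means here.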



## Question 6: **What is Logistic Regression? How does it differ from Linear Regression?**

**Answer:**

- **Logistic Regression** is a statistical model used for binary classification problems — that is, problems where the outcome variable has two possible values (e.g., yes/no, 0/1, true/false). It predicts the probability that a given input belongs to a particular category.

| **Feature**           | **Linear Regression**                               | **Logistic Regression**                               |
| --------------------- | --------------------------------------------------- | ----------------------------------------------------- |
| **Purpose**           | Predicts **continuous values** (e.g., price, marks) | Predicts **categorical outcomes** (e.g., yes/no, 0/1) |
| **Output**            | Any real number (−∞ to +∞)                          | Probability (between 0 and 1)                         |
| **Equation**          | $y = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$ | $P = 1 / (1 + e^{-(\beta_0 + \beta_1 x)})$ |
| **Error Measurement** | Uses Mean Squared Error (MSE)                  | Uses Log Loss (Cross-Entropy)                     |
| **Line Type**         | Straight Line                                       | S-shaped Sigmoid Curve                            |
| **Used For**          | Regression problems                                 | Classification problems                               |
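
The difference in output range can be seen by passing the same linear score through the sigmoid function. In this sketch the coefficients are hypothetical, and the 0.5 cutoff for turning a probability into a class is a common convention assumed here, not something the table prescribes.

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration
beta_0 = -4.0
beta_1 = 1.0

for x in [0, 2, 4, 6, 8]:
    linear_output = beta_0 + beta_1 * x               # linear form: unbounded
    probability = sigmoid(linear_output)              # logistic form: between 0 and 1
    predicted_class = 1 if probability >= 0.5 else 0  # assumed 0.5 threshold
    print(f"x = {x}  linear = {linear_output:+.1f}  P = {probability:.3f}  class = {predicted_class}")
```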


## Question 7: **Name and briefly describe three common evaluation metrics for regression models.**

**Answer:**

When we build a regression model (like Linear Regression), we need to measure how well it predicts continuous outcomes.
Here are three widely used evaluation metrics:

**1. Mean Absolute Error (MAE)**

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - \hat{Y}_i \right|$$

- Description:
It measures the average of the absolute differences between the actual values and the predicted values.

- Interpretation:
Smaller MAE → better accuracy.

- Easy to understand because it uses the same units as the target variable.

**Example:**
If MAE = 5, it means predictions are off by 5 units on average.

**2. Mean Squared Error (MSE)**

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2$$

- Description:
It measures the average of the squared differences between actual and predicted values.

- Interpretation:
  - Penalizes larger errors more heavily because the errors are squared.
  - Useful when you want to emphasize big mistakes.

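**3. R-squared (Coefficient of Determination)**

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2}{\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2}$$

- Description:
It shows how much of the variation in the dependent variable (Y) is explained by the model.

- Interpretation:
  - For a least-squares fit with an intercept, R² on the training data ranges from 0 to 1.
  - R² = 1: perfect prediction
  - R² = 0: model explains none of the variation
  - R² can be negative when the model predicts worse than simply using the mean of Y, for example on new data.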
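
The three metrics can be computed by hand in a few lines. The actual and predicted values below are made up for illustration and were chosen so the arithmetic is easy to follow.

```python
# Made-up actual and predicted values, for illustration only
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 8.5]

n = len(actual)
errors = [a - p for a, p in zip(actual, predicted)]  # 0.5, 0.0, -1.0, 0.5

# Mean Absolute Error: average size of the errors
mae = sum(abs(e) for e in errors) / n

# Mean Squared Error: average of the squared errors
mse = sum(e ** 2 for e in errors) / n

# R-squared: 1 minus (unexplained variation / total variation)
mean_actual = sum(actual) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(f"MAE: {mae:.3f}")  # MAE: 0.500
print(f"MSE: {mse:.3f}")  # MSE: 0.375
print(f"R2:  {r2:.3f}")   # R2:  0.925
```

The single error of size 1 contributes half of the MAE total but two thirds of the MSE total, which illustrates how squaring emphasizes larger mistakes.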



## Question 8: **What is the purpose of the R-squared metric in regression analysis?**

**Answer:**

**R-squared (R²)** — also called the Coefficient of Determination — is a statistical measure that tells us how well a regression model explains the variability of the dependent variable (Y) using the independent variable(s) (X).

**Purpose**
- It measures the goodness of fit of a regression model.

- In simple terms, it shows how much of the variation in the output (Y) is explained by the input (X).
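
One way to see "variation explained" is to split the total variation in Y into the part the fitted line leaves unexplained and the rest. The sketch below does this for the five data points used in the Question 9 code; using `numpy.polyfit` for the fit is an assumption made for brevity.

```python
import numpy as np

# Same five data points as the Question 9 example
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.0, 8.1, 10.2])

# Least-squares line and its predictions
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

ss_tot = np.sum((y - y.mean()) ** 2)  # total variation in Y around its mean
ss_res = np.sum((y - y_hat) ** 2)     # variation the line leaves unexplained
r_squared = 1 - ss_res / ss_tot

print(f"Total variation: {ss_tot:.3f}")
print(f"Unexplained variation: {ss_res:.3f}")
print(f"R-squared: {r_squared:.4f}")
```

Here almost all of the variation in Y is accounted for by X, so R² is very close to 1.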

## Question 9: **Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.**

**Answer:**

```python
# Requires: scikit-learn, numpy

import numpy as np
from sklearn.linear_model import LinearRegression

# Example data (x: feature, y: target)
x = np.array([1, 2, 3, 4, 5])        # shape (n_samples,)
y = np.array([2.1, 3.9, 6.0, 8.1, 10.2])  # shape (n_samples,)

# scikit-learn expects 2D array for features
X = x.reshape(-1, 1)  # shape (n_samples, 1)

# Create and fit the model
model = LinearRegression()
model.fit(X, y)

# Extract slope (coefficient) and intercept
slope = model.coef_[0]
intercept = model.intercept_

print(f"Slope (coefficient): {slope}")
print(f"Intercept: {intercept}")
```

Output:

```
Slope (coefficient): 2.0399999999999996
Intercept: -0.05999999999999872
```


## Question 10: **How do you interpret the coefficients in a simple linear regression model?**

**Answer:**

A simple linear regression model is expressed as: Y= β0 + β1X + ɛ

Where:

- Y: Dependent variable (what we want to predict)
- X: Independent variable (predictor)
- β0: Intercept
- β1: Slope (coefficient of X)
- ɛ: Error term

**1. Intercept (β0): "Starting point"**

It represents the predicted value of Y when X = 0.

In other words, it’s where the regression line crosses the Y-axis.

Example:
If the equation is Y = 30 + 10X, then β0 = 30: when X = 0, the predicted value of Y is 30.

**2. Slope (β1): "Rate of Change"**
- It shows how much Y changes, on average, when X increases by one unit.

- It also indicates the direction of the relationship:
  - If β1 > 0: Positive relationship (Y increases as X increases)
  - If β1 < 0: Negative relationship (Y decreases as X increases)

Example:
If β1 = 10, then for every 1-unit increase in X, Y increases by 10 units on average.
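
Both interpretations can be checked numerically on the example line Y = 30 + 10X. The function name `predict` is introduced here for illustration.

```python
def predict(x, intercept=30.0, slope=10.0):
    """Predicted Y for the example line Y = 30 + 10X."""
    return intercept + slope * x


# Intercept: the prediction at X = 0
print(predict(0))               # 30.0

# Slope: the change in the prediction for a 1-unit increase in X,
# which is the same wherever on the line it is measured
print(predict(1) - predict(0))  # 10.0
print(predict(6) - predict(5))  # 10.0
```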