
# ***Supervised Learning: Regression***

**1. What is Simple Linear Regression (SLR)? Explain its purpose.**

= Simple Linear Regression (SLR) is a fundamental statistical method used to model and analyze the relationship between two quantitative variables:

> one independent variable (X) — the predictor or explanatory variable

> one dependent variable (Y) — the response or outcome variable

## Purpose of SLR

> Prediction
- To estimate or forecast the value of Y for a given X.
- Example: Predicting a person’s weight from their height.

> Understanding Relationships
- To quantify how strongly X influences Y, and in what direction.
- Example: Assessing whether more study hours lead to higher exam scores.

> Inference
- To test hypotheses about the relationship — for instance, whether the slope (β1) significantly differs from zero, which would suggest a meaningful association.
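
The inference use case can be sketched with `scipy.stats.linregress`, which reports the fitted slope together with a p-value for the null hypothesis that the slope is zero. The study-hours data below is hypothetical and is only meant to illustrate the idea.

```python
import numpy as np
from scipy import stats

# Hypothetical data: hours studied and exam scores for 8 students
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
scores = np.array([52, 55, 61, 64, 70, 72, 79, 83])

# linregress fits Y = b0 + b1*X and tests H0: slope == 0
result = stats.linregress(hours, scores)

print("Slope:", result.slope)
print("Intercept:", result.intercept)
print("p-value for slope:", result.pvalue)

# A small p-value (commonly below 0.05) suggests the slope differs from zero
slope_is_significant = result.pvalue < 0.05
print("Significant at the 5% level:", slope_is_significant)
```

A significant slope indicates an association in this sample; on its own it does not show that X causes Y.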


**2. What are the key assumptions of Simple Linear Regression?**

= Simple Linear Regression (SLR) rests on a few key statistical assumptions that ensure the model’s estimates and significance tests are valid.

a.  Linearity
- The relationship between the independent variable X and the dependent variable Y is linear—that is, changes in X produce proportional changes in Y.

b. Independence of Errors
- The residuals (errors) are independent of each other—no autocorrelation.

c. Normality of Errors
- The residuals are normally distributed around the regression line.

d. Equal Variance (Homoscedasticity)
- The variance of residuals is constant across all levels of X.

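e. No Multicollinearity (multiple regression only)
- With a single predictor there is nothing for X to be correlated with, so this assumption does not apply to SLR. It matters in multiple linear regression, where the predictors should not be highly correlated with one another.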
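
A rough way to check these assumptions is to inspect the residuals of a fitted line. The sketch below uses synthetic data (generated so that the assumptions hold) and a few simple diagnostics; the specific checks chosen here are illustrative assumptions, not the only options.

```python
import numpy as np
from scipy import stats

# Synthetic data that satisfies the assumptions by construction
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=x.size)

# Fit the line (np.polyfit returns [slope, intercept] for degree 1)
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# Independence: Durbin-Watson statistic (values near 2 suggest no autocorrelation)
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Normality: Shapiro-Wilk test (a large p-value gives no evidence against normality)
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Equal variance: compare residual variance in the low-X and high-X halves
half = x.size // 2
variance_ratio = residuals[half:].var() / residuals[:half].var()

print("Durbin-Watson:", durbin_watson)
print("Shapiro-Wilk p-value:", shapiro_p)
print("Variance ratio (high X / low X):", variance_ratio)
```

Linearity is usually checked visually, by plotting the residuals against X and looking for any systematic curve.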


**3. Write the mathematical equation for a simple linear regression model and explain each term.**

= The mathematical form of a Simple Linear Regression (SLR) model is:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

> Yi — Dependent Variable (Response)
- Represents the observed value of the outcome for observation i.
- It’s the variable we’re trying to predict or explain.

> Xi — Independent Variable (Predictor)
- The known or measured input value for observation i.
- It explains variations in Yi.

> β0 — Intercept
- The expected (predicted) value of Y when X=0.
- Graphically, it’s where the regression line crosses the Y-axis.

> β1— Slope Coefficient
- The average change in Y when X increases by one unit.

> εi— Error Term (Residual)
- Captures the randomness or unmodeled factors affecting Yi.
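
To see how the terms fit together, the sketch below simulates data from the model with known parameters and then recovers them by fitting a line. The parameter values (intercept 2.0, slope 0.5, noise standard deviation 1.0) are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" parameters, chosen only for this illustration
true_beta0 = 2.0   # intercept
true_beta1 = 0.5   # slope
n = 200

x = rng.uniform(0, 10, size=n)          # X_i: predictor values
epsilon = rng.normal(0, 1.0, size=n)    # eps_i: random error term
y = true_beta0 + true_beta1 * x + epsilon  # Y_i = b0 + b1*X_i + eps_i

# Estimate the parameters from the simulated data
est_beta1, est_beta0 = np.polyfit(x, y, 1)

print("Estimated intercept:", est_beta0)
print("Estimated slope:", est_beta1)
```

The estimates land close to, but not exactly on, the true values because of the random error term.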


**4. Provide a real-world example where simple linear regression can be applied.**

= Predicting House Prices from Square Footage

Scenario: A real estate analyst wants an easy way to estimate how much a house will sell for based solely on its size.

Variables:
- Dependent variable Y: House selling price (in dollars)
- Independent variable X: House size (in square feet)

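Model: Price = β0 + β1(Square Footage) + ε

Here β1 is the estimated change in price for each additional square foot, and β0 is the baseline price when the square footage is zero (an extrapolation that usually has no practical meaning).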
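
A minimal sketch of this scenario with scikit-learn follows. The six houses and their prices are made-up numbers used only to show the workflow.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: house size (square feet) and selling price (dollars)
square_feet = np.array([800, 1000, 1200, 1500, 1800, 2000]).reshape(-1, 1)
price = np.array([160000, 195000, 240000, 290000, 355000, 390000])

model = LinearRegression()
model.fit(square_feet, price)

price_per_sqft = model.coef_[0]   # estimated slope (beta1)
base_price = model.intercept_     # estimated intercept (beta0)

# Estimate the price of a 1,600 square foot house
predicted_price = model.predict(np.array([[1600]]))[0]

print("Estimated price per extra square foot:", price_per_sqft)
print("Estimated intercept:", base_price)
print("Predicted price for 1600 sq ft:", predicted_price)
```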


**5. What is the method of least squares in linear regression?**

=  The method of least squares in linear regression is a mathematical technique used to find the best-fitting line through a set of data points by minimizing the sum of the squared differences (errors) between the observed values and the values predicted by the line.

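In simple linear regression, we model the relationship between X and Y as:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

We want the values of β0 and β1 that minimize the sum of squared residuals (errors):

$$\text{SSE} = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$

Setting the partial derivatives with respect to β0 and β1 to zero gives the closed-form estimates:

$$\beta_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}, \qquad \beta_0 = \bar{Y} - \beta_1 \bar{X}$$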
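
The slope formula, together with the standard intercept estimate β0 = Ȳ − β1·X̄, can be computed directly with NumPy. This sketch reuses the five data points from the scikit-learn example in question 9, so the results can be compared with that output.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2, 4, 5, 4, 5], dtype=float)

x_mean = X.mean()
y_mean = Y.mean()

# Slope: sum of cross-deviations divided by sum of squared X deviations
beta1 = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)

# Intercept: forces the line through the point (x_mean, y_mean)
beta0 = y_mean - beta1 * x_mean

# Sum of squared residuals that least squares minimizes
sse = np.sum((Y - (beta0 + beta1 * X)) ** 2)

print("Slope:", beta1)
print("Intercept:", beta0)
print("Sum of squared residuals:", sse)
```

Up to floating-point rounding, this gives a slope of 0.6 and an intercept of 2.2, matching the scikit-learn fit.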



**6. What is Logistic Regression? How does it differ from Linear Regression?**

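= Logistic Regression is a supervised machine learning algorithm used for classification problems. It predicts the probability that an input belongs to a specific class, and is most often used for binary classification, where the output is one of two categories such as Yes/No, True/False, or 0/1.

$$P(Y=1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$

How it differs from Linear Regression:
- Output: Linear regression predicts a continuous value that can be any real number; logistic regression predicts a probability between 0 and 1, which is then thresholded (commonly at 0.5) to give a class label.
- Function: Linear regression fits a straight line; logistic regression passes the same linear combination β0 + β1X through the sigmoid (logistic) function.
- Fitting: Linear regression is usually fitted by least squares; logistic regression is usually fitted by maximum likelihood.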
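
The probability formula can be illustrated by fitting scikit-learn's `LogisticRegression` and then applying the sigmoid by hand to the fitted coefficients. The pass/fail data below is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied and whether the student passed (1) or failed (0)
X = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

beta0 = model.intercept_[0]
beta1 = model.coef_[0][0]

# Apply the sigmoid to the linear combination beta0 + beta1 * X
manual_prob = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * X.ravel())))

# Probability of class 1 as reported by scikit-learn
sklearn_prob = model.predict_proba(X)[:, 1]

print("P(pass) by hand:     ", np.round(manual_prob, 3))
print("P(pass) from sklearn:", np.round(sklearn_prob, 3))
print("Predicted classes:   ", model.predict(X))
```

Unlike a fitted straight line, the predicted values always stay between 0 and 1.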


**7. Name and briefly describe three common evaluation metrics for regression models.**

= Three common evaluation metrics used to assess the performance of regression models are:

1. Mean Absolute Error (MAE)
- MAE measures the average absolute difference between the actual and predicted values.

> $$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - \hat{Y}_i \right|$$

- It tells you how much the predictions are off, on average, regardless of direction (positive or negative).

2. Mean Squared Error (MSE)
- MSE measures the average of the squared differences between actual and predicted values.

> $$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2$$

- Squaring penalizes larger errors more heavily.
- Sensitive to outliers.

3. R-squared (Coefficient of Determination)
- R² shows the proportion of variance in the dependent variable that is explained by the model.

> $$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

- SSres = Sum of squared residuals (errors)
- SStot = Total sum of squares
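
All three metrics can be computed in a few lines of NumPy. The sketch below uses the data from question 9 and takes the predictions from the fitted line reported there (slope 0.6, intercept 2.2).

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
y_true = np.array([2, 4, 5, 4, 5], dtype=float)

# Predictions from the line fitted in question 9
y_pred = 2.2 + 0.6 * X

errors = y_true - y_pred

mae = np.mean(np.abs(errors))              # Mean Absolute Error
mse = np.mean(errors ** 2)                 # Mean Squared Error

ss_res = np.sum(errors ** 2)               # sum of squared residuals
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot                   # R-squared

print("MAE:", mae)   # about 0.64
print("MSE:", mse)   # about 0.48
print("R^2:", r2)    # about 0.6
```

scikit-learn offers the same calculations as `mean_absolute_error`, `mean_squared_error`, and `r2_score` in `sklearn.metrics`.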


**8. What is the purpose of the R-squared metric in regression analysis?**

= R-squared — also called the Coefficient of Determination — measures how well a regression model explains the variability of the dependent variable (the output) based on the independent variables (inputs).

R² tells you how much of the variation in your target variable (Y) can be explained by the model.

> $$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

> Interpretation:
- R² = 1 → The model perfectly explains all variability (perfect fit).
- R² = 0 → The model explains none of the variability; it does no better than always predicting the mean of Y.
- R² = 0.8 → The model explains 80% of the variation in the target variable; 20% is unexplained (random or due to noise).
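
These reference points can be checked numerically. The helper `r_squared` below is a hypothetical function written for this illustration; it also shows that, computed this way, R² becomes negative when predictions are worse than simply predicting the mean.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for the given predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

y = np.array([2, 4, 5, 4, 5], dtype=float)

r2_perfect = r_squared(y, y)                          # predictions equal the data
r2_mean = r_squared(y, np.full_like(y, y.mean()))     # always predict the mean
r2_bad = r_squared(y, np.array([5, 4, 5, 4, 2]))      # predictions worse than the mean

print("Perfect predictions:", r2_perfect)  # 1.0
print("Mean-only baseline: ", r2_mean)     # 0.0
print("Poor predictions:   ", r2_bad)      # -2.0
```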

**9. Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.**


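```python
from sklearn.linear_model import LinearRegression
import numpy as np

# scikit-learn expects a 2-D feature matrix (n_samples, n_features), hence reshape(-1, 1)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

# Fit an ordinary least squares line to the data
model = LinearRegression()
model.fit(X, y)

# coef_ holds one slope per feature; intercept_ is the constant term
print("Slope (Coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)
```

Output:

```
Slope (Coefficient): 0.6
Intercept: 2.2
```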


**10. How do you interpret the coefficients in a simple linear regression model?**

> $$Y = \beta_0 + \beta_1 X + \varepsilon$$

where:
- Y = dependent (target) variable
- X = independent (predictor) variable
- β0 = intercept
- β1 = slope (coefficient of X)
- ε = error term

> Interpretation of Coefficients
1. Intercept (β₀):
- The predicted value of Y when X = 0.
- It tells you where the regression line crosses the Y-axis.

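2. Slope (β₁):
- The average change in Y for a one-unit increase in X.
- Its sign shows the direction of the relationship (positive: Y rises with X; negative: Y falls as X rises), and its magnitude shows how much Y changes per unit of X.
- Example: with the slope 0.6 and intercept 2.2 from question 9, each one-unit increase in X raises the predicted Y by 0.6, and the predicted Y at X = 0 is 2.2.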
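
Both interpretations can be verified on a fitted model. The sketch below refits the data from question 9 and checks that the prediction at X = 0 equals the intercept and that moving X up by one unit changes the prediction by exactly the slope.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression().fit(X, y)
slope = model.coef_[0]
intercept = model.intercept_

# Intercept: the predicted Y when X = 0
pred_at_zero = model.predict(np.array([[0]]))[0]

# Slope: the change in predicted Y when X increases by one unit (here 3 -> 4)
pred_at_3, pred_at_4 = model.predict(np.array([[3], [4]]))
change_per_unit = pred_at_4 - pred_at_3

print("Prediction at X = 0:", pred_at_zero, "| intercept:", intercept)
print("Change from X = 3 to X = 4:", change_per_unit, "| slope:", slope)
```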