# ***Supervised Learning: Regression Models and Performance Metrics***

1.  What is Simple Linear Regression (SLR)? Explain its purpose.

   ANS:- Simple Linear Regression (SLR) is a statistical method that models the relationship between two variables: one independent (predictor) and one dependent (response). It fits a straight line, expressed as $Y=\beta _0+\beta _1X+\epsilon$, where $\beta _0$ is the intercept, $\beta _1$ is the slope, and $\epsilon$ is the error term.

   Its purpose is to predict the dependent variable's value from the independent variable and to quantify the strength and direction of their relationship: the sign of the slope gives the direction, and its magnitude gives the expected change in the response per unit change in the predictor. This makes SLR useful for forecasting, trend analysis, and data-driven decision-making across diverse fields. On its own, a fitted line shows association rather than cause and effect.



2. What are the key assumptions of Simple Linear Regression?

    ANS:-  The key assumptions of Simple Linear Regression are:

    - **Linearity:** The relationship between independent and dependent variables is linear.

    - **Independence:** Observations are independent of each other.

    - **Homoscedasticity:** Constant variance of errors across all levels of the independent variable.

    - **Normality:** The errors are normally distributed (this matters mainly for confidence intervals and hypothesis tests).

    - **No multicollinearity:** Not applicable to SLR, which has only one predictor. A separate requirement (exogeneity) is that the predictor is not correlated with the error term.



3. Write the mathematical equation for a simple linear regression model and
explain each term.

ANS :- The mathematical equation for a Simple Linear Regression model is:

   $Y=\beta _0+\beta _1X+\epsilon$

- $Y$: Dependent variable (the outcome to predict).
- $X$: Independent variable (the predictor).
- $\beta _0$: Intercept, the expected value of $Y$ when $X=0$.
- $\beta _1$: Slope, the expected change in $Y$ for a one-unit increase in $X$.
- $\epsilon$: Error term, capturing the variation in $Y$ that the line does not explain.
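
To make the roles of the intercept and slope concrete, here is a minimal sketch in plain Python. The coefficient values are hypothetical, chosen only for illustration, and the error term is left out because it is unknown for any single prediction.

```python
def predict(x, beta0, beta1):
    """Return the fitted value beta0 + beta1 * x (the error term is omitted)."""
    return beta0 + beta1 * x

# Hypothetical coefficients, chosen only for illustration
beta0 = 10.0   # intercept
beta1 = 2.5    # slope

# The prediction at X = 0 equals the intercept
at_zero = predict(0, beta0, beta1)

# Increasing X by one unit changes the prediction by the slope
step = predict(5, beta0, beta1) - predict(4, beta0, beta1)

print("Prediction at X=0:", at_zero)
print("Change for a one-unit increase in X:", step)
```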


4.  Provide a real-world example where simple linear regression can be
applied.

    ANS :- A real-world example of Simple Linear Regression is predicting house prices based on square footage. Here, square footage is the independent variable and house price is the dependent variable. The slope of the fitted regression line estimates how much the price increases for each additional square foot. This helps buyers, sellers, and real estate agents understand market trends, make informed decisions, and forecast property values with a simple, interpretable model.

In [4]:
import numpy as np
from sklearn.linear_model import LinearRegression

# Example data: square footage (X) and house price (Y)

X = np.array([[1000], [1500], [2000], [2500], [3000]])
Y = np.array([200000, 250000, 300000, 350000, 400000])

# Fit model

model = LinearRegression()
model.fit(X, Y)

# Predict price for 2200 sq.ft

prediction = model.predict([[2200]])
print("Predicted Price:", prediction[0])

Predicted Price: 320000.0


5. What is the method of least squares in linear regression?

    ANS :- The method of least squares is the standard technique for estimating the coefficients of a linear regression. It chooses the line that minimizes the sum of squared differences (residuals) between the observed values and the values predicted by the line. Under the classical assumptions, the resulting estimates are unbiased and have the smallest variance among linear unbiased estimators (the Gauss-Markov theorem).
    
    Steps:

- Assume model: $Y=\beta _0+\beta _1X+\epsilon$
- Residuals: $e_i=Y_i-(\beta _0+\beta _1X_i)$
- Objective: Minimize $S=\sum e_i^2=\sum (Y_i-\beta _0-\beta _1X_i)^2$
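- Differentiate $S$ with respect to $\beta _0$ and $\beta _1$, and set both derivatives to zero (the normal equations).
- Solve the two equations to obtain the estimates: $\hat{\beta }_1=\dfrac{\sum (X_i-\bar{X})(Y_i-\bar{Y})}{\sum (X_i-\bar{X})^2}$ and $\hat{\beta }_0=\bar{Y}-\hat{\beta }_1\bar{X}$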
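
The steps above lead to closed-form estimates: the slope is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and the intercept then follows from the two means. Below is a minimal sketch in plain Python on a small sample dataset. The function name `least_squares_fit` is illustrative, not a library API.

```python
def least_squares_fit(xs, ys):
    """Estimate (intercept, slope) of a simple linear regression by least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Small sample dataset, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = least_squares_fit(xs, ys)
print("Slope:", round(slope, 4))
print("Intercept:", round(intercept, 4))
```

The function assumes the X values are not all identical; otherwise the denominator is zero and no unique line exists.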

6.  What is Logistic Regression? How does it differ from Linear Regression?

    ANS :- Logistic Regression is a statistical method used for binary classification, modeling the probability that a dependent variable belongs to a category. It uses the logistic (sigmoid) function to constrain outputs between 0 and 1.
    
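    Unlike Linear Regression, which predicts a continuous value as a linear function of the input, Logistic Regression passes that linear function through the sigmoid to produce a probability, which is then compared with a threshold (commonly 0.5) to predict a categorical outcome.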
    
    The key difference lies in purpose: Linear Regression estimates numeric relationships, while Logistic Regression estimates probabilities and class membership, making it suitable for classification tasks.
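
As a rough illustration of how the sigmoid turns an unbounded linear output into a probability, here is a minimal sketch in plain Python. The coefficients are hypothetical (not fitted to any data), and the 0.5 cut-off is a common default rather than a requirement.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta0, beta1):
    """Probability of the positive class for a single feature x."""
    return sigmoid(beta0 + beta1 * x)

# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = -4.0, 1.0

for x in [0, 2, 4, 6, 8]:
    linear_output = beta0 + beta1 * x          # what a linear model would return
    p = predict_proba(x, beta0, beta1)         # squashed into (0, 1)
    label = 1 if p >= 0.5 else 0               # 0.5 is a common default threshold
    print(f"x={x}  linear={linear_output:+.1f}  probability={p:.3f}  class={label}")
```

The linear output ranges from -4 to +4 here, while the probability always stays strictly between 0 and 1.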


7. Name and briefly describe three common evaluation metrics for regression
models.

    ANS :- Three common evaluation metrics for regression models are:

    - **Mean Absolute Error (MAE):** Average of absolute differences between predicted and actual values, giving the typical size of an error in the units of the target.

    - **Mean Squared Error (MSE):** Average of squared differences, penalizing larger errors more strongly.

    - **R-squared (Coefficient of Determination):** Proportion of variance in the dependent variable explained by the model, indicating goodness of fit.
    
    MAE and MSE measure error magnitude (lower is better), while R-squared measures explanatory power (higher is better).
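
The three metrics can be computed directly from their definitions. The sketch below uses plain Python and made-up actual and predicted values. The helper functions are illustrative stand-ins written for this example.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """One minus the ratio of residual to total sum of squares."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up actual and predicted values, for illustration only
y_true = [2, 4, 5, 4, 5]
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)

print("MAE:", round(mae, 4))
print("MSE:", round(mse, 4))
print("R-squared:", round(r2, 4))
```

In practice, scikit-learn offers equivalent functions in `sklearn.metrics` (`mean_absolute_error`, `mean_squared_error`, `r2_score`).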


8. What is the purpose of the R-squared metric in regression analysis?

    ANS:- The purpose of the R-squared metric is to measure how well a regression model explains the variability of the dependent variable. It represents the proportion of variance in the outcome accounted for by the independent variable(s). For a least-squares fit with an intercept, values on the training data range from 0 to 1, where higher values indicate better explanatory power; on new data, R-squared can be negative if the model predicts worse than simply using the mean. In short, R-squared summarizes goodness of fit: how closely the regression predictions match the actual data.
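
    The definition behind this interpretation can be written out as follows, where $\hat{Y}_i$ denotes the model's prediction for observation $i$ and $\bar{Y}$ the mean of the observed values (notation introduced here for illustration):

    $$R^2 = 1-\frac{SS_{res}}{SS_{tot}} = 1-\frac{\sum (Y_i-\hat{Y}_i)^2}{\sum (Y_i-\bar{Y})^2}$$

    An $R^2$ of 0.75, for example, would mean that the model accounts for 75% of the variance in the outcome, while a model that always predicts $\bar{Y}$ scores exactly 0.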


9.  Write Python code to fit a simple linear regression model using scikit-learn
and print the slope and intercept.

In [5]:
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data

X = np.array([[1], [2], [3], [4], [5]])   # Independent variable
Y = np.array([2, 4, 5, 4, 5])             # Dependent variable

# Fit model

model = LinearRegression()
model.fit(X, Y)

# Print slope and intercept

print("Slope (β1):", model.coef_[0])
print("Intercept (β0):", model.intercept_)

Slope (β1): 0.6
Intercept (β0): 2.2
