# Supervised Learning: Regression Models and Performance Metrics

1. **What is Simple Linear Regression (SLR)? Explain its purpose.**

**Simple Linear Regression (SLR)**

- Simple Linear Regression is a statistical technique used to model the relationship between one independent variable (X) and one dependent variable (Y).
- It fits the straight line that best describes how Y changes as X changes.
- The model is represented as:
  
  Y = β₀ + β₁X + ε

  **β₀** is the intercept, **β₁** is the slope, and **ε** is the error term.

**Purpose**

- To predict the value of Y based on a given value of X.
- To understand the strength and direction of the relationship between X and Y.
- To analyze trends and make data-driven decisions using a simple linear pattern.

2. **What are the key assumptions of Simple Linear Regression?**

**Key Assumptions of Simple Linear Regression**

- **Linearity**  
  The relationship between the independent variable (X) and dependent variable (Y) must be linear.

- **Independence of Errors**  
  Residuals (errors) should be independent of each other.

- **Homoscedasticity**  
  The variance of residuals should remain constant across all values of X.

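- **Normality of Residuals**  
  Residuals should be approximately normally distributed. This matters mainly for confidence intervals and hypothesis tests on β₀ and β₁; the least-squares estimates themselves do not require it.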

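- **No Autocorrelation**  
  A specific form of independence: residuals should not be correlated with neighboring residuals. Violations are most common in time-series data, where consecutive observations tend to move together.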

- **No Multicollinearity**  
  Automatically satisfied because SLR uses only one independent variable.
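Several of these assumptions can be checked roughly from the residuals of a fitted model. The sketch below is one possible set of checks, not a formal procedure: it uses simulated data (hypothetical, built to satisfy the assumptions), `np.polyfit` to fit the line, a hand-computed Durbin-Watson statistic for autocorrelation, a low-X vs. high-X variance ratio for homoscedasticity, and SciPy's Shapiro-Wilk test for normality. Linearity is usually checked visually with a residuals-vs-fitted plot, which is omitted here.

```python
import numpy as np
from scipy import stats

# Hypothetical simulated data that satisfies the SLR assumptions by construction
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=200)
y = 1.5 + 2.0 * x + rng.normal(0, 1.0, size=200)

# Fit the line by least squares; np.polyfit with deg=1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Independence / no autocorrelation: Durbin-Watson statistic on residuals in
# observation order. Values near 2 suggest no first-order autocorrelation.
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Homoscedasticity: compare residual variance for low vs. high X.
# A ratio far from 1 hints that the spread of residuals changes with X.
low = residuals[x < np.median(x)]
high = residuals[x >= np.median(x)]
variance_ratio = np.var(high) / np.var(low)

# Normality: Shapiro-Wilk test; a small p-value (e.g. < 0.05) suggests non-normal residuals
shapiro_p = stats.shapiro(residuals).pvalue

print(f"Durbin-Watson: {dw:.2f}")
print(f"Residual variance ratio (high X / low X): {variance_ratio:.2f}")
print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
```

For time-series data, the Durbin-Watson statistic only makes sense when the residuals are kept in time order, as they are here.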


3. **Write the mathematical equation for a simple linear regression model and explain each term.**

**Simple Linear Regression Equation**

  Y = β₀ + β₁X + ε

**Explanation of Each Term**

- **Y**  
  The dependent variable (the value we want to predict).

- **X**  
  The independent variable (the predictor).

- **β₀ (Intercept)**  
  The predicted value of Y when X = 0, i.e. the point where the regression line crosses the Y-axis.

- **β₁ (Slope)**  
  The amount by which Y changes for each one-unit increase in X.

- **ε (Error Term)**  
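  The difference between the actual value of Y and the value on the true (population) regression line, β₀ + β₁X. Its estimate from the fitted model is the residual, Y − Ŷ.  
  It captures randomness, noise, or factors not included in the model.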
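One way to see what each term means is to generate data from the equation with known values of β₀ and β₁ and check that a fitted model recovers them. The "true" parameters, the noise level, and the use of scikit-learn's `LinearRegression` below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical "true" parameters, chosen for illustration
beta0_true, beta1_true = 2.0, 3.0

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=200)              # independent variable
epsilon = rng.normal(0, 1.0, size=200)        # error term ε: random noise
Y = beta0_true + beta1_true * X + epsilon     # Y = β₀ + β₁X + ε

model = LinearRegression().fit(X.reshape(-1, 1), Y)
Y_hat = model.predict(X.reshape(-1, 1))       # fitted values Ŷ
residuals = Y - Y_hat                         # sample estimates of ε

print("Estimated β₀:", round(model.intercept_, 3))
print("Estimated β₁:", round(model.coef_[0], 3))
print("Mean residual:", residuals.mean())     # essentially 0 for least squares with an intercept
```

The estimates land close to, but not exactly on, 2 and 3 because ε adds random noise to every observation.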


4. **Provide a real-world example where simple linear regression can be applied.**

**Real-World Example of Simple Linear Regression**

- A company wants to predict **monthly sales revenue** based on **advertising spend**.
- Here, advertising spend (X) is the independent variable, and sales revenue (Y) is the dependent variable.
- By fitting a simple linear regression model, the company can estimate how much revenue increases for every extra unit of money spent on advertising.
- This helps in forecasting future sales and making budgeting decisions.
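A minimal sketch of this scenario, using made-up numbers (in thousands of dollars) purely for illustration; the figures do not come from any real company.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data in thousands of dollars, made up for illustration only
ad_spend = np.array([10, 20, 30, 40, 50]).reshape(-1, 1)   # X: advertising spend
revenue = np.array([27, 44, 66, 83, 105])                   # Y: monthly sales revenue

model = LinearRegression().fit(ad_spend, revenue)

# Slope: estimated extra revenue per extra $1k of advertising (about 1.95 here)
print("Revenue change per extra $1k of ad spend:", model.coef_[0])

# Forecast for a planned budget of $60k (about 123.5 here)
print("Forecast revenue at $60k spend:", model.predict([[60]])[0])
```

In this made-up data, each extra $1k of advertising is associated with roughly $1.95k of additional revenue, and the fitted line can be used to forecast revenue for a proposed budget.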


5. **What is the method of least squares in linear regression?**

**Method of Least Squares**

- The method of least squares is a technique used to find the best-fitting regression line for a set of data points.
- It works by minimizing the **sum of the squared differences** between the actual values (Y) and the predicted values (Ŷ) from the regression line.

**How It Works**

- For each data point, calculate the residual:  
  residual = (Y − Ŷ)
- Square each residual so that positive and negative residuals do not cancel out; squaring also penalizes large residuals more heavily.
- Add all squared residuals together.
- Choose the intercept and slope that make this total, the sum of squared errors (SSE), as small as possible.

**Purpose**

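- Makes the line fit the data as closely as possible in the sense of total squared error.
- Gives closed-form estimates of the slope and intercept: β₁ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and β₀ = Ȳ − β₁X̄. When the standard assumptions hold (Gauss-Markov theorem), these are the best linear unbiased estimates.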
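The closed-form least-squares formulas can be computed directly with NumPy. This sketch uses the same sample data as the scikit-learn example in question 9 and includes a helper `sse` (a name introduced here for illustration) to show that moving the slope away from the least-squares value increases the sum of squared errors.

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([3, 4, 2, 5, 6], dtype=float)

def sse(b0, b1):
    """Sum of squared residuals for the line Ŷ = b0 + b1·X."""
    residuals = Y - (b0 + b1 * X)
    return np.sum(residuals ** 2)

# Closed-form least-squares estimates
x_bar, y_bar = X.mean(), Y.mean()
b1 = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

print("Slope:", round(b1, 4))                           # 0.7
print("Intercept:", round(b0, 4))                       # 1.9
print("SSE at optimum:", round(sse(b0, b1), 4))         # 5.1
print("SSE with slope 0.8:", round(sse(b0, 0.8), 4))    # 5.65 (larger)
```

The slope and intercept match those reported by `LinearRegression` in question 9, since both minimize the same squared-error criterion.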


6. **What is Logistic Regression? How does it differ from Linear Regression?**

**Logistic Regression**

- Logistic Regression is a statistical model used for **classification tasks**, where the target variable is categorical (such as 0/1, Yes/No, Spam/Not Spam).
- Instead of predicting a continuous value, it predicts the **probability** that an input belongs to a particular class.
- It uses the **sigmoid (logistic) function** to convert linear combinations of inputs into probabilities between 0 and 1.

**Key Equation**

  p = 1 / (1 + e^-(β₀ + β₁X))

  Where p is the probability of the positive class.
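A small sketch of the equation above: a linear score β₀ + β₁X is passed through the sigmoid to get a probability, and taking the log-odds log(p / (1 − p)) recovers the linear score. The coefficients β₀ = −4 and β₁ = 1 are hypothetical values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = -4.0, 1.0

X = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
z = beta0 + beta1 * X          # linear part: β₀ + β₁X
p = sigmoid(z)                 # probability of the positive class

for x_val, p_val in zip(X, p):
    print(f"X = {x_val:.0f}  ->  p = {p_val:.3f}")

# The log-odds log(p / (1 - p)) recover the linear part
log_odds = np.log(p / (1 - p))
print(np.allclose(log_odds, z))   # True
```

With these coefficients, p crosses 0.5 at X = 4, where the linear part β₀ + β₁X equals 0.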

**Difference from Linear Regression**

- **Purpose**  
  - Linear Regression predicts continuous values.  
  - Logistic Regression predicts probabilities and class labels.

- **Output Range**  
  - Linear Regression outputs any real number.  
  - Logistic Regression outputs values between 0 and 1.

- **Model Type**  
  - Linear Regression fits a straight line.  
  - Logistic Regression fits an S-shaped curve using the sigmoid function; the model is still linear in the log-odds, log(p / (1 − p)) = β₀ + β₁X.

- **Loss Function**  
  - Linear Regression uses Mean Squared Error (MSE).  
  - Logistic Regression uses Log Loss (Binary Cross-Entropy).

- **Applications**  
  - Linear Regression: predicting house prices, sales, etc.  
  - Logistic Regression: predicting fraud, disease diagnosis, email spam detection.
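The difference in output range shows up clearly when both models are fitted to the same binary labels. The tiny dataset below is hypothetical, and scikit-learn's `LogisticRegression` is used with its default settings (which include L2 regularization).

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical binary data: class 0 for small X, class 1 for large X
X = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1])

lin = LinearRegression().fit(X, y)
log_reg = LogisticRegression().fit(X, y)

X_new = np.array([[0], [8]])
lin_pred = lin.predict(X_new)                  # unbounded real numbers
log_prob = log_reg.predict_proba(X_new)[:, 1]  # P(y = 1), always in (0, 1)
log_label = log_reg.predict(X_new)             # class labels (0.5 threshold)

print("Linear regression outputs:", lin_pred)  # about -0.4 and 1.66: outside [0, 1]
print("Logistic regression P(y=1):", log_prob)
print("Logistic regression labels:", log_label)
```

The linear model's outputs cannot be read as probabilities once they leave [0, 1], while the logistic model's outputs always can.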


7. **Name and briefly describe three common evaluation metrics for regression models.**

**Mean Absolute Error (MAE)**  
- Measures the average absolute difference between actual and predicted values.  
- Lower MAE indicates better model performance.

**Mean Squared Error (MSE)**  
- Calculates the average of squared differences between actual and predicted values.  
- Penalizes larger errors more heavily due to squaring.

**R-squared (Coefficient of Determination)**  
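- Represents the proportion of the variance in the dependent variable that is explained by the model.  
- Usually between 0 and 1, with higher values indicating a better fit; it can be negative when a model predicts worse than simply using the mean of Y (for example, on new test data).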
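The three metrics can be computed by hand and checked against `sklearn.metrics`. The actual values are the sample data from question 9, and the predictions come from its fitted line Ŷ = 1.9 + 0.7X.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Actual values and predictions from the fitted line Ŷ = 1.9 + 0.7X (question 9 data)
y_true = np.array([3, 4, 2, 5, 6])
y_pred = np.array([2.6, 3.3, 4.0, 4.7, 5.4])

# Manual calculations
mae = np.mean(np.abs(y_true - y_pred))
mse = np.mean((y_true - y_pred) ** 2)
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MAE: {mae:.2f}")   # 0.80
print(f"MSE: {mse:.2f}")   # 1.02
print(f"R²:  {r2:.2f}")    # 0.49

# The same metrics from scikit-learn
print(mean_absolute_error(y_true, y_pred), mean_squared_error(y_true, y_pred), r2_score(y_true, y_pred))
```

MAE is in the same units as Y, while MSE is in squared units; the single large error at X = 3 (actual 2, predicted 4) dominates the MSE.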

8. **What is the purpose of the R-squared metric in regression analysis?**

**Purpose of R-squared in Regression**

- R-squared measures the proportion of the variance in the dependent variable (Y) that is explained by the independent variable(s) in the model.
- It indicates how well the regression line fits the data.
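- For a least-squares fit with an intercept, evaluated on the data it was fitted to, the value ranges from 0 to 1:
  - 0 means the model explains none of the variation.
  - 1 means the model explains all of the variation.
- On new data, R-squared can be negative if the model predicts worse than the mean of Y.
- In simple linear regression, R-squared equals the square of the correlation coefficient between X and Y, so it measures the strength (but not the direction) of the linear relationship and the overall goodness of fit.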
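R-squared is defined as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of Y. The sketch below computes it for the question 9 data, compares it with the squared Pearson correlation, and shows a deliberately poor set of predictions (hypothetical) that gives a negative score.

```python
import numpy as np
from sklearn.metrics import r2_score

X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([3, 4, 2, 5, 6], dtype=float)

# Least-squares line for this data: Ŷ = 1.9 + 0.7X
Y_hat = 1.9 + 0.7 * X

ss_res = np.sum((Y - Y_hat) ** 2)        # unexplained variation
ss_tot = np.sum((Y - Y.mean()) ** 2)     # total variation around the mean
r_squared = 1 - ss_res / ss_tot

r = np.corrcoef(X, Y)[0, 1]              # Pearson correlation between X and Y
print(f"R² = {r_squared:.2f}, r = {r:.2f}, r² = {r**2:.2f}")   # R² = 0.49, r = 0.70, r² = 0.49

# Predictions worse than always predicting the mean give a negative R²
print(r2_score([1, 2, 3], [3, 2, 1]))    # -3.0
```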


9. **Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.**

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Sample data (scikit-learn expects X as a 2-D array of shape (n_samples, n_features))
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
Y = np.array([3, 4, 2, 5, 6])

# Create and fit the model
model = LinearRegression()
model.fit(X, Y)

# Print slope and intercept
print("Slope (β₁):", model.coef_[0])
print("Intercept (β₀):", model.intercept_)
```

Output:

```
Slope (β₁): 0.6999999999999998
Intercept (β₀): 1.9000000000000004
```


10. **How do you interpret the coefficients in a simple linear regression model?**

**Interpreting Coefficients in Simple Linear Regression**

- **Intercept (β₀)**  
  Represents the predicted value of Y when X = 0.  
  It is the baseline level of the dependent variable, and is only meaningful when X = 0 lies within or near the range of the observed data.

- **Slope (β₁)**  
  Indicates the change in Y for a one-unit increase in X.  
  - If β₁ is positive: Y increases as X increases.  
  - If β₁ is negative: Y decreases as X increases.  
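  - The magnitude of β₁ shows how much Y changes per unit of X; it depends on the units of X and Y, so it is not by itself a unit-free measure of relationship strength.

- Overall, the coefficients summarize how the dependent variable changes with the independent variable under a linear model; on their own they show association, not necessarily causation.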
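Both interpretations can be checked numerically with the model from question 9: the prediction at X = 0 equals the intercept, and increasing X by one unit changes the prediction by exactly the slope. Note that X = 0 lies outside this sample's range (1 to 5), so the intercept here is an extrapolation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same sample data as in question 9
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
Y = np.array([3, 4, 2, 5, 6])
model = LinearRegression().fit(X, Y)

# Intercept: the prediction at X = 0
pred_at_0 = model.predict([[0]])[0]

# Slope: the change in the prediction when X increases by one unit (3 -> 4)
step = model.predict([[4]])[0] - model.predict([[3]])[0]

print("Prediction at X = 0:", round(pred_at_0, 4), "| intercept:", round(model.intercept_, 4))
print("Change from X = 3 to X = 4:", round(step, 4), "| slope:", round(model.coef_[0], 4))
```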