Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.

Answer

Simple Linear Regression (SLR) is a statistical method used to model the relationship between a single independent variable (predictor) and a single dependent variable (outcome). It assumes that there is a linear relationship between these two variables.

Purpose:

- Predictive Modeling: To predict the value of the dependent variable based on a given value of the independent variable.

- Relationship Strength: To quantify how strongly the dependent variable is associated with the independent variable (association, not proof of causation).

- Trend Analysis: To understand whether the dependent variable increases or decreases as the independent variable changes.

Question 2: What are the key assumptions of Simple Linear Regression?

Answer

For SLR to provide reliable results, four primary assumptions (often remembered by the acronym LINE) must be met:
1. Linearity: The relationship between the independent variable $X$ and the dependent variable $Y$ is linear.
2. Independence: The observations in the dataset are independent of each other (no autocorrelation).
3. Normality: For any fixed value of $X$, the errors are normally distributed. This matters mainly for confidence intervals and hypothesis tests on the coefficients.
4. Homoscedasticity: The variance of the residual terms is constant across all levels of the independent variable.
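In practice these assumptions are checked on the residuals of a fitted model. The sketch below is a minimal illustration using only NumPy; the helper name `residual_diagnostics` is hypothetical, the data are simulated so that the assumptions hold by construction, and the Durbin-Watson statistic (values near 2 suggest little first-order autocorrelation) is computed by hand rather than taken from a library.

```python
import numpy as np

def residual_diagnostics(x, y):
    """Fit y = b0 + b1*x by least squares and return simple residual checks."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, deg=1)  # polyfit returns the highest-degree coefficient first
    residuals = y - (b0 + b1 * x)

    # Independence: Durbin-Watson = sum of squared successive differences / sum of squares
    dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

    # Homoscedasticity (rough check): residual spread for the lower vs upper half of x
    order = np.argsort(x)
    half = len(x) // 2
    spread_low = residuals[order[:half]].std()
    spread_high = residuals[order[half:]].std()
    return residuals, dw, spread_low, spread_high

# Simulated data with independent, normal, constant-variance errors (hypothetical parameters)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 1.5 + 0.8 * x + rng.normal(0, 1.0, size=x.size)

res, dw, s_low, s_high = residual_diagnostics(x, y)
print(f"mean residual={res.mean():.3f}, DW={dw:.2f}, spread low/high x={s_low:.2f}/{s_high:.2f}")
```

A residual-versus-$X$ plot (linearity, homoscedasticity) and a Q-Q plot of the residuals (normality) are the more common visual checks; the numbers above are only quick summaries.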

Question 3: Write the mathematical equation for a simple linear regression model and explain each term.

Answer

The mathematical equation for a Simple Linear Regression model is:

$$Y = \beta_0 + \beta_1 X + \epsilon$$

Explanation of terms:
1. $Y$: The Dependent Variable (the target we want to predict).
2. $X$: The Independent Variable (the predictor used to make the prediction).
3. $\beta_0$: The Y-intercept, representing the expected value of $Y$ when $X = 0$.
4. $\beta_1$: The Slope coefficient, representing the average change in $Y$ for every unit increase in $X$.
5. $\epsilon$: The Error term, which accounts for the variation in $Y$ that is not explained by the linear relationship with $X$. It is unobserved; the residuals $e_i = y_i - \hat{y}_i$ of a fitted model are its estimates.
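To make each term concrete, the sketch below generates data directly from $Y = \beta_0 + \beta_1 X + \epsilon$ and then estimates the coefficients back from the data. The "true" values `beta_0 = 2`, `beta_1 = 3` and the noise level are hypothetical choices for illustration; `np.polyfit` with `deg=1` is used here as a convenient least-squares fitter.

```python
import numpy as np

# Hypothetical "true" parameters, chosen only for illustration
beta_0, beta_1, sigma = 2.0, 3.0, 1.0

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=10_000)
epsilon = rng.normal(0, sigma, size=X.size)   # error term: mean 0, constant variance
Y = beta_0 + beta_1 * X + epsilon             # Y = beta_0 + beta_1 * X + epsilon

# Estimate the coefficients from (X, Y) alone; polyfit returns [slope, intercept]
b1_hat, b0_hat = np.polyfit(X, Y, deg=1)

# Residuals are the observable estimates of the unobserved errors
residuals = Y - (b0_hat + b1_hat * X)
print(f"beta_0_hat={b0_hat:.3f}, beta_1_hat={b1_hat:.3f}")
```

With many observations the estimates land close to the values used to generate the data, while the residuals differ slightly from the true `epsilon` because they depend on the estimated line.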

Question 4: Provide a real-world example where simple linear regression can be applied.

Answer

Example: Predicting salary based on years of experience. A company can use SLR to determine a fair salary for a new hire.
1. Independent Variable ($X$): Years of professional experience.
2. Dependent Variable ($Y$): Annual salary.

By analyzing historical data on current employees, the model can predict the salary ($Y$) for a candidate with a specific number of years of experience ($X$).
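A minimal sketch of this workflow with scikit-learn's `LinearRegression`. The experience and salary figures below are hypothetical toy numbers invented for illustration, not real payroll data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical historical data (illustrative numbers only)
years = np.array([[1], [2], [3], [5], [7], [10]])                    # X: years of experience
salary = np.array([42_000, 45_500, 49_000, 56_000, 63_500, 74_000])  # Y: annual salary

model = LinearRegression().fit(years, salary)

# Predict the salary for a hypothetical candidate with 4 years of experience
candidate = np.array([[4]])
predicted = model.predict(candidate)[0]
print(f"Each extra year adds about {model.coef_[0]:,.0f}; "
      f"predicted salary for 4 years: {predicted:,.0f}")
```

Predictions are most trustworthy for experience levels inside the range of the historical data (here 1 to 10 years).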

Question 5: What is the method of least squares in linear regression?

Answer

- The Method of Least Squares (Ordinary Least Squares, OLS) is the standard technique used to find the best-fitting line through a set of data points.

- It works by minimizing the sum of the squares of the vertical deviations (residuals) between each data point and the fitted line. Squaring prevents positive and negative errors from cancelling each other out and gives more weight to large errors, which also makes OLS sensitive to outliers.
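For a single predictor, minimizing the sum of squared residuals has a closed-form solution: $\hat\beta_1 = \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The sketch below implements those two formulas with NumPy (the function name `ols_fit` is hypothetical) on the same five points used in the Question 9 code, so the results can be compared with scikit-learn's output there.

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form least-squares estimates for y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # b0 = y_mean - b1 * x_mean (the fitted line passes through the point of means)
    b0 = y_mean - b1 * x_mean
    return b0, b1

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
b0, b1 = ols_fit(x, y)

# The quantity OLS minimizes: sum of squared residuals (SSE)
sse = np.sum((y - (b0 + b1 * x)) ** 2)
print(f"b0={b0:.2f}, b1={b1:.2f}, SSE={sse:.2f}")  # b0=2.20, b1=0.60, SSE=2.40
```

Any other line, for example one with a slightly larger slope, gives a larger SSE on the same data.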

Question 6: What is Logistic Regression? How does it differ from Linear Regression?

Answer

Logistic Regression is a supervised machine learning algorithm used for classification tasks: it predicts the probability that an observation belongs to a particular class of a categorical dependent variable.

What is Logistic Regression?

1. It predicts the likelihood of an event occurring (e.g., whether an email is "Spam" or "Not Spam").

2. It outputs a probability value between 0 and 1.

3. It uses the Sigmoid Function (or Logistic function) to map any real-valued number into a probability.

4. It is widely used in binary classification (two classes) but can be extended to multi-class problems.
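A minimal sketch of how the Sigmoid function turns a linear score into a probability and then into a class label. The coefficients `beta_0 = -4` and `beta_1 = 2` are hypothetical values picked for illustration, not fitted from data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for a one-feature model: z = beta_0 + beta_1 * x
beta_0, beta_1 = -4.0, 2.0
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

z = beta_0 + beta_1 * x             # linear score, can be any real number
p = sigmoid(z)                      # probability of the positive class
labels = (p >= 0.5).astype(int)     # a 0.5 threshold converts probabilities to classes
print(np.round(p, 3), labels)
```

At `z = 0` the probability is exactly 0.5, which is why the 0.5 threshold corresponds to the point where the linear score changes sign.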

Key Differences from Linear Regression:
1. Objective: Linear Regression is used to predict a continuous numerical value (like house prices), whereas Logistic Regression is used for classification to predict discrete classes (like Yes/No).
2. Output Type: The output of Linear Regression is continuous and can range from $-\infty$ to $+\infty$, while Logistic Regression output is a probability strictly between 0 and 1.
3. Curve Shape: Linear Regression fits a straight line to the data. Logistic Regression fits an S-shaped curve (Sigmoid curve).
4. Mathematical Function: Linear Regression uses a linear equation ($y = \beta_0 + \beta_1 x$), while Logistic Regression passes the same linear combination $z = \beta_0 + \beta_1 x$ through the Sigmoid Function ($p = \frac{1}{1 + e^{-z}}$).
5. Estimation Method: Linear Regression typically uses Ordinary Least Squares (OLS) to minimize the sum of squared errors. Logistic Regression uses Maximum Likelihood Estimation (MLE) to find the parameters that best explain the observed data.
6. Loss Function: Linear Regression uses Mean Squared Error (MSE), whereas Logistic Regression uses Log Loss (also known as Cross-Entropy Loss).
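7. Assumptions: Linear Regression (for inference) assumes the errors around the regression line are normally distributed, while Logistic Regression assumes that, given the predictors, the binary outcome follows a Bernoulli distribution (a Binomial distribution with a single trial).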
8. Thresholding: Logistic Regression requires a threshold (e.g., 0.5) to convert a probability into a final class label; Linear Regression does not use thresholds.
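The sketch below ties several of these differences together with scikit-learn: probability outputs, Log Loss, and explicit thresholding. The "hours studied vs. passed" data are hypothetical toy values. Note that scikit-learn's `LogisticRegression` applies L2 regularization by default, so its estimates are penalized maximum likelihood rather than plain MLE.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Hypothetical toy data: hours studied (X) vs. passed the exam (1) or not (0)
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0], [4.5], [5.0]])
y = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1])

clf = LogisticRegression()          # default: L2-penalized maximum likelihood
clf.fit(X, y)

proba = clf.predict_proba(X)[:, 1]  # P(y = 1 | x), strictly between 0 and 1
pred = (proba >= 0.5).astype(int)   # explicit 0.5 threshold to get class labels

print("log loss:", round(log_loss(y, proba), 3))
print("predictions:", pred)
```

For binary problems, `clf.predict` applies the same 0.5 cut-off internally; computing it by hand makes it easy to try other thresholds.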

Question 7: Name and briefly describe three common evaluation metrics for regression models.

Answer

- Mean Absolute Error (MAE): The average of the absolute differences between predicted and actual values. It is easy to interpret as it is in the same units as the target variable.

- Mean Squared Error (MSE): The average of the squared differences between predicted and actual values. It penalizes larger errors more heavily than MAE.

- Root Mean Squared Error (RMSE): The square root of the MSE. It brings the error metric back to the original units of the target variable, making it more intuitive while still penalizing large errors more heavily than MAE.
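A minimal sketch computing all three metrics by hand with NumPy and cross-checking MAE and MSE against `sklearn.metrics`. The actual and predicted values are made-up illustrative numbers.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative actual and predicted values
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))   # average absolute error, same units as y
mse = np.mean(errors ** 2)      # average squared error, squared units
rmse = np.sqrt(mse)             # back to the units of y

print(f"MAE={mae:.3f}, MSE={mse:.3f}, RMSE={rmse:.3f}")  # MAE=0.750, MSE=0.875, RMSE=0.935
```

Taking `np.sqrt` of the MSE avoids depending on version-specific RMSE helpers in scikit-learn.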

Question 8: What is the purpose of the R-squared metric in regression analysis?

Answer

R-squared ($R^2$), also known as the Coefficient of Determination, measures the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
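1. Scale: For an ordinary least-squares fit with an intercept, evaluated on its training data, it ranges from 0 to 1 (or 0% to 100%). On new data, or for a model that predicts worse than simply using the mean of $Y$, it can be negative.
2. Interpretation: An $R^2$ of 0.85 means that 85% of the variance in the dependent variable is explained by the model's inputs. It tells you how well the regression line fits the observed data points.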
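$R^2$ is computed as $1 - SS_{res}/SS_{tot}$, comparing the model's squared errors with those of always predicting the mean. The sketch below implements that formula (the helper name `r_squared` is hypothetical) and checks it against `sklearn.metrics.r2_score`, using the line $\hat y = 2.2 + 0.6x$ fitted to the Question 9 data and a deliberately poor constant prediction to show that $R^2$ can go negative.

```python
import numpy as np
from sklearn.metrics import r2_score

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation left unexplained by the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

y = np.array([2, 4, 5, 4, 5])
fitted = 2.2 + 0.6 * np.array([1, 2, 3, 4, 5])  # OLS line for the Question 9 data
constant = np.full(5, 10.0)                     # a deliberately poor model

print(round(r_squared(y, fitted), 3), round(r2_score(y, fitted), 3))      # 0.6 0.6
print(round(r_squared(y, constant), 3), round(r2_score(y, constant), 3))  # -30.0 -30.0
```

Here $SS_{res} = 2.4$ and $SS_{tot} = 6$, so the fitted line explains 60% of the variance in $y$.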

Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

Answer

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data: X must be 2D, shape (n_samples, n_features), for scikit-learn
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Initialize and fit the model
model = LinearRegression()
model.fit(X, y)

# Retrieve coefficients
slope = model.coef_[0]
intercept = model.intercept_

print(f"Slope (Beta 1): {slope:.2f}")
print(f"Intercept (Beta 0): {intercept:.2f}")
```

Output:

```
Slope (Beta 1): 0.60
Intercept (Beta 0): 2.20
```


Question 10: How do you interpret the coefficients in a simple linear regression model?

Answer

1. Intercept ($\beta_0$): This is the baseline: the predicted value of $Y$ when the independent variable ($X$) equals zero. (Note: In some contexts, such as predicting weight from height, $X = 0$ lies far outside the observed data, so the intercept may have no practical physical meaning.)

2. Slope ($\beta_1$): This represents the rate of change. For every one-unit increase in the independent variable ($X$), the dependent variable ($Y$) is expected to change by the amount of the slope coefficient (increasing if positive, decreasing if negative).
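Both interpretations can be checked directly from a fitted model's predictions. This minimal sketch refits the Question 9 data with scikit-learn: the prediction at $X = 0$ equals the intercept, and a one-unit step in $X$ changes the prediction by exactly the slope. Note that $X = 0$ lies outside the observed range (1 to 5) here, so that prediction is an extrapolation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])
model = LinearRegression().fit(X, y)

# Intercept: the predicted Y when X = 0
pred_at_0 = model.predict(np.array([[0]]))[0]

# Slope: the change in predicted Y for a one-unit increase in X (here from 3 to 4)
pred_3, pred_4 = model.predict(np.array([[3], [4]]))

print(f"prediction at X=0: {pred_at_0:.2f} (intercept {model.intercept_:.2f})")
print(f"change from X=3 to X=4: {pred_4 - pred_3:.2f} (slope {model.coef_[0]:.2f})")
```

Because the model is linear, the same one-unit change holds anywhere along the line, not only between 3 and 4.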