# Assignment
# Supervised Learning: Regression Models and Performance Metrics



Q1 – What is Simple Linear Regression (SLR)? Explain its purpose.
  -  Simple Linear Regression (SLR) is a statistical technique for studying the relationship between two variables: one independent variable (X) and one dependent variable (Y). It fits a straight line through the data points so that Y can be predicted from X. The main goal is to quantify how Y changes, on average, as X changes, which supports forecasting, trend analysis, and decision-making. For example, it can be used to predict a person’s weight from their height. The fitted line is usually expressed as

$$Y = a + bX + \varepsilon$$

where $a$ is the intercept, $b$ is the slope, and $\varepsilon$ is the error term.

Q2 – What are the key assumptions of simple linear regression?
   - Here are the key assumptions of simple linear regression:

1.  **Linearity:** The relationship between the independent variable (X) and the dependent variable (Y) must be linear. This means the data points should roughly form a straight line when plotted.

2.  **Independence of Errors:** The errors (residuals) should be independent of each other. This means that the error for one data point should not be related to the error for any other data point.

3.  **Homoscedasticity (Constant Variance of Errors):** The variance of the errors should be constant across all levels of the independent variable. This means the spread of the residuals should be roughly the same throughout the range of X values.

4.  **Normality of Errors:** The errors (residuals) should be approximately normally distributed. This assumption is more important for smaller sample sizes and for making inferences about the population parameters (like confidence intervals and p-values).

5.  **No Multicollinearity (multiple regression only):** This assumption does not apply to simple linear regression, which has only one independent variable. It becomes relevant in multiple linear regression, where the independent variables should not be highly correlated with each other.
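
The assumptions about the errors can be given a rough numeric check once a line has been fitted. The sketch below is only illustrative: it reuses the five data points from the Q9 code, uses `np.polyfit` for the fit, and the variable names are assumptions made for this example. With so few points these checks are not formal tests; in practice a residual plot and proper statistical tests would be used.

```python
import numpy as np

# Five illustrative points (the same values used in the Q9 code)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit a straight line; for degree 1, np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# With an intercept in the model, least-squares residuals average to zero
residual_mean = residuals.mean()

# Independence: Durbin-Watson statistic, which lies between 0 and 4;
# values near 2 suggest little first-order autocorrelation
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Homoscedasticity (crude): compare residual spread for low and high X
order = np.argsort(x)
half = len(x) // 2
spread_low = residuals[order[:half]].std()
spread_high = residuals[order[half:]].std()

print("Mean residual:", round(residual_mean, 6))
print("Durbin-Watson:", round(durbin_watson, 3))
print("Residual spread (low X, high X):", spread_low, spread_high)
```
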

Q3 – Write the mathematical equation for a simple linear regression model and explain each term.
  - The mathematical equation for Simple Linear Regression is:

     $$Y = a + bX + \varepsilon$$

     Where:

     Y = Dependent variable (the outcome or value we want to predict)

     X = Independent variable (the input or predictor)

     a = Intercept (the value of Y when X = 0)

     b = Slope (the rate of change in Y for a one-unit increase in X)

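     ε (epsilon) = Error term (the part of Y that the line does not explain; its estimate for each data point is the residual, the difference between the actual and predicted value)

     This equation describes a straight-line relationship between X and Y. Fitting the model means choosing a and b so that the error between observed and predicted values is as small as possible.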

Q4 – Provide a real-world example where simple linear regression can be applied
   -  A common real-world example of Simple Linear Regression is predicting a person’s weight based on their height. As height increases, weight generally tends to increase in a roughly linear pattern. By collecting data from several individuals, we can build a regression model to estimate weight (dependent variable) from height (independent variable). Similarly, it can be used to predict house prices based on size, sales revenue based on advertising spend, or electricity consumption based on temperature. This helps in forecasting and in understanding the relationship between two measurable variables.

Q5 – What is the method of least squares in linear regression?
  -  The method of least squares is a mathematical approach for finding the best-fitting line in a regression model. It chooses the intercept and slope that minimize the sum of the squared differences between the actual data points and the values predicted by the line. These differences are called residuals. Squaring them prevents positive and negative residuals from cancelling out and penalizes large deviations more heavily than small ones. The line with the smallest total squared error is considered the “best fit.” This gives the best straight-line summary of the data in the squared-error sense; it does not by itself guarantee that the relationship is truly linear.
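
For a single predictor, the least-squares line has a closed-form solution: the slope is the sum of the products of the deviations of X and Y from their means, divided by the sum of the squared deviations of X, and the intercept is the mean of Y minus the slope times the mean of X. The following is a minimal sketch of that calculation with NumPy, using the five data points from the Q9 code; the variable names are chosen for illustration only.

```python
import numpy as np

# Five illustrative points (the same values used in the Q9 code)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_mean, y_mean = x.mean(), y.mean()

# Closed-form least-squares estimates for one predictor
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
a = y_mean - b * x_mean

# Sum of squared residuals at the least-squares solution
sse = np.sum((y - (a + b * x)) ** 2)

# Any other line gives a larger sum, e.g. a slightly steeper slope
sse_nudged = np.sum((y - (a + (b + 0.1) * x)) ** 2)

print(f"slope={b:.2f}, intercept={a:.2f}, SSE={sse:.2f}")
# prints: slope=0.60, intercept=2.20, SSE=2.40
```

The slope and intercept agree with the scikit-learn output shown under Q9.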

Q6 – What is Logistic Regression? How does it differ from Linear Regression?
  -  Logistic Regression is a supervised learning algorithm used for classification problems, where the output variable is categorical (e.g., Yes/No, 0/1, True/False). It predicts the probability that an observation belongs to a particular class using the sigmoid (logistic) function, which outputs values between 0 and 1.

  The main difference from Linear Regression is that linear regression predicts continuous values, while logistic regression predicts probabilities or binary outcomes. Linear regression uses a straight-line relationship, whereas logistic regression uses an S-shaped curve to model the data.
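
To make the contrast concrete, the sketch below passes a linear score through the sigmoid function and thresholds the result at 0.5. The coefficients and inputs are hypothetical values chosen only for illustration; they are not fitted to any data.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, for illustration only (not fitted to data)
intercept, slope = -3.0, 1.5

x = np.array([0.0, 2.0, 4.0])
linear_score = intercept + slope * x     # unbounded, like a linear regression output
probability = sigmoid(linear_score)      # squeezed into (0, 1) by the S-shaped curve
predicted_class = (probability >= 0.5).astype(int)

print("Linear scores:", linear_score)
print("Probabilities:", probability.round(3))
print("Predicted classes:", predicted_class)
```

A linear score of 0 maps to a probability of exactly 0.5, which is why 0.5 is the usual default threshold between the two classes.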

Q7 – Name and briefly describe three common evaluation metrics for regression models
1.  **Mean Absolute Error (MAE):** It measures the average absolute difference between predicted and actual values. Lower MAE means better accuracy.

2.  **Mean Squared Error (MSE):** It calculates the average of the squared differences between predicted and actual values. It penalizes larger errors more strongly.

3.  **R-squared (Coefficient of Determination):** It indicates how well the independent variable explains the variation in the dependent variable. An R² value closer to 1 means a better model fit.
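
The three metrics can be computed directly from their definitions. The sketch below assumes the five data points and the fitted line from the Q9 code (slope 0.6, intercept 2.2) and uses plain NumPy; scikit-learn's `mean_absolute_error`, `mean_squared_error`, and `r2_score` in `sklearn.metrics` compute the same quantities.

```python
import numpy as np

# Data and fitted line taken from the Q9 example
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_true = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
y_pred = 2.2 + 0.6 * x

errors = y_true - y_pred

mae = np.mean(np.abs(errors))                    # average absolute error
mse = np.mean(errors ** 2)                       # average squared error
ss_res = np.sum(errors ** 2)                     # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
r2 = 1.0 - ss_res / ss_tot

print(f"MAE={mae:.2f}, MSE={mse:.2f}, R2={r2:.2f}")
# prints: MAE=0.64, MSE=0.48, R2=0.60
```
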

Q8 – What is the purpose of the R-squared metric in regression analysis?
  -  The R-squared (R²) metric measures how well the independent variable(s) explain the variation in the dependent variable. It represents the proportion of variance in the dependent variable that can be predicted from the independent variable(s). For a least-squares model with an intercept, evaluated on the data it was fitted to, R² ranges from 0 to 1, where 0 means the model explains none of the variation and 1 means it explains all of it. On new data, or for a model without an intercept, R² can be negative, which means the model predicts worse than simply using the mean of Y. A higher R² indicates a closer fit to the data, but it does not indicate whether the model is biased or whether it will fit new data well.
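
Written as a formula, with $\hat{y}_i$ denoting the predicted value and $\bar{y}$ the mean of the observed values (notation introduced here for illustration), the definition is:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

As a worked example, for the five points and fitted line in the Q9 code the residual sum of squares is 2.4 and the total sum of squares is 6, so $R^2 = 1 - 2.4/6 = 0.6$.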

Q9 – Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

```python
# Import required libraries
from sklearn.linear_model import LinearRegression
import numpy as np

# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)   # Independent variable
y = np.array([2, 4, 5, 4, 5])                  # Dependent variable

# Create and train the model
model = LinearRegression()
model.fit(X, y)

# Print slope and intercept
print("Slope (Coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)
```

Output:

```
Slope (Coefficient): 0.6
Intercept: 2.2
```


Q10 - How do you interpret the coefficients in a simple linear regression model?
  -   In a simple linear regression model, the coefficients represent the following:

      Intercept (a): This is the predicted value of the dependent variable (Y) when the independent variable (X) is zero. It's the point where the regression line crosses the Y-axis. In some cases, it might not have a meaningful interpretation if X cannot be zero in the real world.

      Slope (b): This represents the change in the predicted value of Y for a one-unit increase in X. Its sign gives the direction of the relationship: a positive slope means that as X increases, Y tends to increase, and a negative slope means that as X increases, Y tends to decrease. Its magnitude gives how much Y changes per unit change in X, so it depends on the units of both variables. The slope alone does not measure the strength of the relationship; that is captured by the correlation coefficient or R².

      For example, in the code provided, the slope is 0.6 and the intercept is 2.2. This means that for every one-unit increase in X, Y is predicted to increase by 0.6 units. When X is 0, Y is predicted to be 2.2.
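
This interpretation can be checked with a few lines of arithmetic. The sketch below hard-codes the coefficients reported in the Q9 output; the `predict` helper is a hypothetical function written for this example, not part of scikit-learn.

```python
# Coefficients as reported in the Q9 output
intercept, slope = 2.2, 0.6

def predict(x):
    """Predicted Y for a given X under the fitted line Y = a + bX."""
    return intercept + slope * x

at_zero = predict(0)              # equals the intercept
step = predict(4) - predict(3)    # equals the slope (one-unit increase in X)

print("Prediction at X = 0:", at_zero)
print("Change in prediction from X = 3 to X = 4:", round(step, 2))
```
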

