1. What is Simple Linear Regression (SLR)? Explain its purpose.

Simple Linear Regression (SLR) models the relationship between one independent (predictor) variable and one dependent (outcome) variable using a straight line. It finds the "line of best fit" in order to understand, predict, and quantify how changes in the input affect the output, for example predicting sales (Y) from advertising cost (X). Its purpose is to estimate the strength and direction of this linear relationship, which supports prediction and decision-making in fields from business to science.


Purpose of Simple Linear Regression
Understanding Relationships: Quantifies the strength and direction (positive or negative) of the association between two variables.
Prediction: Predicts the value of the dependent variable for a given value of the independent variable (e.g., predicting test scores based on study hours).
Inference: Determines how much the dependent variable changes for a unit change in the independent variable (the slope coefficient).
Foundation: Serves as a basis for more complex models, like Multiple Linear Regression (with more independent variables).

2. What are the key assumptions of Simple Linear Regression?


 Linearity: A linear relationship exists between the dependent variable (Y) and the independent variable (X).


Independence of Errors: The error terms (residuals) are not correlated with each other; each observation is independent.


Homoscedasticity (Constant Variance): The variance of the error terms is the same across all levels of the independent variable (X).


Normality of Errors: The error terms (residuals) are normally distributed, especially crucial for hypothesis testing and confidence intervals.
Random Sampling: Data points are drawn independently and identically from the population.


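Variation in X: The independent variable (X) must take more than one distinct value; if X is constant, the slope cannot be estimated. (No Multicollinearity, the related assumption that predictors are not perfectly correlated with one another, applies only to Multiple Regression, because SLR has a single predictor.)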
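
Some of these assumptions can be checked roughly from the residuals of a fitted line. The sketch below uses made-up data and informal rules of thumb; the variable names and the two summary statistics chosen here are illustrative assumptions, not formal tests.

```python
import numpy as np

# Made-up data for illustration only: a linear trend plus random noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)

# Fit a straight line and compute residuals (observed minus fitted).
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Independence: the Durbin-Watson statistic lies between 0 and 4 and is
# near 2 when consecutive residuals are uncorrelated.
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

# Homoscedasticity: compare residual variance for the low-x and high-x halves.
# A ratio far from 1 hints at non-constant variance.
half = x.size // 2
variance_ratio = residuals[half:].var() / residuals[:half].var()

# Variation in X: the slope cannot be estimated if x is constant.
x_varies = x.max() > x.min()

print(f"Durbin-Watson: {durbin_watson:.2f}")
print(f"Variance ratio (high x / low x): {variance_ratio:.2f}")
print(f"X varies: {x_varies}")
```

Plotting the residuals against X is usually the first check for linearity and constant variance; the numbers above only summarize what such a plot would show.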

3. Write the mathematical equation for a simple linear regression model and explain each term.


The mathematical equation for a simple linear regression model is \(Y=\beta _{0}+\beta _{1}X+\epsilon \), where \(Y\) is the dependent variable, \(X\) is the independent variable, \(\beta _{0}\) (beta-nought) is the y-intercept, \(\beta _{1}\) (beta-one) is the slope (coefficient), and \(\epsilon \) (epsilon) is the random error term, representing the unexplained variation. This model predicts \(Y\) based on a straight-line relationship with \(X\), with parameters \(\beta _{0}\) and \(\beta _{1}\) defining the line's position and steepness, while \(\epsilon \) accounts for real-world deviations from that perfect line.


Equation Breakdown:

\(Y\) (Dependent Variable): The outcome or response variable you are trying to predict (e.g., house price).
\(X\) (Independent Variable): The input or explanatory variable used to make the prediction (e.g., square footage of the house).
\(\beta _{0}\) (Intercept): The expected value of \(Y\) when \(X\) is zero; where the regression line crosses the y-axis.
\(\beta _{1}\) (Slope/Coefficient): The amount \(Y\) changes, on average, for a one-unit increase in \(X\). It determines the steepness and direction of the line.
\(\epsilon \) (Error Term): The difference between the actual observed \(Y\) value and the value given by the line \(\beta _{0}+\beta _{1}X\). It accounts for random fluctuations and variables not included in the model.
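
To make the roles of the terms concrete, the sketch below generates data from the equation itself. The parameter values, noise level, and random seed are assumptions chosen only for illustration.

```python
import numpy as np

# Assumed "true" parameters, chosen only for illustration.
beta_0 = 5.0   # intercept: expected Y when X is 0
beta_1 = 2.0   # slope: change in expected Y per one-unit increase in X

rng = np.random.default_rng(42)
X = np.arange(1, 11, dtype=float)             # independent variable
epsilon = rng.normal(0.0, 1.0, size=X.size)   # random error term
Y = beta_0 + beta_1 * X + epsilon             # dependent variable

# The deterministic part is the straight line; epsilon is what is left over.
line = beta_0 + beta_1 * X
print(np.allclose(Y - line, epsilon))  # True
```

In practice \(\beta _{0}\), \(\beta _{1}\), and \(\epsilon \) are unknown; fitting the model produces estimates of the two parameters, and the residuals serve as estimates of the errors.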

4. Provide a real-world example where simple linear regression can be applied.


A classic real-world example of simple linear regression is predicting house prices from square footage: you model how size (the independent variable) linearly affects selling price (the dependent variable), which helps buyers estimate budgets and sellers price competitively. Another is a marketing firm predicting sales from advertising spend, fitting a line to historical data to forecast revenue from future ad investments.

Here's a deeper look at the house price example:
Goal: Estimate a house's price.
Independent Variable (x): Square footage (size of the house).
Dependent Variable (y): Selling price.
Data Collection: Gather data from recently sold houses in a specific area, noting each home's size and price.
Model Building: Plot the data on a scatter plot. Simple linear regression finds the best-fit straight line (equation: \(Price=\beta _{0}+\beta _{1}\times Size\)) that minimizes the sum of squared errors.
Application: Once the line is fitted, you can input a new house's square footage (e.g., 1,500 sq ft) to predict its likely price, such as $150,000 if the fitted slope is $100 per square foot and the intercept is zero.

Other Examples:
Healthcare: Predicting patient expenses based on age.
Energy: Forecasting energy consumption from temperature.
Sports: Relating training hours to player performance metrics.
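
The house price workflow can be sketched in a few lines. The sales figures below are invented for illustration and were chosen to lie exactly on a $100-per-square-foot line with zero intercept; real data would be noisier.

```python
import numpy as np

# Hypothetical sales, invented for illustration: size in sq ft, price in dollars.
size = np.array([1000.0, 1200.0, 1800.0, 2000.0, 2500.0])
price = np.array([100_000.0, 120_000.0, 180_000.0, 200_000.0, 250_000.0])

# Least-squares line: price = beta_0 + beta_1 * size.
# np.polyfit returns coefficients from the highest power down.
beta_1, beta_0 = np.polyfit(size, price, deg=1)

# Predict the price of a new 1,500 sq ft house.
new_size = 1500.0
predicted_price = beta_0 + beta_1 * new_size

print(f"Price per extra sq ft: ${beta_1:,.2f}")
print(f"Predicted price for {new_size:,.0f} sq ft: ${predicted_price:,.0f}")
```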

5. What is the method of least squares in linear regression?


The Least Squares Method in linear regression finds the "line of best fit" by minimizing the sum of the squared vertical distances (residuals) between the observed data points and the fitted line. The result is the equation \(\hat{y}=b_{0}+b_{1}x\) (also written \(y=mx+b\)) that best represents the trend in the data. The slope (\(b_{1}\), or \(m\)) and y-intercept (\(b_{0}\), or \(b\)) are calculated with formulas derived from the sums of x, y, \(x^{2}\), and \(xy\), so that the errors (observed - predicted) are as small as possible when squared and summed.


Key Benefits & Limitations
Benefit: Provides an objective, reproducible way to find the single best-fitting line for the data, useful for prediction and trend analysis.
Limitation: Highly sensitive to outliers (unusual data points), which can significantly skew the regression line.
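
The formulas based on the sums of x, y, \(x^{2}\), and \(xy\) can be written out directly. This is a minimal sketch on a small illustrative dataset; the variable names are chosen here for readability.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
n = x.size

# Closed-form least-squares estimates from the sums of x, y, x^2 and xy.
sum_x, sum_y = x.sum(), y.sum()
sum_xx, sum_xy = (x * x).sum(), (x * y).sum()

slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

# Sum of squared residuals: the quantity the method minimizes.
sse = np.sum((y - (intercept + slope * x)) ** 2)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, SSE={sse:.2f}")
# slope=0.60, intercept=2.20, SSE=2.40
```

Any other straight line through these five points gives a sum of squared residuals larger than 2.40.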

6. What is Logistic Regression? How does it differ from Linear Regression?


Logistic regression predicts categorical outcomes (like yes/no, spam/not spam) using an S-shaped sigmoid curve that outputs probabilities between 0 and 1, while linear regression predicts continuous values (like price, temperature) using a straight line. The two differ primarily in output type (categorical vs. continuous), underlying function (sigmoid vs. linear), and primary use case (classification vs. regression).

Logistic Regression


Purpose: Classification problems (e.g., predicting if a customer will buy a product or not).
Output: A probability between 0 and 1, representing the likelihood of a class.
Function: Uses the Sigmoid (logistic) function, creating an S-shaped curve to map outputs to probabilities.
Math: Models the log-odds (logit) of the outcome as a linear combination of predictors, then transforms it.
Method: Uses Maximum Likelihood Estimation (MLE) to find coefficients.
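
The link between the log-odds and the probability can be sketched as follows. The coefficients and the 0.5 decision threshold are assumptions chosen for illustration; in practice the coefficients come from fitting the model with MLE.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log-odds of a probability p; the inverse of the sigmoid."""
    return np.log(p / (1.0 - p))

# Hypothetical coefficients, chosen only for illustration.
beta_0, beta_1 = -4.0, 2.0
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

log_odds = beta_0 + beta_1 * x      # linear in x, unbounded
probability = sigmoid(log_odds)     # S-shaped in x, between 0 and 1
predicted_class = (probability >= 0.5).astype(int)  # assumed 0.5 threshold

print(probability.round(3))
print(predicted_class)  # [0 0 1 1 1]
```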

Linear Regression



Purpose: Regression problems (e.g., predicting house prices, sales figures).
Output: A continuous numerical value (any real number, with no upper or lower bound).
Function: Models a straight-line relationship between variables.
Math: A simple linear equation (\(y=mx+b\)).
Method: Uses Ordinary Least Squares (OLS) to minimize the sum of squared errors.


Key Differences Summarized


Problem Type: Logistic for classification, Linear for regression.
Dependent Variable: Logistic for categorical, Linear for continuous.
Curve/Line: Logistic uses an S-curve, Linear uses a straight line.
Output Range: Logistic 0-1 (probability), Linear (-∞ to +∞).
Estimation: Logistic uses MLE, Linear uses OLS.

7. Name and briefly describe three common evaluation metrics for regression models.


Three common regression metrics are Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-Squared (\(R^{2}\)). Lower MAE and MSE and higher \(R^{2}\) generally signify a better fit.

Mean Absolute Error (MAE):
Description: The average of the absolute differences between predicted and actual values.
Use: Provides a straightforward measure of prediction error in the same units as the target variable; less sensitive to outliers than MSE.

Mean Squared Error (MSE):
Description: The average of the squared differences between predicted and actual values.
Use: Penalizes larger errors much more heavily than smaller ones because of the squaring, making it sensitive to outliers.

R-Squared (\(R^{2}\)) (Coefficient of Determination):
Description: The proportion of the variance in the dependent variable that is explained by the independent variables in the model. It typically ranges from 0 to 1.
Use: A higher \(R^{2}\) (closer to 1) indicates a better model fit, showing how well the model explains the data's variability.
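
All three metrics can be computed in a few lines. The sketch below uses NumPy and made-up actual and predicted values; scikit-learn provides equivalent functions in `sklearn.metrics` (`mean_absolute_error`, `mean_squared_error`, `r2_score`).

```python
import numpy as np

# Made-up actual and predicted values, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 8.0, 8.0])

errors = y_true - y_pred

mae = np.mean(np.abs(errors))   # average absolute error
mse = np.mean(errors ** 2)      # average squared error

# R^2 = 1 - (unexplained variation) / (total variation around the mean)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"MAE={mae:.3f}, MSE={mse:.4f}, R^2={r2:.4f}")
# MAE=0.625, MSE=0.5625, R^2=0.8875
```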

8. What is the purpose of the R-squared metric in regression analysis?


The purpose of R-squared (Coefficient of Determination) in regression analysis is to measure the proportion of the variance in the dependent variable that is predictable from the independent variables. It shows how well the model fits the observed data: values typically run from 0 (no fit) to 1 (perfect fit) and give the fraction of variation explained. It's a key "goodness-of-fit" metric that helps evaluate the model's strength, showing how much of the data's scatter is explained by the regression line or curve.
Key Functions & Interpretations:
Measures Fit: R-squared tells you how closely the data points cluster around the regression line.
Percentage of Explained Variance: A value of 0.75 (or 75%) means 75% of the variation in the dependent variable is explained by the model's independent variables.
Model Evaluation: Higher R-squared values (closer to 1 or 100%) generally suggest a better, more useful model, while lower values (closer to 0) indicate a poorer fit.
Range: Values typically range from 0 to 1, but can be negative if the model performs worse than just using the mean.
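
The reference points of the range can be reproduced with a small helper. This is a sketch on made-up values; the function name `r_squared` is chosen here for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

perfect = r_squared(y, y.copy())                      # predictions match exactly
mean_only = r_squared(y, np.full_like(y, y.mean()))   # always predict the mean
worse = r_squared(y, np.array([5.0, 4.0, 3.0, 2.0, 1.0]))  # wrong trend

print(perfect, mean_only, worse)  # 1.0 0.0 -4.5
```

A negative value, as in the last case, means the predictions fit the data worse than simply predicting the mean of the observed values.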

9. Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.
(Include your Python code and output in the code box below.)

import numpy as np
from sklearn.linear_model import LinearRegression

# 1. Prepare data (X must be 2D for scikit-learn)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

# 2. Create and fit the model
model = LinearRegression()
model.fit(X, y)

# 3. Retrieve and print results
slope = model.coef_[0]
intercept = model.intercept_

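# Round to two decimals so floating-point noise does not clutter the output
print(f"Slope: {slope:.2f}")          # Slope: 0.60
print(f"Intercept: {intercept:.2f}")  # Intercept: 2.20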


10. How do you interpret the coefficients in a simple linear regression model?



In a simple linear regression model, typically expressed as \(y=\beta _{0}+\beta _{1}x+\epsilon \), the coefficients are interpreted as follows:

Slope (\(\beta _{1}\)): This represents the average change in the dependent variable (\(y\)) for every one-unit increase in the independent variable (\(x\)).
Positive Slope: Indicates a direct relationship; as \(x\) increases, \(y\) tends to increase.
Negative Slope: Indicates an inverse relationship; as \(x\) increases, \(y\) tends to decrease.
Intercept (\(\beta _{0}\)): This represents the expected value of the dependent variable (\(y\)) when the independent variable (\(x\)) is zero.
Practicality: While the intercept serves as a mathematical baseline, it may not have a meaningful real-world interpretation if \(x=0\) is outside the range of observed data or physically impossible (e.g., a person with zero height).
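
These interpretations can be read off a fitted line. The sketch below fits a small illustrative dataset with NumPy; the helper `predict` is a name introduced here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# np.polyfit returns the slope first, then the intercept, for deg=1.
beta_1, beta_0 = np.polyfit(x, y, deg=1)

def predict(x_value):
    """Predicted y on the fitted line for a given x."""
    return beta_0 + beta_1 * x_value

# Slope: the predicted y rises by beta_1 for each one-unit increase in x.
change_per_unit = predict(4.0) - predict(3.0)

# Intercept: the predicted y at x = 0. Here x = 0 lies outside the observed
# range (1 to 5), so this value is an extrapolation.
value_at_zero = predict(0.0)

print(f"Slope: {beta_1:.2f}")                         # Slope: 0.60
print(f"Change per unit of x: {change_per_unit:.2f}")  # 0.60
print(f"Predicted y at x = 0: {value_at_zero:.2f}")    # 2.20
```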