### Question 1
**What is Simple Linear Regression (SLR)? Explain its purpose.**

- **Simple Linear Regression** (SLR) is a foundational statistical and machine learning technique that models the relationship between two continuous variables: one independent variable (predictor) and one dependent variable (response). The core idea is to fit a straight line to observed data points so that it best represents how the dependent variable changes as the independent variable varies. The line is described by the equation $Y = \beta_0 + \beta_1 X + \epsilon$, where Y is the output, X is the feature, β0 is the intercept, β1 is the slope, and ϵ is the error term, which captures variation in Y not explained by the line (its estimates from a fitted model are the residuals).
- **The purpose** of SLR is to make quantitative predictions, uncover relationships, and forecast outcomes in fields like economics (predicting income from years of education), biology, or engineering. SLR lets analysts estimate the effect of one variable on another, measure goodness of fit with R-squared, and assess statistical significance with p-values.

### Question 2
**What are the key assumptions of Simple Linear Regression?**

- Linearity: The relationship between independent and dependent variables must be linear; if not, the model's predictive accuracy and interpretability diminish.

- Independence: Observations and their error terms should be independent. Violation (like autocorrelation in time series) causes unreliable inference.

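- Homoscedasticity: The variance of errors should remain constant across all levels of the independent variable. Heteroscedasticity (changing variance) leaves the coefficient estimates unbiased but inefficient, and it makes the usual standard errors, and therefore p-values and confidence intervals, unreliable.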

- Normality of Errors: The errors (estimated by the residuals) should be normally distributed; this matters for valid hypothesis tests and confidence intervals, especially in small samples.

These assumptions are checked with residual plots and statistical tests (e.g., Durbin-Watson for independence), and violations are often addressed with transformations (e.g., taking the log of Y). This is a critical step in practical regression modeling.
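
The checks above can be sketched with NumPy alone. This is a minimal illustration on hypothetical synthetic data whose noise grows with X, so the constant-variance assumption is deliberately violated. The Durbin-Watson statistic is computed from its definition (statsmodels provides the same statistic as `durbin_watson` in `statsmodels.stats.stattools`), while the half-split variance ratio is only an informal heuristic chosen for illustration, not a formal test such as Breusch-Pagan.

```python
import numpy as np

def residual_diagnostics(x, y):
    """Fit y = b0 + b1*x by least squares and return two rough assumption checks."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, deg=1)  # polyfit returns the highest-degree coefficient first
    residuals = y - (b0 + b1 * x)

    # Durbin-Watson: values near 2 suggest no first-order autocorrelation;
    # values toward 0 (or 4) suggest positive (or negative) autocorrelation.
    dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

    # Informal homoscedasticity check (hypothetical heuristic): compare the residual
    # variance in the lower and upper halves of x. A ratio far from 1 hints at
    # changing variance.
    order = np.argsort(x)
    half = len(x) // 2
    low_var = np.var(residuals[order[:half]])
    high_var = np.var(residuals[order[half:]])
    return {"durbin_watson": dw, "variance_ratio": high_var / low_var}

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
# Hypothetical data: noise standard deviation grows with x (heteroscedastic by construction)
y = 1.0 + 2.0 * x + rng.normal(0, 0.2 + 0.3 * x)
print(residual_diagnostics(x, y))
```

Here the Durbin-Watson value should sit near 2 (the simulated errors are independent), while the variance ratio comes out well above 1, flagging the heteroscedasticity built into the data.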

### Question 3
**Write the mathematical equation for a simple linear regression model and explain each term.**

The mathematical form of a simple linear regression model is:

$$
Y = \beta_0 + \beta_1 X + \epsilon
$$

Where:
- Y: Dependent variable (or response), the quantity you aim to predict.
- β0: Intercept, the expected value of Y when X = 0.
- β1: Slope, the expected change in Y for every one-unit increase in X.
- X: Independent variable (or predictor).
- ϵ: Error term, which captures random variation or noise not explained by the model.

Together, these terms describe the relationship quantitatively and make it possible to test hypotheses about the effect of X on Y from real data, most commonly the null hypothesis β1 = 0 (no linear relationship).
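
To make each term concrete, the sketch below generates data from the model with hypothetical true values β0 = 1.5 and β1 = 0.8 and normally distributed errors, then estimates the coefficients with `np.polyfit`. The estimates land close to, but not exactly on, the true values because ϵ adds random noise.

```python
import numpy as np

# Hypothetical "true" parameters, chosen only for illustration
beta0, beta1 = 1.5, 0.8
rng = np.random.default_rng(42)

n = 1_000
X = rng.uniform(0, 10, size=n)          # independent variable
epsilon = rng.normal(0, 1.0, size=n)    # error term: mean 0, constant variance
Y = beta0 + beta1 * X + epsilon         # Y = beta0 + beta1 * X + epsilon

# Estimate the intercept and slope from the simulated sample
b1, b0 = np.polyfit(X, Y, deg=1)
print(f"true:      beta0={beta0}, beta1={beta1}")
print(f"estimated: b0={b0:.3f}, b1={b1:.3f}")
```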


### Question 4
**Provide a real-world example where simple linear regression can be applied.**

A classic real-world application of simple linear regression is in real estate: predicting the sale price of houses from their size (square footage). Using data from previous transactions, an analyst fits a line in which the independent variable is house size and the dependent variable is price. Property agents can then estimate the expected price of a new house from its square footage alone, as long as its size falls within the range of the houses used to fit the line. Other examples include predicting crop yield from rainfall, employee productivity from hours of focused work, and sales from advertising spend. The simplicity and interpretability of SLR make it widely applicable in these scenarios.
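
A minimal version of the house-price example follows, using made-up sizes and prices purely for illustration (they are not real transaction data). It fits the line with `np.polyfit` and wraps the prediction in a hypothetical helper, `predict_price`.

```python
import numpy as np

# Hypothetical past sales: size in square feet and sale price in dollars
size_sqft = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
price_usd = np.array([150_000, 205_000, 245_000, 310_000, 350_000], dtype=float)

# Least squares fit of price = b0 + b1 * size
b1, b0 = np.polyfit(size_sqft, price_usd, deg=1)

def predict_price(sqft):
    """Estimated expected price (hypothetical helper) for a house of the given size."""
    return b0 + b1 * sqft

print(f"price = {b0:,.0f} + {b1:,.2f} * sqft")
print(f"1,800 sqft -> ${predict_price(1800):,.0f}")
```

With these toy numbers the fitted slope is 101 dollars per square foot and the intercept is 50,000 dollars. Predictions for sizes far outside the 1,000 to 3,000 square foot range would be extrapolations and deserve caution.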

### Question 5
**What is the method of least squares in linear regression?**

The method of least squares is the standard method for fitting a linear regression model. It determines the best-fitting line by minimizing the sum of squared differences between the observed values and the model's predicted values. Mathematically, this means finding the estimates b0 and b1 (of β0 and β1) that minimize the objective function:
$$
\text{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2
$$

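The least squares estimates are found with calculus: setting the partial derivatives of the SSE with respect to b0 and b1 to zero yields the normal equations, which have a unique solution as long as the x values are not all identical. If the errors have mean zero given X, the estimates are unbiased; if the errors are also uncorrelated with constant variance, the Gauss–Markov theorem makes them the best linear unbiased estimators. Least squares underlies model fitting in both statistics and machine learning.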
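
Solving the normal equations gives the familiar closed form:
$$
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}
$$
The sketch below implements these formulas directly in NumPy. The function name `least_squares_fit` is an illustrative choice. Applied to the same five points used in Question 9, it should reproduce the slope of 0.6 and intercept of 2.2 reported there.

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form least squares estimates (b0, b1) for y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is not identifiable")
    s_xy = np.sum((x - x_bar) * (y - y_bar))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_fit(x, y)
sse = np.sum((np.array(y) - (b0 + b1 * np.array(x))) ** 2)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSE = {sse:.2f}")  # b0 = 2.20, b1 = 0.60, SSE = 2.40
```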




### Question 6
**What is Logistic Regression? How does it differ from Linear Regression?**

Logistic regression is a supervised machine learning algorithm used mainly for binary classification. Despite its name, it is typically used as a classifier rather than to predict a continuous quantity: unlike linear regression, which estimates continuous outcomes, logistic regression models the probability that a categorical dependent variable takes a particular value, often yes/no or 1/0. It passes the linear predictor β0 + β1X through the logistic (sigmoid) function, which maps it into the interval (0, 1):
$$
P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
$$
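Key differences include the type of output (a probability, usually thresholded into a class, for logistic regression vs. a continuous value for linear regression), the fitting criterion (maximizing the likelihood, equivalently minimizing log-loss, for logistic vs. minimizing squared error for linear), and the underlying assumptions (for example, logistic regression does not assume normally distributed, constant-variance errors). Logistic regression is widely used in medical diagnosis (disease present/absent), credit scoring, and spam detection.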
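
The sketch below shows the likelihood-based fitting criterion in action. It fits P(Y = 1 | X) = sigmoid(β0 + β1X) by plain gradient descent on the average log-loss (the negative log-likelihood), using a small hypothetical dataset. The learning rate and iteration count are arbitrary illustrative choices. In practice one would use a library such as scikit-learn's `LogisticRegression`, which by default also adds L2 regularization.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p):
    """Average negative log-likelihood for binary labels y and probabilities p."""
    eps = 1e-12  # guard against log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Fit b0, b1 by gradient descent on the average log-loss."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = sigmoid(b0 + b1 * x)
        # Both gradients use the same p, so the update is simultaneous
        grad_b0 = np.mean(p - y)
        grad_b1 = np.mean((p - y) * x)
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1

# Hypothetical toy data: one feature and a 0/1 outcome (classes overlap near x = 2)
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

b0, b1 = fit_logistic(x, y)
probs = sigmoid(b0 + b1 * x)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, log-loss = {log_loss(y, probs):.3f}")
print("predicted classes:", (probs >= 0.5).astype(int))
```

Every predicted probability stays strictly between 0 and 1, and a 0.5 threshold turns them into class labels, which is the contrast with linear regression's unbounded continuous predictions.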

### Question 7
**Name and briefly describe three common evaluation metrics for regression models.**

Three common evaluation metrics for regression models are:

- Mean Squared Error (MSE): The average of squared differences between actual and predicted values. Penalizes larger errors and is sensitive to outliers.

- Mean Absolute Error (MAE): The average absolute difference between actual and predicted outcomes. More robust to outliers than MSE because each error counts in proportion to its size rather than its square.

- R-squared (Coefficient of Determination): The proportion of variance in the dependent variable explained by the independent variable(s). Higher values mean the model accounts for more of the variation in the data, although a high R-squared alone does not guarantee accurate predictions on new data.

Choosing between these metrics depends on the problem context, how heavily large errors should be penalized, and interpretability needs in business or scientific settings.
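
All three metrics take only a few lines of NumPy. This sketch evaluates the fitted line from Question 9 (ŷ = 2.2 + 0.6x) on its own five training points. scikit-learn offers equivalent functions (`mean_squared_error`, `mean_absolute_error`, and `r2_score` in `sklearn.metrics`).

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MSE, MAE, R^2) for arrays of actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)                        # squares amplify large errors
    mae = np.mean(np.abs(errors))                     # errors count linearly
    ss_res = np.sum(errors ** 2)                      # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation around the mean
    r2 = 1 - ss_res / ss_tot
    return mse, mae, r2

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
y_hat = 2.2 + 0.6 * x  # fitted line from Question 9
mse, mae, r2 = regression_metrics(y, y_hat)
print(f"MSE = {mse:.2f}, MAE = {mae:.2f}, R^2 = {r2:.2f}")  # MSE = 0.48, MAE = 0.64, R^2 = 0.60
```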

### Question 8
**What is the purpose of the R-squared metric in regression analysis?**

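The R-squared metric ($R^2$) quantifies how much of the variance of the dependent variable the regression model explains. For a least squares fit with an intercept, evaluated on its own training data, it ranges from 0 to 1, with values closer to 1 indicating that a greater proportion of variance is accounted for by the model. On held-out data, or for models fit without an intercept, it can be negative, meaning the model does worse than always predicting the mean.

$R^2$ helps modelers judge explanatory power, but it never decreases when features are added to a least squares model, so adjusted $R^2$ (which penalizes extra predictors) or validation-set performance is the better guide to whether a new feature improves predictive capacity. $R^2$ is also directly linked to the overall F-test of the model, and it is widely referenced in statistical reporting and business analytics to support model selection and performance claims.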
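
The sketch below illustrates why plain $R^2$ is a poor guide to adding features. On hypothetical simulated data it compares a model using only x with one that also includes a pure-noise feature. $R^2$ cannot go down when a column is added to a least squares fit, whereas adjusted $R^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$, with p predictors, usually drops for a useless feature (it rises only if the new feature's t-statistic exceeds 1 in absolute value). The helper `r2_and_adjusted` is a hypothetical name for illustration.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Fit OLS with an intercept; return (R^2, adjusted R^2)."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    r2 = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2

rng = np.random.default_rng(1)
n = 30
x = rng.uniform(0, 10, size=n)
y = 3.0 + 0.5 * x + rng.normal(0, 1.0, size=n)

noise_feature = rng.normal(size=n)  # hypothetical feature unrelated to y
base = r2_and_adjusted(x.reshape(-1, 1), y)
extended = r2_and_adjusted(np.column_stack([x, noise_feature]), y)
print(f"x only:    R^2 = {base[0]:.4f}, adjusted = {base[1]:.4f}")
print(f"x + noise: R^2 = {extended[0]:.4f}, adjusted = {extended[1]:.4f}")
```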

### Question 9
**Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept. Include your Python code and output in the code box below.**

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Example data: scikit-learn expects X as a 2-D array of shape (n_samples, n_features)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
Y = np.array([2, 4, 5, 4, 5])

# Fit the line by ordinary least squares
model = LinearRegression()
model.fit(X, Y)

print('Slope (Coefficient):', model.coef_[0])
print('Intercept:', model.intercept_)
```

Output:

```
Slope (Coefficient): 0.6
Intercept: 2.2
```


### Question 10
**How do you interpret the coefficients in a simple linear regression model?**

In simple linear regression, the slope coefficient (β1) quantifies the average change in the dependent variable Y for each one-unit increase in the independent variable X. For example, if the slope is 2.5, each additional unit of X is associated with an average increase of 2.5 units in Y. The intercept (β0) is the expected value of Y when X = 0, which is only meaningful when X = 0 is plausible and lies within or near the range of the data. Proper interpretation also requires considering the units and scale of the variables, the domain context, statistical significance (t-tests), and confidence intervals to assess reliability. Coefficient interpretation is vital for drawing insights, making decisions, and communicating model results to non-technical stakeholders.
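
The interpretation and significance checks above can be sketched for the Question 9 data. The code confirms that the slope equals the change in the prediction for a one-unit increase in X. It then computes the slope's standard error $\sqrt{\hat{\sigma}^2 / \sum (x_i - \bar{x})^2}$, with $\hat{\sigma}^2 = \text{SSE}/(n-2)$, along with its t-statistic and a 95% confidence interval. The critical value 3.182 is hard-coded from a t-table for n - 2 = 3 degrees of freedom, an assumption that only holds for this five-point example; `scipy.stats.t.ppf(0.975, n - 2)` would compute it in general.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
n = len(x)

# Least squares estimates (same data as Question 9)
s_xx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / s_xx
b0 = y.mean() - b1 * x.mean()

def predict(v):
    """Fitted value b0 + b1 * v."""
    return b0 + b1 * v

# The slope is the change in the prediction for a one-unit increase in x
print("prediction change from x=3 to x=4:", predict(4) - predict(3))

# Standard error of the slope, t-statistic, and 95% confidence interval
residuals = y - predict(x)
sigma2_hat = np.sum(residuals ** 2) / (n - 2)  # residual variance with n - 2 degrees of freedom
se_b1 = np.sqrt(sigma2_hat / s_xx)
t_stat = b1 / se_b1
t_crit = 3.182  # two-sided 95% Student's t critical value for 3 df (t-table; valid only for n = 5)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
print(f"b1 = {b1:.2f}, SE = {se_b1:.3f}, t = {t_stat:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With only five points the interval runs from about -0.30 to 1.50 and includes zero, so the 0.6 slope is not statistically significant at the 5% level. This is why the confidence interval matters alongside the point estimate.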