# Question 1: What is Simple Linear Regression (SLR)? Explain its purpose.


Simple Linear Regression (SLR) is a statistical method used to model the relationship between two continuous variables: a dependent variable (the one you want to predict) and an independent variable (the one you use to predict).

Its purpose is to find the best-fitting straight line through the data points, allowing you to understand how changes in the independent variable are associated with changes in the dependent variable, and to make predictions about the dependent variable based on the independent variable.



# Question 2: What are the key assumptions of Simple Linear Regression?

The key assumptions of Simple Linear Regression are:

*   **Linearity:** The relationship between the independent and dependent variables is linear.
*   **Independence:** The observations (and therefore their errors) are independent of each other.
*   **Homoscedasticity:** The variance of the errors is constant across all levels of the independent variable.
*   **Normality:** The errors are normally distributed. This matters mainly for confidence intervals and hypothesis tests on the coefficients; the least-squares estimates themselves do not require it.

Multicollinearity (high correlation among independent variables) is sometimes listed alongside these, but it cannot arise in simple linear regression, which has only one independent variable. It becomes a concern in multiple linear regression.
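
Several of these assumptions are usually checked informally by examining the residuals (observed minus fitted values). The sketch below reuses the sample data from Question 9, fits the line with `numpy.polyfit` (degree 1), and prints each fitted value next to its residual. With real data you would typically plot residuals against fitted values and look for curvature (a sign of non-linearity) or a funnel shape (a sign of non-constant variance). Five points are far too few for a meaningful diagnosis, so treat this only as an illustration of the mechanics.

```python
import numpy as np

# Illustrative sample data (same values as the scikit-learn example in Question 9)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# A degree-1 polyfit returns coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

fitted = intercept + slope * x
residuals = y - fitted

# Residuals vs. fitted values: look for curvature or a funnel shape
for f, r in zip(fitted, residuals):
    print(f"fitted={f:.2f}  residual={r:+.2f}")

# With an intercept in the model, least-squares residuals always sum to (about) zero,
# so a zero sum by itself says nothing about whether the assumptions hold.
print("Sum of residuals:", round(float(residuals.sum()), 10))
```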

# Question 3: Write the mathematical equation for a simple linear regression model and explain each term.

The mathematical equation for a simple linear regression model is:

$y = \beta_0 + \beta_1x + \epsilon$

Where:

*   $y$: The dependent variable (the one you are trying to predict).
*   $x$: The independent variable (the one you are using to predict).
*   $\beta_0$: The y-intercept, which is the predicted value of $y$ when $x$ is 0.
*   $\beta_1$: The slope of the line, which represents the change in $y$ for a one-unit change in $x$.
*   $\epsilon$: The error term, which represents the random error or the part of $y$ that cannot be explained by the linear relationship with $x$.
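
To make the terms concrete, the sketch below plugs in the estimates $\hat{\beta}_0 = 2.2$ and $\hat{\beta}_1 = 0.6$ that the scikit-learn example in Question 9 reports for its sample data. The line gives the predicted value $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$. The error term $\epsilon$ is never observed directly; in a fitted model its counterpart is the residual, the gap between an observed $y$ and $\hat{y}$.

```python
import numpy as np

# Coefficients taken from the Question 9 output (not re-estimated here)
beta_0 = 2.2  # intercept: predicted y when x = 0
beta_1 = 0.6  # slope: change in predicted y per one-unit increase in x

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)  # observed values

y_hat = beta_0 + beta_1 * x   # the fitted line, without the error term
residuals = y - y_hat         # observed minus predicted: estimates of epsilon

print("Predicted:", np.round(y_hat, 2))
print("Residuals:", np.round(residuals, 2))
```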

# Question 4: Provide a real-world example where simple linear regression can be applied

Simple linear regression can be applied in many real-world scenarios. One common example is predicting the price of a house based on its size.

In this case:

*   **Dependent variable ($y$):** Price of the house
*   **Independent variable ($x$):** Size of the house (e.g., square footage)

By collecting data on the prices and sizes of many houses, you could use simple linear regression to find a linear relationship between size and price. This would allow you to:

*   Understand how much the price typically increases for each additional square foot.
*   Predict the price of a new house based on its size.

Other examples include:

*   Predicting a student's test score based on the number of hours they studied.
*   Predicting a company's sales based on its advertising expenditure.
*   Predicting the yield of a crop based on the amount of fertilizer used.

# Question 5: What is the method of least squares in linear regression?

The method of least squares is a way to find the best-fitting line in linear regression. It works by minimizing the sum of the squared differences between the actual observed values and the values predicted by the linear model.

Imagine you have your data points plotted on a graph. The method of least squares draws a line that is as close as possible to all those points. The "closeness" is measured by the vertical distance from each point to the line (the error), and the method aims to make the sum of the squares of these errors as small as possible. This is why it's called "least squares."

By minimizing the sum of squared errors, the method of least squares gives you the values for the slope ($\beta_1$) and y-intercept ($\beta_0$) that define the line that best represents the relationship between your variables.
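
For simple linear regression this minimization has a closed-form solution. Minimizing $\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$ by setting its derivatives with respect to $\beta_0$ and $\beta_1$ to zero gives

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the sample means. The NumPy sketch below applies these formulas to the sample data from Question 9; the helpers `least_squares_fit` and `sse_for` are illustrative names, not library functions. It should reproduce the slope and intercept scikit-learn reports there, and it shows that nudging the slope away from the least-squares value increases the sum of squared errors.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

def least_squares_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared vertical errors."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: joint variation of x and y divided by the variation of x
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through the point of means
    intercept = y_mean - slope * x_mean
    return intercept, slope

def sse_for(intercept, slope):
    """Sum of squared errors for a candidate line on the data above."""
    return np.sum((y - (intercept + slope * x)) ** 2)

b0, b1 = least_squares_fit(x, y)
sse = sse_for(b0, b1)
print(f"Intercept: {b0:.2f}, Slope: {b1:.2f}, Sum of squared errors: {sse:.2f}")

# Moving the slope in either direction makes the error larger
print("SSE with slope + 0.1:", round(float(sse_for(b0, b1 + 0.1)), 2))
print("SSE with slope - 0.1:", round(float(sse_for(b0, b1 - 0.1)), 2))
```

For this data the formulas give an intercept of 2.20 and a slope of 0.60, matching the scikit-learn output in Question 9.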

# Question 6: What is Logistic Regression? How does it differ from Linear Regression?

Logistic Regression is a statistical method used for binary classification problems, meaning it predicts a categorical outcome with two possible classes (e.g., yes/no, true/false, spam/not spam).

Here's how it differs from Linear Regression:

Type of Dependent Variable: Linear Regression predicts a continuous dependent variable, while Logistic Regression predicts a categorical dependent variable (specifically, the probability of belonging to a certain class).

Output: Linear Regression outputs a continuous value. Logistic Regression outputs a probability (a value between 0 and 1) that can then be used to classify the outcome.

Mathematical Function: Linear Regression uses a linear function to model the relationship. Logistic Regression uses a sigmoid function (also known as the logistic function) to map the linear output to a probability between 0 and 1.
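
The sigmoid mapping can be sketched in a few lines of NumPy. The coefficients `b0` and `b1` below are hypothetical, chosen only to show how a linear score $z = \beta_0 + \beta_1 x$ is squashed into a probability $\sigma(z) = \frac{1}{1 + e^{-z}}$ and then thresholded at 0.5 (a common default, not a requirement) to produce a class label. In practice the coefficients are estimated from data, for example with scikit-learn's `LogisticRegression`.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients, for illustration only
b0, b1 = -4.0, 1.0

x = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
z = b0 + b1 * x                     # linear part, same form as linear regression
prob = sigmoid(z)                   # probability of the positive class
label = (prob >= 0.5).astype(int)   # classify with a 0.5 threshold

for xi, p, c in zip(x, prob, label):
    print(f"x={xi:.1f}  P(y=1)={p:.3f}  predicted class={c}")
```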

# Question 7: Name and briefly describe three common evaluation metrics for regression models.

Here are three common evaluation metrics for regression models:

1.  **Mean Absolute Error (MAE):** This is the average of the absolute differences between the predicted values and the actual values. It gives you a sense of the typical magnitude of the errors, without considering their direction.

2.  **Mean Squared Error (MSE):** This is the average of the squared differences between the predicted values and the actual values. Squaring the errors penalizes larger errors more heavily than smaller ones.

3.  **Root Mean Squared Error (RMSE):** This is the square root of the Mean Squared Error. It is in the same units as the dependent variable, making it easier to interpret than MSE.
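
The three metrics can be computed by hand with NumPy and cross-checked against `mean_absolute_error` and `mean_squared_error` from `sklearn.metrics`. The sketch reuses the observed values from Question 9 and the predictions of the line fitted there ($\hat{y} = 2.2 + 0.6x$). RMSE is taken as the square root of MSE, which avoids relying on version-specific options of the scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

x = np.array([1, 2, 3, 4, 5], dtype=float)
y_true = np.array([2, 4, 5, 4, 5], dtype=float)
y_pred = 2.2 + 0.6 * x  # predictions from the line fitted in Question 9

errors = y_true - y_pred

mae = np.mean(np.abs(errors))   # typical error size, direction ignored
mse = np.mean(errors ** 2)      # squaring weights large errors more heavily
rmse = np.sqrt(mse)             # back in the units of y

# Cross-check the hand calculations against scikit-learn
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, mean_squared_error(y_true, y_pred))

print(f"MAE:  {mae:.3f}")
print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
```

For this data the sketch gives an MAE of 0.64, an MSE of 0.48 and an RMSE of about 0.69.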

# Question 8: What is the purpose of the R-squared metric in regression analysis?

The purpose of the R-squared metric (also known as the coefficient of determination) in regression analysis is to measure the proportion of the variance in the dependent variable that is predictable from the independent variable(s).

In simpler terms, it tells you how well the regression model fits the observed data. An R-squared value of 1 means that the model perfectly predicts the dependent variable, while an R-squared value of 0 means that the model does not explain any of the variance in the dependent variable.

It's important to note that a high R-squared doesn't necessarily mean the model is good, and a low R-squared doesn't necessarily mean the model is bad. The interpretation of R-squared depends on the specific field of study and the nature of the data.
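
R-squared is computed as $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{res} = \sum_i (y_i - \hat{y}_i)^2$ is the residual sum of squares and $SS_{tot} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares around the mean. The sketch below applies this formula to the Question 9 data and the line fitted there, cross-checks it with `r2_score` from `sklearn.metrics`, and confirms that for a least-squares line with an intercept and one predictor, $R^2$ equals the squared Pearson correlation between $x$ and $y$.

```python
import numpy as np
from sklearn.metrics import r2_score

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
y_hat = 2.2 + 0.6 * x  # least-squares line fitted in Question 9

ss_res = np.sum((y - y_hat) ** 2)     # variation the model leaves unexplained
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation of y around its mean
r_squared = 1 - ss_res / ss_tot

# Cross-check with scikit-learn
assert np.isclose(r_squared, r2_score(y, y_hat))

# In simple linear regression (least squares with an intercept), R^2 = r^2
r = np.corrcoef(x, y)[0, 1]
assert np.isclose(r_squared, r ** 2)

print(f"R-squared: {r_squared:.2f}")
```

Here $R^2$ is 0.6: the line accounts for about 60% of the variance of $y$ around its mean.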



# Question 9: Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable
y = np.array([2, 4, 5, 4, 5])  # Dependent variable

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Print the slope and intercept
print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
```

Output:

```
Slope: 0.6
Intercept: 2.2
```
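
Once fitted, the model can predict the dependent variable for new inputs with `model.predict`. Like `fit`, it expects a 2-D array of shape `(n_samples, n_features)`, which is why the new values are reshaped as well. The sketch repeats the fitting step so it runs on its own; the new inputs 6 and 7 are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])
model = LinearRegression().fit(X, y)  # fit() returns the fitted estimator

# New x values must also be 2-D: one row per sample, one column per feature
X_new = np.array([6, 7]).reshape(-1, 1)
y_new = model.predict(X_new)

# Each prediction is simply intercept + slope * x
manual = model.intercept_ + model.coef_[0] * X_new.ravel()
print("Predictions:", y_new)
print("Matches manual calculation:", np.allclose(y_new, manual))
```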


# Question 10: How do you interpret the coefficients in a simple linear regression model?

Here's how you interpret the coefficients in a simple linear regression model ($y = \beta_0 + \beta_1x + \epsilon$):

*   **Intercept ($\beta_0$):** The intercept represents the predicted value of the dependent variable ($y$) when the independent variable ($x$) is zero. However, in many real-world scenarios, an independent variable of zero might not be meaningful or even possible. In such cases, the intercept might not have a practical interpretation, but it's still necessary for defining the regression line.
*   **Slope ($\beta_1$):** The slope is the most important coefficient in simple linear regression. It represents the change in the predicted value of the dependent variable ($y$) for a one-unit increase in the independent variable ($x$).

For example, in the house price example (where $y$ is price and $x$ is size in square feet):

*   The intercept ($\beta_0$) would be the predicted price of a house with zero square feet, which has no practical meaning.
*   The slope ($\beta_1$) tells you how much the predicted price increases for every additional square foot. If the slope is, say, 150, each extra square foot is associated with a predicted price increase of \$150. Because size is the only predictor in simple linear regression, this slope reflects the overall association between size and price; it does not hold other factors (such as location) constant.

It's crucial to interpret coefficients in the context of your specific problem and data.
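
The slope interpretation can be checked numerically: on a fitted line, the prediction for $x + 1$ minus the prediction for $x$ is exactly $\beta_1$. The sketch uses the slope of 150 from the house-price example above; the intercept of 20,000 and the house sizes are hypothetical values chosen only for illustration, and `predict_price` is an illustrative helper, not a library function.

```python
# Hypothetical coefficients for the house-price example (illustration only)
beta_0 = 20_000.0  # intercept: predicted price at 0 sq ft (not meaningful in practice)
beta_1 = 150.0     # slope: predicted price change per additional square foot

def predict_price(size_sqft):
    """Predicted price from the simple linear regression line."""
    return beta_0 + beta_1 * size_sqft

for size in (1_000, 1_500, 2_000):
    extra = predict_price(size + 1) - predict_price(size)
    print(f"{size} sq ft -> {predict_price(size):,.0f}; one more sq ft adds {extra:,.0f}")

# A 100 sq ft difference changes the prediction by 100 * beta_1
diff_100 = predict_price(1_600) - predict_price(1_500)
print("100 extra sq ft adds:", diff_100)
```

The same one-unit difference holds at every size, which is what "constant slope" means for a straight line.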