# 1. What is Simple Linear Regression (SLR)? Explain its purpose.
  - Simple Linear Regression (SLR) is a statistical method used to model the relationship between two variables: one independent variable (predictor) and one dependent variable (response). The main idea is to find a straight line, known as the regression line, that best fits the data points in a scatter plot and can be used to predict the value of the dependent variable from the value of the independent variable. Mathematically, it is represented by the equation **Y = β₀ + β₁X + ε**, where **Y** is the dependent variable, **X** is the independent variable, **β₀** is the intercept, **β₁** is the slope of the line, and **ε** is the error term that accounts for the variation not explained by the model. The **purpose** of simple linear regression is to identify and quantify the relationship between the variables, estimate how the dependent variable changes on average as the independent variable changes, and make predictions or forecasts. For example, it can be used to predict sales based on advertising expenditure, or a student’s score based on hours of study. It also shows the direction of the relationship and, through measures such as R², how strong it is, which supports better decision-making and insight into underlying data trends.


# 2. What are the key assumptions of Simple Linear Regression?
   - The key assumptions of Simple Linear Regression must hold for the model’s estimates and inferences to be valid. First, there should be a **linear relationship** between the independent variable (X) and the dependent variable (Y), meaning the expected value of Y changes by a constant amount for each one-unit change in X. Second, the **errors** should have a **mean of zero**, which is what makes the coefficient estimates unbiased, and should be approximately **normally distributed**, which matters mainly for confidence intervals and hypothesis tests, especially in small samples. Third, there must be **homoscedasticity**, which means the variance of the errors remains constant across all values of X; if not, the model suffers from heteroscedasticity. Fourth, the **observations should be independent**, implying that the error of one observation is not correlated with that of another. Lastly, although it is a practical check rather than a formal assumption, there should be **no significant outliers or influential data points** that distort the regression line. Violating these assumptions can lead to misleading conclusions, unreliable predictions, and poor model performance. In practice, the assumptions are checked on the **residuals** (differences between observed and predicted values), which serve as estimates of the unobserved errors.
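
Some of these assumptions can be checked roughly from the residuals. The sketch below is a minimal illustration, not a full diagnostic procedure: the data are synthetic (an assumption made only for this example), and the three summary numbers are informal checks rather than formal statistical tests.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 3 + 2x + normal noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)

# Fit a straight line by least squares; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# Zero mean: holds up to rounding whenever the fitted line includes an intercept
mean_residual = residuals.mean()

# Rough homoscedasticity check: residual spread in the upper vs lower half of x
half = x.size // 2
spread_ratio = residuals[half:].std() / residuals[:half].std()

# Rough independence check: lag-1 autocorrelation of residuals ordered by x
lag1_autocorr = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

print(f"mean residual:  {mean_residual:.2e}")
print(f"spread ratio:   {spread_ratio:.2f} (near 1 suggests constant variance)")
print(f"lag-1 autocorr: {lag1_autocorr:.2f} (near 0 suggests independence)")
```

A residual-versus-X scatter plot is usually the first thing to look at; the numbers above only summarize what such a plot would show.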


# 3. Write the mathematical equation for a simple linear regression model and explain each term.
  - The mathematical equation for a simple linear regression model is expressed as **Y = β₀ + β₁X + ε**, where each term has a specific meaning. Here, **Y** represents the dependent variable or the outcome we want to predict, and **X** is the independent variable or predictor used to make the prediction. **β₀** (beta zero) is the **intercept**, which indicates the expected value of Y when X equals zero; it is the point where the regression line crosses the Y-axis. **β₁** (beta one) is the **slope coefficient**, showing how much Y changes on average for every one-unit change in X; its sign gives the direction of the relationship and its size gives the magnitude of the change. Finally, **ε** (epsilon) is the **error term**, representing the difference between an observed value of Y and the value given by the true regression line β₀ + β₁X. Its counterpart in a fitted model is the residual, the difference between the observed value and the model’s prediction. The error accounts for all other factors that may influence Y but are not included in the model. Together, these components describe how X and Y are related in a linear form.
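
To make the role of each term concrete, the following sketch builds Y from its parts. All numbers here (`beta0`, `beta1`, and the `epsilon` values) are hypothetical and chosen only for illustration.

```python
import numpy as np

beta0, beta1 = 2.0, 0.5                           # hypothetical intercept and slope
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])           # independent variable X
epsilon = np.array([0.3, -0.2, 0.1, -0.4, 0.2])   # hypothetical error values

systematic = beta0 + beta1 * x   # the part of Y explained by X (the line)
y = systematic + epsilon         # observed Y = beta0 + beta1 * X + epsilon

print("line at X = 0 (beta0):", systematic[0])
print("change per unit of X (beta1):", np.diff(systematic))
print("observed minus line (epsilon):", y - systematic)
```

With real data only `x` and `y` are observed; `beta0`, `beta1`, and `epsilon` are unknown and have to be estimated.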


# 4. Provide a real-world example where simple linear regression can be applied.
  - A real-world example of applying simple linear regression is in predicting a student’s exam score based on the number of hours they study. In this case, the **independent variable (X)** is the number of study hours, and the **dependent variable (Y)** is the exam score obtained by the student. By collecting data from several students on how many hours they studied and their corresponding scores, a regression line can be drawn to show the relationship between these two variables. The model can then be used to predict the likely score for any given number of study hours. For instance, if the regression equation is **Score = 30 + 5 × (Hours Studied)**, it suggests that for each additional hour of study, the student’s score increases by 5 marks, and if a student studies zero hours, the predicted score would be 30 marks. This example shows how simple linear regression helps in making predictions, understanding trends, and supporting decisions based on real-world data.
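
The illustrative equation from this example can be written as a small function. This is only a sketch of that one equation: the name `predicted_score` is hypothetical, and the remark about a 100-mark maximum is an assumption about the exam, not something the data establishes.

```python
def predicted_score(hours_studied: float) -> float:
    """Predicted exam score from the illustrative equation Score = 30 + 5 * hours."""
    intercept = 30.0  # predicted score with zero hours of study
    slope = 5.0       # predicted marks gained per additional hour
    return intercept + slope * hours_studied


for hours in (0, 4, 14, 15):
    print(f"{hours:>2} hours -> predicted score {predicted_score(hours):.0f}")
```

At 15 hours the line predicts 105 marks, which would exceed an exam marked out of 100. A straight line fitted to a limited range of study hours should therefore not be extrapolated far beyond that range.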


# 5. What is the method of least squares in linear regression?
   - The method of least squares is a mathematical technique used in linear regression to find the best-fitting line through a set of data points by minimizing the sum of the squared differences between the observed and predicted values. These differences, called **residuals**, represent the errors between the actual data points and the values estimated by the regression line. By squaring the residuals, both positive and negative errors are treated equally, and larger errors are given more weight. The goal of the least squares method is to find the values of the regression coefficients (the slope and intercept) that result in the smallest possible total of these squared errors. For simple linear regression this minimum has a closed-form solution: the slope is the sum of the products of the deviations of X and Y from their means divided by the sum of the squared deviations of X, and the intercept is the mean of Y minus the slope times the mean of X. When the assumptions of linear regression are satisfied, the least squares estimates are unbiased and have the smallest variance among all linear unbiased estimators (the Gauss-Markov theorem).
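
As a minimal sketch, the least squares slope and intercept can be computed directly with NumPy. It uses the same five data points as the scikit-learn example in question 9; the helper name `sum_squared_errors` is hypothetical.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Closed-form least squares estimates for a single predictor
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean


def sum_squared_errors(b0: float, b1: float) -> float:
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return float(np.sum((y - (b0 + b1 * x)) ** 2))


best = sum_squared_errors(intercept, slope)
nudged_intercept = sum_squared_errors(intercept + 0.1, slope)
nudged_slope = sum_squared_errors(intercept, slope + 0.1)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"SSE at the least squares line: {best:.2f}")
print(f"SSE after nudging intercept:   {nudged_intercept:.2f}")
print(f"SSE after nudging slope:       {nudged_slope:.2f}")
```

Moving either coefficient away from the least squares value increases the sum of squared errors, which is what "best-fitting" means here.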


# 6. What is Logistic Regression? How does it differ from Linear Regression?
   - Logistic Regression is a statistical method used for predicting the probability of a binary outcome, meaning it is applied when the dependent variable can take only two possible values such as **yes/no**, **pass/fail**, or **0/1**. Unlike Linear Regression, which predicts a continuous numerical value, Logistic Regression predicts the likelihood that an observation belongs to a particular category. It uses the **logistic (sigmoid) function** to transform the output of a linear equation into a value between 0 and 1, representing probability. The key difference is that Linear Regression models the outcome itself as a straight-line function of the predictor and produces an unbounded continuous output, whereas Logistic Regression models the log-odds of the outcome as a linear function of the predictor, so the predicted probability follows an S-shaped curve. That probability can then be converted into a class label using a threshold such as 0.5. In essence, Linear Regression is used for predicting continuous outcomes, while Logistic Regression is used for **classification** problems.
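
The sigmoid transformation can be sketched in a few lines. The coefficients `beta0` and `beta1` and the study-hours values are hypothetical, picked only to show how a linear score becomes a probability and then a class label.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical coefficients for a pass/fail model based on hours studied
beta0, beta1 = -4.0, 1.0
hours = np.array([1.0, 4.0, 7.0])

log_odds = beta0 + beta1 * hours                  # linear in X, unbounded
prob_pass = sigmoid(log_odds)                     # probability of passing
predicted_class = (prob_pass >= 0.5).astype(int)  # 1 = pass, 0 = fail

for h, p, c in zip(hours, prob_pass, predicted_class):
    print(f"hours={h:.0f}  P(pass)={p:.3f}  class={c}")
```

The 0.5 threshold is a common default rather than a fixed rule; a different cut-off may suit cases where the two kinds of error have different costs.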


# 7. Name and briefly describe three common evaluation metrics for regression models.
   - Three common evaluation metrics for regression models are **Mean Absolute Error (MAE)**, **Mean Squared Error (MSE)**, and **R-squared (R²)**. **MAE** measures the average absolute difference between the predicted and actual values, showing how far the model’s predictions are from the true outcomes on average. It is simple to interpret and gives equal weight to all errors. **MSE**, on the other hand, calculates the average of the squared differences between predicted and actual values, penalizing larger errors more heavily and providing a clear sense of how well the model fits the data. Lastly, **R-squared (R²)**, also known as the coefficient of determination, explains the proportion of the variance in the dependent variable that can be explained by the independent variable. A higher R² value indicates that the model explains a greater portion of the data’s variability, meaning it fits the data better. These metrics together help assess the accuracy and effectiveness of regression models.
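
The three metrics can be computed by hand from their definitions. The sketch below uses the data and fitted line (intercept 2.2, slope 0.6) from the scikit-learn example in question 9. The `sklearn.metrics` module provides `mean_absolute_error`, `mean_squared_error`, and `r2_score` for the same purpose; plain NumPy is used here to keep the formulas visible.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y_true = np.array([2, 4, 5, 4, 5], dtype=float)
y_pred = 2.2 + 0.6 * x  # fitted line from the scikit-learn example

errors = y_true - y_pred

mae = np.mean(np.abs(errors))  # average absolute error
mse = np.mean(errors ** 2)     # average squared error

ss_res = np.sum(errors ** 2)                    # variation left unexplained
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation in y
r2 = 1.0 - ss_res / ss_tot                      # proportion explained

print(f"MAE = {mae:.2f}")
print(f"MSE = {mse:.2f}")
print(f"R^2 = {r2:.2f}")
```

MAE is in the same units as Y, while MSE is in squared units, which is why its square root (RMSE) is often reported alongside it.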


# 8. What is the purpose of the R-squared metric in regression analysis?
  - The purpose of the **R-squared (R²)** metric in regression analysis is to measure how well the independent variable explains the variability of the dependent variable in the model. It represents the proportion of the total variation in the dependent variable that can be accounted for by the regression line. For a least squares model with an intercept, evaluated on the data it was fitted to, R² ranges from **0 to 1**, where **0** indicates that the model does not explain any variation in the dependent variable, and **1** indicates that the model explains all the variation perfectly. (On new data, or for a model without an intercept, R² can be negative, meaning the model predicts worse than simply using the mean of Y.) For example, an R² value of 0.8 means that 80% of the variation in the dependent variable is explained by the independent variable, while the remaining 20% is due to other factors or random errors. Thus, R-squared helps evaluate the **goodness of fit** of a regression model, showing how well the data points align with the predicted values. However, it does not indicate whether the model’s predictions are unbiased or if the relationship is causal.
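
Two reference points help when reading an R² value. The sketch below reuses the data and fitted line from the scikit-learn example in question 9; the helper name `r_squared` is hypothetical. It shows that always predicting the mean of Y gives R² = 0, and that in simple linear regression R² equals the squared Pearson correlation between X and Y.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)


def r_squared(y_true, y_pred):
    """R^2 = 1 - (unexplained variation) / (total variation)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


fitted = 2.2 + 0.6 * x                # least squares line for this data
baseline = np.full_like(y, y.mean())  # always predict the mean of y

r2_fit = r_squared(y, fitted)
r2_baseline = r_squared(y, baseline)
pearson_r = np.corrcoef(x, y)[0, 1]

print(f"R^2 of fitted line:   {r2_fit:.2f}")
print(f"R^2 of mean baseline: {r2_baseline:.2f}")
print(f"Pearson r squared:    {pearson_r ** 2:.2f}")
```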


# 9. Write Python code to fit a simple linear regression model using scikit-learn and print the slope and intercept.
  - Fitting a simple linear regression model using scikit-learn involves finding the best-fitting straight line that represents the relationship between an independent variable (X) and a dependent variable (Y). The algorithm uses the method of least squares to determine the values of the slope (coefficient) and intercept that minimize the sum of the squared differences between the actual and predicted values of Y. The slope indicates how much the dependent variable changes with a one-unit change in the independent variable, while the intercept represents the expected value of Y when X equals zero. Scikit-learn’s `LinearRegression` class automates this process: calling `fit(X, y)` estimates the parameters from the input data and stores them in the `coef_` and `intercept_` attributes. Note that `X` must be a two-dimensional array with one column per feature, which is why the example reshapes it with `reshape(-1, 1)`. Once the model is fitted, it can be used to make predictions, analyze relationships, and evaluate performance using various metrics.

```python
# Import necessary libraries
from sklearn.linear_model import LinearRegression
import numpy as np

# Example data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Independent variable
y = np.array([2, 4, 5, 4, 5])                 # Dependent variable

# Create and fit the model
model = LinearRegression()
model.fit(X, y)

# Print the slope (coefficient) and intercept
print("Slope (Coefficient):", model.coef_[0])
print("Intercept:", model.intercept_)
```

```
Slope (Coefficient): 0.6
Intercept: 2.2
```
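
As a possible next step, the fitted model can be used to predict Y for new values of X and to report R² on the training data. This sketch refits the same example data so that it runs on its own; the new X values 6 and 7 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same example data as above
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression().fit(X, y)

# Predict for new X values (must also be 2-D: one row per observation)
X_new = np.array([[6], [7]])
predictions = model.predict(X_new)

# score() returns R-squared for a regressor, here on the training data
r_squared = model.score(X, y)

print("Predictions:", np.round(predictions, 2))
print("R-squared:", round(r_squared, 2))
```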


# 10. How do you interpret the coefficients in a simple linear regression model?
  - In a simple linear regression model, the **coefficients** represent the relationship between the independent variable (X) and the dependent variable (Y). The equation of the model is **Y = β₀ + β₁X + ε**, where **β₀** is the **intercept** and **β₁** is the **slope (coefficient)**. The **intercept (β₀)** indicates the predicted value of Y when X is zero; it shows where the regression line crosses the Y-axis, and it is meaningful only when X = 0 lies within or near the range of the observed data. The **slope (β₁)** shows how much Y is expected to change for a one-unit increase in X. A positive slope means that as X increases, Y also increases, while a negative slope means that as X increases, Y decreases. For example, if β₁ = 2, it means that for every one-unit increase in X, the value of Y increases by 2 units on average. Interpreting these coefficients helps understand the direction and magnitude of the relationship between the variables. The slope describes an association in the data and does not by itself show that changing X causes Y to change.
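
Because the slope is a change in Y "per one unit of X", its value depends on the unit X is measured in. The sketch below illustrates this with hypothetical study-time data (the numbers are invented for this example): measuring the same study time in minutes instead of hours divides the slope by 60, while the intercept and the fitted line stay the same.

```python
import numpy as np

# Hypothetical data: study time and exam scores for five students
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([35.0, 41.0, 44.0, 52.0, 54.0])

# polyfit with deg=1 returns [slope, intercept] of the least squares line
slope_per_hour, intercept_hours = np.polyfit(hours, scores, deg=1)

# Same study time expressed in minutes
minutes = hours * 60.0
slope_per_minute, intercept_minutes = np.polyfit(minutes, scores, deg=1)

print(f"marks per hour:   {slope_per_hour:.3f}")
print(f"marks per minute: {slope_per_minute:.4f}")
print(f"intercepts:       {intercept_hours:.2f} and {intercept_minutes:.2f}")
```

A slope should therefore always be reported together with the units of both X and Y.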
