# What is Linear Regression?
Linear regression is a statistical method for modeling the relationship between two variables and predicting one from the other. It assumes a linear relationship between the independent variable and the dependent variable, and aims to find the best-fitting line that describes that relationship. The line is determined by minimizing the sum of the squared differences between the predicted values and the actual values.

Linear regression is commonly used in many fields, including economics, finance, and the social sciences, to analyze and predict trends in data. It can be extended to multiple linear regression, where there are several independent variables. Logistic regression is a related model used for binary classification problems.

## Simple Linear Regression
In a simple linear regression, there is one independent variable and one dependent variable. The model estimates the slope and intercept of the line of best fit, which represents the relationship between the variables. The slope represents the change in the dependent variable for each unit change in the independent variable, while the intercept represents the predicted value of the dependent variable when the independent variable is zero.

Linear regression is one of the simplest statistical regression methods used for predictive analysis in machine learning. It models the linear relationship between the independent (predictor) variable, plotted on the X-axis, and the dependent (output) variable, plotted on the Y-axis. If there is a single input variable X (the independent variable), the model is called simple linear regression.

![Dependent and Independent Variable.jpg](attachment:d87ab109-551d-47b9-91a0-3bc2e9c8cba2.jpg)

**The above graph presents the linear relationship between the output (y) variable and the predictor (X) variable. The blue line is referred to as the best-fit straight line. Based on the given data points, we attempt to plot the line that fits the points best.**

To calculate the best-fit line, linear regression uses the traditional slope-intercept form, given below:

**Yi = β0 + β1Xi**

where,

##### Yi = Dependent variable,
##### β0 = constant/Intercept,
##### β1 = Slope/Coefficient,
##### Xi = Independent variable.
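As a minimal sketch, the equation above can be written as a small Python function. The function name `predict` and the coefficient values are hypothetical, chosen only for illustration.

```python
def predict(x, beta0, beta1):
    """Return the predicted Y for input x on the line Y = beta0 + beta1 * x."""
    return beta0 + beta1 * x


# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = 2.0, 0.5

# Predicted values of Y for a few values of X.
predictions = [predict(x, beta0, beta1) for x in [0, 2, 4]]
print(predictions)  # [2.0, 3.0, 4.0]
```

At X = 0 the prediction equals the intercept β0, and each unit increase in X adds β1 to the prediction.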

##### Exploring ‘β1’

- If β1 > 0, then x (predictor) and y (target) have a positive relationship. That is, an increase in x is associated with an increase in y.
- If β1 < 0, then x (predictor) and y (target) have a negative relationship. That is, an increase in x is associated with a decrease in y.

##### Exploring ‘β0’

- If the data do not include x = 0, then the prediction at x = 0, which is just β0, is not meaningful. For example, suppose we have a dataset that relates height (x) and weight (y). Setting x = 0 (a height of 0) leaves only β0 in the equation, which is meaningless because in reality height and weight can never be zero. This is the result of applying the model to values beyond its scope.
- If the data include x = 0, then β0 is the expected (average) value of y when x = 0. However, setting all the predictor variables to zero is often impossible.
- The β0 term guarantees that the residuals have a mean of zero. If there is no β0 term, the regression line is forced to pass through the origin, and both the regression coefficient and the predictions will generally be biased.
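The last point can be checked numerically. The sketch below fits the same data twice, once with an intercept and once forced through the origin, and compares the mean of the residuals. The data values are made up for illustration, and the through-origin slope uses the standard least-squares formula Σxy / Σx², which is not stated elsewhere in this text.

```python
# Hypothetical data, roughly following y = 5 + 2x (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [7.1, 8.9, 11.2, 12.8, 15.0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


x_bar, y_bar = mean(xs), mean(ys)

# Fit WITH an intercept (ordinary least squares).
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar
residuals_with = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Fit WITHOUT an intercept: the line is forced through the origin.
slope_origin = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
residuals_without = [y - slope_origin * x for x, y in zip(xs, ys)]

print("mean residual with intercept:   ", mean(residuals_with))
print("mean residual without intercept:", mean(residuals_without))
```

With the intercept, the mean residual is zero up to floating-point rounding. Without it, the mean residual is clearly non-zero and the slope is inflated to compensate for the missing intercept.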


This algorithm explains the linear relationship between the dependent (output) variable y and the independent (predictor) variable X using a straight line Y = β0 + β1X.

![Y= B0 + B1 X..jpg](attachment:a2b0cbf2-ad94-4597-8234-2f070090c528.jpg)

**The goal of the linear regression algorithm is to find the values of β0 and β1 that give the best-fit line. The best-fit line is the line with the least error, meaning the difference between the predicted values and the actual values is as small as possible.**
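To make "least error" concrete, the sketch below scores two candidate lines by their sum of squared errors (the error measure used by least squares). The helper name `sum_squared_errors`, the data, and the candidate coefficients are all hypothetical.

```python
def sum_squared_errors(xs, ys, beta0, beta1):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

sse_a = sum_squared_errors(xs, ys, beta0=1, beta1=2)  # the true line
sse_b = sum_squared_errors(xs, ys, beta0=0, beta1=2)  # shifted down by 1

print(sse_a)  # 0
print(sse_b)  # 4
```

The line with the smaller sum of squared errors is the better fit. Here the first candidate passes through every point, so its error is zero.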

### How to Calculate Slope and Intercept in Linear Regression
To calculate the intercept and slope in linear regression, you can use the method of least squares. The least squares method minimizes the sum of the squared differences between the observed dependent variable values and the predicted values based on the linear regression equation.
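Written out, the quantity that least squares minimizes can be expressed as follows. Here n denotes the number of data points and S is a name for the objective; both symbols are introduced only for this illustration.

$$
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( Y_i - (\beta_0 + \beta_1 X_i) \right)^2
$$

Setting the partial derivatives of S with respect to β0 and β1 to zero and solving the two resulting equations yields the closed-form expressions for the slope and the intercept.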

Here's a step-by-step guide on how to calculate the intercept and slope:

1. **Gather your data:** Collect the data for the independent variable (X) and the dependent variable (Y). Ensure that you have a sufficient number of data points to perform the regression analysis.

2. **Calculate the means:** Calculate the means (average values) of the X and Y variables. Let's denote the means as X̄ and Ȳ, respectively.

3. **Calculate the deviations:** Calculate the deviation of each X value from the mean of X (X - X̄) and the deviation of each Y value from the mean of Y (Y - Ȳ).

4. **Calculate the sums:** Calculate the sum of the product of the deviations (X - X̄) * (Y - Ȳ), as well as the sum of the squared deviations of X, Σ(X - X̄)².

5. **Calculate the slope (β₁):** The slope of the linear regression line can be calculated using the following formula:

**β₁ = Σ[(X - X̄) * (Y - Ȳ)] / Σ[(X - X̄)²]**

6. **Calculate the intercept (β₀):** Once you have the slope, you can calculate the intercept using the following formula:

**β₀ = Ȳ - β₁ * X̄**

7. **Interpret the results:** The calculated slope (β₁) represents the change in the dependent variable (Y) for each unit increase in the independent variable (X). The intercept (β₀) represents the value of the dependent variable (Y) when the independent variable (X) is zero.

By following these steps, you can calculate the intercept and slope for a linear regression analysis.
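The steps above can be sketched in plain Python as follows. The function name `fit_simple_linear_regression` and the sample data are hypothetical. This is one straightforward way to follow the steps, not a replacement for a statistics library.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) for the least-squares line through (xs, ys)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    n = len(xs)

    # Step 2: means of X and Y.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Step 3: deviations from the means.
    x_dev = [x - x_bar for x in xs]
    y_dev = [y - y_bar for y in ys]

    # Step 4: sum of products of deviations and sum of squared X deviations.
    sum_xy = sum(dx * dy for dx, dy in zip(x_dev, y_dev))
    sum_xx = sum(dx * dx for dx in x_dev)
    if sum_xx == 0:
        raise ValueError("all X values are identical; the slope is undefined")

    # Step 5: slope.
    beta1 = sum_xy / sum_xx

    # Step 6: intercept.
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Step 1: hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(round(beta0, 4), round(beta1, 4))  # 2.2 0.6
```

For this data the means are X̄ = 3 and Ȳ = 4, the sum of products of deviations is 6, and the sum of squared X deviations is 10, which gives a slope of 0.6 and an intercept of 2.2.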


## Random Error (Residuals)

In regression, the difference between the observed value of the dependent variable (yi) and the predicted value (y**predicted**) is called the residual.

εi = yi – y**predicted**

where y**predicted** = β0 + β1 Xi
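As a short illustration, residuals can be computed point by point. The sketch uses the common convention of observed value minus predicted value. The data and coefficients are hypothetical.

```python
# Hypothetical observations and coefficients, for illustration only.
xs = [1, 2, 3, 4]
ys = [3.0, 4.5, 7.5, 9.0]
beta0, beta1 = 1.0, 2.0

# Predicted value for each observation: beta0 + beta1 * x.
predicted = [beta0 + beta1 * x for x in xs]

# Residual for each observation: observed minus predicted.
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print(predicted)  # [3.0, 5.0, 7.0, 9.0]
print(residuals)  # [0.0, -0.5, 0.5, 0.0]
```

A positive residual means the observed value lies above the line, and a negative residual means it lies below.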

