# Linear Regression and Logistic Regression

Linear Regression and Logistic Regression are two fundamental algorithms in the field of machine learning, both falling under the category of supervised learning. They are used for different types of prediction problems.

#### Linear Regression
Linear Regression is used for regression problems where the goal is to predict continuous values (e.g., predicting house prices, temperatures, sales amounts).

##### Key Points:

1. Objective: To find a linear relationship between the input variables (independent variables) and the output variable (dependent variable). The relationship is modeled through a linear equation:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon
$$

where $y$ is the dependent variable, $x_1, x_2, \dots, x_n$ are the independent variables, $\beta_0$ is the y-intercept, $\beta_1, \beta_2, \dots, \beta_n$ are the coefficients, and $\epsilon$ is the error term.

2. Method: The coefficients ($\beta$) are typically estimated with ordinary least squares (OLS), which minimizes the sum of the squared differences between the observed values and the values predicted by the linear function.

3. Assumptions:
- Linear relationship between independent and dependent variables.
- Homoscedasticity: The residuals (differences between observed and predicted values) have constant variance.
- Independence: Observations are independent of each other.
- Little or no multicollinearity among the independent variables.
- Normally distributed errors.

4. Evaluation: R-squared, Adjusted R-squared, Mean Squared Error (MSE), or Root Mean Squared Error (RMSE).
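The least squares estimation in item 2 and the metrics in item 4 can be sketched in plain Python. This is a minimal illustration, not a production implementation. It assumes a small dataset held as Python lists, and the helper names `fit_ols`, `predict_linear`, and `regression_metrics` are hypothetical. It solves the normal equations $(A^\top A)\beta = A^\top y$, where $A$ is the matrix of independent variables with a leading column of ones for $\beta_0$. A singular $A^\top A$ is what perfect multicollinearity looks like numerically.

```python
import math


def fit_ols(X, y):
    """Ordinary least squares sketch (hypothetical helper): return [beta_0, ..., beta_n].

    X is a list of rows (each a list of n feature values), y a list of targets.
    Solves the normal equations (A^T A) beta = A^T y, where A is X with a
    leading column of ones for the intercept beta_0.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Normal-equation system M beta = b, with M = A^T A and b = A^T y.
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(A, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(M[row][col]))
        if abs(M[pivot][col]) < 1e-12:
            # Happens, e.g., when features are perfectly collinear.
            raise ValueError("A^T A is singular; coefficients are not identifiable")
        M[col], M[pivot] = M[pivot], M[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = M[row][col] / M[col][col]
            for c in range(col, k):
                M[row][c] -= factor * M[col][c]
            b[row] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = sum(M[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (b[i] - s) / M[i][i]
    return beta


def predict_linear(beta, row):
    """y_hat = beta_0 + beta_1 x_1 + ... + beta_n x_n (the error term is not predicted)."""
    return beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], row))


def regression_metrics(y_true, y_pred, n_features):
    """MSE, RMSE, R^2 and adjusted R^2.

    Assumes y_true is not constant (SS_tot > 0) and n > n_features + 1.
    """
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes additional independent variables.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "R2": r2, "Adjusted R2": adj_r2}


if __name__ == "__main__":
    X = [[1], [2], [3]]
    y = [1, 2, 2]
    beta = fit_ols(X, y)  # approximately [2/3, 0.5]
    preds = [predict_linear(beta, row) for row in X]
    print(beta)
    print(regression_metrics(y, preds, n_features=1))
```

The example fits $\hat{y} \approx 2/3 + 0.5x$ to three points. In practice, library solvers (for example NumPy's least-squares routine or scikit-learn's `LinearRegression`) are preferable, since forming $A^\top A$ explicitly is less numerically stable than QR- or SVD-based methods when features are nearly collinear.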

#### Logistic Regression
Logistic Regression is used for classification problems where the outcome is categorical, most commonly binary (e.g., an email is spam or not spam, a tumor is malignant or benign).

##### Key Points:

1. Objective: To estimate the probability that a given input belongs to a particular class. A linear combination of the inputs is passed through the logistic (sigmoid) function, which maps any real number into the interval between 0 and 1, so the output can be read as a probability.

2. Method: The logistic function is:

$$
p(X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}
$$

where $p(X)$ is the probability of the dependent variable equaling 'success' (i.e., 1), $x_1, x_2, \dots, x_n$ are the independent variables, and $\beta_0, \beta_1, \dots, \beta_n$ are the coefficients.

3. Learning: The coefficients are typically estimated using maximum likelihood estimation (MLE), choosing the values that maximize the likelihood of observing the given sample.

4. Output Interpretation: The output is a probability. A threshold (commonly 0.5) is chosen to decide the class: if $p(X) > 0.5$, the input is classified as 1 (positive class); if $p(X) \le 0.5$, it is classified as 0 (negative class).

5. Evaluation: Accuracy, Precision, Recall, F1-score, ROC curve, AUC score.
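Items 2 to 5 can be tied together in a small pure-Python sketch. Because the maximum likelihood estimates in item 3 have no closed-form solution, the sketch ascends the average of the log-likelihood $\ell(\beta) = \sum_i \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$ by batch gradient ascent, using $\partial \ell / \partial \beta_j = \sum_i (y_i - p_i)\, x_{ij}$ with $x_{i0} = 1$. It then applies the 0.5 threshold from item 4 and computes accuracy, precision, recall, F1, and AUC from item 5. Gradient ascent is only one way to compute the MLE (Newton's method is another common choice), and the learning rate, iteration count, and function names (`fit_logistic`, `predict_proba`, `predict_class`, `classification_metrics`, `roc_auc`) are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), arranged to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(beta, row):
    """p(X) = sigmoid(beta_0 + beta_1 x_1 + ... + beta_n x_n)."""
    z = beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], row))
    return sigmoid(z)


def fit_logistic(X, y, lr=0.1, n_iter=10000):
    """Approximate the MLE of [beta_0, ..., beta_n] by batch gradient ascent.

    Uses the gradient of the *average* log-likelihood, mean_i (y_i - p_i) * x_ij
    with x_i0 = 1. lr and n_iter are illustrative defaults, not tuned values.
    If the classes are perfectly separable the MLE does not exist and the
    coefficients keep growing with n_iter.
    """
    k = len(X[0]) + 1
    m = len(X)
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        for row, target in zip(X, y):
            err = target - predict_proba(beta, row)
            grad[0] += err  # intercept term, x_i0 = 1
            for j, xj in enumerate(row, start=1):
                grad[j] += err * xj
        beta = [bj + lr * gj / m for bj, gj in zip(beta, grad)]
    return beta


def predict_class(beta, row, threshold=0.5):
    """Class 1 if p(X) > threshold, else class 0."""
    return 1 if predict_proba(beta, row) > threshold else 0


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count half). O(P * N), fine for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X = [[0], [1], [2], [3], [4], [5]]
    y = [0, 0, 1, 0, 1, 1]  # overlapping classes, so the MLE is finite
    beta = fit_logistic(X, y)
    probs = [predict_proba(beta, row) for row in X]
    preds = [predict_class(beta, row) for row in X]
    print(beta)
    print(classification_metrics(y, preds), roc_auc(y, probs))
```

On the six overlapping points in the example, the fitted slope is positive and, because the data are symmetric about $x = 2.5$, $p(X)$ crosses 0.5 midway between $x = 2$ and $x = 3$. If the two classes were perfectly separable, the MLE would not exist and the coefficients would keep growing with more iterations.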

#### Differences between Linear and Logistic Regression
- Nature of Dependent Variable: Linear regression is used for predicting continuous variables. Logistic regression is used for predicting binary outcomes (0 or 1, true or false, yes or no).
- Function: Linear regression uses the identity function (output is a linear combination of inputs). Logistic regression uses the logistic function to ensure the output lies between 0 and 1.
- Problem Type: Linear regression is a regression algorithm, while logistic regression is a classification algorithm.

Both linear and logistic regression serve as a solid foundation for understanding more complex machine learning algorithms and are often the first algorithms that practitioners learn due to their simplicity and wide range of applications.
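To make the "Function" difference concrete, the short sketch below feeds the same linear combination $z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$ through both output functions. The function names are hypothetical and chosen only for illustration. The identity output is unbounded, while the logistic output mathematically stays strictly between 0 and 1 and equals 0.5 at $z = 0$.

```python
import math


def linear_output(z):
    # Linear regression (illustrative): identity function, the prediction is z itself.
    return z


def logistic_output(z):
    # Logistic regression (illustrative): z is passed through the sigmoid, landing in (0, 1).
    # Fine for moderate |z|; math.exp overflows for very large negative z.
    return 1.0 / (1.0 + math.exp(-z))


# z stands for a linear combination beta_0 + beta_1 x_1 + ... + beta_n x_n.
for z in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(f"z={z:6.1f}  linear={linear_output(z):6.1f}  logistic={logistic_output(z):.4f}")
```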