### Why Linear Regression Cannot Be Used for Classification

<li>The output of a linear regression model is continuous and unbounded.</li>
<li>This means that it can take any value along the real number line, which is not appropriate for categorical labels that have a finite set of discrete values.</li>

<li>Linear regression assumes that the relationship between the input variables and the output variable is linear.</li>
<li>However, in classification tasks, the probability of a class usually does not change linearly with the input variables (it levels off near 0 and 1), which makes a straight line an unsuitable choice.</li>

<li>In contrast, classification algorithms such as logistic regression, decision trees, and SVMs are designed specifically for categorical classification tasks.</li>

![](images/linear_vs_logistic.png)

<li>If we use linear regression, the predicted value of Y can exceed 1 or fall below 0.</li>
<li>However, we want the output to be the probability that the label is 1 (as opposed to 0), so that we can use this probability to classify the observation.</li>
<li>The regression line obtained from linear regression is also highly sensitive to outliers.</li>
<li>Even a few extreme points can pull the fitted line away from the true underlying relationship and shift the point where it crosses the decision threshold.</li>
<li>As a result, it does a poor job of separating the two classes.</li>
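
Both problems can be reproduced with a few lines of plain Python. The sketch below is illustrative only: the toy data, the outlier at `x = 50`, and the helper names `fit_line` and `predict` are assumptions made for this example, not part of the original material.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(x, slope, intercept):
    return slope * x + intercept

# Hypothetical toy data: one feature, binary labels.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

slope, intercept = fit_line(xs, ys)
print(predict(0.0, slope, intercept))    # negative, so not a valid probability
print(predict(10.0, slope, intercept))   # greater than 1, so not a valid probability

# Add one extreme but correctly labelled point and refit.
slope_out, intercept_out = fit_line(xs + [50.0], ys + [1])
# x = 5 has label 1, yet the refitted line now scores it below 0.5.
print(predict(5.0, slope_out, intercept_out))
```

The single outlier flattens the line enough that a point on the positive side of the original boundary falls below the 0.5 cut-off.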

### What Is Logistic Regression?

<li>Logistic regression is a statistical method used to predict the probability of an event occurring based on input variables.</li>
<li>It is a form of regression analysis, which means it estimates the relationship between the input variables and the output variable.</li>
<li>The output of logistic regression is a probability value between 0 and 1, which represents the likelihood of the event occurring given the input variables.</li>
<li>This probability can be converted into a binary classification by setting a threshold value.</li>
<li>For example, if the threshold is set to 0.5, observations with a predicted probability greater than or equal to 0.5 are classified as "positive".</li>
<li>Observations with a predicted probability less than 0.5 are classified as "negative".</li>
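
The thresholding step can be written as a one-line rule. This is a minimal sketch; the function name `classify` and the probability values are hypothetical.

```python
def classify(probabilities, threshold=0.5):
    """Turn predicted probabilities into class labels using a cut-off."""
    return ["positive" if p >= threshold else "negative" for p in probabilities]

# Hypothetical predicted probabilities for four observations.
probs = [0.10, 0.49, 0.50, 0.93]
labels = classify(probs)
print(labels)  # ['negative', 'negative', 'positive', 'positive']
```

Raising or lowering `threshold` trades one kind of error for the other, so 0.5 is a common default rather than a fixed rule.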

<li>Logistic regression is widely used in various fields such as marketing, healthcare, and finance, to make predictions and decisions based on probabilities.</li>
<li>It is a powerful tool for analyzing data and making informed decisions based on the underlying patterns and relationships in the data.</li>

![](images/logistic_regression.png)

### What Is the Sigmoid Function?

<li>The sigmoid function is a mathematical function that takes any real value and maps it to a value between 0 and 1; its curve is shaped like the letter “S”.</li>
<li>The sigmoid function is also called the logistic function.</li>
<li>In logistic regression, the sigmoid function maps the unbounded output of the linear combination of the input variables to a probability between 0 and 1, so that it can be used for binary classification.</li>

![](images/sigmoid_function.png)

![](images/logistic_eqn.png)
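
As a rough sketch, the sigmoid and the way logistic regression applies it can be written as follows. The branch on the sign of `z` is one common way to avoid overflow in `math.exp`; the helper `predict_proba` and the weights passed to it are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z is negative here, so exp(z) cannot overflow
    return e / (1.0 + e)

def predict_proba(x, weights, bias):
    """Probability of class 1: sigmoid of the linear combination of inputs."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

print(sigmoid(0))                                  # 0.5
print(sigmoid(-5), sigmoid(5))                     # close to 0 and close to 1
print(predict_proba([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5, because z = 0
```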

### Why Is Logistic Regression Called Regression Although It Is Used for Classification Tasks?

<li>Logistic regression outputs a continuous value: the probability that y = 1 given the input X.</li>
<li>Since this probability is continuous (it can take any value between 0 and 1), logistic regression is indeed a “regression”.</li>
<li>In practice, it is used for classification problems by converting these probabilities into categories (1 or 0) using a threshold value.</li>

### Loss Function For Logistic Regression

<li>The loss function used in logistic regression is the logistic loss function, also known as the cross-entropy loss function.</li>
<li>It is a measure of the difference between the predicted probabilities and the true labels of the binary dependent variable.</li>

**The logistic loss function is defined as:**

<code>
L(y, y') = -[y * log(y') + (1 - y) * log(1 - y')]
</code>

where:

<li>y is the true binary label (0 or 1) of the observation</li>
<li>y' is the predicted probability of the observation being in class 1 (i.e., the dependent variable taking the value 1)</li>
<li>The logistic loss function penalizes confident wrong predictions heavily: the loss grows without bound as the predicted probability moves toward the opposite of the true label.</li>
<li>When the predicted probability is close to the true label, the loss function is small, indicating a good fit.</li>
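
The formula above translates directly into code. In this sketch `log` is the natural logarithm, and the clipping constant `eps` is an added assumption that keeps `log(0)` from occurring; it is not part of the definition.

```python
import math

def log_loss(y, y_pred, eps=1e-15):
    """Logistic (cross-entropy) loss for one observation."""
    # Assumption: clip the prediction so log() never receives exactly 0.
    y_pred = min(max(y_pred, eps), 1 - eps)
    return -(y * math.log(y_pred) + (1 - y) * math.log(1 - y_pred))

print(f"{log_loss(1, 0.9):.3f}")  # 0.105 (good prediction, small loss)
print(f"{log_loss(1, 0.1):.3f}")  # 2.303 (confident wrong prediction, large loss)
```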

<li>The goal of logistic regression is to minimize the total logistic loss function over all the training examples.</li>
<li>This can be achieved through an optimization algorithm such as gradient descent.</li>
<li>The optimization algorithm adjusts the parameters of the logistic regression model to minimize the loss function and improve the accuracy.</li>
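
A minimal sketch of this optimization, assuming a single input feature and plain batch gradient descent, is shown below. The toy data, learning rate, and number of epochs are arbitrary choices for illustration; practical libraries typically use more sophisticated solvers.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mean_log_loss(xs, ys, w, b, eps=1e-15):
    """Average logistic loss of the model p = sigmoid(w * x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

def train(xs, ys, learning_rate=0.1, epochs=2000):
    """Batch gradient descent on the average logistic loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average loss: mean of (prediction - label) * input.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical toy data; the classes overlap, so the weights stay finite.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

loss_before = mean_log_loss(xs, ys, 0.0, 0.0)
w, b = train(xs, ys)
loss_after = mean_log_loss(xs, ys, w, b)
print(loss_before, loss_after)  # the loss after training is lower
```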

### Visualize Target Class

### Performance Metrics In Classification

<li>Confusion Matrix</li>
<li>Accuracy</li>
<li>Precision</li>
<li>Recall</li>
<li>F1 Score</li>

#### 1. Confusion Matrix

<li>A confusion matrix is a table used to describe the performance of a classification algorithm.</li>
<li>It tabulates how many examples fall into each combination of predicted and actual class.</li>

![](images/confusion_matrix.png)


<li>True Positives: The classifier predicts positive and the example is actually positive.</li>
<li>True Negatives: The classifier predicts negative and the example is actually negative.</li>
<li>False Positives: The classifier predicts positive but the example is actually negative.</li>
<li>False Negatives: The classifier predicts negative but the example is actually positive.</li>
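
The four cells can be counted directly from paired lists of actual and predicted labels. This is a sketch with made-up labels, where 1 stands for positive and 0 for negative; `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels encoded as 0 and 1."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

# Hypothetical labels for ten examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)  # 3 5 1 1
```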

#### 2. Accuracy
<li>In general, accuracy is the degree of closeness between a measurement and its true value.</li>
<li>For a classifier, accuracy is the number of correct predictions divided by the total number of predictions made: (TP + TN) / (TP + TN + FP + FN).</li>


![](images/accuracy.png)
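
Expressed with confusion-matrix counts, accuracy is a single division. The counts used below are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

# Hypothetical counts: 3 TP, 5 TN, 1 FP, 1 FN.
print(accuracy(3, 5, 1, 1))  # 0.8
```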

#### 3. Precision

<li>Precision measures how often the examples predicted as positive by our classifier are actually positive: TP / (TP + FP).</li>
<li>It is also called Positive Predictive Value (PPV).</li>
    
![](images/precision.png)

#### 4. Recall
<li>Recall measures how often the examples that are actually positive are predicted as positive by our classifier: TP / (TP + FN).</li>
<li>It is also called Sensitivity or True Positive Rate.</li>

![](images/recall.png)

#### 5. F1 Score

<li>The F1 Score is the harmonic mean of precision and recall: 2 × Precision × Recall / (Precision + Recall).</li>

![](images/f1_score.png)
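
Precision, recall, and the F1 score can all be computed from the same counts. The sketch below uses hypothetical counts and returns 0.0 when a denominator is zero, which is one possible convention rather than a universal rule.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Hypothetical counts: 6 TP, 2 FP, 4 FN.
print(f"{precision(6, 2):.3f}")    # 0.750
print(f"{recall(6, 4):.3f}")       # 0.600
print(f"{f1_score(6, 2, 4):.3f}")  # 0.667
```

Because the harmonic mean is pulled toward the smaller of the two values, the F1 score here sits closer to the recall of 0.6 than a simple average would.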