## Logistic Regression

**Logistic Regression** is a **supervised machine learning algorithm** used for **classification problems**.  
Despite its name, it is a **classification model**, not a regression model.  

It predicts the **probability** that an input belongs to a particular class  
(e.g., Yes/No, Spam/Not Spam, 0/1).



### Why We Cannot Use Linear Regression for Classification

Using **Linear Regression** for classification tasks is not ideal for several reasons:

1. **Output Range**  
   Linear Regression can produce values outside the range of **0 to 1**, which are not valid probabilities.  
   Logistic Regression, on the other hand, uses the **logistic (sigmoid) function** to ensure that outputs are constrained within this range.

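2. **Non-linearity**  
   The relationship between the independent variables and the probability of the dependent variable is often **non-linear** (typically S-shaped), so a straight line fits it poorly.  
   Logistic Regression models the log-odds as a linear function of the inputs and maps it to a probability through the logistic function, which produces this S-shaped curve.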
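
The output-range problem can be seen on a tiny example. The sketch below fits an ordinary least-squares line to binary labels; the dataset and the helper name `fit_line` are hypothetical, chosen only for illustration.

```python
# Toy 1-D dataset with binary labels (hypothetical values for illustration)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(xs, ys)

# Predictions of the fitted line on the training inputs themselves
linear_outputs = [intercept + slope * x for x in xs]

for x, out in zip(xs, linear_outputs):
    note = "" if 0.0 <= out <= 1.0 else "  <- not a valid probability"
    print(f"x={x}: {out:.3f}{note}")
```

Even on the training points, the line returns a negative value at the low end and a value above 1 at the high end, so its output cannot be read as a probability.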



### Logistic Regression Model and Sigmoid Function

The logistic regression model outputs a probability using the **sigmoid function**:

$$
h_\theta(x) = \frac{1}{1 + e^{-z}}
$$

where

$$
z = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n
$$

**Output meaning:**

$$
\text{If } h_\theta(x) \geq 0.5 \Rightarrow \text{Class 1}
$$

$$
\text{If } h_\theta(x) < 0.5 \Rightarrow \text{Class 0}
$$
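
As a minimal sketch, the model and the 0.5 decision rule can be written in plain Python. The function names (`sigmoid`, `predict_proba`, `predict_class`) and the parameter values in `theta` are assumptions made for illustration; in practice the parameters are learned from data.

```python
import math


def sigmoid(z):
    """Logistic function, written so that large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(theta, x):
    """theta = [theta_0, theta_1, ..., theta_n]; x = [x_1, ..., x_n]."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)


def predict_class(theta, x, threshold=0.5):
    """Class 1 if the predicted probability reaches the threshold, else class 0."""
    return 1 if predict_proba(theta, x) >= threshold else 0


# Hypothetical parameters, not fitted to any data
theta = [-4.0, 1.0, 0.5]

print(predict_proba(theta, [2.0, 4.0]), predict_class(theta, [2.0, 4.0]))
print(predict_proba(theta, [1.0, 1.0]), predict_class(theta, [1.0, 1.0]))
```

For the first input, $z = 0$, so the probability sits exactly on the threshold and the rule assigns Class 1. For the second, $z = -2.5$, which gives a probability well below 0.5 and therefore Class 0.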



### Cost Function

The cost function used in Logistic Regression is the **Log Loss (Cross-Entropy Loss)**, defined as:

$$
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]
$$
where:
- $m$ is the number of training examples
- $y^{(i)}$ is the actual label for the $i$-th training example
- $h_\theta(x^{(i)})$ is the predicted probability for the $i$-th training example
- $\theta$ represents the model parameters
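
The cost can be computed directly from labels and predicted probabilities. This is a minimal sketch: the function name `log_loss`, the clipping constant `eps` (added here to avoid taking `log(0)`), and the example probabilities are all assumptions for illustration.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Average cross-entropy over m examples, following the J(theta) formula."""
    m = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability away from 0 and 1 (an assumption of this sketch)
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / m


y_true = [1, 0, 1, 0]
good_probs = [0.9, 0.1, 0.8, 0.2]  # confident and correct (hypothetical)
bad_probs = [0.1, 0.9, 0.2, 0.8]   # confident and wrong (hypothetical)

print(log_loss(y_true, good_probs))
print(log_loss(y_true, bad_probs))
```

Confident wrong predictions are penalized much more heavily than confident correct ones are rewarded, which is why the second loss is far larger than the first.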

### Performance Metrics
Common performance metrics for evaluating Logistic Regression models include:
- Accuracy
- Precision
- Recall
- F1 Score
- F-beta Score

* Precision: The ratio of true positive predictions to the total predicted positives.  
  It measures the accuracy of positive predictions.  
  Example: spam email detection. The goal is to reduce false positives, that is, to avoid marking important emails as spam.
  $$
  \text{Precision} = \frac{TP}{TP + FP}
  $$
* Recall: The ratio of true positive predictions to the total actual positives.  
  It measures the ability of the model to find all relevant cases.  
  Example: disease detection. The goal is to reduce false negatives, that is, to avoid missing any patient who has the disease.
  $$
  \text{Recall} = \frac{TP}{TP + FN}
  $$
* F1 Score: The harmonic mean of precision and recall.  
  It provides a balance between precision and recall.  
  $$
  \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
  $$
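* F-beta Score: A generalization of the F1 Score that weights recall more heavily than precision when $\beta > 1$ and precision more heavily when $\beta < 1$. Setting $\beta = 1$ gives the F1 Score.  
  $$
  F_\beta = (1 + \beta^2) \times \frac{\text{Precision} \times \text{Recall}}{(\beta^2 \times \text{Precision}) + \text{Recall}}
  $$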
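
The metrics above can be computed from confusion-matrix counts. The sketch below is illustrative only: the labels and predictions are made up, the function names are hypothetical, and the F-beta function uses the standard definition $(1 + \beta^2) \cdot P \cdot R / (\beta^2 \cdot P + R)$. Returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_beta(p, r, beta=1.0):
    """Standard F-beta; beta=1 gives the F1 Score."""
    b2 = beta ** 2
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0


# Hypothetical labels and predictions
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
p = precision(tp, fp)
r = recall(tp, fn)

print("accuracy:", accuracy)
print("precision:", p, "recall:", r)
print("F1:", f_beta(p, r), "F2:", f_beta(p, r, beta=2), "F0.5:", f_beta(p, r, beta=0.5))
```

In this example precision is higher than recall, so the recall-weighted F2 comes out lower than F1, and the precision-weighted F0.5 comes out higher.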
