# Logistic Regression

## Problem Type
**Logistic Regression** is primarily used for:
- **Classification** problems (binary and multiclass)
- **Supervised** learning

### How Logistic Regression Works
- **Models the probability** that a given input belongs to a particular class (e.g., binary classification as 0 or 1).
- **Uses the sigmoid (logistic) function** to map a linear combination of the features (the log-odds) to a probability between 0 and 1.
- **Applies a threshold** (typically 0.5) to classify data into one of the categories.
- **Fits its coefficients by Maximum Likelihood Estimation (MLE)**, choosing the weights that make the observed class labels most probable under the model (equivalently, minimizing log loss).
- **Extends to multiclass problems** using methods like one-vs-rest (OvR) or softmax for multinomial logistic regression.
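
The steps above can be sketched in plain Python. This is a minimal illustration rather than how scikit-learn implements them: the toy data, the hand-picked weights, and the helper names (`sigmoid`, `predict_proba`, `log_likelihood`, `softmax`) are all assumptions made for the example.

```python
import math


def sigmoid(z):
    """Map a real-valued score (the log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Probability of class 1 for one sample: sigmoid of a linear score."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def log_likelihood(X, y, weights, bias):
    """Log-likelihood of the observed labels; MLE picks weights that maximize it."""
    total = 0.0
    for x, label in zip(X, y):
        p = predict_proba(x, weights, bias)
        total += label * math.log(p) + (1 - label) * math.log(1 - p)
    return total


def softmax(scores):
    """Multiclass generalization of the sigmoid: scores to probabilities summing to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy data: one feature, label 1 for larger values
X_toy = [[-2.0], [-1.0], [1.0], [2.0]]
y_toy = [0, 0, 1, 1]

# Hand-picked (not fitted) parameters: one sensible, one with the wrong sign
weights_good, bias_good = [2.0], 0.0
weights_bad, bias_bad = [-2.0], 0.0

probs = [predict_proba(x, weights_good, bias_good) for x in X_toy]
labels = [1 if p >= 0.5 else 0 for p in probs]  # apply the 0.5 threshold

ll_good = log_likelihood(X_toy, y_toy, weights_good, bias_good)
ll_bad = log_likelihood(X_toy, y_toy, weights_bad, bias_bad)

print("probabilities:", [round(p, 3) for p in probs])
print("labels:", labels)
print("log-likelihood (good weights):", ll_good)
print("log-likelihood (bad weights):", ll_bad)
print("softmax example:", softmax([2.0, 1.0, 0.1]))
```

The weights with the correct sign give a higher log-likelihood than the flipped ones, which is the quantity a solver increases during training.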

### Key Tuning Hyperparameters
- **`penalty`:**
  - **Description:** Type of regularization to apply (`l1`, `l2`, `elasticnet`, or no penalty; recent scikit-learn versions expect `None` rather than the string `none`).
  - **Impact:** Regularization helps avoid overfitting by penalizing large coefficients.
  - **Common Choices:** `l2` (Ridge-style) is the default; `l1` (Lasso-style) can shrink coefficients to exactly zero, which acts as feature selection.
- **`C`:**
  - **Description:** Inverse of regularization strength (`C = 1/λ`).
  - **Impact:** Smaller values increase regularization strength; larger values reduce regularization (risk of overfitting).
  - **Default:** `C = 1.0` is the default value.
- **`solver`:**
  - **Description:** Optimization algorithm used in training (`liblinear`, `lbfgs`, `sag`, `saga`, etc.).
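  - **Impact:** Different solvers suit different datasets and penalties. `lbfgs` (the default) is a robust general-purpose choice, `liblinear` works well for small datasets, and `sag`/`saga` are faster on large ones. Not every solver supports every penalty: `lbfgs` handles only `l2` or no penalty, `liblinear` handles `l1` and `l2`, and `saga` is the only solver that supports `elasticnet`.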
- **`max_iter`:**
  - **Description:** Maximum number of iterations taken by the solver to converge.
  - **Impact:** Increase if the model does not converge (warnings about non-convergence).
  - **Default:** 100; larger or unscaled datasets may need more. Standardizing the features often helps the solver converge in fewer iterations.
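
To see the effect of `C` concretely, the sketch below fits the same pipeline at three regularization strengths and compares the size of the learned coefficients. The choice of dataset subset, the `C` values, and the variable names (`X_demo`, `y_demo`, `coef_norms`) are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Keep only classes 0 and 1 so the problem is binary
X_demo, y_demo = load_iris(return_X_y=True)
mask = y_demo < 2
X_demo, y_demo = X_demo[mask], y_demo[mask]

coef_norms = {}
for C in (0.01, 1.0, 100.0):
    # Default penalty is l2; smaller C means stronger regularization
    clf = make_pipeline(
        StandardScaler(),
        LogisticRegression(C=C, solver="lbfgs", max_iter=1000),
    )
    clf.fit(X_demo, y_demo)
    coef = clf.named_steps["logisticregression"].coef_
    coef_norms[C] = float(np.linalg.norm(coef))
    print(f"C={C:>6}: ||coef|| = {coef_norms[C]:.3f}")
```

The coefficient norm should grow as `C` increases, because a weaker penalty lets the model fit the training data more aggressively.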

### Pros vs Cons

| Pros                                                | Cons                                                   |
|-----------------------------------------------------|--------------------------------------------------------|
| Simple and interpretable model                      | Assumes a linear decision boundary                      |
| Effective for binary classification                 | Can struggle with complex relationships in data         |
| Outputs well-calibrated probabilities               | Sensitive to multicollinearity among features           |
| Can be regularized to avoid overfitting             | Does not perform well when classes are highly imbalanced |
| Efficient and fast to train on small to medium datasets | Assumes observations are independent of one another     |

### Evaluation Metrics
- **Accuracy:**
  - **Description:** Ratio of correct predictions to total predictions.
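  - **Good Value:** Higher is better; above 0.8 is a common rule of thumb, but always compare against the majority-class baseline because accuracy is misleading on imbalanced data.
  - **Bad Value:** For balanced binary classes, below 0.5 is worse than random guessing; more generally, a score at or below the majority-class share is no better than always predicting that class.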
- **Precision:**
  - **Description:** Proportion of positive identifications that were actually correct (True Positives / (True Positives + False Positives)).
  - **Good Value:** Higher is better, especially when False Positives are costly (e.g., a spam filter that blocks legitimate email).
  - **Bad Value:** Low values indicate many False Positives.
- **Recall (Sensitivity):**
  - **Description:** Proportion of actual positives that were correctly identified (True Positives / (True Positives + False Negatives)).
  - **Good Value:** Higher is better, especially when False Negatives are costly (e.g., detecting fraud).
  - **Bad Value:** Low values indicate many False Negatives.
- **F1 Score:**
  - **Description:** Harmonic mean of Precision and Recall; balances the trade-off between the two.
  - **Good Value:** Higher is better; values above 0.7-0.8 indicate strong performance.
  - **Bad Value:** Low values mean that Precision, Recall, or both are low; the harmonic mean is pulled toward the weaker of the two.
- **ROC-AUC:**
  - **Description:** Area under the Receiver Operating Characteristic curve, showing the trade-off between True Positive Rate (TPR) and False Positive Rate (FPR).
  - **Good Value:** Closer to 1 is better; above 0.8 indicates good discrimination between classes.
  - **Bad Value:** Close to 0.5 suggests the model is no better than random guessing.
- **Log Loss:**
  - **Description:** Negative average log-likelihood of the true labels under the predicted probabilities. It must be computed from probabilities, not hard class labels.
  - **Good Value:** Lower is better; values close to 0 indicate confident, correct predictions.
  - **Bad Value:** Higher values indicate poor probability estimates; confident but wrong predictions are penalized most heavily.
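
The definitions above can be checked by hand on a small example. The labels and probabilities below are made up for illustration, and the variable names are assumptions; the formulas are the standard ones quoted in the list.

```python
import math

# Hypothetical true labels and predicted probabilities of class 1
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.7, 0.3]

# Hard predictions from the usual 0.5 threshold
y_hat = [1 if p >= 0.5 else 0 for p in y_prob]

# Confusion-matrix counts
tp = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_hat) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_hat) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_hat) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Log loss uses the probabilities, not the thresholded labels
log_loss_value = -sum(
    t * math.log(p) + (1 - t) * math.log(1 - p)
    for t, p in zip(y_true, y_prob)
) / len(y_true)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy} precision={precision} recall={recall} f1={f1}")
print(f"log loss={log_loss_value:.4f}")
```

With one false positive and one false negative out of eight samples, accuracy, precision, recall, and F1 all come out to 0.75 here, while log loss additionally reflects how confident each prediction was.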



In [None]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, auc, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_curve, log_loss)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

In [None]:
# Load the Iris dataset
iris = load_iris()
print(iris.DESCR)

In [None]:
X, y = iris.data, iris.target

In [None]:
# Iris has 50 samples per class, stored in class order,
# so the first 100 rows contain only classes 0 and 1
index = 100

In [None]:
# Preprocess the data (for binary classification)
# We'll consider only two classes: Setosa (class 0) and Versicolor (class 1)
X_binary = X[0:index]
y_binary = y[0:index]

In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_binary, y_binary, test_size=0.2, random_state=42
)

In [None]:
# Initialise the logistic regression model
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        fit_intercept=True,
        penalty="l2",
        C=1,
        solver="liblinear",
        max_iter=100,
    ),
)

In [None]:
# Fit the model to the data
model.fit(X_train, y_train)

In [None]:
# Predict new values
y_pred = model.predict(X_test)

## Model Evaluation

In [None]:
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")

In [None]:
print(f"Precision: {precision_score(y_test, y_pred)}")

In [None]:
print(f"Recall: {recall_score(y_test, y_pred)}")

In [None]:
print(f"F1 Score: {f1_score(y_test, y_pred)}")

In [None]:
# Log loss is defined on predicted probabilities, not hard class labels
print(f"Log Loss: {log_loss(y_test, model.predict_proba(X_test))}")

## Confusion Matrix
- **Interpretation:** A confusion matrix is a table that is often used to describe the performance of a classification model. It contains information about actual and predicted classifications done by the model.
- **Good vs. Bad Values:** In a confusion matrix, the diagonal elements represent the number of points for which the predicted label is equal to the true label, while off-diagonal elements are those that are mislabeled by the classifier. The higher the diagonal values of the confusion matrix the better, indicating many correct predictions.

In [None]:
print(f"Confusion Matrix:\n {confusion_matrix(y_test, y_pred)}")

## ROC Curve and AUC

In [None]:
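# Use the predicted probability of class 1 so the curve spans many thresholds
y_score = model.predict_proba(X_test)[:, 1]
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_score)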
print(f"false_positive_rate: {false_positive_rate}")
print(f"true_positive_rate: {true_positive_rate}")
print(f"AUC-ROC: {auc(false_positive_rate, true_positive_rate)}")
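
ROC-AUC can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it that way on made-up data; `pairwise_auc` and the toy arrays are hypothetical names used only for illustration. It also shows why continuous scores are preferable to thresholded labels as input.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities of class 1
y_toy_true = [0, 0, 0, 1, 1, 1]
y_toy_score = [0.1, 0.3, 0.6, 0.4, 0.7, 0.9]

# Thresholding at 0.5 discards the ranking information within each group
y_toy_label = [1 if s >= 0.5 else 0 for s in y_toy_score]

auc_from_scores = pairwise_auc(y_toy_true, y_toy_score)
auc_from_labels = pairwise_auc(y_toy_true, y_toy_label)

print(f"AUC from probabilities: {auc_from_scores:.3f}")  # 8 of 9 pairs -> 0.889
print(f"AUC from hard labels:   {auc_from_labels:.3f}")  # 6 of 9 pairs -> 0.667
```

Hard labels create many ties, so the resulting curve has a single operating point and can understate how well the model ranks samples.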

## Cross Validation

In [None]:
# Perform 5-fold cross-validation (accuracy is the default score for classifiers)
scores = cross_val_score(model, X_binary, y_binary, cv=5)
print(f"Cross-validation scores: {scores}")
print(f"Average cross-validation score: {scores.mean()}")