# Automating Decisions: A Deep Dive into Classification

In the realm of data science, we often aim to automate decision-making processes. Whether it's filtering spam emails, predicting customer churn, or determining the likelihood of a user clicking on an ad, these scenarios all fall under the umbrella of classification problems. This is a form of supervised learning where we train a model using data with known outcomes, then apply it to new data to predict unknown outcomes.

## What is Classification?

Classification is fundamentally about predicting whether a record belongs to a certain category. This could be a binary outcome (yes/no, true/false, 0/1) or one of several categories. The basic approach involves:

1. Establishing a cutoff probability for the class of interest
2. Estimating the probability of a record belonging to that class using a model
3. Assigning a new record to the class of interest if its estimated probability exceeds the cutoff probability
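
As a minimal sketch of the cutoff rule above, the snippet below assumes a model has already produced an estimated probability for each record. The function name `classify`, the 0.5 default, and the sample probabilities are illustrative assumptions, not values from the text.

```python
def classify(probability, cutoff=0.5):
    """Assign the class of interest when the estimated probability exceeds the cutoff."""
    return probability > cutoff

# Hypothetical model outputs for three new records
estimated = [0.15, 0.50, 0.82]
labels = [classify(p, cutoff=0.5) for p in estimated]

# Lowering the cutoff flags more records as the class of interest
lenient_labels = [classify(p, cutoff=0.1) for p in estimated]
```

A lower cutoff catches more members of the class of interest at the cost of more false alarms, so the cutoff is usually chosen with the relative costs of the two error types in mind.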

## Naive Bayes Classification

Naive Bayes provides a straightforward method for classification, starting with the idea of exact Bayesian classification:

1. Find all records with the same predictor profile
2. Determine the most prevalent class among those records
3. Assign that class to the new record

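However, this approach is often impractical: once there are more than a handful of predictors, few or no records will match a new record's profile exactly. Instead, the Naive Bayes algorithm takes a simplifying "naive" approach, treating each predictor as independent given the class. This involves these steps: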

1. For each class, calculate conditional probabilities, such as P(Xj | Y=i), for each predictor variable Xj given a class Y=i
2. Multiply these probabilities by each other, then by the proportion of records belonging to Y=i
3. Repeat this for all classes
4. Estimate a probability for each outcome by dividing the calculated value for a class by the sum of such values for all classes
5. Assign the record to the class with the highest probability

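The core formula representing the probability of observing outcome Y=i given a set of predictor variables is shown below. The sum in the denominator runs over all classes k, and each product runs over the predictors j = 1, ..., p:

P(Y = i | X1, ..., Xp) = [P(Y=i) * Π_j P(Xj | Y=i)] / [Σ_k P(Y=k) * Π_j P(Xj | Y=k)]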

Where:
- P(Y=i) is the prior probability of the outcome i
- P(Xj | Y=i) is the conditional probability of the predictor Xj given the outcome i
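
The steps and formula above can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not a reference one: the function names (`fit_naive_bayes`, `predict_proba`) and the toy loan records are hypothetical, and no smoothing is applied, so a predictor value never seen in a class gives that class a probability of zero.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(records, labels):
    """Estimate priors P(Y=i) and conditionals P(Xj=x | Y=i) from raw counts."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: count / n for c, count in class_counts.items()}
    # value_counts[c][j][x] = number of class-c records whose predictor j equals x
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for record, c in zip(records, labels):
        for j, x in enumerate(record):
            value_counts[c][j][x] += 1

    def conditional(c, j, x):
        return value_counts[c][j][x] / class_counts[c]

    return priors, conditional

def predict_proba(record, priors, conditional):
    """Apply the Naive Bayes formula: prior times product of conditionals, normalized."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for j, x in enumerate(record):
            score *= conditional(c, j, x)
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Hypothetical training data: (home ownership, loan term) -> outcome
records = [("own", "short"), ("rent", "long"), ("rent", "short"),
           ("own", "long"), ("rent", "long"), ("rent", "long")]
labels = ["paid", "default", "paid", "paid", "default", "paid"]

priors, conditional = fit_naive_bayes(records, labels)
probs = predict_proba(("rent", "long"), priors, conditional)
predicted_class = max(probs, key=probs.get)
```

For the new record `("rent", "long")`, the unnormalized scores are 2/3 × 1/2 × 1/2 = 1/6 for `paid` and 1/3 × 1 × 1 = 1/3 for `default`, so the normalized probability of `default` is 2/3.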

*Important Note: The standard Naive Bayes algorithm requires categorical variables. Numeric variables must be converted or handled using probability models like the normal distribution.*
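
One way to handle a numeric predictor, sketched here under the assumption that its values are roughly normal within each class, is to replace the count-based conditional probability with a normal density. The payment-to-income values and the helper name `numeric_conditional` are made up for illustration; `statistics.NormalDist` is part of the Python standard library (3.8+).

```python
from statistics import NormalDist

# Hypothetical payment-to-income ratios, grouped by class
ratios = {
    "paid": [0.05, 0.08, 0.06, 0.07],
    "default": [0.12, 0.15, 0.11, 0.14],
}

# Fit one normal distribution per class from the sample mean and standard deviation
densities = {c: NormalDist.from_samples(values) for c, values in ratios.items()}

def numeric_conditional(c, x):
    """Density of x under the class-c normal model, used in place of P(Xj | Y=c)."""
    return densities[c].pdf(x)

density_paid = numeric_conditional("paid", 0.13)
density_default = numeric_conditional("default", 0.13)
```

The densities are not probabilities on their own, but because the same predictor value is compared across classes, they can stand in for the conditional terms in the Naive Bayes product before normalization.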

## Linear Discriminant Analysis (LDA)

LDA is another method for classification that aims to find a linear combination of features that best separates classes. It assumes that the predictors in each class follow a multivariate normal distribution with a covariance matrix shared across classes. The goal is to maximize the separation between class means relative to the variance within each class.

The core idea of LDA is to derive discriminant functions that are linear combinations of the predictor variables. These functions create boundaries that best separate the different classes in the data.
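
The following is a rough sketch of a two-class discriminant function for two predictors, written in plain Python so the arithmetic is visible. It assumes equal class priors (the boundary passes through the midpoint of the class means) and a pooled within-class covariance; the function names and the data points are hypothetical.

```python
def mean_vector(rows):
    """Mean of each of the two predictors."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]

def pooled_covariance(groups, means):
    """Within-class covariance (2x2), pooled across classes."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in zip(groups, means):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    dof = sum(len(g) for g in groups) - len(groups)
    return [[s[a][b] / dof for b in range(2)] for a in range(2)]

def fit_lda(class0, class1):
    """Return discriminant weights w and a threshold (equal priors assumed)."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s = pooled_covariance([class0, class1], [m0, m1])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # w = S^-1 (m1 - m0): the direction that best separates the class means
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    midpoint = [(m0[k] + m1[k]) / 2 for k in range(2)]
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold

def predict(x, w, threshold):
    """Classify by which side of the linear boundary the record falls on."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0

# Hypothetical records with two predictors each
class0 = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.5)]
class1 = [(4.0, 5.0), (5.0, 4.0), (4.5, 5.5), (5.5, 4.5)]
w, threshold = fit_lda(class0, class1)
low, high = predict((1.0, 1.0), w, threshold), predict((5.0, 5.0), w, threshold)
```

The set of points where `w[0] * x[0] + w[1] * x[1]` equals `threshold` is a straight line, which is why the resulting decision boundary is linear.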

*LDA produces linear decision boundaries in the predictor space, as illustrated in Figure 1.*

| ![LDA](figure/c5/fig5-1.png) | 
|:--:| 
| *Figure 1.  LDA prediction of loan default using two variables: a score of the borrower’s creditworthiness and the payment-to-income ratio* |

## Logistic Regression

Logistic regression is a versatile method for classification, especially when you want probabilities as outputs. Unlike linear regression, it models the log-odds of a binary outcome as a linear function of the predictors. As a result, the predicted probability follows an S-shaped curve that always stays between 0 and 1. The basic logistic regression equation for a single predictor is:

log(odds) = b0 + b1X

Where:
- odds = P(Y=1) / P(Y=0)

This log-odds, the logit, can be converted to a probability:

P(Y=1) = 1 / (1 + e^(-(b0 + b1X)))

This is also called the inverse logit function.
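
The two conversions above can be checked numerically with a short sketch. The coefficient values `b0` and `b1` and the input `x` are hypothetical, chosen so that the log-odds come out to exactly zero.

```python
import math

def inverse_logit(log_odds):
    """Convert log-odds to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients for a single predictor
b0, b1 = -3.0, 0.5
x = 6.0
log_odds = b0 + b1 * x         # -3.0 + 0.5 * 6.0 = 0.0
p = inverse_logit(log_odds)    # log-odds of 0 correspond to even odds

round_trip = logit(inverse_logit(1.7))
```

Log-odds of zero map to a probability of 0.5, and applying `logit` to the output of `inverse_logit` recovers the original log-odds, confirming that the two functions are inverses.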

*Maximum Likelihood Estimation (MLE): Logistic regression models are fit using MLE, which iteratively searches for the model parameters under which the observed outcomes are most likely.*
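
To make the iterative idea concrete, here is a minimal sketch that fits a single-predictor logistic regression by gradient ascent on the log-likelihood. This is only one possible optimizer and an assumption of this example; statistical software typically uses faster methods. The data, learning rate, and step count are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of binary outcomes ys under the model logit(p) = b0 + b1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, rate=0.1, steps=5000):
    """Climb the log-likelihood surface using its gradient."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)
            g0 += residual        # derivative with respect to b0
            g1 += residual * x    # derivative with respect to b1
        b0 += rate * g0 / n
        b1 += rate * g1 / n
    return b0, b1

# Hypothetical data in which larger x makes y = 1 more likely, with some overlap
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(xs, ys)
start_ll = log_likelihood(0.0, 0.0, xs, ys)
fitted_ll = log_likelihood(b0, b1, xs, ys)
```

The classes deliberately overlap in this toy data: if they were perfectly separable, the likelihood would keep increasing as the coefficients grow without bound and the maximum would not exist.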

Figure 2 shows a partial residual plot for a logistic regression. Such plots help assess how well the model captures the relationship between a predictor and the outcome.

| ![Partial_res](figure/c5/fig5-4.png) | 
|:--:| 
| *Figure 2.  Partial residuals from logistic regression* |


## Evaluating Classification Models

Once you've fit a classification model, you'll need to evaluate its performance. Key metrics and concepts include:

- **Confusion Matrix**: A table that summarizes the performance of a classification model by showing the counts of true positives, true negatives, false positives, and false negatives. Figure 3 provides an example.

| ![Confusion_matrix](figure/c5/fig5-5.png) | 
|:--:| 
| *Figure 3.  Confusion matrix for a binary response and various metrics* |

- **Accuracy**: The proportion of correctly classified instances. It can be misleading when classes are imbalanced.
- **Precision**: The proportion of true positives among all predicted positives.
- **Recall**: The proportion of true positives among all actual positives.
- **F1-score**: The harmonic mean of precision and recall.
- **ROC Curve**: A plot of the true positive rate against the false positive rate for various classification thresholds.
- **AUC**: Area under the ROC curve, a single value for comparing models. It measures the model's ability to distinguish between classes: 0.5 corresponds to random guessing and 1 to perfect separation.
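
The metrics listed above can be computed directly from the confusion matrix counts. The sketch below uses made-up labels and scores with a cutoff of 0.5, and the function names are illustrative. AUC is computed through its rank interpretation: the fraction of positive-negative pairs in which the positive record receives the higher score, with ties counted as half.

```python
def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn

def summary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 (assumes no denominator is zero)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def auc(actual, scores):
    """Share of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical true labels and model scores
actual = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
predicted = [1 if s > 0.5 else 0 for s in scores]

tp, tn, fp, fn = confusion_counts(actual, predicted)
accuracy, precision, recall, f1 = summary_metrics(tp, tn, fp, fn)
area = auc(actual, scores)
```

Here the counts are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, giving 0.75 for accuracy, precision, recall, and F1, and an AUC of 14/16 = 0.875. Unlike the other metrics, AUC does not depend on the chosen cutoff.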

## The Rare Class Problem

In many classification problems, one class is far more prevalent than the other. This is often called the rare class problem. In such cases, accuracy is a poor measure: a model that always predicts the prevalent class achieves high accuracy while never identifying the rare class. Consider other metrics or techniques for model evaluation. Some strategies to address the rare class problem include:

- **Downsampling**: Reducing the number of instances in the prevalent class
- **Oversampling**: Increasing the number of instances in the rare class, often by bootstrapping
- **Up-weighting/down-weighting**: Assigning different weights to the classes in the model
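
The three strategies above can be sketched with the standard library alone. The function names, the fixed random seed, and the toy labels are assumptions made for illustration. The weighting scheme shown (weights inversely proportional to class frequency) is one common convention among several.

```python
import random
from collections import Counter

def split_by_class(rows, labels, rare):
    rare_rows = [(r, l) for r, l in zip(rows, labels) if l == rare]
    common_rows = [(r, l) for r, l in zip(rows, labels) if l != rare]
    return rare_rows, common_rows

def downsample(rows, labels, rare, seed=0):
    """Keep all rare records plus an equally sized random subset of the rest."""
    rng = random.Random(seed)
    rare_rows, common_rows = split_by_class(rows, labels, rare)
    return rare_rows + rng.sample(common_rows, len(rare_rows))

def oversample(rows, labels, rare, seed=0):
    """Bootstrap the rare records (sampling with replacement) up to the prevalent count."""
    rng = random.Random(seed)
    rare_rows, common_rows = split_by_class(rows, labels, rare)
    return common_rows + rng.choices(rare_rows, k=len(common_rows))

def class_weights(labels):
    """Weights inversely proportional to class frequency."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}

# Hypothetical data: 2 rare "default" records among 10
rows = list(range(10))
labels = ["default", "paid", "paid", "paid", "default",
          "paid", "paid", "paid", "paid", "paid"]

down = downsample(rows, labels, rare="default")
up = oversample(rows, labels, rare="default")
weights = class_weights(labels)
```

With 2 rare and 8 prevalent records, downsampling yields 4 records, oversampling yields 16, and the weights come out to 2.5 for `default` and 0.625 for `paid`. Resampling should be applied to training data only, so that evaluation still reflects the real class balance.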

## Exploring Predictions

Exploring the predictions from a classification model is essential to understanding how the model works. Figure 4 compares the decision rules from different models applied to the same loan data.

| ![different_rules](figure/c5/fig5-8.png) | 
|:--:| 
| *Figure 4. Comparison of the classification rules for four different methods* |

## Key Takeaways

- Classification is a supervised learning technique used to predict category membership
- Naive Bayes, LDA, and logistic regression are powerful methods for building classification models
- Evaluation of classification models requires considering metrics such as precision, recall, AUC, and the confusion matrix
- Imbalanced datasets require specific techniques to address the rare class problem

This chapter has provided an essential foundation for understanding the core concepts and techniques used in classification, which form a cornerstone of many data science applications.