# Find Best Threshold

## Introduction
This notebook demonstrates the use of the `find_best_threshold` function, which is designed to determine the optimal threshold for classification problems. The function takes lists of thresholds, true positives (TP), false negatives (FN), true negatives (TN), and false positives (FP) as inputs.

The primary goal of the function is to find a threshold whose recall is greater than or equal to 0.9 and, among the thresholds that qualify, to maximize precision when possible. The function also includes mechanisms to handle various edge cases, such as:

- All inputs being zero
- Valid inputs that yield recall below the required threshold
- Thresholds where true positives and false negatives are both zero, which leaves recall undefined

By the end of this notebook, users will have a clear understanding of how to use the function, what to expect in different scenarios, and how the function handles corner cases.



In [1]:
from modul_1.functions import find_best_threshold

## Function Description

### `find_best_threshold`

The `find_best_threshold` function is designed to determine the optimal threshold for classification problems by evaluating the relationship between true positives, false negatives, true negatives, and false positives.

### Parameters

- **thresholds (list)**: A list of threshold values (float) to evaluate.
- **tp (list)**: A list of true positives corresponding to each threshold.
- **fn (list)**: A list of false negatives corresponding to each threshold.
- **tn (list, optional)**: A list of true negatives corresponding to each threshold. Default is `None`.
- **fp (list, optional)**: A list of false positives corresponding to each threshold. Default is `None`.

### Returns

- **float or None**: The function returns the best threshold that yields recall greater than or equal to 0.9 and maximizes precision (if FP is provided). If no threshold meets the condition, the function returns `None`.

### Raises

- **ValueError**: The function raises a ValueError if any of the following conditions are met:
  - Mandatory input lists (thresholds, TP, FN) are empty.
  - Any input value is negative.
  - Input lists are of different lengths.
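
The three conditions above can be sketched as a small validation helper. This is an illustration only: `validate_inputs` is a hypothetical name, not part of `modul_1.functions`, and applying the negativity check to `thresholds` as well as to the counts is an assumption based on the wording "any input value".

```python
from typing import Optional, Sequence


def validate_inputs(
    thresholds: Sequence[float],
    tp: Sequence[int],
    fn: Sequence[int],
    tn: Optional[Sequence[int]] = None,
    fp: Optional[Sequence[int]] = None,
) -> None:
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if not thresholds or not tp or not fn:
        raise ValueError("thresholds, tp and fn must not be empty")
    # Only compare the lists that were actually supplied.
    provided = [thresholds, tp, fn] + [x for x in (tn, fp) if x is not None]
    if len({len(x) for x in provided}) != 1:
        raise ValueError("all input lists must have the same length")
    # Assumption: thresholds are checked for negative values too.
    if any(value < 0 for values in provided for value in values):
        raise ValueError("input values must not be negative")


try:
    validate_inputs([0.1, 0.2], [5, 3], [1])
except ValueError as err:
    print(err)  # all input lists must have the same length
```

The real function may order its checks differently or use other messages; the sketch only shows that each documented condition maps to one explicit test.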

### Behavior

The function evaluates each threshold and calculates recall based on the provided TP and FN values. If both TP and FN are zero for a threshold, a warning is issued, and that threshold is skipped. The function also computes precision if FP and TN values are provided and updates the best threshold accordingly.

This function is robust against various edge cases, ensuring that users receive appropriate warnings when inputs are not valid for threshold calculations.


## Example Cases

### 1. Normal Case

In this example, we will demonstrate the typical usage of the `find_best_threshold` function with valid input data. This case represents a standard scenario where all input lists are filled with appropriate values.

#### Example:

We will use the following data:
- **Thresholds**: A list of potential thresholds for evaluation.
- **True Positives (TP)**: The count of true positive predictions for each threshold.
- **False Negatives (FN)**: The count of false negative predictions for each threshold.
- **True Negatives (TN)**: The count of true negative predictions for each threshold.
- **False Positives (FP)**: The count of false positive predictions for each threshold.

Here’s the sample data we will use:



In [2]:
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5]
tp = [50, 48, 45, 40, 35]
tn = [30, 32, 33, 40, 50]
fp = [10, 12, 14, 18, 20]
fn = [5, 6, 7, 8, 9]

best_threshold = find_best_threshold(thresholds, tp, fn)
print(f"Best Threshold: {best_threshold}")


Best Threshold: 0.1


#### Expected Output:
For the given input data, the function should return a threshold value based on the calculated recall and precision. The expected output is:
Best Threshold: 0.1

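This output indicates that 0.1 is the only threshold that meets the recall requirement: its recall is 50 / 55 ≈ 0.909, while at 0.2 recall already falls to 48 / 54 ≈ 0.889. Note that the call above passes only `thresholds`, `tp`, and `fn`, so precision plays no role in this particular result.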
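
To see why 0.1 is returned, the recall for each threshold can be recomputed by hand from the sample data. The snippet below is a minimal check that uses only the TP and FN lists from the example and the 0.9 requirement stated in the introduction; it does not call `find_best_threshold`.

```python
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5]
tp = [50, 48, 45, 40, 35]
fn = [5, 6, 7, 8, 9]

# Recall = TP / (TP + FN) for each threshold.
recalls = [t / (t + f) for t, f in zip(tp, fn)]
for threshold, recall in zip(thresholds, recalls):
    status = "meets" if recall >= 0.9 else "misses"
    print(f"threshold={threshold}: recall={recall:.3f} ({status} the 0.9 requirement)")

# Thresholds whose recall is at least 0.9.
qualifying = [th for th, r in zip(thresholds, recalls) if r >= 0.9]
print(qualifying)  # [0.1]
```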

## Corner Cases

### 1. All Inputs are Zero

This case tests the function's behavior when all inputs for true positives, false negatives, true negatives, and false positives are zero. 

#### Example:


In [3]:
thresholds = [0.1, 0.2]
tp = [0, 0]
fn = [0, 0]
tn = [0, 0]
fp = [0, 0]

best_threshold = find_best_threshold(thresholds, tp, fn)
print(f"Best Threshold: {best_threshold}")


Best Threshold: None




#### Expected Behavior:
The function should issue a warning indicating that both true positives and false negatives are zero, skip the affected thresholds, and return `None` because no threshold remains.
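
If the warning needs to be inspected programmatically, Python's standard `warnings` module can record it. The sketch below uses a hypothetical stand-in, `recall_or_none`, because the exact warning category and message emitted by `find_best_threshold` are not shown in this notebook.

```python
import warnings


def recall_or_none(tp: int, fn: int):
    """Hypothetical stand-in: warn and return None when recall is undefined."""
    if tp + fn == 0:
        warnings.warn("TP and FN are both zero; recall is undefined.")
        return None
    return tp / (tp + fn)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure repeated warnings are not suppressed
    results = [recall_or_none(tp, fn) for tp, fn in zip([0, 0], [0, 0])]

print(results)      # [None, None]
print(len(caught))  # 2
```

The same `catch_warnings(record=True)` pattern should work around a call to the real function, assuming it reports the condition through `warnings.warn`.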

### 2. Valid Inputs with Recall Below 0.9
This case tests the function's behavior when valid true positive and false negative counts yield a recall below 0.9.

#### Example:

In [4]:
thresholds = [0.1, 0.2, 0.3]
tp = [5, 2, 1]
fn = [10, 8, 9]
tn = [30, 32, 33]
fp = [5, 6, 7]

best_threshold = find_best_threshold(thresholds, tp, fn, tn, fp)
print(f"Best Threshold: {best_threshold}")


Best Threshold: None


#### Expected Behavior:
The function should return `None`, as none of the thresholds meet the recall requirement: the recalls are 5 / 15 ≈ 0.33, 2 / 10 = 0.20, and 1 / 10 = 0.10.

### 3. Mixed Values
This case tests how the function handles a mix of usable and unusable thresholds, including a threshold where both TP and FN are zero.

#### Example:

In [5]:
thresholds = [0.1, 0.2, 0.3]
tp = [0, 20, 30]
fn = [0, 2, 3]
tn = [5, 6, 7]
fp = [0, 0, 0]

best_threshold = find_best_threshold(thresholds, tp, fn)
print(f"Best Threshold: {best_threshold}")

Best Threshold: 0.2


#### Expected Behavior:
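The function should ignore the threshold where both TP and FN are zero and return the best valid threshold based on the remaining data. Here the thresholds 0.2 and 0.3 both reach a recall of about 0.909 (20 / 22 and 30 / 33), and because the call does not pass FP, the smaller of the two is returned.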

### 4. Missing Optional Inputs
This case tests the function's behavior when the optional inputs (TN and FP) are not provided.

#### Example:

In [6]:
thresholds = [0.1, 0.2, 0.3]
tp = [50, 48, 45]
fn = [5, 6, 7]

best_threshold = find_best_threshold(thresholds, tp, fn)
print(f"Best Threshold: {best_threshold}")


Best Threshold: 0.1


#### Expected Behavior:
In this scenario, the function will evaluate the provided thresholds based solely on the true positives (TP) and false negatives (FN) values. Since true negatives (TN) and false positives (FP) are not provided, the function will skip any precision calculations that require these inputs. However, it can still compute the recall for each threshold using the available TP and FN.

### How the Function Processes the Inputs

1. **Calculate Recall**: For each threshold, recall is computed using the formula:

   \[
   \text{Recall} = \frac{TP}{TP + FN}
   \]

2. **Check Recall Requirement**: The function checks if the recall is greater than or equal to 0.9. If it meets this condition, the threshold is added to the `valid_thresholds` list.

3. **Choosing the Best Threshold**:
   - If multiple valid thresholds exist in the `valid_thresholds` list and precision cannot be calculated due to missing TN and FP, the function defaults to choosing the minimum valid threshold.

4. **Rationale for Choosing Minimum Threshold**: Lowering the threshold classifies more instances as positive, which tends to increase true positives (TP) and reduce false negatives (FN). This is especially important in classification problems where catching as many positive instances as possible is prioritized.

If no valid thresholds meet the criteria, the function will return `None`, indicating that it was unable to find an appropriate threshold based on the provided data.
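
The steps above can be sketched in a few lines. This is an illustrative re-implementation under stated assumptions, not the code in `modul_1.functions`: the name `find_best_threshold_sketch` is hypothetical, input validation and the TN argument are left out, precision is taken as TP / (TP + FP) and used whenever FP is supplied, and ties in precision go to the smaller threshold.

```python
import warnings
from typing import Optional, Sequence

MIN_RECALL = 0.9  # recall requirement stated in this notebook


def find_best_threshold_sketch(
    thresholds: Sequence[float],
    tp: Sequence[int],
    fn: Sequence[int],
    fp: Optional[Sequence[int]] = None,
) -> Optional[float]:
    """Illustrative sketch only; the real implementation may differ."""
    candidates = []  # (threshold, precision or None) with recall >= MIN_RECALL
    for i, threshold in enumerate(thresholds):
        positives = tp[i] + fn[i]
        if positives == 0:
            # Recall is undefined (0/0): warn and skip this threshold.
            warnings.warn(f"TP and FN are both zero at threshold {threshold}; skipping.")
            continue
        if tp[i] / positives < MIN_RECALL:
            continue
        precision = None
        if fp is not None:
            predicted = tp[i] + fp[i]
            # Assumption: precision counts as 0.0 when nothing is predicted positive.
            precision = tp[i] / predicted if predicted else 0.0
        candidates.append((threshold, precision))

    if not candidates:
        return None
    if fp is None:
        # Without FP, fall back to the smallest qualifying threshold.
        return min(t for t, _ in candidates)
    # With FP, prefer the highest precision; assumption: ties go to the smaller threshold.
    return max(candidates, key=lambda c: (c[1], -c[0]))[0]


print(find_best_threshold_sketch([0.1, 0.2, 0.3], [50, 48, 45], [5, 6, 7]))  # 0.1
```

On the example data used in this notebook the sketch gives the same results as the recorded outputs, which suggests the assumptions are at least compatible with the documented behavior.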


### 5. Edge Case with Missing True Negatives (TN)
This case examines how the function handles the situation when true negatives are missing.

#### Example:

In [7]:
thresholds = [0.1, 0.2, 0.3]
tp = [20, 15, 10]
fn = [0, 2, 3]
# No true negatives provided
fp = [5, 10, 15]

best_threshold = find_best_threshold(thresholds, tp, fn, fp=fp)
print(f"Best Threshold: {best_threshold}")

Best Threshold: 0.1


#### Expected Behavior:
Even with missing TN, the function should still be able to compute recall and provide a valid threshold if the conditions are met.
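
By the standard definitions, neither recall nor precision involves TN, which is why the missing list does not prevent a result here. The snippet below recomputes both metrics from the example data; it is a hand calculation and does not show how `find_best_threshold` uses TN internally.

```python
tp = [20, 15, 10]
fn = [0, 2, 3]
fp = [5, 10, 15]

# Recall = TP / (TP + FN); precision = TP / (TP + FP). TN appears in neither.
recalls = [t / (t + f) for t, f in zip(tp, fn)]
precisions = [t / (t + f) for t, f in zip(tp, fp)]

for recall, precision in zip(recalls, precisions):
    print(f"recall={recall:.3f}, precision={precision:.3f}")
# recall=1.000, precision=0.800
# recall=0.882, precision=0.600
# recall=0.769, precision=0.400
```

Only the first threshold (0.1) reaches a recall of at least 0.9, which matches the output above.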

## Summary

In this notebook, we explored the functionality of the `find_best_threshold` function, which determines the optimal threshold for classification problems by evaluating true positives, false negatives, true negatives, and false positives. 

### Key Points:
- The function requires a recall of at least 0.9 and, among the thresholds that qualify, maximizes precision when applicable.
- It handles various edge cases, such as:
  - All inputs being zero.
  - Valid inputs resulting in recall below the required threshold.
  - Combinations of TP and FN where both are zero.
  - Missing optional inputs (TN and FP).

---

### Conclusions:
- The implementation of the function effectively addresses potential edge cases, ensuring robustness and flexibility in its application.
- Users can use this function to choose a classification threshold that satisfies a recall requirement while taking precision into account.

