Description
Problem Description
There are many ways, and many reasons, to perform data augmentation with synthetic data when building ML models. While we have some ML Efficacy metrics in beta, we'd like to create a suite of metrics that covers this use case more effectively. The `BinaryClassifierPrecisionEfficacy` metric will specifically measure whether synthetic data improves the precision of a binary classifier.
Expected behavior
This metric should be defined in the `data_augmentation` sub-module inside `single_table`.
```python
from sdmetrics.single_table.data_augmentation import BinaryClassifierPrecisionEfficacy

BinaryClassifierPrecisionEfficacy.compute_breakdown(
    real_training_data=real_df,
    synthetic_data=synthetic_df,
    real_validation_data=real_holdout_df,
    metadata=single_table_metadata_dict,
    prediction_column_name='covid_status',
    minority_class_label=1,
    classifier='XGBoost',
    fixed_recall_value=0.9
)
```
`compute_breakdown` API
- Args
  - `real_training_data` (pd.DataFrame) - A dataframe containing the real data that was used to train the synthesizer. The metric will use this data to train a binary classification model.
  - `synthetic_data` (pd.DataFrame) - A dataframe containing the synthetic data sampled from the synthesizer. The metric will also use this data to train the binary classification model.
  - `real_validation_data` (pd.DataFrame) - A dataframe containing a holdout set of real data. This data should not have been used to train the synthesizer. It will be used to evaluate the binary classification model.
  - `metadata` (dict) - The metadata dictionary describing the table of data.
  - `prediction_column_name` (str) - The name of the column to predict. It should be a categorical or boolean column.
  - `minority_class_label` (str/int/float) - The value in the prediction column that should be considered a positive result, from the perspective of binary classification. All other values in the column will be considered negative results.
  - `classifier` (str, optional) - The ML algorithm to use when building the binary classifier. The only supported option is 'XGBoost', which is also the default.
    - Note: As an MVP, we will only support XGBoost. Future feature requests may add support for additional algorithms.
  - `fixed_recall_value` (float, optional) - A float in the range (0, 1.0) describing the recall value to fix when building the binary classification model. Defaults to 0.9.
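To make the `minority_class_label` behavior concrete, here is a minimal sketch of how a multi-class prediction column could be binarized; the helper name and the `covid_status` values are illustrative, not part of the proposed API:

```python
import pandas as pd


def binarize_prediction_column(df, prediction_column_name, minority_class_label):
    """Map the prediction column to 1 for the minority class, 0 for all others."""
    return (df[prediction_column_name] == minority_class_label).astype(int)


# Labels 0 and 2 are both treated as negatives when the minority class is 1.
df = pd.DataFrame({'covid_status': [1, 0, 2, 1]})
labels = binarize_prediction_column(df, 'covid_status', 1)
```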
- Returns
  - A dictionary breakdown of the score, with the following information:
    - The score for the metric. This is the improvement in precision (from baseline -> augmented data) in percentage points: `score = max(0, augmented_precision_score - baseline_precision_score)`
    - The parameters used to run the metric
    - For each of the augmented data and the real data baseline:
      - The recall score achieved during training. This should be at least the requested value supplied as a parameter, but may not be exactly equal.
      - The actual recall score achieved on the validation (holdout) set.
      - The precision score achieved on the validation set.
      - The prediction counts achieved on the validation set (true positive, false positive, true negative, and false negative).
  - Expected dictionary output:

```python
{
    'score': 0.86,
    'augmented_data': {
        'recall_score_training': 0.950,
        'recall_score_validation': 0.912,
        'precision_score_validation': 0.84,
        'prediction_counts_validation': {
            'true_positive': 21,
            'false_positive': 4,
            'true_negative': 73,
            'false_negative': 3
        },
    },
    'real_data_baseline': {
        # keys are the same as the 'augmented_data' dictionary
    },
    'parameters': {
        'prediction_column_name': 'covid_status',
        'minority_class_label': 1,
        'classifier': 'XGBoost',
        'fixed_recall_value': 0.9
    }
}
```
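The scoring rule above clamps at zero so the metric never reports a negative "improvement"; a minimal sketch (the function name is illustrative):

```python
def compute_score(augmented_precision_score, baseline_precision_score):
    """Precision improvement from baseline to augmented data, clamped at 0."""
    return max(0.0, augmented_precision_score - baseline_precision_score)
```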
Algorithm
1. Concatenate the `real_training_data` and `synthetic_data` together.
2. Train a binary classification model on the data, using the selected classifier algorithm (default: XGBoost).
   a) Pre-process the data to turn discrete columns into continuous columns (note that we cannot use RDT, and should use scikit-learn methods instead).
   b) Pre-process the prediction column (`prediction_column_name`) to convert it into a boolean column with the correct positive/negative values. If multi-class, consider only the `minority_class_label` as the positive value; all other values will be considered negative.
3. Based on the parameters, fix the recall for the minority class.
   a) This requires finding the threshold that achieves a recall as close as possible to the fixed value. The classifier returns a continuous score for each data point in the training data, and we must find the threshold that achieves the recall closest to the fixed rate. Note that we should always choose a threshold that is as close as possible to the requested recall value but never less than it; that is, ensure that the training set recall is >= the requested recall value.
   b) Save this threshold to use on the validation data. This threshold is now a learned parameter alongside the classifier.
4. Apply the classifier to the `real_validation_data` and compute the statistics that we want to return.
5. Calculate the baseline: repeat steps 1-4, but this time use only the `real_training_data` (do not concatenate `synthetic_data`).
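Step 3 can be sketched as a threshold search over the positive-class scores. This is one possible approach, assuming distinct classifier scores; the real implementation may differ (e.g. using `sklearn.metrics.precision_recall_curve`):

```python
import numpy as np


def find_recall_threshold(y_true, y_scores, fixed_recall_value):
    """Find the highest threshold whose training recall is >= fixed_recall_value.

    Candidate thresholds are the positive-class scores sorted from highest to
    lowest: lowering the threshold only increases recall, so we scan down
    until the target is first reached, keeping recall >= the requested value.
    """
    positive_scores = np.sort(y_scores[y_true == 1])[::-1]
    n_positives = len(positive_scores)
    for k, threshold in enumerate(positive_scores, start=1):
        recall = k / n_positives  # positives with score >= threshold
        if recall >= fixed_recall_value:
            return threshold
    return positive_scores[-1]


# Toy example: 4 positives; the threshold 0.7 yields training recall 0.75.
y_true = np.array([1, 1, 1, 1, 0, 0])
y_scores = np.array([0.9, 0.8, 0.7, 0.2, 0.6, 0.1])
threshold = find_recall_threshold(y_true, y_scores, fixed_recall_value=0.75)
```

The returned threshold is then stored and applied unchanged to the validation scores (`validation_scores >= threshold`), as described in step 3b.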
compute
The `compute` method should take the same arguments as the `compute_breakdown` method, and should return just the overall `score` value calculated by `compute_breakdown`.
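The relationship between the two methods can be sketched as a thin wrapper; the stubbed `compute_breakdown` body below is a placeholder, not the real metric:

```python
class BinaryClassifierPrecisionEfficacy:
    """Sketch of how compute can delegate to compute_breakdown."""

    @classmethod
    def compute_breakdown(cls, **kwargs):
        # Stub standing in for the full breakdown described above.
        return {'score': 0.86, 'parameters': kwargs}

    @classmethod
    def compute(cls, **kwargs):
        # Same arguments as compute_breakdown; returns only the overall score.
        return cls.compute_breakdown(**kwargs)['score']
```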
Additional context
See this doc
There will be significant overlap of required pre-processing/helper functions between data augmentation metrics. When possible, general functionality should be abstracted into utility functions that can be reused across many metrics.