# <span>Binary Classifier Notebook</span>

## 30 Day Readmission Risk for Patients with Diabetes

The main objective of this notebook is to show how to use the seismometer package to analyze a binary classification predictive model.

Here we consider a binary classification model trained on the [Diabetes Dataset](https://archive.ics.uci.edu/dataset/296/diabetes+130-us+hospitals+for+years+1999-2008). This model predicts the risk of readmission within 30 days for patients with diabetes. It is a simple LightGBM model, used only to illustrate how the seismometer package can be set up and used.

We have already prepared the data by performing basic preprocessing on the Diabetes Dataset. The prepared data is used both to train the model and to analyze its performance.

## Usage
Explore data from your organization's model including predictions, outcomes, interventions, and sensitive cohorts. Here we use the diabetes data and a LightGBM model.
Use `sm.show_info()` to explore what is available.

In [None]:
# Download dataset
import os
import urllib.request

DATASET_SOURCE = "https://raw.githubusercontent.com/epic-open-source/seismometer-data/main"
# Make sure the target directory exists; urlretrieve does not create missing directories
os.makedirs("data", exist_ok=True)
_ = urllib.request.urlretrieve(f"{DATASET_SOURCE}/diabetes/predictions.parquet", "data/predictions.parquet")
_ = urllib.request.urlretrieve(f"{DATASET_SOURCE}/diabetes/events.parquet", "data/events.parquet")
_ = urllib.request.urlretrieve(f"{DATASET_SOURCE}/diabetes/data_dictionary.yml", "dictionary.yml")

In [None]:
%matplotlib inline
import seismometer as sm
sm.run_startup(config_path='.')

In [None]:
sm.show_info(plot_help=True)

## Overview

#### ℹ Info

We are using a basic LightGBM model trained on the [Diabetes Dataset](https://archive.ics.uci.edu/dataset/296/diabetes+130-us+hospitals+for+years+1999-2008) to predict whether a patient with diabetes will be readmitted within 30 days of discharge. The first step is to provide the required information, which includes configuration files, predictions data, and events data (interventions, outcomes, or target events). Datasets should be stored in the _parquet_ format. 

The _seismometer_ package pulls configuration from the _config.yml_ file. This file stores:  
<ol>
<li>the filepath to the predictions dataframe in parquet format,</li>
<li>the filepath to the events dataframe in parquet format,</li>
<li>the filepath to usage configuration that describes how to interpret data during run,</li>
<li>the filepath to events definitions, that specify events,</li>
<li>the filepath to predictions definitions, that specify cohorts, scores, and features to consider.</li>
</ol>

We have created: 
<ol>
<li>the predictions dataset, where each row is a patient/encounter and columns are input features, a patient identifier, the time of the prediction, and a score column corresponding to the output of the trained LightGBM model,</li>
<li>the events dataset, where each row corresponds to a target, intervention, or outcome. Here we have defined only one event for the model: the training target (<i>y</i>). The dataset also includes the patient identifier, the time of the event, the type of the event (relevant when there are multiple events), and the event's value (in this example, a 1 indicates that a readmission occurred within 30 days),</li>
<li>the usage configuration: through the data_usage node in usage_config.yml, we have specified:
<ol>
<li>age, race and gender as the analysis cohort attributes,</li>
<li>the LightGBM model as the primary output (score),</li>
<li>the 30-day readmission event (<i>readmitted</i> column) as the primary target,</li>
<li>admission_type_id, num_medications and num_procedures as the only extra features to consider in feature analysis.</li>
</ol>
</li>
</ol>

**Tips:** 

- Run the cell below to create an `ipywidget` selector. Selectors are linked across notebook cells and will dynamically update visualizations and reports throughout the notebook. You can also call `sm.cohort_selector()` in a new notebook cell to create the same selector elsewhere.  
- Use the selector to stratify certain plots and reports across cohorts. By default the `age` cohort is selected, and _all_ age groups are selected.

#### Selection

You can specify the sensitive cohorts for a more detailed study in usage_config.yml via the _Cohorts_ keyword. As mentioned above, we consider three cohort attributes:
<ol>
<li>age: the age group of the patient. Age groups are [0,10), [10,20), [20,50), [50,70) and 70+.</li>
<li>race: the self-reported race of the patient. Race cohorts are 'Caucasian', 'AfricanAmerican', 'Hispanic', 'Asian', 'Other', 'Unknown'.</li>
<li>gender: the self-reported gender of the patient. Gender cohorts are 'Female', 'Male'.</li>
</ol>
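
To make the half-open age intervals above concrete, here is a minimal sketch of how an age could be mapped to one of those groups. It only illustrates the interval boundaries; it is not how seismometer applies the configuration, and `AGE_EDGES`, `AGE_LABELS`, and `age_group` are hypothetical names introduced for this example.

```python
from bisect import bisect_right

# Lower edges of the age groups listed above; the last group (70+) is open-ended.
# These names are illustrative and are not part of seismometer.
AGE_EDGES = [0, 10, 20, 50, 70]
AGE_LABELS = ["[0,10)", "[10,20)", "[20,50)", "[50,70)", "70+"]


def age_group(age: float) -> str:
    """Return the label of the half-open age interval that contains `age`."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts the lower edges that are <= age, so subtracting 1
    # gives the index of the interval with the largest lower edge <= age.
    return AGE_LABELS[bisect_right(AGE_EDGES, age) - 1]


print([age_group(a) for a in (5, 10, 49, 50, 70, 93)])
```

Because the intervals are closed on the left, a patient aged exactly 50 falls in [50,70) rather than [20,50).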

In [None]:
sm.cohort_selector()

## Feature Monitor

#### ℹ Info

**Tips**: 

- This section provides insight into model inputs, demographics, and the set of interventions and outcomes. During early stages this will help validate configuration; afterwards, it will assist with detecting feature and population drift. Read through the alerts identified for your data, dig deeper using the feature, demographic, and event summaries, or by comparing across targets or demographics.
- **Other Warnings:** The variable profiles below will identify any concerning trends in feature distributions. Depending on the model, you may want to do additional configuration to silence these alerts until certain thresholds are met. 
- Run the `sm.*()` functions in the cells below to get a report for the corresponding dataset.

#### Reports

##### Feature Alerts
View automatically identified data quality issues for the model inputs in your dataset.

In [None]:
sm.feature_alerts()

##### Feature Summary Statistics and Plots
View the summary statistics and distributions for the model inputs in your dataset. 

In [None]:
sm.feature_summary()

##### Summarize Features by Cohort Subgroup
Run `sm.cohort_comparison_report()`, select a cohort and two distinct sets of subgroups, then click "Generate Report" to get a link to a breakdown of your features stratified by the different subgroups. 

In [None]:
sm.cohort_comparison_report()

##### Summarize Features by Target
Run `sm.target_feature_summary()` to get a link to a breakdown of your features stratified by the different target values. Run `sm.target_feature_summary(inline=True)` to show the report inline.

In this example, we have a single target of interest: the 'readmitted' column from the original dataset.

In [None]:
sm.target_feature_summary()

## Model Performance

#### Overall

#### ℹ Info

**Model Performance Plots**

1. (Top-left) The ROC curve, or receiver operating characteristic curve, demonstrates a model's performance across all thresholds. The curve shows the sensitivity and specificity across all possible thresholds for the model. 
    - This plot can help you assess, both in aggregate and at specific thresholds, how often the model correctly identifies positive cases and negative cases. The AUROC, or C-statistic, is the area under the ROC curve and gives a single measure of how well the model performs across thresholds.
2. (Top-middle) This curve plots the sensitivity and flag rate across all possible thresholds for the model. 
    - This plot can help you determine how frequently your model would trigger workflow interventions at different thresholds, and how many of those interventions would be taken for true positive cases. The highlighted area above the curve indicates how many true positives would be missed at a given threshold.
3. (Top-right) The calibration curve is a measure of how reliable a model is in its predictions at a given threshold. 
    - This plots the observed rate (what proportion of cases at that threshold are true positives) against the model's predicted probability.
    - Points above the `y=x` line indicate that the observed rate is higher than the predicted probability, so the model is *underconfident* in its predictions (i.e., predicting fewer positive cases than occur); points below the `y=x` line indicate that the model is *overconfident* in its predictions (i.e., predicting more positive cases than occur).
4. (Bottom-left) The PR curve, or precision-recall curve, shows the tradeoff between precision and recall for different thresholds. 
    - The curve shows precision and recall across all possible thresholds for the model. This plot can help you assess the tradeoff between identifying more positive cases and correctly identifying positive cases. 
5. (Bottom-middle) This plot shows the sensitivity, specificity, and PPV across all possible thresholds for a model. 
    - This plot is useful for identifying thresholds where your model has high enough specificity, sensitivity, and PPV for your intended workflows.
6. (Bottom-right) This plot shows the predicted probabilities for entities in the dataset stratified by whether or not they met the target criteria. 
    - This plot can help you identify thresholds where your model will correctly identify enough of the true positives without flagging too many of the true negatives.
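
The quantities plotted above (flag rate, sensitivity, specificity, and PPV) all derive from the confusion matrix at a single threshold. The following is a minimal sketch of that calculation on toy data. It assumes a case is flagged when its score is at or above the threshold, and `threshold_metrics` is a hypothetical helper, not a seismometer function.

```python
def threshold_metrics(scores, labels, threshold):
    """Compute flag rate, sensitivity, specificity, and PPV at one threshold.

    Assumption of this sketch: a case is flagged when score >= threshold.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Return None instead of dividing by zero when a rate is undefined.
        return numerator / denominator if denominator else None

    total = tp + fp + tn + fn
    return {
        "flag_rate": ratio(tp + fp, total),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Toy data for illustration only (not taken from the diabetes dataset).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 1, 0]
print(threshold_metrics(scores, labels, threshold=0.5))
```

Sweeping `threshold` over the range of scores and plotting the resulting values would trace out curves of the kind described in items 2 and 5.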

**Tips:**

 - Thresholds configured for the model are highlighted on the graphs.
 - Use `sm.model_evaluation()` to get model evaluation plots for your model at the observation level.
 - Use `sm.model_evaluation(per_context_id=True)` to get model evaluation plots for your model at the encounter level.

#### Visuals

In [None]:
sm.model_evaluation()

In [None]:
sm.model_evaluation(per_context_id=True)

## Fairness Audit
Decisions about which fairness metrics apply to a model should be made with deep knowledge of how the model is used, and the results should be interpreted carefully.  Even when a metric is flagged as failing, there may be context that explains, or even predicts, the difference; a failure generally warrants attention but does not necessarily expose a problem.

It is mathematically impossible to ensure parity across many definitions simultaneously, so a common strategy is to focus on a predetermined set while staying aware of the others.  
Like many concepts, a single parity concept can have several different names; notably, parity of true positive rate is equal opportunity, parity of false positive rate is predictive equality, and parity of predicted prevalence is demographic parity.

An [Aequitas audit](https://dssg.github.io/aequitas/) gives an overview of parity across all defined groups for each cohort attribute.  By default, the majority group is the baseline, and a statistic computed over all observations in each of the other groups is compared against it. A fairness threshold such as 125% is then used to classify the ratio of each group to the reference.  If a group's ratio falls above the threshold (125% in our example) or below its reciprocal (80%), it is considered a failure for that cohort and metric.  
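
The pass/fail rule described above can be sketched in a few lines. This is an illustration of the ratio test only, with made-up metric values; it is not the Aequitas implementation, which may treat boundary cases differently, and `parity_check` is a hypothetical name.

```python
def parity_check(metric_by_group, reference_group, fairness_threshold=1.25):
    """Classify each group's metric ratio against the reference group.

    A group passes when 1/fairness_threshold <= ratio <= fairness_threshold
    (boundary handling is an assumption of this sketch).
    """
    reference = metric_by_group[reference_group]
    if reference == 0:
        raise ValueError("reference metric is zero; ratios are undefined")
    lower, upper = 1 / fairness_threshold, fairness_threshold
    results = {}
    for group, value in metric_by_group.items():
        ratio = value / reference
        results[group] = {"ratio": ratio, "passes": lower <= ratio <= upper}
    return results


# Made-up true positive rates per group, for illustration only.
tpr_by_group = {"A": 0.60, "B": 0.50, "C": 0.80}
for group, outcome in parity_check(tpr_by_group, reference_group="A").items():
    print(group, round(outcome["ratio"], 3), outcome["passes"])
```

With a threshold of 1.25, the acceptable band is 0.8 to 1.25, so in this toy example group B (ratio about 0.83) passes and group C (ratio about 1.33) fails.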

The visualization is a table showing the overall pass/fail, an ordered list of circles representing the groups, and a bar representing the percentage of the population data in reference.  Note that comparison across columns is not always exact due to potential differences in the included observations from missing information. Hovering over a bar or dot will give details on the group and metric.


In [None]:
sm.fairness_audit(metric_list=['tpr', 'fpr', 'pprev', 'precision'], fairness_threshold=1.25)

## Cohort Analysis 

#### ℹ Info

**Model Performance Plots**:
This section has general distribution plots for cohorts, crosstabs, and the `sm.model_evaluation()` performance plots stratified by cohort.

1. (Top-left) The ROC curve, or receiver operating characteristic curve, demonstrates a model's performance across all thresholds. The curve shows the sensitivity and specificity across all possible thresholds for the model. 
    - This plot can help you assess, both in aggregate and at specific thresholds, how often the model correctly identifies positive cases and negative cases. The AUROC, or C-statistic, is the area under the ROC curve and gives a single measure of how well the model performs across thresholds.
2. (Top-middle) This curve plots the sensitivity and flag rate across all possible thresholds for the model. 
    - This plot can help you determine how frequently your model would trigger workflow interventions at different thresholds, and how many of those interventions would be taken for true positive cases. The highlighted area above the curve indicates how many true positives would be missed at a given threshold.
3. (Top-right) The calibration curve is a measure of how reliable a model is in its predictions at a given threshold. 
    - This plots the observed rate (what proportion of cases at that threshold are true positives) against the model's predicted probability.
    - Points above the `y=x` line indicate that the observed rate is higher than the predicted probability, so the model is *underconfident* in its predictions (i.e., predicting fewer positive cases than occur); points below the `y=x` line indicate that the model is *overconfident* in its predictions (i.e., predicting more positive cases than occur).
4. (Bottom-left) The PR curve, or precision-recall curve, shows the tradeoff between precision and recall for different thresholds. 
    - The curve shows precision and recall across all possible thresholds for the model. This plot can help you assess the tradeoff between identifying more positive cases and correctly identifying positive cases. 
5. (Bottom-middle) This plot shows the sensitivity, specificity, and PPV across all possible thresholds for a model. 
    - This plot is useful for identifying thresholds where your model has high enough specificity, sensitivity, and PPV for your intended workflows.
6. (Bottom-right) This plot shows the predicted probabilities for entities in the dataset stratified by whether or not they met the target criteria. 
    - This plot can help you identify thresholds where your model will correctly identify enough of the true positives without flagging too many of the true negatives.

**Tips:**

 - Thresholds configured for the model are highlighted on the graphs.
 - Use `sm.cohort_evaluation()` to get cohort evaluation plots for your cohort at the observation level.
 - Use `sm.cohort_evaluation(per_context_id=True)` to get cohort evaluation plots for your cohort at the encounter level.
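
Stratifying a performance metric by cohort amounts to computing it separately for each group. The sketch below does this for sensitivity on toy data. It is only meant to convey the idea behind the stratified plots, not to reproduce what `sm.cohort_evaluation()` computes, and `sensitivity_by_cohort` is a hypothetical helper.

```python
from collections import defaultdict


def sensitivity_by_cohort(rows, threshold):
    """Return sensitivity (true positive rate) per cohort group at a threshold.

    Each row is a (group, score, label) tuple; label 1 marks a positive case.
    Assumption of this sketch: a case is flagged when score >= threshold.
    """
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for group, score, label in rows:
        if label == 1:
            positives[group] += 1
            if score >= threshold:
                true_positives[group] += 1
    # Groups with no positive cases are omitted because sensitivity is undefined.
    return {group: true_positives[group] / count for group, count in positives.items()}


# Toy rows for illustration only (not taken from the diabetes dataset).
rows = [
    ("Female", 0.9, 1),
    ("Female", 0.2, 1),
    ("Male", 0.7, 1),
    ("Male", 0.6, 1),
    ("Male", 0.1, 0),
]
print(sensitivity_by_cohort(rows, threshold=0.5))
```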

#### Visuals

In [None]:
sm.show_cohort_summaries()

In [None]:
sm.cohort_selector()

In [None]:
sm.cohort_evaluation()

In [None]:
sm.cohort_evaluation(per_context_id=True)

## Lead-time Analysis 

#### ℹ Info

Lead-time analysis focuses on revealing how much time a high prediction gives before an event of interest.  These analyses implicitly restrict data to the positive cohort, as that is where the event is expected to occur.
The visualization uses standard box-and-whisker plots: a vertical line marks each quartile of the subpopulation, and the inner box represents the inner quartiles with the mean.   When the cohorts overlap significantly, this indicates that the model provides equal opportunity for action to be taken based on the scores across the cohort groups.
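
As a rough illustration of what a lead time and its quartiles look like numerically, here is a minimal sketch. It assumes lead time is measured from the earliest prediction at or above the threshold to the event, which may differ from how `sm.plot_leadtime_enc()` defines it. `lead_time_hours` and the sample values are hypothetical.

```python
from datetime import datetime
from statistics import quantiles


def lead_time_hours(prediction_times, event_time):
    """Hours between the earliest qualifying prediction and the event.

    `prediction_times` holds timestamps of predictions at or above the
    threshold; returns None when none of them precede the event.
    """
    earlier = [t for t in prediction_times if t <= event_time]
    if not earlier:
        return None
    return (event_time - min(earlier)).total_seconds() / 3600


# One hypothetical encounter: two high predictions before the event.
print(lead_time_hours(
    [datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 10)],
    datetime(2024, 1, 1, 12),
))

# Made-up lead times (hours) for one cohort group; the three cut points
# are the quartiles a box plot would draw.
lead_times = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
q1, median, q3 = quantiles(lead_times, n=4, method="inclusive")
print(q1, median, q3)
```

Comparing these quartiles across cohort groups is the numerical counterpart of checking whether the boxes overlap.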

#### Visuals

In [None]:
lead=sm.plot_leadtime_enc()

## Add Your Own Analysis

You can also incorporate other packages to create your own analyses. Here, we use the seaborn package to create a heatmap of average score across different age groups and procedure counts.

`sm.Seismogram().dataframe` is a pandas DataFrame with merged predictions and events data.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

sg = sm.Seismogram()
# Keep only the columns needed for the heatmap
heat = sg.dataframe[["num_procedures", "age", "LGBM_score"]]
# Average model score for each (num_procedures, age group) pair
heat = heat.groupby(["num_procedures", "age"], observed=False)["LGBM_score"].mean().reset_index()
# Order procedure counts from 6 down to 0 so the highest count is the top row
heat["num_procedures"] = heat["num_procedures"].astype("category").cat.reorder_categories([6, 5, 4, 3, 2, 1, 0])
hm = sns.heatmap(data=heat.pivot(index="num_procedures", columns="age", values="LGBM_score"))

# Display the plotted heatmap
plt.show()