# <img style="float: left; padding-right: 10px; width: 45px" src="https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/iacs.png"> Predicting Hospital Readmission Rates for Diabetes
## CS109a: Introduction to Data Science



**Harvard University**<br/>
**Fall 2023**<br/>
**Team members**: Karim Gowani, Ryan McGillicuddy, Yaseen Mohmand, Steven Worthington


<hr style="height:2pt">

# Introduction

## i. About This Notebook

This notebook summarizes a series of exploratory analyses in which we attempt various approaches to classify X-ray images into 14 disease classes and a 'no finding' class. For an overview of the analytical sections of this notebook, please see the [Notebook Contents](#Notebook-Contents) index listed below this Introduction.

### i.i. A note about supporting notebooks

The code and results reported in this notebook are only a summary of the work completed for this project; they represent the final form of our analyses. Supplemental notebooks containing the auxiliary EDA, data cleaning, and model exploration behind this report can be found in the **`notebooks/`** directory of the **[GitHub project repository](https://github.com/liujinjie111/chestXray)**. Those notebooks are not designed to be run in any particular order to reproduce the results shown here; they document additional work and experiments that are not shown in the final report.

## ii. Research Question

After initial exploration and cleaning of the data, we have focused our efforts on the following research question:

Which model architecture performs best for out-of-sample classification of the X-ray images into the 14 disease classes and 'no finding' class?

## iii. Summary of Findings

We found this project to be both interesting and quite challenging. We created an analysis pipeline using TF datasets that was efficient enough for us to experiment with many different modeling architectures in a short period of time. This included downsampling the majority image label classes to make modeling more tractable. We incorporated data augmentation steps into the model itself so that they use GPU, rather than CPU, cycles. This reduced the computational burden and time cost of preprocessing and allowed us to devote more time to exploring different modeling approaches.


<a name="Notebook-Contents"></a>
# Notebook Contents

[Introduction](#Introduction)

[Setup](#Setup)

**[1. Data Source and Composition](#1.-Data-Source-and-Composition)**

- [1.1. Data Source and Substantive Context](#1.1.-Data-Source-and-Substantive-Context)

- [1.2. Data Granularity](#1.2.-Data-Granularity)

- [1.3. Class Imbalance](#1.3.-Class-Imbalance)
 
- [1.4. Missing Observations](#1.4.-Missing-Observations)
  
**[2. Exploratory Data Analysis and Preprocessing](#2.-Exploratory-Data-Analysis-and-Preprocessing)**

- [2.1. Exploratory Data Analysis of Raw Data](#2.1.-Exploratory-Data-Analysis-of-Raw-Data)

- [2.2. Data Preprocessing](#2.2.-Data-Preprocessing)

- [2.3. Data Partitioning](#2.3.-Data-Partitioning)
 
- [2.4. Exploratory Data Analysis of Cleaned Data](#2.4.-Exploratory-Data-Analysis-of-Cleaned-Data)

- [2.5. Summary of EDA Key Findings](#2.5.-Summary-of-EDA-Key-Findings)

**[3. Research Questions](#3.-Research-Questions)**

**[4. Modeling Pipeline and Training](#4.-Modeling-Pipeline-and-Training)**

- [4.1. Candidate Models](#4.1.-Candidate-Models)

- [4.2. Hyperparameter Tuning Settings](#4.2.-Hyperparameter-Tuning-Settings)

- [4.3. Performance Metrics](#4.3.-Performance-Metrics)

- [4.4. Resampling Scheme](#4.4.-Resampling-Scheme)

- [4.5. Model Training](#4.5.-Model-Training)

**[5. Model Selection and Evaluation](#5.-Model-Selection-and-Evaluation)**

- [5.1. Model Selection](#5.1.-Model-Selection)

- [5.2. Best Model Performance](#5.2.-Best-Model-Performance)

- [5.3. Variable Importance](#5.3.-Variable-Importance)

**[6. Conclusions](#6.-Conclusions)**

- [6.1. Patient Early Readmittance Rate](#6.1.-Patient-Early-Readmittance-Rate)

- [6.2. Patient Risk Profiles](#6.2.-Patient-Risk-Profiles)

**[7. Future Work](#7.-Future-Work)**

**[8. References](#8.-References)**

# Setup

[Return to top](#Notebook-Contents)

The following sections include general setup code for:
1. Importing the packages needed for data preparation, modeling, and visualization
2. Setting pseudo-random number seeds for reproducibility

### Import packages

In [None]:
# Import libraries
import os
import time
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.linear_model import LinearRegression, LogisticRegression, LogisticRegressionCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.metrics import r2_score, confusion_matrix, classification_report, roc_curve
from sklearn.metrics import roc_auc_score, precision_recall_curve, average_precision_score
import warnings
warnings.filterwarnings("ignore")
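# 'seaborn-notebook' was renamed 'seaborn-v0_8-notebook' in matplotlib 3.6; use whichever is available
plt.style.use('seaborn-v0_8-notebook' if 'seaborn-v0_8-notebook' in plt.style.available else 'seaborn-notebook')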
sns.set_style('darkgrid')
# pandas tricks for better display
pd.options.display.max_columns = 50  
pd.options.display.max_rows = 500     
pd.options.display.max_colwidth = 100
pd.options.display.precision = 3

### Set RNG seeds

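In [None]:
# Ensure replicable results
import os
import random as rn

# Set environment variables before TensorFlow is imported so that they take effect
os.environ['PYTHONHASHSEED'] = '0'
os.environ['CUDA_VISIBLE_DEVICES'] = ''  # hide GPUs from TensorFlow

import numpy as np
import tensorflow as tf

# Seed every source of pseudo-randomness with the same value
SEED = 109
rn.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)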

<a name="1.-Data-Source-and-Composition"></a>
# 1. Data Source and Composition

[Return to top](#Notebook-Contents)

<a name="1.1.-Data-Source-and-Substantive-Context"></a>
## 1.1 Data Source and Substantive Context

[Return to top](#Notebook-Contents)

Our dataset comes from the UC Irvine ML Repository and contains records of patients diagnosed with diabetes at 130 US hospitals from 1999 through 2008. It has ~102K records, a binary target variable, and 47 features, the majority of which are categorical.

<a name="1.2.-Data-Granularity"></a>
## 1.2 Data Granularity

[Return to top](#Notebook-Contents)

In a clinical setting, doctors and medical staff would like to answer the question, “given information from the current and previous hospitalizations, how likely is it for this patient to be readmitted to hospital early (within 30 days)?”. This question is inherently at the patient-level, but each record in the dataset is at the level of an ‘encounter’, which represents a patient hospitalization event (rather than an outpatient visit). A subset of 16.5% of patients have multiple encounters.

A patient-level perspective is more likely to be of benefit to clinicians, since answering the above question will help medical personnel prioritize follow-ups and interventions through the creation of patient risk profiles, which can identify patients at the highest risk level for early hospital readmittance. This information is actionable and can be used to mitigate negative health outcomes for these patients as well as increased costs for the hospital and insurance carrier. We will therefore aggregate data from the encounter-level to the patient-level.

For those patients with multiple encounters, however, features that vary at the encounter-level contain important information that we do not wish to discard. For example, if a patient was readmitted early relative to the immediately preceding encounter, it is perhaps more likely that the patient will be readmitted to hospital early again after the current encounter. Therefore, our strategy will be to select only the final encounter for these patients and create several new derived features that encapsulate the history of their previous encounters. Such features will include, but are not limited to, the number of previous inpatient encounters, whether the last encounter resulted in early readmission, and whether the patient ever had a high value of A1c.

In following this approach, we will have to make a (reasonable) assumption that encounters for each patient are in temporal order in the dataset because no explicit date information is provided.
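The aggregation strategy described above can be sketched with pandas as follows. This is a minimal illustration on toy data, not our final preprocessing code: the column names (`patient_id`, `encounter_id`, `early_readmit`, `a1c_high`) are hypothetical stand-ins, and the rows are assumed to already be in temporal order within each patient.

```python
import pandas as pd

# Toy encounter-level data; all column names are assumptions for illustration
encounters = pd.DataFrame({
    "patient_id":    [1, 1, 1, 2, 3, 3],
    "encounter_id":  [10, 11, 12, 20, 30, 31],
    "early_readmit": [0, 1, 0, 0, 1, 0],  # 1 = readmitted within 30 days
    "a1c_high":      [0, 1, 0, 0, 0, 0],  # 1 = high A1c at this encounter
})

# Rows are assumed to be in temporal order within each patient
grouped = encounters.groupby("patient_id", sort=False)

# Number of encounters that precede the current one
encounters["n_prev_encounters"] = grouped.cumcount()

# Outcome of the immediately preceding encounter (0 when there is none)
encounters["prev_early_readmit"] = (
    grouped["early_readmit"].shift(1).fillna(0).astype(int)
)

# Whether any encounter up to and including this one had a high A1c
encounters["ever_a1c_high"] = grouped["a1c_high"].cummax()

# Keep only the final encounter of each patient
patients = encounters.drop_duplicates("patient_id", keep="last").reset_index(drop=True)
print(patients)
```

Computing the history features before dropping rows means the final encounter carries a summary of everything that preceded it.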

<a name="1.3.-Class-Imbalance"></a>
## 1.3 Class Imbalance

[Return to top](#Notebook-Contents)

About 11% of encounters belong to the positive class (readmitted within 30 days), so while there is imbalance, it is not severe. Although our performance metric of interest, AUC, is robust to class imbalance, we will still try to address this issue in several ways: stratified sampling in the train/test splits, the standard techniques of undersampling and oversampling, and the class weights built into the ML models of interest, including Logistic Regression, CART, Random Forest, and XGBoost.
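As a rough sketch of two of these techniques, the example below uses scikit-learn on synthetic data with about 11% positives (a stand-in, not the real dataset). It shows a stratified train/test split and a logistic regression with built-in class weights; the feature matrix and the 80/20 split ratio are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(109)

# Synthetic stand-in data with roughly 11% positives (not the real dataset)
n = 2000
X = rng.normal(size=(n, 5))
y = (rng.random(n) < 0.11).astype(int)

# stratify=y keeps the positive rate (almost) identical in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=109
)
print(round(y_train.mean(), 3), round(y_test.mean(), 3))

# class_weight='balanced' weights observations inversely to class frequency
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)
```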

<a name="1.4.-Missing-Observations"></a>
## 1.4 Missing Observations

[Return to top](#Notebook-Contents)

Only 7 (out of 47) relevant columns contain missing values:
- weight is missing ~97% of its values, so this column can be safely dropped; no other numerical column has missing values.
- medical specialty is missing ~49% of its values, but may be relevant to the classification task, so we keep it and fill the missing values with 'unknown'.
- payer code (insurance carrier) is missing ~40% of its values and does not seem to be relevant to the target, so it is a candidate for being eliminated altogether.

The remaining columns each have less than 3% of their values missing, so they are manageable. They are all categorical, including race and diagnosis codes. As is common for categorical variables, we will fill the missing values with 'unknown'.
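The handling of missing values described above might look like the following in pandas. This is a sketch on toy data; the column names and values are hypothetical and only mirror the columns mentioned in the text.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are assumptions for illustration
df = pd.DataFrame({
    "weight": [np.nan, np.nan, np.nan, "[75-100)"],
    "medical_specialty": ["Cardiology", np.nan, np.nan, "InternalMedicine"],
    "race": ["Caucasian", "AfricanAmerican", np.nan, "Asian"],
})

# Share of missing values per column
print(df.isna().mean())

# Drop the column that is almost entirely missing
df = df.drop(columns=["weight"])

# Fill the remaining missing categorical values with an explicit 'unknown' level
df = df.fillna("unknown")
print(df)
```

Keeping 'unknown' as its own level lets a model learn from the fact that a value was missing instead of discarding those rows.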

<a name="2.-Exploratory-Data-Analysis-and-Preprocessing"></a>
# 2. Exploratory Data Analysis and Preprocessing

[Return to top](#Notebook-Contents)

<a name="2.1.-Exploratory-Data-Analysis-of-Raw-Data"></a>
## 2.1 Exploratory Data Analysis of Raw Data

[Return to top](#Notebook-Contents)

### Raw Diabetes Data

<a name="2.2.-Data-Preprocessing"></a>
## 2.2 Data Preprocessing

[Return to top](#Notebook-Contents)

### Overall Strategy

Several of the categorical variables have many categories that can be collapsed to reduce dimensionality:
- At the most extreme, the first diagnosis column has 716 codes, only 23 of which represent more than 1% of the observations; the second and third diagnosis variables are similar. These diagnosis codes should be grouped into types such as circulatory (codes 390-459), respiratory (codes 460-519), digestive (codes 520-579), etc.
- Admission type code has 8 categories, 3 of which make up less than 1% of observations and can be safely collapsed.
- Medical specialty has 72 categories, but only 9 represent more than 1% of the observations.
- Age buckets can be consolidated: currently, each bucket spans only 10 years, and fewer than 1% of the observations fall into age < 20 and age > 90, for instance.

Furthermore, patients who were discharged with codes such as expired, hospice, or transferred to another institution as an inpatient should be filtered out, because these discharge codes are of no practical relevance for predicting the target of early readmission. Trivially, encounter ID and patient ID are mere identifiers and should not be used as model features.
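The two collapsing steps in the strategy above can be sketched as small helper functions. Both `diagnosis_group` and `collapse_rare` are hypothetical names introduced here for illustration. The code ranges are the three listed above, with every other code falling into a catch-all level, and the 1% cutoff follows the threshold used in the text.

```python
import pandas as pd

def diagnosis_group(code):
    """Map a diagnosis code to a coarse type using the ranges listed above."""
    try:
        value = float(code)
    except (TypeError, ValueError):
        # Non-numeric or missing codes fall into a catch-all level
        return "other"
    if 390 <= value < 460:
        return "circulatory"
    if 460 <= value < 520:
        return "respiratory"
    if 520 <= value < 580:
        return "digestive"
    return "other"

def collapse_rare(series, min_share=0.01, other="other"):
    """Replace categories that cover less than min_share of rows with one level."""
    shares = series.value_counts(normalize=True)
    rare = shares[shares < min_share].index
    return series.where(~series.isin(rare), other)

# Toy inputs for illustration only
codes = pd.Series(["428", "486", "562", "250.83", "V57", None])
groups = codes.map(diagnosis_group)
print(groups.tolist())

specialty = pd.Series(["a"] * 150 + ["b"] * 49 + ["c"] * 1)
collapsed = collapse_rare(specialty)
print(collapsed.value_counts())
```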

### One-hot encode labels

**Findings**

There are a total of 112,120 unique image filenames in the meta-data.

**Findings**

The majority of patients (\~54%) have no evidence of disease, while \~28% have been diagnosed with a single disease, and \~19% have been diagnosed with multiple diseases.

**Findings**

While the majority of X-ray images show either no disease finding (\~60K) or a single disease (\~30K), there is a long right tail to the distribution. Some images have as many as 9 out of the possible 14 disease labels.


**Findings**

Occurrence differs dramatically among diseases. Several diseases (e.g., hernia, pneumonia, fibrosis) have only a few hundred occurrences in the X-ray images, while others (e.g., infiltration, effusion, atelectasis) have over 10,000 occurrences. This means that the data exhibit extreme class imbalance.

<a name="2.3.-Data-Partitioning"></a>
## 2.3 Data Partitioning

[Return to top](#Notebook-Contents)

<a name="2.4.-Exploratory-Data-Analysis-of-Cleaned-Data"></a>
## 2.4 Exploratory Data Analysis of Cleaned Data

[Return to top](#Notebook-Contents)

### Disease occurrence

### Disease distribution

### Commonly occurring diseases

### Disease correlations (co-morbidity)

<a name="2.5.-Summary-of-EDA-Key-Findings"></a>
## 2.5 Summary of EDA Key Findings

[Return to top](#Notebook-Contents)

After exploring the image data and the disease class labels, we have identified 5 major issues that will need to be addressed during data pre-processing and analysis.

1. Placeholder1
2. Placeholder2
3. Placeholder2

<a name="3.-Research-Questions"></a>
# 3. Research Questions

[Return to top](#Notebook-Contents)

After initial exploration and cleaning of the data, we have focused our efforts on answering the following research questions:

1. **How likely are patients to be readmitted to hospital within 30 days of discharge?**

2. **What risk factors drive early readmittance (within 30 days of discharge) to hospital?**

Answering these questions will help medical personnel prioritize follow-ups and interventions through the creation of patient risk profiles, which can identify patients at the highest risk level for early hospital readmittance. This information is actionable and can be used to mitigate negative health outcomes for these patients as well as increased costs for the hospital and insurance carrier.

<a name="3.1.-Raw-Data"></a>
## 3.1 Raw Data

[Return to top](#Notebook-Contents)

Here we load the images from the train, validation, and test sets into 3 separate TF datasets.

<a name="3.2.-Training-Data"></a>
## 3.2 Training Data

[Return to top](#Notebook-Contents)

### Select one-hot encoded labels for train, validation, and test sets

Here we subset the one-hot encoded labels into the same train, validation, and test sets as the images, using the image filenames as an index.

### Load one-hot encoded labels into TF datasets

Here we load the partitioned one-hot encoded label data into 3 TF datasets.

<a name="3.3.-Testing-Data"></a>
## 3.3 Testing Data

[Return to top](#Notebook-Contents)

### Combine images and labels

Here we combine the image and label TF datasets, by zipping them together to form 3 new TF datasets with both image and label information.

### Batching & prefetching

Here we set up the batched TF datasets with prefetching that is autotuned. We used fairly small batch sizes to reduce memory demands and allow us to train more complex models with a larger sample of data. We shuffled the data for the training set to prevent the model from learning image order information, but not the validation or test sets. 

<a name="4.-Modeling-Pipeline-and-Training"></a>
# 4. Modeling Pipeline and Training

[Return to top](#Notebook-Contents)

<a name="4.1.-Candidate-Models"></a>
## 4.1 Candidate Models

[Return to top](#Notebook-Contents)

We have four candidate models:

1. Logistic regression with L1 regularization
2. Single Decision Tree
3. Random Forest
4. Extreme Gradient Boosting
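A minimal sketch of how these candidates could be set up and compared on test-set AUC is shown below. It runs on synthetic data, not the real dataset, and the hyperparameter values are arbitrary placeholders. To keep the example dependent on scikit-learn only, `GradientBoostingClassifier` is used as a stand-in for Extreme Gradient Boosting; it is a different implementation from XGBoost.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (about 11% positives), not the real dataset
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.89], random_state=109
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=109
)

# Hyperparameter values are placeholders chosen for illustration
candidates = {
    "logistic_l1": LogisticRegression(
        penalty="l1", solver="liblinear", class_weight="balanced"
    ),
    "decision_tree": DecisionTreeClassifier(
        max_depth=4, class_weight="balanced", random_state=109
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=109
    ),
    # Stand-in for XGBoost so that only scikit-learn is required
    "gradient_boosting": GradientBoostingClassifier(random_state=109),
}

# Fit each candidate and score it by AUC on the held-out test set
test_auc = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    test_auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(test_auc)
```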

<a name="4.2.-Hyperparameter-Tuning-Settings"></a>
## 4.2 Hyperparameter Tuning Settings

[Return to top](#Notebook-Contents)

<a name="4.3.-Performance-Metrics"></a>
## 4.3 Performance Metrics

[Return to top](#Notebook-Contents)

<a name="4.4.-Resampling-Scheme"></a>
## 4.4 Resampling Scheme

[Return to top](#Notebook-Contents)

We use repeated $k$-fold cross-validation (CV).
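One way to set up such a scheme in scikit-learn is sketched below on synthetic data. The stratified variant (`RepeatedStratifiedKFold`) and the choice of 5 folds repeated 3 times are assumptions made for illustration, not settings fixed in this report.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic imbalanced data (about 11% positives), not the real dataset
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.89], random_state=109
)

# 5 folds repeated 3 times yields 15 held-out AUC estimates
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=109)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, scoring="roc_auc", cv=cv
)
print(scores.mean(), scores.std())
```

Repeating the folds with different random partitions reduces the variance of the performance estimate compared with a single pass of $k$-fold CV.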

<a name="4.5.-Model-Training"></a>
## 4.5 Model Training

[Return to top](#Notebook-Contents)

<a name="5.-Model-Selection-and-Evaluation"></a>
# 5. Model Selection and Evaluation

[Return to top](#Notebook-Contents)

<a name="5.1.-Model-Selection"></a>
## 5.1 Model Selection

[Return to top](#Notebook-Contents)

<a name="5.2.-Best-Model-Performance"></a>
## 5.2 Best Model Performance

[Return to top](#Notebook-Contents)

We generate predicted probabilities of class membership for all 14 diseases and the 'no finding' class on the test set. We then calculate the following metrics:
1. Accuracy
2. Prevalence  
3. Sensitivity
4. Specificity
5. Positive Predictive Value      
6. Negative Predictive Value  
7. Area Under ROC 
8. AP Score  
9. F1 Score

Among these metrics, only Area Under ROC and AP Score do not depend on the choice of a threshold; F1 Score, as computed below, is evaluated at a fixed threshold, and Prevalence describes the data rather than the model. Accuracy, Positive Predictive Value, and Negative Predictive Value are not appropriate on their own for evaluating a class-imbalanced dataset, while Sensitivity, Specificity, Area Under ROC, AP Score, and F1 Score are more informative when significant class imbalance is present. Therefore, we have focused on Area Under ROC and AP Score, which are not subject to the arbitrary choice of a threshold value, and we report F1 Score at the chosen threshold as a complementary summary.
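To illustrate why accuracy is a poor guide under class imbalance, the sketch below scores an uninformative model on synthetic labels with about 5% positives. The data and the 0.5 threshold are assumptions chosen for the illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score, roc_auc_score

rng = np.random.default_rng(109)

# Synthetic labels with about 5% positives
y_true = (rng.random(5000) < 0.05).astype(int)

# Uninformative scores: unrelated to the labels and always below 0.5
scores = rng.random(5000) * 0.4

# At a 0.5 threshold every case is predicted negative
y_pred = (scores >= 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, scores)
ap = average_precision_score(y_true, scores)
print(round(accuracy, 3), round(auc, 3), round(ap, 3))
```

Accuracy comes out close to the share of negatives even though the model has no predictive value, while Area Under ROC stays near 0.5 and AP Score stays near the prevalence.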

In [None]:
# Functions to calculate all performance metrics

def bootstrap_auc(y, pred, class_name, bootstraps, fold_size):
  # Print a bootstrapped 95% confidence interval for the ROC AUC of one class
  # Build the frame directly from the label and prediction arrays
  df = pd.DataFrame({'y': y, 'pred': pred})
  # split positive and negative examples for stratified sampling
  df_pos = df[df.y == 1]
  df_neg = df[df.y == 0]
  prevalence = len(df_pos) / len(df)
  statistics = np.zeros(bootstraps)
  for i in range(bootstraps):
    # stratified sampling of positive and negative examples
    pos_sample = df_pos.sample(n=int(fold_size * prevalence), replace=True)
    neg_sample = df_neg.sample(n=int(fold_size * (1 - prevalence)), replace=True)
    y_sample = np.concatenate([pos_sample.y.values, neg_sample.y.values])
    pred_sample = np.concatenate([pos_sample.pred.values, neg_sample.pred.values])
    statistics[i] = roc_auc_score(y_sample, pred_sample)
  print("CI for disease {} roc_auc_score:".format(class_name))
  print(np.percentile(statistics, 2.5), np.percentile(statistics, 97.5))

def get_performance_metrics(y_df, pred_df, th, class_name):
  y = y_df[class_name].values
  pred = pred_df[class_name].values

  # Threshold the predicted probabilities once and reuse the result
  thresholded_preds = pred >= th

  # Confusion matrix counts
  TP = np.sum((y == 1) & thresholded_preds)
  TN = np.sum((y == 0) & ~thresholded_preds)
  FP = np.sum((y == 0) & thresholded_preds)
  FN = np.sum((y == 1) & ~thresholded_preds)

  def safe_divide(numerator, denominator):
    # Return NaN when the denominator is zero or undefined,
    # e.g. PPV when no observation is predicted positive
    return numerator / denominator if denominator > 0 else np.nan

  # Threshold-dependent metrics
  accuracy = safe_divide(TP + TN, TP + TN + FP + FN)
  prevalence = np.mean(y == 1)
  sensitivity = safe_divide(TP, TP + FN)
  specificity = safe_divide(TN, TN + FP)
  ppv = safe_divide(TP, TP + FP)
  npv = safe_divide(TN, TN + FN)

  # Threshold-free metrics
  roc_auc = roc_auc_score(y, pred)
  ap_score = average_precision_score(y, pred)
  # Optional: bootstrap_auc(y, pred, class_name, bootstraps=100, fold_size=len(y))

  # F1 is the harmonic mean of precision (PPV) and recall (sensitivity)
  f1_score = safe_divide(2 * ppv * sensitivity, ppv + sensitivity)

  return accuracy, prevalence, sensitivity, specificity, ppv, npv, roc_auc, ap_score, f1_score

In [None]:
# Iterate over class labels and calculate metrics

threshold = 0.5

# Metric names, in the order returned by get_performance_metrics
metric_names = ['accuracy', 'prevalence', 'sensitivity', 'specificity',
                'ppv', 'npv', 'roc_auc', 'ap_score', 'f1_score']

# One list per output column
metrics_dict = {'class': []}
metrics_dict.update({name: [] for name in metric_names})

for class_label in class_labels:
  metrics_output = get_performance_metrics(y_df=correct_labels_df, pred_df=pred_df, th=threshold, class_name=class_label)
  metrics_dict['class'].append(class_label)
  for name, value in zip(metric_names, metrics_output):
    metrics_dict[name].append(value)

In [None]:
# Summary metrics for each class label on the test set

metrics_df = pd.DataFrame.from_dict(metrics_dict)
print(metrics_df.round(2))

                 class  accuracy  prevalence  sensitivity  specificity  ppv  \
0            Emphysema      0.92        0.08          0.0          1.0  NaN   
1           No Finding      0.86        0.14          0.0          1.0  NaN   
2            Pneumonia      0.96        0.04          0.0          1.0  NaN   
3         Infiltration      0.71        0.29          0.0          1.0  NaN   
4               Nodule      0.89        0.11          0.0          1.0  NaN   
5                Edema      0.94        0.06          0.0          1.0  NaN   
6        Consolidation      0.87        0.13          0.0          1.0  NaN   
7          Atelectasis      0.81        0.19          0.0          1.0  NaN   
8         Pneumothorax      0.84        0.16          0.0          1.0  NaN   
9                 Mass      0.88        0.12          0.0          1.0  NaN   
10  Pleural_Thickening      0.92        0.08          0.0          1.0  NaN   
11            Effusion      0.75        0.25        

In [None]:
# AUROC for all classes

plt.figure(figsize=(12, 12))

for class_label in class_labels:
  y = correct_labels_df[class_label].values
  pred = pred_df[class_label].values
  fpr, tpr, thresholds = roc_curve(y, pred)
  auc = roc_auc_score(y, pred)
  plt.plot(fpr, tpr, label='%s ROC (area = %0.2f)' % (class_label, auc))
# Custom settings for the plot 
plt.plot([0, 1], [0, 1],'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('1-Specificity (False Positive Rate)')
plt.ylabel('Sensitivity (True Positive Rate)')
plt.title('Area Under Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show() 

In [None]:
# Precision Recall Curve for all classes

plt.figure(figsize=(12, 12))

for class_label in class_labels:
  y = correct_labels_df[class_label].values
  pred = pred_df[class_label].values
  p, r, t = precision_recall_curve(y, pred)
  ap_score = average_precision_score(y, pred)
  plt.plot(r, p, label='%s PRC (ap = %0.2f)' % (class_label, ap_score))
# Custom settings for the plot
# No diagonal reference line is drawn: the chance baseline of a PR curve is a horizontal line at the class prevalence
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision Recall Curve')
plt.legend(loc="lower right")
plt.show()  

<a name="5.3.-Variable-Importance"></a>
## 5.3 Variable Importance

[Return to top](#Notebook-Contents)

<a name="6.-Conclusions"></a>
# 6. Conclusions

[Return to top](#Notebook-Contents)

We found this project to be both interesting and quite challenging.

Perhaps the three most perplexing issues we encountered were:

1. placeholder1 

2. placeholder2

3. placeholder3

<a name="6.1.-Patient-Early-Readmittance-Rate"></a>
## 6.1 Patient Early Readmittance Rate

[Return to top](#Notebook-Contents)

<a name="6.2.-Patient-Risk-Profiles"></a>
## 6.2 Patient Risk Profiles

[Return to top](#Notebook-Contents)

<a name="7.-Future-Work"></a>
# 7. Future Work

[Return to top](#Notebook-Contents)

In our analysis of the Chest X-ray dataset we have tried a variety of modeling and feature engineering approaches, but there are still several additional steps that could be taken:

1. placeholder1

2. placeholder2

3. placeholder2

<a name="8.-References"></a>
# 8. References

[Return to top](#Notebook-Contents)

**The following are links to papers, blogs, and tutorials we found useful during the development of this project:**

Fine-tuning for transfer learning models:
https://keras.io/guides/transfer_learning/

Medical neural networks:
https://glassboxmedicine.com/

Image classification using CNNs:
https://towardsdatascience.com/medical-x-ray-%EF%B8%8F-image-classification-using-convolutional-neural-network-9a6d33b1c2a

Comparison of ResNet50 and VGG19 and training from scratch for X-ray images dataset:
https://www.sciencedirect.com/science/article/pii/S2666285X21000558

Tensorflow Applications for base model:
https://keras.io/api/applications/

Tensorboard confusion matrix:
https://towardsdatascience.com/exploring-confusion-matrix-evolution-on-tensorboard-e66b39f4ac12

Pre-processing and modeling pipelines (ResNet50):
https://towardsdatascience.com/time-to-choose-tensorflow-data-over-imagedatagenerator-215e594f2435

Image data input pipelines:
https://towardsdatascience.com/what-is-the-best-input-pipeline-to-train-image-classification-models-with-tf-keras-eb3fe26d3cc5

Split TF datasets:
https://towardsdatascience.com/how-to-split-a-tensorflow-dataset-into-train-validation-and-test-sets-526c8dd29438

Transfer learning with EfficientNet:
https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/

Training greyscale images using transfer learning:
https://stackoverflow.com/questions/51995977/how-can-i-use-a-pre-trained-neural-network-with-grayscale-images

Multi-label vs multi-class classification:
https://glassboxmedicine.com/2019/05/26/classification-sigmoid-vs-softmax/

Multi-label classification example use-case:
https://towardsdatascience.com/fast-ai-season-1-episode-3-a-case-of-multi-label-classification-a4a90672a889

Element-wise sigmoid:
https://www.programcreek.com/python/example/93769/keras.backend.sigmoid

Element-wise sigmoid:
https://stackoverflow.com/questions/52090857/how-to-apply-sigmoid-function-for-each-outputs-in-keras

DenseNet121:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8189817/