# Assess predictions of 30-day hospital readmission on diabetes hospital data

This notebook demonstrates the use of the `responsibleai` API to assess a model trained on hospital data. It walks through the API calls necessary to create a widget with model analysis insights, then guides a visual analysis of the model.

* [Launch Responsible AI Toolbox](#Launch-Responsible-AI-Toolbox)
    * [Train a Model](#Train-a-Model)
    * [Create Model and Data Insights](#Create-Model-and-Data-Insights)

In [50]:
#!pip install -r libs/requirements.txt

## Launch Responsible AI Toolbox

The following section examines the code necessary to create datasets and a model. It then generates insights using the `responsibleai` API that can be visually analyzed.

### Train a Model
*The following section can be skipped. It loads a dataset and trains a model for illustrative purposes.*

In [52]:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

import pandas as pd
from lightgbm import LGBMClassifier

First, load the hospital dataset. Then identify the numerical and categorical features and compose a pipeline that contains a preprocessor and an estimator.

In [53]:
train_data = pd.read_parquet('data/train_dataset.parquet')
test_data = pd.read_parquet('data/test_dataset.parquet')

In [54]:
train_data.head()

| | race | gender | age | discharge_destination | admission_source | time_in_hospital | num_lab_procedures | num_procedures | num_medications | prior_outpatient | ... | prior_inpatient | primary_diagnosis | number_diagnoses | max_glu_serum | A1Cresult | insulin | diabetes_Med_prescribe | readmit_status | medicare | medicaid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3572 | Caucasian | Male | Over 60 years | Other | 7 | 12 | 85 | 2 | 21 | 0 | ... | 0 | 0.0 | 9 | | Norm | Up | Yes | Not readmitted | False | False |
| 543 | Caucasian | Male | Over 60 years | Discharged to Home | 1 | 3 | 42 | 0 | 17 | 0 | ... | 1 | 0.0 | 8 | | | Steady | Yes | Not readmitted | True | False |
| 2074 | Caucasian | Female | Over 60 years | Discharged to Home | 7 | 7 | 48 | 1 | 20 | 0 | ... | 0 | 0.0 | 9 | | | No | No | Not readmitted | True | False |
| 4190 | Caucasian | Male | 30-60 years | Discharged to Home | 1 | 4 | 38 | 3 | 18 | 0 | ... | 0 | 0.0 | 9 | | | Down | Yes | Readmitted | False | True |
| 42 | Hispanic | Female | 30-60 years | Other | 7 | 1 | 50 | 0 | 8 | 0 | ... | 0 | 0.0 | 9 | | >8 | No | No | Not readmitted | False | False |


In [55]:
# Pipeline, SimpleImputer, StandardScaler, OneHotEncoder, ColumnTransformer,
# and LGBMClassifier are already imported in the first code cell.

def split_label(dataset, target_feature):
    X = dataset.drop([target_feature], axis=1)
    y = dataset[[target_feature]]
    return X, y

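def create_classification_pipeline(X):
    """Build a preprocessing + LightGBM pipeline from the column dtypes of X."""
    # Only int64 and object columns are selected. Columns of any other dtype
    # (for example float64 or bool) are dropped by ColumnTransformer, whose
    # default is remainder='drop'.
    pipe_cfg = {
        'num_cols': X.dtypes[X.dtypes == 'int64'].index.values.tolist(),
        'cat_cols': X.dtypes[X.dtypes == 'object'].index.values.tolist(),
    }
    # Numeric columns: fill missing values with the median, then standardize.
    num_pipe = Pipeline([
        ('num_imputer', SimpleImputer(strategy='median')),
        ('num_scaler', StandardScaler())
    ])
    # Categorical columns: fill missing values with '?', then one-hot encode.
    # Note: scikit-learn 1.2 renamed `sparse` to `sparse_output`; use
    # `sparse_output=False` on versions that no longer accept `sparse`.
    cat_pipe = Pipeline([
        ('cat_imputer', SimpleImputer(strategy='constant', fill_value='?')),
        ('cat_encoder', OneHotEncoder(handle_unknown='ignore', sparse=False))
    ])
    feat_pipe = ColumnTransformer([
        ('num_pipe', num_pipe, pipe_cfg['num_cols']),
        ('cat_pipe', cat_pipe, pipe_cfg['cat_cols'])
    ])

    # Append classifier to preprocessing pipeline.
    # Now we have a full prediction pipeline.
    pipeline = Pipeline(steps=[('preprocessor', feat_pipe),
                               ('model', LGBMClassifier(random_state=0))])

    return pipeline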


def get_categorical_numerical_data(dataset, target):
    """Return the names of the categorical (object or bool) feature columns."""
    dataset = dataset.drop([target], axis=1)
    categorical = []
    # DataFrame.items() replaces the deprecated DataFrame.iteritems().
    for col, value in dataset.items():
        if value.dtype == 'object' or value.dtype == 'bool':
            categorical.append(col)
    return categorical


target_feature = 'readmit_status'

categorical_features = get_categorical_numerical_data(train_data, target_feature)

X_train_original, y_train = split_label(train_data, target_feature)
X_test_original, y_test = split_label(test_data, target_feature)

pipeline = create_classification_pipeline(X_train_original)

y_train = y_train[target_feature].to_numpy()
y_test = y_test[target_feature].to_numpy()


# Take a reproducible sample of 500 rows from the test data
test_data_sample = test_data.sample(n=500, random_state=5)


Train the classification pipeline composed in the previous cell on the training data.

In [56]:
model = pipeline.fit(X_train_original, y_train)
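
The dtype-based column selection inside `create_classification_pipeline` can be checked on a small frame before training. The sketch below is illustrative only: the toy frame and the helper name `split_columns_by_dtype` are assumptions, not part of the notebook. It shows that columns whose dtype is neither `int64` nor `object` (here `float64` and `bool`) land in neither list, so a `ColumnTransformer` left at its default `remainder='drop'` would not pass them to the model.

```python
import pandas as pd


def split_columns_by_dtype(X):
    """Hypothetical helper mirroring the dtype checks used in the pipeline."""
    num_cols = X.dtypes[X.dtypes == 'int64'].index.tolist()
    cat_cols = X.dtypes[X.dtypes == 'object'].index.tolist()
    # Anything left over is ignored by a ColumnTransformer that keeps the
    # default remainder='drop'.
    selected = set(num_cols) | set(cat_cols)
    unselected = [c for c in X.columns if c not in selected]
    return num_cols, cat_cols, unselected


# Toy data, made up for illustration (not rows from the hospital dataset).
toy = pd.DataFrame({
    'time_in_hospital': pd.Series([1, 7, 3], dtype='int64'),
    'age': pd.Series(['Over 60 years', '30-60 years', 'Over 60 years'],
                     dtype=object),
    'primary_diagnosis': pd.Series([0.0, 1.0, 0.0], dtype='float64'),
    'medicare': pd.Series([True, False, True], dtype='bool'),
})

num_cols, cat_cols, unselected = split_columns_by_dtype(toy)
print('numeric:', num_cols)
print('categorical:', cat_cols)
print('not selected:', unselected)
```

If the unselected list is not empty on the real data, widening the numeric check (for example to all numeric dtypes) is one option, but that changes the model and is a separate design decision.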

### Create Model and Data Insights

In [57]:
from raiwidgets import ResponsibleAIDashboard
from responsibleai import RAIInsights

To use Responsible AI Toolbox, initialize a `RAIInsights` object, to which different components can then be added.

`RAIInsights` accepts the model, the training dataset, the test dataset, the target feature string, and the task type string as its arguments.

You may also create a `FeatureMetadata` container to identify a feature of your choice as the `identity_feature`, list the names of categorical features via the `categorical_features` parameter, and specify features to drop via the `dropped_features` parameter. The `FeatureMetadata` object can then be passed to `RAIInsights`.

In [58]:
from responsibleai.feature_metadata import FeatureMetadata
feature_metadata = FeatureMetadata(categorical_features=categorical_features, dropped_features=[])


In [59]:
rai_insights = RAIInsights(model, train_data, test_data_sample, target_feature, 'classification',
                           feature_metadata=feature_metadata)
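
Before constructing `RAIInsights`, it can help to confirm that the inputs match the description above: the target column is present in both frames, and every name in `categorical_features` is an actual feature column. The helper below is a hypothetical pre-flight check written in plain pandas. It is not part of the `responsibleai` API, and the toy frames are made up for illustration.

```python
import pandas as pd


def check_rai_inputs(train, test, target, categorical_features):
    """Hypothetical pre-flight check; returns a list of problems found."""
    problems = []
    # The target column should exist in both the train and test frames.
    for name, frame in (('train', train), ('test', test)):
        if target not in frame.columns:
            problems.append(f"target '{target}' missing from {name} data")
    # Every categorical feature name should be a non-target column.
    feature_cols = [c for c in train.columns if c != target]
    for col in categorical_features:
        if col not in feature_cols:
            problems.append(
                f"categorical feature '{col}' is not a feature column")
    return problems


# Toy frames, made up for illustration.
toy_train = pd.DataFrame({
    'age': ['30-60 years', 'Over 60 years'],
    'prior_inpatient': [0, 2],
    'readmit_status': ['Readmitted', 'Not readmitted'],
})
toy_test = toy_train.iloc[:1]

ok_result = check_rai_inputs(toy_train, toy_test, 'readmit_status', ['age'])
bad_result = check_rai_inputs(toy_train, toy_test, 'readmit_status', ['race'])
print(ok_result)
print(bad_result)
```

An empty list means the basic checks passed; any message points at an input to fix before the heavier `RAIInsights` construction runs.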

Add the components of the toolbox that are focused on model assessment.

In [60]:
# Interpretability
rai_insights.explainer.add()
# Error Analysis
rai_insights.error_analysis.add()
# Counterfactuals
rai_insights.counterfactual.add(total_CFs=10, desired_class='opposite')
# Causal Inference
rai_insights.causal.add(treatment_features=['time_in_hospital'])

Once all the desired components have been loaded, compute insights on the test set.

In [61]:
rai_insights.compute()

Causal Effects
Current Status: Generating Causal Effects.
Current Status: Finished generating causal effects.
Time taken: 0.0 min 10.766581700000188 sec
Counterfactual
Current Status: Generating 10 counterfactuals for 500 samples


100%|██████████| 500/500 [05:39<00:00,  1.47it/s]


Current Status: Generated 10 counterfactuals for 500 samples.
Time taken: 5.0 min 46.47203439999976 sec
Error Analysis
Current Status: Generating error analysis reports.


Using categorical_feature in Dataset.


Current Status: Finished generating error analysis reports.
Time taken: 0.0 min 0.2292328999997153 sec
Explanations
Current Status: Explaining 20 features


categorical_feature keyword has been found in `params` and will be ignored.
Please use categorical_feature argument of the Dataset constructor to pass this parameter.


Current Status: Explained 20 features.
Time taken: 0.0 min 0.7867237000000387 sec


Compose some cohorts that can be passed to the `ResponsibleAIDashboard`.

In [62]:
from raiutils.cohort import Cohort, CohortFilter, CohortFilterMethods

# Cohort on age and inpatient features in the dataset
cohort_filter_age = CohortFilter(
    method=CohortFilterMethods.METHOD_INCLUDES,
    arg=["Over 60 years", "30-60 years", "30 years or younger"],
    column='age')

cohort_filter_prior_inpatient = CohortFilter(
    method=CohortFilterMethods.METHOD_GREATER,
    arg=[1],
    column='prior_inpatient')

user_cohort_age_and_prior_inpatient = Cohort(name='Cohort Age and Prior_Inpatient > 1')
user_cohort_age_and_prior_inpatient.add_cohort_filter(cohort_filter_age)
user_cohort_age_and_prior_inpatient.add_cohort_filter(cohort_filter_prior_inpatient)

# Filter on the race feature in the dataset. It is defined for illustration
# and is not attached to either cohort below.
cohort_filter_race = CohortFilter(
    method=CohortFilterMethods.METHOD_INCLUDES,
    arg=["Caucasian", "AfricanAmerican", "Asian", "Hispanic", "Other"],
    column='race')

# Select rows with no prior inpatient visits, matching the cohort name below.
cohort_filter_no_prior_inpatient = CohortFilter(
    method=CohortFilterMethods.METHOD_INCLUDES,
    arg=[0],
    column='prior_inpatient')

user_cohort_age_no_prior_inpatient = Cohort(name='Cohort Age and Inpatients = 0')
user_cohort_age_no_prior_inpatient.add_cohort_filter(cohort_filter_age)
user_cohort_age_no_prior_inpatient.add_cohort_filter(cohort_filter_no_prior_inpatient)

cohort_list = [user_cohort_age_no_prior_inpatient,
               user_cohort_age_and_prior_inpatient]
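
To sanity-check what each cohort is expected to contain, the same conditions can be written as boolean masks in plain pandas. This is a rough sketch on made-up rows. It assumes, based on the filter names rather than on the library's internals, that `METHOD_INCLUDES` keeps rows whose value is in `arg`, that `METHOD_GREATER` keeps rows whose value exceeds `arg[0]`, and that filters added to the same cohort are combined with a logical AND.

```python
import pandas as pd

# Toy rows, made up for illustration (not taken from the hospital dataset).
toy = pd.DataFrame({
    'age': ['Over 60 years', '30-60 years',
            '30 years or younger', 'Over 60 years'],
    'prior_inpatient': [0, 2, 1, 3],
})

ages = ['Over 60 years', '30-60 years', '30 years or younger']

# Assumed equivalent of METHOD_INCLUDES on 'age'.
age_mask = toy['age'].isin(ages)
# Assumed equivalent of METHOD_GREATER with arg=[1] on 'prior_inpatient'.
prior_gt_1_mask = toy['prior_inpatient'] > 1
# Rows with no prior inpatient visits, as the cohort name
# 'Cohort Age and Inpatients = 0' describes.
no_prior_mask = toy['prior_inpatient'].isin([0])

# Filters within one cohort are assumed to be ANDed together.
cohort_prior_gt_1 = toy[age_mask & prior_gt_1_mask]
cohort_no_prior = toy[age_mask & no_prior_mask]

print(cohort_prior_gt_1.index.tolist())
print(cohort_no_prior.index.tolist())
```

Comparing row counts from masks like these against the cohort sizes shown in the dashboard is a quick way to catch a filter whose argument does not match its cohort name.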

Finally, visualize and explore the model insights. Use the resulting widget or follow the link to view this in a new tab.

In [63]:
ResponsibleAIDashboard(rai_insights, cohort_list=cohort_list)

ResponsibleAI started at http://localhost:5003


<raiwidgets.responsibleai_dashboard.ResponsibleAIDashboard at 0x2c2a58a9d90>

Using categorical_feature in Dataset.
