# Teamwork study

In this notebook, we use the `teamwork` library to represent a care team's collaborative experience as a network graph. We then use the `networkx` library to calculate the average clustering coefficient for each care team in the dataset. By combining this care team collaboration data with patient discharge data, we can study the correlation between care team collaboration experience and patient outcomes.
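
To make the metric concrete, here is a minimal pure-Python sketch of an unweighted average clustering coefficient. The `team` graph and both function names are hypothetical and exist only for illustration. The notebook itself delegates this calculation to `networkx`, and whether `teamwork` uses edge weights is not shown here.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs that could be linked
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adjacency):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)


# Hypothetical care team: A, B and C have all worked together; D has only worked with C.
team = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

print(average_clustering(team))
```

A team in which everyone has worked with everyone else scores 1.0, while a team whose members are linked only through a single shared colleague scores 0.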

## Import libraries

In [16]:
import pandas as pd
import time
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from multiprocessing import Pool
import os
import sys

# Import from the parent directory
sys.path.append(os.path.join(os.getcwd(), '..'))
from teamwork import teamwork as tw
from utils import utils

## Read in EHR data and create study runner object

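The study runner is an iterable (generator) object: iterating over it yields one item per care date, and each item can in turn be iterated to get the care teams identified on that date.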

In [8]:
# Get EHR notes data, parsing the third column (index 2) as dates
notes_df = pd.read_csv(utils.notes_with_disposition_file, parse_dates=[2])

# Set a 90-day window in which to look for prior collaboration among care teams
WINDOW = 90

# Identify care teams in 2-day increments
STEP = 2

# create the study runner
get_care_dates = tw.TeamworkStudyRunner(notes_df, WINDOW, STEP)
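
The interplay of `WINDOW` and `STEP` can be illustrated with a small generator. This is only a sketch of the general sliding-window idea, not the implementation of `TeamworkStudyRunner`: the function `walk_windows`, its boundary handling (window start inclusive, care date exclusive), and the sample dates are all assumptions made for illustration.

```python
from datetime import date, timedelta


def walk_windows(note_dates, window_days, step_days):
    """Yield (care_date, note dates in the preceding window) every step_days days."""
    first, last = min(note_dates), max(note_dates)
    care_date = first
    while care_date <= last:
        window_start = care_date - timedelta(days=window_days)
        # Assumed convention: include the window start, exclude the care date itself
        in_window = [d for d in note_dates if window_start <= d < care_date]
        yield care_date, in_window
        care_date += timedelta(days=step_days)


# Hypothetical note dates, with a 3-day window and a 2-day step to keep the output short
notes = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 4), date(2021, 1, 9)]

for care_date, history in walk_windows(notes, window_days=3, step_days=2):
    print(care_date, [d.isoformat() for d in history])
```

Because the function is a generator, nothing is computed until the caller iterates, which is the same lazy pattern the study runner exposes.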

## Gather data for each care team identified on each care date

The `get_careteam_data` utility function uses the `care_team` network graph
to calculate the cumulative experience and other metrics for the care team.


In [15]:
# measure performance
start_time = time.perf_counter()

# flatten the experience data into a list
experience_data_list = [
    utils.get_careteam_data(care_team)
    for care_date in get_care_dates
    for care_team in care_date
]

stop_time = time.perf_counter()

print(f"It took {stop_time - start_time} seconds or {(stop_time - start_time) / 60} minutes"
      + f" to process a total of {len(notes_df.index)} notes. The study walked through the notes {STEP} days at a time"
      + f" to identify care teams and calculate care team experience within the previous {WINDOW} day window.")

It took 118.70449000000008 seconds or 1.978408166666668 minutes to process a total of 2721 notes. The study walked through the notes 2 days at a time to identify care teams and calculate care team experience within the previous 90 day window.


## Convert data into DataFrame for analysis

To study care team experience and patient outcomes, we need to join the care team data with the patient information in the discharge data.

In [5]:
experience_df = pd.DataFrame(experience_data_list, columns=utils.columns).drop_duplicates()

discharges_df = pd.read_csv(utils.discharges_with_disposition_file)

experience_master_df = experience_df.merge(discharges_df, left_on='discharge_id', right_on='id', copy=False)

print(experience_master_df.shape)

(388, 12)
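
Since `merge` defaults to an inner join, rows without a match on either side are dropped silently. The sketch below, using made-up stand-ins for `experience_df` and `discharges_df`, shows one way to audit that with pandas' `how="outer"` and `indicator=True` options. The column values are invented and the audit step is a suggestion, not part of the original study.

```python
import pandas as pd

# Made-up stand-ins for experience_df and discharges_df
experience = pd.DataFrame(
    {"discharge_id": [1, 2, 2, 4], "avg_clust": [0.5, 0.2, 0.9, 0.7]}
)
discharges = pd.DataFrame({"id": [1, 2, 3], "age": [70, 82, 65]})

# Inner join (the default): only discharge ids present in both frames survive
merged = experience.merge(discharges, left_on="discharge_id", right_on="id")
print(merged.shape)

# Outer join with an indicator column to see which rows failed to match
audit = experience.merge(
    discharges, left_on="discharge_id", right_on="id", how="outer", indicator=True
)
print(audit["_merge"].value_counts())
```

Rows flagged `left_only` are care team records with no discharge record, and `right_only` rows are discharges for which no care team was identified.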


## Study the correlation between cumulative care team experience and patient outcomes

In [6]:
def get_model(var):
    # Logistic regression (binomial GLM with logit link) of disposition on `var`, adjusted for age
    return sm.GLM.from_formula(
        f'disposition ~ {var} + age',
        family=sm.families.Binomial(),
        data=experience_master_df,
    )

model = get_model('avg_clust')
result = model.fit()
result.summary() 

Dep. Variable:,disposition,No. Observations:,388.0
Model:,GLM,Df Residuals:,385.0
Model Family:,Binomial,Df Model:,2.0
Link Function:,logit,Scale:,1.0
Method:,IRLS,Log-Likelihood:,-110.87
Date:,"Tue, 26 Jan 2021",Deviance:,221.74
Time:,12:07:06,Pearson chi2:,388.0
No. Iterations:,5,,
Covariance Type:,nonrobust,,

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-8.2942,3.073,-2.699,0.007,-14.317,-2.271
avg_clust,-0.0897,0.472,-0.190,0.849,-1.014,0.835
age,0.0815,0.041,1.966,0.049,0.000,0.163


In [7]:
model = get_model('cumulative_experience')
result = model.fit()
result.summary() 

Dep. Variable:,disposition,No. Observations:,388.0
Model:,GLM,Df Residuals:,385.0
Model Family:,Binomial,Df Model:,2.0
Link Function:,logit,Scale:,1.0
Method:,IRLS,Log-Likelihood:,-109.06
Date:,"Tue, 26 Jan 2021",Deviance:,218.12
Time:,12:07:06,Pearson chi2:,384.0
No. Iterations:,6,,
Covariance Type:,nonrobust,,

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-8.1931,3.117,-2.628,0.009,-14.303,-2.083
cumulative_experience,-0.0447,0.025,-1.765,0.078,-0.094,0.005
age,0.0839,0.042,1.991,0.046,0.001,0.167


In [8]:
model = get_model('avg_cumulative_experience')
result = model.fit()
result.summary()   

Dep. Variable:,disposition,No. Observations:,388.0
Model:,GLM,Df Residuals:,385.0
Model Family:,Binomial,Df Model:,2.0
Link Function:,logit,Scale:,1.0
Method:,IRLS,Log-Likelihood:,-109.69
Date:,"Tue, 26 Jan 2021",Deviance:,219.37
Time:,12:07:06,Pearson chi2:,383.0
No. Iterations:,6,,
Covariance Type:,nonrobust,,

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-8.1604,3.104,-2.629,0.009,-14.244,-2.077
avg_cumulative_experience,-0.2820,0.188,-1.500,0.134,-0.650,0.086
age,0.0835,0.042,1.990,0.047,0.001,0.166
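
Because these models use a logit link, each coefficient is a change in log-odds, which is easier to read after exponentiating into an odds ratio. The sketch below does this for the three collaboration variables, using the coefficients and 95% interval bounds copied from the summaries above. The helper `odds_ratio` and the `estimates` dictionary are introduced here for illustration only.

```python
import math

# (coefficient, lower 95% bound, upper 95% bound) copied from the model summaries
estimates = {
    "avg_clust": (-0.0897, -1.014, 0.835),
    "cumulative_experience": (-0.0447, -0.094, 0.005),
    "avg_cumulative_experience": (-0.2820, -0.650, 0.086),
}


def odds_ratio(log_odds):
    """Convert a logit-scale value into an odds ratio."""
    return math.exp(log_odds)


for name, (coef, low, high) in estimates.items():
    print(
        f"{name}: OR={odds_ratio(coef):.3f} "
        f"(95% CI {odds_ratio(low):.3f} to {odds_ratio(high):.3f})"
    )
```

An odds ratio below 1 means the odds of the modeled `disposition` outcome fall as the variable increases. All three intervals span 1, which matches the p-values above 0.05 reported for these variables.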
