# 5: Additional good outcomes

## Plain English summary
This notebook builds on the scenarios that we considered in a previous notebook. Now that we know the effect of changing scenario on the proportions of patients who meet certain criteria, we can feed that data into a pathway simulator and an outcome model.

The pathway simulator generates a length of time for each patient and for each step in the pathway. The times are selected using the expected distributions of times at each step that we measured previously.
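
As a minimal sketch of this sampling step, assume that the time for each pathway step follows a log-normal distribution described by a `mu` and `sigma` on the log scale (the performance data contains columns such as `lognorm_mu_onset_arrival_mins_mt`). The helper name `sample_step_times` and the parameter values below are illustrative assumptions and are not taken from `SSNAP_Pathway`.

```python
import numpy as np


def sample_step_times(mu, sigma, n_patients, rng):
    """Draw one time in minutes per patient from a log-normal distribution.

    mu and sigma are the mean and standard deviation of log(time).
    Hypothetical helper, for illustration only.
    """
    return rng.lognormal(mean=mu, sigma=sigma, size=n_patients)


rng = np.random.default_rng(42)
# Illustrative parameters (assumed, not fitted values for any real team).
onset_to_arrival = sample_step_times(4.6, 0.55, n_patients=1000, rng=rng)
arrival_to_scan = sample_step_times(2.8, 1.0, n_patients=1000, rng=rng)
# Times for consecutive steps add up to give the time from onset to scan.
onset_to_scan = onset_to_arrival + arrival_to_scan
```

Each patient gets an independent draw for each step, so the summed times vary from patient to patient and from trial to trial.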

The outcome model takes the final times from onset to treatment for each patient and calculates their outcome. The outcomes are given as mRS (modified Rankin Scale) scores, where 0 is completely independent and 6 is dead. 

We run the pathway and outcome models to generate outcomes for 100 years' worth of patients. By averaging the results over the 100 years, we can find the effect of scenario on the number of additional good outcomes. Here we define additional good outcomes as the extra number compared with the case where nobody receives any treatment for their stroke.
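
The calculation for a single trial can be sketched as follows, treating mRS 0 or 1 as a good outcome (the threshold used later in this notebook). The function name and the mRS scores are made up for illustration.

```python
import numpy as np


def additional_good_outcomes_per_1000(mrs_not_treated, mrs_post_stroke,
                                      good_threshold=1):
    """Extra good outcomes (mRS <= threshold) per 1000 patients.

    Hypothetical helper, for illustration only.
    """
    mrs_not_treated = np.asarray(mrs_not_treated)
    mrs_post_stroke = np.asarray(mrs_post_stroke)
    n_patients = len(mrs_not_treated)
    n_good_baseline = np.count_nonzero(mrs_not_treated <= good_threshold)
    n_good_post_stroke = np.count_nonzero(mrs_post_stroke <= good_threshold)
    return (n_good_post_stroke - n_good_baseline) * 1000.0 / n_patients


# Made-up mRS scores for ten patients, for illustration only.
mrs_not_treated = [0, 1, 2, 3, 4, 2, 5, 6, 1, 3]
mrs_post_stroke = [0, 1, 1, 3, 4, 2, 5, 6, 0, 3]
extra_good = additional_good_outcomes_per_1000(mrs_not_treated, mrs_post_stroke)
```

In this toy example three patients have a good outcome without treatment and four with treatment, so there is one additional good outcome among ten patients.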

## Notebook setup

In [1]:
import pandas as pd
import numpy as np
import pickle
from dataclasses import dataclass
from math import sqrt

import matplotlib.pyplot as plt

import stroke_utilities.scenario
from classes.pathway import SSNAP_Pathway

import warnings
warnings.filterwarnings("ignore")

## Set up paths and filenames

In [2]:
@dataclass(frozen=True)
class Paths:
    '''Immutable container for paths to data, models and output.'''

    data_read_path: str = './stroke_utilities/data/'
    data_save_path: str = './stroke_utilities/data/'
    output_folder: str = './stroke_utilities/output/'
    model_folder: str = './stroke_utilities/models'
    model_text: str = 'lgbm_decision_'
    notebook: str = '03_'

paths = Paths()

## Import existing data

The hospital performance data for each scenario was calculated in a previous notebook. Each row holds the data for one combination of stroke team, stroke type and scenario.

In [3]:
filename = paths.output_folder + 'all_performance_scenarios.csv'
df_performance_scenarios = pd.read_csv(filename, index_col=0)

In [4]:
df_performance_scenarios

Unnamed: 0,stroke_type,admissions,proportion_of_all_with_ivt,proportion_of_all_with_mt,proportion_of_mt_with_ivt,proportion1_of_all_with_onset_known_ivt,proportion2_of_mask1_with_onset_to_arrival_on_time_ivt,proportion3_of_mask2_with_arrival_to_scan_on_time_ivt,proportion4_of_mask3_with_onset_to_scan_on_time_ivt,proportion5_of_mask4_with_enough_time_to_treat_ivt,...,lognorm_mu_onset_arrival_mins_mt,lognorm_sigma_onset_arrival_mins_mt,lognorm_mu_arrival_scan_arrival_mins_mt,lognorm_sigma_arrival_scan_arrival_mins_mt,lognorm_mu_scan_puncture_mins_mt,lognorm_sigma_scan_puncture_mins_mt,proportion_of_all_with_mask6_and_ivt,proportion_of_all_with_mask6_and_mt,stroke_team,scenario
1 / lvo / base,lvo,79.4,0.244332,0.045340,0.444444,0.513854,0.823529,0.982143,0.951515,1.0,...,4.618692,0.533736,2.839948,1.036634,4.249917,1.663217,0.216625,0.035264,1,base
1 / nlvo / base,nlvo,223.8,0.140304,0.001787,0.500000,0.582663,0.602761,0.959288,0.877984,1.0,...,4.703262,0.609983,3.374502,1.044641,,,0.119750,0.001787,1,base
1 / other / base,other,40.2,0.000000,0.000000,,0.512438,0.757282,0.974359,0.947368,1.0,...,4.632703,0.745401,3.041762,1.059592,,,0.000000,0.000000,1,base
1 / mixed / base,mixed,343.4,0.147932,0.011648,0.450000,0.558532,0.666319,0.967136,0.906149,1.0,...,4.673074,0.610768,3.196128,1.069103,4.365630,1.580607,0.128130,0.009319,1,base
1 / lvo / benchmark,lvo,79.4,0.302267,0.035264,0.444444,0.513854,0.823529,0.982143,0.951515,1.0,...,4.618692,0.533736,2.839948,1.036634,4.249917,1.663217,0.302267,0.035264,1,benchmark
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119 / mixed / speed + onset,mixed,142.4,0.220887,0.015173,0.866667,0.608442,0.834302,0.950000,0.968531,1.0,...,4.501338,0.534095,2.708050,0.000000,,,0.220887,0.015173,119,speed + onset
119 / lvo / speed + onset + benchmark,lvo,31.4,0.458488,0.056429,0.833333,0.608843,0.939024,0.950000,0.974026,1.0,...,4.506708,0.551317,2.572203,0.000000,,,0.458488,0.056429,119,speed + onset + benchmark
119 / nlvo / speed + onset + benchmark,nlvo,85.2,0.235005,0.002796,1.000000,0.597516,0.768473,0.950000,0.961538,1.0,...,4.516969,0.550005,2.708050,0.000000,,,0.235005,0.002796,119,speed + onset + benchmark
119 / other / speed + onset + benchmark,other,25.8,0.000000,0.000000,,0.608578,0.915254,0.950000,0.981132,1.0,...,4.447202,0.461481,2.708050,0.000000,,,0.000000,0.000000,119,speed + onset + benchmark


## Run models

Find the complete list of stroke teams and scenarios:

In [5]:
stroke_teams = sorted(set(df_performance_scenarios['stroke_team']))

In [6]:
scenario_list = sorted(set(df_performance_scenarios['scenario']))

The following cell looks at each stroke team and each scenario in turn. It picks out the performance data for each stroke type (LVO, nLVO and other).
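
The selection step can be sketched on a small made-up table. The column names match the performance data, but the rows below are for illustration only.

```python
import pandas as pd

# Made-up rows with the same key columns as the performance data.
df_example = pd.DataFrame(
    {
        'stroke_team': [1, 1, 1, 2],
        'stroke_type': ['lvo', 'nlvo', 'other', 'lvo'],
        'scenario': ['base', 'base', 'base', 'base'],
        'admissions': [79.4, 223.8, 40.2, 50.0],
    },
    index=['1 / lvo / base', '1 / nlvo / base',
           '1 / other / base', '2 / lvo / base'],
)

mask = (
    (df_example['stroke_team'] == 1) &
    (df_example['stroke_type'] == 'lvo') &
    (df_example['scenario'] == 'base')
)
# Exactly one row matches, so squeeze() turns the one-row DataFrame
# into a Series indexed by column name.
lvo_row = df_example[mask].squeeze()
```

If the mask matched no rows or more than one row, `squeeze()` would return a DataFrame rather than a Series, so it can be worth checking the type of the result before using it.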

Separate patient pathways are then run for each stroke type, and the results from the pathways are combined into a single data set.

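Some of the pathway results are split off and used for the outcome model. The required results for each patient are their times from onset to treatment, which treatment they received, and which stroke type they have. Before the outcome model is run, the thrombectomy results are overwritten so that no patient is counted as receiving thrombectomy.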

The outcome model is then run to find the expected outcome for patients with those features who were treated at those times. The "discrete" outcome model is used, which requires each patient to have a pre-stroke mRS score. Rather than sampling this from the original data, where there might not be many patients of each stroke type, we create a more representative selection by sampling uniformly from a pre-stroke mRS distribution.
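
One possible way to draw pre-stroke mRS scores from a distribution is sketched below: a uniform random number is generated for each patient and mapped onto the cumulative mRS distribution. This is an assumption about how such sampling could work, not the implementation inside `run_discrete_outcome_model`, and the probabilities are placeholder values rather than the distribution used by the outcome model.

```python
import numpy as np

# Placeholder probabilities for pre-stroke mRS 0 to 5 (assumed values,
# not the distribution used by the outcome model).
mrs_probs = np.array([0.55, 0.15, 0.10, 0.10, 0.07, 0.03])
mrs_cumulative = np.cumsum(mrs_probs)
# Guard against floating-point rounding in the final cumulative value.
mrs_cumulative[-1] = 1.0

rng = np.random.default_rng(42)
n_patients = 1000
# One uniform random number in [0, 1) per patient...
u = rng.uniform(0.0, 1.0, size=n_patients)
# ... mapped to the first mRS bin whose cumulative probability exceeds it.
pre_stroke_mrs = np.searchsorted(mrs_cumulative, u, side='right')
```

Because the scores come from the distribution rather than from the handful of patients at one hospital, every stroke type gets the same mix of pre-stroke mRS scores on average.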

In [7]:
np.random.seed(42)

# How many trials to run:
n_trials = 100

results_df, outcome_results_columns, trial_columns = stroke_utilities.scenario.set_up_results_dataframe()

for stroke_team in stroke_teams:
    for scenario in scenario_list:
        # Get data for one hospital.
        # Squeeze to convert DataFrame to Series.
        lvo_data = df_performance_scenarios[(
            (df_performance_scenarios['stroke_team'] == stroke_team) & 
            (df_performance_scenarios['stroke_type'] == 'lvo') &
            (df_performance_scenarios['scenario'] == scenario)
            )].copy().squeeze()
        nlvo_data = df_performance_scenarios[(
            (df_performance_scenarios['stroke_team'] == stroke_team) & 
            (df_performance_scenarios['stroke_type'] == 'nlvo') &
            (df_performance_scenarios['scenario'] == scenario)
            )].copy().squeeze()
        other_data = df_performance_scenarios[(
            (df_performance_scenarios['stroke_team'] == stroke_team) & 
            (df_performance_scenarios['stroke_type'] == 'other') &
            (df_performance_scenarios['scenario'] == scenario)
            )].copy().squeeze()
    
        # Set up trial results dataframe
        trial_df = pd.DataFrame(columns=trial_columns)
        # Set up the pathways with this data...
        pathway_object_dict = stroke_utilities.scenario.set_up_pathway_objects(stroke_team, lvo_data, nlvo_data, other_data)
        for trial in range(n_trials):
            # ... run the pathways...
            combo_trial_dict = stroke_utilities.scenario.run_trial_of_pathways(pathway_object_dict)
            # ... overwrite the results so that nobody has thrombectomy...
            combo_trial_dict['mt_chosen_bool'] = np.zeros(
                len(combo_trial_dict['mt_chosen_bool']), dtype=bool)
            # ... and run the clinical outcome model.
            results_by_stroke_type, patient_array_outcomes = (
                stroke_utilities.scenario.run_discrete_outcome_model(combo_trial_dict))

            number_of_patients = len(combo_trial_dict['stroke_type_code'])
            # Patients' mRS if not treated...
            mrs_not_treated = patient_array_outcomes['each_patient_mrs_not_treated']
            # Patients' mRS in this trial...
            mrs_post_stroke = patient_array_outcomes['each_patient_mrs_post_stroke']
            # How many patients were good outcomes? mRS 0 or 1.
            n_good_baseline = np.count_nonzero(mrs_not_treated <= 1)
            n_good_post_stroke = np.count_nonzero(mrs_post_stroke <= 1)
            # Additional good outcomes:
            n_good_additional = n_good_post_stroke - n_good_baseline
            # Convert to outcomes per 1000 patients:
            n_baseline_good_per_1000 = n_good_baseline * (1000.0 / number_of_patients)
            n_additional_good_per_1000 = n_good_additional * (1000.0 / number_of_patients)
    
            result = stroke_utilities.scenario.gather_results_from_trial(
                trial_columns, combo_trial_dict, results_by_stroke_type,
                n_baseline_good_per_1000, n_additional_good_per_1000
                )
            trial_df.loc[trial] = result
    
        summary_trial_results = stroke_utilities.scenario.gather_summary_results_across_all_trials(outcome_results_columns, trial_df)
        summary_trial_results += [f'{stroke_team}']
        summary_trial_results += [scenario.replace(' + ', '_')]
        
        # add scenario results to results dataframe
        results_df.loc[f'{stroke_team} / {scenario}'] = summary_trial_results

## Results

In [8]:
results_df.T

Unnamed: 0,1 / base,1 / benchmark,1 / onset,1 / onset + benchmark,1 / speed,1 / speed + benchmark,1 / speed + onset,1 / speed + onset + benchmark,2 / base,2 / benchmark,...,118 / speed + onset,118 / speed + onset + benchmark,119 / base,119 / benchmark,119 / onset,119 / onset + benchmark,119 / speed,119 / speed + benchmark,119 / speed + onset,119 / speed + onset + benchmark
Percent_Thrombolysis_(median),13.157895,18.421053,14.619883,19.590643,13.450292,18.71345,15.204678,20.467836,12.854031,17.538126,...,19.340463,22.816399,18.439716,20.921986,22.695035,25.531915,17.730496,20.567376,21.276596,24.822695
Percent_Thrombolysis_(low_5%),10.804094,15.774854,11.111111,16.081871,11.111111,15.204678,12.266082,17.236842,10.239651,14.586057,...,17.290553,19.964349,14.858156,16.312057,18.439716,19.858156,13.475177,14.893617,15.567376,19.858156
Percent_Thrombolysis_(high_95%),15.804094,21.944444,17.836257,23.69883,16.666667,22.821637,18.435673,24.269006,15.03268,20.501089,...,21.568627,25.499109,23.404255,25.531915,29.113475,31.241135,22.730496,25.531915,26.950355,31.205674
Percent_Thrombolysis_(mean),13.307018,18.497076,14.654971,19.733918,13.710526,18.75731,15.25731,20.535088,12.694989,17.514161,...,19.356506,22.85205,18.631206,21.170213,22.758865,25.702128,18.248227,20.595745,21.049645,25.085106
Percent_Thrombolysis_(stdev),1.683289,1.92328,1.987609,2.269281,1.726047,2.338951,1.978944,2.302662,1.482767,1.951568,...,1.407926,1.690868,2.706464,2.913527,3.385721,3.309877,3.058961,3.247583,3.449328,3.196707
Percent_Thrombolysis_(95ci),0.329919,0.376956,0.389564,0.444771,0.338299,0.458426,0.387866,0.451313,0.290617,0.3825,...,0.275948,0.331404,0.530457,0.571041,0.663589,0.648724,0.599545,0.636515,0.676056,0.626543
Baseline_good_outcomes_per_1000_patients_(median),333.333333,324.561404,327.48538,333.333333,330.409357,330.409357,327.48538,327.48538,344.22658,349.673203,...,317.290553,315.508021,304.964539,304.964539,308.510638,304.964539,315.602837,312.056738,304.964539,304.964539
Baseline_good_outcomes_per_1000_patients_(low_5%),298.245614,286.549708,289.473684,295.321637,292.397661,303.947368,292.105263,295.175439,315.795207,317.973856,...,288.680927,288.680927,247.87234,248.22695,255.319149,248.22695,255.319149,254.609929,248.22695,248.22695
Baseline_good_outcomes_per_1000_patients_(high_95%),371.491228,362.719298,368.421053,362.719298,377.192982,365.643275,365.497076,365.497076,374.836601,379.302832,...,345.989305,344.117647,375.886525,361.702128,369.148936,368.794326,354.609929,361.702128,354.609929,368.794326
Baseline_good_outcomes_per_1000_patients_(mean),334.005848,325.321637,327.894737,331.783626,332.982456,331.374269,328.947368,327.894737,345.46841,350.130719,...,316.096257,316.327986,307.588652,306.524823,308.368794,303.546099,310.921986,304.893617,303.900709,304.326241
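
The summary rows in the table above (median, 5th and 95th percentiles, mean, standard deviation and 95% confidence interval) can be reproduced for a single output with a sketch like the one below. It is not the code inside `gather_summary_results_across_all_trials`, which is not shown in this notebook. The function name and the use of the population standard deviation are assumptions. The `95ci` formula, 1.959964 × stdev / √n with n = 100 trials, is consistent with the tabulated values (for example 1.959964 × 1.683289 / 10 ≈ 0.329919).

```python
import numpy as np


def summarise_trials(values):
    """Summary statistics for one output across all trials.

    Hypothetical helper, for illustration only.
    """
    values = np.asarray(values, dtype=float)
    # Population standard deviation (ddof=0) is an assumption here.
    stdev = values.std()
    return {
        'median': np.median(values),
        'low_5%': np.percentile(values, 5),
        'high_95%': np.percentile(values, 95),
        'mean': values.mean(),
        'stdev': stdev,
        # Half-width of a normal-approximation 95% confidence
        # interval for the mean.
        '95ci': 1.959964 * stdev / np.sqrt(len(values)),
    }


# Toy values standing in for one output over five trials.
summary = summarise_trials([1, 2, 3, 4, 5])
```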


Save the results to file:

In [9]:
df = results_df

# Round the values to fewer decimal places:
for column in df.columns:
    if column not in ['stroke_team', 'stroke_team_id', 'scenario']:
        df[column] = df[column].astype(float).round(6)
        
df.to_csv(
    paths.output_folder + 'scenario_results.csv',
    index=False
    )