# Combine cohorts

The outcome model has already calculated outcomes for these cohorts of patients:
+ nLVO with IVT
+ LVO with IVT
+ LVO with MT

Then combine the results into these groups:
+ RACE < 5, treated population only
+ RACE >= 5, treated population only
+ Full population, actual treatment rates
+ Full population, target treatment rates

## Code setup

In [3]:
import pandas as pd
import os
from dataclasses import dataclass
import copy
import numpy as np

In [15]:
# Define file paths
@dataclass(frozen=True)
class Paths:
    '''Singleton object for storing paths to data and database.'''

    # Directories:
    dir_data = 'data'
    dir_output = 'output'

    # Reference data:
    patient_props_target = 'patient_proportions_target.csv'
    patient_props_actual_pre_str = 'patient_proportions_'  # {unit_postcode}.csv
    # Input files (outcome model results):
    outcome_file_pre_str = 'sa_outcome_model_output_'  # {unit_postcode}.csv

paths = Paths()

## Load outcome model outputs

There is one file saved for each of the following stroke unit postcodes:

In [5]:
unit_list = [
    'BT126BA',
    'BT161RH',
    'BT358DR',
    'BT412RL',
    'BT476SB',
    'BT521HS',
    'BT635QQ',
    'BT746DN'
]

Load the file for each stroke unit postcode in turn and store the results in a dictionary keyed by postcode:

In [10]:
dict_outcomes = {}

for unit_name in unit_list:
    path_to_file = os.path.join(paths.dir_output, paths.outcome_file_pre_str + unit_name + '.csv')
    df = pd.read_csv(path_to_file, index_col=0)
    dict_outcomes[unit_name] = df

Show the contents of the first of these dataframes:

In [11]:
dict_outcomes[unit_list[0]].head(3).T

from_postcode,N00000001,N00000002,N00000003
nlvo_ivt_added_utility,0.10711,0.10988,0.10988
nlvo_ivt_mean_mrs,1.69309,1.67704,1.67704
nlvo_ivt_mrs_less_equal_2,0.69579,0.69884,0.69884
nlvo_ivt_mrs_shift,-0.58691,-0.60296,-0.60296
nlvo_ivt_added_mrs_less_equal_2,0.11579,0.11884,0.11884
lvo_ivt_added_utility,0.05214,0.05399,0.05399
lvo_ivt_mean_mrs,3.38255,3.37244,3.37244
lvo_ivt_mrs_less_equal_2,0.32186,0.32381,0.32381
lvo_ivt_mrs_shift,-0.25745,-0.26756,-0.26756
lvo_ivt_added_mrs_less_equal_2,0.05686,0.05881,0.05881


## Combine cohorts into patient populations

The outcome model has created separate results for these cohorts:
+ nLVO treated with IVT
+ LVO treated with IVT only
+ LVO treated with MT only
+ LVO treated with IVT and MT

We will combine these into the following patient populations:
+ RACE < 5
  + Just the nLVO patients
+ RACE >= 5
  + Just the LVO patients, mix of treatment types
+ Full population, actual treatment rates
  + Combine treated ischaemic patients scaled to include patients receiving no treatment
+ Full population, target treatment rates
  + Combine treated ischaemic patients scaled to include patients receiving no treatment      
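As a rough illustration of the combination step, the sketch below takes a weighted sum of cohort outcomes for a single outcome measure. All numbers are made up for the example, the names `df_toy` and `toy_props` are hypothetical, and it assumes that patients who receive no treatment contribute zero added utility.

```python
import pandas as pd

# Toy outcomes for two areas (illustrative values only).
df_toy = pd.DataFrame(
    {
        'nlvo_ivt_added_utility': [0.10, 0.12],
        'lvo_ivt_added_utility': [0.05, 0.06],
    },
    index=pd.Index(['area_a', 'area_b'], name='from_postcode'),
)

# Hypothetical share of the population in each treated cohort.
# The remaining patients are untreated and assumed to add nothing.
toy_props = {'nlvo_ivt': 0.2, 'lvo_ivt': 0.1}

# Weighted sum across cohorts, one value per area.
combined_added_utility = sum(
    prop * df_toy[f'{cohort}_added_utility']
    for cohort, prop in toy_props.items()
)
print(combined_added_utility)
```

For `area_a` this gives 0.2 × 0.10 + 0.1 × 0.05 = 0.025.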

List of patient cohorts:

In [13]:
patient_cohorts = ['nlvo_ivt', 'lvo_ivt', 'lvo_ivt_mt', 'lvo_mt']

List of outcome measures in the results data:

In [14]:
outcome_measures = sorted(list(set([
    c.split('ivt_')[-1].split('mt_')[-1]
    for c in dict_outcomes[unit_list[0]].columns
])))

outcome_measures

['added_mrs_less_equal_2',
 'added_utility',
 'mean_mrs',
 'mrs_less_equal_2',
 'mrs_shift']
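The two chained `split` calls strip the cohort prefix from each column name. The sketch below applies the same expression to a few example names. The `lvo_ivt_mt_` and `lvo_mt_` column names are assumed from the cohort list rather than taken from the table shown earlier, and `strip_cohort_prefix` is a hypothetical helper.

```python
def strip_cohort_prefix(column_name):
    '''Return the outcome measure part of a column name.'''
    # Drop everything up to the last 'ivt_', then up to the last 'mt_'.
    return column_name.split('ivt_')[-1].split('mt_')[-1]


examples = [
    'nlvo_ivt_mean_mrs',
    'lvo_ivt_mt_added_utility',
    'lvo_mt_mrs_shift',
]
stripped = [strip_cohort_prefix(c) for c in examples]
print(stripped)
```

This approach relies on no outcome measure name containing `ivt_` or `mt_` itself, which holds for the five measures listed above.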

### Patient proportions

These are the proportions of the full population with target treatment rates:

In [16]:
df_props_target = pd.read_csv(
    os.path.join(paths.dir_data, paths.patient_props_target),
    index_col=0, header=None
).squeeze()
df_props_target.name = 'target_treatment_rates'

df_props_target

0
haemorrhagic         0.13600
lvo_no_treatment     0.16332
lvo_ivt_only         0.04156
lvo_ivt_mt           0.04250
lvo_mt_only          0.00750
nlvo_no_treatment    0.51318
nlvo_ivt             0.09594
Name: target_treatment_rates, dtype: float64
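A quick sanity check on these proportions may be useful before combining cohorts. The sketch below re-enters the values printed above by hand (rather than reading the file) and checks that they sum to one. The names `props`, `total` and `prop_treated` are hypothetical.

```python
import numpy as np
import pandas as pd

# Values copied from the printed target treatment rates.
props = pd.Series({
    'haemorrhagic': 0.13600,
    'lvo_no_treatment': 0.16332,
    'lvo_ivt_only': 0.04156,
    'lvo_ivt_mt': 0.04250,
    'lvo_mt_only': 0.00750,
    'nlvo_no_treatment': 0.51318,
    'nlvo_ivt': 0.09594,
})

# The groups should cover the whole population.
total = props.sum()
assert np.isclose(total, 1.0)

# Share of the full population receiving any treatment.
treated = ['lvo_ivt_only', 'lvo_ivt_mt', 'lvo_mt_only', 'nlvo_ivt']
prop_treated = props[treated].sum()
print(total, prop_treated)
```

With these values the treated share comes to 0.1875 of the full population.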

And the proportions with actual treatment rates. These depend on the stroke unit. __TO DO__: change this to import each unit's file separately.

In [44]:
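# Actual treatment rates are stored per stroke unit.
# Load the first unit's file until the per-unit import is written.
df_props_actual = pd.read_csv(
    os.path.join(
        paths.dir_output,
        paths.patient_props_actual_pre_str + unit_list[0] + '.csv'),
    index_col=0, header=None
).squeeze()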
df_props_actual.name = 'actual_treatment_rates'

df_props_actual

0
haemorrhagic         0.13600
lvo_no_treatment     0.16332
lvo_ivt_only         0.04156
lvo_ivt_mt           0.04250
lvo_mt_only          0.00750
nlvo_no_treatment    0.51318
nlvo_ivt             0.09594
Name: target_treatment_rates, dtype: float64

In [None]:
prop_nlvo = df_props_target['nlvo_ivt'] + df_props_target['nlvo_no_treatment']
prop_lvo = (
    df_props_target['lvo_ivt_only'] +
    df_props_target['lvo_ivt_mt'] +
    df_props_target['lvo_mt_only'] +
    df_props_target['lvo_no_treatment']
)

proportions_dicts = {
    'racel5': {
        'nlvo_ivt': (df_props_target['nlvo_ivt'] / prop_nlvo),
        'lvo_ivt': 0.0,
        'lvo_ivt_mt': 0.0,
        'lvo_mt': 0.0
    },
    'racegeq5': {
        'nlvo_ivt': 0.0,
        'lvo_ivt': (df_props_target['lvo_ivt_only'] / prop_lvo),
        'lvo_ivt_mt': (df_props_target['lvo_ivt_mt'] / prop_lvo),
        'lvo_mt': (df_props_target['lvo_mt_only'] / prop_lvo)
    },
    'full_pop_target_rates': {
        'nlvo_ivt': df_props_target['nlvo_ivt'],
        'lvo_ivt': df_props_target['lvo_ivt_only'],
        'lvo_ivt_mt': df_props_target['lvo_ivt_mt'],
        'lvo_mt': df_props_target['lvo_mt_only']
    },
    'full_pop_actual_rates': {
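        # Assumes df_props_actual uses the same index labels as df_props_target.
        'nlvo_ivt': df_props_actual['nlvo_ivt'],
        'lvo_ivt': df_props_actual['lvo_ivt_only'],
        'lvo_ivt_mt': df_props_actual['lvo_ivt_mt'],
        'lvo_mt': df_props_actual['lvo_mt_only']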
    },
}

In [None]:
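def combine_cohorts(df_outcomes, outcome_measures, proportions_dicts):
    '''Sketch: weighted sum of cohort outcomes for each population.'''
    df_results = pd.DataFrame(index=df_outcomes.index)

    for pop, props in proportions_dicts.items():
        for om in outcome_measures:
            # Assumes columns are named '{cohort}_{outcome_measure}'.
            df_results[f'{pop}_{om}'] = sum(
                p * df_outcomes[f'{c}_{om}'] for c, p in props.items())
    return df_results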

## Save results

Save the results to file:
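One possible way to save the results is to write one CSV per stroke unit, mirroring how the input files are named. The sketch below is self-contained and uses stand-ins: a temporary directory in place of `paths.dir_output`, a hypothetical prefix `combined_outcomes_`, and a toy `dict_results` in place of the real combined results.

```python
import os
import tempfile

import pandas as pd

# Stand-in for paths.dir_output so the sketch runs anywhere.
dir_output = tempfile.mkdtemp()
# Hypothetical file name prefix, not defined in Paths above.
combined_file_pre_str = 'combined_outcomes_'

# Toy results for one unit (illustrative values only).
dict_results = {
    'BT126BA': pd.DataFrame(
        {'full_pop_target_rates_added_utility': [0.025, 0.030]},
        index=pd.Index(['N00000001', 'N00000002'], name='from_postcode'),
    )
}

saved_paths = []
for unit_name, df_results in dict_results.items():
    path_to_file = os.path.join(
        dir_output, combined_file_pre_str + unit_name + '.csv')
    # Keep the index so from_postcode is written as the first column.
    df_results.to_csv(path_to_file)
    saved_paths.append(path_to_file)
```

Reading a file back with `pd.read_csv(path, index_col=0)` restores the same layout as the outcome model inputs.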