# Stroke type proportions for patients with onset-to-arrival times below various cutoffs

Find the proportions of patients with nLVO, LVO, and haemorrhagic stroke in subgroups of patients that arrive at hospital by ambulance within X hours of their stroke onset.

## Plain English summary

Generally we expect people who have more severe strokes to recognise the symptoms and arrive at hospital sooner than people who have mild strokes. In the modelling, we usually use the stroke severity score to label patients as having an nLVO (for milder strokes) or an LVO (for more severe strokes). This means that a subgroup of patients who arrive at hospital very quickly, within a few hours after their stroke onset, is likely to contain more LVO patients than a subgroup of patients who arrived more slowly, within 24 hours.

In this notebook we check the proportions of stroke types in subgroups of patients arriving by various cutoff times. The times are one hour after onset, two hours after onset, and so on up to and including 24 hours after onset.

We expect to see some difference in the proportions across subgroups and so these calculations are more of a check to see how big the differences are.

## Aims

Calculate the proportions of patients with nLVO, LVO, and haemorrhagic stroke in subgroups that arrive by various cutoff times. Look at subgroups arriving before 1 hour, 2 hours, ..., 24 hours.

Save the resulting proportions to file and make pie charts to compare the variations visually.

## Method

Load existing data on proportions of stroke types arriving by the expected time. Simplify the data into just three values for each time: proportion of those who arrived on time with nLVO, LVO, and haemorrhagic strokes. Compare the results across the cutoff times using pie charts.

## Code setup

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

## Load data

In [2]:
df_props_full = pd.read_csv('stroke_type_proportions_with_arrival_time.csv', index_col=0)

In [3]:
df_props_full.head(3).T

expected_time,1.0,2.0,3.0
prop_arriving_within_expected_time,0.05991,0.26796,0.3768
prop_haemo_of_prop_arriving_within_expected_time,0.15957,0.15649,0.15302
prop_nlvo_of_prop_arriving_within_expected_time,0.48388,0.54138,0.56905
prop_lvo_of_prop_arriving_within_expected_time,0.35655,0.30213,0.27792
prop_haemo_arriving_within_expected_time,0.00956,0.04193,0.05766
prop_nlvo_arriving_within_expected_time,0.02899,0.14507,0.21442
prop_lvo_arriving_within_expected_time,0.02136,0.08096,0.10472
prop_haemo_of_prop_arriving_after_expected_time,0.13094,0.12393,0.12034
prop_nlvo_of_prop_arriving_after_expected_time,0.66118,0.69052,0.69984
prop_lvo_of_prop_arriving_after_expected_time,0.20788,0.18555,0.17983


## Simplify data

Keep only patients who arrived before the onset-to-arrival time cutoff, and rescale the proportions of nLVO, LVO, and haemorrhagic stroke so that they sum to 1 for each cutoff time.

In [4]:
df_props = df_props_full.copy()

df_props['prop_nlvo'] = (
    df_props['prop_nlvo_arriving_within_expected_time'] /
    df_props['prop_arriving_within_expected_time']
)

df_props['prop_lvo'] = (
    df_props['prop_lvo_arriving_within_expected_time'] /
    df_props['prop_arriving_within_expected_time']
)

df_props['prop_haemo'] = (
    df_props['prop_haemo_arriving_within_expected_time'] /
    df_props['prop_arriving_within_expected_time']
)

df_props = df_props[['prop_nlvo', 'prop_lvo', 'prop_haemo']]

In [5]:
df_props

expected_time,prop_nlvo,prop_lvo,prop_haemo
1.0,0.483893,0.356535,0.159573
2.0,0.541387,0.302135,0.156479
3.0,0.569055,0.277919,0.153025
4.0,0.583048,0.266635,0.150318
5.0,0.591916,0.260136,0.147948
6.0,0.598433,0.255442,0.146125
7.0,0.603749,0.251589,0.144662
8.0,0.607694,0.248782,0.143506
9.0,0.611209,0.246262,0.142529
10.0,0.613927,0.244329,0.141762
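
As a quick sanity check on the rescaling, the sketch below repeats the division for the first three cutoff times, using values copied from the table printed earlier. The names `df_check`, `df_norm`, and `stroke_types` are made up for this illustration and are not part of the notebook. Because the inputs are rounded to five decimal places, the sums are only expected to be within about 1e-4 of 1.

```python
import numpy as np
import pandas as pd

# Values copied from the first three cutoff times in the table printed above.
df_check = pd.DataFrame(
    {
        'prop_arriving_within_expected_time': [0.05991, 0.26796, 0.3768],
        'prop_haemo_arriving_within_expected_time': [0.00956, 0.04193, 0.05766],
        'prop_nlvo_arriving_within_expected_time': [0.02899, 0.14507, 0.21442],
        'prop_lvo_arriving_within_expected_time': [0.02136, 0.08096, 0.10472],
    },
    index=pd.Index([1.0, 2.0, 3.0], name='expected_time'),
)

# Rescale each stroke type by the proportion of patients arriving in time.
stroke_types = ['nlvo', 'lvo', 'haemo']
df_norm = pd.DataFrame({
    f'prop_{s}': (
        df_check[f'prop_{s}_arriving_within_expected_time'] /
        df_check['prop_arriving_within_expected_time']
    )
    for s in stroke_types
})

# Each row should now sum to 1, up to the rounding of the inputs.
row_sums = df_norm.sum(axis=1)
sums_ok = bool(np.allclose(row_sums, 1.0, atol=1e-4))
print(df_norm.round(5))
print('Rows sum to 1:', sums_ok)
```

The rescaled values should also agree with the `prop_*_of_prop_arriving_within_expected_time` columns in the input file, which appear to hold the same quantity.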


## Extra subgroup: patients who arrive eventually

Place no limit on the arrival time, so that this subgroup also includes patients who are missing the onset-to-arrival time data.

The total number of patients and the patient proportions will be the same across all time cutoffs because we're taking everybody whether they're before or after the cutoff. (Except for any rounding errors!)
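
The claim that the totals do not depend on the cutoff can be illustrated with the conditional columns shown in the first table. The overall share of one stroke type is the share among early arrivers weighted by the proportion arriving early, plus the share among late arrivers weighted by the remainder. The sketch below does this for the haemorrhagic values at the first three cutoffs. The variable names are made up for this example, and it assumes that the `..._of_prop_arriving_after_expected_time` columns are conditional on arriving after the cutoff, as their names suggest.

```python
import numpy as np

# Values copied from the first table, for cutoffs of 1, 2 and 3 hours.
p_within = np.array([0.05991, 0.26796, 0.3768])
haemo_given_within = np.array([0.15957, 0.15649, 0.15302])
haemo_given_after = np.array([0.13094, 0.12393, 0.12034])

# Weighted sum of the early and late subgroups gives the overall share.
haemo_overall = (
    p_within * haemo_given_within +
    (1.0 - p_within) * haemo_given_after
)

# Spread between the largest and smallest result across the cutoffs.
spread = haemo_overall.max() - haemo_overall.min()
print(np.round(haemo_overall, 4), spread)
```

All three cutoffs give roughly 0.1327, with only rounding-level differences between them.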

In [6]:
df_props_extra = df_props_full.copy()

df_props_extra['prop_haemo'] = (
    df_props_extra['prop_haemo_arriving_within_expected_time'] +
    df_props_extra['prop_haemo_arriving_after_expected_time']
)

df_props_extra['prop_nlvo'] = (
    df_props_extra['prop_nlvo_arriving_within_expected_time'] +
    df_props_extra['prop_nlvo_arriving_after_expected_time']
)

df_props_extra['prop_lvo'] = (
    df_props_extra['prop_lvo_arriving_within_expected_time'] +
    df_props_extra['prop_lvo_arriving_after_expected_time']
)

df_props_extra['prop_sum'] = (
    df_props_extra['prop_lvo'] +
    df_props_extra['prop_nlvo'] + 
    df_props_extra['prop_haemo']
)

# Only keep these columns:
df_props_extra = df_props_extra[['prop_haemo', 'prop_nlvo', 'prop_lvo', 'prop_sum']]

In [7]:
df_props_extra

expected_time,prop_haemo,prop_nlvo,prop_lvo,prop_sum
1.0,0.13266,0.65056,0.21679,1.00001
2.0,0.13265,0.65056,0.21679,1.0
3.0,0.13266,0.65056,0.21679,1.00001
4.0,0.13266,0.65055,0.21679,1.0
5.0,0.13266,0.65055,0.21679,1.0
6.0,0.13265,0.65055,0.21679,0.99999
7.0,0.13265,0.65056,0.21679,1.0
8.0,0.13265,0.65055,0.21679,0.99999
9.0,0.13266,0.65056,0.21679,1.00001
10.0,0.13266,0.65056,0.21679,1.00001


Pick the first row whose proportions sum to exactly 1:

In [8]:
ind = df_props_extra[df_props_extra['prop_sum'] == 1.0].index.values[0]
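
Exact equality on floating-point sums works here but can be fragile if the input values change. As a hedged alternative, the sketch below selects the row with a tight tolerance and falls back to the row closest to 1 if none match. The series `prop_sum` holds the first four sums copied from the table above; the names `mask`, `ind_tol`, and `ind_closest` are made up for this illustration.

```python
import numpy as np
import pandas as pd

# First four values of the prop_sum column printed above.
prop_sum = pd.Series(
    [1.00001, 1.0, 1.00001, 1.0],
    index=pd.Index([1.0, 2.0, 3.0, 4.0], name='expected_time'),
)

# Tolerance is far below the 1e-5 rounding step of the data, so only
# sums that are 1 up to floating-point noise are accepted.
mask = np.isclose(prop_sum, 1.0, rtol=0.0, atol=1e-9)

# Fallback: the row whose sum is closest to 1 (first one if tied).
ind_closest = (prop_sum - 1.0).abs().idxmin()

ind_tol = prop_sum[mask].index[0] if mask.any() else ind_closest
print(ind_tol)
```

Note that the default tolerances of `np.isclose` are loose enough to accept a sum of 1.00001, which is why `rtol` and `atol` are set explicitly.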

Place this row into the full results list:

In [9]:
df_props.loc['inf'] = df_props_extra.loc[ind, ['prop_nlvo', 'prop_lvo', 'prop_haemo']]

In [10]:
df_props.tail(3)

expected_time,prop_nlvo,prop_lvo,prop_haemo
23.0,0.631308,0.232846,0.135846
24.0,0.632363,0.232109,0.135543
inf,0.65056,0.21679,0.13265


## Save results

Round all values to 5 decimal places, which should be plenty:

In [11]:
df_props = np.round(df_props, 5)

In [12]:
df_props.to_csv('stroke_type_props_arrived_on_time.csv')
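
To put "plenty" in context, rounding to 5 decimal places moves each proportion by at most 0.000005, so the sum of three rounded proportions stays within about 0.000015 of the unrounded sum. The sketch below checks this for the 24-hour row, using the six-decimal values printed earlier in the notebook. The variable names are made up for this example.

```python
import numpy as np

# 24-hour proportions (nLVO, LVO, haemorrhagic) as printed earlier.
props_24h = np.array([0.632363, 0.232109, 0.135543])

props_rounded = np.round(props_24h, 5)

# Largest change to any single proportion caused by the rounding.
max_change = np.abs(props_rounded - props_24h).max()

# How far the rounded proportions are from summing to exactly 1.
sum_error = abs(props_rounded.sum() - 1.0)
print(props_rounded, max_change, sum_error)
```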

## Plot proportions

In [13]:
import os

# Make sure the output directory exists before saving any frames.
os.makedirs('./pie_frames', exist_ok=True)

for time in df_props.index.values:
    fig, ax = plt.subplots()

    ax.pie(df_props.loc[time].values, labels=df_props.columns, autopct='%1.1f%%')
    try:
        title = f'Expected onset-to-arrival time less than {time:.0f} hours'
        savename = f'props_{time:.0f}h'
    except ValueError:
        # The extra row is labelled with the string 'inf', which cannot be
        # formatted as a float, so it gets its own title and file name.
        title = 'All patients who arrive eventually'
        savename = 'props_eventually'
    ax.set_title(title)

    plt.savefig(f'./pie_frames/{savename}.png', bbox_inches='tight')
    plt.close()

## Show the results for 6 hours and 24 hours:

![Pie chart of stroke type proportions for patients arriving within 6 hours of stroke onset.](./pie_frames/props_6h.png)
![Pie chart of stroke type proportions for patients arriving within 24 hours of stroke onset.](./pie_frames/props_24h.png)

The differences are fairly small, at most about 3.4 percentage points for any one stroke type...

In [14]:
df_props.loc[[6.0, 24.0]]

expected_time,prop_nlvo,prop_lvo,prop_haemo
6.0,0.59843,0.25544,0.14613
24.0,0.63236,0.23211,0.13554


... so generally speaking it should be fine to pick one set of stroke type proportions at an expected onset-to-arrival time and apply that one set to all results.
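
The size of the gap between the two subgroups can be worked out directly from the two rows above. This is a small sketch with the values copied from the table; `df_compare`, `diff_points`, and `max_abs_diff` are names made up for this example.

```python
import pandas as pd

# Rows for the 6-hour and 24-hour cutoffs, copied from the table above.
df_compare = pd.DataFrame(
    {
        'prop_nlvo': [0.59843, 0.63236],
        'prop_lvo': [0.25544, 0.23211],
        'prop_haemo': [0.14613, 0.13554],
    },
    index=pd.Index([6.0, 24.0], name='expected_time'),
)

# Difference (24 hours minus 6 hours) in percentage points.
diff_points = (df_compare.loc[24.0] - df_compare.loc[6.0]) * 100.0

# Largest absolute difference across the three stroke types.
max_abs_diff = diff_points.abs().max()
print(diff_points.round(2))
print(round(max_abs_diff, 2))
```

The largest shift is in nLVO, which is about 3.4 percentage points higher in the 24-hour subgroup, while LVO and haemorrhagic stroke are each lower by roughly 2.3 and 1.1 points.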

## Show the results for all times:

![Animated pie chart of stroke type proportions for patients arriving within the cutoff time of stroke onset. The cutoff times run from 1 hour to 24 hours in steps of 1 hour.](./pie_frames/props_with_time.gif)
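
The notebook imports `FuncAnimation`, but the cell that builds the animation is not shown. One way such a GIF could be produced is sketched below. This is an assumption about the approach rather than the code actually used: the two-row `df_demo`, the `draw_frame` function, the temporary output path, and the choice of `PillowWriter` (which needs the Pillow package) are all made up for this illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.animation import FuncAnimation, PillowWriter

# Two example rows copied from the results table; the real animation
# would use every cutoff from 1 to 24 hours.
df_demo = pd.DataFrame(
    {
        'prop_nlvo': [0.59843, 0.63236],
        'prop_lvo': [0.25544, 0.23211],
        'prop_haemo': [0.14613, 0.13554],
    },
    index=pd.Index([6.0, 24.0], name='expected_time'),
)

fig, ax = plt.subplots()


def draw_frame(time):
    """Redraw the pie chart for one cutoff time."""
    ax.clear()
    ax.pie(df_demo.loc[time].values, labels=df_demo.columns, autopct='%1.1f%%')
    ax.set_title(f'Expected onset-to-arrival time less than {time:.0f} hours')


# One frame per cutoff time; each frame value is passed to draw_frame.
anim = FuncAnimation(fig, draw_frame, frames=list(df_demo.index), interval=500)

# Save to a temporary directory so the sketch does not overwrite real output.
gif_path = os.path.join(tempfile.mkdtemp(), 'props_with_time_demo.gif')
anim.save(gif_path, writer=PillowWriter(fps=2))
plt.close(fig)
```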


## Show the results for any arrival time:

![Pie chart of stroke type proportions for patients arriving at any time after stroke onset.](./pie_frames/props_eventually.png)


## Conclusion

There is a small variation in the proportions of different stroke types in subgroups of patients who arrive at hospital within a given time of their stroke onset.

The variation between patients arriving within 6 hours and patients arriving within 24 hours is small, at most about 3.4 percentage points for any one stroke type. The proportions for the 6 hours group could be applied to results for the 24 hours group without a large loss of accuracy.