# Data sources cheat sheet

This page records where all of the data came from.

---

## Notebook admin

In [10]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Set NumPy and Pandas to print 4 decimal places
np.set_printoptions(precision=4)
pd.set_option("display.precision", 4)

---

# Pre-stroke groups

The population data come from [the SAMueL-1 survey](https://samuel-book.github.io/samuel-1/descriptive_stats/08_prestroke_mrs.html). All of the ischaemic and intra-cranial haemorrhage (ICH) data are included. Patients with NIHSS>10 are designated LVO, and patients with NIHSS$\leq$10 are designated nLVO.

In [70]:
dist_pre_stroke_nlvo = np.array(
    [0.582881, 0.162538, 0.103440, 0.102223, 0.041973, 0.006945, 0.0])
dist_pre_stroke_lvo = np.array([
    0.417894, 0.142959, 0.118430, 0.164211, 0.113775, 0.042731, 0.0])

There are [246,676 patients](https://samuel-book.github.io/samuel-1/introduction/data.html) in the full dataset, and [the proportions of nLVO and LVO](https://samuel-book.github.io/samuel-1/descriptive_stats/10_using_nihss_10_for_lvo.html) are 74.9% and 25.1% respectively. As with the pre-stroke distributions, these numbers include ICH patients.

### Our new calculations 

We can convert the probabilities back into number of patients:

In [71]:
n_patients_samuel_total = 246676
n_patients_samuel_total_nlvo = 0.749 * n_patients_samuel_total
n_patients_samuel_total_lvo = 0.251 * n_patients_samuel_total

n_patients_pre_stroke_nlvo = dist_pre_stroke_nlvo * n_patients_samuel_total_nlvo
n_patients_pre_stroke_lvo = dist_pre_stroke_lvo * n_patients_samuel_total_lvo

__nLVO:__

In [73]:
data_pre_stroke_nlvo = np.stack((
    dist_pre_stroke_nlvo,
    n_patients_pre_stroke_nlvo, 
    np.round(n_patients_pre_stroke_nlvo,0).astype(int),
))

df = pd.DataFrame(data_pre_stroke_nlvo.T,
            columns=['Probability', 'Number of patients', 
                     'Number of patients (rounded)'],
            index=[f'mRS {i}' for i in range(7)])
df.loc['Total'] = df.sum()

df

Unnamed: 0,Probability,Number of patients,Number of patients (rounded)
mRS 0,0.5829,107693.2824,107693.0
mRS 1,0.1625,30030.5735,30031.0
mRS 2,0.1034,19111.6079,19112.0
mRS 3,0.1022,18886.7546,18887.0
mRS 4,0.042,7754.9451,7755.0
mRS 5,0.0069,1283.1605,1283.0
mRS 6,0.0,0.0,0.0
Total,1.0,184760.324,184761.0


__LVO:__

In [74]:
data_pre_stroke_lvo = np.stack((
    dist_pre_stroke_lvo,
    n_patients_pre_stroke_lvo, 
    np.round(n_patients_pre_stroke_lvo,0).astype(int),
))

df = pd.DataFrame(data_pre_stroke_lvo.T,
            columns=['Probability', 'Number of patients', 
                     'Number of patients (rounded)'],
            index=[f'mRS {i}' for i in range(7)])
df.loc['Total'] = df.sum()

df

Unnamed: 0,Probability,Number of patients,Number of patients (rounded)
mRS 0,0.4179,25874.1895,25874.0
mRS 1,0.143,8851.4031,8851.0
mRS 2,0.1184,7332.6735,7333.0
mRS 3,0.1642,10167.2351,10167.0
mRS 4,0.1138,7044.456,7044.0
mRS 5,0.0427,2645.7188,2646.0
mRS 6,0.0,0.0,0.0
Total,1.0,61915.676,61915.0


---

# Control groups

## nLVO and LVO separately, for mRS<=1

Population data from Emberson et al. 2014 (their Figure 2, "Control" column):

> ![](./images/data_sources/emberson-et-al-2014_figure-2_population-data.png)

We take the rows where NIHSS$\leq$10 to mean patients with nLVO, and NIHSS>10 to mean LVO. Then we can calculate the probability $P$ and odds $O$:

__nLVO:__

$P(\mathrm{mRS}\leq1 \ |\ \mathrm{no\ treatment} \ |\ \mathrm{nLVO}) = \frac{189 + 538}{321 + 1252} = 0.4622 $

$O(\mathrm{mRS}\leq1 \ |\ \mathrm{no\ treatment} \ |\ \mathrm{nLVO}) = \frac{0.46...}{1.0 - 0.46...} = 0.8593 $

__LVO:__

$P(\mathrm{mRS}\leq1 \ |\ \mathrm{no\ treatment} \ |\ \mathrm{LVO}) = \frac{175 + 55 + 8}{808 + 671 + 313} = 0.1328 $

$O(\mathrm{mRS}\leq1 \ |\ \mathrm{no\ treatment} \ |\ \mathrm{LVO}) = \frac{0.13...}{1.0 - 0.13...} = 0.1532 $
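
The arithmetic above can be reproduced with a short sketch. The patient counts are copied from the fractions above; the function name `prob_and_odds` is ours and is introduced only for illustration.

```python
def prob_and_odds(n_good, n_total):
    """Return the probability and odds of mRS<=1 from patient counts."""
    prob = n_good / n_total
    odds = prob / (1.0 - prob)
    return prob, odds


# Counts from Emberson et al. 2014, Figure 2, "Control" column,
# exactly as they appear in the fractions above.
p_nlvo, o_nlvo = prob_and_odds(189 + 538, 321 + 1252)
p_lvo, o_lvo = prob_and_odds(175 + 55 + 8, 808 + 671 + 313)

print(f'nLVO: P = {p_nlvo:.4f}, O = {o_nlvo:.4f}')
print(f'LVO:  P = {p_lvo:.4f}, O = {o_lvo:.4f}')
```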

## LVO, for each mRS

Population data from Goyal et al. 2016 (their Figure 1, Section A "Overall", "Control population" row):

> ![](./images/data_sources/goyal-et-al-2016_figure-1_mRS-dists.png)


From the Goyal et al. 2016 data above, the probability $P$ is:

$P(\mathrm{mRS}\leq1 \ |\ \mathrm{no\ treatment} \ |\ \mathrm{LVO}) = \frac{5.0 + 7.9}{100} = 0.129$

We can compare this probability with the corresponding value from Emberson et al. 2014.
    
| | $P$(mRS$\leq$1 &#124; no treatment &#124; LVO) |    
| --- | --- | 
| Emberson et al. 2014 | 0.1328 |
| Goyal et al. 2016 | 0.1290 | 

These values are the same to two significant figures.
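
As a quick check of the two-significant-figure claim, here is a minimal sketch using only the numbers quoted above (the variable names are ours):

```python
# P(mRS<=1 | no treatment | LVO) from each source, as quoted above.
p_emberson = (175 + 55 + 8) / (808 + 671 + 313)
p_goyal = (5.0 + 7.9) / 100.0

# Format each value to two significant figures before comparing.
p_emberson_2sf = f'{p_emberson:.2g}'
p_goyal_2sf = f'{p_goyal:.2g}'

print(p_emberson_2sf, p_goyal_2sf, p_emberson_2sf == p_goyal_2sf)
```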

### Our new calculations

We can convert the Goyal et al. 2016 probability distribution back into the number of patients:

In [34]:
dist_notreatment_lvo = (
    np.array([5.0, 7.9, 13.6, 16.4, 24.7, 13.5, 18.9])/100.0)

n_patients_total_notreatment_lvo = 645

n_patients_notreatment_lvo = []
for prob in dist_notreatment_lvo:
    n_patients = prob * n_patients_total_notreatment_lvo
    n_patients_notreatment_lvo.append(n_patients)

The numbers of patients in each mRS bin are not whole numbers because the percentages in Figure 1 are rounded to one decimal place. We'll also show the numbers of patients rounded to the nearest whole number.

In [49]:
data_lvo = np.stack((
    dist_notreatment_lvo,
    n_patients_notreatment_lvo, 
    np.round(n_patients_notreatment_lvo,0).astype(int)
))

df = pd.DataFrame(data_lvo.T,
            columns=['Probability', 'Number of patients', 
                     'Number of patients (rounded)'],
            index=[f'mRS {i}' for i in range(7)])

df.loc['Total'] = df.sum()

df

Unnamed: 0,Probability,Number of patients,Number of patients (rounded)
mRS 0,0.05,32.25,32.0
mRS 1,0.079,50.955,51.0
mRS 2,0.136,87.72,88.0
mRS 3,0.164,105.78,106.0
mRS 4,0.247,159.315,159.0
mRS 5,0.135,87.075,87.0
mRS 6,0.189,121.905,122.0
Total,1.0,645.0,645.0


## nLVO and LVO combined, for each mRS

Population data from Lees et al. 2010 (their Figure 2, "Placebo" rows):

> ![](./images/data_sources/lees-et-al-2010_figure-2_mRS-dists.png)

> n.b. The sizes of the bars are not given anywhere in the text.

### Our new calculations

To convert this image into useful numbers, we measured the width of each individual mRS bar as a fraction of the combined width of all of the bars in its row.

Multiplying each fraction by the number of patients in that row gives the following numbers of patients in each bar:

In [59]:
n_patients_0to90min = np.array(
    [15.9142, 30.3480, 16.4694, 18.3815, 29.0527, 10.6095, 30.2247])
n_patients_91to180min = np.array(
    [46.0662, 45.4228, 29.8529, 51.7279, 61.1213, 33.8419, 46.9669])
n_patients_181to270min = np.array(
    [129.5347, 176.9093, 108.6634, 113.9641, 134.5041,  64.6017, 82.8227])
n_patients_271to360min = np.array(
    [ 78.9657, 115.3431,  72.9767,  70.0931, 104.0306,  47.6900, 53.9007])

We can combine the four "placebo" bars by summing them: 

In [None]:
n_patients_notreatment_nlvo_lvo = np.sum(
    [n_patients_0to90min, n_patients_91to180min, 
     n_patients_181to270min, n_patients_271to360min],
    axis=0)

n_patients_total_0to360min = 151 + 315 + 811 + 543
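
One way to sanity-check the bar measurements is to confirm that each row of estimated patient numbers sums to that row's total (151, 315, 811 and 543, the four values added together above). A minimal sketch, with the arrays re-declared so that it runs on its own; the names `rows` and `row_totals` are ours:

```python
import numpy as np

# Estimated patients per mRS bin for each "placebo" row (copied from above).
rows = np.array([
    [15.9142, 30.3480, 16.4694, 18.3815, 29.0527, 10.6095, 30.2247],
    [46.0662, 45.4228, 29.8529, 51.7279, 61.1213, 33.8419, 46.9669],
    [129.5347, 176.9093, 108.6634, 113.9641, 134.5041, 64.6017, 82.8227],
    [78.9657, 115.3431, 72.9767, 70.0931, 104.0306, 47.6900, 53.9007],
])
# Row totals, in the same order as the rows.
row_totals = np.array([151, 315, 811, 543])

# Each row should match its total to within the precision of the
# four-decimal-place values.
row_sums = rows.sum(axis=1)
print(row_sums)
print(np.allclose(row_sums, row_totals, atol=1e-3))
```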

Convert the number of patients to a probability distribution:

In [75]:
dist_notreatment_nlvo_lvo = (
    n_patients_notreatment_nlvo_lvo / n_patients_total_0to360min)

dist_notreatment_nlvo_lvo

array([0.1486, 0.2022, 0.1253, 0.1397, 0.1806, 0.0861, 0.1175])

In [60]:
data_nlvo_lvo = np.stack((
    dist_notreatment_nlvo_lvo,
    n_patients_notreatment_nlvo_lvo, 
    np.round(n_patients_notreatment_nlvo_lvo,0).astype(int)
))

df = pd.DataFrame(data_nlvo_lvo.T,
            columns=['Probability', 'Number of patients', 
                     'Number of patients (rounded)'],
            index=[f'mRS {i}' for i in range(7)])

df.loc['Total'] = df.sum()

df

Unnamed: 0,Probability,Number of patients,Number of patients (rounded)
mRS 0,0.1486,270.4808,270.0
mRS 1,0.2022,368.0232,368.0
mRS 2,0.1253,227.9624,228.0
mRS 3,0.1397,254.1666,254.0
mRS 4,0.1806,328.7087,329.0
mRS 5,0.0861,156.7431,157.0
mRS 6,0.1175,213.915,214.0
Total,1.0,1819.9998,1820.0


---

# Extrapolate odds ratio

Plot of odds ratio with time from Emberson et al. 2014 (their Figure 1):

> ![](./images/data_sources/emberson-et-al-2014_figure-1_odds-ratio-with-time.png)

> n.b. The y-axis is definitely "odds ratio". The dark blue line was calculated using "log(odds ratio)" and then transformed to "odds ratio" for the graph. 

We take the following two important coordinates from this graph:
    
| Treatment delay (hours) | Odds ratio |    
| --- | --- | 
| 1.0 | $\sim$1.9 |
| 6.3 | 1.0 | 


## Our new calculations

### Method

The following method is explained more fully [here](https://github.com/samuel-book/stroke_outcome/blob/main/extrapolate_odds_ratio.ipynb). 

By converting the two important coordinates to log(odds ratio) and connecting them with a straight line, we can extrapolate back to time zero:

> <img src='./images/extrapolate-emberson-fig1.jpg' width='600'>

The three marked coordinates are:

| Treatment delay (hours) | log(odds ratio) |    
| --- | --- | 
| 0.0 | 0.76296 | 
| 1.0 | 0.64185 |
| 6.3 | 0.00000 | 


When a data point for probability at time 6.3 hours is known, the new log(odds ratio) at time zero can be converted into probability at time zero.
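
The straight-line step can be sketched as follows. This is a minimal illustration using only the two coordinates read from the graph; the variable names are ours, and the linked notebook remains the reference for the full method.

```python
import numpy as np

# Two coordinates read from Emberson et al. 2014, Figure 1.
t_1, or_1 = 1.0, 1.9   # hours, odds ratio (approximate reading)
t_2, or_2 = 6.3, 1.0   # hours, odds ratio of 1 (no treatment effect)

# Work in log(odds ratio), where the relationship is taken to be linear.
log_or_1 = np.log(or_1)
log_or_2 = np.log(or_2)

# Straight line through the two points, extrapolated back to time zero.
slope = (log_or_2 - log_or_1) / (t_2 - t_1)
log_or_t0 = log_or_1 - slope * t_1

print(f'log(odds ratio) at 1.0 hours: {log_or_1:.5f}')
print(f'log(odds ratio) at time zero: {log_or_t0:.5f}')
```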


### Usage 

In the "Control groups" section above, we used data from Emberson et al. 2014 (Figure 2) to find the probabilities in the patient population that did not receive treatment. Taking each of those as the probability at 6.3 hours, where the odds ratio is 1.0, we can feed them into this straight-line fit to create time-zero probabilities.

The results:

| Occlusion | $P$(mRS$\leq$1 &#124; time zero) <br> (Extrapolated) | $P$(mRS$\leq$1 &#124; no treatment)  <br> (Emberson et al. 2014, Figure 2) |
| --- | --- | --- |
| nLVO | 0.6483 | 0.4622 |
| LVO | 0.2472 | 0.1328 |

The code to generate these numbers is given at the bottom of [this page](https://github.com/samuel-book/stroke_outcome/blob/main/extrapolate_odds_ratio.ipynb). 
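
For orientation, a minimal sketch of that conversion is given below. It assumes that the no-treatment probability is the probability at 6.3 hours (where the odds ratio is 1), so that the odds at time zero are the no-treatment odds multiplied by the time-zero odds ratio. The function name `prob_at_time_zero` is ours; the linked notebook remains the reference implementation.

```python
import numpy as np

# log(odds ratio) at time zero, taken from the table in the Method section.
LOG_OR_T0 = 0.76296


def prob_at_time_zero(p_no_treatment, log_or_t0=LOG_OR_T0):
    """Convert a no-treatment probability to a time-zero probability."""
    # Probability -> odds.
    odds_no_treatment = p_no_treatment / (1.0 - p_no_treatment)
    # Scale the odds by the odds ratio at time zero.
    odds_t0 = odds_no_treatment * np.exp(log_or_t0)
    # Odds -> probability.
    return odds_t0 / (1.0 + odds_t0)


# No-treatment probabilities from Emberson et al. 2014, Figure 2.
p_nlvo_t0 = prob_at_time_zero((189 + 538) / (321 + 1252))
p_lvo_t0 = prob_at_time_zero((175 + 55 + 8) / (808 + 671 + 313))

print(f'nLVO: {p_nlvo_t0:.4f}')
print(f'LVO:  {p_lvo_t0:.4f}')
```

Under these assumptions the printed values should agree with the extrapolated column of the table to within rounding.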

---

# Excess deaths

## MT 

From Goyal et al. 2016 (Table 4, "Mortality" row):

> ![](./images/data_sources/goyal-et-al-2016_table-4_excess-deaths.png)

In [22]:
# Deaths / group size, from the "Mortality" row of Table 4.
prop_fatalICH_MT = (97.0 / 633.0)
prop_fatalICH_noMT = (122.0 / 646.0)
prop_fatalICH_MT_excess = prop_fatalICH_MT - prop_fatalICH_noMT

print(f'% of deaths, given MT:     {100.0 * prop_fatalICH_MT:.4f}')
print(f'% of deaths, no treatment: {100.0 * prop_fatalICH_noMT:.4f}')
print(f'% of excess deaths:        {100.0 * prop_fatalICH_MT_excess:.4f}')

% of deaths, given MT:     15.3239
% of deaths, no treatment: 18.8854
% of excess deaths:        -3.5616


## IVT

From Emberson et al. 2014 (page 5, penultimate paragraph before Discussion):

> ![](./images/data_sources/emberson-et-al-2014_text_excess-deaths.png)

The 2.7% and 0.4% values given in the text are the proportions of patients with a fatal haemorrhage in the treated and control groups respectively:

In [19]:
prop_fatalICH_IVT = (91.0 / 3391.0)
prop_fatalICH_noIVT = (13.0 / 3365.0)
prop_fatalICH_IVT_excess = prop_fatalICH_IVT - prop_fatalICH_noIVT

print(f'% of deaths, given IVT:    {100.0 * prop_fatalICH_IVT:.4f}')
print(f'% of deaths, no treatment: {100.0 * prop_fatalICH_noIVT:.4f}')
print(f'% of excess deaths:        {100.0 * prop_fatalICH_IVT_excess:.4f}')

% of deaths, given IVT:    2.6836
% of deaths, no treatment: 0.3863
% of excess deaths:        2.2972


## Our new calculations

The calculations that split the IVT figures into separate nLVO and LVO values are given [here](https://github.com/samuel-book/stroke_outcome/blob/main/mRS_datasets_full.ipynb).

---

# Recanalisation

From Hui et al. 2020 (bottom of page 2031, final paragraph of "Secondary Outcome: Recanalisation" section):

> ![](./images/data_sources/hui-et-al-2020_text_recanalisation-perc.png)

---

## References

de la Ossa Herrero N, Carrera D, Gorchs M, Querol M, Millán M, Gomis M, et al. Design and Validation of a Prehospital Stroke Scale to Predict Large Arterial Occlusion: The Rapid Arterial Occlusion Evaluation Scale. Stroke; a journal of cerebral circulation. 2013 Nov 26;45.

Emberson J, Lees KR, Lyden P, et al. _Effect of treatment delay, age, and stroke severity on the effects of intravenous thrombolysis with alteplase for acute ischaemic stroke: A meta-analysis of individual patient data from randomised trials._ The Lancet 2014;384:1929–35. doi:10.1016/S0140-6736(14)60584-5

Fransen, P., Berkhemer, O., Lingsma, H. et al. Time to Reperfusion and Treatment Effect for Acute Ischemic Stroke: A Randomized Clinical Trial. JAMA Neurol. 2016 Feb 1;73(2):190–6. DOI: 10.1001/jamaneurol.2015.3886

Goyal M, Menon BK, van Zwam WH, et al. _Endovascular thrombectomy after large-vessel ischaemic stroke: a meta-analysis of individual patient data from five randomised trials._ The Lancet 2016;387:1723-1731. doi:10.1016/S0140-6736(16)00163-X

Hui W, Wu C, Zhao W, Sun H, Hao J, Liang H, et al. Efficacy and Safety of Recanalization Therapy for Acute Ischemic Stroke With Large Vessel Occlusion. Stroke. 2020 Jul;51(7):2026–35. 

IST-3 collaborative group, Sandercock P, Wardlaw JM, Lindley RI, Dennis M, Cohen G, et al. The benefits and harms of intravenous thrombolysis with recombinant tissue plasminogen activator within 6 h of acute ischaemic stroke (the third international stroke trial [IST-3]): a randomised controlled trial. Lancet. 2012 379:2352-63.

Lees KR, Bluhmki E, von Kummer R, et al. _Time to treatment with intravenous alteplase and outcome in stroke: an updated pooled analysis of ECASS, ATLANTIS, NINDS, and EPITHET trials_. The Lancet 2010;375:1695-703. doi:10.1016/S0140-6736(10)60491-6

McMeekin P, White P, James MA, Price CI, Flynn D, Ford GA. Estimating the number of UK stroke patients eligible for endovascular thrombectomy. European Stroke Journal. 2017;2:319–26. 

SAMueL-1 data on mRS before stroke (DOI: 10.5281/zenodo.6896710): https://samuel-book.github.io/samuel-1/descriptive_stats/08_prestroke_mrs.html