# Comparison of patients who did and did not receive thrombolysis

## Aims

Replicate the SAMueL-1 analysis, [available to view here](https://samuel-book.github.io/samuel-1/descriptive_stats/05_thrombolysis_comparison.html).

Compare feature means for patients who received thrombolysis against those who did not.

This analysis is only for patients arriving within 4 hours of known stroke onset.
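The 4-hour restriction appears to be applied before this notebook runs: in the tables below, `onset_known` is 1 for every patient and `onset_to_arrival_time` never exceeds 240 minutes. A minimal sketch of such a filter on made-up data, assuming the cut-off is inclusive at 240 minutes:

```python
import pandas as pd

# Made-up patients standing in for the real dataset
patients = pd.DataFrame({
    'onset_known': [1, 1, 0, 1],
    'onset_to_arrival_time': [90, 250, 60, 240],
    'thrombolysis': [1, 0, 0, 1],
})

# Keep patients with a known onset time who arrived within 240 minutes
within_4h = patients[
    (patients['onset_known'] == 1)
    & (patients['onset_to_arrival_time'] <= 240)
]
print(within_4h)
```

The second row is dropped for arriving after 250 minutes and the third for having an unknown onset time.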

## Set up

In [1]:
# Import packages and functions
from dataclasses import dataclass
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd

# Display entire dataframes
pd.set_option("display.max_rows", 999, "display.max_columns", 150)

# Linting
%load_ext pycodestyle_magic
%pycodestyle_on

In [2]:
# Set paths and filenames
@dataclass(frozen=True)
class Paths:
    '''Singleton object for storing paths to data and files.'''

    data_path = './../output/'
    data_filename = 'reformatted_data.csv'


paths = Paths()

In [3]:
# Load data
data = pd.read_csv(os.path.join(paths.data_path,
                                paths.data_filename))

## Filter dataset

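Drop the `id` column and any columns whose names begin with `thrombolysis_no`, keeping the features of interest and the `thrombolysis` outcome.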

In [4]:
relevant_data = data.drop(
    ['id'] + [col for col in data if col.startswith('thrombolysis_no')],
    axis=1)
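The prefix test keeps the `thrombolysis` column itself, because `'thrombolysis'` does not start with `'thrombolysis_no'`. The same selection can be written as a boolean mask over the column labels. The sketch below uses a toy frame whose `thrombolysis_no_reason_*` columns are hypothetical stand-ins for the real ones:

```python
import pandas as pd

# Toy frame; the thrombolysis_no_reason_* names are hypothetical
toy = pd.DataFrame({
    'id': [1, 2],
    'age': [72.5, 77.5],
    'thrombolysis': [1, 0],
    'thrombolysis_no_reason_a': [0, 1],
    'thrombolysis_no_reason_b': [0, 0],
})

# True for each column label that should be dropped
to_drop = (toy.columns == 'id') | toy.columns.str.startswith('thrombolysis_no')
relevant = toy.loc[:, ~to_drop]
print(list(relevant.columns))  # ['age', 'thrombolysis']
```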

## Describe entire dataset

In [5]:
all = relevant_data.describe().T
all

| feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| age | 137019.0 | 74.509028 | 13.410804 | 37.5 | 67.5 | 77.5 | 82.5 | 92.5 |
| male | 137019.0 | 0.527715 | 0.499233 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| infarction | 137019.0 | 0.853787 | 0.353321 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| onset_to_arrival_time | 137019.0 | 113.81604 | 52.959625 | 1.0 | 73.0 | 104.0 | 150.0 | 240.0 |
| onset_known | 137019.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| precise_onset_known | 137019.0 | 0.631168 | 0.48249 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| onset_during_sleep | 137019.0 | 0.047015 | 0.211673 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| arrive_by_ambulance | 137012.0 | 0.903855 | 0.294791 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| call_to_ambulance_arrival_time | 37780.0 | 24.415723 | 25.798357 | 1.0 | 11.0 | 18.0 | 30.0 | 1395.0 |
| ambulance_on_scene_time | 37780.0 | 30.862864 | 14.242626 | 1.0 | 21.0 | 28.0 | 37.0 | 183.0 |
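The `count` column differs between features because `describe()` counts only non-missing values: `call_to_ambulance_arrival_time`, for example, has 37,780 values out of 137,019 patients, and its mean and other statistics are computed over those 37,780 only. A small illustration with made-up values, where `NaN` marks a missing ambulance timing:

```python
import numpy as np
import pandas as pd

# Made-up values; NaN marks a missing ambulance timing
toy = pd.DataFrame({
    'age': [67.5, 77.5, 82.5, 72.5],
    'call_to_ambulance_arrival_time': [11.0, np.nan, 30.0, np.nan],
})

# count and mean ignore NaN, so each feature has its own denominator
stats = toy.describe().T
print(stats[['count', 'mean']])

# Share of rows with a recorded value for each feature
print(toy.notna().mean())
```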


## Describe thrombolysed patients

In [6]:
ivt = relevant_data[relevant_data['thrombolysis'] == 1].describe().T
ivt

| feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| age | 40055.0 | 72.374672 | 13.553829 | 37.5 | 62.5 | 72.5 | 82.5 | 92.5 |
| male | 40055.0 | 0.556035 | 0.496856 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| infarction | 40055.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| onset_to_arrival_time | 40055.0 | 99.305155 | 46.214023 | 1.0 | 65.0 | 90.0 | 125.0 | 240.0 |
| onset_known | 40055.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| precise_onset_known | 40055.0 | 0.818475 | 0.385458 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| onset_during_sleep | 40055.0 | 0.005992 | 0.077175 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| arrive_by_ambulance | 40053.0 | 0.9245 | 0.2642 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| call_to_ambulance_arrival_time | 11161.0 | 22.686677 | 21.390832 | 1.0 | 11.0 | 17.0 | 28.0 | 940.0 |
| ambulance_on_scene_time | 11161.0 | 27.356957 | 12.112554 | 1.0 | 19.0 | 25.0 | 33.0 | 142.0 |


## Describe non-thrombolysed patients

In [7]:
no_ivt = relevant_data[relevant_data['thrombolysis'] == 0].describe().T
no_ivt

| feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| age | 96964.0 | 75.390712 | 13.251384 | 37.5 | 67.5 | 77.5 | 87.5 | 92.5 |
| male | 96964.0 | 0.516016 | 0.499746 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| infarction | 96964.0 | 0.793387 | 0.404877 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| onset_to_arrival_time | 96964.0 | 119.810363 | 54.389181 | 1.0 | 77.0 | 110.0 | 159.0 | 240.0 |
| onset_known | 96964.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| precise_onset_known | 96964.0 | 0.553793 | 0.4971 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| onset_during_sleep | 96964.0 | 0.063962 | 0.244686 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| arrive_by_ambulance | 96959.0 | 0.895327 | 0.306133 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| call_to_ambulance_arrival_time | 26619.0 | 25.140689 | 27.404434 | 1.0 | 11.0 | 18.0 | 31.0 | 1395.0 |
| ambulance_on_scene_time | 26619.0 | 32.332845 | 14.801424 | 1.0 | 22.0 | 30.0 | 39.0 | 183.0 |


## Describe all three groups

In [8]:
summary = pd.DataFrame(
    {'all': all['mean'],
     'thrombolysed': ivt['mean'],
     'non_thrombolysed': no_ivt['mean']})
summary['ratio'] = summary['thrombolysed'] / summary['non_thrombolysed']
round(summary, 2)

| feature | all | thrombolysed | non_thrombolysed | ratio |
|---|---|---|---|---|
| age | 74.51 | 72.37 | 75.39 | 0.96 |
| male | 0.53 | 0.56 | 0.52 | 1.08 |
| infarction | 0.85 | 1.0 | 0.79 | 1.26 |
| onset_to_arrival_time | 113.82 | 99.31 | 119.81 | 0.83 |
| onset_known | 1.0 | 1.0 | 1.0 | 1.0 |
| precise_onset_known | 0.63 | 0.82 | 0.55 | 1.48 |
| onset_during_sleep | 0.05 | 0.01 | 0.06 | 0.09 |
| arrive_by_ambulance | 0.9 | 0.92 | 0.9 | 1.03 |
| call_to_ambulance_arrival_time | 24.42 | 22.69 | 25.14 | 0.9 |
| ambulance_on_scene_time | 30.86 | 27.36 | 32.33 | 0.85 |
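The per-group means and the ratio can also be computed in one step by grouping on `thrombolysis`. A minimal sketch on made-up data, assuming `thrombolysis` is coded 1 (received) or 0 (did not), as the filters above imply:

```python
import pandas as pd

# Made-up patients; thrombolysis is coded 1 (received) or 0 (did not)
df = pd.DataFrame({
    'thrombolysis': [1, 1, 0, 0, 0],
    'age': [70.0, 74.0, 76.0, 78.0, 80.0],
    'male': [1, 0, 1, 0, 0],
})

# Mean of every feature within each group, transposed so features are rows
means = df.groupby('thrombolysis').mean().T
summary = pd.DataFrame({
    'all': df.drop(columns='thrombolysis').mean(),
    'thrombolysed': means[1],
    'non_thrombolysed': means[0],
})
# A ratio above 1 means the feature's mean is higher in thrombolysed patients
summary['ratio'] = summary['thrombolysed'] / summary['non_thrombolysed']
print(summary.round(2))
```

Grouping avoids building three separate `describe()` tables when only the means are needed.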


## Observations

In [9]:
# Extract metrics (proportions are converted to percentages)
ivt_ischaemic = summary.loc['infarction', 'thrombolysed']*100
ivt_pih = 100 - ivt_ischaemic
ivt_sleep = summary.loc['onset_during_sleep', 'thrombolysed']*100
non_sleep = summary.loc['onset_during_sleep', 'non_thrombolysed']*100
ivt_age = summary.loc['age', 'thrombolysed']
non_age = summary.loc['age', 'non_thrombolysed']
ivt_arr = summary.loc['onset_to_arrival_time', 'thrombolysed']
non_arr = summary.loc['onset_to_arrival_time', 'non_thrombolysed']
ivt_sev = summary.loc['stroke_severity', 'thrombolysed']
non_sev = summary.loc['stroke_severity', 'non_thrombolysed']
ivt_sca = summary.loc['arrival_to_scan_time', 'thrombolysed']
non_sca = summary.loc['arrival_to_scan_time', 'non_thrombolysed']
ivt_pre = summary.loc['precise_onset_known', 'thrombolysed']*100
non_pre = summary.loc['precise_onset_known', 'non_thrombolysed']*100
ivt_amb = summary.loc['arrive_by_ambulance', 'thrombolysed']*100
non_amb = summary.loc['arrive_by_ambulance', 'non_thrombolysed']*100
ivt_fib = summary.loc['atrial_fibrillation', 'thrombolysed']*100
non_fib = summary.loc['atrial_fibrillation', 'non_thrombolysed']*100
ivt_pri = summary.loc['prior_stroke_tia', 'thrombolysed']*100
non_pri = summary.loc['prior_stroke_tia', 'non_thrombolysed']*100
ivt_coa = summary.loc['afib_anticoagulant', 'thrombolysed']*100
non_coa = summary.loc['afib_anticoagulant', 'non_thrombolysed']*100

# Print observations
print(f'''
For patients arriving within 4 hours of known stroke onset, compared
with those who do not receive thrombolysis, patients who receive thrombolysis:
* {ivt_ischaemic:.2f}% had ischaemic stroke (and so {ivt_pih:.2f}% had PIH)
* {ivt_sleep:.2f}% had onset during sleep '''
      f'''(compared with {non_sleep:.2f}% of non-thrombolysed patients)
* Younger mean age ({ivt_age:.2f} vs. {non_age:.2f})
* Higher mean stroke severity (NIHSS score {ivt_sev:.2f} vs. {non_sev:.2f})
* Shorter mean onset to arrival time ({ivt_arr:.2f}m vs. {non_arr:.2f}m)
* Shorter mean arrival to scan time ({ivt_sca:.2f}m vs. {non_sca:.2f}m)
* More likely to have precise onset time ({ivt_pre:.2f}% vs. {non_pre:.2f}%)
* More likely to arrive by ambulance ({ivt_amb:.2f}% vs. {non_amb:.2f}%)
* Less likely to have atrial fibrillation ({ivt_fib:.2f}% vs. {non_fib:.2f}%)
* Less likely to have prior stroke or TIA ({ivt_pri:.2f}% vs. {non_pri:.2f}%)
* Less likely to be on anticoagulants ({ivt_coa:.2f}% vs. {non_coa:.2f}%)
''')


For patients arriving within 4 hours of known stroke onset, compared
with those who do not receive thrombolysis, patients who receive thrombolysis:
* 100.00% had ischaemic stroke (and so 0.00% had PIH)
* 0.60% had onset during sleep (compared with 6.40% of non-thrombolysed patients)
* Younger mean age (72.37 vs. 75.39)
* Higher mean stroke severity (NIHSS score 10.98 vs. 8.07)
* Shorter mean onset to arrival time (99.31m vs. 119.81m)
* Shorter mean arrival to scan time (23.12m vs. 130.57m)
* More likely to have precise onset time (81.85% vs. 55.38%)
* More likely to arrive by ambulance (92.45% vs. 89.53%)
* Less likely to have atrial fibrillation (11.94% vs. 23.47%)
* Less likely to have prior stroke or TIA (20.30% vs. 29.27%)
* Less likely to be on anticoagulants (4.47% vs. 23.82%)
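The observation cell repeats the same `summary.loc` lookup for every feature. One way to shorten it is to drive the formatting from a mapping of feature names to labels, converting proportions to percentages. A sketch with a hypothetical two-row `summary` whose values are copied from the tables above; the `features` mapping and its labels are illustrative, not part of the original notebook:

```python
import pandas as pd

# Hypothetical two-row summary with the same column layout as the real one
summary = pd.DataFrame(
    {'thrombolysed': [72.374672, 0.818475],
     'non_thrombolysed': [75.390712, 0.553793]},
    index=['age', 'precise_onset_known'])

# feature -> (label, is_proportion); proportions are shown as percentages
features = {
    'age': ('Mean age', False),
    'precise_onset_known': ('Precise onset time known', True),
}

lines = []
for feature, (label, is_proportion) in features.items():
    ivt, non = summary.loc[feature, ['thrombolysed', 'non_thrombolysed']]
    if is_proportion:
        lines.append(f'* {label} ({ivt * 100:.2f}% vs. {non * 100:.2f}%)')
    else:
        lines.append(f'* {label} ({ivt:.2f} vs. {non:.2f})')
print('\n'.join(lines))
```

Adding a feature to the report then means adding one entry to `features` rather than two new variables and a new line in the f-string.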

