# Comparison of average values for patients who receive thrombolysis and those that do not

## Aims

Compare feature means between patients who receive thrombolysis and those who do not.

The analysis covers only patients arriving within 4 hours of known stroke onset. It uses data that has been coded and, where necessary, imputed: the same data used for machine learning.

## Load and analyse data

In [1]:
# import libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Set up results DataFrame
results = pd.DataFrame()

# Display entire dataframes
pd.set_option("display.max_rows", 999)
pd.set_option("display.max_columns", 150)

# Import data (concatenate training and test set used in machine learning)
data = pd.concat([
    pd.read_csv('./../data/10k_training_test/cohort_10000_train.csv'),
    pd.read_csv('./../data/10k_training_test/cohort_10000_test.csv')])

# Add columns for scanned within 4 hours of arrival and onset
# (stored as 0/1 integers so that the column mean is a proportion)
data['scan_within_4_hrs_arrival'] = (
    data['S2BrainImagingTime_min'] <= 240).astype(int)
data['scan_within_4_hrs_onset'] = (
    (data['S1OnsetToArrival_min'] + data['S2BrainImagingTime_min']) <= 240
    ).astype(int)
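
The two scan flags are stored as 0/1 integers so that their means, reported later, can be read as proportions of patients. The following is a minimal sketch on toy data (the values are hypothetical; the column names follow the notebook). It assumes, as the code above implies, that `S2BrainImagingTime_min` is measured from arrival.

```python
import pandas as pd

# Toy data (hypothetical values); column names follow the notebook
toy = pd.DataFrame({
    'S1OnsetToArrival_min': [60, 120, 200, 30],
    'S2BrainImagingTime_min': [20, 150, 30, 300],
})

# Scan within 4 hours (240 minutes) of arrival, and of onset
toy['scan_within_4_hrs_arrival'] = (
    toy['S2BrainImagingTime_min'] <= 240).astype(int)
toy['scan_within_4_hrs_onset'] = (
    (toy['S1OnsetToArrival_min'] + toy['S2BrainImagingTime_min']) <= 240
).astype(int)

# The mean of a 0/1 column is the proportion of patients meeting the criterion
prop_arrival = toy['scan_within_4_hrs_arrival'].mean()
prop_onset = toy['scan_within_4_hrs_onset'].mean()
print(prop_arrival, prop_onset)
```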

In [2]:
data.dtypes

StrokeTeam                                         object
S1AgeOnArrival                                    float64
S1OnsetToArrival_min                              float64
S2RankinBeforeStroke                                int64
Loc                                                 int64
LocQuestions                                      float64
LocCommands                                       float64
BestGaze                                          float64
Visual                                            float64
FacialPalsy                                       float64
MotorArmLeft                                      float64
MotorArmRight                                     float64
MotorLegLeft                                      float64
MotorLegRight                                     float64
LimbAtaxia                                        float64
Sensory                                           float64
BestLanguage                                      float64
Dysarthria    

## Summarise all 4 hour admissions

In [3]:
all_4hr_admissions_summary = data.describe().T
results['all'] = all_4hr_admissions_summary['mean']
all_4hr_admissions_summary

| Feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| S1AgeOnArrival | 88928.0 | 75.135447 | 13.82542 | 17.5 | 67.5 | 77.5 | 87.5 | 110.0 |
| S1OnsetToArrival_min | 88928.0 | 111.392722 | 52.729071 | 1.0 | 71.0 | 100.0 | 146.0 | 240.0 |
| S2RankinBeforeStroke | 88928.0 | 1.059689 | 1.420161 | 0.0 | 0.0 | 0.0 | 2.0 | 5.0 |
| Loc | 88928.0 | 0.275178 | 0.667914 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 |
| LocQuestions | 88928.0 | 0.675198 | 0.89233 | 0.0 | 0.0 | 0.0 | 2.0 | 2.0 |
| LocCommands | 88928.0 | 0.372133 | 0.719221 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| BestGaze | 88928.0 | 0.354253 | 0.669124 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| Visual | 88928.0 | 0.506016 | 0.844272 | 0.0 | 0.0 | 0.0 | 1.0 | 3.0 |
| FacialPalsy | 88928.0 | 0.885717 | 0.880693 | 0.0 | 0.0 | 1.0 | 2.0 | 3.0 |
| MotorArmLeft | 88928.0 | 0.869546 | 1.377068 | 0.0 | 0.0 | 0.0 | 1.0 | 4.0 |


## Summarise 4 hour admissions who receive thrombolysis

In [4]:
mask = data['S2Thrombolysis'] == 1
thrombolysis_admissions = data[mask]
thrombolysis_admissions_summary = thrombolysis_admissions.describe().T
results['thrombolysis'] = thrombolysis_admissions_summary['mean']
thrombolysis_admissions_summary

| Feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| S1AgeOnArrival | 26257.0 | 73.092318 | 13.964103 | 17.5 | 62.5 | 77.5 | 82.5 | 110.0 |
| S1OnsetToArrival_min | 26257.0 | 96.968808 | 45.638922 | 1.0 | 64.0 | 87.0 | 122.0 | 240.0 |
| S2RankinBeforeStroke | 26257.0 | 0.727806 | 1.184651 | 0.0 | 0.0 | 0.0 | 1.0 | 5.0 |
| Loc | 26257.0 | 0.20665 | 0.509833 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 |
| LocQuestions | 26257.0 | 0.801653 | 0.921942 | 0.0 | 0.0 | 0.0 | 2.0 | 2.0 |
| LocCommands | 26257.0 | 0.405263 | 0.727145 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 |
| BestGaze | 26257.0 | 0.467266 | 0.720925 | 0.0 | 0.0 | 0.0 | 1.0 | 2.0 |
| Visual | 26257.0 | 0.678829 | 0.899433 | 0.0 | 0.0 | 0.0 | 2.0 | 3.0 |
| FacialPalsy | 26257.0 | 1.193548 | 0.878565 | 0.0 | 0.0 | 1.0 | 2.0 | 3.0 |
| MotorArmLeft | 26257.0 | 1.069886 | 1.486363 | 0.0 | 0.0 | 0.0 | 2.0 | 4.0 |


## Summarise 4 hour admissions who do not receive thrombolysis

In [5]:
mask = data['S2Thrombolysis'] == 0
no_thrombolysis_admissions = data[mask]
no_thrombolysis_admissions_summary = no_thrombolysis_admissions.describe().T
results['no_thrombolysis'] = no_thrombolysis_admissions_summary['mean']
no_thrombolysis_admissions_summary

| Feature | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| S1AgeOnArrival | 62671.0 | 75.991447 | 13.676585 | 17.5 | 67.5 | 77.5 | 87.5 | 110.0 |
| S1OnsetToArrival_min | 62671.0 | 117.435848 | 54.303876 | 1.0 | 75.0 | 107.0 | 156.0 | 240.0 |
| S2RankinBeforeStroke | 62671.0 | 1.198736 | 1.486076 | 0.0 | 0.0 | 0.0 | 2.0 | 5.0 |
| Loc | 62671.0 | 0.303889 | 0.722028 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 |
| LocQuestions | 62671.0 | 0.622218 | 0.874215 | 0.0 | 0.0 | 0.0 | 2.0 | 2.0 |
| LocCommands | 62671.0 | 0.358252 | 0.715425 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| BestGaze | 62671.0 | 0.306904 | 0.640291 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 |
| Visual | 62671.0 | 0.433614 | 0.80917 | 0.0 | 0.0 | 0.0 | 1.0 | 3.0 |
| FacialPalsy | 62671.0 | 0.756746 | 0.849037 | 0.0 | 0.0 | 1.0 | 1.0 | 3.0 |
| MotorArmLeft | 62671.0 | 0.785611 | 1.319609 | 0.0 | 0.0 | 0.0 | 1.0 | 4.0 |
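
The three sets of means (all patients, thrombolysis, no thrombolysis) could also be obtained in one step with `groupby`. This is a minimal sketch on toy data with hypothetical values, not the approach the notebook takes. One difference worth noting: `describe()` also reports a row for `S2Thrombolysis` itself, whereas grouping by that column leaves it out.

```python
import pandas as pd

# Toy data (hypothetical values); S2Thrombolysis is the notebook's 0/1 label
toy = pd.DataFrame({
    'S2Thrombolysis': [1, 1, 0, 0, 0],
    'S1AgeOnArrival': [70.0, 74.0, 80.0, 76.0, 78.0],
    'S1OnsetToArrival_min': [90.0, 100.0, 120.0, 110.0, 130.0],
})

# Mean of every numeric feature within each thrombolysis group,
# transposed so that features are rows and groups are columns
group_means = toy.groupby('S2Thrombolysis').mean(numeric_only=True).T
group_means = group_means.rename(
    columns={1: 'thrombolysis', 0: 'no_thrombolysis'})

# Mean across all patients, for comparison
group_means['all'] = toy.drop(columns='S2Thrombolysis').mean(numeric_only=True)
print(group_means)
```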


## Show summary of all groups

For each feature, add the ratio of the thrombolysis mean to the no-thrombolysis mean, sort by that ratio, and save the results.

In [6]:
results['ratio'] = results['thrombolysis'] / results['no_thrombolysis']
results.sort_values('ratio', inplace=True)
results.to_csv('output/thrombolse_yes_no_means.csv')
results

| Feature | all | thrombolysis | no_thrombolysis | ratio |
|---|---|---|---|---|
| S2StrokeType_Primary Intracerebral Haemorrhage | 0.148918 | 0.0 | 0.21131 | 0.0 |
| S2StrokeType_missing | 0.001529 | 0.0 | 0.00217 | 0.0 |
| S1OnsetDateType_Stroke during sleep | 0.045655 | 0.005218 | 0.062597 | 0.083353 |
| AFAnticoagulentDOAC_Yes | 0.031497 | 0.006056 | 0.042157 | 0.143643 |
| S2BrainImagingTime_min | 117.212959 | 22.984195 | 156.69158 | 0.146684 |
| AFAnticoagulentHeparin_Yes | 0.000742 | 0.00019 | 0.000973 | 0.195642 |
| AFAnticoagulent_Yes | 0.121525 | 0.036943 | 0.156963 | 0.235359 |
| S1OnsetDateType_Best estimate | 0.056934 | 0.021975 | 0.07158 | 0.307 |
| AFAnticoagulentVitK_Yes | 0.019398 | 0.007846 | 0.024238 | 0.323691 |
| S1OnsetTimeType_Best estimate | 0.379599 | 0.180104 | 0.463181 | 0.388842 |

## Observations

For patients arriving within 4 hours of known stroke onset, compared with those who do not receive thrombolysis, patients who receive thrombolysis:

* Have a confirmed ischaemic stroke
* Rarely have stroke onset during sleep (0.5% vs 6.3%)
* Tend to arrive outside the hours of 3am to 6am
* Are younger (mean age 73 vs 76)
* Arrive sooner (mean onset to arrival 97 vs 117 minutes)
* Have higher stroke severity (mean NIHSS 11.6 vs 8.4)
* Are more likely to be scanned within 4 hours of arrival (100% vs 93%) and within 4 hours of onset (99% vs 77%)
* Are more likely to have a precisely determined stroke onset time (97% vs 87%)
* Are more likely to have arrived by ambulance (94% vs 91%)
* Are less likely to have atrial fibrillation (14% vs 24%)
* Are less likely to have a history of TIA (21% vs 30%)
* Are less likely to be on an anticoagulant (4% vs 16%)
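
Percentages for binary (0/1) features can be read directly from the group means, since the mean of a 0/1 column is a proportion. As an illustrative sketch, the values below are copied from two rows of the summary table; the variable names are chosen here for illustration.

```python
import pandas as pd

# Group means copied from the summary table for two binary features
summary = pd.DataFrame(
    {'thrombolysis': [0.005218, 0.036943],
     'no_thrombolysis': [0.062597, 0.156963]},
    index=['S1OnsetDateType_Stroke during sleep', 'AFAnticoagulent_Yes'])

# Means of 0/1 features are proportions; express them as percentages
percentages = (summary * 100).round(1)
print(percentages)
```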