# Descriptive analysis of SSNAP Extract Version 2

## Plain English Summary

tbc

## Aims

* Restrict to records from 2017 to 2019 (inclusive) and stroke teams with an average of at least 100 stroke admissions and 3 thrombolysis patients per year.

## Observations

tbc

## Import libraries

In [1]:
# Import packages and functions
import numpy as np
import os
import pandas as pd
from dataclasses import dataclass

# Set the maximum number of columns and rows to 100
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 100)

## Set paths and filenames

In [2]:
@dataclass(frozen=True)
class Paths:
    '''Frozen dataclass storing paths to data and files.'''

    # Type annotations make these dataclass fields, so frozen=True applies
    data_path: str = './../output/'
    data_filename: str = 'reformatted_data.csv'
    notebook: str = '01'

paths = Paths()

## Load and restrict data

In [3]:
raw_data = pd.read_csv(os.path.join(paths.data_path, paths.data_filename))

Restrict to records from 2017, 2018 and 2019.

In [4]:
print('Number of records per year:')
print(raw_data.year.value_counts().sort_index().to_string())
print('Total: {0}'.format(len(raw_data.index)))

Number of records per year:
2016    56510
2017    58983
2018    58549
2019    60413
2020    59301
2021    66625
Total: 360381


In [5]:
raw_data_restrict = raw_data[raw_data['year'].isin([2017, 2018, 2019])]
print('New total number of records: {0}'.format(len(raw_data_restrict.index)))

New total number of records: 177945
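
As a quick consistency check, the restricted total should equal the sum of the per-year counts printed earlier for 2017 to 2019. A minimal sketch, with the counts copied by hand from that output (the names `records_per_year` and `keep_years` are illustrative and do not appear in the notebook):

```python
# Per-year record counts, copied from the output of the earlier cell
records_per_year = {
    2016: 56510, 2017: 58983, 2018: 58549,
    2019: 60413, 2020: 59301, 2021: 66625}

# Years retained by the restriction
keep_years = [2017, 2018, 2019]

# Sum the counts for the retained years
restricted_total = sum(records_per_year[year] for year in keep_years)
print('Expected restricted total: {0}'.format(restricted_total))
```

This gives 177945, matching the total reported after the restriction.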


Restrict to stroke teams with an average of at least 100 stroke admissions and 3 thrombolysis patients per year. Over the three years retained, this means removing teams with fewer than 300 admissions or fewer than 9 thrombolysed patients.

In [6]:
keep = []
discard = 0

# Group dataframe by stroke team
groups = raw_data_restrict.groupby('stroke team')

# Loop through name (each stroke team) and group_df (relevant rows from data)
for name, group_df in groups:
    # Skip if admissions less than 300 or thrombolysis patients less than 9
    admissions = len(group_df.index)
    thrombolysis_received = group_df['thrombolysis'] == 1
    if (admissions < 300) or (thrombolysis_received.sum() < 9):
        discard += 1
        continue
    else:
        keep.append(group_df)

# Concatenate output
data = pd.concat(keep)

# Number of stroke teams kept vs. removed
print('Number of stroke teams remaining in dataset: {0}'.format(len(keep)))
print('Number of stroke teams removed from dataset: {0}'.format(discard))


Number of stroke teams remaining in dataset: 114
Number of stroke teams removed from dataset: 4
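
The loop above can also be written without explicit iteration. The sketch below is one possible vectorised equivalent, shown on a small made-up dataframe with lowered thresholds so that it runs on its own. The function name `restrict_teams` and the toy data are assumptions; the column names `stroke team` and `thrombolysis` are taken from the notebook.

```python
import pandas as pd


def restrict_teams(df, min_admissions=300, min_thrombolysis=9):
    '''Keep teams meeting both thresholds (hypothetical helper).'''
    # Number of admissions (rows) per stroke team
    admissions = df.groupby('stroke team').size()
    # Number of thrombolysed patients per stroke team
    thrombolysed = df['thrombolysis'].eq(1).groupby(df['stroke team']).sum()
    # True for teams that meet both thresholds
    keep_mask = ((admissions >= min_admissions) &
                 (thrombolysed >= min_thrombolysis))
    keep_teams = keep_mask[keep_mask].index
    # Return the restricted data and the number of teams removed
    kept = df[df['stroke team'].isin(keep_teams)]
    return kept, int((~keep_mask).sum())


# Toy data: team A passes, team B has too few thrombolysed patients,
# and team C has too few admissions
toy = pd.DataFrame({
    'stroke team': ['A'] * 4 + ['B'] * 4 + ['C'] * 2,
    'thrombolysis': [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]})
toy_kept, toy_removed = restrict_teams(
    toy, min_admissions=3, min_thrombolysis=2)
print('Teams kept: {0}'.format(list(toy_kept['stroke team'].unique())))
print('Teams removed: {0}'.format(toy_removed))
```

With the default thresholds, `restrict_teams(raw_data_restrict)` would be expected to select the same teams as the loop, although that has not been checked against the real data here.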


## Exploratory (to tidy)

Initially copied from SAMueL 1

In [7]:
print('Total admissions: {0}'.format(len(data.index)))
print('Average yearly admissions: {0}'.format(round(len(data.index)/3)))

Total admissions: 177631
Average yearly admissions: 59210


In [8]:
# Stroke types
data['infarction'].map({1: 'Infarction',
                        0: 'Primary Intracerebral Haemorrhage'}).value_counts(normalize=True)

Infarction                           0.874662
Primary Intracerebral Haemorrhage    0.125338
Name: infarction, dtype: float64

In [9]:
# Thrombolysis use rates for in-hospital and out-of-hospital onset
# Can't do as S1OnsetInHospital not in cleaned dataset
# Also therefore can't restrict to out-of-hospital only

In [10]:
# Analyse by team - group by team and record:
# team, admission numbers, thrombolysis rate, Rankin before stroke,
# NIHSS on arrival, proportion with known onset time (remove rest),
# proportion with onset < 4 hours (remove rest), Rankin again,
# proportion aged 80+, onset to arrival, scan within 4 hours,
# arrival to scan, thrombolysis given, scan to needle, arrival to needle,
# onset to needle, proportion thrombolysed after 180 or 270 minutes
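
One way the per-team summary outlined in these notes might be built is with a single `groupby` and named aggregation. This is only a sketch on invented toy data: the helper name `summarise_teams` and the output column names are assumptions, and it covers just a few of the listed measures. The input column names are those that appear in the notebook's dataset.

```python
import pandas as pd


def summarise_teams(df):
    '''Summarise selected measures per stroke team (hypothetical helper).'''
    # Flag patients aged 80 or over before grouping
    # (assumes 'age' is numeric)
    df = df.assign(age_80_plus=(df['age'] >= 80).astype(int))
    # One row per team, one column per summary measure
    return df.groupby('stroke team').agg(
        admissions=('thrombolysis', 'size'),
        thrombolysis_rate=('thrombolysis', 'mean'),
        mean_prior_disability=('prior disability', 'mean'),
        mean_stroke_severity=('stroke severity', 'mean'),
        proportion_onset_known=('onset known', 'mean'),
        proportion_80_plus=('age_80_plus', 'mean'))


# Invented example rows for two teams
toy = pd.DataFrame({
    'stroke team': ['A', 'A', 'B', 'B'],
    'thrombolysis': [1, 0, 0, 0],
    'prior disability': [0, 2, 1, 3],
    'stroke severity': [10, 4, 6, 8],
    'onset known': [1, 1, 0, 1],
    'age': [82.5, 67.5, 77.5, 87.5]})
team_summary = summarise_teams(toy)
print(team_summary)
```

Averaging this table across teams with `team_summary.mean()` would then give hospital-level averages of the kind described in the next cell.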

In [11]:
# Based on analysis by team, summarise for whole population (average of each hospital)

In [12]:
# The same average summary results for under 80 vs. over 80

In [13]:
# Figure with thrombolysis use (all and < 4 hours onset)
# Figure proportion with known onset
# Mean arrival to scan time for patients
# Mean scan to needle time
# Mean arrival to needle time

In [14]:
# Stroke severity distribution

In [15]:
# Onset to arrival, proportion known onset, severity

In [27]:
# Restrict to patients who received thrombolysis
thrombolysed = data[data['thrombolysis'] == 1].copy()

# Proportion where onset is known
# throm_arrival['onset known'] = thrombolysed['onset known'].value_counts(normalize = True)[1]

# Arrival within 4 or 6 hours
thrombolysed['arrive_within_4'] = np.where(thrombolysed['onset-to-arrival time'] <= 240, 1, 0)
thrombolysed['arrive_within_6'] = np.where(thrombolysed['onset-to-arrival time'] <= 360, 1, 0)

# NIHSS 6+ or 11+
thrombolysed['nihss_6_plus'] = np.where(thrombolysed['stroke severity'] >= 6, 1, 0)
thrombolysed['nihss_11_plus'] = np.where(thrombolysed['stroke severity'] >= 11, 1, 0)

# Find results overall, by arrival time group, and by NIHSS group
# (numeric_only=True restricts the means to numeric columns)
thrombolysed.groupby('arrive_within_4').mean(numeric_only=True)


Mean of each numeric column by `arrive_within_4` (columns shown as rows):

| Variable | arrive_within_4 = 0 | arrive_within_4 = 1 |
|---|---|---|
| id | 154401.512658 | 142749.08533 |
| age | 71.845992 | 72.401334 |
| male | 0.518987 | 0.555386 |
| infarction | 1.0 | 1.0 |
| onset-to-arrival time | 1060.462025 | 98.490183 |
| onset known | 0.470464 | 0.999508 |
| precise onset known | 0.234177 | 0.827863 |
| onset during sleep | 0.170886 | 0.005069 |
| arrive by ambulance | 0.932489 | 0.927415 |
| call-to-ambulance-arrival time | 27.9 | -246.7914 |
| ambulance on-scene time | 29.472727 | -31950.46274 |
| ambulance travel-to-hospital time | 16.883495 | -138387.873783 |
| ambulance wait time at hospital | 10.776699 | 177953.966719 |
| month | 6.544304 | 6.536342 |
| year | 2018.147679 | 2018.007332 |
| arrival time 3 hour period | 11.746835 | 12.758673 |
| arrival-to-scan time | 27.183544 | 22.882585 |
| thrombolysis | 1.0 | 1.0 |
| scan-to-thrombolysis time | 39.883966 | 36.574676 |
| thrombectomy | 0.099156 | 0.051621 |
| arrival-to-thrombectomy time | 104.340426 | 183.790276 |
| congestive heart failure | 0.025316 | 0.039909 |
| hypertension | 0.523207 | 0.51341 |
| atrial fibrillation | 0.126582 | 0.122189 |
| diabetes | 0.160338 | 0.174893 |
| prior stroke/TIA | 0.181435 | 0.209586 |
| antiplatelet for atrial fibrillation | 0.018987 | 0.030215 |
| use of AF anticoagulants | 0.046414 | 0.03489 |
| vit k anticoagulant for atrial fibrillation | 0.0 | 0.0 |
| DOAC anticoagulant for atrial fibrillation | 0.0 | 0.0 |
| heparin anticoagulant for atrial fibrillation | 0.0 | 0.0 |
| prior disability | 0.71097 | 0.721175 |
| stroke severity | 11.656118 | 11.002362 |
| nihss complete | 0.976793 | 0.969785 |
| NihssArrivalLoc | 0.253165 | 0.192953 |
| NihssArrivalLocQuestions | 0.867089 | 0.761626 |
| NihssArrivalLocCommands | 0.413502 | 0.378869 |
| NihssArrivalBestGaze | 0.449367 | 0.42429 |
| NihssArrivalVisual | 0.628692 | 0.617243 |
| NihssArrivalFacialPalsy | 1.154008 | 1.130997 |
| NihssArrivalMotorArmLeft | 0.959916 | 0.996506 |
| NihssArrivalMotorArmRight | 1.078059 | 0.967128 |
| NihssArrivalMotorLegLeft | 0.953586 | 0.980808 |
| NihssArrivalMotorLegRight | 1.120253 | 0.954235 |
| NihssArrivalLimbAtaxia | 0.314346 | 0.285567 |
| NihssArrivalSensory | 0.607595 | 0.645096 |
| NihssArrivalBestLanguage | 1.137131 | 1.037646 |
| NihssArrivalDysarthria | 0.934599 | 0.909945 |
| NihssArrivalExtinctionInattention | 0.622363 | 0.629693 |
| death | 0.170886 | 0.137739 |
| discharge disability | 2.713675 | 2.544088 |
| 6 month disability | 1.755396 | 1.997902 |
| ThrombolysisNoButHaemorrhagic | 0.0 | 0.0 |
| ThrombolysisNoButTimeWindow | 0.0 | 0.0 |
| ThrombolysisNoButComorbidity | 0.0 | 0.0 |
| ThrombolysisNoButMedication | 0.0 | 0.0 |
| ThrombolysisNoButRefusal | 0.0 | 0.0 |
| ThrombolysisNoButAge | 0.0 | 0.0 |
| ThrombolysisNoButImproving | 0.0 | 0.0 |
| ThrombolysisNoButTooMildSevere | 0.0 | 0.0 |
| ThrombolysisNoButTimeUnknownWakeUp | 0.0 | 0.0 |
| ThrombolysisNoButOtherMedical | 0.0 | 0.0 |
| arrive_within_6 | 0.377637 | 1.0 |
| nihss_6_plus | 0.729958 | 0.736775 |
| nihss_11_plus | 0.50211 | 0.432656 |
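
The final comment in the `In [27]` cell mentions results overall and by NIHSS group as well as by arrival time, but only the arrival-time grouping is shown. A minimal sketch of how all three could be collected in one table follows, using a small invented dataframe so that it runs standalone. The helper name `mean_by_flags` and the toy values are assumptions; the flag and timing column names mirror those used in the notebook.

```python
import pandas as pd


def mean_by_flags(df, flags, columns):
    '''Mean of chosen columns overall and for each value of each flag.'''
    # Overall means across all rows
    results = {'overall': df[columns].mean()}
    for flag in flags:
        # Means for each value (0 or 1) of this flag
        grouped = df.groupby(flag)[columns].mean()
        for value, row in grouped.iterrows():
            results['{0}={1}'.format(flag, value)] = row
    # One row per group, one column per summarised variable
    return pd.DataFrame(results).T


# Invented example values (times in minutes)
toy = pd.DataFrame({
    'arrive_within_4': [1, 1, 0, 0],
    'nihss_11_plus': [1, 0, 1, 0],
    'arrival-to-scan time': [20.0, 30.0, 40.0, 50.0],
    'scan-to-thrombolysis time': [30.0, 40.0, 50.0, 60.0]})
summary = mean_by_flags(
    toy,
    flags=['arrive_within_4', 'nihss_11_plus'],
    columns=['arrival-to-scan time', 'scan-to-thrombolysis time'])
print(summary)
```

Passing explicit `columns` keeps the output narrow and avoids averaging identifiers such as `id`, `month` and `year`.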
