# Descriptive analysis of SSNAP Extract Version 2

## Plain English Summary

tbc

## Aims

* Restrict to records from 2017 to 2019 (inclusive), keeping only stroke teams with a minimum of 300 admissions and 10 thrombolysis patients over that period

## Observations

tbc

## Import libraries

```python
import pandas as pd
```

## Set filenames

```python
notebook = '01'
data_file = './../output/reformatted_data.csv'
```

## Load data

```python
raw_data = pd.read_csv(data_file)
```
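The restriction cells later in this notebook assume that `raw_data` has `year`, `stroke team` and `thrombolysis` columns. A quick check like the one sketched here fails early if one of them is missing from the reformatted file. The helper `check_columns` and the toy DataFrame are hypothetical illustrations, not part of the original analysis.

```python
import pandas as pd

# Columns used by the restriction cells in this notebook
REQUIRED_COLUMNS = ['year', 'stroke team', 'thrombolysis']


def check_columns(df, required=REQUIRED_COLUMNS):
    """Hypothetical helper: raise KeyError if any required column is missing."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise KeyError('Missing columns: {0}'.format(missing))
    return True


# Toy stand-in for raw_data (synthetic values, not SSNAP records)
example = pd.DataFrame({
    'year': [2016, 2017],
    'stroke team': ['A', 'B'],
    'thrombolysis': [0, 1],
})
print(check_columns(example))  # True
```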

## Restrict data

Restrict to records from 2017, 2018 and 2019.

```python
# Count records per year, then the overall total
print('Number of records per year:')
print(raw_data.year.value_counts().sort_index().to_string())
print('Total: {0}'.format(len(raw_data.index)))
```

```
Number of records per year:
2016    56510
2017    58983
2018    58549
2019    60413
2020    59301
2021    66625
Total: 360381
```

```python
# Keep only records from 2017, 2018 and 2019
raw_data_restrict = raw_data[raw_data['year'].isin([2017, 2018, 2019])]
print('New total number of records: {0}'.format(len(raw_data_restrict.index)))
```

```
New total number of records: 177945
```

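The Aims describe the year window as 2017 to 2019 inclusive. As a minimal sketch on a synthetic DataFrame (not the SSNAP extract), `Series.between`, which includes both endpoints by default, selects the same rows as the `isin` list used above and may be simpler to adjust if the window changes.

```python
import pandas as pd

# Synthetic stand-in for raw_data: one row per record, only 'year' is needed here
example = pd.DataFrame({'year': [2016, 2017, 2018, 2019, 2020, 2021]})

# between() includes both endpoints by default, matching "2017 to 2019 (inclusive)"
mask_between = example['year'].between(2017, 2019)
mask_isin = example['year'].isin([2017, 2018, 2019])

restricted = example[mask_between]
print(mask_between.equals(mask_isin))  # True
print(restricted['year'].tolist())     # [2017, 2018, 2019]
```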
Restrict to stroke teams with a minimum of 300 admissions and 10 thrombolysis patients (records with `thrombolysis` equal to 1) across that three-year period.

```python
keep = []
discard = 0

# Group dataframe by stroke team
groups = raw_data_restrict.groupby('stroke team')

# Loop through name (each stroke team) and group_df (relevant rows from data)
for name, group_df in groups:
    # Skip if admissions less than 300 or thrombolysis patients less than 10
    admissions = len(group_df.index)
    thrombolysis_received = (group_df['thrombolysis'] == 1).sum()
    if (admissions < 300) or (thrombolysis_received < 10):
        discard += 1
        continue
    keep.append(group_df)

# Concatenate the kept stroke teams into a single dataframe
data = pd.concat(keep)

# Number of stroke teams kept vs. removed
print('Number of stroke teams remaining in dataset: {0}'.format(len(keep)))
print('Number of stroke teams removed from dataset: {0}'.format(discard))
```

```
Number of stroke teams remaining in dataset: 114
Number of stroke teams removed from dataset: 4
```
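The loop above can also be expressed through a per-team summary table, which makes it easy to see why each team was dropped. This is a sketch under assumptions: the helper names `summarise_teams` and `restrict_teams` are hypothetical, the example data are synthetic, and the thresholds are lowered so that the toy example is meaningful. Unlike `pd.concat(keep)`, the boolean mask keeps rows in their original order rather than grouped by team.

```python
import pandas as pd


def summarise_teams(df):
    """Count admissions (rows) and thrombolysed patients (thrombolysis == 1) per team."""
    return df.groupby('stroke team').agg(
        admissions=('thrombolysis', 'size'),
        thrombolysed=('thrombolysis', lambda s: (s == 1).sum()),
    )


def restrict_teams(df, min_admissions=300, min_thrombolysis=10):
    """Keep rows from teams meeting both thresholds; also return the summary."""
    summary = summarise_teams(df)
    meets = ((summary['admissions'] >= min_admissions) &
             (summary['thrombolysed'] >= min_thrombolysis))
    kept_teams = summary[meets].index
    return df[df['stroke team'].isin(kept_teams)], summary


# Synthetic example with lowered thresholds (not SSNAP data)
example = pd.DataFrame({
    'stroke team': ['A', 'A', 'A', 'B', 'B', 'B', 'C'],
    'thrombolysis': [1, 1, 0, 0, 0, 0, 1],
})
kept, summary = restrict_teams(example, min_admissions=2, min_thrombolysis=1)
print(summary)
print(kept['stroke team'].unique().tolist())  # ['A']
```

A shorter alternative is `groupby('stroke team').filter(...)` with the same two conditions, though it does not produce the summary table.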
