## Identification of patients admitted to hospital due to COVID-19 using emergency care attendance data

### Background
There is an urgent need to understand which patients are being admitted to hospital with COVID-19.  SUS-APCS data is the 'gold standard' for COVID-19 hospital admission, but it is only made available once the patient is discharged from hospital, which can be many weeks or months after admission.  Data for hospital spells that are ongoing at the time of the SUS-APCS extract is therefore not available, creating an ascertainment bias against longer and more recent spells.

Hospital admission is a crucial outcome for vaccine evaluation.  The absence of rapidly available hospital admissions data means we cannot rapidly evaluate vaccine effectiveness with respect to reducing hospital admission, compare different vaccines' impact on hospital admission, or identify changes in vaccine effectiveness on admission over time.

A large proportion of hospital admissions come through A&E attendance. Emergency care data through ECDS is available much more rapidly than SUS-APCS.  We therefore set out to validate whether patients admitted to hospital due to COVID-19 could be identified earlier using data made available through ECDS.

### Methods

Working on behalf of NHS England, we conducted a retrospective cohort study using routine clinical data from the comprehensive electronic health records of 24 million patients, accessed through the OpenSAFELY-TPP platform, which covers approximately 40% of the general population in England.  Using data between 2020-09-01 and 2021-01-01, we selected a study population of all patients who were registered with a GP practice at the start of the study period, were aged 18 to 110 at the start of the study period, had not died before the start of the study period, and had either attended emergency care or been admitted to hospital at any point during the study period.

Patients who were admitted to hospital with COVID-19 as their primary diagnosis were identified using a COVID-19 identification codelist made available at [OpenSAFELY Codelists here](https://codelists.opensafely.org/codelist/opensafely/covid-identification/2020-06-03/).  Patients who attended emergency care with COVID-19 were identified using the SNOMED diagnosis code 1240751000000100, and their discharge destination was extracted.  Patients attending emergency care with respiratory-related diagnosis codes were identified using the codelist available here ([upper resp infection, lower resp infection, sars, pneumonia, resp failure]).  We also extracted whether patients attending emergency care had had a positive COVID-19 test in the period from 2 weeks before attendance to 1 week after attendance (inclusive), and whether they had had COVID-19 confirmed in primary care in the 2 weeks prior to admission using the following [codelists]().
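The test window described above can be sketched in pandas as follows. This is a minimal illustration, not the study definition itself: the function name, the column names `attendance_date` and `positive_test_date`, and the toy dates are all assumptions made for the example.

```python
import pandas as pd


def positive_test_near_attendance(attendance_date, test_date, days_before=14, days_after=7):
    """Flag attendances with a positive test from 14 days before to 7 days after, inclusive."""
    attendance_date = pd.to_datetime(attendance_date)
    test_date = pd.to_datetime(test_date)
    # Negative offsets mean the test came before the attendance
    offset_days = (test_date - attendance_date).dt.days
    # Missing test dates give NaN offsets, which between() treats as False
    return offset_days.between(-days_before, days_after)


# Hypothetical toy data: one row per emergency care attendance
example = pd.DataFrame({
    "attendance_date": ["2020-10-15", "2020-10-15", "2020-10-15", "2020-10-15"],
    "positive_test_date": ["2020-10-01", "2020-10-22", "2020-09-30", None],
})
example["positive_test_in_window"] = positive_test_near_attendance(
    example["attendance_date"], example["positive_test_date"]
)
print(example)
```

The first two rows sit exactly on the window boundaries (14 days before and 7 days after) and are included because the window is inclusive; the third row is 15 days before attendance and is excluded.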

We first assessed how many patients hospitalised for any cause could be identified in those attending emergency care for any reason.  We then explored how many patients hospitalised for any cause could be identified in those attending emergency care who were discharged to either the ward or the emergency department short stay ward.
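As a rough sketch of the discharge filter, the snippet below flags attendances discharged to the ward or the emergency department short stay ward. The two SNOMED codes are the ones listed in the discharge destination lookup used later in this notebook; the helper name and the toy series are assumptions for illustration only.

```python
import pandas as pd

# SNOMED codes taken from the discharge destination lookup in this notebook
HOSPITAL_DISCHARGE_CODES = {
    306706006: "Ward",
    1066331000000109: "Emergency department short stay ward",
}


def discharged_to_hospital(discharge_destination):
    """Return True where the discharge destination is a ward or ED short stay ward."""
    # Missing destinations are never in the code list, so they evaluate to False
    return discharge_destination.isin(list(HOSPITAL_DISCHARGE_CODES))


# Hypothetical toy data: Ward, Home, missing, ED short stay ward
toy_destinations = pd.Series(
    [306706006, 306689006, None, 1066331000000109], dtype="float64"
)
hospital_discharge_flag = discharged_to_hospital(toy_destinations)
print(hospital_discharge_flag.tolist())
```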

We then looked at identifying those admitted to hospital for COVID-19 in the emergency care data.  In the emergency care data, patients were defined as being admitted to hospital for COVID-19 if they attended emergency care and had either a recent positive COVID-19 test, recently confirmed COVID-19 in primary care, or a COVID-19 diagnosis code recorded at attendance.  Using this classification, we provide a contingency matrix and measures of the predictive ability of identifying COVID-19 hospitalisations using emergency care data.  The importance of each variable in this classification was then assessed using the Matthews correlation coefficient (MCC).
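The measures named above can be computed from the four cells of a 2x2 contingency matrix. The helper below is a minimal sketch: the function name and the toy counts are assumptions, and proportions are returned as fractions rather than percentages.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Compute sensitivity, specificity, PPV, NPV and MCC from 2x2 cell counts.

    tp: SUS-positive and ECDS-positive    fn: SUS-positive and ECDS-negative
    fp: SUS-negative and ECDS-positive    tn: SUS-negative and ECDS-negative
    """
    def ratio(numerator, denominator):
        # Return NaN rather than raising when a margin of the table is empty
        return numerator / denominator if denominator else float("nan")

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts, chosen only to illustrate the calculation
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
print(metrics)
```

MCC ranges from -1 to 1, and a value near 0 indicates that the classification performs no better than chance.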

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib.colors import ListedColormap
from IPython.display import Markdown as md
from IPython.display import HTML, display
from collections import Counter
from scipy.stats import chi2_contingency
from scipy.stats import chi2
import math

%matplotlib inline

pd.options.display.float_format = '{:.0f}'.format

In [2]:
df = pd.read_csv('../output/input.csv')
num_patients = len(df['patient_id'].unique())
num_patients_hosp = len(df[df['hospital_admission'].notna()]['patient_id'].unique())
num_patients_hosp_prim_covid = len(df[df['primary_covid_hospital_admission'].notna()]['patient_id'].unique())
num_patients_hosp_covid = len(df[df['covid_hospital_admission'].notna()]['patient_id'].unique())
num_patients_hosp_positive_cov_test = len(df[(df['hospital_admission'].notna()) & (df['positive_covid_test_before_hospital_admission']==1)]['patient_id'].unique())
num_patients_hosp_positive_cov_pc = len(df[(df['hospital_admission'].notna()) & (df['covid_primary_care_before_hospital_admission']==1)]['patient_id'].unique())


num_patients_cov_hosp_attended_ae_any = len(df[(df['primary_covid_hospital_admission'].notna()) & (df['ae_attendance_any']==1)]['patient_id'].unique())
num_patients_cov_hosp_attended_ae = len(df[(df['primary_covid_hospital_admission'].notna()) & (df['ae_attendance']==1)]['patient_id'].unique())
num_patients_cov_hosp_attended_ae_covid = len(df[(df['primary_covid_hospital_admission'].notna()) & ((df['ae_attendance_any']==1)&(df['ae_attendance_covid_status']==1))]['patient_id'].unique())
num_patients_cov_hosp_attended_ae_pos_test = len(df[(df['primary_covid_hospital_admission'].notna()) & ((df['ae_attendance_any']==1)&(df['positive_covid_test_before_ae_attendance']==1))]['patient_id'].unique())
num_patients_cov_hosp_attended_ae_pos_test_month = len(df[(df['primary_covid_hospital_admission'].notna()) & ((df['ae_attendance_any']==1)&(df['positive_covid_test_month_before_ae_attendance']==1))]['patient_id'].unique())
num_patients_cov_hosp_attended_ae_pos_pc = len(df[(df['primary_covid_hospital_admission'].notna()) & ((df['ae_attendance_any']==1)&(df['covid_primary_care_before_ae_attendance']==1))]['patient_id'].unique())
num_patients_cov_hosp_attended_ae_resp = len(df[(df['primary_covid_hospital_admission'].notna()) & ((df['ae_attendance_any']==1)&(df['ae_attendance_respiratory_status']==1))]['patient_id'].unique())


num_patients_ae = len(df[df['ae_attendance_any']==1]['patient_id'].unique())
num_patients_ae_with_discharge_destination = len(df[(df['ae_attendance_any']==1) & df['discharge_destination'].notna()]['patient_id'].unique())
num_patients_ae_hosp_discharge = len(df[df['ae_attendance']==1]['patient_id'].unique())
num_patients_ae_pos_covid = len(df[(df['ae_attendance']==1) & (df['ae_attendance_covid_status']==1)]['patient_id'].unique())
num_patients_ae_pos_covid_test = len(df[(df['ae_attendance']==1) & (df['positive_covid_test_before_ae_attendance']==1)]['patient_id'].unique())
num_patients_ae_pos_covid_pc = len(df[(df['ae_attendance']==1) & (df['covid_primary_care_before_ae_attendance']==1)]['patient_id'].unique())

### Results

In [3]:
md(f'Between 2020-09-01 and 2021-09-01 {num_patients} people out of 24 million in our dataset either attended emergency care ({num_patients_ae}) or were admitted to hospital ({num_patients_hosp}). In those admitted to hospital, {num_patients_hosp_covid} ({(num_patients_hosp_covid/num_patients_hosp)*100}%) were admitted with a COVID-19 diagnosis code.  {num_patients_hosp_prim_covid} ({(num_patients_hosp_prim_covid/num_patients_hosp)*100}%) were admitted with COVID-19 as their primary diagnosis code. In those attending emergency care, {num_patients_ae_with_discharge_destination} ({(num_patients_ae_with_discharge_destination/num_patients_ae)*100}%) had a recorded discharge destination.  A breakdown of discharge destination is shown below.')


Between 2020-09-01 and 2021-09-01 10000 people out of 24 million in our dataset either attended emergency care (4000) or were admitted to hospital (3000). In those admitted to hospital, 3000 (100.0%) were admitted with a COVID-19 diagnosis code.  3000 (100.0%) were admitted with COVID-19 as their primary diagnosis code. In those attending emergency care, 1614 (40.35%) had a recorded discharge destination.  A breakdown of discharge destination is shown below.

In [4]:
df['discharge_destination'] = df['discharge_destination'].astype('category')
not_null_dict = (Counter(df['discharge_destination'].notnull()))
destination_dict = (Counter(df[df['discharge_destination'].notnull()]['discharge_destination']))
missing_number = not_null_dict[False]
destination_dict['missing'] = missing_number

discharge_dict = {1066341000000100:"Ambulatory Emergency Care", 19712007: "Patient Transfer", 183919006: "Urgent admission to hospice", 1066361000000104: "High dependency unit", 305398007: "Mortuary", 1066381000000108: "Special baby care unit", 1066331000000109: "Emergency department short stay ward", 306705005: "Police custody", 306706006:"Ward", 306689006: "Home", 306694006: "Nursing Home", 306691003: "Residential Home", 1066351000000102: "Hospital at Home", 1066401000000108: "Neonatal ICU", 1066371000000106: "Coronary Care Unit", 50861005: "Legal Custody", 1066391000000105: "ICU", "missing": "missing"}

data = []
for key, value in destination_dict.items():
    row = [discharge_dict[key], value]
    data.append(row)

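# The values are patient counts per destination, not percentages
discharge_destination_df = pd.DataFrame(data, columns=["Discharge Destination", "Count"])
discharge_destination_df



| | Discharge Destination | Count |
|---|---|---|
| 0 | Ward | 2000 |
| 1 | ICU | 2000 |
| 2 | missing | 6000 |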


In [5]:
md(f'In those who attended emergency care, {num_patients_ae_pos_covid} ({(num_patients_ae_pos_covid/num_patients_ae)*100:.2f}%) patients attended with a COVID-19 diagnosis code, {num_patients_ae_pos_covid_test} ({(num_patients_ae_pos_covid_test/num_patients_ae)*100:.2f}%) patients had had a positive COVID-19 test in the 2 weeks prior to attendance and {num_patients_ae_pos_covid_pc} ({(num_patients_ae_pos_covid_pc/num_patients_ae)*100:.2f}%) patients had had COVID-19 confirmed in primary care in the 2 weeks prior to attendance.')



In those who attended emergency care, 3581 (89.53%) patients attended with a COVID-19 diagnosis code, 1989 (49.73%) patients had had a positive COVID-19 test in the 2 weeks prior to attendance and 2023 (50.58%) patients had had COVID-19 confirmed in primary care in the 2 weeks prior to attendance.

In [6]:
md(f'In patients who were hospitalised with a primary diagnosis of COVID-19, {num_patients_cov_hosp_attended_ae_any} ({(num_patients_cov_hosp_attended_ae_any/num_patients_hosp_prim_covid)*100:.2f}%) attended emergency care.  Of these people, {num_patients_cov_hosp_attended_ae_covid} ({(num_patients_cov_hosp_attended_ae_covid/num_patients_hosp_prim_covid)*100:.2f}%) had a COVID-19 diagnosis code on attendance.  {num_patients_cov_hosp_attended_ae_resp} ({(num_patients_cov_hosp_attended_ae_resp/num_patients_hosp_prim_covid)*100:.2f}%) attended emergency care with respiratory-related diagnosis codes.  {num_patients_cov_hosp_attended_ae_pos_test} ({(num_patients_cov_hosp_attended_ae_pos_test/num_patients_hosp_prim_covid)*100:.2f}%) had had a positive COVID-19 test in the 2 weeks prior to attendance, and {num_patients_cov_hosp_attended_ae_pos_test_month} ({(num_patients_cov_hosp_attended_ae_pos_test_month/num_patients_hosp_prim_covid)*100:.2f}%) had had a positive test within the month prior to attendance.  {num_patients_cov_hosp_attended_ae_pos_pc} ({(num_patients_cov_hosp_attended_ae_pos_pc/num_patients_hosp_prim_covid)*100:.2f}%) of these patients had had COVID-19 confirmed in primary care in the 2 weeks prior to emergency attendance.')


In patients who were hospitalised with a primary diagnosis of COVID-19, 1177 (39.23%) attended emergency care.  Of these people, 1061 (35.37%) had a COVID-19 diagnosis code on attendance.  1040 (34.67%) attended emergency care with respiratory-related diagnosis codes.  583 (19.43%) had had a positive COVID-19 test in the 2 weeks prior to attendance, and 587 (19.57%) had had a positive test within the month prior to attendance.  586 (19.53%) of these patients had had COVID-19 confirmed in primary care in the 2 weeks prior to emergency attendance.

In [7]:

# Each section is (heading, denominator for the % column, [(row label, count), ...])
sections = [
    ('Hospital Admission', num_patients_hosp, [
        ('Admitted', num_patients_hosp),
        ('Admitted with primary COVID-19', num_patients_hosp_prim_covid),
        ('Admitted with COVID-19', num_patients_hosp_covid),
    ]),
    ('Emergency Care Attendance', num_patients_ae, [
        ('Attended', num_patients_ae),
        ('Attended AE with discharge destination', num_patients_ae_with_discharge_destination),
        ('Attended with COVID', num_patients_ae_pos_covid),
        ('Recent + COVID Test', num_patients_ae_pos_covid_test),
        ('Recent + COVID in Primary Care', num_patients_ae_pos_covid_pc),
    ]),
    ('Primary Covid Hospital Admission', num_patients_hosp_prim_covid, [
        ('Attended AE', num_patients_cov_hosp_attended_ae_any),
        ('Attended AE with hospital discharge', num_patients_cov_hosp_attended_ae),
        ('Attended AE COVID code', num_patients_cov_hosp_attended_ae_covid),
        ('Attended AE recent + test', num_patients_cov_hosp_attended_ae_pos_test),
        ('Attended AE + Covid Month', num_patients_cov_hosp_attended_ae_pos_test_month),
        ('Attended AE recent + PC', num_patients_cov_hosp_attended_ae_pos_pc),
        ('Attended AE resp', num_patients_cov_hosp_attended_ae_resp),
    ]),
]

# Build the HTML table one row at a time instead of in a single long string
html = '<table><tr><th>Group</th><th>Total</th><th>%</th></tr>'
for heading, denominator, rows in sections:
    html += f'<tr><th>{heading}</th></tr>'
    for label, count in rows:
        html += f'<tr><td>{label}</td><td>{count}</td><td>{(count/denominator)*100}</td></tr>'
html += '</table>'

display(HTML(html))



| Group | Total | % |
|---|---|---|
| Hospital Admission | | |
| Admitted | 3000.0 | 100.0 |
| Admitted with primary COVID-19 | 3000.0 | 100.0 |
| Admitted with COVID-19 | 3000.0 | 100.0 |
| Emergency Care Attendance | | |
| Attended | 4000.0 | 100.0 |
| Attended AE with discharge destination | 1614.0 | 40.35 |
| Attended with COVID | 3581.0 | 89.525 |
| Recent + COVID Test | 1989.0 | 49.725 |
| Recent + COVID in Primary Care | 2023.0 | 50.575 |
| Primary Covid Hospital Admission | | |


The confusion matrix below compares two groups of patients attending emergency care (with hospital discharge): those predicted from ECDS to be hospitalised due to COVID-19 (using emergency care COVID-19 status, a recent positive test, or recent COVID-19 confirmed in primary care) and those actually admitted to hospital due to COVID-19 according to SUS.
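For comparison, a contingency table with the same layout can be sketched with `pd.crosstab`. This is an illustrative alternative, not the approach used in the cells of this notebook: the toy data are invented, it assumes one row per patient, and unlike the set-based cells it counts patients with missing flags as ECDS-negative rather than dropping them.

```python
import pandas as pd

# Hypothetical toy cohort, one row per patient, using the notebook's column names
toy = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5, 6],
    "primary_covid_hospital_admission": ["2020-10-01", None, "2020-11-05", None, None, "2020-12-01"],
    "ae_attendance": [1, 1, 0, 1, 0, 1],
    "ae_attendance_covid_status": [1, 0, 0, 0, 0, 0],
    "positive_covid_test_before_ae_attendance": [0, 0, 0, 1, 0, 0],
    "covid_primary_care_before_ae_attendance": [0, 0, 1, 0, 0, 0],
})

evidence_columns = [
    "ae_attendance_covid_status",
    "positive_covid_test_before_ae_attendance",
    "covid_primary_care_before_ae_attendance",
]

# SUS-positive: a primary COVID-19 hospital admission is recorded
sus_positive = toy["primary_covid_hospital_admission"].notna()
# ECDS-positive: attended with hospital discharge and at least one COVID-19 indicator
ecds_positive = (toy["ae_attendance"] == 1) & (toy[evidence_columns] == 1).any(axis=1)

contingency = pd.crosstab(
    ecds_positive.map({True: "ECDS-positive", False: "ECDS-negative"}).rename("ECDS"),
    sus_positive.map({True: "SUS-positive", False: "SUS-negative"}).rename("SUS"),
    margins=True,
    margins_name="Total",
)
print(contingency)
```

Because every patient is assigned to exactly one cell, the grand total always equals the number of rows in the input.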

In [None]:
positive_covid_patients_sus = df[df['primary_covid_hospital_admission'].notna()]
negative_covid_patients_sus = df[~df['primary_covid_hospital_admission'].notna()]

positive_covid_patients_ecds = df[(df['ae_attendance']==1) & ((df['ae_attendance_covid_status']==1) | (df['positive_covid_test_before_ae_attendance'] ==1) | (df['covid_primary_care_before_ae_attendance'] ==1))]
negative_covid_patients_ecds = df[(df['ae_attendance']==0) | ((df['ae_attendance']==1) & ((df['ae_attendance_covid_status']==0) & (df['positive_covid_test_before_ae_attendance'] ==0) & (df['covid_primary_care_before_ae_attendance'] ==0)))]


sus_patients_positive = set(list(positive_covid_patients_sus['patient_id']))
ecds_patients_positive = set(list(positive_covid_patients_ecds['patient_id']))

sus_patients_negative = set(list(negative_covid_patients_sus['patient_id']))
ecds_patients_negative = set(list(negative_covid_patients_ecds['patient_id']))


sus_pos_ecds_pos = len(list(set(sus_patients_positive) & set(ecds_patients_positive)))
sus_pos_ecds_neg = len(list(set(sus_patients_positive) & set(ecds_patients_negative)))
sus_neg_ecds_pos = len(list(set(sus_patients_negative) & set(ecds_patients_positive)))
sus_neg_ecds_neg = len(list(set(sus_patients_negative) & set(ecds_patients_negative)))

In [None]:
output = pd.DataFrame([[sus_pos_ecds_pos, sus_neg_ecds_pos, (sus_pos_ecds_pos + sus_neg_ecds_pos)], [sus_pos_ecds_neg, sus_neg_ecds_neg, (sus_pos_ecds_neg + sus_neg_ecds_neg)], [(sus_pos_ecds_pos+sus_pos_ecds_neg), (sus_neg_ecds_pos+sus_neg_ecds_neg), (sus_pos_ecds_pos + sus_pos_ecds_neg + sus_neg_ecds_pos + sus_neg_ecds_neg)]], columns=["SUS-positive", "SUS-negative", "Total"], index=["ECDS-positive", "ECDS-negative", "Total"])
output

In [None]:
sensitivity = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_pos_ecds_neg))*100
print(f"Sensitivity: {sensitivity:.2f}%")

specificity = (sus_neg_ecds_neg/(sus_neg_ecds_pos + sus_neg_ecds_neg))*100
print(f"Specificity : {specificity:.2f}%")

ppv = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_neg_ecds_pos))*100
print((f"PPV: {ppv:.2f}%"))

npv = (sus_neg_ecds_neg/(sus_neg_ecds_neg + sus_pos_ecds_neg))*100
print(f"NPV : {npv:.2f}%")

MCC = ((sus_pos_ecds_pos * sus_neg_ecds_neg)-(sus_neg_ecds_pos * sus_pos_ecds_neg))/math.sqrt((sus_pos_ecds_pos + sus_neg_ecds_pos)*(sus_pos_ecds_neg+sus_neg_ecds_neg)*(sus_pos_ecds_pos + sus_pos_ecds_neg)*(sus_neg_ecds_pos+sus_neg_ecds_neg))
print(f"MCC: {MCC:.3f}")

The confusion matrix below compares two groups of patients attending emergency care (no discharge filter): those predicted from ECDS to be hospitalised due to COVID-19 (using emergency care COVID-19 status, a recent positive test, or recent COVID-19 confirmed in primary care) and those actually admitted to hospital due to COVID-19 according to SUS.

In [8]:
positive_covid_patients_sus = df[df['primary_covid_hospital_admission'].notna()]
negative_covid_patients_sus = df[~df['primary_covid_hospital_admission'].notna()]

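# NOTE: the positive group filters on `ae_attendance_any`, but the negative group below filters on `ae_attendance`.
# Patients with ae_attendance_any == 1, ae_attendance == 0 and no COVID-19 indicator fall into neither group,
# so the contingency table total can be lower than the cohort size.
positive_covid_patients_ecds = df[(df['ae_attendance_any']==1) & ((df['ae_attendance_covid_status']==1) | (df['positive_covid_test_before_ae_attendance'] ==1) | (df['covid_primary_care_before_ae_attendance'] ==1))]
negative_covid_patients_ecds = df[(df['ae_attendance_any']==0) | ((df['ae_attendance']==1) & ((df['ae_attendance_covid_status']==0) & (df['positive_covid_test_before_ae_attendance'] ==0) & (df['covid_primary_care_before_ae_attendance'] ==0)))]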


sus_patients_positive = set(list(positive_covid_patients_sus['patient_id']))
ecds_patients_positive = set(list(positive_covid_patients_ecds['patient_id']))

sus_patients_negative = set(list(negative_covid_patients_sus['patient_id']))
ecds_patients_negative = set(list(negative_covid_patients_ecds['patient_id']))


sus_pos_ecds_pos = len(list(set(sus_patients_positive) & set(ecds_patients_positive)))
sus_pos_ecds_neg = len(list(set(sus_patients_positive) & set(ecds_patients_negative)))
sus_neg_ecds_pos = len(list(set(sus_patients_negative) & set(ecds_patients_positive)))
sus_neg_ecds_neg = len(list(set(sus_patients_negative) & set(ecds_patients_negative)))

In [9]:
output = pd.DataFrame([[sus_pos_ecds_pos, sus_neg_ecds_pos, (sus_pos_ecds_pos + sus_neg_ecds_pos)], [sus_pos_ecds_neg, sus_neg_ecds_neg, (sus_pos_ecds_neg + sus_neg_ecds_neg)], [(sus_pos_ecds_pos+sus_pos_ecds_neg), (sus_neg_ecds_pos+sus_neg_ecds_neg), (sus_pos_ecds_pos + sus_pos_ecds_neg + sus_neg_ecds_pos + sus_neg_ecds_neg)]], columns=["SUS-positive", "SUS-negative", "Total"], index=["ECDS-positive", "ECDS-negative", "Total"])
output

| | SUS-positive | SUS-negative | Total |
|---|---|---|---|
| ECDS-positive | 1152 | 2746 | 3898 |
| ECDS-negative | 1834 | 4211 | 6045 |
| Total | 2986 | 6957 | 9943 |


The predictive ability of this classification is shown below.

In [10]:

# sensitivity - proportion of SUS-positive patients who are ECDS-positive
# specificity - proportion of SUS-negative patients who are ECDS-negative
# ppv - proportion of ECDS-positive patients who are SUS-positive
# npv - proportion of ECDS-negative patients who are SUS-negative

sensitivity = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_pos_ecds_neg))*100
print(f"Sensitivity: {sensitivity:.2f}%")

specificity = (sus_neg_ecds_neg/(sus_neg_ecds_pos + sus_neg_ecds_neg))*100
print(f"Specificity : {specificity:.2f}%")

ppv = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_neg_ecds_pos))*100
print((f"PPV: {ppv:.2f}%"))

npv = (sus_neg_ecds_neg/(sus_neg_ecds_neg + sus_pos_ecds_neg))*100
print(f"NPV : {npv:.2f}%")

MCC = ((sus_pos_ecds_pos * sus_neg_ecds_neg)-(sus_neg_ecds_pos * sus_pos_ecds_neg))/math.sqrt((sus_pos_ecds_pos + sus_neg_ecds_pos)*(sus_pos_ecds_neg+sus_neg_ecds_neg)*(sus_pos_ecds_pos + sus_pos_ecds_neg)*(sus_neg_ecds_pos+sus_neg_ecds_neg))
print(f"MCC: {MCC:.3f}")


Sensitivity: 38.58%
Specificity : 60.53%
PPV: 29.55%
NPV : 69.66%
MCC: -0.008


The confusion matrix below compares two groups of patients attending emergency care (no discharge filter): those predicted from ECDS to be hospitalised due to COVID-19 (using emergency care COVID-19 status, a recent positive test, recent COVID-19 confirmed in primary care, or emergency care respiratory status) and those actually admitted to hospital due to COVID-19 according to SUS.

In [11]:
positive_covid_patients_sus = df[df['primary_covid_hospital_admission'].notna()]
negative_covid_patients_sus = df[~df['primary_covid_hospital_admission'].notna()]

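# NOTE: the positive group filters on `ae_attendance_any`, but the negative group below filters on `ae_attendance`.
# Patients with ae_attendance_any == 1, ae_attendance == 0 and no COVID-19 or respiratory indicator fall into neither group,
# so the contingency table total can be lower than the cohort size.
positive_covid_patients_ecds = df[(df['ae_attendance_any']==1) & ((df['ae_attendance_covid_status']==1) | (df['positive_covid_test_before_ae_attendance'] ==1) | (df['covid_primary_care_before_ae_attendance'] ==1) | (df['ae_attendance_respiratory_status'] ==1))]
negative_covid_patients_ecds = df[(df['ae_attendance_any']==0) | ((df['ae_attendance']==1) & ((df['ae_attendance_covid_status']==0) & (df['positive_covid_test_before_ae_attendance'] ==0) & (df['covid_primary_care_before_ae_attendance'] ==0) & (df['ae_attendance_respiratory_status'] ==0)))]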


sus_patients_positive = set(list(positive_covid_patients_sus['patient_id']))
ecds_patients_positive = set(list(positive_covid_patients_ecds['patient_id']))

sus_patients_negative = set(list(negative_covid_patients_sus['patient_id']))
ecds_patients_negative = set(list(negative_covid_patients_ecds['patient_id']))


sus_pos_ecds_pos = len(list(set(sus_patients_positive) & set(ecds_patients_positive)))
sus_pos_ecds_neg = len(list(set(sus_patients_positive) & set(ecds_patients_negative)))
sus_neg_ecds_pos = len(list(set(sus_patients_negative) & set(ecds_patients_positive)))
sus_neg_ecds_neg = len(list(set(sus_patients_negative) & set(ecds_patients_negative)))

In [12]:
output = pd.DataFrame([[sus_pos_ecds_pos, sus_neg_ecds_pos, (sus_pos_ecds_pos + sus_neg_ecds_pos)], [sus_pos_ecds_neg, sus_neg_ecds_neg, (sus_pos_ecds_neg + sus_neg_ecds_neg)], [(sus_pos_ecds_pos+sus_pos_ecds_neg), (sus_neg_ecds_pos+sus_neg_ecds_neg), (sus_pos_ecds_pos + sus_pos_ecds_neg + sus_neg_ecds_pos + sus_neg_ecds_neg)]], columns=["SUS-positive", "SUS-negative", "Total"], index=["ECDS-positive", "ECDS-negative", "Total"])
output

| | SUS-positive | SUS-negative | Total |
|---|---|---|---|
| ECDS-positive | 1174 | 2812 | 3986 |
| ECDS-negative | 1826 | 4182 | 6008 |
| Total | 3000 | 6994 | 9994 |


In [13]:
# sensitivity - proportion of SUS-positive patients who are ECDS-positive
# specificity - proportion of SUS-negative patients who are ECDS-negative
# ppv - proportion of ECDS-positive patients who are SUS-positive
# npv - proportion of ECDS-negative patients who are SUS-negative

sensitivity = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_pos_ecds_neg))*100
print(f"Sensitivity: {sensitivity:.2f}%")

specificity = (sus_neg_ecds_neg/(sus_neg_ecds_pos + sus_neg_ecds_neg))*100
print(f"Specificity : {specificity:.2f}%")

ppv = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_neg_ecds_pos))*100
print((f"PPV: {ppv:.2f}%"))

npv = (sus_neg_ecds_neg/(sus_neg_ecds_neg + sus_pos_ecds_neg))*100
print(f"NPV : {npv:.2f}%")

MCC = ((sus_pos_ecds_pos * sus_neg_ecds_neg)-(sus_neg_ecds_pos * sus_pos_ecds_neg))/math.sqrt((sus_pos_ecds_pos + sus_neg_ecds_pos)*(sus_pos_ecds_neg+sus_neg_ecds_neg)*(sus_pos_ecds_pos + sus_pos_ecds_neg)*(sus_neg_ecds_pos+sus_neg_ecds_neg))
print(f"MCC: {MCC:.3f}")

Sensitivity: 39.13%
Specificity : 59.79%
PPV: 29.45%
NPV : 69.61%
MCC: -0.010


In [14]:
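def get_stats_var(df, var):
    """Return the contingency table, sensitivity, specificity, PPV, NPV (all in %) and MCC for one ECDS variable.

    Relies on the sets `sus_patients_positive` and `sus_patients_negative`
    defined in an earlier cell.
    """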
    positive_ecds = df[(df['ae_attendance_any']==1) & ((df[var]==1))]
    negative_ecds = df[((df['ae_attendance_any']==0) | ((df['ae_attendance_any']==1) & (df[var]==0)))]

    ecds_patients_positive = set(list(positive_ecds['patient_id']))
    ecds_patients_negative = set(list(negative_ecds['patient_id']))

    sus_pos_ecds_pos = len(list(set(sus_patients_positive) & set(ecds_patients_positive)))
    sus_pos_ecds_neg = len(list(set(sus_patients_positive) & set(ecds_patients_negative)))
    sus_neg_ecds_pos = len(list(set(sus_patients_negative) & set(ecds_patients_positive)))
    sus_neg_ecds_neg = len(list(set(sus_patients_negative) & set(ecds_patients_negative)))

    output=pd.DataFrame([[sus_pos_ecds_pos, sus_neg_ecds_pos, (sus_pos_ecds_pos + sus_neg_ecds_pos)], [sus_pos_ecds_neg, sus_neg_ecds_neg, (sus_pos_ecds_neg + sus_neg_ecds_neg)], [(sus_pos_ecds_pos+sus_pos_ecds_neg), (sus_neg_ecds_pos+sus_neg_ecds_neg), (sus_pos_ecds_pos + sus_pos_ecds_neg + sus_neg_ecds_pos + sus_neg_ecds_neg)]], columns=["SUS-positive", "SUS-negative", "Total"], index=["AE Covid +", "AE Covid -", "Total"])
    
    sensitivity = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_pos_ecds_neg))*100
    specificity = (sus_neg_ecds_neg/(sus_neg_ecds_pos + sus_neg_ecds_neg))*100
    ppv = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_neg_ecds_pos))*100
    npv = (sus_neg_ecds_neg/(sus_neg_ecds_neg + sus_pos_ecds_neg))*100
    MCC = ((sus_pos_ecds_pos * sus_neg_ecds_neg)-(sus_neg_ecds_pos * sus_pos_ecds_neg))/math.sqrt((sus_pos_ecds_pos + sus_neg_ecds_pos)*(sus_pos_ecds_neg+sus_neg_ecds_neg)*(sus_pos_ecds_pos + sus_pos_ecds_neg)*(sus_neg_ecds_pos+sus_neg_ecds_neg))

    
    return output, sensitivity, specificity, ppv, npv, MCC

In [15]:
cm_covid_status, sensitivity_covid_status, specificity_covid_status, ppv_covid_status, npv_covid_status, mcc_covid_status = get_stats_var(df, 'ae_attendance_covid_status')
cm_respiratory_status, sensitivity_respiratory_status, specificity_respiratory_status, ppv_respiratory_status, npv_respiratory_status, mcc_respiratory_status = get_stats_var(df, 'ae_attendance_respiratory_status')
cm_test, sensitivity_test, specificity_test, ppv_test, npv_test, mcc_test = get_stats_var(df, 'positive_covid_test_before_ae_attendance')
cm_test_month, sensitivity_test_month, specificity_test_month, ppv_test_month, npv_test_month, mcc_test_month = get_stats_var(df, 'positive_covid_test_month_before_ae_attendance')
cm_covid_pc, sensitivity_covid_pc, specificity_covid_pc, ppv_covid_pc, npv_covid_pc, mcc_covid_pc = get_stats_var(df, 'covid_primary_care_before_ae_attendance')


The performance of individual variables when combined with emergency attendance (no discharge filter) in identifying COVID hospitalisations is shown below.

In [22]:
pd.options.display.float_format = '{:.2f}'.format

data = [
    ['Covid Status',sensitivity_covid_status, specificity_covid_status, ppv_covid_status, npv_covid_status, mcc_covid_status],
    ['Respiratory Status', sensitivity_respiratory_status, specificity_respiratory_status, ppv_respiratory_status, npv_respiratory_status, mcc_respiratory_status],
    ['Positive Covid Test Last 2 weeks',sensitivity_test, specificity_test, ppv_test, npv_test, mcc_test],
    ['Positive Covid Test Last Month', sensitivity_test_month, specificity_test_month, ppv_test_month, npv_test_month, mcc_test_month],
    ['Positive Covid in Primary Care',sensitivity_covid_pc, specificity_covid_pc, ppv_covid_pc, npv_covid_pc, mcc_covid_pc],
]

pd.DataFrame(data, columns=["Variable", "Sensitivity (%)", "Specificity (%)", "PPV (%)", "NPV (%)", "MCC"])


| | Variable | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | MCC |
|---|---|---|---|---|---|---|
| 0 | Covid Status | 35.37 | 63.73 | 29.47 | 69.7 | -0.01 |
| 1 | Respiratory Status | 34.67 | 63.74 | 29.07 | 69.48 | -0.02 |
| 2 | Positive Covid Test Last 2 weeks | 19.43 | 80.34 | 29.76 | 69.94 | -0.0 |
| 3 | Positive Covid Test Last Month | 19.57 | 79.81 | 29.35 | 69.84 | -0.01 |
| 4 | Positive Covid in Primary Care | 19.53 | 79.67 | 29.17 | 69.79 | -0.01 |


In [23]:
pd.options.display.float_format = '{:.0f}'.format

Below are the results for patients who have a recorded emergency care discharge destination.

In [17]:
# Restrict to patients with a recorded emergency care discharge destination
mask = df['discharge_destination'].notna()

df_subset = df[mask]


positive_covid_patients_sus = df_subset[df_subset['primary_covid_hospital_admission'].notna()]
negative_covid_patients_sus = df_subset[~df_subset['primary_covid_hospital_admission'].notna()]

# Build the filters from df_subset itself so the boolean masks share its index
positive_covid_patients_ecds = df_subset[(df_subset['ae_attendance']==1) & ((df_subset['ae_attendance_covid_status']==1) | (df_subset['positive_covid_test_before_ae_attendance'] ==1) | (df_subset['covid_primary_care_before_ae_attendance'] ==1))]
negative_covid_patients_ecds = df_subset[(df_subset['ae_attendance']==0) | ((df_subset['ae_attendance']==1) & ((df_subset['ae_attendance_covid_status']==0) & (df_subset['positive_covid_test_before_ae_attendance'] ==0) & (df_subset['covid_primary_care_before_ae_attendance'] ==0)))]


sus_patients_positive = set(list(positive_covid_patients_sus['patient_id']))
ecds_patients_positive = set(list(positive_covid_patients_ecds['patient_id']))

sus_patients_negative = set(list(negative_covid_patients_sus['patient_id']))
ecds_patients_negative = set(list(negative_covid_patients_ecds['patient_id']))


sus_pos_ecds_pos = len(list(set(sus_patients_positive) & set(ecds_patients_positive)))
sus_pos_ecds_neg = len(list(set(sus_patients_positive) & set(ecds_patients_negative)))
sus_neg_ecds_pos = len(list(set(sus_patients_negative) & set(ecds_patients_positive)))
sus_neg_ecds_neg = len(list(set(sus_patients_negative) & set(ecds_patients_negative)))



In [18]:
output = pd.DataFrame([[sus_pos_ecds_pos, sus_neg_ecds_pos, (sus_pos_ecds_pos + sus_neg_ecds_pos)], [sus_pos_ecds_neg, sus_neg_ecds_neg, (sus_pos_ecds_neg + sus_neg_ecds_neg)], [(sus_pos_ecds_pos+sus_pos_ecds_neg), (sus_neg_ecds_pos+sus_neg_ecds_neg), (sus_pos_ecds_pos + sus_pos_ecds_neg + sus_neg_ecds_pos + sus_neg_ecds_neg)]], columns=["SUS-positive", "SUS-negative", "Total"], index=["ECDS-positive", "ECDS-negative", "Total"])
output

| | SUS-positive | SUS-negative | Total |
|---|---|---|---|
| ECDS-positive | 482 | 1080 | 1562 |
| ECDS-negative | 709 | 1729 | 2438 |
| Total | 1191 | 2809 | 4000 |


In [19]:
sensitivity = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_pos_ecds_neg))*100
print(f"Sensitivity: {sensitivity:.2f}%")

specificity = (sus_neg_ecds_neg/(sus_neg_ecds_pos + sus_neg_ecds_neg))*100
print(f"Specificity : {specificity:.2f}%")

ppv = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_neg_ecds_pos))*100
print((f"PPV: {ppv:.2f}%"))

npv = (sus_neg_ecds_neg/(sus_neg_ecds_neg + sus_pos_ecds_neg))*100
print(f"NPV : {npv:.2f}%")

MCC = ((sus_pos_ecds_pos * sus_neg_ecds_neg)-(sus_neg_ecds_pos * sus_pos_ecds_neg))/math.sqrt((sus_pos_ecds_pos + sus_neg_ecds_pos)*(sus_pos_ecds_neg+sus_neg_ecds_neg)*(sus_pos_ecds_pos + sus_pos_ecds_neg)*(sus_neg_ecds_pos+sus_neg_ecds_neg))
print(f"MCC: {MCC:.3f}")

Sensitivity: 40.47%
Specificity : 61.55%
PPV: 30.86%
NPV : 70.92%
MCC: 0.019


## To Do

* Look at performance in the subset who do have a discharge destination.
* Think about dates
* What discharge destination to include
* Resp codes
* Discharge destination in those with covid hosp admission