## Identification of patients admitted to hospital due to COVID-19 using emergency care attendance data

### Background
There is an urgent need to understand which patients are being admitted to hospital with COVID-19. SUS-APCS data are the 'gold standard' source for COVID-19 hospital admissions, but a record is only made available once the patient is discharged from hospital, which can be many weeks or months after admission. Data for hospital spells still ongoing at the time of the SUS-APCS extract are therefore unavailable, creating an ascertainment bias against longer spells and more recent spells.

Hospital admission is a crucial outcome for vaccine evaluation. Without rapidly available hospital admission data, we cannot promptly evaluate vaccine effectiveness against hospital admission, compare the impact of different vaccines on hospital admission, or identify changes in vaccine effectiveness against admission over time.

A large proportion of hospital admissions come through A&E attendance. Emergency care data from the Emergency Care Data Set (ECDS) are available much more rapidly than SUS-APCS. We therefore set out to assess whether patients admitted to hospital due to COVID-19 could be identified earlier using data made available through ECDS.

### Methods

Working on behalf of NHS England, we used routine clinical data from 24 million patients to conduct a retrospective cohort study of comprehensive electronic health record data in NHS England, using the OpenSAFELY-TPP platform, which covers approximately 40% of the general population in England. Using data from 2020-09-01 to 2021-01-01, we selected a study population of all patients who were registered with a GP practice at the start of the study period, were aged 18-110 at the start of the study period, had not died before the start of the study period, and had either attended emergency care or been admitted to hospital at any point during the study period.
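The inclusion criteria above amount to a conjunction of patient-level filters. In OpenSAFELY these are applied upstream in the study definition, not in pandas, so the following is only a minimal sketch of the same logic on a flat, one-row-per-patient extract. The columns `registered_at_start`, `age_at_start` and `died_before_start` are hypothetical names chosen for illustration; `ae_attendance_any` and `hospital_admission` are the names used later in this notebook.

```python
import pandas as pd


def select_study_population(patients: pd.DataFrame) -> pd.DataFrame:
    """Apply the Methods inclusion criteria (sketch; some column names are hypothetical)."""
    registered = patients["registered_at_start"] == 1   # hypothetical column
    age_ok = patients["age_at_start"].between(18, 110)  # hypothetical column; inclusive at both ends
    alive = patients["died_before_start"] == 0          # hypothetical column
    # At least one emergency care attendance or hospital admission in the study period
    had_event = (patients["ae_attendance_any"] == 1) | patients["hospital_admission"].notna()
    return patients[registered & age_ok & alive & had_event]


demo = pd.DataFrame({
    "patient_id": [1, 2, 3, 4],
    "registered_at_start": [1, 1, 0, 1],
    "age_at_start": [45, 17, 60, 80],
    "died_before_start": [0, 0, 0, 0],
    "ae_attendance_any": [1, 1, 1, 0],
    "hospital_admission": [None, None, None, "2020-10-01"],
})
print(select_study_population(demo)["patient_id"].tolist())  # [1, 4]
```

Patient 2 is excluded on age and patient 3 on registration; patient 4 is kept through a hospital admission despite having no emergency care attendance.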

Patients admitted to hospital with COVID-19 as their primary diagnosis were identified using the COVID-19 identification codelist available on [OpenSAFELY Codelists](https://codelists.opensafely.org/codelist/opensafely/covid-identification/2020-06-03/). Patients attending emergency care with COVID-19 were identified using the SNOMED diagnosis code 1240751000000100, and their discharge destination was extracted. We also extracted whether patients attending emergency care had a positive COVID-19 test in the period from 2 weeks before to 1 week after attendance (inclusive), and whether they had COVID-19 confirmed in primary care in the 2 weeks before attendance, using the following [codelists]().
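The two look-back windows can be expressed as a single date-range check around the attendance date. This extraction happens upstream in the study definition; the sketch below only restates the described windows in pandas, assumes both boundaries are inclusive, and uses a hypothetical helper name (`event_in_window`).

```python
import pandas as pd


def event_in_window(anchor_date, event_dates, days_before=14, days_after=7):
    """True if any event date lies in [anchor - days_before, anchor + days_after]."""
    start = anchor_date - pd.Timedelta(days=days_before)
    end = anchor_date + pd.Timedelta(days=days_after)
    return any(start <= d <= end for d in event_dates)


attendance = pd.Timestamp("2020-11-15")

# Positive test window: 2 weeks before to 1 week after attendance
print(event_in_window(attendance, [pd.Timestamp("2020-11-20")]))  # True (5 days after)
print(event_in_window(attendance, [pd.Timestamp("2020-10-31")]))  # False (15 days before)

# Primary care window: 2 weeks before attendance only
print(event_in_window(attendance, [pd.Timestamp("2020-11-20")], days_after=0))  # False
```

Setting `days_after=0` reuses the same check for the primary care criterion, which only looks backwards from attendance.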

We first assessed how many patients hospitalised for any cause could be identified among those attending emergency care for any reason. We then explored how many patients hospitalised for any cause could be identified among those attending emergency care who were discharged to either the ward or the emergency short stay ward.
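If `input.csv` holds one row per patient (an assumption here), the set intersections used in this notebook reduce to a cross-tabulation of two boolean flags. The sketch below builds the same 2x2 table with `pandas.crosstab`; the helper name `contingency_table` is hypothetical, and passing `ecds_col="ae_attendance"` would give the discharge-filtered comparison.

```python
import numpy as np
import pandas as pd


def contingency_table(df: pd.DataFrame, ecds_col: str = "ae_attendance_any") -> pd.DataFrame:
    """2x2 table of the ECDS flag (rows) against SUS-APCS admission (columns), with totals."""
    ecds = np.where(df[ecds_col] == 1, "ECDS-positive", "ECDS-negative")
    sus = np.where(df["hospital_admission"].notna(), "SUS-positive", "SUS-negative")
    table = pd.crosstab(ecds, sus, rownames=["ECDS"], colnames=["SUS"],
                        margins=True, margins_name="Total")
    # crosstab sorts labels alphabetically; put positives first and fill any absent cell with 0
    order_rows = ["ECDS-positive", "ECDS-negative", "Total"]
    order_cols = ["SUS-positive", "SUS-negative", "Total"]
    return table.reindex(index=order_rows, columns=order_cols, fill_value=0)


demo = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "ae_attendance_any": [1, 1, 0, 0, 1],
    "hospital_admission": ["2020-10-01", None, "2020-11-01", None, "2020-12-01"],
})
print(contingency_table(demo))
```

Compared with set intersections, the boolean-mask form makes it impossible for a patient to be counted in two cells, although it silently treats a missing ECDS flag as negative.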

Next, we looked at identifying patients admitted to hospital for COVID-19 in the emergency care data. In the emergency care data, patients are defined as admitted to hospital for COVID-19 if they attended emergency care with subsequent discharge to the ward or emergency short stay ward and either had a recent positive COVID-19 test, had COVID-19 recently confirmed in primary care, or were recorded as having a COVID-19 diagnosis code at attendance. Using this classification, we provide a contingency matrix and measures of the predictive ability (sensitivity and specificity) of identifying COVID-19 hospitalisations using emergency care data. We then assessed variable importance by evaluating each component of this classification separately.
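Sensitivity and specificity come directly from the four cells of the contingency matrix. The sketch below computes them together with positive and negative predictive values (PPV, NPV), which the notebook does not report; note that PPV and NPV depend on the proportion of SUS-positive patients, and this study population is restricted to patients with an attendance or admission. The helper name `diagnostic_metrics` is hypothetical.

```python
import math


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV as percentages from 2x2 counts."""
    def pct(num, den):
        return 100 * num / den if den else math.nan

    return {
        "sensitivity": pct(tp, tp + fn),  # SUS-positives flagged by ECDS
        "specificity": pct(tn, tn + fp),  # SUS-negatives not flagged by ECDS
        "ppv": pct(tp, tp + fp),          # ECDS-flagged patients who are SUS-positive
        "npv": pct(tn, tn + fn),          # unflagged patients who are SUS-negative
    }


# Counts from the COVID-19 contingency table reported in the Results
metrics = diagnostic_metrics(tp=1184, fp=2713, fn=1816, tn=4287)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

With these counts, sensitivity and specificity match the notebook output (39.47% and 61.24%), and PPV and NPV come out at about 30.38% and 70.24%.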

In [17]:
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib.colors import ListedColormap


%matplotlib inline

pd.options.display.float_format = '{:.0f}'.format

In [18]:
df = pd.read_csv('../output/input.csv')

# Study population size
num_patients = df['patient_id'].nunique()

# SUS-APCS hospital admissions: any cause, primary COVID-19, with COVID-19,
# and any-cause admissions preceded by a positive test or by COVID-19 in primary care
num_patients_hosp = df.loc[df['hospital_admission'].notna(), 'patient_id'].nunique()
num_patients_hosp_prim_covid = df.loc[df['primary_covid_hospital_admission'].notna(), 'patient_id'].nunique()
num_patients_hosp_covid = df.loc[df['covid_hospital_admission'].notna(), 'patient_id'].nunique()
num_patients_hosp_positive_cov_test = df.loc[df['hospital_admission'].notna() & (df['positive_covid_test_before_hospital_admission'] == 1), 'patient_id'].nunique()
num_patients_hosp_positive_cov_pc = df.loc[df['hospital_admission'].notna() & (df['covid_primary_care_before_hospital_admission'] == 1), 'patient_id'].nunique()

# ECDS emergency care attendances: any attendance, attendance discharged to the ward or
# emergency short stay ward, and discharged attendances with each COVID-19 indicator
num_patients_ae = df.loc[df['ae_attendance_any'] == 1, 'patient_id'].nunique()
num_patients_ae_hosp_discharge = df.loc[df['ae_attendance'] == 1, 'patient_id'].nunique()
num_patients_ae_pos_covid = df.loc[(df['ae_attendance'] == 1) & (df['ae_attendance_covid_status'] == 1), 'patient_id'].nunique()
num_patients_ae_pos_covid_test = df.loc[(df['ae_attendance'] == 1) & (df['positive_covid_test_before_ae_attendance'] == 1), 'patient_id'].nunique()
num_patients_ae_pos_covid_pc = df.loc[(df['ae_attendance'] == 1) & (df['covid_primary_care_before_ae_attendance'] == 1), 'patient_id'].nunique()

### Results

Between 2020-09-01 and 2021-09-01, {{num_patients}} of the 24 million people in our dataset either attended emergency care ({{num_patients_ae}}) or were admitted to hospital ({{num_patients_hosp}}). Of those attending emergency care, {{num_patients_ae_hosp_discharge}} ({{(num_patients_ae_hosp_discharge/num_patients_ae)*100}}%) were discharged to the hospital ward or emergency short stay ward.

<table>
  <tr>
    <th>Group</th>
    <th>Total</th>
    <th>%</th>
  </tr>
  <tr>
    <th>Hospital Admission</th>
  </tr>
  <tr>
    <td>Admitted</td>
    <td>{{num_patients_hosp}}</td>
    <td>{{(num_patients_hosp/num_patients_hosp)*100}}</td>
  </tr>
  <tr>
    <td>Admitted with primary COVID-19</td>
    <td>{{num_patients_hosp_prim_covid}}</td>
    <td>{{(num_patients_hosp_prim_covid/num_patients_hosp)*100}}</td>
  </tr>
  <tr>
    <td>Admitted with COVID-19</td>
    <td>{{num_patients_hosp_covid}}</td>
    <td>{{(num_patients_hosp_covid/num_patients_hosp)*100}}</td>
  </tr>
  <tr>
    <td>Recent + COVID Test</td>
    <td>{{num_patients_hosp_positive_cov_test}}</td>
    <td>{{(num_patients_hosp_positive_cov_test/num_patients_hosp)*100}}</td>
  </tr>  
  <tr>
    <td>Recent + COVID in Primary Care</td>
    <td>{{num_patients_hosp_positive_cov_pc}}</td>
    <td>{{(num_patients_hosp_positive_cov_pc/num_patients_hosp)*100}}</td>
  </tr>
  <tr>
    <th>Emergency Care Attendance</th>
  </tr>
  <tr>
    <td>Attended</td>
    <td>{{num_patients_ae}}</td>
    <td>{{(num_patients_ae/num_patients_ae)*100}}</td>
  </tr>
  <tr>
    <td>Attended with hospital discharge</td>
    <td>{{num_patients_ae_hosp_discharge}}</td>
    <td>{{(num_patients_ae_hosp_discharge/num_patients_ae)*100}}</td>
  </tr>
  <tr>
    <td>Attended with COVID</td>
    <td>{{num_patients_ae_pos_covid}}</td>
    <td>{{(num_patients_ae_pos_covid/num_patients_ae)*100}}</td>
  </tr>
  <tr>
    <td>Recent + COVID Test</td>
    <td>{{num_patients_ae_pos_covid_test}}</td>
    <td>{{(num_patients_ae_pos_covid_test/num_patients_ae)*100}}</td>
  </tr>
  <tr>
    <td>Recent + COVID in Primary Care</td>
    <td>{{num_patients_ae_pos_covid_pc}}</td>
    <td>{{(num_patients_ae_pos_covid_pc/num_patients_ae)*100}}</td>
  </tr>
</table>

Of those admitted to hospital (any cause), how many could be identified from emergency care attendance alone (without applying the discharge filter)?

In [None]:
# Reference standard (SUS-APCS): any-cause hospital admission recorded
positive_patients_sus = df[df['hospital_admission'].notna()]
negative_patients_sus = df[df['hospital_admission'].isna()]

# Index test (ECDS): any emergency care attendance, regardless of discharge destination
positive_patients_ecds = df[df['ae_attendance_any'] == 1]
negative_patients_ecds = df[df['ae_attendance_any'] == 0]

total_sus_patients_positive = set(positive_patients_sus['patient_id'])
total_sus_patients_negative = set(negative_patients_sus['patient_id'])

ecds_patients_positive = set(positive_patients_ecds['patient_id'])
ecds_patients_negative = set(negative_patients_ecds['patient_id'])

# Cells of the 2x2 contingency table: patients present in both sets
sus_total_pos_ecds_pos = len(total_sus_patients_positive & ecds_patients_positive)
sus_total_pos_ecds_neg = len(total_sus_patients_positive & ecds_patients_negative)
sus_total_neg_ecds_pos = len(total_sus_patients_negative & ecds_patients_positive)
sus_total_neg_ecds_neg = len(total_sus_patients_negative & ecds_patients_negative)

In [None]:
pd.DataFrame([[sus_total_pos_ecds_pos, sus_total_neg_ecds_pos, (sus_total_pos_ecds_pos + sus_total_neg_ecds_pos)], [sus_total_pos_ecds_neg, sus_total_neg_ecds_neg, (sus_total_pos_ecds_neg + sus_total_neg_ecds_neg)], [(sus_total_pos_ecds_pos+sus_total_pos_ecds_neg), (sus_total_neg_ecds_pos+sus_total_neg_ecds_neg), (sus_total_pos_ecds_pos + sus_total_pos_ecds_neg + sus_total_neg_ecds_pos + sus_total_neg_ecds_neg)]], columns=["SUS-total-positive", "SUS-total-negative", "Total"], index=["ECDS-positive", "ECDS-negative", "Total"])


In [None]:
# Sensitivity: proportion of SUS-positive patients who are also ECDS-positive
# Specificity: proportion of SUS-negative patients who are also ECDS-negative

sensitivity = (sus_total_pos_ecds_pos/(sus_total_pos_ecds_pos + sus_total_pos_ecds_neg))*100
print(f"Sensitivity: {sensitivity:.2f}%")

specificity = (sus_total_neg_ecds_neg/(sus_total_neg_ecds_pos + sus_total_neg_ecds_neg))*100
print(f"Specificity : {specificity:.2f}%")

Of those admitted to hospital (any cause), how many could be identified from emergency care attendances that resulted in discharge to the ward or emergency short stay ward?

In [3]:
# Reference standard (SUS-APCS): any-cause hospital admission recorded
positive_patients_sus = df[df['hospital_admission'].notna()]
negative_patients_sus = df[df['hospital_admission'].isna()]

# Index test (ECDS): attendance discharged to the ward or emergency short stay ward
positive_patients_ecds = df[df['ae_attendance'] == 1]
negative_patients_ecds = df[df['ae_attendance'] == 0]

total_sus_patients_positive = set(positive_patients_sus['patient_id'])
total_sus_patients_negative = set(negative_patients_sus['patient_id'])

ecds_patients_positive = set(positive_patients_ecds['patient_id'])
ecds_patients_negative = set(negative_patients_ecds['patient_id'])

# Cells of the 2x2 contingency table: patients present in both sets
sus_total_pos_ecds_pos = len(total_sus_patients_positive & ecds_patients_positive)
sus_total_pos_ecds_neg = len(total_sus_patients_positive & ecds_patients_negative)
sus_total_neg_ecds_pos = len(total_sus_patients_negative & ecds_patients_positive)
sus_total_neg_ecds_neg = len(total_sus_patients_negative & ecds_patients_negative)

In [5]:
pd.DataFrame([[sus_total_pos_ecds_pos, sus_total_neg_ecds_pos, (sus_total_pos_ecds_pos + sus_total_neg_ecds_pos)], [sus_total_pos_ecds_neg, sus_total_neg_ecds_neg, (sus_total_pos_ecds_neg + sus_total_neg_ecds_neg)], [(sus_total_pos_ecds_pos+sus_total_pos_ecds_neg), (sus_total_neg_ecds_pos+sus_total_neg_ecds_neg), (sus_total_pos_ecds_pos + sus_total_pos_ecds_neg + sus_total_neg_ecds_pos + sus_total_neg_ecds_neg)]], columns=["SUS-total-positive", "SUS-total-negative", "Total"], index=["ECDS-positive", "ECDS-negative", "Total"])


| | SUS-total-positive | SUS-total-negative | Total |
|---|---|---|---|
| ECDS-positive | 1175 | 2825 | 4000 |
| ECDS-negative | 1825 | 4175 | 6000 |
| Total | 3000 | 7000 | 10000 |


In [6]:
# Sensitivity: proportion of SUS-positive patients who are also ECDS-positive
# Specificity: proportion of SUS-negative patients who are also ECDS-negative

sensitivity = (sus_total_pos_ecds_pos/(sus_total_pos_ecds_pos + sus_total_pos_ecds_neg))*100
print(f"Sensitivity: {sensitivity:.2f}%")

specificity = (sus_total_neg_ecds_neg/(sus_total_neg_ecds_pos + sus_total_neg_ecds_neg))*100
print(f"Specificity : {specificity:.2f}%")

Sensitivity: 39.17%
Specificity : 59.64%


Of those admitted to hospital as a result of COVID-19, how many could be identified by finding patients who attended emergency care, were subsequently discharged to the ward or emergency short stay ward, and had either a recent positive COVID-19 test, COVID-19 recently confirmed in primary care, or a COVID-19 diagnosis code at attendance?
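In the next cell, ECDS negatives are built from explicit `== 0` conditions, so a patient with `ae_attendance == 1`, a missing (`NaN`) COVID-19 flag and no other flag equal to 1 would fall into neither the positive nor the negative set. Whether `input.csv` can contain such missing values is not stated here; if it cannot, both approaches give identical counts. A minimal sketch that guarantees a partition defines the negatives as the exact complement of the positives; the helper name `ecds_covid_admission` is hypothetical, while the column names are those used in this notebook.

```python
import pandas as pd

# Column names as used in this notebook's input.csv
COVID_FLAGS = [
    "ae_attendance_covid_status",
    "positive_covid_test_before_ae_attendance",
    "covid_primary_care_before_ae_attendance",
]


def ecds_covid_admission(df: pd.DataFrame) -> pd.Series:
    """True if discharged to ward/short stay ward AND any COVID-19 indicator equals 1."""
    attended = df["ae_attendance"].eq(1)
    any_covid = df[COVID_FLAGS].eq(1).any(axis=1)  # missing values count as "no"
    return attended & any_covid


demo = pd.DataFrame({
    "ae_attendance": [1, 1, 1, 0],
    "ae_attendance_covid_status": [1, 0, None, 1],
    "positive_covid_test_before_ae_attendance": [0, 0, 0, 1],
    "covid_primary_care_before_ae_attendance": [0, 1, 0, 0],
})
flag = ecds_covid_admission(demo)
print(flag.tolist())     # [True, True, False, False]
print((~flag).tolist())  # negatives are the exact complement
```

The third demo row is the case in question: with the explicit `== 0` conditions it would be neither positive nor negative, whereas here it is counted as negative.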

In [7]:
# Reference standard (SUS-APCS): admission with COVID-19 as the primary diagnosis
positive_covid_patients_sus = df[df['primary_covid_hospital_admission'].notna()]
negative_covid_patients_sus = df[df['primary_covid_hospital_admission'].isna()]

# Index test (ECDS): attendance discharged to ward / short stay ward AND at least one of:
# COVID-19 diagnosis code at attendance, recent positive test, COVID-19 in primary care
positive_covid_patients_ecds = df[
    (df['ae_attendance'] == 1)
    & ((df['ae_attendance_covid_status'] == 1)
       | (df['positive_covid_test_before_ae_attendance'] == 1)
       | (df['covid_primary_care_before_ae_attendance'] == 1))
]
negative_covid_patients_ecds = df[
    (df['ae_attendance'] == 0)
    | ((df['ae_attendance'] == 1)
       & (df['ae_attendance_covid_status'] == 0)
       & (df['positive_covid_test_before_ae_attendance'] == 0)
       & (df['covid_primary_care_before_ae_attendance'] == 0))
]

sus_patients_positive = set(positive_covid_patients_sus['patient_id'])
ecds_patients_positive = set(positive_covid_patients_ecds['patient_id'])

sus_patients_negative = set(negative_covid_patients_sus['patient_id'])
ecds_patients_negative = set(negative_covid_patients_ecds['patient_id'])

# Cells of the 2x2 contingency table: patients present in both sets
sus_pos_ecds_pos = len(sus_patients_positive & ecds_patients_positive)
sus_pos_ecds_neg = len(sus_patients_positive & ecds_patients_negative)
sus_neg_ecds_pos = len(sus_patients_negative & ecds_patients_positive)
sus_neg_ecds_neg = len(sus_patients_negative & ecds_patients_negative)

In [9]:
pd.DataFrame([[sus_pos_ecds_pos, sus_neg_ecds_pos, (sus_pos_ecds_pos + sus_neg_ecds_pos)], [sus_pos_ecds_neg, sus_neg_ecds_neg, (sus_pos_ecds_neg + sus_neg_ecds_neg)], [(sus_pos_ecds_pos+sus_pos_ecds_neg), (sus_neg_ecds_pos+sus_neg_ecds_neg), (sus_pos_ecds_pos + sus_pos_ecds_neg + sus_neg_ecds_pos + sus_neg_ecds_neg)]], columns=["SUS-positive", "SUS-negative", "Total"], index=["ECDS-positive", "ECDS-negative", "Total"])


| | SUS-positive | SUS-negative | Total |
|---|---|---|---|
| ECDS-positive | 1184 | 2713 | 3897 |
| ECDS-negative | 1816 | 4287 | 6103 |
| Total | 3000 | 7000 | 10000 |


In [10]:
# Sensitivity: proportion of SUS-positive patients who are also ECDS-positive
# Specificity: proportion of SUS-negative patients who are also ECDS-negative

sensitivity = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_pos_ecds_neg))*100
print(f"Sensitivity: {sensitivity:.2f}%")

specificity = (sus_neg_ecds_neg/(sus_neg_ecds_pos + sus_neg_ecds_neg))*100
print(f"Specificity : {specificity:.2f}%")


Sensitivity: 39.47%
Specificity : 61.24%


## Variable Breakdown 
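Each subsection below repeats the same set-intersection cell with a different ECDS criterion, combined with `ae_attendance == 1` and compared against admissions with COVID-19 as the primary diagnosis. Assuming one row per patient, the four comparisons can be written as a single loop over boolean masks. This is a sketch, not the notebook's own code: the labels in `COMPONENTS` and the helper name `component_performance` are illustrative, while the column names are those used in the cells below.

```python
import pandas as pd

# One entry per component examined below; None means "attendance only"
COMPONENTS = {
    "AE attendance": None,
    "AE COVID status": "ae_attendance_covid_status",
    "Recent positive test": "positive_covid_test_before_ae_attendance",
    "Primary care COVID": "covid_primary_care_before_ae_attendance",
}


def component_performance(df, reference_col="primary_covid_hospital_admission"):
    """Sensitivity/specificity (%) of 'ae_attendance == 1 AND component == 1'."""
    reference = df[reference_col].notna()
    rows = {}
    for name, col in COMPONENTS.items():
        predicted = df["ae_attendance"].eq(1)
        if col is not None:
            predicted = predicted & df[col].eq(1)
        tp = (predicted & reference).sum()
        fn = (~predicted & reference).sum()
        tn = (~predicted & ~reference).sum()
        fp = (predicted & ~reference).sum()
        rows[name] = {
            "sensitivity": 100 * tp / (tp + fn),
            "specificity": 100 * tn / (tn + fp),
        }
    return pd.DataFrame(rows).T


demo = pd.DataFrame({
    "ae_attendance": [1, 1, 0, 1],
    "ae_attendance_covid_status": [1, 0, 0, 0],
    "positive_covid_test_before_ae_attendance": [0, 1, 0, 0],
    "covid_primary_care_before_ae_attendance": [0, 0, 0, 1],
    "primary_covid_hospital_admission": ["2020-10-01", None, "2020-11-01", "2020-12-01"],
})
print(component_performance(demo).round(2))
```

Because every component is ANDed with `ae_attendance == 1`, no single component can exceed the sensitivity of attendance alone, which is consistent with the tables that follow.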

### AE Attendance (with subsequent discharge to ward or ICU)


In [11]:
# Index test: attendance discharged to ward / short stay ward (no COVID-19 criterion)
positive_ae_covid_ecds = df[df['ae_attendance'] == 1]
negative_ae_covid_ecds = df[df['ae_attendance'] == 0]

ecds_patients_positive = set(positive_ae_covid_ecds['patient_id'])
ecds_patients_negative = set(negative_ae_covid_ecds['patient_id'])

# Compare against the primary COVID-19 SUS-APCS sets (sus_patients_positive / _negative)
sus_pos_ecds_pos = len(sus_patients_positive & ecds_patients_positive)
sus_pos_ecds_neg = len(sus_patients_positive & ecds_patients_negative)
sus_neg_ecds_pos = len(sus_patients_negative & ecds_patients_positive)
sus_neg_ecds_neg = len(sus_patients_negative & ecds_patients_negative)

pd.DataFrame([[sus_pos_ecds_pos, sus_neg_ecds_pos, (sus_pos_ecds_pos + sus_neg_ecds_pos)], [sus_pos_ecds_neg, sus_neg_ecds_neg, (sus_pos_ecds_neg + sus_neg_ecds_neg)], [(sus_pos_ecds_pos+sus_pos_ecds_neg), (sus_neg_ecds_pos+sus_neg_ecds_neg), (sus_pos_ecds_pos + sus_pos_ecds_neg + sus_neg_ecds_pos + sus_neg_ecds_neg)]], columns=["SUS-positive", "SUS-negative", "Total"], index=["AE attendance +", "AE attendance -", "Total"])


| | SUS-positive | SUS-negative | Total |
|---|---|---|---|
| AE attendance + | 1224 | 2776 | 4000 |
| AE attendance - | 1776 | 4224 | 6000 |
| Total | 3000 | 7000 | 10000 |


In [12]:
sensitivity = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_pos_ecds_neg))*100
print(f"Sensitivity: {sensitivity:.2f}%")

specificity = (sus_neg_ecds_neg/(sus_neg_ecds_pos + sus_neg_ecds_neg))*100
print(f"Specificity : {specificity:.2f}%")


Sensitivity: 40.80%
Specificity : 60.34%


### AE Attendance + AE Covid Status

In [13]:
# Index test: discharged attendance with a COVID-19 diagnosis code at attendance
positive_ae_covid_ecds = df[(df['ae_attendance'] == 1) & (df['ae_attendance_covid_status'] == 1)]
negative_ae_covid_ecds = df[
    (df['ae_attendance'] == 0)
    | ((df['ae_attendance'] == 1) & (df['ae_attendance_covid_status'] == 0))
]

ecds_patients_positive = set(positive_ae_covid_ecds['patient_id'])
ecds_patients_negative = set(negative_ae_covid_ecds['patient_id'])

sus_pos_ecds_pos = len(sus_patients_positive & ecds_patients_positive)
sus_pos_ecds_neg = len(sus_patients_positive & ecds_patients_negative)
sus_neg_ecds_pos = len(sus_patients_negative & ecds_patients_positive)
sus_neg_ecds_neg = len(sus_patients_negative & ecds_patients_negative)

pd.DataFrame([[sus_pos_ecds_pos, sus_neg_ecds_pos, (sus_pos_ecds_pos + sus_neg_ecds_pos)], [sus_pos_ecds_neg, sus_neg_ecds_neg, (sus_pos_ecds_neg + sus_neg_ecds_neg)], [(sus_pos_ecds_pos+sus_pos_ecds_neg), (sus_neg_ecds_pos+sus_neg_ecds_neg), (sus_pos_ecds_pos + sus_pos_ecds_neg + sus_neg_ecds_pos + sus_neg_ecds_neg)]], columns=["SUS-positive", "SUS-negative", "Total"], index=["AE Covid +", "AE Covid -", "Total"])



| | SUS-positive | SUS-negative | Total |
|---|---|---|---|
| AE Covid + | 1101 | 2502 | 3603 |
| AE Covid - | 1899 | 4498 | 6397 |
| Total | 3000 | 7000 | 10000 |


In [14]:
sensitivity = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_pos_ecds_neg))*100
print(f"Sensitivity: {sensitivity:.2f}%")

specificity = (sus_neg_ecds_neg/(sus_neg_ecds_pos + sus_neg_ecds_neg))*100
print(f"Specificity : {specificity:.2f}%")


Sensitivity: 36.70%
Specificity : 64.26%


### AE Attendance + Recent Positive Covid Test

In [15]:
# Index test: discharged attendance with a recent positive COVID-19 test
positive_cov_test_ecds = df[(df['ae_attendance'] == 1) & (df['positive_covid_test_before_ae_attendance'] == 1)]
negative_cov_test_ecds = df[
    (df['ae_attendance'] == 0)
    | ((df['ae_attendance'] == 1) & (df['positive_covid_test_before_ae_attendance'] == 0))
]

ecds_patients_positive = set(positive_cov_test_ecds['patient_id'])
ecds_patients_negative = set(negative_cov_test_ecds['patient_id'])

sus_pos_ecds_pos = len(sus_patients_positive & ecds_patients_positive)
sus_pos_ecds_neg = len(sus_patients_positive & ecds_patients_negative)
sus_neg_ecds_pos = len(sus_patients_negative & ecds_patients_positive)
sus_neg_ecds_neg = len(sus_patients_negative & ecds_patients_negative)

pd.DataFrame([[sus_pos_ecds_pos, sus_neg_ecds_pos, (sus_pos_ecds_pos + sus_neg_ecds_pos)], [sus_pos_ecds_neg, sus_neg_ecds_neg, (sus_pos_ecds_neg + sus_neg_ecds_neg)], [(sus_pos_ecds_pos+sus_pos_ecds_neg), (sus_neg_ecds_pos+sus_neg_ecds_neg), (sus_pos_ecds_pos + sus_pos_ecds_neg + sus_neg_ecds_pos + sus_neg_ecds_neg)]], columns=["SUS-positive", "SUS-negative", "Total"], index=["Covid Test +", "Covid Test -", "Total"])


| | SUS-positive | SUS-negative | Total |
|---|---|---|---|
| Covid Test + | 604 | 1410 | 2014 |
| Covid Test - | 2396 | 5590 | 7986 |
| Total | 3000 | 7000 | 10000 |


In [16]:
sensitivity = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_pos_ecds_neg))*100
print(f"Sensitivity: {sensitivity:.2f}%")

specificity = (sus_neg_ecds_neg/(sus_neg_ecds_pos + sus_neg_ecds_neg))*100
print(f"Specificity : {specificity:.2f}%")


Sensitivity: 20.13%
Specificity : 79.86%


### AE Attendance + Covid Positive Primary Care

In [17]:
# Index test: discharged attendance with COVID-19 recently confirmed in primary care
positive_cov_pc_ecds = df[(df['ae_attendance'] == 1) & (df['covid_primary_care_before_ae_attendance'] == 1)]
negative_cov_pc_ecds = df[
    (df['ae_attendance'] == 0)
    | ((df['ae_attendance'] == 1) & (df['covid_primary_care_before_ae_attendance'] == 0))
]

ecds_patients_positive = set(positive_cov_pc_ecds['patient_id'])
ecds_patients_negative = set(negative_cov_pc_ecds['patient_id'])

sus_pos_ecds_pos = len(sus_patients_positive & ecds_patients_positive)
sus_pos_ecds_neg = len(sus_patients_positive & ecds_patients_negative)
sus_neg_ecds_pos = len(sus_patients_negative & ecds_patients_positive)
sus_neg_ecds_neg = len(sus_patients_negative & ecds_patients_negative)

pd.DataFrame([[sus_pos_ecds_pos, sus_neg_ecds_pos, (sus_pos_ecds_pos + sus_neg_ecds_pos)], [sus_pos_ecds_neg, sus_neg_ecds_neg, (sus_pos_ecds_neg + sus_neg_ecds_neg)], [(sus_pos_ecds_pos+sus_pos_ecds_neg), (sus_neg_ecds_pos+sus_neg_ecds_neg), (sus_pos_ecds_pos + sus_pos_ecds_neg + sus_neg_ecds_pos + sus_neg_ecds_neg)]], columns=["SUS-positive", "SUS-negative", "Total"], index=["Primary Care Covid +", "Primary Care Covid -", "Total"])


| | SUS-positive | SUS-negative | Total |
|---|---|---|---|
| Primary Care Covid + | 618 | 1352 | 1970 |
| Primary Care Covid - | 2382 | 5648 | 8030 |
| Total | 3000 | 7000 | 10000 |


In [18]:
sensitivity = (sus_pos_ecds_pos/(sus_pos_ecds_pos + sus_pos_ecds_neg))*100
print(f"Sensitivity: {sensitivity:.2f}%")

specificity = (sus_neg_ecds_neg/(sus_neg_ecds_pos + sus_neg_ecds_neg))*100
print(f"Specificity : {specificity:.2f}%")


Sensitivity: 20.60%
Specificity : 80.69%
