# 2. Add other relevant data: EPR suppliers and trust populations

This notebook joins our tidy ERIC data with other relevant datasets: EPR suppliers, and trust catchment populations.

Richard has obtained data on the primary EPR (electronic patient record) supplier of each NHS secondary trust as of January 2024, and also on trust catchment populations. This notebook does some exploratory analysis on those datasets and then joins them to the ERIC data.

Observations on the EPR suppliers data:


- There are 209 trusts, of which 135 are acute, 10 ambulance, 18 mental health, 13 community, and 33 combined mental health & community.
- These trusts between them use 24 different EPR suppliers; which suppliers are used differs largely by trust type.
- Among acute trusts, Oracle Cerner is the most widely used supplier (30 trusts), and it is used only by acute trusts. Surprisingly, 20 acute trusts had no EPR as of January 2024, although all 20 are recorded as being in the process of getting one.
- Mental health and community trusts all have EPRs, and are dominated by TPP and The Access Group.
- The market is fairly concentrated among these big suppliers, but there are still quite a few small suppliers across all types of trust.
- This data also tells us each trust's type, and includes only active trusts (it seems to be a more reliable source of active status than the NHS ODS data).
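To put a number on "fairly concentrated", one option is each supplier's share of trusts within a trust type, plus the combined share of the two largest suppliers. The sketch below runs on a few made-up rows that reuse the FOI column names; the rows are hypothetical, and on the real data you would use `df_foi` instead.

```python
import pandas as pd

# Hypothetical rows using the FOI column names; not the real data.
df = pd.DataFrame({
    "Trust Type (Primary)": ["Acute", "Acute", "Acute", "Acute",
                             "Mental Health", "Mental Health"],
    "Supplier (Primary EPR)": ["Oracle Cerner Corporation", "Oracle Cerner Corporation",
                               "Epic Systems Corporation", "TPP",
                               "TPP", "The Access Group"],
})

# Share of trusts using each supplier within each trust type (shares sum to 1 per type).
shares = (df.groupby("Trust Type (Primary)")["Supplier (Primary EPR)"]
            .value_counts(normalize=True)
            .rename("share")
            .reset_index())

# Combined share of the two largest suppliers in each trust type.
top2 = (shares.sort_values(["Trust Type (Primary)", "share"], ascending=[True, False])
              .groupby("Trust Type (Primary)")["share"]
              .apply(lambda s: s.head(2).sum()))
print(shares)
print(top2)
```

A top-2 share close to 1 means two suppliers cover almost every trust of that type; other concentration measures would work equally well.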

In [1]:
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

## Join with EPR data

First quickly import and explore the raw EPR data, to get a sense of what we're dealing with.

In [2]:
df_foi = pd.read_excel("./data/source/requests/secondary_trusts_primary_epr_suppliers_20240101.xlsx")
df_foi['Trust Type (Primary)'] = df_foi['Trust Type (Primary)'].str.strip() # Get rid of rogue whitespace.
df_foi.head()



Unnamed: 0,Organisation Code,Organisation,Trust Type (Primary),Supplier (Primary EPR)
0,RCF,AIREDALE NHS FOUNDATION TRUST,Acute,TPP
1,RBS,ALDER HEY CHILDREN'S NHS FOUNDATION TRUST,Acute,Meditech Group Limited
2,RTK,ASHFORD AND ST PETER'S HOSPITALS NHS FOUNDATION TRUST,Acute,Oracle Cerner Corporation
3,RVN,AVON AND WILTSHIRE MENTAL HEALTH PARTNERSHIP NHS TRUST,Mental Health and Community (Combined),The Access Group
4,RF4,"BARKING, HAVERING AND REDBRIDGE UNIVERSITY HOSPITALS NHS TRUST",Acute,System C Healthcare Ltd


In [3]:
print('There are', len(df_foi), 'trusts, of the following types:')
print(df_foi.groupby("Trust Type (Primary)").size().reset_index(name='count').sort_values(['count'], ascending=False).to_string(index=False))
print('using', df_foi['Supplier (Primary EPR)'].nunique(), 'different suppliers')

There are 209 trusts, of the following types:
                  Trust Type (Primary)  count
                                 Acute    135
Mental Health and Community (Combined)     33
                         Mental Health     18
                             Community     13
                             Ambulance     10
using 24 different suppliers



Different types of trust seem to use quite different suppliers, so break the counts down by trust type and supplier.

In [5]:
grouped_by_trust = (df_foi.groupby(["Trust Type (Primary)", "Supplier (Primary EPR)"])
                   .size()
                   .reset_index(name='count')
                   .sort_values(['Trust Type (Primary)', 'count'], ascending=[True, False]))
print(grouped_by_trust.to_string(index=False))

                  Trust Type (Primary)              Supplier (Primary EPR)  count
                                 Acute           Oracle Cerner Corporation     30
                                 Acute No EPR -  in process of getting one     20
                                 Acute             System C Healthcare Ltd     18
                                 Acute    Generic or In House EPR Provider     15
                                 Acute              Dedalus Healthcare Ltd     14
                                 Acute               Altera Digital Health     10
                                 Acute            Epic Systems Corporation      9
                                 Acute              Meditech Group Limited      5
                                 Acute            Nervecentre Software Ltd      3
                                 Acute                            Alcidion      2
                                 Acute                      IMS Maxims Ltd      2
                

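A cross-tabulation gives the same counts as a wide table, with one column per trust type, which can be easier to scan than the long grouped list above. This is a sketch on a few made-up rows; on the real data you would pass the columns of `df_foi`.

```python
import pandas as pd

# Hypothetical rows with the FOI column names; stands in for df_foi.
df = pd.DataFrame({
    "Trust Type (Primary)": ["Acute", "Acute", "Acute", "Community", "Community"],
    "Supplier (Primary EPR)": ["Oracle Cerner Corporation", "TPP",
                               "Oracle Cerner Corporation", "TPP", "TPP"],
})

# One row per supplier, one column per trust type; each cell counts trusts.
# margins=True adds an "All" row and column holding the totals.
table = pd.crosstab(df["Supplier (Primary EPR)"],
                    df["Trust Type (Primary)"],
                    margins=True)
print(table.sort_values("All", ascending=False))
```

Suppliers that a trust type never uses show up as explicit zeros, which makes the split between acute and mental health/community suppliers easy to see.
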
### Now join the EPR and ERIC data

Happily, both datasets use the same organisation codes, so we can join on those.

In [6]:
df_eric = pd.read_csv("./data/output/trust_mrc_clean.csv")
len(df_eric)

234

In [7]:
df_eric.head()

Unnamed: 0,trust_code,trust_name,mrc_off_201718,mrc_off_201819,mrc_off_201920,mrc_off_202021,mrc_off_202122,mrc_off_202223,mrc_off_202324,mrc_on_201718,mrc_on_201819,mrc_on_201920,mrc_on_202021,mrc_on_202122,mrc_on_202223,mrc_on_202324,mrc_service_provision_201718,mrc_service_provision_201819,mrc_service_provision_201920,mrc_service_provision_202021,mrc_service_provision_202122,mrc_service_provision_202223,mrc_service_provision_202324,mrc_type_201718,mrc_type_201819,mrc_type_201920,mrc_type_202021,mrc_type_202122,mrc_type_202223,mrc_type_202324,mrc_total_201718,mrc_total_201819,mrc_total_201920,mrc_total_202021,mrc_total_202122,mrc_total_202223,mrc_total_202324
0,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,203096.0,145631.0,277200.0,458075.0,438847.0,296103.0,524104.0,3612846.0,2871339.0,3285737.0,3285787.0,4875116.0,5629069.0,5557232.0,Internal,Internal,Internal,Internal,Internal,Internal,Internal,Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,3815942.0,3016970.0,3562937.0,3743862.0,5313963.0,5925172.0,6081336.0
1,R0B,"SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION TRUST, SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION TRUST",0.0,0.0,305675.0,474199.0,306977.0,309358.0,528858.0,0.0,0.0,1703707.0,1141689.0,687573.0,651088.0,1154972.0,,,Hybrid,Hybrid,Hybrid,Hybrid,Hybrid,,,3. Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,0.0,0.0,2009382.0,1615888.0,994550.0,960446.0,1683830.0
2,R0D,UNIVERSITY HOSPITALS DORSET NHS FOUNDATION TRUST,0.0,0.0,0.0,989106.0,477552.0,460039.0,227047.0,0.0,0.0,0.0,1052080.0,1269710.0,1243808.0,1318124.0,,,,Hybrid,Hybrid,Hybrid,Hybrid,,,,3. Mixed,3. Mixed,3. Mixed,3. Mixed,0.0,0.0,0.0,2041186.0,1747262.0,1703847.0,1545171.0
3,R1A,"WORCESTERSHIRE HEALTH AND CARE NHS TRUST, HEREFORDSHIRE AND WORCESTERSHIRE HEALTH AND CARE NHS TRUST",107682.0,118345.0,128996.0,140606.0,211261.0,221822.0,123636.0,155193.0,169347.0,184588.0,201201.0,147636.0,155015.0,187229.0,Hybrid,Hybrid,Hybrid,Internal,Internal,Internal,Hybrid,Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,262875.0,287692.0,313584.0,341807.0,358897.0,376837.0,310865.0
4,R1C,SOLENT NHS TRUST,71981.0,83204.0,66826.0,63516.0,62072.0,66994.0,62480.0,53002.0,39584.0,61596.0,27673.0,41186.0,15703.0,31174.0,Hybrid,Hybrid,Hybrid,Hybrid,Hybrid,Hybrid,Hybrid,Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,124983.0,122788.0,128422.0,91189.0,103258.0,82697.0,93654.0


In [8]:
df_joined = pd.merge(df_eric, df_foi, left_on="trust_code", right_on="Organisation Code", how="outer")
df_joined.columns

Index(['trust_code', 'trust_name', 'mrc_off_201718', 'mrc_off_201819',
       'mrc_off_201920', 'mrc_off_202021', 'mrc_off_202122', 'mrc_off_202223',
       'mrc_off_202324', 'mrc_on_201718', 'mrc_on_201819', 'mrc_on_201920',
       'mrc_on_202021', 'mrc_on_202122', 'mrc_on_202223', 'mrc_on_202324',
       'mrc_service_provision_201718', 'mrc_service_provision_201819',
       'mrc_service_provision_201920', 'mrc_service_provision_202021',
       'mrc_service_provision_202122', 'mrc_service_provision_202223',
       'mrc_service_provision_202324', 'mrc_type_201718', 'mrc_type_201819',
       'mrc_type_201920', 'mrc_type_202021', 'mrc_type_202122',
       'mrc_type_202223', 'mrc_type_202324', 'mrc_total_201718',
       'mrc_total_201819', 'mrc_total_201920', 'mrc_total_202021',
       'mrc_total_202122', 'mrc_total_202223', 'mrc_total_202324',
       'Organisation Code', 'Organisation', 'Trust Type (Primary)',
       'Supplier (Primary EPR)'],
      dtype='object')
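The join relies on each organisation code appearing at most once in each table. One way to guard that assumption is the `validate` argument of `pd.merge`, which raises an error rather than silently multiplying rows. A sketch on hypothetical stand-ins for `df_eric` and `df_foi`:

```python
import pandas as pd

# Hypothetical stand-ins for df_eric and df_foi.
eric = pd.DataFrame({"trust_code": ["R0A", "R0B", "RDD"],
                     "mrc_total_202324": [10.0, 20.0, 0.0]})
foi = pd.DataFrame({"Organisation Code": ["R0A", "R0B"],
                    "Supplier (Primary EPR)": ["Supplier X", "Supplier Y"]})

# validate="one_to_one" raises pandas.errors.MergeError if a key repeats on either side.
joined = pd.merge(eric, foi, left_on="trust_code", right_on="Organisation Code",
                  how="outer", validate="one_to_one")
print(len(joined))  # 3: R0A and R0B matched, RDD has no FOI row

# Duplicating one FOI row shows the check firing.
foi_dup = pd.concat([foi, foi.iloc[[0]]], ignore_index=True)
caught = False
try:
    pd.merge(eric, foi_dup, left_on="trust_code", right_on="Organisation Code",
             how="outer", validate="one_to_one")
except pd.errors.MergeError as err:
    caught = True
    print("Join rejected:", err)
```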

### Check for rows that didn't join cleanly

Everything in the FOI data is in the ERIC data. But there are 25 'active' trusts in the ERIC data that are not in the FOI data, all with zero spending in 23/24.

Debugging this, Wikipedia suggests that many of these are really ex-trusts ([example](https://en.wikipedia.org/wiki/Basildon_and_Thurrock_University_Hospitals_NHS_Foundation_Trust)). NB: they are marked "active" in the ODS dataset, so ODS status may not reflect reality.

We will exclude these from any analysis of the current data.

In [9]:
print(len(df_joined[df_joined.trust_name.isnull()]), "trust codes in the FOI data that are not in the ERIC data")
print(len(df_joined[df_joined.Organisation.isnull()]), "trust codes in the ERIC data that are not in the FOI data:")
df_joined[df_joined.Organisation.isnull()].sort_values(by="trust_name")[['trust_code', 'trust_name', 'mrc_total_202324']]

0 trust codes in the FOI data that are not in the ERIC data
25 trust codes in the ERIC data that are not in the FOI data:


Unnamed: 0,trust_code,trust_name,mrc_total_202324
45,RDD,BASILDON AND THURROCK UNIVERSITY HOSPITALS NHS FOUNDATION TRUST,0.0
35,RC1,BEDFORD HOSPITAL NHS TRUST,0.0
195,RXH,BRIGHTON AND SUSSEX UNIVERSITY HOSPITALS NHS TRUST,0.0
82,RJF,BURTON HOSPITALS NHS FOUNDATION TRUST,0.0
94,RLN,CITY HOSPITALS SUNDERLAND NHS FOUNDATION TRUST,0.0
224,RYK,DUDLEY AND WALSALL MENTAL HEALTH PARTNERSHIP NHS TRUST,0.0
9,R1J,GLOUCESTERSHIRE CARE SERVICES NHS TRUST,0.0
130,RR1,HEART OF ENGLAND NHS FOUNDATION TRUST,0.0
65,RGQ,IPSWICH HOSPITAL NHS TRUST,0.0
208,RY1,LIVERPOOL COMMUNITY HEALTH NHS TRUST,0.0
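The null checks above rely on `trust_name` and `Organisation` being populated for every row in their own source. A more direct option is `indicator=True`, which records where each merged row came from and also makes the zero-spend claim easy to verify. This is a sketch on made-up rows, not the real extracts.

```python
import pandas as pd

# Hypothetical stand-ins: RDD is in the ERIC extract but not the FOI list.
eric = pd.DataFrame({"trust_code": ["R0A", "R0B", "RDD"],
                     "mrc_total_202324": [100.0, 50.0, 0.0]})
foi = pd.DataFrame({"Organisation Code": ["R0A", "R0B"],
                    "Organisation": ["TRUST A", "TRUST B"]})

# indicator=True adds a categorical "_merge" column: "both", "left_only" or "right_only".
joined = pd.merge(eric, foi, left_on="trust_code", right_on="Organisation Code",
                  how="outer", indicator=True)
print(joined["_merge"].value_counts())

# ERIC trusts with no FOI match, and a check that none had 2023/24 spend.
eric_only = joined[joined["_merge"] == "left_only"]
print(eric_only[["trust_code", "mrc_total_202324"]])
print("All zero spend:", (eric_only["mrc_total_202324"] == 0).all())

# Rows present in both sources, for analysis of the current data.
current = joined[joined["_merge"] == "both"].drop(columns="_merge")
```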


Add a utility column to flag whether the trust is in the FOI data, to make life easier.

In [10]:
df_joined['in_foi_data'] = df_joined.Organisation.notna()
df_joined.head(2)

Unnamed: 0,trust_code,trust_name,mrc_off_201718,mrc_off_201819,mrc_off_201920,mrc_off_202021,mrc_off_202122,mrc_off_202223,mrc_off_202324,mrc_on_201718,mrc_on_201819,mrc_on_201920,mrc_on_202021,mrc_on_202122,mrc_on_202223,mrc_on_202324,mrc_service_provision_201718,mrc_service_provision_201819,mrc_service_provision_201920,mrc_service_provision_202021,mrc_service_provision_202122,mrc_service_provision_202223,mrc_service_provision_202324,mrc_type_201718,mrc_type_201819,mrc_type_201920,mrc_type_202021,mrc_type_202122,mrc_type_202223,mrc_type_202324,mrc_total_201718,mrc_total_201819,mrc_total_201920,mrc_total_202021,mrc_total_202122,mrc_total_202223,mrc_total_202324,Organisation Code,Organisation,Trust Type (Primary),Supplier (Primary EPR),in_foi_data
0,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,203096.0,145631.0,277200.0,458075.0,438847.0,296103.0,524104.0,3612846.0,2871339.0,3285737.0,3285787.0,4875116.0,5629069.0,5557232.0,Internal,Internal,Internal,Internal,Internal,Internal,Internal,Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,3815942.0,3016970.0,3562937.0,3743862.0,5313963.0,5925172.0,6081336.0,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,Acute,Epic Systems Corporation,True
1,R0B,"SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION TRUST, SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION TRUST",0.0,0.0,305675.0,474199.0,306977.0,309358.0,528858.0,0.0,0.0,1703707.0,1141689.0,687573.0,651088.0,1154972.0,,,Hybrid,Hybrid,Hybrid,Hybrid,Hybrid,,,3. Mixed,3. Mixed,3. Mixed,3. Mixed,3. Mixed,0.0,0.0,2009382.0,1615888.0,994550.0,960446.0,1683830.0,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION TRUST,Acute,Meditech Group Limited,True


Tidy up column names.

In [11]:
df_joined.rename(columns={
    'Trust Type (Primary)': 'trust_type',
    'Supplier (Primary EPR)': 'trust_epr',
    'Organisation': 'current_name' # Use the name from the FOI data as the current main name.
}, inplace=True)
df_joined.drop('Organisation Code', axis=1, inplace=True) # Drop duplicate column.

## Add trust population data

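We have [data on trust catchment populations](https://www.eastsussexjsna.org.uk/resources/nhs-acute-hospital-trust-catchment-populations/): these are experimental statistics, based on modelled estimates from the Office for Health Improvement and Disparities (OHID).

Add these to our data to help us normalise spending by population. Note that the `All admissions` and `Emergency admissions` columns appear to hold catchment populations estimated from each type of admission, not admission counts: `All admissions` sums to about 56.6 million, roughly the population of England.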

In [13]:
df_pop = pd.read_excel("./data/source/ohid/2022_trust_catchment_populations.xlsx")
df_pop.head()
df_pop.rename(columns={
    'All admissions': 'all_admissions_22',
    'Emergency admissions': 'emergency_admissions_22',
}, inplace=True)
df_pop.all_admissions_22.sum()

56550138

In [14]:
df_all = pd.merge(df_joined, df_pop, left_on="trust_code", right_on="TrustCode", how="outer")
df_all.head()
len(df_all)

234

Note that some acute trusts don't have catchment population data: these look like specialist trusts (eye, cancer, orthopaedic, children's, women's, etc.).

In [15]:
df_all[(df_all.all_admissions_22.isnull()) & (df_all.trust_type == 'Acute')][['trust_name', 'trust_type', 'in_foi_data']]

Unnamed: 0,trust_name,trust_type,in_foi_data
20,ROYAL NATIONAL ORTHOPAEDIC HOSPITAL NHS TRUST,Acute,True
30,"LIVERPOOL HEART AND CHEST NHS FOUNDATION TRUST, LIVERPOOL HEART AND CHEST HOSPITAL NHS FOUNDATION TRUST",Acute,True
31,ALDER HEY CHILDRENS NHS FOUNDATION TRUST,Acute,True
33,THE CHRISTIE NHS FOUNDATION TRUST,Acute,True
40,SHEFFIELD CHILDRENS NHS FOUNDATION TRUST,Acute,True
54,CLATTERBRIDGE CANCER CENTRE NHS FOUNDATION TRUST,Acute,True
55,LIVERPOOL WOMEN'S NHS FOUNDATION TRUST,Acute,True
56,WALTON CENTRE NHS FOUNDATION TRUST,Acute,True
62,ROYAL PAPWORTH HOSPITAL NHS FOUNDATION TRUST,Acute,True
92,ROBERT JONES AND AGNES HUNT ORTHOPAEDIC HOSPITAL NHS FOUNDATION TRUST,Acute,True
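The query above only looks at acute trusts. To see the gaps across all trust types at once, one option is to count the trusts with a missing catchment per type. A sketch on hypothetical rows; note that `value_counts` drops rows whose `trust_type` is itself missing (the trusts not in the FOI data) unless you pass `dropna=False`.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: trust types plus a catchment column with gaps.
df_all = pd.DataFrame({
    "trust_name": ["TRUST A", "TRUST B", "TRUST C", "TRUST D"],
    "trust_type": ["Acute", "Acute", "Mental Health", "Acute"],
    "all_admissions_22": [500000.0, np.nan, np.nan, 250000.0],
})

# Number of trusts of each type that lack a catchment population.
missing_by_type = (df_all.loc[df_all["all_admissions_22"].isna(), "trust_type"]
                   .value_counts())
print(missing_by_type)
```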


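Now we can calculate 2023/24 spend per head of catchment population (stored as `mrc_total_202324_perpatient`), which lets us compare trusts of different sizes more meaningfully. Trusts without catchment data get a missing value.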

In [16]:
df_all['mrc_total_202324_perpatient'] = df_all.mrc_total_202324.astype(float) / df_all.all_admissions_22.astype(float)
df_all['mrc_total_202324_perpatient'] = df_all['mrc_total_202324_perpatient'].round(2)
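pandas division returns NaN where the catchment population is missing, but would return `inf` if a population were ever zero. A defensive variant, assuming you want a zero population treated as missing, is sketched below on made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical values: one trust lacks a catchment population, one has zero.
df = pd.DataFrame({
    "mrc_total_202324": [1000000.0, 500000.0, 200000.0],
    "all_admissions_22": [400000.0, np.nan, 0.0],
})

# Treat a zero population as missing so the division gives NaN rather than inf.
population = df["all_admissions_22"].replace(0, np.nan)
df["mrc_total_202324_perpatient"] = (df["mrc_total_202324"] / population).round(2)
print(df)
```

Keeping these rows as NaN, rather than dropping them, means the exported CSV still lists every trust.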

## Export all to CSV

In [17]:
# Remove unused columns.
df_all.drop(columns=['TrustCode', 'TrustName'], inplace=True)
# Rename other columns to be more readable. 
df_all.rename(columns={
    "current_name": "current_name_in_foi", 
    "trust_type": "trust_type_from_foi"
}, inplace=True)
df_all.sort_values(by="trust_name").to_csv('./data/output/trust_mrc_and_epr.csv', index=False)