# Data processing for the Organisational Audit Portfolio spreadsheet 2: rename and reduce data

This notebook removes redundant parts of the data and then renames what's left for ease of use.

In [68]:
import os
import pandas as pd

### Import data

In [69]:
dir_files = '../data/organisational_audit/processed'
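file_csv = 'processed_2019_portfolio_key_indicators_summary.csv'

In [70]:
# Six header rows (SCN, trust, site and three hospital-name rows)
# and four index columns describing each key indicator:
df = pd.read_csv(
    os.path.join(dir_files, file_csv),
    header=[0, 1, 2, 3, 4, 5],
    index_col=[0, 1, 2, 3]
)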

In [71]:
df.columns.names

FrozenList(['SCN', 'Trust name', 'Site name', 'Hospital names', 'hospital_name_2', 'hospital_name_3'])

In [72]:
df.index.names

FrozenList(['Key indicator', 'Response required to meet indicator', 'key_indicator_group', 'question'])

### Header: Check for redundant site names

In [73]:
print(f'Total hospitals: {len(df.columns)}')

for header in df.columns.names:
    n_unique = len(df.columns.get_level_values(header).unique())
    print(f'{header}: {n_unique}')

Total hospitals: 172
SCN: 16
Trust name: 138
Site name: 172
Hospital names: 172
hospital_name_2: 172
hospital_name_3: 172


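All hospitals have a unique site name, so we'll move the 'Site name' data into another dataframe. Note that the 172 columns include the 'National Results' column as well as the hospitals. The counts of 172 for 'hospital_name_2' and 'hospital_name_3' are misleading: pandas fills every blank header cell with a unique placeholder such as 'Unnamed: 5_level_5', so blank cells are also counted as distinct values.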
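To see how many hospitals really have a second or third name, the placeholders can be excluded before counting. The sketch below is an illustration only: `count_real_labels` is a hypothetical helper, and the two-hospital header is made up rather than taken from the audit file.

```python
import pandas as pd


def count_real_labels(columns: pd.MultiIndex) -> dict:
    """Count unique labels per header level, ignoring 'Unnamed: ...' placeholders."""
    counts = {}
    for name in columns.names:
        values = pd.Series(columns.get_level_values(name))
        # pandas labels blank header cells 'Unnamed: <col>_level_<row>', so skip them.
        real_values = values[~values.str.startswith('Unnamed')]
        counts[name] = real_values.nunique()
    return counts


# Made-up header: two hospitals, only one of which has a second name.
columns = pd.MultiIndex.from_arrays(
    [
        ['Hospital A HASU', 'Hospital B'],
        ['Hospital A SU', 'Unnamed: 2_level_1'],
    ],
    names=['Hospital names', 'hospital_name_2'],
)
print(count_real_labels(columns))
# → {'Hospital names': 2, 'hospital_name_2': 1}
```

Called on `df.columns`, this should give the number of hospitals that genuinely have an entry in each name row.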

### Header: reduce hospital names

Re-jig "hospital names" column header. In the Excel sheet it is a merged cell and applies to three rows of data. The hospital can have multiple names in English and in Welsh so all should be kept.

Keep only the top row in this DataFrame, and store the other rows' data in a new DataFrame for reference.

Also move the 'SCN', 'Trust name' and 'Site name' headers into this new DataFrame.

In [74]:
headers_to_move = ['hospital_name_2', 'hospital_name_3', 'SCN', 'Trust name', 'Site name']

In [75]:
# Combine those rows into one dataframe:
df_hospital_names = pd.DataFrame(
    [df.columns.get_level_values(h) for h in ['Hospital names'] + headers_to_move],
    index=['hospital_name_1', 'hospital_name_2', 'hospital_name_3', 'scn', 'trust', 'site_name']
).T

In [76]:
df_hospital_names.head(3)

| | hospital_name_1 | hospital_name_2 | hospital_name_3 | scn | trust | site_name |
|---|---|---|---|---|---|---|
| 0 | Unnamed: 4_level_3 | Unnamed: 4_level_4 | Unnamed: 4_level_5 | National Results | Unnamed: 4_level_1 | Unnamed: 4_level_2 |
| 1 | Queens Hospital Romford HASU | Queens Hospital Romford SU | Unnamed: 5_level_5 | London | Barking, Havering and Redbridge University Hos... | Barking, Havering and Redbridge University Hos... |
| 2 | Newham General Hospital | Unnamed: 6_level_4 | Unnamed: 6_level_5 | London | Barts Health NHS Trust | Barts Health NHS Trust (Newham University Hosp... |
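The same one-row-per-hospital table can also be built with `MultiIndex.to_frame`, which turns each header level into a column. This is a sketch of an alternative, not what the notebook does; the three-level header below is made up to stand in for `df.columns`.

```python
import pandas as pd

# Made-up two-hospital header standing in for df.columns.
columns = pd.MultiIndex.from_arrays(
    [
        ['London', 'London'],
        ['Hospital A HASU', 'Hospital B'],
        ['Hospital A site', 'Hospital B site'],
    ],
    names=['SCN', 'Hospital names', 'Site name'],
)

# One row per column of the original DataFrame, one column per header level.
df_names = columns.to_frame(index=False)
df_names = df_names.rename(columns={
    'SCN': 'scn',
    'Hospital names': 'hospital_name_1',
    'Site name': 'site_name',
})
# Put the main hospital name first, as in df_hospital_names.
df_names = df_names[['hospital_name_1', 'scn', 'site_name']]
print(df_names)
```

This avoids building a list of levels by hand and transposing, since the level names come along automatically.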


In [77]:
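# Remove the 'National Results' row. Select it by its SCN label rather than
# assuming it is the first row, and take a copy so that later edits
# don't raise a SettingWithCopyWarning:
df_hospital_names = df_hospital_names[df_hospital_names['scn'] != 'National Results'].copy()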

In [78]:
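# Blank out the placeholder labels that pandas gives to empty header cells
# ('Unnamed: <column>_level_<row>'):
for col in df_hospital_names.columns:
    mask = df_hospital_names[col].str.startswith('Unnamed', na=False)
    df_hospital_names.loc[mask, col] = ''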
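The loop above can also be written as a single vectorised call with `DataFrame.replace` and a regular expression. A minimal sketch on a made-up two-row table:

```python
import pandas as pd

# Made-up names table containing one placeholder label.
df_names = pd.DataFrame({
    'hospital_name_1': ['Hospital A HASU', 'Hospital B'],
    'hospital_name_2': ['Hospital A SU', 'Unnamed: 6_level_4'],
})

# Replace any cell that starts with 'Unnamed' with an empty string in one call.
df_names = df_names.replace(r'^Unnamed.*', '', regex=True)
print(df_names)
```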

Save hospital names df to file:

In [79]:
df_hospital_names.to_csv(os.path.join(dir_files, 'hospital_names_trusts.csv'), index=False)
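One thing to watch when this file is read back later: by default `pd.read_csv` turns empty fields into `NaN`, so the blanked-out names would no longer be empty strings. The sketch below shows the difference on a made-up table, using an in-memory buffer instead of the real file; passing `keep_default_na=False` keeps the blanks as `''`.

```python
import io

import pandas as pd

# Made-up names table with one blank cell, like df_hospital_names.
df_names = pd.DataFrame({
    'hospital_name_1': ['Hospital A HASU', 'Hospital B'],
    'hospital_name_2': ['Hospital A SU', ''],
})

# Write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
df_names.to_csv(buffer, index=False)

# Default reading turns the empty field back into NaN.
buffer.seek(0)
df_default = pd.read_csv(buffer)

# keep_default_na=False keeps it as an empty string.
buffer.seek(0)
df_kept = pd.read_csv(buffer, keep_default_na=False)
```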

Example of how to access a hospital if you know one of its names but not which column of the DataFrame that name is in:

In [80]:
def find_main_hospital_name(
        df_hospital_names: pd.DataFrame,
        name_to_look_up: str,
        column_main_name: str = 'hospital_name_1'
        ) -> str:
    """Return the main name of the hospital that has name_to_look_up in any column."""
    # df of True/False, True wherever a cell matches name_to_look_up:
    df_bool = df_hospital_names.eq(name_to_look_up)
    # Series of True/False, one for each row in the dataframe;
    # a row is True when any value in that row of df_bool is True:
    series_bool = df_bool.any(axis='columns')
    if not series_bool.any():
        raise ValueError(f'No hospital found with the name {name_to_look_up!r}')
    # Use that series as a mask to pick out only the matching row,
    # then pick out the value in that row and the main-name column:
    main_hospital_name = df_hospital_names.loc[series_bool, column_main_name].values[0]
    return main_hospital_name

In [81]:
name_to_look_up = 'Northwick Park Hospital SU'

find_main_hospital_name(df_hospital_names, name_to_look_up)

'Northwick Park Hospital HASU'
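When many names need translating at once, it may be simpler to build a dictionary from every known name to the main name. This is a sketch under assumptions: `build_alias_lookup` is a hypothetical helper, the table is made up, and only the `hospital_name_*` columns are searched so that values such as an SCN ('London') are not treated as hospital names. If two hospitals ever shared an alias, the later row would win.

```python
import pandas as pd


def build_alias_lookup(df_names: pd.DataFrame,
                       column_main_name: str = 'hospital_name_1') -> dict:
    """Map every hospital name (main name or alias) to the main name."""
    name_columns = [c for c in df_names.columns if c.startswith('hospital_name')]
    # Long format: one row per (main name, name column, name) triple.
    df_long = df_names[name_columns].melt(id_vars=column_main_name, value_name='alias')
    # Blank cells are not names, so drop them.
    df_long = df_long[df_long['alias'] != '']
    lookup = dict(zip(df_long['alias'], df_long[column_main_name]))
    # The main name should also map to itself.
    lookup.update({name: name for name in df_names[column_main_name]})
    return lookup


# Made-up table in the same layout as df_hospital_names.
df_names = pd.DataFrame({
    'hospital_name_1': ['Hospital A HASU', 'Hospital B'],
    'hospital_name_2': ['Hospital A SU', ''],
    'hospital_name_3': ['', ''],
    'scn': ['London', 'London'],
})
lookup = build_alias_lookup(df_names)
print(lookup['Hospital A SU'])  # → Hospital A HASU
```

A column of names could then be converted with `series.map(lookup)`.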

Drop the moved header levels (the extra hospital names, SCN, trust name and site name) from the organisational audit dataframe:

In [82]:
df.columns = df.columns.droplevel(headers_to_move)

Rename the national results column. It has no hospital name, so its 'Hospital names' label is a pandas 'Unnamed: ...' placeholder:

In [87]:
col = df.columns[df.columns.str.startswith('Unnamed')].values[0]

df = df.rename(columns={col: 'National Results'})

In [88]:
df.head(3)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Hospital names,National results,Queens Hospital Romford HASU,Newham General Hospital,Royal London Hospital HASU,Whipps Cross University Hospital,Charing Cross Hospital HASU,King's College Hospital HASU,Princess Royal University Hospital HASU,Northwick Park Hospital HASU,St George's Hospital HASU,...,Causeway Hospital,Antrim Area Hospital,Ulster Hospital,Craigavon Area Hospital,Daisy Hill Hospital,Altnagelvin Hospital,South West Acute Hospital,Noble's Hospital,Walton Centre Stroke Team,Queen's Medical Centre - Nottingham
Key indicator,Response required to meet indicator,key_indicator_group,question,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1
"Total Key Indicators Achieved \n(Post 72hrs sites receive points from KIs 3,4, and 6 from their main acute site)",,,,1: 5% (8/169)_x000D_\n2: 7% (12/169)_x000D_\n3...,6,6,5,3,5,9,7,9,6,...,2,2,5,2,2,3,3,3,,
1,Key Indicator 1: Minimum establishment of band 6 and band 7 nurses per 10 beds (Criterion: Sum of band 6 and 7 (WTE) nurses per 10 stroke unit beds is equal to/above 2.375 per 10 beds for ALL stroke beds.),Staffing/Workforce,,58% (98/169),Yes,Yes,Yes,Yes,Yes,Yes,Yes,Yes,Yes,...,No,Yes,Yes,Yes,No,No,No,Yes,,
1,Band 6 nurses WTE per 10 beds,Staffing/Workforce,,1.9 (1.4-2.9)_x000D_\nMedian (IQR),2.73,1.54,2.85,2.63,3.67,3.21,3.25,4.2,3.33,...,1.25,2.33,2.08,2.63,1.5,1.2,1.56,5,,


## Set up new index names

Make a new table that links each new short name to the full name it replaces, along with any other notes.

Copy the existing index, make a new column for the short name, and save to file.

In the final dataframe, only keep the short names.
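These steps are not yet implemented in this notebook. A minimal sketch of one way to do them, assuming one short name per row: the two-row DataFrame below is made up (its values are copied from the Queens Hospital Romford HASU results), and the short names are hypothetical placeholders.

```python
import pandas as pd

# Made-up two-row version of the organisational audit dataframe.
index = pd.MultiIndex.from_tuples(
    [
        (1, 'Band 6 nurses WTE per 10 beds', 'Staffing/Workforce', ''),
        (1, 'Band 7 nurses WTE per 10 beds', 'Staffing/Workforce', ''),
    ],
    names=['Key indicator', 'Response required to meet indicator',
           'key_indicator_group', 'question'],
)
df = pd.DataFrame({'Queens Hospital Romford HASU': [2.73, 1.14]}, index=index)

# Hypothetical short names, one per row, in the same order as the index.
short_names = ['ki1_band6_wte_per_10_beds', 'ki1_band7_wte_per_10_beds']

# Copy the existing index into a plain table and add the short names.
df_index_lookup = df.index.to_frame(index=False)
df_index_lookup['short_name'] = short_names
# df_index_lookup.to_csv(...) would then save the lookup table to file.

# In the final dataframe, keep only the short names.
df.index = pd.Index(short_names, name='short_name')
print(df)
```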


### Split off 'National Results'

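This column holds summary statistics across all sites (percentages with counts, and medians with IQRs) rather than one hospital's responses, so it is formatted too differently to keep in with the rest.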

In [37]:
df_national_results = df['National Results']

In [38]:
df_national_results.head(3)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,National Results
Key indicator,Response required to meet indicator,key_indicator_group,question,Unnamed: 4_level_1
"Total Key Indicators Achieved \n(Post 72hrs sites receive points from KIs 3,4, and 6 from their main acute site)",,,,1: 5% (8/169)_x000D_\n2: 7% (12/169)_x000D_\n3...
1,Key Indicator 1: Minimum establishment of band 6 and band 7 nurses per 10 beds (Criterion: Sum of band 6 and 7 (WTE) nurses per 10 stroke unit beds is equal to/above 2.375 per 10 beds for ALL stroke beds.),Staffing/Workforce,,58% (98/169)
1,Band 6 nurses WTE per 10 beds,Staffing/Workforce,,1.9 (1.4-2.9)_x000D_\nMedian (IQR)
1,Band 7 nurses WTE per 10 beds,Staffing/Workforce,,0.5 (0.4-0.8)_x000D_\nMedian (IQR)
2,Key Indicator 2: Presence of a clinical psychologist (qualified) (Criterion: Presence of at least one (WTE) qualified clinical psychologist per 30 stroke unit beds),Staffing/Workforce,,7% (12/169)
2,Clinical psychologist WTE per 30 beds (qualified),Staffing/Workforce,,0.1 (0-0.3)_x000D_\nMedian (IQR)
3,Key Indicator 3: Out of hours presence of stroke specialist nurse (Criterion: Met if there is at least one stroke specialist nurse per 10 beds on 10pm weekdays and 10am and 10pm weekends),7-day working,,71% (101/142)
3,Out of hours,7-day working,1.7. Do you have stroke specialist nurses (band 6 or above) who undertake hyper-acute assessments of suspected stroke patients in A&E?,71% (101/142)
3,Registered nurses Type 1 Beds (weekdays 10 pm),7-day working,1.7. Do you have stroke specialist nurses (band 6 or above) who undertake hyper-acute assessments of suspected stroke patients in A&E?,215
3,Registered nurses Type 1 Beds (Saturdays),7-day working,1.7. Do you have stroke specialist nurses (band 6 or above) who undertake hyper-acute assessments of suspected stroke patients in A&E?,243
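Several of these entries contain `_x000D_`, which is how Excel escapes a carriage return in cell text. If the national results are needed later, a sketch like the one below could tidy them: it strips `_x000D_` and splits the "percent (numerator/denominator)" entries into separate columns, leaving other formats as `NaN`. The three example strings are copied from the output above; the regular expression is an assumption about the format and may not cover every row.

```python
import pandas as pd

national = pd.Series([
    '58% (98/169)',
    '1.9 (1.4-2.9)_x000D_\nMedian (IQR)',
    '215',
])

# Excel escapes carriage returns as the literal text '_x000D_'; drop it.
national_clean = national.str.replace('_x000D_', '', regex=False)

# Pull "percent (numerator/denominator)" entries apart; other formats give NaN.
parts = national_clean.str.extract(r'^(\d+)% \((\d+)/(\d+)\)$')
parts.columns = ['percent', 'numerator', 'denominator']
print(parts)
```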


Drop the 'National Results' column from the main organisational audit dataframe:

In [39]:
df = df.drop('National Results', axis='columns')

In [42]:
df.head(3)

Unnamed: 0_level_0,Unnamed: 1_level_0,Key indicator,"Total Key Indicators Achieved \n(Post 72hrs sites receive points from KIs 3,4, and 6 from their main acute site)",1,1,1,2,2,3,3,3,3,...,8,8,9,9,9,9,10,10,10,10
Unnamed: 0_level_1,Unnamed: 1_level_1,Response required to meet indicator,NaN,Key Indicator 1: Minimum establishment of band 6 and band 7 nurses per 10 beds (Criterion: Sum of band 6 and 7 (WTE) nurses per 10 stroke unit beds is equal to/above 2.375 per 10 beds for ALL stroke beds.),Band 6 nurses WTE per 10 beds,Band 7 nurses WTE per 10 beds,Key Indicator 2: Presence of a clinical psychologist (qualified) (Criterion: Presence of at least one (WTE) qualified clinical psychologist per 30 stroke unit beds),Clinical psychologist WTE per 30 beds (qualified),Key Indicator 3: Out of hours presence of stroke specialist nurse (Criterion: Met if there is at least one stroke specialist nurse per 10 beds on 10pm weekdays and 10am and 10pm weekends),Out of hours,Registered nurses Type 1 Beds (weekdays 10 pm),Registered nurses Type 1 Beds (Saturdays),...,Key Indicator 8: Formal survey undertaken seeking patient/carer views on stroke services (Criterion: Met if at least one a year),9.7. How often is there a formal survey seeking patient/carer views on stroke services? (This does not include the Friends and Family Test),"Key Indicator 9: First line of brain imaging for TIA patients is MRI\n(Criterion: Met if MRI is first line brain imaging for suspected TIA AND investigations are completed within in 2 days (Next weekday, the next day, the same day (5 days a week) or the same day (7 days a week))",(a) First line brain imaging,Outpatient,Outpatient timescale,"Key Indicator 10: Management level that takes responsibility for audit results \n(Criterion: Met if Executive on the Board, Non-executive on the Board, or Chairman of Clinical Governance takes responsibility for the follow-up of stroke audit results)",Executive on the Board,Non-executive on the Board,Chairman of Clinical Governance (or equivalent)
Unnamed: 0_level_2,Unnamed: 1_level_2,key_indicator_group,NaN,Staffing/Workforce,Staffing/Workforce,Staffing/Workforce,Staffing/Workforce,Staffing/Workforce,7-day working,7-day working,7-day working,7-day working,...,Patient and carer engagement,Patient and carer engagement,TIA service,TIA service,TIA service,TIA service,Quality improvement and leadership,Quality improvement and leadership,Quality improvement and leadership,Quality improvement and leadership
Unnamed: 0_level_3,Unnamed: 1_level_3,question,NaN,NaN,NaN,NaN,NaN,NaN,NaN,1.7. Do you have stroke specialist nurses (band 6 or above) who undertake hyper-acute assessments of suspected stroke patients in A&E?,1.7. Do you have stroke specialist nurses (band 6 or above) who undertake hyper-acute assessments of suspected stroke patients in A&E?,1.7. Do you have stroke specialist nurses (band 6 or above) who undertake hyper-acute assessments of suspected stroke patients in A&E?,...,NaN,NaN,NaN,4.10. Which imaging modality do you most frequently use in your neurovascular clinic for suspected TIAs?,"7.11. Within what timescale can you see, investigate and initiate treatment for ALL your TIA patients?","7.11. Within what timescale can you see, investigate and initiate treatment for ALL your TIA patients?",NaN,9.1. What level of management takes responsibility for the follow-up of the results and recommendations of the Sentinel Stroke Audit?,9.1. What level of management takes responsibility for the follow-up of the results and recommendations of the Sentinel Stroke Audit?,9.1. What level of management takes responsibility for the follow-up of the results and recommendations of the Sentinel Stroke Audit?
SCN,Trust name,Hospital names,Unnamed: 3_level_4,Unnamed: 4_level_4,Unnamed: 5_level_4,Unnamed: 6_level_4,Unnamed: 7_level_4,Unnamed: 8_level_4,Unnamed: 9_level_4,Unnamed: 10_level_4,Unnamed: 11_level_4,Unnamed: 12_level_4,Unnamed: 13_level_4,Unnamed: 14_level_4,Unnamed: 15_level_4,Unnamed: 16_level_4,Unnamed: 17_level_4,Unnamed: 18_level_4,Unnamed: 19_level_4,Unnamed: 20_level_4,Unnamed: 21_level_4,Unnamed: 22_level_4,Unnamed: 23_level_4
London,"Barking, Havering and Redbridge University Hospitals NHS Trust",Queens Hospital Romford HASU,6,Yes,2.73,1.14,No,0.48,Yes,Yes,5,5,...,Yes,Continuous (every patient),No,Computed Tomography,Yes,The same day (7 days a week),Yes,Yes,Yes,Yes
London,Barts Health NHS Trust,Newham General Hospital,6,Yes,1.54,1.54,No,0.0,At site treating your patients during the firs...,At site treating your patients during the firs...,Not Applicable,Not Applicable,...,Yes,Continuous (every patient),Yes,Magnetic Resonance Imaging,Yes,The same day (5 days a week),No,No,No,No
London,Barts Health NHS Trust,Royal London Hospital HASU,5,Yes,2.85,0.38,No,0.23,Yes,Yes,4,4,...,No,Less than once a year,Yes,Magnetic Resonance Imaging,Yes,The same day (5 days a week),No,No,No,No
London,Barts Health NHS Trust,Whipps Cross University Hospital,3,Yes,2.63,0.53,No,0.0,No,No,Not Applicable,Not Applicable,...,No,Never,No,Computed Tomography,Yes,The same day (5 days a week),No,No,No,No
London,Imperial College Healthcare NHS Trust,Charing Cross Hospital HASU,5,Yes,3.67,0.44,No,0.27,Yes,Yes,6,7,...,Yes,3-4 times a year,No,Computed Tomography,Yes,The same day (5 days a week),No,No,No,No


## Save results to file
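This section has no code yet. A minimal sketch of one way to save both outputs, assuming CSV files like the input: `save_results` and both file names are hypothetical, and the small frames in the usage part are made up. The index is written out because it holds the key indicator labels.

```python
import os
import tempfile

import pandas as pd


def save_results(df: pd.DataFrame, df_national_results, dir_out: str):
    """Write the reduced hospital data and the national results to CSV."""
    # Hypothetical file names; change to suit the project's conventions.
    path_hospitals = os.path.join(dir_out, 'reduced_2019_portfolio_key_indicators.csv')
    path_national = os.path.join(dir_out, 'reduced_2019_national_results.csv')
    # Keep the index: it holds the key indicator labels.
    df.to_csv(path_hospitals)
    df_national_results.to_csv(path_national)
    return path_hospitals, path_national


# Usage on made-up data, written to a temporary directory.
with tempfile.TemporaryDirectory() as dir_out:
    df = pd.DataFrame(
        {'Hospital A': ['Yes', 'No']},
        index=pd.Index(['ki_1', 'ki_2'], name='short_name'),
    )
    national = pd.Series(['58% (98/169)', '7% (12/169)'],
                         index=df.index, name='National Results')
    paths = save_results(df, national, dir_out)
    files_written = [os.path.exists(p) for p in paths]
    df_back = pd.read_csv(paths[0], index_col='short_name')
    print(df_back)
```

In the notebook itself, `dir_files` would be the natural choice for `dir_out`.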