Hypothesis:
 
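We believe there is likely to be an association between deprivation levels and rates of injury. We will test this by comparing one month (April 2023) of NHS Maternity Services Data Set trauma reporting with each trust's deprivation deciles at booking, and checking whether the two are correlated. As decile 1 is the most deprived, higher trauma rates in more deprived areas would appear as a negative correlation with a trust's average decile.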

Accounting for misreporting:

From previous analysis we know that trauma data is misreported, with some trusts reporting no trauma at all in their maternity wards. We will calculate trauma-rate deciles and remove trusts in the lowest decile (those with the highest share of "no traumatic lesion reported") to remove the most obvious outliers. Unfortunately, for a number of reasons it is not possible to identify and remove only the trusts that are misreporting, so this will remove the most likely offenders (and some non-offenders).

Due to the official nature of the reporting it is highly unlikely that trusts will over-report trauma, so the highest trauma-rate decile does not need to be removed.

Update: we will only remove trusts with trauma rates under 10%, as there are only three clear cases of misreporting. We will still calculate trauma deciles and show them in an interactive Bokeh chart, for fun and for visualisation.

These trusts are:
IMPERIAL COLLEGE HEALTHCARE NHS TRUST,
ROYAL FREE LONDON NHS FOUNDATION TRUST,
THE SHREWSBURY AND TELFORD HOSPITAL NHS TRUST
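A minimal sketch of the exclusion step, assuming a table shaped like the notebook's `final_df` (one row per trust with `Org_Name`, `Deprivation` and `Trauma Percentage`). The trust names and values below are made up, and `MISREPORTING_THRESHOLD` is a hypothetical name for the 10% cut-off described in the update.

```python
import pandas as pd

# Hypothetical toy data standing in for final_df; the values are made up
final_df = pd.DataFrame({
    'Org_Name': ['TRUST A', 'TRUST B', 'TRUST C', 'TRUST D'],
    'Deprivation': [5.3, 4.1, 6.2, 3.7],
    'Trauma Percentage': [55.0, 4.5, 68.0, 2.0],
})

# Cut-off from the update: trauma rates under 10% are treated as misreporting
MISREPORTING_THRESHOLD = 10.0

# Record which trusts are dropped, so the exclusion can be reported
excluded = final_df.loc[final_df['Trauma Percentage'] < MISREPORTING_THRESHOLD, 'Org_Name']
print('Excluded:', excluded.tolist())

# Keep trusts at or above the cut-off. Comparisons with NaN are False,
# so trusts with no trauma data at all are dropped here as well.
cleaned_df = final_df[final_df['Trauma Percentage'] >= MISREPORTING_THRESHOLD].reset_index(drop=True)
print(cleaned_df)
```

Since only three trusts fall below the threshold, excluding them by name with `~final_df['Org_Name'].isin([...])` should give the same result on this month's data; the threshold version is easier to re-run on another month.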


Results:

We found a Pearson correlation coefficient of 0.12 between average deprivation decile and trauma rate, with an associated p-value of 0.24. As the p-value is greater than 0.05, we fail to reject the null hypothesis: there is no statistically significant evidence of a correlation between deprivation and trauma rates in this month of data.
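A minimal sketch of how a Pearson coefficient and its p-value can be obtained with `scipy.stats.pearsonr`, assuming SciPy is installed. The toy values below are made up (loosely echoing the table further down) and will not reproduce the 0.12 / 0.24 figures above.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical toy data; in the notebook this would be final_df after the
# misreporting trusts have been removed
final_df = pd.DataFrame({
    'Deprivation': [5.28, 5.21, 7.22, 4.29, 3.72, 6.42, 4.10],
    'Trauma Percentage': [25.0, 26.7, 69.6, 54.8, 63.0, None, 75.0],
})

# pearsonr does not handle missing values, so drop trusts lacking either measure
# (a left merge can leave NaN trauma rates for some trusts)
paired = final_df[['Deprivation', 'Trauma Percentage']].dropna()

r, p_value = pearsonr(paired['Deprivation'], paired['Trauma Percentage'])
print(f'Pearson r = {r:.2f}, p = {p_value:.2f}')

# Conventional 5% significance level
alpha = 0.05
if p_value > alpha:
    print('Fail to reject the null hypothesis of no correlation')
else:
    print('Reject the null hypothesis of no correlation')
```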

In [89]:
import pandas as pd
import glob
import numpy as np
import os
from datetime import datetime
import bokeh
import pandas_bokeh
from bokeh.plotting import figure, output_notebook, show
from bokeh.layouts import column, row
from bokeh.models import ColumnDataSource, HoverTool, LabelSet, Span
from bokeh.transform import cumsum
from bokeh.palettes import Category20c
from math import pi
from IPython.display import FileLink

In [90]:
# Use pandas_bokeh as the plotting backend and load the April 2023 maternity dataset
pd.set_option('plotting.backend', 'pandas_bokeh')
output_notebook()
file_path = 'C:/NHS maternity data/2023/msds-apr2023-exp-data-final.csv'

df = pd.read_csv(file_path)
df.head()

Unnamed: 0,ReportingPeriodStartDate,ReportingPeriodEndDate,Dimension,Org_Level,Org_Code,Org_Name,Measure,Count_Of,Final_value
0,01/04/2023,30/04/2023,AgeAtBookingMotherAvg,National,ALL,ALL SUBMITTERS,,Average over women,31
1,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,National,ALL,ALL SUBMITTERS,20 to 24,Women,6525
2,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,National,ALL,ALL SUBMITTERS,25 to 29,Women,13710
3,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,National,ALL,ALL SUBMITTERS,30 to 34,Women,17640
4,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,National,ALL,ALL SUBMITTERS,35 to 39,Women,9895


In [91]:
# Filter to NHS trusts only, removing counties etc.
dftrusts = df[df['Org_Name'].str.contains('trust', case=False, na=False)]
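Matching "trust" in the organisation name works, but the export also has an `Org_Level` column, and the trust rows shown below carry the value `Provider`. A minimal sketch for cross-checking the two filters, assuming `Org_Level` is filled in consistently; the rows are made up and `EXAMPLE HOSPITAL` is a hypothetical provider whose name lacks "trust".

```python
import pandas as pd

# Made-up rows; 'National' and 'Provider' appear in the real export,
# 'EXAMPLE HOSPITAL' is hypothetical
df = pd.DataFrame({
    'Org_Level': ['National', 'Provider', 'Provider'],
    'Org_Name': ['ALL SUBMITTERS',
                 'MANCHESTER UNIVERSITY NHS FOUNDATION TRUST',
                 'EXAMPLE HOSPITAL'],
})

# Approach used in the notebook: match "trust" anywhere in the name
name_filter = df[df['Org_Name'].str.contains('trust', case=False, na=False)]

# Alternative: filter on the organisation level column directly
level_filter = df[df['Org_Level'] == 'Provider']

# Organisations picked up by one filter but not the other are worth inspecting
mismatch = set(name_filter['Org_Name']) ^ set(level_filter['Org_Name'])
print(mismatch)
```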

In [92]:
# Define the measures we want to remove
unwanted_measures = [
    "Missing Value / Value outside reporting parameters",
    "Pseudo postcode recorded (includes no fixed abode or resident overseas)",
    "Resident Elsewhere in UK, Channel Islands or Isle of Man"
]
dftrusts.head(100)

Unnamed: 0,ReportingPeriodStartDate,ReportingPeriodEndDate,Dimension,Org_Level,Org_Code,Org_Name,Measure,Count_Of,Final_value
33452,01/04/2023,30/04/2023,AgeAtBookingMotherAvg,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,,Average over women,31
33453,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,20 to 24,Women,180
33454,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,25 to 29,Women,400
33455,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,30 to 34,Women,525
33456,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,35 to 39,Women,285
...,...,...,...,...,...,...,...,...,...
33547,01/04/2023,30/04/2023,SmokingStatusGroupBooking,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,Non-Smoker / Ex-Smoker,Women,1095
33548,01/04/2023,30/04/2023,SmokingStatusGroupBooking,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,Smoker,Women,95
33549,01/04/2023,30/04/2023,TotalBabies,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,,Babies,1210
33550,01/04/2023,30/04/2023,TotalBookings,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,,Women,1525


In [93]:
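# Drop the unwanted measures; .copy() avoids a SettingWithCopyWarning when Measure is edited below
dftrusts = dftrusts[~dftrusts['Measure'].isin(unwanted_measures)].copy()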
dftrusts.head(100)

Unnamed: 0,ReportingPeriodStartDate,ReportingPeriodEndDate,Dimension,Org_Level,Org_Code,Org_Name,Measure,Count_Of,Final_value
33452,01/04/2023,30/04/2023,AgeAtBookingMotherAvg,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,,Average over women,31
33453,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,20 to 24,Women,180
33454,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,25 to 29,Women,400
33455,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,30 to 34,Women,525
33456,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,35 to 39,Women,285
...,...,...,...,...,...,...,...,...,...
33562,01/04/2023,30/04/2023,ApgarScore5TermGroup7,Provider,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,7 to 10,Babies,240
33564,01/04/2023,30/04/2023,BabyFirstFeedBreastMilkStatus,Provider,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,Maternal or Donor Breast Milk,Babies,135
33565,01/04/2023,30/04/2023,BabyFirstFeedBreastMilkStatus,Provider,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,Not Breast Milk,Babies,135
33566,01/04/2023,30/04/2023,BirthweightTermGroup,Provider,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,2000g to 2499g,Babies,10


In [94]:
is_decile = dftrusts['Dimension'] == 'DeprivationDecileAtBooking'

# Keep only the first two characters of each decile label,
# removing the 'most/least' deprived text from the Measure values
dftrusts.loc[is_decile, 'Measure'] = dftrusts.loc[is_decile, 'Measure'].str[:2]

# Normalise '01' to '1'
dftrusts.loc[is_decile & (dftrusts['Measure'] == '01'), 'Measure'] = '1'

# Display the modified DataFrame to verify changes
dftrusts.head(100)


Unnamed: 0,ReportingPeriodStartDate,ReportingPeriodEndDate,Dimension,Org_Level,Org_Code,Org_Name,Measure,Count_Of,Final_value
33452,01/04/2023,30/04/2023,AgeAtBookingMotherAvg,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,,Average over women,31
33453,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,20 to 24,Women,180
33454,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,25 to 29,Women,400
33455,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,30 to 34,Women,525
33456,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,35 to 39,Women,285
...,...,...,...,...,...,...,...,...,...
33562,01/04/2023,30/04/2023,ApgarScore5TermGroup7,Provider,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,7 to 10,Babies,240
33564,01/04/2023,30/04/2023,BabyFirstFeedBreastMilkStatus,Provider,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,Maternal or Donor Breast Milk,Babies,135
33565,01/04/2023,30/04/2023,BabyFirstFeedBreastMilkStatus,Provider,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,Not Breast Milk,Babies,135
33566,01/04/2023,30/04/2023,BirthweightTermGroup,Provider,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,2000g to 2499g,Babies,10


In [95]:
dftrusts.loc[dftrusts['Dimension'] == 'DeprivationDecileAtBooking', 'Measure'] = (
    dftrusts.loc[dftrusts['Dimension'] == 'DeprivationDecileAtBooking', 'Measure']
    .astype(int)  # Convert to integer
)
dftrusts.head(100)

Unnamed: 0,ReportingPeriodStartDate,ReportingPeriodEndDate,Dimension,Org_Level,Org_Code,Org_Name,Measure,Count_Of,Final_value
33452,01/04/2023,30/04/2023,AgeAtBookingMotherAvg,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,,Average over women,31
33453,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,20 to 24,Women,180
33454,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,25 to 29,Women,400
33455,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,30 to 34,Women,525
33456,01/04/2023,30/04/2023,AgeAtBookingMotherGroup,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,35 to 39,Women,285
...,...,...,...,...,...,...,...,...,...
33562,01/04/2023,30/04/2023,ApgarScore5TermGroup7,Provider,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,7 to 10,Babies,240
33564,01/04/2023,30/04/2023,BabyFirstFeedBreastMilkStatus,Provider,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,Maternal or Donor Breast Milk,Babies,135
33565,01/04/2023,30/04/2023,BabyFirstFeedBreastMilkStatus,Provider,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,Not Breast Milk,Babies,135
33566,01/04/2023,30/04/2023,BirthweightTermGroup,Provider,R0B,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,2000g to 2499g,Babies,10
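The cells above overwrite `Measure` with integers on the deprivation rows while other rows keep their text labels, so the column stays `object` dtype. That is likely why `Total Deprivation` later prints `dtype: object` and why `Deprivation` needs `pd.to_numeric` further down. A minimal sketch of an alternative that extracts the decile into its own integer column; the rows are made up, and the label text (e.g. `'01 - Most deprived'`) is an assumption based on the notebook's comment about 'most/least' deprived segments and its special handling of `'01'`.

```python
import pandas as pd

# Made-up rows; the decile label wording is assumed, not taken from the export
dfdep = pd.DataFrame({
    'Dimension': ['DeprivationDecileAtBooking'] * 4 + ['AgeAtBookingMotherGroup'],
    'Measure': ['01 - Most deprived', '02', '09', '10 - Least deprived', '20 to 24'],
    'Final_value': [445, 260, 5, 5, 180],
})

# Work on a copy of the deprivation rows only
decile_rows = dfdep[dfdep['Dimension'] == 'DeprivationDecileAtBooking'].copy()

# Extract the leading digits into a new integer column,
# leaving the mixed text/number Measure column untouched
decile_rows['Decile'] = (
    decile_rows['Measure'].str.extract(r'^(\d+)', expand=False).astype(int)
)

# The weighted product is now an integer column rather than object dtype
decile_rows['Total Deprivation'] = decile_rows['Decile'] * decile_rows['Final_value']
print(decile_rows[['Measure', 'Decile', 'Total Deprivation']])
```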


In [96]:
# Filter dftrusts to show only rows where 'Dimension' is 'DeprivationDecileAtBooking'
# (.copy() avoids a SettingWithCopyWarning when a column is added later)
dfdeprivation = dftrusts[dftrusts['Dimension'] == 'DeprivationDecileAtBooking'].copy()

# Display the filtered DataFrame
dfdeprivation.head(100)


Unnamed: 0,ReportingPeriodStartDate,ReportingPeriodEndDate,Dimension,Org_Level,Org_Code,Org_Name,Measure,Count_Of,Final_value
33485,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,2,Women,260
33486,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,3,Women,180
33487,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,4,Women,150
33488,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,5,Women,110
33489,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,6,Women,90
...,...,...,...,...,...,...,...,...,...
34428,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,RAE,BRADFORD TEACHING HOSPITALS NHS FOUNDATION TRUST,1,Women,260
34429,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,RAE,BRADFORD TEACHING HOSPITALS NHS FOUNDATION TRUST,10,Women,5
34532,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,RAJ,MID AND SOUTH ESSEX NHS FOUNDATION TRUST,2,Women,80
34533,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,RAJ,MID AND SOUTH ESSEX NHS FOUNDATION TRUST,3,Women,120


In [97]:
# Sort dfdeprivation by 'Org_Code' and then by 'Measure' in ascending order
dfdeprivation = dfdeprivation.sort_values(by=['Org_Code', 'Measure'])

# Display the sorted DataFrame to verify the sorting
dfdeprivation.head(100)


Unnamed: 0,ReportingPeriodStartDate,ReportingPeriodEndDate,Dimension,Org_Level,Org_Code,Org_Name,Measure,Count_Of,Final_value
33493,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,1,Women,445
33485,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,2,Women,260
33486,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,3,Women,180
33487,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,4,Women,150
33488,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,R0A,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,5,Women,110
...,...,...,...,...,...,...,...,...,...
34427,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,RAE,BRADFORD TEACHING HOSPITALS NHS FOUNDATION TRUST,9,Women,5
34429,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,RAE,BRADFORD TEACHING HOSPITALS NHS FOUNDATION TRUST,10,Women,5
34540,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,RAJ,MID AND SOUTH ESSEX NHS FOUNDATION TRUST,1,Women,65
34532,01/04/2023,30/04/2023,DeprivationDecileAtBooking,Provider,RAJ,MID AND SOUTH ESSEX NHS FOUNDATION TRUST,2,Women,80


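To calculate our stats, we multiply each deprivation decile (Measure) by the number of women in that decile (Final_value), sum these products for each trust, and divide by the trust's total number of women. This weighted mean decile is our 'average deprivation' for each trust.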
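A minimal worked sketch of that weighted mean, using made-up counts for two hypothetical trusts and a `Decile` column assumed to be already numeric:

```python
import pandas as pd

# Made-up counts of women per deprivation decile for two hypothetical trusts
dfdep = pd.DataFrame({
    'Org_Name': ['TRUST A'] * 3 + ['TRUST B'] * 2,
    'Decile': [1, 2, 10, 5, 6],
    'Final_value': [30, 10, 10, 20, 20],
})

# Numerator per row: decile number multiplied by the number of women in it
dfdep['weighted_decile'] = dfdep['Decile'] * dfdep['Final_value']

# Sum numerator and denominator per trust, then divide to get the weighted mean decile
sums = dfdep.groupby('Org_Name')[['weighted_decile', 'Final_value']].sum()
avg_deprivation = (sums['weighted_decile'] / sums['Final_value']).rename('Deprivation')
print(avg_deprivation)
```

For TRUST A this works out as (1×30 + 2×10 + 10×10) / 50 = 150 / 50 = 3.0, and for TRUST B as (5×20 + 6×20) / 40 = 5.5. Per trust, `numpy.average(values, weights=counts)` computes the same quantity.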

In [98]:
# Calculate total deprivation for each Org_Name
dfdeprivation['Total Deprivation'] = dfdeprivation['Measure'] * dfdeprivation['Final_value']
total_deprivation_by_org = dfdeprivation.groupby('Org_Name')['Total Deprivation'].sum()

# Calculate total value for each Org_Name
total_value_by_org = dfdeprivation.groupby('Org_Name')['Final_value'].sum()

# Display intermediate calculations
total_deprivation_by_org


Org_Name
AIREDALE NHS FOUNDATION TRUST                                      950
AIREDALE NHS TRUST                                                 730
ASHFORD AND ST PETER'S HOSPITALS NHS FOUNDATION TRUST             1985
BARKING, HAVERING AND REDBRIDGE UNIVERSITY HOSPITALS NHS TRUST    2790
BARNSLEY HOSPITAL NHS FOUNDATION TRUST                             875
                                                                  ... 
WIRRAL UNIVERSITY TEACHING HOSPITAL NHS FOUNDATION TRUST          1000
WORCESTERSHIRE ACUTE HOSPITALS NHS TRUST                          2585
WRIGHTINGTON, WIGAN AND LEIGH NHS FOUNDATION TRUST                 790
WYE VALLEY NHS TRUST                                               830
YORK AND SCARBOROUGH TEACHING HOSPITALS NHS FOUNDATION TRUST       225
Name: Total Deprivation, Length: 124, dtype: object

In [108]:
actual_deprivation = total_deprivation_by_org / total_value_by_org
actual_deprivation_df = actual_deprivation.reset_index()
actual_deprivation_df.columns = ['Org_Name', 'Deprivation']

# Display the actual deprivation
actual_deprivation_df.head(100)

Unnamed: 0,Org_Name,Deprivation
0,AIREDALE NHS FOUNDATION TRUST,5.277778
1,AIREDALE NHS TRUST,5.214286
2,ASHFORD AND ST PETER'S HOSPITALS NHS FOUNDATIO...,7.218182
3,"BARKING, HAVERING AND REDBRIDGE UNIVERSITY HOS...",4.292308
4,BARNSLEY HOSPITAL NHS FOUNDATION TRUST,3.723404
...,...,...
95,THE PRINCESS ALEXANDRA HOSPITAL NHS TRUST,6.424242
96,"THE QUEEN ELIZABETH HOSPITAL, KING'S LYNN, NHS...",4.103448
97,THE ROTHERHAM NHS FOUNDATION TRUST,3.744186
98,THE ROYAL WOLVERHAMPTON NHS TRUST,3.613208


We now have our deprivation stats. Next we calculate each trust's trauma rate as a percentage, then analyse whether trauma rates are associated with deprivation. Depending on the result, we may also look for associations between injury rates and other factors.

In [100]:
dftrauma = dftrusts[dftrusts['Dimension'] == 'GenitalTractTraumaticLesionGroup']

# Select only the required columns
dftrauma = dftrauma[['Dimension', 'Org_Name', 'Measure', 'Final_value']]

# Each trust has two rows here (one per Measure); the next cell pivots them into one row per trust

# Display the new DataFrame
dftrauma.head(100)

Unnamed: 0,Dimension,Org_Name,Measure,Final_value
33511,GenitalTractTraumaticLesionGroup,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,At least one traumatic lesion,255
33512,GenitalTractTraumaticLesionGroup,MANCHESTER UNIVERSITY NHS FOUNDATION TRUST,No traumatic lesion reported,390
33617,GenitalTractTraumaticLesionGroup,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,At least one traumatic lesion,170
33618,GenitalTractTraumaticLesionGroup,SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION T...,No traumatic lesion reported,30
33724,GenitalTractTraumaticLesionGroup,UNIVERSITY HOSPITALS DORSET NHS FOUNDATION TRUST,At least one traumatic lesion,110
...,...,...,...,...
38471,GenitalTractTraumaticLesionGroup,ST GEORGE'S UNIVERSITY HOSPITALS NHS FOUNDATIO...,At least one traumatic lesion,110
38472,GenitalTractTraumaticLesionGroup,ST GEORGE'S UNIVERSITY HOSPITALS NHS FOUNDATIO...,No traumatic lesion reported,90
38582,GenitalTractTraumaticLesionGroup,SOUTH WARWICKSHIRE UNIVERSITY NHS FOUNDATION T...,At least one traumatic lesion,115
38583,GenitalTractTraumaticLesionGroup,SOUTH WARWICKSHIRE UNIVERSITY NHS FOUNDATION T...,No traumatic lesion reported,30


In [103]:

# Group by 'Org_Name' and 'Measure', sum 'Final_value', and pivot to one row per trust
dftrauma2 = dftrauma.groupby(['Org_Name', 'Measure'])['Final_value'].sum().unstack()

# Total of both lesion categories for each trust
dftrauma2['Total'] = dftrauma2.sum(axis=1)

# Trauma percentage: share of records with at least one traumatic lesion
dftrauma2['Trauma Percentage'] = (dftrauma2['At least one traumatic lesion'] / dftrauma2['Total']) * 100

# Drop the leftover 'Measure' name from the column index created by unstack()
dftrauma2.columns.name = None

# Display to verify
dftrauma2.head(200)


Org_Name,At least one traumatic lesion,No traumatic lesion reported,Total,Trauma Percentage
AIREDALE NHS FOUNDATION TRUST,20.0,60.0,80.0,25.000000
AIREDALE NHS TRUST,20.0,55.0,75.0,26.666667
ASHFORD AND ST PETER'S HOSPITALS NHS FOUNDATION TRUST,80.0,35.0,115.0,69.565217
"BARKING, HAVERING AND REDBRIDGE UNIVERSITY HOSPITALS NHS TRUST",170.0,140.0,310.0,54.838710
BARNSLEY HOSPITAL NHS FOUNDATION TRUST,85.0,50.0,135.0,62.962963
...,...,...,...,...
WIRRAL UNIVERSITY TEACHING HOSPITAL NHS FOUNDATION TRUST,75.0,60.0,135.0,55.555556
WORCESTERSHIRE ACUTE HOSPITALS NHS TRUST,155.0,70.0,225.0,68.888889
"WRIGHTINGTON, WIGAN AND LEIGH NHS FOUNDATION TRUST",30.0,25.0,55.0,54.545455
WYE VALLEY NHS TRUST,45.0,15.0,60.0,75.000000


Here we find an example of misreporting. Sorting the table above by Trauma Percentage in ascending order shows that Imperial College Healthcare NHS Trust, Royal Free London NHS Foundation Trust and The Shrewsbury and Telford Hospital NHS Trust report trauma at implausibly low rates. These figures are almost certainly incorrect. Other trusts with low rates may be genuine, as their maternity units handle very small numbers of births.
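A minimal sketch of that sort, using a made-up table in the shape of `dftrauma2`. In the notebook `Org_Name` is still the index at this point; `sort_values` works the same either way. Keeping `Total` alongside the rate helps separate implausible rates from ones that simply rest on small counts.

```python
import pandas as pd

# Made-up per-trust trauma counts in the shape of dftrauma2
dftrauma2 = pd.DataFrame({
    'Org_Name': ['TRUST A', 'TRUST B', 'TRUST C', 'TRUST D'],
    'At least one traumatic lesion': [20.0, 5.0, 80.0, 5.0],
    'No traumatic lesion reported': [60.0, 300.0, 35.0, 10.0],
})
dftrauma2['Total'] = dftrauma2[['At least one traumatic lesion',
                                'No traumatic lesion reported']].sum(axis=1)
dftrauma2['Trauma Percentage'] = dftrauma2['At least one traumatic lesion'] / dftrauma2['Total'] * 100

# Lowest trauma rates first, showing the denominator next to each rate
lowest = dftrauma2.sort_values('Trauma Percentage').head(10)
print(lowest[['Org_Name', 'Total', 'Trauma Percentage']])
```

Here TRUST B stands out: a rate under 2% from 305 records is hard to explain by small numbers alone.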

In [104]:
print("Index:", dftrauma2.index.names)
print("Columns:", dftrauma2.columns)

Index: ['Org_Name']
Columns: Index(['At least one traumatic lesion', 'No traumatic lesion reported',
       'Total', 'Trauma Percentage'],
      dtype='object')


In [110]:
# Reset the index to make 'Org_Name' a column
dftrauma2.reset_index(inplace=True)

# Verify that 'Org_Name' is now a column
print(dftrauma2.columns)

final_df = actual_deprivation_df.merge(dftrauma2[['Org_Name', 'Trauma Percentage']], on='Org_Name', how='left')

# Display the final DataFrame to verify
print(final_df.head())

Index(['Org_Name', 'At least one traumatic lesion',
       'No traumatic lesion reported', 'Total', 'Trauma Percentage'],
      dtype='object')
                                            Org_Name Deprivation  \
0                      AIREDALE NHS FOUNDATION TRUST    5.277778   
1                                 AIREDALE NHS TRUST    5.214286   
2  ASHFORD AND ST PETER'S HOSPITALS NHS FOUNDATIO...    7.218182   
3  BARKING, HAVERING AND REDBRIDGE UNIVERSITY HOS...    4.292308   
4             BARNSLEY HOSPITAL NHS FOUNDATION TRUST    3.723404   

   Trauma Percentage  
0          25.000000  
1          26.666667  
2          69.565217  
3          54.838710  
4          62.962963  


In [111]:
final_df.head(100)

Unnamed: 0,Org_Name,Deprivation,Trauma Percentage
0,AIREDALE NHS FOUNDATION TRUST,5.277778,25.000000
1,AIREDALE NHS TRUST,5.214286,26.666667
2,ASHFORD AND ST PETER'S HOSPITALS NHS FOUNDATIO...,7.218182,69.565217
3,"BARKING, HAVERING AND REDBRIDGE UNIVERSITY HOS...",4.292308,54.838710
4,BARNSLEY HOSPITAL NHS FOUNDATION TRUST,3.723404,62.962963
...,...,...,...
95,THE PRINCESS ALEXANDRA HOSPITAL NHS TRUST,6.424242,61.538462
96,"THE QUEEN ELIZABETH HOSPITAL, KING'S LYNN, NHS...",4.103448,75.000000
97,THE ROTHERHAM NHS FOUNDATION TRUST,3.744186,75.000000
98,THE ROYAL WOLVERHAMPTON NHS TRUST,3.613208,69.767442


In [130]:

# Calculate deciles for trauma percentage (+1 so deciles run from 1 to 10)
final_df['Decile'] = pd.qcut(final_df['Trauma Percentage'], 10, labels=False) + 1

# Deprivation may be stored as object dtype (it was derived from the mixed-type
# Measure column), so convert both columns to numeric before analysis
final_df['Deprivation'] = pd.to_numeric(final_df['Deprivation'], errors='coerce')
final_df['Trauma Percentage'] = pd.to_numeric(final_df['Trauma Percentage'], errors='coerce')
print(final_df['Deprivation'].dtype)
print(final_df['Trauma Percentage'].dtype)
final_df.head(100)


float64
float64


Unnamed: 0,Org_Name,Deprivation,Trauma Percentage,Decile
0,AIREDALE NHS FOUNDATION TRUST,5.277778,25.000000,1.0
1,AIREDALE NHS TRUST,5.214286,26.666667,1.0
2,ASHFORD AND ST PETER'S HOSPITALS NHS FOUNDATIO...,7.218182,69.565217,7.0
3,"BARKING, HAVERING AND REDBRIDGE UNIVERSITY HOS...",4.292308,54.838710,3.0
4,BARNSLEY HOSPITAL NHS FOUNDATION TRUST,3.723404,62.962963,5.0
...,...,...,...,...
95,THE PRINCESS ALEXANDRA HOSPITAL NHS TRUST,6.424242,61.538462,4.0
96,"THE QUEEN ELIZABETH HOSPITAL, KING'S LYNN, NHS...",4.103448,75.000000,9.0
97,THE ROTHERHAM NHS FOUNDATION TRUST,3.744186,75.000000,9.0
98,THE ROYAL WOLVERHAMPTON NHS TRUST,3.613208,69.767442,7.0


In [132]:
from bokeh.models import ColumnDataSource, HoverTool, LabelSet, Span
from bokeh.plotting import figure, output_notebook, show

output_notebook()

# Assumes final_df already includes 'Org_Name', 'Deprivation' and 'Trauma Percentage'
# Add decile information
final_df['Decile'] = pd.qcut(final_df['Trauma Percentage'], 10, labels=False) + 1

# Create a data source for Bokeh
source = ColumnDataSource(data={
    'x': final_df['Trauma Percentage'],
    'y': final_df['Deprivation'],
    'names': final_df['Org_Name']
})

# Create the plot
p = figure(title="Deprivation vs Trauma Percentage by Trust",
           x_axis_label='Trauma Percentage', y_axis_label='Deprivation (average decile)',
           tools="pan,wheel_zoom,box_zoom,reset", width=800, height=600)

# Draw one point per trust
p.scatter(x='x', y='y', source=source, size=6)

# Hover shows the trust name and both values
p.add_tools(HoverTool(tooltips=[('Trust', '@names'),
                                ('Trauma %', '@x{0.0}'),
                                ('Deprivation', '@y{0.00}')]))

# Add labels for each point
labels = LabelSet(x='x', y='y', text='names', x_offset=5, y_offset=5, source=source)
p.add_layout(labels)

# Add vertical dashed lines at each trauma-percentage decile boundary
decile_positions = final_df['Trauma Percentage'].quantile([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]).values
for pos in decile_positions:
    decile_line = Span(location=pos, dimension='height', line_color='green', line_dash='dashed', line_width=1)
    p.add_layout(decile_line)

# Display the plot
show(p)
