Download and process Community Mental Health Survey data

In [None]:
!pip install pandas pyarrow odfpy

In [None]:
import os
import pandas as pd

In [None]:
os.chdir('/home/jovyan/work')

Load the file from the URL and stitch its two sheets (respondents and scores) together. The result is cached as a parquet file so that it does not have to be retrieved again on later runs. Delete the parquet file to force a fresh download.

In [None]:
source_url = 'https://www.cqc.org.uk/sites/default/files/20211201_cmh21_BenchmarkData%20V1.ods'
parquet_file = 'scratch/chms.parquet'
try:
    # Reuse the cached copy if it exists
    data = pd.read_parquet(parquet_file)
except FileNotFoundError:
    # No cache yet: download the workbook and stitch the two sheets together
    respondents = pd.read_excel(source_url, sheet_name='CMH21_Trust_Respondents', engine='odf')
    scores = pd.read_excel(source_url, sheet_name='CMH21_Trust_Scores', engine='odf').drop(columns=['Trustname'])
    data = respondents.merge(scores, left_on='TrustCode', right_on='Trustcode').drop(columns=['Trustcode'])
    data.to_parquet(parquet_file)
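
The stitching step can be tried on toy data before running it against the real workbook. This is a minimal sketch, not the notebook's actual data: the values and the `n_respondents` column are invented, and only the key names (`TrustCode`, `Trustcode`) and the duplicated `Trustname` column are taken from the cell above. Passing `validate='one_to_one'` is an optional extra that makes the merge fail loudly if a trust code appears twice in either sheet.

```python
import pandas as pd

# Toy stand-ins for the two sheets; all values are invented for illustration.
respondents = pd.DataFrame({
    'TrustCode': ['AAA', 'BBB'],
    'Trustname': ['Trust A', 'Trust B'],
    'n_respondents': [120, 95],  # hypothetical column name
})
scores = pd.DataFrame({
    'Trustcode': ['AAA', 'BBB'],  # note the different capitalisation of the key
    'Trustname': ['Trust A', 'Trust B'],
    'meanQ1': [7.1, 6.8],
})

stitched = (
    respondents
    .merge(
        scores.drop(columns=['Trustname']),  # avoid a duplicated name column
        left_on='TrustCode',
        right_on='Trustcode',
        validate='one_to_one',  # raise MergeError if a trust code is duplicated
    )
    .drop(columns=['Trustcode'])  # keep a single key column
)
print(stitched)
```

The result has one row per trust, with a single `TrustCode` key and a single `Trustname` column.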

Load the reference data: the CCG list and the trust-to-CCG lookup.

In [None]:
ccg = pd.read_csv('data/ref_ccg21.csv')
trust_lookup = pd.read_csv('data/trust_ccg_lookup_ods_datapoint.csv').drop(columns=['Name', 'Primary Role Name', 'Geographic Primary Care Organisation Name'])
trust_lookup.columns = ['TrustCode', 'CCG21CDH']
trust_lookup = trust_lookup.merge(ccg).drop(columns=['CCG21NM'])

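Reshape the CMHS data: keep only the trust code and the mean question score columns, then merge with the CCG lookup. Trusts with no matching CCG are written to a failure file. The remaining rows are averaged by CCG, the score columns are renamed, and the result is written out.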

In [None]:
# Keep only the trust code and the mean score columns, then attach the CCG codes
cmhs = data.loc[:, data.columns.str.match("^(meanQ|TrustCode)")].merge(trust_lookup, how='left', on='TrustCode')
# Record trusts that have no CCG match
cmhs.loc[cmhs.CCG21CDH.isnull()].to_csv('scratch/cmhs_failure.csv')
# Average the numeric score columns within each CCG (string columns are skipped)
cmhs = cmhs.loc[~cmhs.CCG21CDH.isnull()].groupby('CCG21CD').mean(numeric_only=True)
cmhs.columns = cmhs.columns.str.replace(r'mean(.*)', r'CMHS \1 Mean', regex=True)
cmhs.round(3).to_csv('data/cmhs.processed.csv')
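
The reshaping steps can be checked end to end on a small invented example. The sketch below assumes the same column names as the cell above (`TrustCode`, `CCG21CDH`, `CCG21CD`, `meanQ...`). The codes and scores are placeholders, and the trust `ZZZ` is deliberately left out of the lookup so that it ends up in the unmatched set.

```python
import pandas as pd

# Invented scores; 'ZZZ' has no entry in the lookup below.
scores = pd.DataFrame({
    'TrustCode': ['AAA', 'BBB', 'ZZZ'],
    'meanQ1': [7.0, 6.0, 5.0],
    'meanQ2': [8.0, 7.0, 4.0],
})
# Placeholder codes, not real CCG identifiers.
lookup = pd.DataFrame({
    'TrustCode': ['AAA', 'BBB'],
    'CCG21CDH': ['X01', 'X01'],
    'CCG21CD': ['E00000001', 'E00000001'],
})

# A left merge keeps every trust; unmatched trusts get NaN in the CCG columns.
merged = scores.merge(lookup, how='left', on='TrustCode')
unmatched = merged[merged['CCG21CDH'].isna()]

# Average the numeric score columns within each CCG.
by_ccg = (
    merged.dropna(subset=['CCG21CDH'])
    .groupby('CCG21CD')
    .mean(numeric_only=True)
)

# Rename 'meanQ1' to 'CMHS Q1 Mean', and so on.
by_ccg.columns = by_ccg.columns.str.replace(r'^mean(.*)$', r'CMHS \1 Mean', regex=True)

print(unmatched['TrustCode'].tolist())
print(by_ccg)
```

Here the two matched trusts share one CCG, so the output has a single row holding the average of their scores, while `ZZZ` is reported as unmatched rather than silently dropped.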