# PRMT-1932 Look at effect of Lloyd George Digitisation on failure rates for CCGs and practices

## Context
Do practices and CCGs that are undergoing Lloyd George digitisation experience higher levels of transfer failure? We want to look at our data to see whether this is true and, if so, why the transfers are failing.

## Scope

### CCGs
- Look at failure rates on transfers out over the last 6 months for:

    - Blackpool CCG
    - Chorley and South Ribble CCG
    - Sunderland CCG

- Document the failure rate for each month so we can see whether it has gone up or down
- Look at each CCG separately
- If failures have increased month on month, show breakdown of failure reasons per supplier pathway for each CCG

### Practices
- Look at the failure rates on transfers out over the last 6 months for:

    - Adelaide St practice
    - Library House practice

- Document the failure rate for each month so we can see whether it has gone up or down

- Look at each practice separately

- If failures have increased month on month, show breakdown of failure reasons per supplier pathway for each Practice

## Acceptance Criteria
We have documented the failures for each organisation and understand for each whether failures have been increasing. 

For any organisation that has seen an increase in failures, we know the reasons for the failures and which suppliers the failing transfers were going to.
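The breakdown of failure reasons per supplier pathway described in the scope could be produced roughly as follows. This is only a sketch on made-up rows, not output from the real data: `toy_transfers` and `failure_breakdown` are hypothetical names, and the column names are assumed to match those used for the transfers data in this notebook.

```python
import pandas as pd

# Hypothetical toy data, for illustration only (not real transfers)
toy_transfers = pd.DataFrame({
    "sending_supplier": ["EMIS", "EMIS", "TPP", "EMIS", "TPP"],
    "requesting_supplier": ["TPP", "TPP", "EMIS", "EMIS", "EMIS"],
    "status": ["FAILED", "FAILED", "FAILED", "INTEGRATED", "FAILED"],
    "sender_error_code": [30, 30, 10, None, 30],
})

# Keep only failed transfers, then count them per pathway and error code
failed = toy_transfers[toy_transfers["status"] == "FAILED"]
failure_breakdown = (
    failed.groupby(["sending_supplier", "requesting_supplier", "sender_error_code"])
    .size()
    .rename("Number of failures")
    .reset_index()
)
print(failure_breakdown)
```

Grouping on both supplier columns keeps each direction of a pathway (for example EMIS to TPP versus TPP to EMIS) as a separate row.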

In [19]:
import pandas as pd
import numpy as np
import json
import urllib

In [20]:
error_code_lookup_file = pd.read_csv("https://raw.githubusercontent.com/nhsconnect/prm-gp2gp-data-sandbox/master/data/gp2gp_response_codes.csv")

In [21]:
transfer_file_location = "s3://prm-gp2gp-data-sandbox-dev/transfers-duplicates-hypothesis/"
transfer_files = [
    "9-2020-transfers.parquet",
    "10-2020-transfers.parquet",
    "11-2020-transfers.parquet",
    "12-2020-transfers.parquet",
    "1-2021-transfers.parquet",
    "2-2021-transfers.parquet"
]
transfer_input_files = [transfer_file_location + f for f in transfer_files]
transfers_raw = pd.concat((
    pd.read_parquet(f)
    for f in transfer_input_files
))
# This is only needed when using transfers-duplicates-hypothesis datasets
transfers_raw = transfers_raw.drop(["sending_supplier", "requesting_supplier"], axis=1)

In [22]:
asid_lookup_file = "s3://prm-gp2gp-data-sandbox-dev/asid-lookup/asidLookup-Mar-2021.csv.gz"
asid_lookup = pd.read_csv(asid_lookup_file)

In [23]:
# Following PRMT-1742 (many duplicate EHR errors are misclassified), mark a transfer as integrated
# if any of its request-completed acknowledgement codes is missing (NaN) or 15
successful_transfers_bool = transfers_raw['request_completed_ack_codes'].apply(lambda ack_codes: any(np.isnan(code) or code == 15 for code in ack_codes))
transfers = transfers_raw.copy()
transfers.loc[successful_transfers_bool, "status"] = "INTEGRATED"
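To make the reclassification rule concrete, here is a small sketch on invented rows. It applies the same test as the cell above (a missing code or code 15 anywhere in the list counts as success); the frame `toy` and the helper `has_successful_ack` are hypothetical, and code 12 is used purely as an example of some other error code.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: each transfer carries a list of acknowledgement codes
toy = pd.DataFrame({
    "status": ["FAILED", "FAILED", "FAILED"],
    "request_completed_ack_codes": [[15.0], [12.0], [12.0, np.nan]],
})

def has_successful_ack(ack_codes):
    # True if any code in the list is missing (NaN) or equal to 15
    return any(np.isnan(code) or code == 15 for code in ack_codes)

is_successful = toy["request_completed_ack_codes"].apply(has_successful_ack)
toy.loc[is_successful, "status"] = "INTEGRATED"
print(toy["status"].tolist())
```

The third row shows the case the rule targets: a transfer that received an error acknowledgement and also a successful one is treated as integrated rather than failed.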

In [24]:
# Given the findings in PRMT-1960 - we re-classify transfers with certain sender error codes as failed
pending_sender_error_codes=[6,7,10,24,30,23,14,99]
transfers_with_pending_sender_code_bool=transfers['sender_error_code'].isin(pending_sender_error_codes)
transfers_with_pending_with_error_bool=transfers['status']=='PENDING_WITH_ERROR'
transfers_which_need_pending_to_failure_change_bool=transfers_with_pending_sender_code_bool & transfers_with_pending_with_error_bool
transfers.loc[transfers_which_need_pending_to_failure_change_bool,'status']='FAILED'

In [25]:
# Supplier name mapping
supplier_renaming = {
    "EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS)":"EMIS",
    "IN PRACTICE SYSTEMS LTD":"Vision",
    "MICROTEST LTD":"Microtest",
    "THE PHOENIX PARTNERSHIP":"TPP",
    None: "Unknown"
}

lookup = asid_lookup[["ASID", "MName", "NACS"]]

transfers = transfers.merge(lookup, left_on='requesting_practice_asid',right_on='ASID',how='left').drop("NACS", axis=1)
transfers = transfers.rename({'MName': 'requesting_supplier', 'ASID': 'requesting_supplier_asid'}, axis=1)
transfers = transfers.merge(lookup, left_on='sending_practice_asid',right_on='ASID',how='left')
transfers = transfers.rename({'MName': 'sending_supplier', 'ASID': 'sending_supplier_asid', 'NACS': 'sending_ods_code'}, axis=1)

transfers["sending_supplier"] = transfers["sending_supplier"].replace(supplier_renaming.keys(), supplier_renaming.values())
transfers["requesting_supplier"] = transfers["requesting_supplier"].replace(supplier_renaming.keys(), supplier_renaming.values())
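The supplier lookup can be checked on a small example. The sketch below uses invented ASIDs and ODS codes (`toy_lookup`, `toy_transfers` and `short_names` are hypothetical) and shows how a practice missing from the lookup ends up without a supplier after the left merge. It labels such rows with `fillna("Unknown")`, which is a swapped-in alternative to the `None` key in the mapping above, not the notebook's own approach.

```python
import pandas as pd

# Hypothetical lookup and transfers; the ASIDs and ODS codes are invented
toy_lookup = pd.DataFrame({
    "ASID": ["111", "222"],
    "MName": ["THE PHOENIX PARTNERSHIP", "MICROTEST LTD"],
    "NACS": ["A1", "B2"],
})
toy_transfers = pd.DataFrame({
    "conversation_id": ["c1", "c2", "c3"],
    "sending_practice_asid": ["111", "222", "999"],  # "999" is not in the lookup
})

# A left merge keeps every transfer, even when the ASID has no match
merged = toy_transfers.merge(
    toy_lookup, left_on="sending_practice_asid", right_on="ASID", how="left"
)
merged = merged.rename(
    {"MName": "sending_supplier", "ASID": "sending_supplier_asid", "NACS": "sending_ods_code"},
    axis=1,
)

# Shorten the supplier names, then label unmatched practices explicitly
short_names = {"THE PHOENIX PARTNERSHIP": "TPP", "MICROTEST LTD": "Microtest"}
merged["sending_supplier"] = merged["sending_supplier"].replace(short_names).fillna("Unknown")
print(merged[["conversation_id", "sending_supplier", "sending_ods_code"]])
```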

## Generate view of each CCG or practice

In [26]:
NACS_to_investigate=dict()
NACS_to_investigate['Blackpool']=["P81004","P81074","P81714","P81042","P81081","P81115","P81073","P81072","P81066","P81054","P81172","P81681","P81043","P81159","P81063","P81092","P81016"]
NACS_to_investigate['Chorley']=["P81082","P81010","Y02466","P81076","Y03656","P81740","P81692","P81117","P81180","P81033","P81701","Y00347","P81154","P81181","P81044","P81186","P81687","P81062","P81083","P81741","P81038","P81171","P81127","P81655","P81143","P81057"]
NACS_to_investigate['Sunderland']=["A89036","A89012","A89018","A89008","A89025","A89009","A89011","A89031","A89013","A89040","A89614","A89019","A89001","A89623","A89022","A89002","A89010","A89020","A89015","A89028","A89041","A89004","A89023","A89617","A89021","A89017","A89032","A89026","A89007","A89034","A89616","A89035","A89027","A89016","A89024","A89005","A89006","A89030","Y01262"]
NACS_to_investigate['Adelaide']=["P81042"]
NACS_to_investigate['Library']=["P81044"]

In [27]:
lg_transfers=transfers.copy()
lg_transfers['Investigation Group']=None

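# Groups are assigned in order, so a later group overwrites an earlier one for the same practice.
# P81042 (Adelaide) is also in the Blackpool list and P81044 (Library) is also in the Chorley list,
# so those practices' transfers are labelled with the practice group and are not counted in the CCG groups.
for investigation_group, list_lg_practices in NACS_to_investigate.items():
    is_lg_practice_bool = lg_transfers["sending_ods_code"].isin(list_lg_practices)
    lg_transfers.loc[is_lg_practice_bool, 'Investigation Group'] = investigation_group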

lg_transfers['Investigation Group'].value_counts()

Sunderland    4902
Chorley       3991
Blackpool     3625
Adelaide       352
Library        246
Name: Investigation Group, dtype: int64
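Because each transfer receives a single group label, an ODS code that appears in more than one list is counted only under the last group assigned. A quick way to spot such overlaps is sketched below; `groups` is a hypothetical, shortened copy of the lists above (two codes per CCG), so the real dictionary would need to be substituted to check the full set.

```python
from itertools import combinations

# Shortened, illustrative copy of the investigation groups
groups = {
    "Blackpool": ["P81004", "P81042"],
    "Chorley": ["P81082", "P81044"],
    "Adelaide": ["P81042"],
    "Library": ["P81044"],
}

# For every pair of groups, record the ODS codes they share (if any)
overlaps = {
    (a, b): sorted(set(groups[a]) & set(groups[b]))
    for a, b in combinations(groups, 2)
    if set(groups[a]) & set(groups[b])
}
print(overlaps)
```

With these shortened lists, the two named practices each overlap with a CCG list, which is why the CCG counts above do not include those practices' transfers.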

In [28]:
lg_transfers['Month']=lg_transfers['date_requested'].dt.to_period('M')
lg_outcomes=pd.pivot_table(lg_transfers,index=['Investigation Group','Month'],columns='status',values='conversation_id',aggfunc='count')
lg_outcomes=lg_outcomes.fillna(0).astype(int)

lg_outcomes_pc=lg_outcomes.copy()
lg_outcomes_pc=(lg_outcomes_pc.div(lg_outcomes_pc.sum(axis=1),axis=0)*100).round(2)
lg_outcomes_pc.columns=lg_outcomes_pc.columns + " %"
lg_outcomes=pd.concat([lg_outcomes,lg_outcomes_pc],axis=1)
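The percentage step divides each row by its own total, so every month's percentages sum to 100 across statuses. A minimal sketch with invented counts (`toy_counts` is hypothetical) shows the resulting shape:

```python
import pandas as pd

# Hypothetical monthly outcome counts, for illustration only
toy_counts = pd.DataFrame(
    {"FAILED": [5, 10], "INTEGRATED": [95, 30]},
    index=pd.Index(["2020-09", "2020-10"], name="Month"),
)

# Divide each row by its row total to get per-month percentages
toy_pc = (toy_counts.div(toy_counts.sum(axis=1), axis=0) * 100).round(2)
toy_pc.columns = toy_pc.columns + " %"

# Counts and percentages side by side, as in the cell above
toy_outcomes = pd.concat([toy_counts, toy_pc], axis=1)
print(toy_outcomes)
```

Here the failure rate rises from 5% to 25% even though the number of failures only doubles, because the second month has far fewer transfers overall. Keeping the raw counts next to the percentages makes that kind of effect visible.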


### Output to Excel

In [29]:
excel_tables=lg_outcomes.index.get_level_values('Investigation Group').drop_duplicates()

# Write one sheet per investigation group; the context manager saves and closes the file
with pd.ExcelWriter('prmt-1932-ccg-outcomes.xlsx', engine='xlsxwriter') as writer:
    for excel_table in excel_tables:
        lg_outcomes.loc[excel_table].to_excel(writer, sheet_name=excel_table)