# Summary of transfer outcomes for practice M85092

**Context**

We would like to see a summary of transfer outcomes where the sending practice is M85092, using April data if available and March data otherwise.

NB: The April data contained only 8 relevant transfers, so we also used the March data (which contained 2600 transfers).

**Scope**

- Breakdown of transfers out, per month
    - Show outcome of each transfer
    - Show which practices they were sent to

In [1]:
import pandas as pd

In [48]:
# Import transfer files to extract whether message creator is sender or requester
# Using data generated from branch PRMT-1742-duplicates-analysis.
# This is needed to correctly handle duplicates.
# Once the upstream pipeline has a fix for duplicate EHRs, then we can go back to using the main output.

transfer_file_location = "s3://prm-gp2gp-data-sandbox-dev/duplicate-fix-14-day-cut-off/"
transfer_files = [
    "2021/03/transfers.parquet",
    "2021/04/transfers.parquet"
]

transfer_input_files = [transfer_file_location + f for f in transfer_files]
transfers = pd.concat((
    pd.read_parquet(f)
    for f in transfer_input_files
))

# Correctly interpret certain sender errors as failed.
# This is explained in PRMT-1974. Eventually this will be fixed upstream in the pipeline.
pending_sender_error_codes = [6, 7, 10, 24, 30, 23, 14, 99]
transfers_with_pending_sender_code_bool = transfers['sender_error_code'].isin(pending_sender_error_codes)
transfers_with_pending_with_error_bool = transfers['status'] == 'PENDING_WITH_ERROR'
transfers_which_need_pending_to_failure_change_bool = transfers_with_pending_sender_code_bool & transfers_with_pending_with_error_bool
transfers.loc[transfers_which_need_pending_to_failure_change_bool, 'status'] = 'FAILED'

# Add "INTEGRATED LATE" status for records integrated after the 8 day SLA
eight_days_in_seconds = 8 * 24 * 60 * 60
transfers_after_sla_bool = transfers['sla_duration'] > eight_days_in_seconds
transfers_with_integrated_bool = transfers['status'] == 'INTEGRATED'
transfers_integrated_late_bool = transfers_after_sla_bool & transfers_with_integrated_bool
transfers.loc[transfers_integrated_late_bool, 'status'] = 'INTEGRATED LATE'

# If the record integrated after 14 days, change the status back to pending.
# This is to handle each month consistently and to always reflect a transfer's status 14 days after it was made.
# TBD how this is handled upstream in the pipeline
fourteen_days_in_seconds = 14 * 24 * 60 * 60
transfers_after_month_bool = transfers['sla_duration'] > fourteen_days_in_seconds
transfers_pending_at_month_bool = transfers_after_month_bool & transfers_integrated_late_bool
transfers.loc[transfers_pending_at_month_bool, 'status'] = 'PENDING'
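# A transfer has an early error if it has a sender error code or at least one intermediate error code.
# Note: `~series.apply(len) > 0` applies `~` before `>`, which is always False, so compare the length directly.
transfers_with_early_error_bool = transfers['sender_error_code'].notna() | (transfers['intermediate_error_codes'].apply(len) > 0)
transfers.loc[transfers_with_early_error_bool & transfers_pending_at_month_bool, 'status'] = 'PENDING_WITH_ERROR'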

# Supplier name mapping
supplier_renaming = {
    "SystmOne":"TPP",
    None: "Unknown"
}

asid_lookup_file = "s3://prm-gp2gp-data-sandbox-dev/asid-lookup/asidLookup-Mar-2021.csv.gz"
asid_lookup = pd.read_csv(asid_lookup_file)
lookup = asid_lookup[["ASID","NACS","OrgName"]]

transfers = transfers.merge(lookup, left_on='requesting_practice_asid',right_on='ASID',how='left')
transfers = transfers.rename({'ASID': 'requesting_supplier_asid', 'NACS': 'requesting_ods_code','OrgName':'requesting_practice_name'}, axis=1)
transfers = transfers.merge(lookup, left_on='sending_practice_asid',right_on='ASID',how='left')
transfers = transfers.rename({'ASID': 'sending_supplier_asid', 'NACS': 'sending_ods_code','OrgName':'sending_practice_name'}, axis=1)

transfers["sending_supplier"] = transfers["sending_supplier"].replace(supplier_renaming.keys(), supplier_renaming.values())
transfers["requesting_supplier"] = transfers["requesting_supplier"].replace(supplier_renaming.keys(), supplier_renaming.values())
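
The status adjustments made in the cell above can be sanity-checked on a small hand-built frame. The following is a minimal sketch, not part of the original analysis: the rows in `toy` are invented for illustration, `relabel` is a hypothetical helper name, and it assumes the intent is that any non-empty `intermediate_error_codes` list counts as an error.

```python
import pandas as pd

DAY = 24 * 60 * 60
EIGHT_DAYS = 8 * DAY
FOURTEEN_DAYS = 14 * DAY

# Invented rows for illustration only; not taken from the real transfer data.
toy = pd.DataFrame({
    "status": ["INTEGRATED", "INTEGRATED", "INTEGRATED", "INTEGRATED",
               "INTEGRATED", "PENDING_WITH_ERROR"],
    "sla_duration": [3 * DAY, 10 * DAY, 20 * DAY, 20 * DAY, 20 * DAY, None],
    "sender_error_code": [None, None, None, 30, None, 30],
    "intermediate_error_codes": [[], [], [], [], [99], []],
})


def relabel(df):
    """Hypothetical helper applying the same status rules to a copy of df."""
    df = df.copy()

    # Rule 1: certain sender error codes on PENDING_WITH_ERROR mean FAILED.
    failing_codes = [6, 7, 10, 24, 30, 23, 14, 99]
    failed = df["sender_error_code"].isin(failing_codes) & (df["status"] == "PENDING_WITH_ERROR")
    df.loc[failed, "status"] = "FAILED"

    # Rule 2: integrated after the 8 day SLA becomes INTEGRATED LATE.
    late = (df["sla_duration"] > EIGHT_DAYS) & (df["status"] == "INTEGRATED")
    df.loc[late, "status"] = "INTEGRATED LATE"

    # Rule 3: integrated after the 14 day cut-off is treated as still pending.
    past_cutoff = late & (df["sla_duration"] > FOURTEEN_DAYS)
    df.loc[past_cutoff, "status"] = "PENDING"

    # Rule 4: pending transfers with any error recorded are PENDING_WITH_ERROR.
    has_error = df["sender_error_code"].notna() | (df["intermediate_error_codes"].apply(len) > 0)
    df.loc[past_cutoff & has_error, "status"] = "PENDING_WITH_ERROR"
    return df


relabelled = relabel(toy)
print(relabelled[["sla_duration", "status"]])
```

Keeping the rules in one function like this makes it easier to test the order in which they are applied, since rules 3 and 4 depend on the mask computed in rule 2.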

In [49]:
# Select the transfers where the sending practice is the practice of interest
practice_of_interest_bool = transfers["sending_ods_code"] == "M85092"
practice_transfers = transfers[practice_of_interest_bool]

In [55]:
# Create a table showing numbers of transfers to each practice and the status (at 14 days)
# Both the practice (rows) and status (columns) are ordered by most common first
ordered_requesting_practice_names = practice_transfers['requesting_practice_name'].value_counts().index
ordered_status = practice_transfers['status'].value_counts().index

practice_transfers_count_table = practice_transfers.pivot_table(
    index='requesting_practice_name',
    columns='status',
    values='conversation_id',
    aggfunc='count'
)
practice_transfers_count_table = practice_transfers_count_table.loc[ordered_requesting_practice_names, ordered_status].fillna(0).astype(int)
practice_transfers_count_table.to_csv("s3://prm-gp2gp-data-sandbox-dev/notebook-outputs/38-PRMT-2076-M85092-Mar-Apr-transfers-out.csv")
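
The table above combines both months, while the scope asks for a breakdown per month. One way to split it is to add a month level to the pivot index. This is a minimal sketch on invented data: the `date_requested` column and the derived `month` column are assumptions about the transfer schema, and the practice names are placeholders.

```python
import pandas as pd

# Invented rows for illustration; `date_requested` is an assumed column name.
toy = pd.DataFrame({
    "conversation_id": ["a", "b", "c", "d", "e"],
    "date_requested": pd.to_datetime(
        ["2021-03-02", "2021-03-15", "2021-03-20", "2021-04-01", "2021-04-03"]
    ),
    "requesting_practice_name": ["Practice A", "Practice A", "Practice B",
                                 "Practice A", "Practice B"],
    "status": ["INTEGRATED", "FAILED", "INTEGRATED", "INTEGRATED", "PENDING"],
})

# Derive a year-month label so each transfer is assigned to one month.
toy["month"] = toy["date_requested"].dt.strftime("%Y-%m")

# Count transfers per (month, requesting practice) and status.
per_month = toy.pivot_table(
    index=["month", "requesting_practice_name"],
    columns="status",
    values="conversation_id",
    aggfunc="count",
    fill_value=0,
).astype(int)

print(per_month)
```

The same pattern should apply to `practice_transfers` if it carries a suitable request timestamp; which column to use for assigning a transfer to a month is a choice the analysis would need to confirm.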