# Hypothesis: Are digitised practices causing more failures?

Follows on from notebook PRMT-2332: validating the revised output.
- Validate that each practice's supplier is the same in both datasets (sending and requesting)
- Compare with the Excel GP practice list sent by the CCG

In [1]:
import pandas as pd
import numpy as np
import paths
from data.practice_metadata import read_asid_metadata

In [2]:
asid_lookup=read_asid_metadata("prm-gp2gp-ods-metadata-preprod", "v2/2021/8/organisationMetadata.json")

transfer_file_location = "s3://prm-gp2gp-transfer-data-preprod/v4/"

transfer_files = [
    "2021/5/transfers.parquet",
    "2021/6/transfers.parquet",
    "2021/7/transfers.parquet"
]
transfer_input_files = [transfer_file_location + f for f in transfer_files]

transfers_raw = pd.concat((
    pd.read_parquet(f)
    for f in transfer_input_files
))

transfers = (
    transfers_raw
    .join(asid_lookup.add_prefix("requesting_"), on="requesting_practice_asid", how="left")
    .join(asid_lookup.add_prefix("sending_"), on="sending_practice_asid", how="left")
)

transfers['month']=transfers['date_requested'].dt.to_period('M')

In [3]:
def generate_monthly_outcome_breakdown(transfers, columns):
    total_transfers = (
        transfers
            .groupby(columns)
            .size()
            .to_frame("Total Transfers")
    )
    
    transfer_outcomes=pd.pivot_table(
        transfers,
        index=columns,
        columns=["status"],
        aggfunc='size'
    )
    

    transfer_outcomes_pc = (
        transfer_outcomes
            .div(total_transfers["Total Transfers"],axis=0)
            .multiply(100)
            .round(2)
            .add_suffix(" %")
    )
    
    failed_transfers = (
        transfers
            .assign(failed_transfer=transfers["status"] != "INTEGRATED_ON_TIME")
            .groupby(columns)
            .agg({'failed_transfer': 'sum'})
            .rename(columns={'failed_transfer': 'ALL_FAILURE'})
    )
    
    failed_transfers_pc = (
        failed_transfers
            .div(total_transfers["Total Transfers"],axis=0)
            .multiply(100)
            .round(2)
            .add_suffix(" %")
    )
    
    

    return pd.concat([
        total_transfers,
        transfer_outcomes,
        failed_transfers,
        transfer_outcomes_pc,
        failed_transfers_pc,
    ],axis=1).fillna(0)
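
As a sanity check on what `generate_monthly_outcome_breakdown` computes, the sketch below reproduces its main steps on a tiny hand-made frame. The toy rows and every status name other than `INTEGRATED_ON_TIME` are illustrative assumptions, not real transfer data, and `pd.crosstab` is swapped in for the pivot table for brevity.

```python
import pandas as pd

# Hypothetical toy data: four transfers across two months.
toy = pd.DataFrame({
    "month": ["2021-05", "2021-05", "2021-05", "2021-06"],
    "status": [
        "INTEGRATED_ON_TIME",
        "TECHNICAL_FAILURE",   # illustrative status name
        "INTEGRATED_ON_TIME",
        "PROCESS_FAILURE",     # illustrative status name
    ],
})

# Total transfers per month.
totals = toy.groupby("month").size()

# Count of each status per month (zero where a status does not occur).
counts = pd.crosstab(toy["month"], toy["status"])

# Each status as a percentage of that month's total.
percentages = counts.div(totals, axis=0).multiply(100).round(2).add_suffix(" %")

# As in the function above, anything other than INTEGRATED_ON_TIME counts as a failure.
failures = (toy["status"] != "INTEGRATED_ON_TIME").groupby(toy["month"]).sum()
failure_pc = failures.div(totals).multiply(100).round(2)

print(pd.concat([totals.rename("Total Transfers"), counts, percentages], axis=1))
print(failure_pc)
```

With these rows, May has one failure out of three transfers and June has one out of one, so the percentages are easy to check by hand.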


## Generate national transfer outcomes

In [4]:
national_metrics_monthly=generate_monthly_outcome_breakdown(transfers, ["month"])

## Generate digitised CCG transfer outcomes

In [5]:
ccgs_to_investigate = [
    "NHS SUNDERLAND CCG",
    'NHS FYLDE AND WYRE CCG',
    'NHS CHORLEY AND SOUTH RIBBLE CCG',
    'NHS BLACKPOOL CCG',
    'NHS BIRMINGHAM AND SOLIHULL CCG'
]
is_requesting_ccg_of_interest = transfers.requesting_ccg_name.isin(ccgs_to_investigate)
is_sending_ccg_of_interest = transfers.sending_ccg_name.isin(ccgs_to_investigate)

requesting_transfers_of_interest = transfers[is_requesting_ccg_of_interest]
sending_transfers_of_interest = transfers[is_sending_ccg_of_interest]

### Requesting CCGs (Digitised)

In [6]:
requesting_ccgs_monthly=generate_monthly_outcome_breakdown(
    transfers=requesting_transfers_of_interest,
    columns=["requesting_ccg_name", "month"]
)

### Sending CCGs (Digitised)

In [7]:
sending_ccgs_monthly=generate_monthly_outcome_breakdown(
    transfers=sending_transfers_of_interest,
    columns=["sending_ccg_name", "month"]
)

### Requesting practices (digitised)

In [8]:
requesting_practices_monthly=generate_monthly_outcome_breakdown(
    transfers=requesting_transfers_of_interest,
    columns=["requesting_ccg_name", "requesting_practice_name", "requesting_practice_ods_code", "requesting_supplier", "month"]
)

### Sending practices (digitised)

In [9]:
sending_practices_monthly=generate_monthly_outcome_breakdown(
    transfers=sending_transfers_of_interest,
    columns=["sending_ccg_name", "sending_practice_name", "sending_practice_ods_code", "sending_supplier", "month"]
)

## Validation

### Checking that supplier names do not change between requesting and sending

In [10]:
sending_practices_monthly_flat = sending_practices_monthly.reset_index().set_index("sending_practice_ods_code")
requesting_practices_monthly_flat = requesting_practices_monthly.reset_index().set_index("requesting_practice_ods_code")
sending_practices_monthly_flat = sending_practices_monthly_flat[["sending_supplier"]]
requesting_practices_monthly_flat = requesting_practices_monthly_flat[["requesting_supplier"]]

In [11]:
all_practices_monthly_flat = sending_practices_monthly_flat.join(requesting_practices_monthly_flat, how="outer")
all_practices_monthly_flat["sending_supplier"].value_counts()

EMIS        1804
SystmOne     459
Vision        31
Name: sending_supplier, dtype: int64

In [12]:
all_practices_monthly_flat["requesting_supplier"].value_counts()

EMIS        1804
SystmOne     459
Vision        31
Name: requesting_supplier, dtype: int64

In [13]:
same_supplier_bool = all_practices_monthly_flat["sending_supplier"] == all_practices_monthly_flat["requesting_supplier"]
same_supplier_bool.value_counts()

True    2294
dtype: int64

All 2294 joined rows have matching sending and requesting supplier names.
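
Note that the flattened monthly frames hold one row per practice per month, so joining them on ODS code repeats each practice and the counts above are row counts rather than practice counts. A possible alternative, sketched below, is to reduce each side to one supplier per practice first and then compare. The toy frames and the helper `suppliers_per_practice` are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical stand-ins for the flattened monthly frames:
# one row per practice per month, so ODS codes repeat.
sending = pd.DataFrame({
    "sending_practice_ods_code": ["A1", "A1", "B2"],
    "sending_supplier": ["EMIS", "EMIS", "SystmOne"],
})
requesting = pd.DataFrame({
    "requesting_practice_ods_code": ["A1", "B2", "B2"],
    "requesting_supplier": ["EMIS", "SystmOne", "SystmOne"],
})


def suppliers_per_practice(frame, ods_column, supplier_column):
    """Return one supplier per ODS code; raise if a practice has several."""
    unique_pairs = frame[[ods_column, supplier_column]].drop_duplicates()
    repeated = unique_pairs[ods_column].duplicated()
    if repeated.any():
        codes = unique_pairs.loc[repeated, ods_column].tolist()
        raise ValueError(f"More than one supplier for practices: {codes}")
    return unique_pairs.set_index(ods_column)[supplier_column]


sending_suppliers = suppliers_per_practice(
    sending, "sending_practice_ods_code", "sending_supplier"
)
requesting_suppliers = suppliers_per_practice(
    requesting, "requesting_practice_ods_code", "requesting_supplier"
)

# Keep only practices present on both sides, one row each.
comparison = pd.concat([sending_suppliers, requesting_suppliers], axis=1, join="inner")
mismatches = comparison[
    comparison["sending_supplier"] != comparison["requesting_supplier"]
]
print(len(comparison), "practices compared,", len(mismatches), "mismatches")
```

Deduplicating first also surfaces a practice whose supplier changed between months, which the row-level comparison would not flag on its own.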

### Comparing supplier names against provided GP practice list

In [14]:
provided_gp_list = pd.read_csv("s3://prm-gp2gp-notebook-data-prod/PRMT-2332-Lloyd-George-digitalisation/GP-Practice-list.csv")
provided_gp_list = provided_gp_list.set_index("NACS")

In [15]:
provided_practices_monthly = provided_gp_list.join(all_practices_monthly_flat, how="left", rsuffix="_provided")

In [16]:
provided_practices_monthly['Clinical System'].value_counts()

EMIS        913
SystmOne    459
Vision       31
Name: Clinical System, dtype: int64

In [17]:
same_provided_supplier_bool = provided_practices_monthly['Clinical System'] == provided_practices_monthly["requesting_supplier"]
same_provided_supplier_bool.value_counts()

True     1402
False       1
dtype: int64

#### Looking at the one practice with a mismatched supplier name

In [19]:
missing = provided_practices_monthly[~same_provided_supplier_bool]
missing[['Practice', 'Clinical System', 'requesting_supplier', 'sending_supplier']]

| NACS | Practice | Clinical System | requesting_supplier | sending_supplier |
|---|---|---|---|---|
| Y01057 | The Health Xchange | EMIS | NaN | NaN |
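
A comparison against a missing value is always `False`, so the check above cannot tell "supplier differs" apart from "practice not in the transfer data". One way to separate the two cases, sketched here on hypothetical toy frames (the column names `ods_code` and `supplier` are assumptions), is a left merge with `indicator=True`.

```python
import pandas as pd

# Hypothetical provided list and observed suppliers, one row per practice.
provided = pd.DataFrame({
    "NACS": ["A1", "B2", "C3"],
    "Clinical System": ["EMIS", "SystmOne", "EMIS"],
})
observed = pd.DataFrame({
    "ods_code": ["A1", "B2"],
    "supplier": ["EMIS", "Vision"],
})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left merge.
merged = provided.merge(
    observed, left_on="NACS", right_on="ods_code", how="left", indicator=True
)

# Practices on the provided list with no transfer data at all.
not_in_transfers = merged[merged["_merge"] == "left_only"]

# Practices present in both sources whose supplier names disagree.
supplier_differs = merged[
    (merged["_merge"] == "both")
    & (merged["Clinical System"] != merged["supplier"])
]

print(not_in_transfers["NACS"].tolist())
print(supplier_differs["NACS"].tolist())
```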


## Conclusion
The revised PRMT-2332 output no longer appears to have the issue where supplier names differ between sending and requesting. Compared against the provided GP practice list, the supplier names also match for every practice except one: The Health Xchange (Y01057), for which we have no data.