# PRMT-2477 Pre GP2GP failures from MI data

## Context

We have been researching, with practices, the issues that occur during registration and prevent GP2GP from being triggered. Some scenarios are already known, such as patients arriving from abroad, but we want to find out whether unknown scenarios also contribute, e.g. PDS or SDS issues.

Questions to be answered using MI data:

For a given month (October or November):

1. How many registrations failed with any of the following process failure points:

- 10 = PDS trace
- 20 = PDS update
- 30 = SDS lookup Practice (not used)
- 40 = SDS lookup ASID

2. Are there any registrations that have any of these failure points and eventually go to GP2GP (i.e. have a conversation ID)?

3. Do these process failure points correlate with any of the following failure types?

- 0 = Attempted
- 1 = Sent
- 2 = Not Sent - Patient at current practice
- 3 = Not Sent - Patient known at current practice transferring from non-GP2GP practice
- 4 = Not Sent - Patient not known at current practice transferring from a non-GP2GP practice
- 5 = Not Sent - Patient has no previous practice registered
- 6 = Negative acknowledgement received

4. Can we tell which registrations failed but could have gone via GP2GP, and which were not eligible for GP2GP (e.g. newborn, coming from Scotland or Wales, Army, prison, international)?

## Notes

Data downloaded from Splunk using the following query:
```
index="gp2gp_nms_prod" sourcetype="gp2gpmi-rr"
| table *
```

In [1]:
import pandas as pd
import numpy as np
import paths, data
from data.practice_metadata import read_asid_metadata

In [2]:
def convert_to_int(val):
    # Convert numeric codes to int; leave non-numeric values (e.g. "None", "IU030") unchanged
    try:
        return int(val)
    except (ValueError, TypeError):
        return val

mi_data_file_location = "s3://prm-gp2gp-notebook-data-prod/PRMT-2477-pre-gp2gp-failures/MI_RR-Nov_2021.csv"

dates_fields = ["RegistrationTime", "RequestFailureTime", "RequestTime", "ExtractTime", "ExtractAckTime", "ExtractAckFailureTime"]
practice_registrations = pd.read_csv(mi_data_file_location, parse_dates=dates_fields).fillna("None")

practice_registrations["RequestErrorCode"] = practice_registrations["RequestErrorCode"].apply(convert_to_int)
practice_registrations["RequestFailureType"] = practice_registrations["RequestFailureType"].apply(convert_to_int)
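
The selective conversion done by the helper above can be illustrated on a few made-up values (not taken from the MI data). This is only a sketch: `to_int_if_possible` is a hypothetical stand-in for that helper. Values that parse as integers are converted, while codes such as `IU030` and the `"None"` placeholder pass through unchanged.

```python
def to_int_if_possible(val):
    """Return val as an int when it parses as one, otherwise return it unchanged."""
    try:
        return int(val)
    except (ValueError, TypeError):
        return val

# Made-up sample values mixing numeric and non-numeric codes
sample_codes = ["20", 24.0, "IU030", "None", "-8"]
converted = [to_int_if_possible(code) for code in sample_codes]
print(converted)
```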

In [3]:
practice_registrations["UniqueKey"] = (practice_registrations["RegistrationSmartcardUID"]
                                       + "-"
                                       + practice_registrations["RegistrationTime"].astype(str))

practice_registrations = (
    practice_registrations
        .sort_values(by="_time", ascending=True)
        .drop_duplicates(subset=["UniqueKey"], keep="last")
    )
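
As a rough illustration of the de-duplication step above, the sketch below applies the same sort-then-keep-last logic to made-up rows. The values are invented; only the column names come from the notebook. Where the same registration appears more than once, the row with the latest `_time` is kept.

```python
import pandas as pd

# Made-up rows: the first two describe the same registration at different times
toy = pd.DataFrame({
    "_time": ["2021-11-01T09:00:00", "2021-11-01T09:05:00", "2021-11-01T10:00:00"],
    "RegistrationSmartcardUID": ["uid-1", "uid-1", "uid-2"],
    "RegistrationTime": pd.to_datetime(
        ["2021-11-01 08:59", "2021-11-01 08:59", "2021-11-01 09:58"]
    ),
    "RequestFailurePoint": [10, 0, 60],
})

toy["UniqueKey"] = toy["RegistrationSmartcardUID"] + "-" + toy["RegistrationTime"].astype(str)

# Sort oldest to newest, then keep only the last (most recent) row per key
latest = (
    toy
        .sort_values(by="_time", ascending=True)
        .drop_duplicates(subset=["UniqueKey"], keep="last")
)
print(latest[["UniqueKey", "RequestFailurePoint"]])
```

Here the `uid-1` registration survives with `RequestFailurePoint` 0, because that is the value on its most recent row.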

In [4]:
# Check how many registrations have no Smartcard ID
practice_registrations_without_smartcard_bool = practice_registrations["RegistrationSmartcardUID"]=="None"
practice_registrations_without_smartcard = practice_registrations[practice_registrations_without_smartcard_bool]
practice_registrations_without_smartcard.shape[0]

1

In [5]:
# Check total number of registrations
practice_registrations.shape[0]

553282

In [6]:
# Check how many registrations have no conversation ID
practice_registrations["ConversationID"].value_counts()

None                                    316708
96766a8f-bb0f-43f5-97e2-092a52f2b16c         2
ef7f29ee-95d4-4f75-8824-612fa218b238         2
9cd025c0-41cc-4e10-9414-38bf278078dd         2
7c97eb78-5031-4490-92e8-24b28546cd2e         2
                                         ...  
eb795c4d-14eb-4f6c-8169-17c007a33828         1
9b01e10f-d8af-42f8-ba65-4001579cd4bd         1
df7553d4-1b1d-497b-a94c-b641674f2804         1
bee5c158-103b-4f15-8fd2-b12b40b2676b         1
022D0860-42F1-11EC-A1C7-115A7D3E9379         1
Name: ConversationID, Length: 236558, dtype: int64

In [7]:
# Breakdown of all registrations that did not trigger GP2GP
def has_conversation_id(value):
    if value=="None":
        return 0
    else:
        return 1
    
# Combine the three failure fields into one key, e.g. "60_5_None"
practice_registrations["Error scenario"] = practice_registrations[["RequestFailurePoint", "RequestFailureType", "RequestErrorCode"]].apply(lambda x: '_'.join(x.astype(str)), axis=1)
# Flag registrations with a conversation ID (1) versus those without (0)
practice_registrations["Triggered GP2GP"] = practice_registrations["ConversationID"].apply(has_conversation_id)

did_not_trigger_gp2gp_bool = practice_registrations["Triggered GP2GP"] == 0
practice_registrations_no_gp2gp = practice_registrations[did_not_trigger_gp2gp_bool]

registrations_that_didnt_trigger_gp2gp_grouped_by_failures = (
    practice_registrations_no_gp2gp
        .groupby(by=["RequestFailurePoint", "RequestFailureType", "RequestErrorCode", "Triggered GP2GP"])
        .agg({"UniqueKey": "count"})
        .rename(columns={"UniqueKey": "count"})
        .sort_values(by="count", ascending=False)
    )
registrations_that_didnt_trigger_gp2gp_grouped_by_failures

| RequestFailurePoint | RequestFailureType | RequestErrorCode | Triggered GP2GP | count |
|---|---|---|---|---|
| 60 | 5.0 | | 0 | 149446 |
| 60 | 0.0 | | 0 | 79387 |
| 0 | 0.0 | | 0 | 49610 |
| 10 | | 20 | 0 | 22395 |
| 60 | 2.0 | | 0 | 7466 |
| 40 | 3.0 | 24 | 0 | 3142 |
| 40 | 4.0 | 24 | 0 | 2749 |
| 20 | 0.0 | 20 | 0 | 1121 |
| 60 | 0.0 | 20 | 0 | 793 |
| 40 | | 20 | 0 | 509 |
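
The two derived columns used in this breakdown can also be built without row-wise lambdas. The following is a sketch on made-up rows, not the notebook's actual method; it assumes missing values have already been replaced with the string `"None"`, as in the loading step.

```python
import pandas as pd

# Made-up rows using the notebook's column names
toy = pd.DataFrame({
    "RequestFailurePoint": [60, 10, 20],
    "RequestFailureType": [5, "None", 0],
    "RequestErrorCode": ["None", 20, -8],
    "ConversationID": ["None", "None", "made-up-conversation-id"],
})

code_columns = ["RequestFailurePoint", "RequestFailureType", "RequestErrorCode"]

# Join the three codes into a single key per row
toy["Error scenario"] = toy[code_columns].astype(str).apply("_".join, axis=1)

# 1 when a conversation ID is present, 0 when it is the "None" placeholder
toy["Triggered GP2GP"] = toy["ConversationID"].ne("None").astype(int)

print(toy[["Error scenario", "Triggered GP2GP"]])
```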


In [8]:
# Breakdown of all registrations with specific request failure points (in list below)

failure_points_of_interest = [10, 20, 30, 40]
is_failure_point_of_interest = practice_registrations["RequestFailurePoint"].isin(failure_points_of_interest)
registrations_with_failure_points_of_interest = practice_registrations[is_failure_point_of_interest]

all_registrations_grouped_by_failures = (
    registrations_with_failure_points_of_interest
        .groupby(by=["RequestFailurePoint", "RequestFailureType", "RequestErrorCode", "Triggered GP2GP"])
        .agg({"UniqueKey": "count"})
        .rename(columns={"UniqueKey": "count"})
        .sort_values(by="count", ascending=False)
    )
all_registrations_grouped_by_failures

| RequestFailurePoint | RequestFailureType | RequestErrorCode | Triggered GP2GP | count |
|---|---|---|---|---|
| 10 | | 20 | 0 | 22395 |
| 40 | 3.0 | 24 | 0 | 3142 |
| 40 | 4.0 | 24 | 0 | 2749 |
| 20 | 0.0 | 20 | 0 | 1121 |
| 40 | | 20 | 0 | 509 |
| 20 | 0.0 | IU030 | 0 | 70 |
| 20 | 0.0 | -8 | 1 | 38 |
| 20 | 0.0 | IU056 | 0 | 10 |
| 20 | 0.0 | | 0 | 5 |
| 20 | 0.0 | -3 | 0 | 2 |


In [9]:
# Breakdown of all registrations that did not trigger GP2GP by practice and error scenario
practice_registrations_with_pre_gp2gp_error_scenarios = practice_registrations_no_gp2gp.pivot_table(index=["RequestorODS"], 
        columns=["Error scenario"], 
        values="UniqueKey", 
        aggfunc="count").fillna(0).astype(int)

In [10]:
# Create table with total number of registrations, number of registrations that triggered GP2GP and pre-GP2GP error scenario counts by practice
practice_registrations_summary = practice_registrations.groupby("RequestorODS").agg({"UniqueKey":"count", "Triggered GP2GP": "sum"}).rename(columns={"UniqueKey": "Total registrations"})

all_practice_registrations_with_pre_gp2gp_breakdown = practice_registrations_summary.join(practice_registrations_with_pre_gp2gp_error_scenarios, how="left").fillna(0).astype(int)
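
To show how the pivot and the left join fit together, here is a small sketch on made-up registrations (the ODS codes and keys are invented). A practice whose registrations all triggered GP2GP has no row in the pivot, so the left join followed by `fillna(0)` gives it a zero count for every error scenario.

```python
import pandas as pd

# Made-up registrations: practice B00002 has no registrations without GP2GP
toy = pd.DataFrame({
    "RequestorODS": ["A00001", "A00001", "A00001", "B00002"],
    "UniqueKey": ["k1", "k2", "k3", "k4"],
    "Error scenario": ["60_5_None", "60_5_None", "0_0_None", "0_0_None"],
    "Triggered GP2GP": [0, 0, 1, 1],
})

no_gp2gp = toy[toy["Triggered GP2GP"] == 0]

# One row per practice, one column per error scenario
scenario_counts = no_gp2gp.pivot_table(
    index=["RequestorODS"],
    columns=["Error scenario"],
    values="UniqueKey",
    aggfunc="count",
).fillna(0).astype(int)

# Totals per practice across all registrations
summary = (
    toy.groupby("RequestorODS")
        .agg({"UniqueKey": "count", "Triggered GP2GP": "sum"})
        .rename(columns={"UniqueKey": "Total registrations"})
)

combined = summary.join(scenario_counts, how="left").fillna(0).astype(int)
print(combined)
```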

In [11]:
# Add practice names (via ASID lookup) to the table above
asid_lookup = read_asid_metadata("prm-gp2gp-ods-metadata-prod", "v2/2021/12/organisationMetadata.json")[["practice_ods_code", "practice_name"]]
asid_lookup = asid_lookup.set_index("practice_ods_code")

practice_registrations_with_pivot_with_practice_names = all_practice_registrations_with_pre_gp2gp_breakdown.join(asid_lookup, on="RequestorODS", how="left").fillna("None")
practice_registrations_with_pivot_with_practice_names = practice_registrations_with_pivot_with_practice_names.drop_duplicates()
practice_registrations_with_pivot_with_practice_names = practice_registrations_with_pivot_with_practice_names.reset_index().rename(columns={"practice_name": "Requesting practice name", "RequestorODS": "Requesting practice ODS"})

column_order = ["Requesting practice name", "Requesting practice ODS", "Total registrations", "Triggered GP2GP", "60_5_None", "60_0_None", "0_0_None", "10_None_20", "60_2_None", "40_3_24", "40_4_24", "20_0_20", "60_0_20", "40_None_20", "20_0_IU030", "20_0_IU056", "20_0_None", "20_0_-3", "20_0_IU052", "20_0_IU066"]
practice_registrations_with_pivot_with_practice_names = practice_registrations_with_pivot_with_practice_names[column_order].sort_values(by="60_5_None", ascending=False)

In [12]:
with pd.ExcelWriter("PRMT-2477-Practice-level-pre-GP2GP-error-scenarios-Nov-2021.xlsx") as writer:
    practice_registrations_with_pivot_with_practice_names.to_excel(writer, sheet_name="Pre GP2GP errors Nov 2021", index=False)

**RequestFailurePoint:**
- 0 = No failure
- 10 = PDS trace
- 20 = PDS update
- 30 = SDS lookup Practice (not used)
- 40 = SDS lookup ASID
- 50 = SDS lookup Contract Props
- 60 = Send Request
- 70 = Manual Request

**RequestFailureType:**
- 0 = Attempted
- 1 = Sent
- 2 = Not Sent - Patient at current practice
- 3 = Not Sent - Patient known at current practice transferring from non-GP2GP practice
- 4 = Not Sent - Patient not known at current practice transferring from a non-GP2GP practice
- 5 = Not Sent - Patient has no previous practice registered
- 6 = Negative acknowledgement received

**RequestErrorCode:**
- 3 = Record available but cannot be sent - DEPRECATED
- 8 = The system’s configuration prevents it from processing this message - DEPRECATED
- 20 = Spine system responded with an error
- 24 = SDS lookup provided zero or more than one result to the query for each interaction
- 25 = Large messages rejected due to timeout duration reached of overall transfer
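
The error scenario column names in the spreadsheet (e.g. `40_3_24`) can be decoded with the legend above. The sketch below is one possible way to do that; `describe_scenario` and `_lookup` are hypothetical helpers, not part of the notebook. Codes that do not appear in the legend, such as `IU030` or the negative values seen in the data, are returned as-is with a note instead of being guessed.

```python
# Legend copied from the lists above
FAILURE_POINTS = {
    0: "No failure",
    10: "PDS trace",
    20: "PDS update",
    30: "SDS lookup Practice (not used)",
    40: "SDS lookup ASID",
    50: "SDS lookup Contract Props",
    60: "Send Request",
    70: "Manual Request",
}
FAILURE_TYPES = {
    0: "Attempted",
    1: "Sent",
    2: "Not Sent - Patient at current practice",
    3: "Not Sent - Patient known at current practice transferring from non-GP2GP practice",
    4: "Not Sent - Patient not known at current practice transferring from a non-GP2GP practice",
    5: "Not Sent - Patient has no previous practice registered",
    6: "Negative acknowledgement received",
}
ERROR_CODES = {
    3: "Record available but cannot be sent - DEPRECATED",
    8: "The system's configuration prevents it from processing this message - DEPRECATED",
    20: "Spine system responded with an error",
    24: "SDS lookup provided zero or more than one result to the query for each interaction",
    25: "Large messages rejected due to timeout duration reached of overall transfer",
}

def _lookup(table, raw):
    """Translate one part of the key, falling back to the raw value."""
    if raw == "None":
        return "None"
    try:
        return table.get(int(raw), f"{raw} (not in legend)")
    except ValueError:
        return f"{raw} (not in legend)"

def describe_scenario(key):
    """Split a 'point_type_code' key and describe each part."""
    point, failure_type, error_code = key.split("_")
    return (
        _lookup(FAILURE_POINTS, point),
        _lookup(FAILURE_TYPES, failure_type),
        _lookup(ERROR_CODES, error_code),
    )

for key in ["60_5_None", "40_3_24", "20_0_IU030"]:
    print(key, "->", describe_scenario(key))
```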