
# GP2GP Technical Error Counts - Jan 2020 to Oct 2020

Context:

JGPIT Futures would like to know how much paper processing currently results from GP2GP failures.
Sizing the problem involves calculating, for each month:
- the total number of transfers
- the total number of successful transfers completed within the 8-day SLA
- the total number of successful transfers completed beyond the 8-day SLA
- the total number of technical errors
- the total number of other process/tech errors

Assumptions:

We defined technical errors as:
- conversations that have completed but carry an error code in the final Application Acknowledgement message (MCCI_IN010000UK13) for the RCMR_IN030000UK06 Request Started message.
- conversations that are still pending (i.e. no final Application Acknowledgement message MCCI_IN010000UK13 for the RCMR_IN030000UK06 Request Started message was received within the following month) and have error codes in any of the intermediate messages (for example, any other MCCI_IN010000UK13 Application Acknowledgement messages).

We defined other process/tech errors as:
- conversations that are still pending (i.e. no final Application Acknowledgement message MCCI_IN010000UK13 for the RCMR_IN030000UK06 Request Started message was received within the following month) and have no error codes in any message.
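
The two definitions above can be read as a small decision rule. The sketch below is only an illustration of that rule: the `Conversation` class, its field names and the error code values are hypothetical and are not taken from the `gp2gp` package used later in this notebook.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Conversation:
    # Hypothetical summary of one GP2GP conversation (not the notebook's real data model).
    has_final_ack: bool                         # final MCCI_IN010000UK13 seen by the end of the following month
    final_ack_error_code: Optional[int] = None  # None when the final ack carries no error code
    intermediate_error_codes: List[int] = field(default_factory=list)


def classify(conversation: Conversation) -> str:
    """Apply the definitions above to a single conversation."""
    if conversation.has_final_ack:
        # Completed: an error code in the final ack makes it a technical error.
        if conversation.final_ack_error_code is not None:
            return "TECHNICAL ERROR"
        return "COMPLETED"
    # Still pending: any intermediate error code makes it a technical error.
    if conversation.intermediate_error_codes:
        return "TECHNICAL ERROR"
    return "OTHER PROCESS/TECH ERROR"


# The error code values below are made up for illustration.
print(classify(Conversation(has_final_ack=True, final_ack_error_code=30)))
print(classify(Conversation(has_final_ack=False, intermediate_error_codes=[20])))
print(classify(Conversation(has_final_ack=False)))
```

Completed conversations without an error code are later split into within and beyond the 8-day SLA; that split is left out of this sketch.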

Requirements:

This notebook uses the following Splunk query, run over each full month from January 2020 to November 2020 (the November data is needed only to resolve conversations that started in October):
```
index="spine2vfmmonitor" service="gp2gp"
| search interactionID="urn:nhs:names:services:gp2gp/*"
| rex field=fromPartyID "(?<fromNACS>.+)(-\d*)"
| rex field=toPartyID "(?<toNACS>.+)(-\d*)"
| fields _time, conversationID, GUID, interactionID, fromNACS, toNACS, messageRef, jdiEvent
| fields - _raw
```
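
The two `rex` commands strip a trailing `-<digits>` suffix from the party IDs to leave the NACS code. As a rough illustration, the same pattern can be tried in Python, where named groups are written `(?P<name>...)` rather than `(?<name>...)`. The helper name and the sample party IDs below are made up for the example.

```python
import re
from typing import Optional

# Same pattern as the Splunk rex commands, using Python's named-group syntax.
PARTY_ID_PATTERN = re.compile(r"(?P<nacs>.+)(-\d*)")


def extract_nacs(party_id: str) -> Optional[str]:
    """Return the part of the party ID before the last '-<digits>' suffix, or None if there is no hyphen."""
    match = PARTY_ID_PATTERN.search(party_id)
    return match.group("nacs") if match else None


# Sample values are illustrative only.
print(extract_nacs("A12345-0012345"))  # A12345
print(extract_nacs("A12345"))          # None
```

Because `.+` is greedy, the split happens at the last hyphen in the value, and a party ID without any hyphen yields no match.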

Each month's export was compressed with `gzip filename` to create the `<month>.csv.gz` files.
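
If the `gzip` command line tool is not available, the same compression can be approximated from Python with the standard library. This is a minimal sketch with a hypothetical helper name; unlike `gzip filename`, it leaves the uncompressed file in place.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(source) -> Path:
    """Write a gzip-compressed copy of `source` next to it, with '.gz' appended to the name."""
    source = Path(source)
    target = source.with_name(source.name + ".gz")
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        # Stream the bytes across so large exports are not loaded into memory at once.
        shutil.copyfileobj(src, dst)
    return target
```

For example, `gzip_file("Jan-2020.csv")` would produce `Jan-2020.csv.gz` in the same directory.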

In [1]:
import pandas as pd
from matplotlib import pyplot as plt

In [2]:
import paths
from datetime import datetime
from dateutil.tz import tzutc

from gp2gp.date.range import DateTimeRange
from scripts.gp2gp_spine_outcomes import calculate_counts

In [3]:
january_data_file_name = "../data/months/Jan-2020.csv.gz"
february_data_file_name = "../data/months/Feb-2020.csv.gz"
march_data_file_name = "../data/months/Mar-2020.csv.gz"
april_data_file_name = "../data/months/Apr-2020.csv.gz"
may_data_file_name = "../data/months/May-2020.csv.gz"
june_data_file_name = "../data/months/Jun-2020.csv.gz"
july_data_file_name = "../data/months/July-2020.csv.gz"
august_data_file_name = "../data/months/Aug-2020.csv.gz"
september_data_file_name = "../data/months/Sept-2020.csv.gz"
october_data_file_name = "../data/months/Oct-2020.csv.gz"
november_data_file_name = "../data/months/Nov-2020.csv.gz"

In [4]:
january_time_range = DateTimeRange(
    datetime(year=2020, month=1, day=1, tzinfo=tzutc()),
    datetime(year=2020, month=2, day=1, tzinfo=tzutc()),
)

january_transfer_outcomes = calculate_counts(january_data_file_name, february_data_file_name, january_time_range)

In [5]:
february_time_range = DateTimeRange(
    datetime(year=2020, month=2, day=1, tzinfo=tzutc()),
    datetime(year=2020, month=3, day=1, tzinfo=tzutc()),
)

february_transfer_outcomes = calculate_counts(february_data_file_name, march_data_file_name, february_time_range)

In [6]:
march_time_range = DateTimeRange(
    datetime(year=2020, month=3, day=1, tzinfo=tzutc()),
    datetime(year=2020, month=4, day=1, tzinfo=tzutc()),
)

march_transfer_outcomes = calculate_counts(march_data_file_name, april_data_file_name, march_time_range)

In [7]:
april_time_range = DateTimeRange(
    datetime(year=2020, month=4, day=1, tzinfo=tzutc()),
    datetime(year=2020, month=5, day=1, tzinfo=tzutc()),
)

april_transfer_outcomes = calculate_counts(april_data_file_name, may_data_file_name, april_time_range)

In [8]:
may_time_range = DateTimeRange(
    datetime(year=2020, month=5, day=1, tzinfo=tzutc()),
    datetime(year=2020, month=6, day=1, tzinfo=tzutc()),
)

may_transfer_outcomes = calculate_counts(may_data_file_name, june_data_file_name, may_time_range)

In [9]:
june_time_range = DateTimeRange(
    datetime(year=2020, month=6, day=1, tzinfo=tzutc()),
    datetime(year=2020, month=7, day=1, tzinfo=tzutc()),
)

june_transfer_outcomes = calculate_counts(june_data_file_name, july_data_file_name, june_time_range)

In [10]:
july_time_range = DateTimeRange(
    datetime(year=2020, month=7, day=1, tzinfo=tzutc()),
    datetime(year=2020, month=8, day=1, tzinfo=tzutc()),
)

july_transfer_outcomes = calculate_counts(july_data_file_name, august_data_file_name, july_time_range)

In [11]:
august_time_range = DateTimeRange(
    datetime(year=2020, month=8, day=1, tzinfo=tzutc()),
    datetime(year=2020, month=9, day=1, tzinfo=tzutc()),
)

august_transfer_outcomes = calculate_counts(august_data_file_name, september_data_file_name, august_time_range)

In [13]:
september_time_range = DateTimeRange(
    datetime(year=2020, month=9, day=1, tzinfo=tzutc()),
    datetime(year=2020, month=10, day=1, tzinfo=tzutc()),
)

september_transfer_outcomes = calculate_counts(september_data_file_name, october_data_file_name, september_time_range)

In [14]:
october_time_range = DateTimeRange(
    datetime(year=2020, month=10, day=1, tzinfo=tzutc()),
    datetime(year=2020, month=11, day=1, tzinfo=tzutc()),
)

october_transfer_outcomes = calculate_counts(october_data_file_name, november_data_file_name, october_time_range)
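
The ten cells above differ only in their month boundaries and file names, so the boundaries could also be generated. The sketch below uses a hypothetical `month_bounds` helper and the standard library's `timezone.utc` in place of `dateutil`'s `tzutc()`; it only builds the start and end datetimes and does not call `DateTimeRange` or `calculate_counts`.

```python
from datetime import datetime, timezone


def month_bounds(year: int, month: int):
    """Return (start, end) for a calendar month in UTC, where end is the first instant of the next month."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    if month == 12:
        # December rolls over into January of the following year.
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    return start, end


# January to October 2020, matching the months analysed in this notebook.
bounds_2020 = {month: month_bounds(2020, month) for month in range(1, 11)}

for month, (start, end) in bounds_2020.items():
    print(month, start.isoformat(), end.isoformat())
```

Each `(start, end)` pair could then be passed to `DateTimeRange` in the same way as the hand-written cells.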

In [15]:
transfer_outcomes_df = pd.DataFrame.from_dict(
    [
        january_transfer_outcomes,
        february_transfer_outcomes,
        march_transfer_outcomes,
        april_transfer_outcomes,
        may_transfer_outcomes,
        june_transfer_outcomes,
        july_transfer_outcomes,
        august_transfer_outcomes,
        september_transfer_outcomes,
        october_transfer_outcomes,
    ]
)
transfer_outcomes_df["months"] = [
    "January", "February", "March", "April", "May",
    "June", "July", "August", "September", "October",
]
transfer_outcomes_df = transfer_outcomes_df.set_index("months")
transfer_outcomes_df

| months | DIDN'T COMPLETE - ERROR MID CONVERSATION | COMPLETED - WITHIN 8 DAYS | COMPLETED - ERROR IN FINAL ACK | DIDN'T COMPLETE - STUCK | COMPLETED - BEYOND 8 DAYS |
|---|---|---|---|---|---|
| January | 5234 | 205375 | 4133 | 7235 | 12232 |
| February | 4384 | 175517 | 3559 | 6329 | 10097 |
| March | 3617 | 153099 | 3495 | 5512 | 12261 |
| April | 2293 | 77236 | 2001 | 3549 | 4745 |
| May | 2330 | 75461 | 1910 | 2699 | 4014 |
| June | 3270 | 112464 | 2995 | 3687 | 5588 |
| July | 4012 | 144487 | 3570 | 4893 | 8107 |
| August | 3839 | 142943 | 3633 | 4835 | 10370 |
| September | 5488 | 236905 | 5737 | 8096 | 17439 |
| October | 4845 | 205238 | 4840 | 6759 | 12734 |


In [16]:
# Technical errors = errors mid-conversation + errors in the final acknowledgement.
transfer_outcomes_df["TECHNICAL ERRORS"] = (
    transfer_outcomes_df["DIDN'T COMPLETE - ERROR MID CONVERSATION"]
    + transfer_outcomes_df["COMPLETED - ERROR IN FINAL ACK"]
)
del transfer_outcomes_df["DIDN'T COMPLETE - ERROR MID CONVERSATION"]
del transfer_outcomes_df["COMPLETED - ERROR IN FINAL ACK"]

# Sum the four remaining, mutually exclusive outcome columns.
# This must run before any other derived column is added, or the total would double count.
transfer_outcomes_df["TOTAL TRANSFERS"] = transfer_outcomes_df.sum(axis=1)

transfer_outcomes_df["TOTAL INTEGRATED"] = (
    transfer_outcomes_df["COMPLETED - WITHIN 8 DAYS"]
    + transfer_outcomes_df["COMPLETED - BEYOND 8 DAYS"]
)

# Paper fallback = every transfer that was not integrated within the 8-day SLA.
transfer_outcomes_df["PAPER FALLBACK"] = (
    transfer_outcomes_df["COMPLETED - BEYOND 8 DAYS"]
    + transfer_outcomes_df["TECHNICAL ERRORS"]
    + transfer_outcomes_df["DIDN'T COMPLETE - STUCK"]
)

transfer_outcomes_df.rename(
    columns={
        "COMPLETED - WITHIN 8 DAYS": "INTEGRATED WITHIN 8 DAYS",
        "COMPLETED - BEYOND 8 DAYS": "INTEGRATED BEYOND 8 DAYS",
        "DIDN'T COMPLETE - STUCK": "OTHER PROCESS/TECH ERRORS",
    },
    inplace=True,
)

# Reorder the columns for presentation.
transfer_outcomes_df = transfer_outcomes_df[
    [
        "TOTAL TRANSFERS",
        "TOTAL INTEGRATED",
        "INTEGRATED WITHIN 8 DAYS",
        "PAPER FALLBACK",
        "INTEGRATED BEYOND 8 DAYS",
        "TECHNICAL ERRORS",
        "OTHER PROCESS/TECH ERRORS",
    ]
]

transfer_outcomes_df

| months | TOTAL TRANSFERS | TOTAL INTEGRATED | INTEGRATED WITHIN 8 DAYS | PAPER FALLBACK | INTEGRATED BEYOND 8 DAYS | TECHNICAL ERRORS | OTHER PROCESS/TECH ERRORS |
|---|---|---|---|---|---|---|---|
| January | 234209 | 217607 | 205375 | 28834 | 12232 | 9367 | 7235 |
| February | 199886 | 185614 | 175517 | 24369 | 10097 | 7943 | 6329 |
| March | 177984 | 165360 | 153099 | 24885 | 12261 | 7112 | 5512 |
| April | 89824 | 81981 | 77236 | 12588 | 4745 | 4294 | 3549 |
| May | 86414 | 79475 | 75461 | 10953 | 4014 | 4240 | 2699 |
| June | 128004 | 118052 | 112464 | 15540 | 5588 | 6265 | 3687 |
| July | 165069 | 152594 | 144487 | 20582 | 8107 | 7582 | 4893 |
| August | 165620 | 153313 | 142943 | 22677 | 10370 | 7472 | 4835 |
| September | 273665 | 254344 | 236905 | 36760 | 17439 | 11225 | 8096 |
| October | 234416 | 217972 | 205238 | 29178 | 12734 | 9685 | 6759 |
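
The derived columns are linked by a few simple identities, which gives a cheap sanity check on the table. The sketch below checks them for the January row using plain Python; the `check_row` helper is hypothetical and not part of the notebook.

```python
# January row copied from the table above.
january = {
    "TOTAL TRANSFERS": 234209,
    "TOTAL INTEGRATED": 217607,
    "INTEGRATED WITHIN 8 DAYS": 205375,
    "PAPER FALLBACK": 28834,
    "INTEGRATED BEYOND 8 DAYS": 12232,
    "TECHNICAL ERRORS": 9367,
    "OTHER PROCESS/TECH ERRORS": 7235,
}


def check_row(row: dict) -> bool:
    """Return True when the derived columns are consistent with their definitions."""
    integrated_ok = (
        row["TOTAL INTEGRATED"]
        == row["INTEGRATED WITHIN 8 DAYS"] + row["INTEGRATED BEYOND 8 DAYS"]
    )
    fallback_ok = (
        row["PAPER FALLBACK"]
        == row["INTEGRATED BEYOND 8 DAYS"]
        + row["TECHNICAL ERRORS"]
        + row["OTHER PROCESS/TECH ERRORS"]
    )
    # Every transfer is either integrated within the SLA or counted as paper fallback.
    total_ok = (
        row["TOTAL TRANSFERS"]
        == row["INTEGRATED WITHIN 8 DAYS"] + row["PAPER FALLBACK"]
    )
    return integrated_ok and fallback_ok and total_ok


print(check_row(january))  # True
```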


In [17]:
outcome_percentages_df = transfer_outcomes_df.iloc[:, 1:].apply(lambda x: x / transfer_outcomes_df.iloc[:, 0] * 100)

outcome_percentages_df = outcome_percentages_df.add_suffix(' (%)')

outcome_percentages_df.round(2)

| months | TOTAL INTEGRATED (%) | INTEGRATED WITHIN 8 DAYS (%) | PAPER FALLBACK (%) | INTEGRATED BEYOND 8 DAYS (%) | TECHNICAL ERRORS (%) | OTHER PROCESS/TECH ERRORS (%) |
|---|---|---|---|---|---|---|
| January | 92.91 | 87.69 | 12.31 | 5.22 | 4.0 | 3.09 |
| February | 92.86 | 87.81 | 12.19 | 5.05 | 3.97 | 3.17 |
| March | 92.91 | 86.02 | 13.98 | 6.89 | 4.0 | 3.1 |
| April | 91.27 | 85.99 | 14.01 | 5.28 | 4.78 | 3.95 |
| May | 91.97 | 87.32 | 12.68 | 4.65 | 4.91 | 3.12 |
| June | 92.23 | 87.86 | 12.14 | 4.37 | 4.89 | 2.88 |
| July | 92.44 | 87.53 | 12.47 | 4.91 | 4.59 | 2.96 |
| August | 92.57 | 86.31 | 13.69 | 6.26 | 4.51 | 2.92 |
| September | 92.94 | 86.57 | 13.43 | 6.37 | 4.1 | 2.96 |
| October | 92.99 | 87.55 | 12.45 | 5.43 | 4.13 | 2.88 |
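
Each percentage is the corresponding count divided by that month's total transfers. As a worked example, the January paper fallback figure can be reproduced from the counts in the earlier table; the `percentage` helper below is hypothetical and only mirrors the arithmetic.

```python
def percentage(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`, rounded to two decimal places."""
    return round(part / total * 100, 2)


january_total = 234209        # TOTAL TRANSFERS
january_paper = 28834         # PAPER FALLBACK
january_within_sla = 205375   # INTEGRATED WITHIN 8 DAYS

print(percentage(january_paper, january_total))       # 12.31
print(percentage(january_within_sla, january_total))  # 87.69
```

Because every transfer is either integrated within 8 days or counted as paper fallback, these two percentages add up to 100 for each month, subject to rounding.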
