## Hypothesis: TPP transfers failing with error code 30 have attachments over 100MB
We believe that transfers to and from TPP practices that failed with error code 30 after 11am on 27th May 2021 
all have at least one attachment over 100MB. 

In [2]:
import pandas as pd
from datetime import datetime


In [3]:
# Cut-off used to split transfers into before and after the fix: 11:00 on 27 May 2021
fix_time = datetime(2021, 5, 27, 11, 0, 0)

In [4]:
file_name="s3://prm-gp2gp-data-sandbox-dev/MI_athena_outputs/MI-data-A-PRMT-2128.csv"
raw_mi_rr_data=pd.read_csv(file_name,parse_dates=['RegistrationTime','RequestFailureTime'])

In [5]:
# Supplier name mapping
supplier_renaming = {
    "EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS)":"EMIS",
    "IN PRACTICE SYSTEMS LTD":"Vision",
    "MICROTEST LTD":"Microtest",
    "THE PHOENIX PARTNERSHIP":"TPP",
    None: "Unknown"
}

# Generate an ASID lookup that contains the most recent entry for every ASID encountered
asid_file_location = "s3://prm-gp2gp-data-sandbox-dev/asid-lookup/"
asid_files = [
    "asidLookup-Nov-2020.csv.gz",
    "asidLookup-Dec-2020.csv.gz",
    "asidLookup-Jan-2021.csv.gz",
    "asidLookup-Feb-2021.csv.gz",
    "asidLookup-Mar-2021.csv.gz",
    "asidLookup-Apr-2021.csv.gz"
]
asid_lookup_files = [asid_file_location + f for f in asid_files]
asid_lookup = pd.concat((
    pd.read_csv(f)
    for f in asid_lookup_files
))
asid_lookup = asid_lookup.drop_duplicates().groupby("ASID").last().reset_index()
lookup = asid_lookup[["ASID", "MName", "NACS","OrgName"]]
lookup['supplier']=lookup['MName'].replace(supplier_renaming)
lookup=lookup[['NACS','supplier']].drop_duplicates().groupby("NACS").last().reset_index()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
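
The message above is pandas' `SettingWithCopyWarning`: `lookup` is created by selecting columns from `asid_lookup`, and a new column is then assigned to that selection. A minimal sketch of one way to avoid it is to take an explicit copy before assigning. The ASID, NACS and organisation values below are made up for illustration and are not taken from the lookup files.

```python
import pandas as pd

supplier_renaming = {
    "EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS)": "EMIS",
    "THE PHOENIX PARTNERSHIP": "TPP",
}

# Illustrative rows only (hypothetical ASIDs, ODS codes and practice names)
asid_lookup = pd.DataFrame({
    "ASID": ["000000000001", "000000000002"],
    "MName": ["THE PHOENIX PARTNERSHIP", "EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS)"],
    "NACS": ["X00001", "X00002"],
    "OrgName": ["Example Practice A", "Example Practice B"],
})

# .copy() makes `lookup` an independent DataFrame, so adding a column
# to it is unambiguous and does not trigger the warning
lookup = asid_lookup[["ASID", "MName", "NACS", "OrgName"]].copy()
lookup["supplier"] = lookup["MName"].replace(supplier_renaming)
```

The original frame is left untouched; only the copy gains the `supplier` column.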


In [6]:
mi_rr_data_with_supplier=raw_mi_rr_data.merge(lookup,left_on='RequestorODS',right_on='NACS',how='left').rename({'supplier':'requesting supplier'},axis=1).drop('NACS',axis=1)
mi_rr_data_with_supplier=mi_rr_data_with_supplier.merge(lookup,left_on='SenderODS',right_on='NACS',how='left').rename({'supplier':'sending supplier'},axis=1).drop('NACS',axis=1)

In [7]:
mi_rr_data_with_supplier['RequestErrorCode'].value_counts()

20       4360
24        982
30        224
10        107
29         61
100        49
25         31
06         25
IU030      19
19         17
23          7
6           5
21          3
14          3
Name: RequestErrorCode, dtype: int64
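
The counts above list `06` and `6` as separate codes, which suggests the column holds strings rather than numbers. If the two are meant to be the same code (an assumption, not something the data confirms), a small normalisation step could merge them before counting. The helper name `normalise_error_code` and the sample values are hypothetical.

```python
import pandas as pd

def normalise_error_code(code):
    """Strip leading zeros from purely numeric codes; leave other codes unchanged."""
    if pd.isna(code):
        return code
    text = str(code).strip()
    # '06' -> '6', while non-numeric codes such as 'IU030' are kept as they are
    return str(int(text)) if text.isdigit() else text

# Small made-up sample mirroring the kinds of values seen in the output above
codes = pd.Series(["06", "6", "30", "IU030", None], dtype="object")
normalised = codes.map(normalise_error_code)
counts = normalised.value_counts()
```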

In [8]:
print('Test: After joining supplier on, the number of rows in the dataframe should not change.')
mi_rr_data_with_supplier.shape[0]==raw_mi_rr_data.shape[0]

Test: After joining supplier on, the number of rows in the dataframe should not change.


True
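
The row-count check above guards against the left join duplicating transfers when an ODS code appears more than once in the lookup. As a sketch of an alternative, `DataFrame.merge` accepts a `validate` argument that raises an error at join time if the right-hand key is not unique. The frames below are toy data, not the MI extract.

```python
import pandas as pd

# Toy data: four transfers, and a lookup with one row per ODS code
transfers = pd.DataFrame({"RequestorODS": ["A1", "A1", "B2", "C3"]})
lookup = pd.DataFrame({"NACS": ["A1", "B2"], "supplier": ["TPP", "EMIS"]})

# many_to_one: each ODS code may appear at most once in the lookup
merged = transfers.merge(
    lookup, left_on="RequestorODS", right_on="NACS",
    how="left", validate="many_to_one",
)

# A lookup with a repeated ODS code is rejected instead of silently adding rows
duplicated_lookup = pd.concat([lookup, lookup.iloc[[0]]], ignore_index=True)
try:
    transfers.merge(
        duplicated_lookup, left_on="RequestorODS", right_on="NACS",
        how="left", validate="many_to_one",
    )
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
```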

In [9]:
# Keep only the columns needed for the analysis
MI_data=mi_rr_data_with_supplier.copy().loc[:,['requesting supplier','sending supplier','RequestorODS','SenderODS','RegistrationTime','RequestFailureTime','RequestErrorCode','ExtractAckCode','RequestErrorDescription']]
# Flag transfers that were registered, or that failed, after the fix time
MI_data['Registered after Fix']=MI_data['RegistrationTime']>fix_time
MI_data['Failed after Fix']=MI_data['RequestFailureTime']>fix_time
# RequestErrorCode is compared as a string and ExtractAckCode as a number
MI_data['Contains Error Code 30']=((MI_data['RequestErrorCode']=='30') | (MI_data['ExtractAckCode']==30)).fillna(False)
MI_data['Contains TPP limit error']=MI_data['RequestErrorDescription'].str.contains('is larger than TPP limit').fillna(False)
# Only the last expression in a cell is displayed, so the output below is the RequestErrorCode count
MI_data['Contains Error Code 30'].sum()
(MI_data['RequestErrorCode']=='30').sum()

224
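
The flag above compares `RequestErrorCode` with the string `'30'` and `ExtractAckCode` with the number `30`, so it relies on each column keeping its current type. A sketch of a type-independent variant, using made-up rows, converts both columns with `pd.to_numeric` first. Note the assumption this introduces: a value such as `'030'` would also count as 30, which the original string comparison does not do.

```python
import pandas as pd

# Made-up rows covering the cases seen in the data: string codes,
# a non-numeric code, missing values and a float acknowledgement code
df = pd.DataFrame({
    "RequestErrorCode": ["30", "100", "IU030", None],
    "ExtractAckCode": [None, 30.0, None, None],
})

# Non-numeric or missing values become NaN, which never equals 30
request_code = pd.to_numeric(df["RequestErrorCode"], errors="coerce")
ack_code = pd.to_numeric(df["ExtractAckCode"], errors="coerce")

df["Contains Error Code 30"] = request_code.eq(30) | ack_code.eq(30)
```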

In [10]:
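# Counts of error code 30 transfers per supplier pathway (not displayed: it is not the last expression in the cell)
MI_data.loc[MI_data['Contains Error Code 30']].groupby(['sending supplier','requesting supplier']).agg('count')
print('% transfers with error code 30 by supplier pathway')
# The mean of a boolean flag is the share of transfers with that flag; multiplying by 100 gives a percentage
MI_data.pivot_table(index=['sending supplier','requesting supplier'],columns='Registered after Fix',values='Contains Error Code 30',aggfunc='mean').multiply(100)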

% transfers with error code 30 by supplier pathway


sending supplier,requesting supplier,Registered after Fix = False (%),Registered after Fix = True (%)
EMIS,EMIS,0.0,0.012132
EMIS,INHEALTHCARE LTD,0.0,0.0
EMIS,Microtest,0.0,0.0
EMIS,NATIONAL PROGRAMME FOR IT,0.0,0.0
EMIS,RX SYSTEMS,0.0,0.0
EMIS,STREETS HEAVER COMPUTER SYSTEMS LTD,,0.0
EMIS,TPP,0.94839,0.177883
EMIS,Unknown,0.0,0.0
EMIS,Vision,0.0,0.0
INHEALTHCARE LTD,EMIS,0.0,0.0
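
The percentages above come from averaging a boolean flag within each pathway and period. A toy sketch of that calculation, with four invented EMIS to TPP transfers, shows how the numbers arise. The `period` and `error 30` columns are hypothetical helpers added only to keep the example easy to read.

```python
import pandas as pd

# Made-up rows: two transfers registered before the fix, two after
toy = pd.DataFrame({
    "sending supplier": ["EMIS"] * 4,
    "requesting supplier": ["TPP"] * 4,
    "Registered after Fix": [False, False, True, True],
    "Contains Error Code 30": [True, False, False, False],
})
toy["period"] = toy["Registered after Fix"].map({False: "before fix", True: "after fix"})
toy["error 30"] = toy["Contains Error Code 30"].astype(float)

# Mean of a 0/1 flag = share of transfers with the flag; x100 gives a percentage
rates = toy.pivot_table(
    index=["sending supplier", "requesting supplier"],
    columns="period",
    values="error 30",
    aggfunc="mean",
).multiply(100)
```

Here one of the two pre-fix transfers carries the flag and neither post-fix transfer does, so the pathway rates are 50% and 0%.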


In [11]:
MI_data.groupby('Registered after Fix').agg({'RegistrationTime':['min','max'],'Contains Error Code 30':['count','mean','sum'],'Contains TPP limit error':['mean','sum']})

Registered after Fix,RegistrationTime min,RegistrationTime max,Contains Error Code 30 count,Contains Error Code 30 mean,Contains Error Code 30 sum,Contains TPP limit error mean,Contains TPP limit error sum
False,2021-05-25 11:00:01,2021-05-27 11:00:00,55199,0.003134,173,0.000779,43
True,2021-05-27 11:00:01,2021-05-29 21:14:00,39984,0.002501,100,0.00015,6


In [12]:
# Keep only the transfers whose error description mentions the TPP attachment limit
large_attach_data=MI_data.loc[MI_data['Contains TPP limit error']].sort_values(by='Registered after Fix')
# The fourth whitespace-separated token of the message is the attachment size; the last token is the TPP limit
large_attach_data['Attachment Size']=large_attach_data['RequestErrorDescription'].str.split().apply(lambda message_list: int(message_list[3]))
large_attach_data['Attachment Limit']=large_attach_data['RequestErrorDescription'].str.split().apply(lambda message_list: int(message_list[-1]))
large_attach_data.to_excel('PRMT-2128.xlsx')
large_attach_data.head()

index,requesting supplier,sending supplier,RequestorODS,SenderODS,RegistrationTime,RequestFailureTime,RequestErrorCode,ExtractAckCode,RequestErrorDescription,Registered after Fix,Failed after Fix,Contains Error Code 30,Contains TPP limit error,Attachment Size,Attachment Limit
2876,TPP,EMIS,B83055,B86003,2021-05-26 16:18:20,2021-05-26 16:29:46,100,30.0,Attachment size : 76967884 is larger than TPP ...,False,False,True,True,76967884,62914560
43658,TPP,EMIS,F81030,K82048,2021-05-26 11:33:46,2021-05-26 11:39:50,100,30.0,Attachment size : 64142048 is larger than TPP ...,False,False,True,True,64142048,62914560
50304,TPP,EMIS,C88015,C88019,2021-05-26 16:52:55,2021-05-26 17:01:59,100,30.0,Attachment size : 83395852 is larger than TPP ...,False,False,True,True,83395852,62914560
50306,TPP,EMIS,C88015,C88019,2021-05-26 17:06:04,2021-05-26 17:13:20,100,30.0,Attachment size : 83395852 is larger than TPP ...,False,False,True,True,83395852,62914560
50307,TPP,EMIS,C88015,C88019,2021-05-26 17:11:59,2021-05-26 17:14:50,100,30.0,Attachment size : 114669292 is larger than TPP...,False,False,True,True,114669292,62914560
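
To compare these sizes with the 100MB threshold in the hypothesis, the size and limit can be pulled out of the message and tested directly. The sketch below is hedged in two ways: the full message wording is an assumption (the descriptions above are truncated, so the sample strings are hypothetical reconstructions using sizes from the table), and 100MB is taken to mean 100,000,000 bytes.

```python
import pandas as pd

# Hypothetical full messages; only the start of each message is visible in the output above
descriptions = pd.Series([
    "Attachment size : 76967884 is larger than TPP limit 62914560",
    "Attachment size : 114669292 is larger than TPP limit 62914560",
    "Some unrelated error",
])

# Named groups become column names; rows that do not match yield NaN
pattern = r"Attachment size\s*:\s*(?P<size>\d+)\s+is larger than TPP limit\D*(?P<limit>\d+)\s*$"
parsed = descriptions.str.extract(pattern)
parsed["size"] = pd.to_numeric(parsed["size"], errors="coerce")
parsed["limit"] = pd.to_numeric(parsed["limit"], errors="coerce")

# Assumption: "100MB" means 100 * 1,000,000 bytes
HUNDRED_MB = 100 * 1_000_000
parsed["over_100mb"] = parsed["size"] > HUNDRED_MB

# The limit in the table, 62914560, equals 60 * 1024 * 1024 bytes
limit_is_60_mib = 60 * 1024 * 1024 == 62914560
```

Unlike splitting on whitespace and indexing by position, the pattern leaves non-matching rows as missing values instead of raising an error.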
