# PRMT-2170 - TPP Attachment Limit validation

## Hypothesis

We believe that for EMIS to TPP transfers, there will be a reduction in large attachment failures after 11am on 27th May. We will know this to be true when we see fewer of these failures after 11am on 27th May than before it.

## Scope 

Look at MI data from 27th May up to the most recent data available, and at the same length of time before 27th May.

Show the proportion of large attachment failures out of the total number of EMIS to TPP transfers (conversations, not individual messages) for each time frame.
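Written as a formula, the metric compared between the two periods (before and after the fix) is the share of EMIS to TPP conversations that hit a large attachment failure. This is a restatement of the scope above, not an additional measure:

$$
p_{\text{period}} = \frac{N_{\text{EMIS} \to \text{TPP conversations with a large attachment failure}}}{N_{\text{EMIS} \to \text{TPP conversations}}} \times 100\%
$$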


In [1]:
import pandas as pd
from datetime import datetime, timedelta

In [2]:
fix_time = datetime(2021, 5, 27, 11, 0)  # 11:00 on 27th May 2021

### Athena Query
Athena query used to generate `MI-data-athena--PRMT-2170-notebook-44.csv`.

Uses the previously defined `mi_rr` view (see notebook 42).

```sql
SELECT *
FROM mi_rr
WHERE from_iso8601_timestamp(RegistrationTime)
    > from_iso8601_timestamp('2021-05-12T11:00:00')
```
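As a quick sanity check (a sketch, not part of the original analysis), the query's lower bound sits one day before the start of the 14-day pre-fix window applied in cell [6], so the extract should cover the whole comparison window. The query has no upper bound, so the post-fix side is limited only by how recent the MI data is. The variable `query_start` is introduced here for illustration.

```python
from datetime import datetime, timedelta

# Lower bound used in the Athena query (strictly greater than this time)
query_start = datetime(2021, 5, 12, 11, 0)

# Fix time and comparison window used later in this notebook
fix_time = datetime(2021, 5, 27, 11, 0)
time_window = timedelta(days=14)

window_start = fix_time - time_window
print(window_start)                # 2021-05-13 11:00:00
print(window_start > query_start)  # True: the extract starts a day earlier
```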

In [3]:
file_name="s3://prm-gp2gp-data-sandbox-dev/MI_athena_outputs/MI-data-athena--PRMT-2170-notebook-44.csv"
raw_mi_rr_data=pd.read_csv(file_name,parse_dates=['RegistrationTime','RequestFailureTime'])

In [4]:
# Supplier name mapping
supplier_renaming = {
    "EGTON MEDICAL INFORMATION SYSTEMS LTD (EMIS)":"EMIS",
    "IN PRACTICE SYSTEMS LTD":"Vision",
    "MICROTEST LTD":"Microtest",
    "THE PHOENIX PARTNERSHIP":"TPP",
    None: "Unknown"
}

# Build an ASID lookup containing the most recent entry for every ASID encountered
asid_file_location = "s3://prm-gp2gp-data-sandbox-dev/asid-lookup/"
asid_files = [
    "asidLookup-Nov-2020.csv.gz",
    "asidLookup-Dec-2020.csv.gz",
    "asidLookup-Jan-2021.csv.gz",
    "asidLookup-Feb-2021.csv.gz",
    "asidLookup-Mar-2021.csv.gz",
    "asidLookup-Apr-2021.csv.gz",
    "asidLookup-May-2021.csv.gz"
]
asid_lookup_files = [asid_file_location + f for f in asid_files]
asid_lookup = pd.concat((
    pd.read_csv(f)
    for f in asid_lookup_files
))
asid_lookup = asid_lookup.drop_duplicates().groupby("ASID").last().reset_index()
# .copy() so that adding the supplier column does not raise SettingWithCopyWarning
lookup = asid_lookup[["ASID", "MName", "NACS", "OrgName"]].copy()
lookup['supplier'] = lookup['MName'].replace(supplier_renaming)
lookup=lookup[['NACS','supplier']].drop_duplicates().groupby("NACS").last().reset_index()

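The "most recent entry" logic relies on the monthly files being concatenated oldest first, so that `groupby("ASID").last()` picks values from the latest file. A toy illustration with hypothetical values follows; note that `last()` takes the last *non-null* value per column, so a null in a newer file does not overwrite an older value.

```python
import pandas as pd

# Hypothetical toy extracts, concatenated oldest first as in the cell above
older = pd.DataFrame({"ASID": [1, 2], "MName": ["EMIS", "TPP"], "NACS": ["A1", "B1"]})
newer = pd.DataFrame({"ASID": [1], "MName": ["TPP"], "NACS": [None]})
combined = pd.concat((older, newer))

# last() returns the last non-null value of each column within each ASID group
latest = combined.drop_duplicates().groupby("ASID").last().reset_index()
print(latest)
```

For ASID 1 the result takes `MName` from the newer file but keeps `NACS` from the older one, so a single output row can mix values from different months.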


In [5]:
mi_rr_data_with_supplier=raw_mi_rr_data.merge(lookup,left_on='RequestorODS',right_on='NACS',how='left').rename({'supplier':'requesting supplier'},axis=1).drop('NACS',axis=1)
mi_rr_data_with_supplier=mi_rr_data_with_supplier.merge(lookup,left_on='SenderODS',right_on='NACS',how='left').rename({'supplier':'sending supplier'},axis=1).drop('NACS',axis=1)
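The two merges above attach a supplier to each side of the conversation by left-joining the NACS lookup on the requestor and sender ODS codes. Below is a minimal sketch on toy data; `add_supplier` is a hypothetical helper that wraps the same merge, rename and drop chain. ODS codes missing from the lookup get a null supplier.

```python
import pandas as pd

# Hypothetical NACS -> supplier lookup and MI rows (toy values)
lookup = pd.DataFrame({"NACS": ["A1", "B1"], "supplier": ["EMIS", "TPP"]})
mi = pd.DataFrame({
    "ConversationID": ["c1", "c2"],
    "SenderODS": ["A1", "A1"],
    "RequestorODS": ["B1", "Z9"],  # Z9 is not in the lookup
})

def add_supplier(df, ods_column, new_name):
    """Left-join the lookup on one ODS column and rename its supplier column."""
    return (
        df.merge(lookup, left_on=ods_column, right_on="NACS", how="left")
        .rename({"supplier": new_name}, axis=1)
        .drop("NACS", axis=1)
    )

enriched = add_supplier(mi, "RequestorODS", "requesting supplier")
enriched = add_supplier(enriched, "SenderODS", "sending supplier")
print(enriched)
```

A null supplier compares unequal to both `"EMIS"` and `"TPP"`, so conversations with an unknown practice drop out of the EMIS to TPP selection in the next cell.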

In [6]:
# Select EMIS to TPP conversations
emis_to_tpp_conversations_bool = (mi_rr_data_with_supplier["sending supplier"] == "EMIS") & (mi_rr_data_with_supplier["requesting supplier"] == "TPP")
emis_to_tpp_conversations = mi_rr_data_with_supplier[emis_to_tpp_conversations_bool]

# Keep conversations registered within 2 weeks either side of the fix
time_window = timedelta(days=14)
date_bool = (emis_to_tpp_conversations["RegistrationTime"] >= fix_time - time_window) & (emis_to_tpp_conversations["RegistrationTime"] <= fix_time + time_window)
# .copy() so that adding columns below does not raise SettingWithCopyWarning
emis_to_tpp_conversations = emis_to_tpp_conversations[date_bool].copy()

In [7]:
# Flag whether each conversation was registered after the fix
emis_to_tpp_conversations["Fix implemented"] = emis_to_tpp_conversations["RegistrationTime"] > fix_time
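The window filter is inclusive at both ends, while the before/after split uses a strict `>`, so a conversation registered exactly at 11:00 on 27th May counts as *before* the fix. A small sketch with hypothetical registration times:

```python
from datetime import datetime, timedelta

import pandas as pd

fix_time = datetime(2021, 5, 27, 11, 0)
time_window = timedelta(days=14)

# Hypothetical registration times around the fix
times = pd.Series(pd.to_datetime([
    "2021-05-13 10:59",  # just before the window: excluded
    "2021-05-13 11:00",  # window start: included (>=)
    "2021-05-27 11:00",  # exactly at the fix: counted as before (not >)
    "2021-06-10 11:00",  # window end: included (<=)
]))

in_window = (times >= fix_time - time_window) & (times <= fix_time + time_window)
fix_implemented = times[in_window] > fix_time
print(fix_implemented.tolist())  # [False, False, True]
```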

In [8]:
# Add contains error code 30 column
emis_to_tpp_conversations['Contains Error Code 30']=((emis_to_tpp_conversations['RequestErrorCode']=='30') | (emis_to_tpp_conversations['ExtractAckCode']==30)).fillna(False)
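A conversation is flagged if either the request failure or the extract acknowledgement carries code 30. The cell compares `RequestErrorCode` with the string `'30'` and `ExtractAckCode` with the number `30`, which presumably reflects how each column is parsed from the CSV (an assumption). A sketch on toy rows, alongside a hypothetical type-tolerant variant using `pd.to_numeric`:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: RequestErrorCode held as strings, ExtractAckCode as numbers
df = pd.DataFrame({
    "RequestErrorCode": ["30", None, "31", None],
    "ExtractAckCode": [np.nan, 30, np.nan, np.nan],
})

# Same logic as the cell above; comparisons with missing values give False
flag = (df["RequestErrorCode"] == "30") | (df["ExtractAckCode"] == 30)

# Variant that also matches if RequestErrorCode is ever parsed as a number
robust = (pd.to_numeric(df["RequestErrorCode"], errors="coerce") == 30) | (df["ExtractAckCode"] == 30)

print(flag.tolist())  # [True, True, False, False]
```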

In [9]:
# N.B. Conversations without a conversation ID are not counted
error_code_30_prevalence_change = emis_to_tpp_conversations.pivot_table(index="Fix implemented", columns="Contains Error Code 30", aggfunc="count", values="ConversationID")
error_code_30_prevalence_change["Total"] = error_code_30_prevalence_change.sum(axis=1)
error_code_30_prevalence_change["%"] = (error_code_30_prevalence_change[True] / error_code_30_prevalence_change["Total"]).multiply(100).round(2)
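The N.B. above follows from `aggfunc="count"`, which counts non-null `ConversationID` values, so a row without an ID contributes to no cell and no total. A toy check with hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical conversations; the last one has no ConversationID
df = pd.DataFrame({
    "ConversationID": ["c1", "c2", "c3", np.nan],
    "Fix implemented": [False, False, True, True],
    "Contains Error Code 30": [True, False, False, True],
})

table = df.pivot_table(index="Fix implemented", columns="Contains Error Code 30",
                       aggfunc="count", values="ConversationID")
table["Total"] = table.sum(axis=1)
table["%"] = (table[True] / table["Total"]).multiply(100).round(2)
print(table)
```

Four rows go in, but the totals add up to three, because the row with a missing ID is silently excluded.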

In [10]:
error_code_30_prevalence_change

| Fix implemented | Contains Error Code 30: False | Contains Error Code 30: True | Total | % with Error Code 30 |
|---|---|---|---|---|
| False | 20555 | 176 | 20731 | 0.85 |
| True | 18383 | 43 | 18426 | 0.23 |


The proportion of conversations containing error code 30 fell from 0.85% before the fix to 0.23% after it.
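The percentages can be recomputed from the counts in the table. As an extra check that was not part of the original analysis, a two-proportion z-test with a pooled proportion indicates whether a drop of this size is plausibly just noise. It assumes conversations are independent and that the two periods are otherwise comparable, which this notebook does not establish.

```python
from math import erfc, sqrt

# Counts taken from the table above
before_failures, before_total = 176, 20731
after_failures, after_total = 43, 18426

p_before = before_failures / before_total
p_after = after_failures / after_total
relative_reduction = 1 - p_after / p_before

# Two-proportion z-test using the pooled proportion under "no change"
p_pooled = (before_failures + after_failures) / (before_total + after_total)
se = sqrt(p_pooled * (1 - p_pooled) * (1 / before_total + 1 / after_total))
z = (p_before - p_after) / se
p_value = erfc(z / sqrt(2))  # two-sided p-value

print(f"{p_before:.2%} -> {p_after:.2%}, relative reduction {relative_reduction:.0%}")
print(f"z = {z:.1f}, p = {p_value:.1e}")
```

In relative terms, the rate of error code 30 fell by roughly 72 to 73%.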

Now we'll look at how many of the error code 30s were due to the TPP attachment limit.

In [11]:
# Only looking at conversations with Error Code 30
conversations_with_error_code_30 = emis_to_tpp_conversations[emis_to_tpp_conversations['Contains Error Code 30']].copy()
conversations_with_error_code_30['Contains TPP limit error'] = conversations_with_error_code_30['RequestErrorDescription'].str.contains('is larger than TPP limit').fillna(False)
conversations_with_error_code_30['Contains TPP limit error'].value_counts(dropna=False)

True    219
Name: Contains TPP limit error, dtype: int64

Therefore, all 219 conversations with error code 30 have an error description containing "is larger than TPP limit" (i.e. a TPP attachment limit error).

N.B. This only covers EMIS to TPP conversations within this time frame.
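In cell [11], `.str.contains(...)` returns a null for conversations with no error description, and `.fillna(False)` turns those into `False`. The `na=` argument of `Series.str.contains` does the same in one step. Because the phrase contains no regex metacharacters, `regex=False` gives the same matches while treating it as a literal. A toy sketch; the description strings are hypothetical:

```python
import numpy as np
import pandas as pd

descriptions = pd.Series([
    "hypothetical attachment that is larger than TPP limit",
    "some other failure",
    np.nan,  # conversation with no error description
])
pattern = "is larger than TPP limit"

via_fillna = descriptions.str.contains(pattern).fillna(False)
via_na_arg = descriptions.str.contains(pattern, na=False, regex=False)
print(via_na_arg.tolist())  # [True, False, False]
```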

Now we will look at what the attachment limit (given in the error description) is before and after the fix was implemented.

In [12]:
# Attachment limit: parsed in bytes from the last token of the error description, shown in MB (1024**2 bytes)
conversations_with_error_code_30['Attachment Limit']=conversations_with_error_code_30['RequestErrorDescription'].str.split().apply(lambda message_list: int(message_list[-1]))
attachment_limit_table = conversations_with_error_code_30.pivot_table(index="Fix implemented", columns="Attachment Limit", aggfunc="count", values="ConversationID").fillna(0).astype(int)
attachment_limit_table.columns = attachment_limit_table.columns / (1024**2)
attachment_limit_table

| Fix implemented | Attachment limit 60 MB | Attachment limit 100 MB |
|---|---|---|
| False | 176 | 0 |
| True | 0 | 43 |


For every conversation that reached the attachment limit, the limit given in the error description was 60MB before the fix and 100MB after it.
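Cell [12] treats the last whitespace-separated token of the description as an integer byte count and divides the column labels by 1024², so "MB" here means 1024 × 1024 bytes: 62,914,560 bytes gives 60 and 104,857,600 bytes gives 100. A sketch using hypothetical description strings (the real wording is not shown in this notebook), with a slightly stricter regex alternative that only accepts trailing digits:

```python
import pandas as pd

# Hypothetical descriptions; assumes the limit in bytes is the final token
descriptions = pd.Series([
    "Attachment example.pdf is larger than TPP limit 62914560",
    "Attachment example.pdf is larger than TPP limit 104857600",
])

# Same approach as the cell above: last token parsed as an int
limit_bytes = descriptions.str.split().apply(lambda tokens: int(tokens[-1]))

# Stricter alternative: extract trailing digits only
limit_bytes_re = descriptions.str.extract(r"(\d+)\s*$", expand=False).astype(int)

limit_mb = limit_bytes / (1024 ** 2)
print(limit_mb.tolist())  # [60.0, 100.0]
```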