# [HYPOTHESIS] Impact of changing the pipeline to re-allocate certain sender error codes as failures

**We believe that** changing the data pipeline to re-allocate Pending with Error transfers that carry specified sender error codes as failures

**Will result in** the removal of the vast majority of Pending with Error transfers

**We will know this to be true when** we identify the presence of these error codes in the vast majority of Pending with Error transfers generated over a 6 month period

In [1]:
import pandas as pd
import numpy as np

## Import 6 months of data

In [2]:
transfer_file_location = "s3://prm-gp2gp-data-sandbox-dev/transfers-duplicates-hypothesis/"
transfer_files = [
    "9-2020-transfers.parquet",
    "10-2020-transfers.parquet",
    "11-2020-transfers.parquet",
    "12-2020-transfers.parquet",
    "1-2021-transfers.parquet",
    "2-2021-transfers.parquet"
]
transfer_input_files = [transfer_file_location + f for f in transfer_files]
transfers_raw = pd.concat((
    pd.read_parquet(f)
    for f in transfer_input_files
))

## Change status of duplicates to match expected pipeline change

In [3]:
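transfers = transfers_raw.copy()
# Flag a transfer when any of its request-completed ack codes is missing (NaN) or equal to 15
successful_transfers_bool = transfers['request_completed_ack_codes'].apply(
    lambda codes: any(np.isnan(code) or code == 15 for code in codes)
)
# Flagged transfers are treated as integrated, matching the expected pipeline change
transfers.loc[successful_transfers_bool, 'status'] = 'INTEGRATED'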
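
To illustrate the rule applied in the cell above, here is a minimal sketch on a toy DataFrame. The rows and `conversation_id` values are made up for illustration; only the column names and the check itself come from the notebook.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): each row holds the list of ack codes for one transfer.
toy = pd.DataFrame({
    "conversation_id": ["a", "b", "c", "d"],
    "status": ["PENDING", "PENDING_WITH_ERROR", "FAILED", "FAILED"],
    "request_completed_ack_codes": [
        [np.nan],       # one ack with a missing code
        [12.0, 15.0],   # contains code 15
        [30.0],         # neither missing nor 15
        [],             # no acks at all
    ],
})

# Same check as the notebook: any ack code that is NaN or equal to 15.
is_integrated = toy["request_completed_ack_codes"].apply(
    lambda codes: any(np.isnan(code) or code == 15 for code in codes)
)
toy.loc[is_integrated, "status"] = "INTEGRATED"
print(toy[["conversation_id", "status"]])
```

In this toy case, rows `a` and `b` are re-labelled as `INTEGRATED`, while `c` and `d` keep their original status.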

## Add Proposed Status Change

In [4]:
pending_sender_error_codes=[6,7,10,24,30,23,14]
transfers['New status']=transfers['status'].copy()

transfers_with_pending_sender_codes_bool=transfers['sender_error_code'].isin(pending_sender_error_codes)
transfers.loc[transfers_with_pending_sender_codes_bool,'New status']='FAILED'

## Assess Impact

In [5]:
status_change_table=pd.pivot_table(transfers,index='status',columns='New status',values='conversation_id',aggfunc='count')
status_change_table=status_change_table.fillna(0).astype(int)
status_change_table

| status \ New status | FAILED | INTEGRATED | PENDING | PENDING_WITH_ERROR |
|---|---|---|---|---|
| FAILED | 22771 | 0 | 0 | 0 |
| INTEGRATED | 7 | 1254795 | 0 | 0 |
| PENDING | 0 | 0 | 39087 | 0 |
| PENDING_WITH_ERROR | 23285 | 0 | 0 | 3289 |


In [6]:
print('Total Number of Transfers: ' + str(status_change_table.sum().sum()))

print('\nInitial Number of Pending With Error: ' + str(status_change_table.loc['PENDING_WITH_ERROR'].sum()))
print('Final Number of Pending With Error: ' + str(status_change_table.loc[:, 'PENDING_WITH_ERROR'].sum()))

print('\nInitial Number of Failures: ' + str(status_change_table.loc['FAILED'].sum()))
print('Final Number of Failures: ' + str(status_change_table.loc[:, 'FAILED'].sum()))

Total Number of Transfers: 1343234

Initial Number of Pending With Error: 26574
Final Number of Pending With Error: 3289

Initial Number of Failures: 22771
Final Number of Failures: 46063
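
The hypothesis speaks of "the vast majority", so it may help to express the change as a proportion. The sketch below rebuilds the status change table from the counts shown in the output above (rather than reloading the parquet files) and computes the share of Pending With Error transfers that would be re-allocated. The variable names `initial_pwe`, `final_pwe` and `share_removed` are illustrative and do not appear in the notebook.

```python
import pandas as pd

statuses = ["FAILED", "INTEGRATED", "PENDING", "PENDING_WITH_ERROR"]

# Counts copied from the status change table output above.
status_change_table = pd.DataFrame(
    [
        [22771, 0, 0, 0],
        [7, 1254795, 0, 0],
        [0, 0, 39087, 0],
        [23285, 0, 0, 3289],
    ],
    index=pd.Index(statuses, name="status"),
    columns=pd.Index(statuses, name="New status"),
)

# Row sum = transfers that started as Pending With Error;
# column sum = transfers that remain Pending With Error after the change.
initial_pwe = status_change_table.loc["PENDING_WITH_ERROR"].sum()
final_pwe = status_change_table.loc[:, "PENDING_WITH_ERROR"].sum()
share_removed = (initial_pwe - final_pwe) / initial_pwe

print(f"Share of Pending With Error removed: {share_removed:.1%}")
```

With these counts, 23285 of 26574 Pending With Error transfers (roughly 87.6%) would move to Failed.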


Of the 7 integrated transfers that would be re-allocated as failures, 4 carry sender error code 23 and 3 carry sender error code 14.
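
The notebook does not show the cell that produced this breakdown. One way it could be obtained is to filter for integrated transfers that carry one of the listed sender error codes and count them per code. The sketch below does this on a toy DataFrame; the rows are hypothetical and the names `reallocated_integrations` and `code_counts` are illustrative.

```python
import pandas as pd

pending_sender_error_codes = [6, 7, 10, 24, 30, 23, 14]

# Toy data (hypothetical), standing in for the real transfers DataFrame.
toy = pd.DataFrame({
    "status": [
        "INTEGRATED", "INTEGRATED", "INTEGRATED",
        "PENDING_WITH_ERROR", "INTEGRATED",
    ],
    "sender_error_code": [23, 14, 23, 6, None],
})

# Integrated transfers whose sender error code is in the re-allocation list.
reallocated_integrations = toy[
    (toy["status"] == "INTEGRATED")
    & toy["sender_error_code"].isin(pending_sender_error_codes)
]

# Number of affected integrations per sender error code.
code_counts = reallocated_integrations["sender_error_code"].value_counts()
print(code_counts)
```

Applied to the real `transfers` DataFrame, the same filter would be expected to return the 7 affected integrations summarised above.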