# [HYPOTHESIS] All error codes can be classified as completely fatal or completely non-fatal

## Hypothesis

**We believe** all error codes can be classified as completely fatal or completely non-fatal.

**We will know this to be true** when we can attribute every failed transfer to an error code that is not present in any successfully integrated transfer.

 

## Approach/Scope

- Take 6 months of data - Sept 2020 to Feb 2021

- Clearly label each transfer as integrated or failed 

  - Correct duplicate transfers 

  - Remove pending transfers

- Merge intermediate and final error codes

- Identify the failure rate (%) for each error code and designate any code with a 100% failure rate as fatal

- Ensure that all failures contain one of these fatal error codes
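
The last two steps can be sketched on toy data as follows. This is a minimal illustration, not the analysis itself: the `toy_transfers` frame, its values, and the names `failure_rate`, `fatal_codes` and `unexplained_failures` are hypothetical, and only the `status` and `all_error_codes` column names match the cells further down.

```python
import pandas as pd

# Hypothetical toy data: one row per transfer, with its final status and
# the merged list of error codes seen during that transfer.
toy_transfers = pd.DataFrame({
    "status": ["INTEGRATED", "FAILED", "FAILED", "INTEGRATED", "FAILED"],
    "all_error_codes": [[], [30], [30, 12], [12], []],
})

# One row per (transfer, error code); transfers without any code are dropped.
exploded = toy_transfers.explode("all_error_codes").dropna(subset=["all_error_codes"])
# Keep the transfer index as a column so each code counts once per transfer.
exploded = exploded.reset_index().drop_duplicates()

# Share of transfers that failed, for each error code.
failure_rate = (
    exploded.assign(failed=exploded["status"] == "FAILED")
    .groupby("all_error_codes")["failed"]
    .mean()
)

# A code is treated as fatal only if every transfer containing it failed.
fatal_codes = set(failure_rate[failure_rate == 1.0].index)

# The hypothesis holds only if every failed transfer has at least one fatal code.
failed = toy_transfers[toy_transfers["status"] == "FAILED"]
has_fatal_code = failed["all_error_codes"].apply(
    lambda codes: bool(fatal_codes & set(codes))
)
unexplained_failures = failed[~has_fatal_code]
```

In this toy case a failed transfer with no error codes at all ends up in `unexplained_failures`, which is the kind of row that would count against the hypothesis.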

In [None]:
import pandas as pd
import numpy as np

In [None]:
def Series_of_lists_value_counts(Series):
    """Count how often each distinct sequence of values occurs in a Series of lists (or sets)."""
    # Replace any NaN values inside each list with the string 'None'
    Series = Series.apply(lambda row: ['None' if np.isnan(x) else x for x in row])
    # Convert this into a dataframe with one column per list position
    journey_frame = pd.DataFrame.from_records(Series.tolist())
    # Lists differ in length, so fill the gaps to keep short lists in the grouping
    journey_frame = journey_frame.fillna('n/a')
    # Store the positional columns to group on
    grouping_index = list(journey_frame.columns)
    # Add a column to aggregate on for each group
    journey_frame['Total Occurences'] = 1

    # Count the rows in each group, most frequent first
    journey_frame = (
        journey_frame.groupby(grouping_index)
        .agg('count')
        .sort_values(by='Total Occurences', ascending=False)
    )

    # Restore the gaps that were filled for grouping
    return journey_frame.reset_index().replace({'n/a': np.nan})

In [None]:
transfer_file_location = "s3://<bucket-name>"
transfer_files = [
    "9-2020-transfers.parquet",
    "10-2020-transfers.parquet",
    "11-2020-transfers.parquet",
    "12-2020-transfers.parquet",
    "1-2021-transfers.parquet",
    "2-2021-transfers.parquet"
]
# Join with an explicit "/" so the file name is not glued onto the bucket name
transfer_input_files = [transfer_file_location.rstrip("/") + "/" + f for f in transfer_files]
transfers = pd.concat((
    pd.read_parquet(f)
    for f in transfer_input_files
))
transfers

In [None]:
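# A transfer counts as successful if any final ack code is missing (NaN) or equal to 15
successful_transfers_bool = transfers['request_completed_ack_codes'].apply(
    lambda codes: any(np.isnan(code) or code == 15 for code in codes)
)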
transfers_without_integrated_status_bool = transfers['status'] != 'INTEGRATED'
successful_transfers_without_integrated_status = transfers[(successful_transfers_bool & transfers_without_integrated_status_bool)]
successful_transfers_without_integrated_status['status'].value_counts()

In [None]:
Series_of_lists_value_counts(successful_transfers_without_integrated_status['request_completed_ack_codes'].apply(set))

In [None]:
transfers_with_final_outcome = transfers.copy()
# Relabel transfers that the ack codes show as successful
transfers_with_final_outcome.loc[successful_transfers_bool, 'status'] = 'INTEGRATED'
# Keep only transfers with a final outcome; this drops pending and any other non-final statuses
has_final_outcome = transfers_with_final_outcome['status'].isin(['INTEGRATED', 'FAILED'])
transfers_with_final_outcome = transfers_with_final_outcome.loc[has_final_outcome]
transfers_with_final_outcome

In [None]:
transfers_with_final_outcome['all_error_codes'] = transfers_with_final_outcome.apply(lambda x: [*x['intermediate_error_codes'], *x['request_completed_ack_codes']],axis=1)
transfers_with_final_outcome

In [None]:
transfers_with_final_outcome['all_error_codes'] = transfers_with_final_outcome['all_error_codes'].apply(lambda x: [i for i in x if np.isfinite(i)])
transfers_with_final_outcome

In [None]:
reduced_transfers_with_final_outcome = transfers_with_final_outcome[['status', 'all_error_codes']]
has_errors = reduced_transfers_with_final_outcome["all_error_codes"].apply(len) > 0
transfers_with_final_outcome_exploded = reduced_transfers_with_final_outcome[has_errors].explode("all_error_codes")
transfers_with_final_outcome_exploded
# transfers_with_errors = group_by_errors("intermediate_error_codes", transfers_with_unique_intermediate_errors)
# transfers_with_errors