# Impact of cut-off points on transfer outcomes

We need to decide on a consistent point at which to cut off the data so that it shows a full picture of a month. The purpose of this analysis is to assess the impact of different cut-off points. For example, if we cut off the data two weeks after the month end, 1% of transfers might be shown as pending but ultimately be successful, whereas cutting off three weeks after the month end might leave only 0.5% of transfers pending but ultimately successful.
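
The comparison described above can be sketched on made-up data. This is a minimal illustration, not the pipeline's own logic: the helper `pending_but_eventually_integrated`, the sample completion times, and the choice to treat the month end as midnight UTC on the first day of the following month are all assumptions introduced here.

```python
from datetime import datetime, timedelta, timezone


def pending_but_eventually_integrated(completion_times, month_end, days_after):
    """Share of transfers still pending at the cut-off that complete later.

    completion_times holds one entry per transfer: a datetime if the
    transfer was eventually integrated, or None if it never was.
    (Hypothetical helper, for illustration only.)
    """
    cutoff = month_end + timedelta(days=days_after)
    # A transfer counts as "pending but ultimately successful" when it
    # completes strictly after the cut-off.
    late = sum(1 for t in completion_times if t is not None and t > cutoff)
    return late / len(completion_times)


# Assumed month end for July 2020: the exclusive upper bound of the month.
month_end = datetime(2020, 8, 1, tzinfo=timezone.utc)

# Made-up completion times, not taken from the real data.
example = [
    datetime(2020, 7, 15, tzinfo=timezone.utc),  # integrated within the month
    datetime(2020, 8, 10, tzinfo=timezone.utc),  # integrated 9 days after month end
    datetime(2020, 8, 20, tzinfo=timezone.utc),  # integrated 19 days after month end
    None,                                        # never integrated
]

two_week_share = pending_but_eventually_integrated(example, month_end, 14)
three_week_share = pending_but_eventually_integrated(example, month_end, 21)
```

A later cut-off can only lower this share or leave it unchanged, so the question for the analysis is how quickly it falls as the cut-off moves out.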

In [1]:
import paths
from datetime import datetime
from dateutil.tz import tzutc

from gp2gp.date.range import DateTimeRange
from gp2gp.pipeline.dashboard.main import read_spine_csv_gz_files
from gp2gp.service.transformers import derive_transfers
from scripts.gp2gp_spine_outcomes import parse_conversations

In [2]:
july_data_file_name = "../data/months/July-2020.csv.gz"
august_data_file_name = "../data/months/Aug-2020.csv.gz"
september_data_file_name = "../data/months/Sept-2020.csv.gz"
october_data_file_name = "../data/months/Oct-2020.csv.gz"

In [3]:
july_time_range = DateTimeRange(
    datetime(year=2020, month=7, day=1, tzinfo=tzutc()),
    datetime(year=2020, month=8, day=1, tzinfo=tzutc()),
)

In [4]:
spine_messages = read_spine_csv_gz_files([
    july_data_file_name, august_data_file_name
])

conversations = parse_conversations(spine_messages, time_range=july_time_range)
transfers = derive_transfers(conversations)

In [5]:
import pandas as pd
from matplotlib import pyplot as plt

In [6]:
transfers_df = pd.DataFrame(transfers)[["status", "date_completed"]]

transfers_df

Unnamed: 0,status,date_completed
0,TransferStatus.FAILED,NaT
1,TransferStatus.FAILED,NaT
2,TransferStatus.FAILED,NaT
3,TransferStatus.FAILED,NaT
4,TransferStatus.FAILED,NaT
...,...,...
165064,TransferStatus.INTEGRATED,2020-07-01 06:39:08.615000+00:00
165065,TransferStatus.INTEGRATED,2020-07-01 06:38:55.183000+00:00
165066,TransferStatus.INTEGRATED,2020-07-01 06:38:33.343000+00:00
165067,TransferStatus.INTEGRATED,2020-07-01 05:49:42.736000+00:00
