### Encoding Cyclical Datetime Features
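Datetime features such as hour of day, day of month, day of week, and month of year are cyclical: the last value of a cycle sits right next to the first. In case there are patterns in these features, we will use the common trick of transforming them with sin/cos, which places the values on a circle and makes such patterns easier to detect.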
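
To make the motivation concrete, here is a minimal sketch using hour of day with a period of 24. The helper name `encode_cyclical` and the sample hours are illustrative assumptions, not part of this notebook's pipeline.

```python
import numpy as np

def encode_cyclical(values, period):
    """Map values in [0, period) onto the unit circle as (sin, cos)."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)

# Illustrative hours only: midnight, noon, and 23:00.
hours = np.array([0, 12, 23])
sin_h, cos_h = encode_cyclical(hours, period=24)
points = np.column_stack([sin_h, cos_h])

# As raw integers, 23:00 and 00:00 look 23 units apart,
# although they are only one hour apart on the clock.
raw_gap = abs(hours[2] - hours[0])

# On the circle, 23:00 is close to 00:00, while 12:00 is diametrically opposite.
dist_23_0 = np.linalg.norm(points[2] - points[0])
dist_12_0 = np.linalg.norm(points[1] - points[0])
print(raw_gap, dist_23_0, dist_12_0)
```

The raw encoding puts the end of the cycle far from its start, whereas the sin/cos pair keeps neighbouring times close together.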

In [23]:
import math
from metaflow import Flow
import numpy as np
import pandas as pd

run = Flow('TransactionDeduplicationFlow').latest_successful_run
cleaned_df = run.data.df

In [3]:
cleaned_df.head()

Unnamed: 0,accountNumber,customerId,creditLimit,availableMoney,transactionDateTime,transactionAmount,merchantName,acqCountry,merchantCountryCode,posEntryMode,...,echoBuffer,currentBalance,merchantCity,merchantState,merchantZip,cardPresent,posOnPremises,recurringAuthInd,expirationDateKeyInMatch,isFraud
0,737265056,737265056,5000,5000.0,2016-08-13 14:27:32,98.55,Uber,US,US,2.0,...,,0.0,,,,False,,,False,False
1,737265056,737265056,5000,5000.0,2016-10-11 05:05:54,74.51,AMC #191138,US,US,9.0,...,,0.0,,,,True,,,False,False
2,737265056,737265056,5000,5000.0,2016-11-08 09:18:39,7.47,Play Store,US,US,9.0,...,,0.0,,,,False,,,False,False
3,737265056,737265056,5000,5000.0,2016-12-10 02:14:50,7.47,Play Store,US,US,9.0,...,,0.0,,,,False,,,False,False
4,830329091,830329091,5000,5000.0,2016-03-24 21:04:46,71.18,Tim Hortons #947751,US,US,2.0,...,,0.0,,,,True,,,False,False


In [27]:
# Extract calendar components with the vectorized .dt accessor
# (assumes transactionDateTime is already a datetime64 column).
cleaned_df['txn_day_of_month'] = cleaned_df['transactionDateTime'].dt.day
cleaned_df['txn_day_of_week'] = cleaned_df['transactionDateTime'].dt.dayofweek
cleaned_df['txn_month_of_year'] = cleaned_df['transactionDateTime'].dt.month

In [28]:
# Normalize each value to an angle on the 0-2π cycle.
# Divide by the cycle length, not the observed max: dayofweek runs 0-6, so
# dividing by 6 would send Sunday (6) to 2π, the same angle as Monday (0).
cleaned_df['tdom_norm'] = 2 * math.pi * cleaned_df['txn_day_of_month'] / 31
cleaned_df['tdow_norm'] = 2 * math.pi * cleaned_df['txn_day_of_week'] / 7
cleaned_df['tmoy_norm'] = 2 * math.pi * cleaned_df['txn_month_of_year'] / 12

In [29]:
cleaned_df['cos_tdom'] = np.cos(cleaned_df['tdom_norm'])
cleaned_df['cos_tdow'] = np.cos(cleaned_df['tdow_norm'])
cleaned_df['cos_tmoy'] = np.cos(cleaned_df['tmoy_norm'])

In [30]:
# Encode sin as well: cos alone is ambiguous, since two different angles share the same cosine
cleaned_df['sin_tdom'] = np.sin(cleaned_df['tdom_norm'])
cleaned_df['sin_tdow'] = np.sin(cleaned_df['tdow_norm'])
cleaned_df['sin_tmoy'] = np.sin(cleaned_df['tmoy_norm'])
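
One detail worth sanity-checking in this kind of encoding is the divisor used when converting values to angles. The sketch below is a standalone check on a toy day-of-week range (0-6, as returned by pandas `dayofweek`). The helper `unit_circle` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

days = np.arange(7)  # pandas dayofweek convention: Monday=0 ... Sunday=6

def unit_circle(values, divisor):
    """Return (sin, cos) points for values scaled by 2π / divisor."""
    angle = 2 * np.pi * values / divisor
    return np.column_stack([np.sin(angle), np.cos(angle)])

by_max = unit_circle(days, days.max())  # divisor 6 (observed maximum)
by_period = unit_circle(days, 7)        # divisor 7 (length of the cycle)

# With divisor 6, Sunday lands at angle 2π, i.e. on top of Monday.
collides_by_max = np.allclose(by_max[0], by_max[6])

# With divisor 7, Monday and Sunday stay distinct (and adjacent on the circle).
collides_by_period = np.allclose(by_period[0], by_period[6])
print(collides_by_max, collides_by_period)
```

Dividing by the cycle length keeps every value at its own position on the circle. It also does not depend on which values happen to appear in the data.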

### A Note on Potential Landmines
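1. Each cyclical value is now represented by two features (sin and cos), so the algorithm might give that single piece of information more weight than it would as one feature.
2. Because decision-tree-based methods split on one feature at a time, they cannot use a sin/cos pair jointly in a single split and could miss the combined value here.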
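
The second point can be illustrated with a small sketch. It assumes a day-of-week feature with a period of 7, and the values are illustrative rather than taken from the dataset. It shows that one coordinate on its own is ambiguous, so a split on a single feature cannot tell certain days apart.

```python
import numpy as np

days = np.arange(7)  # Monday=0 ... Sunday=6
angle = 2 * np.pi * days / 7
cos_d, sin_d = np.cos(angle), np.sin(angle)

# Tuesday (1) and Sunday (6) share the same cosine value ...
same_cos = bool(np.isclose(cos_d[1], cos_d[6]))

# ... and are only told apart by the sine, which has opposite signs.
different_sin = not np.isclose(sin_d[1], sin_d[6])
opposite_sign = bool(sin_d[1] > 0 > sin_d[6])
print(same_cos, different_sin, opposite_sign)
```

A threshold on the cosine alone therefore cannot separate these two days. A tree needs at least one split on each of the two features to do so.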