## AITD End-to-End Example Notebook

In [1]:
from src.transform_predict import transform, score
import pandas as pd
import warnings
warnings.filterwarnings('ignore')


In [2]:
# Read the dataset. Required columns: ['tx_description', 'sender_id', 'receiver_id', 'tx_date', 'amount']
data = pd.read_csv("data/example_input.csv")
data.head()

Unnamed: 0,tx_description,sender_id,receiver_id,tx_date,amount
0,lunch,1,101,2023-02-13 02:03:15,1190.63
1,dinner,1,101,2023-03-29 17:48:10,268.4
2,gift,1,100,2023-01-20 17:41:33,1276.55
3,rent,1,102,2023-01-17 08:13:35,1020.42
4,rent,1,101,2023-01-17 15:45:40,983.1
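
Before calling `transform`, it can help to confirm that the input frame carries the five required columns listed in the cell above. The following is a minimal sketch; `check_required_columns` is a hypothetical helper and not part of `src.transform_predict`, and the sample frame simply reuses two rows from the preview.

```python
import pandas as pd

# Column names taken from the comment in the cell above.
REQUIRED_COLUMNS = ["tx_description", "sender_id", "receiver_id", "tx_date", "amount"]


def check_required_columns(df: pd.DataFrame) -> list:
    """Return the required columns that are missing from df (hypothetical helper)."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]


# Two rows copied from the preview of data/example_input.csv.
sample = pd.DataFrame({
    "tx_description": ["lunch", "dinner"],
    "sender_id": [1, 1],
    "receiver_id": [101, 101],
    "tx_date": ["2023-02-13 02:03:15", "2023-03-29 17:48:10"],
    "amount": [1190.63, 268.4],
})

missing = check_required_columns(sample)
missing_after_drop = check_required_columns(sample.drop(columns=["amount"]))
print(missing, missing_after_drop)
```

An empty list means the frame has every required column; extra columns such as `Unnamed: 0` are ignored by this check.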


### Long Term Transaction Abuse Detection

In [3]:
feature_generation_3months = transform(data, lag=2, score_month=3, score_year=2023)
feature_generation_3months.head()



Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


0
2023
3
[202302, 202301]


Unnamed: 0,sender_id,receiver_id,year_month,toxicity_percentile,severe_toxicity_percentile,obscene_percentile,identity_attack_percentile,insult_percentile,threat_percentile,sexual_explicit_percentile,...,month_min2_num_trans_sum,month_min2_identity_attack_percentile,month_min2_sadness_sum,month_min2_length_transaction_median,month_min2_surprise_sum,month_min2_joy_max,month_min2_day_between_max_min,month_min2_longest_word_median,month_min2_anger_sum,month_min2_love_sum
0,1,100,202303,0.08569,0.37013,0.159817,0.106089,0.050668,0.461281,0.48631,...,0.4,0.135917,0.0,0.0,0.0,0.037037,0.0,0.333333,0.598046,0.0
1,1,101,202303,0.104995,0.055806,0.227049,0.155466,0.054305,0.164828,0.146895,...,0.2,0.353546,0.0,0.916667,0.0,0.041667,0.0,0.666667,0.644172,0.375
2,1,102,202303,0.019711,0.019863,0.053331,0.0,0.018816,0.0,0.021091,...,0.733333,0.258648,0.0,0.333333,0.0,0.03125,0.0,0.333333,0.877684,0.28125
3,2,100,202303,0.084638,0.32704,0.191018,0.35806,0.048903,0.521809,0.444424,...,0.466667,0.135917,0.535714,0.5,0.0,0.035714,0.0,0.0,1.0,0.642857
4,2,101,202303,0.411683,0.740513,0.68121,0.392687,0.256472,0.951393,0.978037,...,0.466667,0.179198,0.0,0.083333,0.0,0.035714,0.0,0.333333,0.733129,0.321429
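
The list `[202302, 202301]` printed above suggests that `lag=2` with `score_month=3` pulls in the two months before the score month. The sketch below reproduces that list under this assumption; `lag_year_months` is a hypothetical function written for illustration, not the logic inside `transform`.

```python
def lag_year_months(score_year: int, score_month: int, lag: int) -> list:
    """List the `lag` months preceding the score month as YYYYMM integers.

    Assumed behaviour, inferred from the printed output; not taken from the
    transform source code.
    """
    months = []
    year, month = score_year, score_month
    for _ in range(lag):
        month -= 1
        if month == 0:  # step back across a year boundary
            year -= 1
            month = 12
        months.append(year * 100 + month)
    return months


lags_march = lag_year_months(2023, 3, 2)
lags_january = lag_year_months(2023, 1, 2)
print(lags_march, lags_january)
```

Under the same assumption, scoring January 2023 with `lag=2` would need data from November and December 2022.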


In [4]:
predictions_3month = score(feature_generation_3months, model_loc='models/CBA_AITD_Long.zip')
predictions_3month.head()

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: java version "1.8.0_361"; Java(TM) SE Runtime Environment (build 1.8.0_361-b09); Java HotSpot(TM) 64-Bit Server VM (build 25.361-b09, mixed mode)
  Starting server from /Users/genevieverichards/opt/anaconda3/envs/aitd/lib/python3.10/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/ng/wrfk6d3n4p9gy7vj76hlf4z80000gn/T/tmpf8g4elha
  JVM stdout: /var/folders/ng/wrfk6d3n4p9gy7vj76hlf4z80000gn/T/tmpf8g4elha/h2o_genevieverichards_started_from_python.out
  JVM stderr: /var/folders/ng/wrfk6d3n4p9gy7vj76hlf4z80000gn/T/tmpf8g4elha/h2o_genevieverichards_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


0,1
H2O_cluster_uptime:,12 secs
H2O_cluster_timezone:,Asia/Seoul
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.42.0.3
H2O_cluster_version_age:,2 months and 16 days
H2O_cluster_name:,H2O_from_python_genevieverichards_xyvzqe
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,7.096 Gb
H2O_cluster_total_cores:,0
H2O_cluster_allowed_cores:,0


generic Model Build progress: |██████████████████████████████████████████████████| (done) 100%
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
generic prediction progress: |███████████████████████████████████████████████████| (done) 100%
H2O session _sid_b838 closed.


Unnamed: 0,sender_id,receiver_id,probability_non_abuse,probability_abuse
0,1,100,0.630252,0.369748
5,2,102,0.635904,0.364096
4,2,101,0.642964,0.357036
1,1,101,0.666243,0.333757
2,1,102,0.679424,0.320576
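
The two probability columns sum to 1 for each sender and receiver pair, so `probability_abuse` alone is enough to rank pairs. As a rough sketch of a follow-up step, the snippet below selects pairs above a review cut-off. The rows are copied from the output above, and `REVIEW_THRESHOLD` is a hypothetical value chosen only for illustration, not a recommended operating point.

```python
import pandas as pd

# Rows copied from the long-term prediction output above.
predictions = pd.DataFrame({
    "sender_id": [1, 2, 2, 1, 1],
    "receiver_id": [100, 102, 101, 101, 102],
    "probability_non_abuse": [0.630252, 0.635904, 0.642964, 0.666243, 0.679424],
    "probability_abuse": [0.369748, 0.364096, 0.357036, 0.333757, 0.320576],
})

# Hypothetical cut-off; a real threshold should come from validation data.
REVIEW_THRESHOLD = 0.35

# Largest deviation of the two probabilities from summing to 1.
max_sum_error = (
    predictions["probability_non_abuse"] + predictions["probability_abuse"] - 1
).abs().max()

flagged = predictions[predictions["probability_abuse"] >= REVIEW_THRESHOLD]
print(flagged[["sender_id", "receiver_id", "probability_abuse"]])
```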


### Short Term Transaction Abuse Detection

In [5]:
# Note: For the Short Term Abuse detection model, set lag to 0.
# Note: By default, features are generated for the latest month in the dataset. For the example data, this is March.
feature_generation_month = transform(data, lag=0)
feature_generation_month.head()



Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


0
2023
3


Unnamed: 0,sender_id,receiver_id,year_month,toxicity_percentile,severe_toxicity_percentile,obscene_percentile,identity_attack_percentile,insult_percentile,threat_percentile,sexual_explicit_percentile,...,recip_num_trans_sum,recip_identity_attack_percentile,recip_sadness_sum,recip_length_transaction_median,recip_surprise_sum,recip_joy_max,recip_day_between_max_min,recip_longest_word_median,recip_anger_sum,recip_love_sum
0,1,100,202303,0.08569,0.37013,0.159817,0.106089,0.050668,0.461281,0.48631,...,0.6,0.106089,1.0,0.666667,,0.033333,,0.666667,0.229039,0.6
1,1,101,202303,0.104995,0.055806,0.227049,0.155466,0.054305,0.164828,0.146895,...,1.0,0.155466,0.416667,0.583333,,0.027778,,0.666667,0.15985,1.0
2,1,102,202303,0.019711,0.019863,0.053331,0.0,0.018816,0.0,0.021091,...,0.6,0.0,0.0,1.0,,0.033333,,0.666667,0.644172,0.3
3,2,100,202303,0.084638,0.32704,0.191018,0.35806,0.048903,0.521809,0.444424,...,0.133333,0.35806,0.0,0.333333,,0.043478,,0.333333,0.806615,0.782609
4,2,101,202303,0.411683,0.740513,0.68121,0.392687,0.256472,0.951393,0.978037,...,0.666667,0.392687,0.0,0.666667,,0.032258,,0.333333,0.845043,0.580645
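
When `score_month` and `score_year` are omitted, the features are generated for the latest month in the data, which matches the `2023` and `3` printed above. One way to find that month yourself is sketched below. It assumes "latest" means the month of the maximum `tx_date`; this is an inference from the comment in the cell, not a reading of the `transform` source.

```python
import pandas as pd

# The five tx_date values from the preview of the example input.
tx = pd.DataFrame({
    "tx_date": [
        "2023-02-13 02:03:15",
        "2023-03-29 17:48:10",
        "2023-01-20 17:41:33",
        "2023-01-17 08:13:35",
        "2023-01-17 15:45:40",
    ]
})

# Parse the strings and take the most recent timestamp.
latest = pd.to_datetime(tx["tx_date"]).max()
default_year, default_month = latest.year, latest.month
default_year_month = default_year * 100 + default_month
print(default_year, default_month, default_year_month)
```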


In [6]:
predictions_month_march = score(feature_generation_month, model_loc='models/CBA_AITD_Short.zip')
predictions_month_march.head()


Unnamed: 0,sender_id,receiver_id,probability_non_abuse,probability_abuse
0,1,100,0.628646,0.371354
4,2,101,0.661149,0.338851
1,1,101,0.679388,0.320612
5,2,102,0.695596,0.304404
2,1,102,0.720643,0.279357


#### Specifying the Score Month

In [7]:
# Note: For the Short Term Abuse detection model, set lag to 0.
# Note: To score a different month, pass it as score_month. The following example generates features for February.
# Note: You can also pass score_year if your data contains multiple years.
feature_generation_month_feb = transform(data, lag=0, score_month=2, score_year=2023)
predictions_month_feb = score(feature_generation_month_feb, model_loc='models/CBA_AITD_Short.zip')
predictions_month_feb.head()



Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


0
2023
2

Unnamed: 0,sender_id,receiver_id,probability_non_abuse,probability_abuse
5,2,102,0.644517,0.355483
2,1,102,0.721381,0.278619
0,1,100,0.731157,0.268843
1,1,101,0.73559,0.26441
3,2,100,0.742449,0.257551
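
With scores for two months in hand, a natural next step is to compare them per sender and receiver pair. The sketch below joins the March and February short-term outputs shown above and computes the month-over-month change. Only the `head()` rows are used, so pairs missing from either preview drop out of the inner join; the column suffixes and the `change` column are illustrative names.

```python
import pandas as pd

# probability_abuse values copied from the March and February outputs above.
march = pd.DataFrame({
    "sender_id": [1, 2, 1, 2, 1],
    "receiver_id": [100, 101, 101, 102, 102],
    "probability_abuse": [0.371354, 0.338851, 0.320612, 0.304404, 0.279357],
})
february = pd.DataFrame({
    "sender_id": [2, 1, 1, 1, 2],
    "receiver_id": [102, 102, 100, 101, 100],
    "probability_abuse": [0.355483, 0.278619, 0.268843, 0.26441, 0.257551],
})

# Inner join keeps only the pairs present in both months.
compare = march.merge(
    february, on=["sender_id", "receiver_id"], suffixes=("_mar", "_feb")
)
compare["change"] = compare["probability_abuse_mar"] - compare["probability_abuse_feb"]
compare = compare.sort_values("change", ascending=False).reset_index(drop=True)
print(compare)
```

Sorting by `change` puts the pairs whose abuse probability rose the most at the top of the list.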
