# [Module 5.1] Creating a Model with the TRANSACTION_FRAUD_INSIGHTS Model Type





This notebook is based on the original notebook below.

- Developer guide for the Transaction Fraud Insights model type
    - [Transaction fraud insights](https://docs.aws.amazon.com/frauddetector/latest/ug/transaction-fraud-insights.html)
    - [Official AFD sample code](https://github.com/aws-samples/aws-fraud-detector-samples)
        - The notebook below is the TRANSACTION_FRAUD_INSIGHTS example in that repository.
        - https://github.com/aws-samples/aws-fraud-detector-samples/blob/master/Fraud_Detector_End_to_End_Stored_Data.ipynb

This notebook differs from the original in the following ways.
    

# 0. Concept: Model Dependencies

![model_dependencies.png](img/model_dependencies.png)

- Creating an Event requires an Entity, a Label, and Variables.
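
The dependency rule above can be sketched as a small graph, purely for illustration. The edges are an assumption assembled from the API calls used later in this notebook (the event type takes variables, labels, and entity types; the model takes an event type name; the model version takes a model ID). `DEPENDENCIES` and `creation_order` are hypothetical names, not part of AFD.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each key lists the resources that must exist before it can be created.
# Assumed from the calls in this notebook, not an official AFD schema.
DEPENDENCIES = {
    "entity_type": set(),
    "label": set(),
    "variables": set(),
    "event_type": {"entity_type", "label", "variables"},
    "model": {"event_type"},
    "model_version": {"model"},
}

def creation_order(dependencies):
    """Return one valid creation order, with prerequisites first."""
    return list(TopologicalSorter(dependencies).static_order())

order = creation_order(DEPENDENCIES)
print(order)
```

The rest of the notebook follows an order of this kind: variables and labels, then the entity and event types, then the model and its version.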

# 1. Environment Setup

In [45]:
# The setting below reloads Python packages on every import instead of using the cached versions.

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


Load the variables stored by the previous notebooks.

In [46]:
%store -r 

In [47]:
import boto3
import sagemaker
from datetime import datetime
import pandas as pd

client = boto3.client('frauddetector')
role = sagemaker.get_execution_role()
# -- suffix is appended to detector and model name for uniqueness  
sufx   = datetime.now().strftime("%Y%m%d")

# sufx='20210910'
print("sufx: ", sufx)

sufx:  20211030


# 2. Defining Object Names

In [59]:
project_prefix = 'adtaking_fraud_phase0_tran'

In [49]:
# -- This is all you need to fill out. Once complete simply interactively run each code cell. --  

# ENTITY_TYPE    = f"cf_customer_{sufx}"
ENTITY_TYPE    = f"customer"
ENTITY_DESC    = "entity description: {0}".format(sufx) 

EVENT_TYPE     = f"{project_prefix}_{sufx}"
EVENT_DESC     = "example event description: {0}".format(sufx) 

MODEL_NAME     = f"{project_prefix}_model_{sufx}"
MODEL_DESC     = "model trained on: {0}".format(sufx) 

ARN_ROLE       = role
S3_FILE_LOC    = s3_train_data_uri

VARIABLES_MAP = {
    "IP_ADDRESS": "",       # e.g. ip_address
    "EMAIL_ADDRESS": ""     # e.g. customer_email
}

# -- percentage of data used in model training (by default: 100%). 
TRAINING_PERC = 1.0

In [50]:
print("project_prefix: ", project_prefix)
print("ENTITY_TYPE: ", ENTITY_TYPE)
print("EVENT_TYPE: ", EVENT_TYPE)
print("MODEL_NAME: ", MODEL_NAME)
print("ARN_ROLE: ", ARN_ROLE)
print("S3_FILE_LOC: ", S3_FILE_LOC)

project_prefix:  adtaking_fraud_phase0_tran
ENTITY_TYPE:  customer
EVENT_TYPE:  adtaking_fraud_phase0_tran_20211030
MODEL_NAME:  adtaking_fraud_phase0_tran_model_20211030
ARN_ROLE:  arn:aws:iam::057716757052:role/AFD-gsmoon
S3_FILE_LOC:  s3://sagemaker-us-east-1-057716757052/adtalking_fraud_phase0/train/train-180000.csv


# 3. Loading the Training Data and Basic Profiling
-----


In [51]:
from src.transaction_utils import profiling

df   = pd.read_csv(s3_train_afd_tran_data_uri)

In [52]:
df_stats, trainingDataSchema, eventVariables, eventLabels = profiling(df, VARIABLES_MAP)


--- summary stats ---
      feature_name   dtype   count  nunique  null  not_null  null_pct  \
0      EVENT_LABEL   int64  180000        2     0    180000       0.0   
1        ENTITY_ID  object  180000    44650     0    180000       0.0   
2          str_app  object  180000      184     0    180000       0.0   
3       str_device  object  180000      143     0    180000       0.0   
4           str_os  object  180000      143     0    180000       0.0   
5      str_channel  object  180000      157     0    180000       0.0   
6  EVENT_TIMESTAMP  object  180000   110099     0    180000       0.0   
7      ENTITY_TYPE  object  180000        1     0    180000       0.0   
8         EVENT_ID  object  180000   180000     0    180000       0.0   
9  LABEL_TIMESTAMP  object  180000   110099     0    180000       0.0   



--- event variables ---
['str_app', 'str_device', 'str_os', 'str_channel']


--- event labels ---
['0', '1']


--- training data schema ---
{'modelVariables': ['str_app', '
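
The source of `profiling` is not shown in this notebook, so the following is only a sketch of how the event variables and the training data schema could be derived from the column names. The schema shape mirrors the `trainingDataSchema` echoed in the model version response later in this notebook; `build_training_schema` and `RESERVED_COLUMNS` are hypothetical names.

```python
# Columns that carry event metadata rather than model features
# (taken from the summary table and exclude list in this notebook).
RESERVED_COLUMNS = {
    "EVENT_LABEL", "ENTITY_ID", "EVENT_TIMESTAMP",
    "ENTITY_TYPE", "EVENT_ID", "LABEL_TIMESTAMP",
}

def build_training_schema(columns, fraud_values, legit_values):
    """Hypothetical helper: derive model variables and the label schema."""
    model_variables = [c for c in columns if c not in RESERVED_COLUMNS]
    return {
        "modelVariables": model_variables,
        "labelSchema": {
            "labelMapper": {
                "FRAUD": list(fraud_values),
                "LEGIT": list(legit_values),
            },
            "unlabeledEventsTreatment": "IGNORE",
        },
    }

columns = ["EVENT_LABEL", "ENTITY_ID", "str_app", "str_device", "str_os",
           "str_channel", "EVENT_TIMESTAMP", "ENTITY_TYPE", "EVENT_ID",
           "LABEL_TIMESTAMP"]
schema = build_training_schema(columns, fraud_values=["1"], legit_values=["0"])
print(schema["modelVariables"])
```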

# 4. Creating the Entity, Label, Variables, and Event
-----


## (1) Creating Variables and Labels

### Creating Variables

In [53]:
from src.transaction_utils import create_variables, create_label
# --- no changes just run this code block ---

# model_variables = create_variables_transaction(df_stats, MODEL_NAME)
# print("\n --- model variable dict --")
# print(model_variables)

exclude_list = ['ENTITY_TYPE','ENTITY_ID','EVENT_ID','EVENT_TIMESTAMP','EVENT_LABEL','LABEL_TIMESTAMP','UNKNOWN']
features_dict = df_stats.loc[(~df_stats['feature_type'].isin(exclude_list))].set_index('feature_name')['feature_type'].to_dict()
print("\n --- model variable dict --")
print("features_dict: ", features_dict)
features_dict = create_variables(features_dict)
print("\n")
print(features_dict)




 --- model variable dict --
features_dict:  {'str_app': 'CATEGORICAL', 'str_device': 'CATEGORICAL', 'str_os': 'CATEGORICAL', 'str_channel': 'CATEGORICAL'}
str_app has been defined, data type: STRING
str_device has been defined, data type: STRING
str_os has been defined, data type: STRING
str_channel has been defined, data type: STRING


{'str_app': 'STRING', 'str_device': 'STRING', 'str_os': 'STRING', 'str_channel': 'STRING'}
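
The body of `create_variables` lives in `src.transaction_utils` and is not shown here. A minimal sketch of what such a helper might do, assuming it issues one `create_variable` call per feature, is given below. The `TYPE_DEFAULTS` mapping (including the default values) and the `register_variables` and `RecordingClient` names are assumptions for illustration; the recording client stands in for `boto3.client('frauddetector')` so the example runs without AWS access.

```python
# Assumed mapping from profiled feature type to (dataType, defaultValue).
TYPE_DEFAULTS = {
    "CATEGORICAL": ("STRING", "<unknown>"),
    "NUMERIC": ("FLOAT", "0.0"),
}

def register_variables(client, features):
    """Hypothetical stand-in for create_variables: one call per feature."""
    created = {}
    for name, feature_type in features.items():
        data_type, default = TYPE_DEFAULTS[feature_type]
        client.create_variable(
            name=name,
            dataType=data_type,
            dataSource="EVENT",
            defaultValue=default,
            variableType=feature_type,
        )
        print(f"{name} has been defined, data type: {data_type}")
        created[name] = data_type
    return created

class RecordingClient:
    """Test double that records calls instead of contacting AWS."""
    def __init__(self):
        self.calls = []

    def create_variable(self, **kwargs):
        self.calls.append(kwargs)

fake = RecordingClient()
result = register_variables(
    fake, {"str_app": "CATEGORICAL", "str_device": "CATEGORICAL"}
)
```

A real helper would probably also need to handle variables that already exist in the account, which this sketch does not attempt.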


### Creating the Label

In [54]:
label_mapper = trainingDataSchema['labelSchema']['labelMapper']
print("\n --- model label schema dict --")
print(label_mapper)
model_label = create_label(label_mapper)




 --- model label schema dict --
{'FRAUD': ['1'], 'LEGIT': ['0']}
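
As with the variables, `create_label` is not shown. One plausible shape, assuming it registers every raw label value from the mapper with `put_label`, is sketched below. `register_labels` and `RecordingClient` are hypothetical names, and the recording client replaces the real AFD client so nothing is sent to AWS.

```python
def register_labels(client, label_mapper):
    """Hypothetical stand-in for create_label: one put_label per raw value."""
    names = []
    for category, values in label_mapper.items():
        for value in values:
            client.put_label(name=value, description=f"{category} label")
            names.append(value)
    return names

class RecordingClient:
    """Test double that records calls instead of contacting AWS."""
    def __init__(self):
        self.calls = []

    def put_label(self, **kwargs):
        self.calls.append(kwargs)

fake = RecordingClient()
label_names = register_labels(fake, {"FRAUD": ["1"], "LEGIT": ["0"]})
print(label_names)
```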


## (2) Creating the Entity and Event Types
-----

- Event type: An event type defines the structure of an individual event sent to Amazon Fraud Detector. Once it is defined, you can build models and detectors that evaluate the risk of that specific event type (e.g., an online billing transaction).
    - Added option: `eventIngestion = 'ENABLED'`.
- Entity type: Classifies who is performing the event. During prediction, you specify the entity type and entity ID to define who performed the event.
    - Examples: customer, account
- [Related developer guide](https://docs.aws.amazon.com/ko_kr/frauddetector/latest/ug/create-event-type.html)
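
To make the entity type and entity ID point concrete, here is a sketch of the keyword arguments a later `client.get_event_prediction` call would take. No detector is created in this notebook, so the detector ID, event ID, entity ID, and variable values below are made up, and `build_prediction_request` is a hypothetical helper that only assembles the request without sending it.

```python
from datetime import datetime, timezone

def build_prediction_request(detector_id, event_type, entity_type,
                             entity_id, event_id, variables):
    """Assemble kwargs for client.get_event_prediction (not sent here)."""
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "detectorId": detector_id,
        "eventId": event_id,
        "eventTypeName": event_type,
        "eventTimestamp": timestamp,
        # The entity type and ID identify who performed the event.
        "entities": [{"entityType": entity_type, "entityId": entity_id}],
        "eventVariables": variables,
    }

request = build_prediction_request(
    detector_id="example_detector",            # hypothetical detector
    event_type="adtaking_fraud_phase0_tran_20211030",
    entity_type="customer",
    entity_id="12345",                         # made-up entity ID
    event_id="event-0001",                     # made-up event ID
    variables={"str_app": "3", "str_device": "1",
               "str_os": "13", "str_channel": "379"},
)
print(request["entities"])
```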


In [55]:
# --- no changes just run this code block ---
response = client.put_entity_type(
    name        = ENTITY_TYPE,
    description = ENTITY_DESC
)
print("-- create entity --")
print(response)



-- create entity --
{'ResponseMetadata': {'RequestId': '30f32061-91bc-47ee-a9f1-a6441251e381', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sat, 30 Oct 2021 10:11:31 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': '30f32061-91bc-47ee-a9f1-a6441251e381'}, 'RetryAttempts': 0}}


In [56]:

response = client.put_event_type (
    name           = EVENT_TYPE,
    eventVariables = eventVariables,
    labels         = eventLabels,
    eventIngestion = 'ENABLED',    
    entityTypes    = [ENTITY_TYPE])
print("-- create event type --")
print(response)

-- create event type --
{'ResponseMetadata': {'RequestId': '41d2f2a8-1f52-4d54-9e56-d8f35f880a01', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sat, 30 Oct 2021 10:11:32 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': '41d2f2a8-1f52-4d54-9e56-d8f35f880a01'}, 'RetryAttempts': 0}}


# 5. Batch Import (Ingestion)

This feature processes a dataset stored in S3 and loads it into AFD. It was newly added together with the TRANSACTION_FRAUD_INSIGHTS model type.
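
Before starting an import, it can help to confirm that the CSV header contains the event metadata columns. The sketch below assumes the required set is the metadata columns present in this notebook's training file; check the AFD documentation for the authoritative list. `missing_headers` is a hypothetical helper and the sample row is made up.

```python
import csv
import io

# Metadata columns found in this notebook's training file
# (assumed here to be what the import expects).
REQUIRED_HEADERS = ["EVENT_ID", "EVENT_TIMESTAMP", "ENTITY_ID",
                    "ENTITY_TYPE", "EVENT_LABEL", "LABEL_TIMESTAMP"]

def missing_headers(csv_text, required=REQUIRED_HEADERS):
    """Return the required column names absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in required if name not in header]

# Made-up sample that omits LABEL_TIMESTAMP.
sample = (
    "EVENT_LABEL,ENTITY_ID,str_app,EVENT_TIMESTAMP,ENTITY_TYPE,EVENT_ID\n"
    "0,c-1,app-3,2020-11-06T15:08:24Z,customer,e-1\n"
)
print(missing_headers(sample))  # → ['LABEL_TIMESTAMP']
```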

In [57]:
## -- create batch import job --
client.create_batch_import_job(
    jobId = f'batch_import_{EVENT_TYPE}',
#     inputPath = f's3://{S3_BUCKET}/{S3_FILE}',
    inputPath = s3_train_afd_tran_data_uri, 
    outputPath = f's3://{bucket}',
    eventTypeName = EVENT_TYPE,
    iamRoleArn = role
)

{'ResponseMetadata': {'RequestId': '1c5fe7f5-3003-4b3c-842d-33483547f878',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Sat, 30 Oct 2021 10:11:33 GMT',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '2',
   'connection': 'keep-alive',
   'x-amzn-requestid': '1c5fe7f5-3003-4b3c-842d-33483547f878'},
  'RetryAttempts': 0}}

In [60]:
import time
# -- importing all the data into AFD takes some time --
print("-- wait for batch import to complete --")
stime = time.time()
while True:
    response = client.get_batch_import_jobs(jobId = f'batch_import_{EVENT_TYPE}')
    status = response['batchImports'][0]['status']
    if status in ['IN_PROGRESS', 'IN_PROGRESS_INITIALIZING']:
        print(f"current progress: {(time.time() - stime)/60:{3}.{3}} minutes")
        time.sleep(60)  # -- sleep for 60 seconds 
    else:
        print(f"Batch import status : {status}")
        break
etime = time.time()

# -- summarize --
print("\n --- batch import complete  --")
print("Elapsed time : %s" % (etime - stime) + " seconds \n"  )
print(response)

-- wait for batch import to complete --
Batch import status : COMPLETE

 --- batch import complete  --
Elapsed time : 0.1257936954498291 seconds 

{'batchImports': [{'jobId': 'batch_import_adtaking_fraud_phase0_tran_20211030', 'status': 'COMPLETE', 'startTime': '2021-10-30T10:11:33Z', 'completionTime': '2021-10-30T10:28:41Z', 'inputPath': 's3://sagemaker-us-east-1-057716757052/adtalking_fraud_phase0/train/train-afd-tran-180000.csv', 'outputPath': 's3://sagemaker-us-east-1-057716757052', 'eventTypeName': 'adtaking_fraud_phase0_tran_20211030', 'iamRoleArn': 'arn:aws:iam::057716757052:role/AFD-gsmoon', 'arn': 'arn:aws:frauddetector:us-east-1:057716757052:batch-import/batch_import_adtaking_fraud_phase0_tran_20211030', 'processedRecordsCount': 180000, 'failedRecordsCount': 18, 'totalRecordsCount': 180000}], 'ResponseMetadata': {'RequestId': '03365f5b-4b95-422e-993c-1bfff99eb879', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sat, 30 Oct 2021 11:34:10 GMT', 'content-type': 'application/x-amz-js
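
The polling loop above has no upper bound, so a job stuck in a pending state would keep the cell running indefinitely. A generic variant with a timeout could look like the sketch below. `wait_for_status` is a hypothetical helper, and the three-hour default timeout is an arbitrary assumption.

```python
import time

def wait_for_status(fetch_status, pending, interval=60.0, timeout=3 * 60 * 60,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it leaves `pending` or the timeout expires."""
    start = clock()
    while True:
        status = fetch_status()
        if status not in pending:
            return status
        if clock() - start >= timeout:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        sleep(interval)

# Usage with a scripted status sequence instead of a real AFD client.
statuses = iter(["IN_PROGRESS_INITIALIZING", "IN_PROGRESS", "COMPLETE"])
final = wait_for_status(
    lambda: next(statuses),
    pending={"IN_PROGRESS", "IN_PROGRESS_INITIALIZING"},
    sleep=lambda seconds: None,  # skip real sleeping in this demo
)
print(final)  # → COMPLETE
```

With the real client, `fetch_status` could wrap the `get_batch_import_jobs` call from the cell above and return `response['batchImports'][0]['status']`.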

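# 6. Creating the Model and Model Version (Model Training)
-----
The steps below take about one hour.
    
- Create the model.
- After the model is created, create a model version.

## Creating the Model
- Specify TRANSACTION_FRAUD_INSIGHTS as the model type.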

In [62]:
# --- no changes; just run this code block. ---

# -- create our model --
response = client.create_model(
   description   =  MODEL_DESC,
   eventTypeName = EVENT_TYPE,
   modelId       = MODEL_NAME,
   modelType   = 'TRANSACTION_FRAUD_INSIGHTS')

print("-- initalize model --")
print(response)



-- initalize model --
{'ResponseMetadata': {'RequestId': '4d6e0cdd-e2c6-48fc-aa0b-c960547d2808', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sat, 30 Oct 2021 11:35:24 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': '4d6e0cdd-e2c6-48fc-aa0b-c960547d2808'}, 'RetryAttempts': 0}}


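## Creating the Model Version
- The arguments differ for TRANSACTION_FRAUD_INSIGHTS: the `create_model_version` call in this section sets `trainingDataSource = 'INGESTED_EVENTS'` and passes the time window of the ingested events in `ingestedEventsDetail`.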



In [63]:
print(df['EVENT_TIMESTAMP'].min())
print(df['EVENT_TIMESTAMP'].max())
df.info()

2020-11-06T15:08:24Z
2020-11-08T23:58:58Z
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180000 entries, 0 to 179999
Data columns (total 10 columns):
 #   Column           Non-Null Count   Dtype 
---  ------           --------------   ----- 
 0   EVENT_LABEL      180000 non-null  int64 
 1   ENTITY_ID        180000 non-null  object
 2   str_app          180000 non-null  object
 3   str_device       180000 non-null  object
 4   str_os           180000 non-null  object
 5   str_channel      180000 non-null  object
 6   EVENT_TIMESTAMP  180000 non-null  object
 7   ENTITY_TYPE      180000 non-null  object
 8   EVENT_ID         180000 non-null  object
 9   LABEL_TIMESTAMP  180000 non-null  object
dtypes: int64(1), object(9)
memory usage: 13.7+ MB
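
Taking `min()` and `max()` of the string column works here because every timestamp uses the same ISO 8601 layout, so lexicographic order matches chronological order. A sketch that parses the values explicitly, so that a malformed timestamp raises an error instead of sorting incorrectly, is shown below. `ingestion_time_window` is a hypothetical helper, and the middle timestamp in the sample is made up.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def ingestion_time_window(timestamps):
    """Return the start and end of the window covering all timestamps."""
    parsed = [datetime.strptime(ts, TIMESTAMP_FORMAT) for ts in timestamps]
    return {
        "startTime": min(parsed).strftime(TIMESTAMP_FORMAT),
        "endTime": max(parsed).strftime(TIMESTAMP_FORMAT),
    }

window = ingestion_time_window([
    "2020-11-08T23:58:58Z",
    "2020-11-06T15:08:24Z",
    "2020-11-07T00:00:00Z",  # made-up value between the two extremes
])
print(window)
```

The resulting dictionary has the shape expected under `ingestedEventsTimeWindow` in the call that follows.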


In [64]:

# -- initializes the model, it's now ready to train -- 
response = client.create_model_version(
    modelId     = MODEL_NAME,
    modelType   = 'TRANSACTION_FRAUD_INSIGHTS',
    trainingDataSource = 'INGESTED_EVENTS',
    trainingDataSchema = trainingDataSchema,
    ingestedEventsDetail={
          'ingestedEventsTimeWindow': {
              'startTime': df['EVENT_TIMESTAMP'].min(),
              'endTime': df['EVENT_TIMESTAMP'].max()
          }
    }
)
model_version = response['modelVersionNumber']
print("-- model training --")
print(response)



-- model training --
{'modelId': 'adtaking_fraud_phase0_tran_model_20211030', 'modelType': 'TRANSACTION_FRAUD_INSIGHTS', 'modelVersionNumber': '1.0', 'status': 'TRAINING_IN_PROGRESS', 'ResponseMetadata': {'RequestId': '80f7c898-5051-474a-9eb8-0c8e1bf87f4b', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sat, 30 Oct 2021 11:35:33 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '155', 'connection': 'keep-alive', 'x-amzn-requestid': '80f7c898-5051-474a-9eb8-0c8e1bf87f4b'}, 'RetryAttempts': 0}}


In [67]:
import time

# -- model training takes time, we'll loop until it's complete  -- 
print("-- wait for model training to complete --")
stime = time.time()
while True:
    # -- use the version returned by create_model_version instead of a hard-coded '1.0' --
    response = client.get_model_version(modelId=MODEL_NAME, modelType = "TRANSACTION_FRAUD_INSIGHTS", modelVersionNumber = model_version)
    if response['status'] == 'TRAINING_IN_PROGRESS':
        print(f"current progress: {(time.time() - stime)/60:{3}.{3}} minutes")
        time.sleep(60)  # -- sleep for 60 seconds 
    else:
        print("Model status : " +  response['status'])
        break
        
etime = time.time()

# -- summarize -- 
print("\n --- model training complete  --")
print("Elapsed time : %s" % (etime - stime) + " seconds \n"  )
print(response)



-- wait for model training to complete --
Model status : TRAINING_COMPLETE

 --- model training complete  --
Elapsed time : 0.2544558048248291 seconds 

{'modelId': 'adtaking_fraud_phase0_tran_model_20211030', 'modelType': 'TRANSACTION_FRAUD_INSIGHTS', 'modelVersionNumber': '1.0', 'trainingDataSource': 'INGESTED_EVENTS', 'trainingDataSchema': {'modelVariables': ['str_app', 'str_device', 'str_os', 'str_channel'], 'labelSchema': {'labelMapper': {'FRAUD': ['1'], 'LEGIT': ['0']}, 'unlabeledEventsTreatment': 'IGNORE'}}, 'ingestedEventsDetail': {'ingestedEventsTimeWindow': {'startTime': '2020-11-06T15:08:24Z', 'endTime': '2020-11-08T23:58:58Z'}}, 'status': 'TRAINING_COMPLETE', 'arn': 'arn:aws:frauddetector:us-east-1:057716757052:model-version/TRANSACTION_FRAUD_INSIGHTS/adtaking_fraud_phase0_tran_model_20211030/1.0', 'ResponseMetadata': {'RequestId': '6382a73d-dbf4-4c1e-b165-3377a3040dd3', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sat, 30 Oct 2021 13:46:38 GMT', 'content-type': 'applica

In [68]:

modelVersionNumber = response['modelVersionNumber']
modelVersionNumber

'1.0'

## Model Performance Results

- See the AFD console for the detailed results shown below.

![score_dist.png](img/score_dist.png)

## Model Feature Importance

![model_var_imp.png](img/model_var_imp.png)

# 7. Model Deployment
The steps below take about 10 minutes.

- The model version is in the Ready to deploy state.
- Deploy it with update_model_version_status().

In [71]:
response = client.update_model_version_status (
    modelId = MODEL_NAME,
    modelType = 'TRANSACTION_FRAUD_INSIGHTS',
    modelVersionNumber = modelVersionNumber,
    status = 'ACTIVE'
)
print("-- activating model --")
print(response)



ValidationException: An error occurred (ValidationException) when calling the UpdateModelVersionStatus operation: You may only update Model Version status to 'ACTIVE' if the current status is 'TRAINING_COMPLETE', to 'INACTIVE' if it is 'ACTIVE', and to 'TRAINING_CANCELLED' if it is 'TRAINING_IN_PROGRESS'.
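
The `ValidationException` above lists the only status changes the API accepts. One plausible cause of the error is re-running the cell after activation had already been requested. The rule stated in the message can be written as a small guard; `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names that simply restate the error text.

```python
# Target status -> status the model version must currently have,
# exactly as listed in the error message.
ALLOWED_TRANSITIONS = {
    "ACTIVE": "TRAINING_COMPLETE",
    "INACTIVE": "ACTIVE",
    "TRAINING_CANCELLED": "TRAINING_IN_PROGRESS",
}

def can_transition(current, target):
    """Return True if moving from `current` to `target` is allowed."""
    return ALLOWED_TRANSITIONS.get(target) == current

print(can_transition("TRAINING_COMPLETE", "ACTIVE"))  # → True
print(can_transition("ACTIVE", "ACTIVE"))             # → False
```

Checking the current status with `get_model_version` before calling `update_model_version_status` would avoid the exception on a re-run.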

In [78]:
#-- wait until model is active 
print("--- waiting until model status is active ")
stime = time.time()
while True:
    # -- use the stored modelVersionNumber instead of a hard-coded '1.0' --
    response = client.get_model_version(modelId=MODEL_NAME, modelType = "TRANSACTION_FRAUD_INSIGHTS", modelVersionNumber = modelVersionNumber)
    if response['status'] != 'ACTIVE':
        print(f"current progress: {(time.time() - stime)/60:{3}.{3}} minutes")
        time.sleep(60)  # sleep for 1 minute 
    else:
        print("Model status : " +  response['status'])
        break
        
etime = time.time()
print("Elapsed time : %s" % (etime - stime) + " seconds \n"  )
print(response)

--- waiting until model status is active 
Model status : ACTIVE
Elapsed time : 0.3952972888946533 seconds 

{'modelId': 'adtaking_fraud_phase0_tran_model_20211030', 'modelType': 'TRANSACTION_FRAUD_INSIGHTS', 'modelVersionNumber': '1.0', 'trainingDataSource': 'INGESTED_EVENTS', 'trainingDataSchema': {'modelVariables': ['str_app', 'str_device', 'str_os', 'str_channel'], 'labelSchema': {'labelMapper': {'FRAUD': ['1'], 'LEGIT': ['0']}, 'unlabeledEventsTreatment': 'IGNORE'}}, 'ingestedEventsDetail': {'ingestedEventsTimeWindow': {'startTime': '2020-11-06T15:08:24Z', 'endTime': '2020-11-08T23:58:58Z'}}, 'status': 'ACTIVE', 'arn': 'arn:aws:frauddetector:us-east-1:057716757052:model-version/TRANSACTION_FRAUD_INSIGHTS/adtaking_fraud_phase0_tran_model_20211030/1.0', 'ResponseMetadata': {'RequestId': '69785512-e1c3-4d1d-84a5-7452ded09c80', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sat, 30 Oct 2021 23:16:44 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '626', 'connecti

## Saving Variables

In [81]:
%store project_prefix
%store ENTITY_TYPE
%store EVENT_TYPE
%store eventVariables
%store MODEL_NAME
%store modelVersionNumber
%store ARN_ROLE


Stored 'project_prefix' (str)
Stored 'ENTITY_TYPE' (str)
Stored 'EVENT_TYPE' (str)
Stored 'eventVariables' (list)
Stored 'MODEL_NAME' (str)
Stored 'modelVersionNumber' (str)
Stored 'ARN_ROLE' (str)
