# [Module 2.1] Model Creation





This notebook is based on the original notebook listed below.

- An Introduction to the Amazon Fraud Detector API  
- https://github.com/aws-samples/aws-fraud-detector-samples/blob/master/Fraud_Detector_End_to_End.ipynb

This notebook differs from the original in the following ways.


# 0. Concept: Model Dependencies

![model_dependencies.png](img/model_dependencies.png)

- Creating an Event requires an Entity, Labels, and Variables.
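
The dependency above, together with the order this notebook later uses (event type, then model, then model version), can be sketched as a small dependency graph. This is only an illustration: the resource names in `DEPENDENCIES` are labels chosen for this sketch, not Amazon Fraud Detector identifiers, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources that must exist before it.
# The keys are illustrative labels for this sketch, not API identifiers.
DEPENDENCIES = {
    "entity_type": set(),
    "label": set(),
    "variable": set(),
    "event_type": {"entity_type", "label", "variable"},
    "model": {"event_type"},
    "model_version": {"model"},
}


def creation_order(dependencies):
    """Return resource names so that every dependency precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


order = creation_order(DEPENDENCIES)
print(order)
```

The relative order of the three independent resources (entity type, label, variable) is not significant; only the constraint that they all come before the event type matters.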

# 1. Environment Setup

In [1]:
# The settings below reload Python packages on every import instead of using the cached versions.

%load_ext autoreload
%autoreload 2

Load the variables stored by the previous notebook.

In [2]:
%store -r 

In [3]:
import boto3
import sagemaker
from datetime import datetime
import pandas as pd

client = boto3.client('frauddetector')
role = sagemaker.get_execution_role()
# -- suffix is appended to detector and model name for uniqueness  
sufx   = datetime.now().strftime("%Y%m%d")

# sufx='20210910'
print("sufx: ", sufx)

sufx:  20210913


# 2. Define Object Names

In [4]:
project_prefix = 'adtaking_fraud_phase0'

In [5]:
# -- This is all you need to fill out. Once complete simply interactively run each code cell. --  

ENTITY_TYPE    = f"cf_customer_{sufx}"
ENTITY_DESC    = "entity description: {0}".format(sufx) 

EVENT_TYPE     = f"{project_prefix}_{sufx}"
EVENT_DESC     = "example event description: {0}".format(sufx) 

MODEL_NAME     = f"{project_prefix}_model_{sufx}"
MODEL_DESC     = "model trained on: {0}".format(sufx) 

ARN_ROLE       = role
S3_FILE_LOC    = s3_train_data_uri


In [6]:
print("project_prefix: ", project_prefix)
print("ENTITY_TYPE: ", ENTITY_TYPE)
print("EVENT_TYPE: ", EVENT_TYPE)
print("MODEL_NAME: ", MODEL_NAME)
print("ARN_ROLE: ", ARN_ROLE)
print("S3_FILE_LOC: ", S3_FILE_LOC)

project_prefix:  adtaking_fraud_phase0
ENTITY_TYPE:  cf_customer_20210913
EVENT_TYPE:  adtaking_fraud_phase0_20210913
MODEL_NAME:  adtaking_fraud_phase0_model_20210913
ARN_ROLE:  arn:aws:iam::189546603447:role/AFD-gsmoon
S3_FILE_LOC:  s3://sagemaker-us-east-1-189546603447/adtalking_fraud_phase0/train/train-180000.csv


# 3. Load Training Data and Basic Profiling
-----


In [7]:
from src.p_utils import summary_stats

df   = pd.read_csv(s3_train_data_uri)
df_stats, trainingDataSchema, eventVariables, eventLabels = summary_stats(df)


--- summary stats ---
      feature_name   dtype   count  nunique  null  not_null  null_pct  \
0      EVENT_LABEL  object  180000        2     0    180000       0.0   
1           str_ip  object  180000    44650     0    180000       0.0   
2          str_app  object  180000      184     0    180000       0.0   
3       str_device  object  180000      143     0    180000       0.0   
4           str_os  object  180000      143     0    180000       0.0   
5      str_channel  object  180000      157     0    180000       0.0   
6  EVENT_TIMESTAMP  object  180000   110099     0    180000       0.0   



--- event variables ---
['str_ip', 'str_app', 'str_device', 'str_os', 'str_channel']


--- event labels ---
['0', '1']


--- training data schema ---
{'modelVariables': ['str_ip', 'str_app', 'str_device', 'str_os', 'str_channel'], 'labelSchema': {'labelMapper': {'FRAUD': ['1'], 'LEGIT': ['0']}}}
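
The implementation of `summary_stats` in `src.p_utils` is not shown in this notebook. As a rough illustration of how a schema with the shape printed above could be derived from column names and label values, here is a minimal sketch. The function `build_training_schema` and its parameters are hypothetical, and it assumes that `EVENT_LABEL` and `EVENT_TIMESTAMP` are metadata columns rather than model variables and that the label value `'1'` marks fraud.

```python
def build_training_schema(columns, label_values, fraud_values=("1",),
                          label_column="EVENT_LABEL",
                          timestamp_column="EVENT_TIMESTAMP"):
    """Build a dict shaped like the trainingDataSchema printed above (hypothetical helper)."""
    # The label and timestamp columns describe the event and are not model variables.
    reserved = {label_column, timestamp_column}
    model_variables = [c for c in columns if c not in reserved]

    # Split the observed label values into the FRAUD and LEGIT groups.
    fraud = [v for v in label_values if v in fraud_values]
    legit = [v for v in label_values if v not in fraud_values]
    if not fraud or not legit:
        raise ValueError("both a FRAUD and a LEGIT label value are required")

    return {
        "modelVariables": model_variables,
        "labelSchema": {"labelMapper": {"FRAUD": fraud, "LEGIT": legit}},
    }


columns = ["EVENT_LABEL", "str_ip", "str_app", "str_device",
           "str_os", "str_channel", "EVENT_TIMESTAMP"]
schema = build_training_schema(columns, label_values=["0", "1"])
print(schema)
```

With a DataFrame, the inputs would be `list(df.columns)` and the sorted unique values of `df["EVENT_LABEL"]` converted to strings.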




# 4. Create Entity, Label, Variables, and Event
-----

#### Check for IP_ADDRESS and EMAIL_ADDRESS variables
- The current dataset contains neither.

In [8]:
df_stats.loc[(df_stats['feature_type'].isin(['IP_ADDRESS', 'EMAIL_ADDRESS']))]

Unnamed: 0,feature_name,dtype,count,nunique,null,not_null,null_pct,nunique_pct,feature_type,feature_warning


## (1) Create Variables and Labels

In [9]:
from src.p_utils import create_variables, create_label
# --- no changes just run this code block ---

model_variables = create_variables(df_stats, MODEL_NAME)
print("\n --- model variable dict --")
print(model_variables)


model_label = create_label(df, "EVENT_LABEL")
print("\n --- model label schema dict --")
print(model_label)




 --- model variable dict --
[{'name': 'str_ip'}, {'name': 'str_app'}, {'name': 'str_device'}, {'name': 'str_os'}, {'name': 'str_channel'}]

 --- model label schema dict --
{'labelKey': 'EVENT_LABEL', 'labelMapper': {'FRAUD': ['1'], 'LEGIT': ['0']}}
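
The helpers `create_variables` and `create_label` come from `src.p_utils` and are not shown here. The sketch below illustrates one way such registration could look using the `create_variable` and `put_label` operations of the Fraud Detector client. It is not the author's implementation: `register_variables_and_labels` and `RecordingClient` are hypothetical names, and treating every variable as a categorical string with the default value `'<unknown>'` is an assumption based on the `str_*` columns in this dataset. The stub client only records calls so the example runs without AWS access.

```python
class RecordingClient:
    """Hypothetical stand-in for boto3.client('frauddetector'); it only records calls."""

    def __init__(self):
        self.calls = []

    def create_variable(self, **kwargs):
        self.calls.append(("create_variable", kwargs))

    def put_label(self, **kwargs):
        self.calls.append(("put_label", kwargs))


def register_variables_and_labels(client, variable_names, label_names):
    """Create one categorical string variable per name, then one label per name."""
    for name in variable_names:
        client.create_variable(
            name=name,
            dataType="STRING",
            dataSource="EVENT",
            defaultValue="<unknown>",  # assumed placeholder for missing values
            description=name,
            variableType="CATEGORICAL",  # assumption: all str_* columns are categorical
        )
    for name in label_names:
        client.put_label(name=name, description=f"label {name}")
    # Same shape as the model variable list printed above.
    return [{"name": name} for name in variable_names]


stub = RecordingClient()
registered = register_variables_and_labels(stub, ["str_ip", "str_app"], ["0", "1"])
print(registered)
print([operation for operation, _ in stub.calls])
```

Against the real service, the same function would be called with `client`, `eventVariables`, and `eventLabels` from the cells above.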


## (2) Create Entity and Event Types
-----

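- Event type: An event type defines the structure of an individual event sent to Amazon Fraud Detector. Once it is defined, you can build models and detectors that evaluate the risk of that specific event type (e.g., a billing online transaction).
- Entity type: Classifies who is performing the event. During prediction, you specify the entity type and entity ID to identify who performed the event.
    - Examples: customer, account
- [Related developer guide](https://docs.aws.amazon.com/ko_kr/frauddetector/latest/ug/create-event-type.html)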
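
To make the role of the entity type and entity ID at prediction time concrete, the sketch below builds the keyword arguments for the client's `get_event_prediction` operation. No detector is created in this notebook, so the detector ID, event ID, and entity ID used here are hypothetical placeholders, and `build_prediction_request` is an illustrative helper rather than part of any library. It assumes the timestamp is passed as a timezone-aware `datetime`.

```python
from datetime import datetime, timezone


def build_prediction_request(detector_id, event_id, event_type,
                             entity_type, entity_id, variables, when):
    """Assemble keyword arguments for get_event_prediction (illustrative helper)."""
    return {
        "detectorId": detector_id,
        "eventId": event_id,
        "eventTypeName": event_type,
        # The entity type and ID together identify who performed the event.
        "entities": [{"entityType": entity_type, "entityId": entity_id}],
        # ISO 8601 timestamp in UTC; `when` must be timezone-aware.
        "eventTimestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Event variable values are sent as strings.
        "eventVariables": {key: str(value) for key, value in variables.items()},
    }


request = build_prediction_request(
    detector_id="example_detector",   # hypothetical; no detector exists yet
    event_id="event-0001",            # hypothetical
    event_type="adtaking_fraud_phase0_20210913",
    entity_type="cf_customer_20210913",
    entity_id="customer-42",          # hypothetical
    variables={"str_app": 3, "str_os": 13},
    when=datetime(2021, 9, 13, 12, 37, 0, tzinfo=timezone.utc),
)
print(request)
```

Once a detector exists, the dictionary could be passed as `client.get_event_prediction(**request)`.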


In [10]:
# --- no changes just run this code block ---
response = client.put_entity_type(
    name        = ENTITY_TYPE,
    description = ENTITY_DESC
)
print("-- create entity --")
print(response)



-- create entity --
{'ResponseMetadata': {'RequestId': 'a157644e-5ced-4a63-8639-f3ddbf34af8c', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 13 Sep 2021 12:37:03 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': 'a157644e-5ced-4a63-8639-f3ddbf34af8c'}, 'RetryAttempts': 0}}


In [11]:

response = client.put_event_type (
    name           = EVENT_TYPE,
    eventVariables = eventVariables,
    labels         = eventLabels,
    entityTypes    = [ENTITY_TYPE])
print("-- create event type --")
print(response)

-- create event type --
{'ResponseMetadata': {'RequestId': '48d2d91e-3fe8-44f8-bd98-bdb335026fd0', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 13 Sep 2021 12:37:04 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': '48d2d91e-3fe8-44f8-bd98-bdb335026fd0'}, 'RetryAttempts': 0}}


# 5. Create the Model and Model Version (Model Training)
-----
The steps below take about one hour.

- Create the model.
- After the model is created, create a model version.

## Create the Model

In [12]:
# --- no changes; just run this code block. ---

# -- create our model --
response = client.create_model(
   description   =  MODEL_DESC,
   eventTypeName = EVENT_TYPE,
   modelId       = MODEL_NAME,
   modelType   = 'ONLINE_FRAUD_INSIGHTS')

print("-- initalize model --")
print(response)



-- initalize model --
{'ResponseMetadata': {'RequestId': 'af3aeb17-6f80-403d-b5b9-7e28c664e02b', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 13 Sep 2021 12:37:06 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': 'af3aeb17-6f80-403d-b5b9-7e28c664e02b'}, 'RetryAttempts': 0}}


## Create the Model Version

In [14]:

# -- create a model version; this call starts training --
response = client.create_model_version(
    modelId     = MODEL_NAME,
    modelType   = 'ONLINE_FRAUD_INSIGHTS',
    trainingDataSource = 'EXTERNAL_EVENTS',
    trainingDataSchema = trainingDataSchema,
    externalEventsDetail = {
        'dataLocation'     : S3_FILE_LOC,
        'dataAccessRoleArn': ARN_ROLE
    }
)
print("-- model training --")
print(response)


-- model training --
{'modelId': 'adtaking_fraud_phase0_model_20210913', 'modelType': 'ONLINE_FRAUD_INSIGHTS', 'modelVersionNumber': '1.0', 'status': 'TRAINING_IN_PROGRESS', 'ResponseMetadata': {'RequestId': '6a4079d4-e886-4c9d-ad55-170ed483dfb5', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 13 Sep 2021 12:37:15 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '145', 'connection': 'keep-alive', 'x-amzn-requestid': '6a4079d4-e886-4c9d-ad55-170ed483dfb5'}, 'RetryAttempts': 0}}


In [16]:
import time

# -- model training takes time, we'll loop until it's complete  -- 
print("-- wait for model training to complete --")
stime = time.time()
while True:
    response = client.get_model_version(modelId=MODEL_NAME, modelType="ONLINE_FRAUD_INSIGHTS", modelVersionNumber='1.0')
    if response['status'] == 'TRAINING_IN_PROGRESS':
        print(f"current progress: {(time.time() - stime)/60:3.3} minutes")
        time.sleep(60)  # -- sleep for 60 seconds
    else:
        print("Model status : " + response['status'])
        break
        
etime = time.time()

# -- summarize -- 
print("\n --- model training complete  --")
print("Elapsed time : %s" % (etime - stime) + " seconds \n"  )
print(response)



-- wait for model training to complete --
Model status : TRAINING_COMPLETE

 --- model training complete  --
Elapsed time : 0.2484264373779297 seconds 

{'modelId': 'adtaking_fraud_phase0_model_20210913', 'modelType': 'ONLINE_FRAUD_INSIGHTS', 'modelVersionNumber': '1.0', 'trainingDataSource': 'EXTERNAL_EVENTS', 'trainingDataSchema': {'modelVariables': ['str_ip', 'str_app', 'str_device', 'str_os', 'str_channel'], 'labelSchema': {'labelMapper': {'FRAUD': ['1'], 'LEGIT': ['0']}}}, 'externalEventsDetail': {'dataLocation': 's3://sagemaker-us-east-1-189546603447/adtalking_fraud_phase0/train/train-180000.csv', 'dataAccessRoleArn': 'arn:aws:iam::189546603447:role/AFD-gsmoon'}, 'status': 'TRAINING_COMPLETE', 'arn': 'arn:aws:frauddetector:us-east-1:189546603447:model-version/ONLINE_FRAUD_INSIGHTS/adtaking_fraud_phase0_model_20210913/1.0', 'ResponseMetadata': {'RequestId': '1f9d1417-c38e-4041-b388-8da1fe70ad8c', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 13 Sep 2021 13:35:40 GMT', 'cont

In [17]:

modelVersionNumber = response['modelVersionNumber']
modelVersionNumber

'1.0'
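
The training wait loop above has no upper bound on how long it polls. A minimal sketch of a reusable polling helper with a timeout is shown below. The name `wait_for_status` and its parameters are hypothetical; the status fetcher and the sleep function are passed in so the helper can be exercised without calling AWS.

```python
import time


def wait_for_status(fetch_status, pending, poll_seconds=60,
                    timeout_seconds=3 * 60 * 60,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a status that is not in `pending`.

    Raises TimeoutError if the status is still pending after timeout_seconds.
    """
    start = clock()
    while True:
        status = fetch_status()
        if status not in pending:
            return status
        if clock() - start >= timeout_seconds:
            raise TimeoutError(f"still {status} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Dry run with a scripted status sequence and a no-op sleep.
statuses = iter(["TRAINING_IN_PROGRESS", "TRAINING_IN_PROGRESS", "TRAINING_COMPLETE"])
final_status = wait_for_status(
    fetch_status=lambda: next(statuses),
    pending={"TRAINING_IN_PROGRESS"},
    sleep=lambda seconds: None,
)
print(final_status)  # TRAINING_COMPLETE
```

With the real client, `fetch_status` could be `lambda: client.get_model_version(modelId=MODEL_NAME, modelType="ONLINE_FRAUD_INSIGHTS", modelVersionNumber=modelVersionNumber)["status"]`.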

## Model Performance Results

- The detailed results behind the figures below are available in the Amazon Fraud Detector (AFD) console.

![score_dist.png](img/score_dist.png)

## Model Feature Importance

![model_var_imp.png](img/model_var_imp.png)

# 6. Deploy the Model
The steps below take about 10 minutes.

- The model version is ready to deploy.
- Deploy it with update_model_version_status().

In [None]:
response = client.update_model_version_status (
    modelId = MODEL_NAME,
    modelType = 'ONLINE_FRAUD_INSIGHTS',
    modelVersionNumber = modelVersionNumber,
    status = 'ACTIVE'
)
print("-- activating model --")
print(response)

#-- wait until model is active 
print("--- waiting until model status is active ")
stime = time.time()
while True:
    # Poll the version being activated rather than a hard-coded '1.0'.
    response = client.get_model_version(modelId=MODEL_NAME, modelType="ONLINE_FRAUD_INSIGHTS", modelVersionNumber=modelVersionNumber)
    if response['status'] != 'ACTIVE':
        print(f"current progress: {(time.time() - stime)/60:3.3} minutes")
        time.sleep(60)  # sleep for 1 minute
    else:
        print("Model status : " + response['status'])
        break
        
etime = time.time()
print("Elapsed time : %s" % (etime - stime) + " seconds \n"  )
print(response)

-- activating model --
{'ResponseMetadata': {'RequestId': '94161c24-ed04-4e93-9049-5fe4c496bacc', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 13 Sep 2021 13:43:35 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': '94161c24-ed04-4e93-9049-5fe4c496bacc'}, 'RetryAttempts': 0}}
--- waiting until model status is active 
current progress: 0.00336 minutes
current progress: 1.01 minutes
current progress: 2.01 minutes
current progress: 3.02 minutes
current progress: 4.02 minutes
current progress: 5.03 minutes
current progress: 6.03 minutes
current progress: 7.04 minutes
current progress: 8.04 minutes
current progress: 9.05 minutes
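
The activation cell above assumes the model version has already finished training. A small guard that checks the status first could look like the sketch below. The function `activate_if_trained` and the `StubClient` class are hypothetical; the stub stands in for the boto3 client so the example runs without AWS access, and it implements only the two operations used here.

```python
class StubClient:
    """Hypothetical stand-in for boto3.client('frauddetector') with a fixed status."""

    def __init__(self, status):
        self.status = status
        self.updates = []

    def get_model_version(self, **kwargs):
        return {"status": self.status}

    def update_model_version_status(self, **kwargs):
        self.updates.append(kwargs)


def activate_if_trained(client, model_id, model_type, version):
    """Request activation only when training is complete.

    Returns True if activation was requested, False if already ACTIVE.
    """
    current = client.get_model_version(
        modelId=model_id, modelType=model_type, modelVersionNumber=version
    )["status"]
    if current == "ACTIVE":
        return False
    if current != "TRAINING_COMPLETE":
        raise RuntimeError(f"cannot activate a model version in status {current}")
    client.update_model_version_status(
        modelId=model_id, modelType=model_type,
        modelVersionNumber=version, status="ACTIVE",
    )
    return True


stub = StubClient("TRAINING_COMPLETE")
requested = activate_if_trained(stub, "demo_model", "ONLINE_FRAUD_INSIGHTS", "1.0")
print(requested)  # True
```

In the notebook, the call would use `client`, `MODEL_NAME`, and `modelVersionNumber` in place of the stub and the demo values.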


## Store Variables

In [None]:
%store project_prefix
%store ENTITY_TYPE
%store EVENT_TYPE
%store eventVariables
%store MODEL_NAME
%store modelVersionNumber
%store ARN_ROLE
