# Fraud Detection on Amazon SageMaker

This notebook shows how to use Amazon SageMaker Processing to prepare the data. Before running it, change the source and destination S3 locations to point to your own bucket. The `ecr_repository_uri` also varies by region.

First, we process the raw dataset to prepare the features and extract the interactions in the dataset that will be used to construct the graph. 

Then, we create and launch a training job using the SageMaker framework estimator to train a graph neural network model with DGL.

In [1]:
!bash setup.sh

import sagemaker
from sagemaker_graph_fraud_detection import config, container_build

role = config.role
sess = sagemaker.Session()

Obtaining file:///root/sagemaker-graph-fraud-detection/source/sagemaker/sagemaker_graph_fraud_detection
Installing collected packages: sagemaker-graph-fraud-detection
  Attempting uninstall: sagemaker-graph-fraud-detection
    Found existing installation: sagemaker-graph-fraud-detection 1.0
    Uninstalling sagemaker-graph-fraud-detection-1.0:
      Successfully uninstalled sagemaker-graph-fraud-detection-1.0
  Running setup.py develop for sagemaker-graph-fraud-detection
Successfully installed sagemaker-graph-fraud-detection-1.0


## Data Preprocessing 

In [3]:
# Container image used to run the processing job (the URI is region-specific)
ecr_repository_uri = "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3"
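
Because the image URI depends on the region, it can help to assemble it from its parts instead of hard-coding the full string. The sketch below is only an illustration: `build_ecr_image_uri` is a hypothetical helper, not part of the SageMaker SDK, and it assumes the usual `amazonaws.com` domain suffix. Note that the account ID hosting the image also differs between regions, so look it up in the AWS documentation for your region before changing `region`.

```python
def build_ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Assemble an ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Values copied from the cell above (us-east-1). For another region, both
# the region name and the account ID must be changed.
ecr_repository_uri = build_ecr_image_uri(
    account_id="683313688378",
    region="us-east-1",
    repository="sagemaker-scikit-learn",
    tag="0.23-1-cpu-py3",
)
print(ecr_repository_uri)
```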

### Run Preprocessing job with Amazon SageMaker Processing

The script defined in `AutoInsuranceFraudProcessing.py` performs data preprocessing transformations on the raw data. The preprocessing involves replacing values, dropping rows, dropping columns, and categorical encoding.
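
The contents of `AutoInsuranceFraudProcessing.py` are not shown in this notebook, so the following is only a minimal pandas sketch of the four kinds of transformation named above. The column names, the `?` placeholder, and the toy rows are all hypothetical; the real `insurance_claims.csv` schema and the actual script may differ.

```python
import pandas as pd

# Toy data with hypothetical columns, standing in for insurance_claims.csv.
raw = pd.DataFrame(
    {
        "policy_id": [1, 2, 3, 4],
        "collision_type": ["Rear", "?", "Front", "Side"],
        "claim_amount": [1200.0, 800.0, None, 980.0],
        "fraud_reported": ["Y", "N", "N", "Y"],
    }
)


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # 1. Replace values: label the "?" placeholder and map the target to 0/1.
    df["collision_type"] = df["collision_type"].replace("?", "Unknown")
    df["fraud_reported"] = df["fraud_reported"].map({"Y": 1, "N": 0})
    # 2. Drop rows that are missing a required value.
    df = df.dropna(subset=["claim_amount"])
    # 3. Drop columns that carry no predictive signal (here, an identifier).
    df = df.drop(columns=["policy_id"])
    # 4. Categorical encoding: one-hot encode the remaining categorical column.
    return pd.get_dummies(df, columns=["collision_type"])


processed = preprocess(raw)
print(processed)
```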

Change the S3 source and destination locations in the cell below to point to your own bucket.

In [7]:
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

script_processor = ScriptProcessor(command=['python3'],
                                   image_uri=ecr_repository_uri,
                                   role=role,
                                   instance_count=1,
                                   instance_type='ml.m4.2xlarge')

# Replace both S3 URIs with locations in your own bucket.
script_processor.run(
    code='AutoInsuranceFraudProcessing.py',
    inputs=[ProcessingInput(
        source='s3://sagemaker-us-east-1-367858208265/AutoInsuranceFraud/insurance_claims.csv',
        destination='/opt/ml/processing/input')],
    outputs=[ProcessingOutput(
        output_name="preprocessed_data.csv",
        source='/opt/ml/processing/output',
        destination='s3://sagemaker-us-east-1-367858208265/AutoInsuranceFraud/Results/DataProcessing')])



Parameter 'session' will be renamed to 'sagemaker_session' in SageMaker Python SDK v2.



Job Name:  sagemaker-scikit-learn-2021-09-08-05-43-24-577
Inputs:  [{'InputName': 'input-1', 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-367858208265/AutoInsuranceFraud/insurance_claims.csv', 'LocalPath': '/opt/ml/processing/input', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}, {'InputName': 'code', 'S3Input': {'S3Uri': 's3://sagemaker-us-east-1-367858208265/sagemaker-scikit-learn-2021-09-08-05-43-24-577/input/code/AutoInsuranceFraudProcessing.py', 'LocalPath': '/opt/ml/processing/input/code', 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', 'S3DataDistributionType': 'FullyReplicated', 'S3CompressionType': 'None'}}]
Outputs:  [{'OutputName': 'preprocessed_data.csv', 'S3Output': {'S3Uri': 's3://sagemaker-us-east-1-367858208265/AutoInsuranceFraud/Results/DataProcessing', 'LocalPath': '/opt/ml/processing/output', 'S3UploadMode': 'EndOfJob'}}]
...........................Received arguments Namespace()
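
Since the S3 source and destination in the `script_processor.run` call have to be changed for each account, one option is to define the bucket and prefix once and derive both URIs from them. This is a sketch under assumptions: `s3_uri` is a hypothetical helper, and `my-example-bucket` is a placeholder that must be replaced with a bucket you own.

```python
def s3_uri(bucket: str, *key_parts: str) -> str:
    """Join a bucket and key segments into an s3:// URI, ignoring stray slashes."""
    key = "/".join(part.strip("/") for part in key_parts if part.strip("/"))
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}"


bucket = "my-example-bucket"  # placeholder: replace with your own bucket
prefix = "AutoInsuranceFraud"

source_uri = s3_uri(bucket, prefix, "insurance_claims.csv")
destination_uri = s3_uri(bucket, prefix, "Results/DataProcessing")
print(source_uri)
print(destination_uri)
```

The two variables could then be passed as `source=source_uri` to `ProcessingInput` and `destination=destination_uri` to `ProcessingOutput`.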

### View Results of Data Preprocessing

Once the preprocessing job is complete, we can take a look at the contents of the S3 bucket to see the transformed data.

In [8]:
preprocessing_job_description = script_processor.jobs[-1].describe()

output_config = preprocessing_job_description["ProcessingOutputConfig"]
for output in output_config["Outputs"]:
    print(output)
    preprocessed_training_data = output["S3Output"]["S3Uri"]
    print(preprocessed_training_data)


{'OutputName': 'preprocessed_data.csv', 'S3Output': {'S3Uri': 's3://sagemaker-us-east-1-367858208265/AutoInsuranceFraud/Results/DataProcessing', 'LocalPath': '/opt/ml/processing/output', 'S3UploadMode': 'EndOfJob'}, 'AppManaged': False}
s3://sagemaker-us-east-1-367858208265/AutoInsuranceFraud/Results/DataProcessing
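
The loop above reassigns `preprocessed_training_data` on every iteration, so if a job had several outputs, only the last URI would be kept. A small sketch of an alternative is to collect the URIs in a dictionary keyed by output name. `collect_output_uris` is a hypothetical helper, and the sample dictionary is a trimmed copy of the description printed above, used here in place of a live `describe()` call.

```python
def collect_output_uris(job_description: dict) -> dict:
    """Map each processing output name to its S3 URI."""
    outputs = job_description.get("ProcessingOutputConfig", {}).get("Outputs", [])
    return {out["OutputName"]: out["S3Output"]["S3Uri"] for out in outputs}


# Trimmed copy of the job description shown above (no API call is made here).
sample_description = {
    "ProcessingOutputConfig": {
        "Outputs": [
            {
                "OutputName": "preprocessed_data.csv",
                "S3Output": {
                    "S3Uri": "s3://sagemaker-us-east-1-367858208265/AutoInsuranceFraud/Results/DataProcessing",
                    "LocalPath": "/opt/ml/processing/output",
                    "S3UploadMode": "EndOfJob",
                },
                "AppManaged": False,
            }
        ]
    }
}

output_uris = collect_output_uris(sample_description)
print(output_uris["preprocessed_data.csv"])
```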


Once training is complete, SageMaker automatically shuts down the training instances and stores the trained model and evaluation results in S3.