# Fraud Detector - Minimal Prediction API Example 
#### Supervised fraud detection  
-------

## Setup
------
First, set up your AWS credentials so that Fraud Detector can store and access training data and supporting detector artifacts.

https://docs.aws.amazon.com/frauddetector/latest/ug/set-up.html

To use Amazon Fraud Detector, you have to set up permissions that allow access to the Amazon Fraud Detector console and API operations. You also have to allow Amazon Fraud Detector to perform tasks on your behalf and to access resources that you own.

We recommend creating an AWS Identity and Access Management (IAM) user with access restricted to Amazon Fraud Detector operations and required permissions. You can add other permissions as needed.

## Plan
------

You'll need the following pieces of information to make predictions on your dataset. 

- ENTITY_TYPE  
- EVENT_TYPE    
- DETECTOR_NAME & VERSION


You'll also need to identify how many records you'd like to predict on.  


In [None]:
from IPython.display import display, HTML, clear_output
display(HTML("<style>.container { width:90% }</style>"))
# ------------------------------------------------------------------

import numpy as np
np.seterr(divide='ignore', invalid='ignore')


import pandas as pd
import uuid 
from datetime import datetime

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# -- dask for parallelism -- 
import dask 

# -- standard stuff -- 
import time

# -- AWS stuff -- 
import boto3


## Initialize AWS Fraud Detector Client 
------

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/frauddetector.html 


In [None]:
# -- fraud detector client --
client = boto3.client('frauddetector')

# -- use this to append to files 
sufx   = datetime.now().strftime("%Y%m%d")

### Entity, Detector, Model, and File Information  
-----
<div class="alert alert-info"> 💡 <strong> Entity, Detector, and Files. </strong>

- DETECTOR_NAME & VERSION corresponds to the name and version of your deployed Fraud Detector  
- MODEL_NAME & VERSION corresponds to the name and version of the model deployed with your Fraud Detector   
- S3_BUCKET & S3_FILE identify the S3 file you wish to apply your detector to.   

</div>

In [None]:
ENTITY_TYPE    = "your_entity"
EVENT_TYPE     = "your_event_type" 

DETECTOR_NAME = "your_detector"
DETECTOR_VER  = "1"

# -- name and version of model, used to get the model column names -- 
MODEL_NAME    = "your_model"
MODEL_VER     = "1"


# -- input file of data to be scored -- 
ARN_ROLE      = "arn:aws:iam::XXXX:role/your_role" 
S3_BUCKET     = "yourbucket"
S3_FILE       = "yourfile.csv"
S3_FILE_LOC   = "s3://{0}/{1}".format(S3_BUCKET,S3_FILE)

# -- run 100 records, you can change this here or below to run the whole file.
record_count = 100

#### Load Data to be Scored 
-----
<div class="alert alert-info"> 💡 <strong> Check the first 5 Records. </strong>

Does your data look correct? Do you need to rename any columns? You want the column names to match the field names used by the Model. 

</div>
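
One way to run this check before scoring is to compare the file's column names against the variable names the model expects. The sketch below is a hypothetical helper, not part of the Fraud Detector API; `check_columns`, `expected_variables`, and `rename_map` are names assumed for illustration, and the column names in the example are made up.

```python
def check_columns(columns, expected_variables, rename_map=None):
    """Apply an optional rename and report mismatches against the model's variables.

    Returns (renamed, missing, extra):
      renamed - column names after applying rename_map
      missing - expected variables that no column provides
      extra   - columns the model does not use (labels, timestamps, ...)
    """
    rename_map = rename_map or {}
    renamed = [rename_map.get(name, name) for name in columns]
    missing = [name for name in expected_variables if name not in renamed]
    extra = [name for name in renamed if name not in expected_variables]
    return renamed, missing, extra


# Made-up example; with pandas you would pass list(df.columns).
renamed, missing, extra = check_columns(
    ["order_amt", "ip", "email_address", "EVENT_LABEL"],
    ["order_amt", "ip_address", "email_address"],
    rename_map={"ip": "ip_address"},
)
print("missing:", missing)
print("extra:", extra)
```

If `missing` is non-empty, the same mapping can be applied to the data frame with `df = df.rename(columns=rename_map)` before scoring.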

In [None]:
# -- connect to S3, fetch the file, and convert it to a pandas DataFrame --
s3   = boto3.resource('s3')
obj  = s3.Object(S3_BUCKET, S3_FILE)
body = obj.get()['Body']
df   = pd.read_csv(body)
df.head()

## Run Predictions  
-----
The following applies the get_event_prediction endpoint to the records in your data frame.    

<i> Note: this uses the Dask backend to parallelize the prediction calls. </i>

<div class="alert alert-info"> 💡 <strong> get_event_prediction </strong>

To control how many records are scored, set record_count (e.g., 100 to predict on just the first 100 records). It is set to 100 above; set it to df.shape[0] to apply predictions to the whole dataset. Once scoring completes, convert the JSON results to a pandas DataFrame, append any existing labels, and analyze the scores against a threshold chosen for a particular false positive rate (FPR).

</div>

This is all you need to run predictions: 

```python

# eventTimestamp is an ISO 8601 string in UTC; eventVariables is a flat dict of strings
client.get_event_prediction(detectorId=DETECTOR_NAME, 
                            detectorVersionId=DETECTOR_VER,
                            eventId = '222222',
                            eventTypeName = EVENT_TYPE,
                            eventTimestamp = '2020-07-27T12:01:01Z', 
                            entities = [{'entityType': ENTITY_TYPE, 'entityId':'11111'}],
                            eventVariables=  record)
```


Example of what a record would look like: 

```python
# a single record is one flat dict of variable name -> string value
record = {'order_amt': '8036.0',
  'ip_address': '192.18.59.93',
  'email_address': 'synth_george_hayduke@example.com',
  'cc_bin': '42785',
  'billing_postal': '17740-2745',
  'shipping_postal': '20950-6945',
  'customer_name': 'George Hayduke'}
```
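
To make the mapping from a record to the call arguments explicit, the request can be assembled in a small function and inspected before anything is sent. This is a minimal sketch: `build_prediction_request` and its `now` parameter are hypothetical names introduced here, and the use of `uuid4` for the event id is an assumption (any unique id works).

```python
import uuid
from datetime import datetime, timezone


def build_prediction_request(record, detector_name, detector_version,
                             event_type, entity_type, now=None):
    """Assemble keyword arguments for get_event_prediction from one record."""
    now = now or datetime.now(timezone.utc)
    event_id = uuid.uuid4()
    return {
        "detectorId": detector_name,
        "detectorVersionId": detector_version,
        "eventId": str(event_id),
        "eventTypeName": event_type,
        # ISO 8601 timestamp; the "Z" suffix means UTC, so `now` must be UTC
        "eventTimestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "entities": [{"entityType": entity_type, "entityId": str(event_id.int)}],
        # every variable value is sent as a string
        "eventVariables": {name: str(value) for name, value in record.items()},
    }


request = build_prediction_request(
    {"order_amt": 8036.0, "ip_address": "192.18.59.93"},
    "your_detector", "1", "your_event_type", "your_entity",
    now=datetime(2020, 7, 27, 12, 1, 1, tzinfo=timezone.utc),
)
print(request["eventTimestamp"])  # 2020-07-27T12:01:01Z
```

The resulting dict can then be passed to the client as `client.get_event_prediction(**request)`.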

In [None]:
# record_count = df.shape[0]  # -- uncomment to run all records in the file --
model_variables = [column for column in df.columns if column not in ['EVENT_LABEL', 'EVENT_TIMESTAMP']]

# -- the trailing "Z" marks UTC, so build the timestamp from the UTC clock --
timestampStr = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())

start = time.time()

@dask.delayed
def _predict(record):
    # -- score one record; on failure, flag the record instead of aborting the run --
    eventId = uuid.uuid1()
    try:
        pred = client.get_event_prediction(detectorId=DETECTOR_NAME,
                                           detectorVersionId=DETECTOR_VER,
                                           eventId=str(eventId),
                                           eventTypeName=EVENT_TYPE,
                                           eventTimestamp=timestampStr,
                                           entities=[{'entityType': ENTITY_TYPE, 'entityId': str(eventId.int)}],
                                           eventVariables=record)

        record["score"]    = pred['modelScores'][0]['scores']["{0}_insightscore".format(MODEL_NAME)]
        record["outcomes"] = pred['ruleResults'][0]['outcomes']
    except Exception as err:
        print("prediction failed for event {0}: {1}".format(eventId, err))
        record["score"]    = "-999"
        record["outcomes"] = "error"
    return record


predict_data  = df[model_variables].head(record_count).astype(str).to_dict(orient='records')

# -- build one lazy task per record; no request is sent until dask.compute runs --
predict_score = [_predict(record) for record in predict_data]
predict_recs  = dask.compute(*predict_score)

# -- calculate time taken and print results --
time_taken = time.time() - start
tps = len(predict_recs) / time_taken

print('Process took %0.2f seconds' % time_taken)
print('Scored %d records (%0.2f per second)' % (len(predict_recs), tps))
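
The prediction cell reads the score and outcomes by indexing directly into the response. A slightly more defensive version, sketched below, tolerates missing keys and collects outcomes from every matched rule. The function name `extract_score_and_outcomes` is hypothetical, and the sample response is hand-written to mirror the keys the cell indexes (`modelScores`, `scores`, `ruleResults`, `outcomes`); it is not captured service output.

```python
def extract_score_and_outcomes(response, model_name):
    """Pull the model's insight score and all rule outcomes out of a response dict."""
    score_key = "{0}_insightscore".format(model_name)

    # Take the first model entry that carries the requested score key.
    score = None
    for model_score in response.get("modelScores", []):
        scores = model_score.get("scores", {})
        if score_key in scores:
            score = scores[score_key]
            break

    # Flatten the outcomes of every rule result into one list.
    outcomes = []
    for rule_result in response.get("ruleResults", []):
        outcomes.extend(rule_result.get("outcomes", []))

    return score, outcomes


# Hand-written sample in the assumed response shape.
sample_response = {
    "modelScores": [{"scores": {"your_model_insightscore": 845.0}}],
    "ruleResults": [{"ruleId": "high_risk", "outcomes": ["review"]}],
}
score, outcomes = extract_score_and_outcomes(sample_response, "your_model")
print(score, outcomes)
```

Returning `None` for a missing score keeps failed or partial responses distinguishable from real scores without relying on a sentinel value.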

### Take a look at your predictions
-----
Each record will have a score and the outcome of any rule conditions met. 

In [None]:
predictions = pd.DataFrame.from_dict(predict_recs, orient='columns')
predictions.head()
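
The notes on get_event_prediction mention analyzing scores against a threshold for a particular false positive rate. A minimal sketch of that step follows, assuming you have re-attached labels to the scored records and that legitimate events carry the label `"legit"`; both the label value and the function name `threshold_for_fpr` are assumptions for illustration. It works on plain lists, so with pandas you could pass `predictions["score"].astype(float).tolist()`.

```python
def threshold_for_fpr(scores, labels, target_fpr, legit_label="legit"):
    """Lowest observed score such that the share of legitimate events
    scoring strictly above it does not exceed target_fpr."""
    if not 0 <= target_fpr < 1:
        raise ValueError("target_fpr must be in [0, 1)")
    legit = sorted(s for s, l in zip(scores, labels) if l == legit_label)
    if not legit:
        raise ValueError("no legitimate events to estimate the FPR from")
    # Number of legitimate events allowed to score above the threshold.
    allowed = int(target_fpr * len(legit))
    return legit[len(legit) - allowed - 1]


# Made-up scores: five legitimate events and two fraudulent ones.
scores = [100, 200, 300, 400, 500, 900, 950]
labels = ["legit"] * 5 + ["fraud"] * 2

threshold = threshold_for_fpr(scores, labels, target_fpr=0.2)
flagged = [s for s in scores if s > threshold]
print(threshold, flagged)  # 400 [500, 900, 950]
```

Here one of the five legitimate events (20%) scores above the threshold, along with both fraudulent events. On real data the estimate is only as good as the number of labeled legitimate events behind it.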

### Optionally Write Predictions to File

<div class="alert alert-info"> <strong> Write Predictions. </strong>

You can write your prediction dataset to a CSV to manually review predictions. Simply add a cell below and copy the code below. 

</div>



```python

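# -- output file name without extension (example name); sufx is the date suffix defined earlier --
FILE = "predictions_" + sufx
# -- optionally write predictions to a CSV file -- 
predictions.to_csv(FILE + ".csv", index=False)
# -- or to an XLSX file (requires an Excel writer such as openpyxl) --
predictions.to_excel(FILE + ".xlsx", index=False)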

```