# MADlib Flow 

## Credit & Debit Transaction Fraud Model

Use case:
Using the available credit and debit card transaction data, build a classification model that predicts whether a new transaction is fraudulent.

This notebook demonstrates model development, feature engineering, a lookup cache for the feature engine, and operationalizing the model for batch and low-latency scoring.

The RTSMADlib tool deploys the overall workflow. This model workflow needs a feature lookup to perform feature engineering at scoring time. MADlibFlow automatically populates the required cache entries in Redis based on the input manifest.

### Pre-Requisites
1. A running instance of Greenplum with Apache MADlib. If you do not have Greenplum, you can spin up a single-node sandbox instance from the AWS/GCP/Azure marketplace, or build a Docker image by following the instructions in ....
2. A running instance of Redis. If you do not have Redis, you can use the Docker image at https://hub.docker.com/_/redis/
3. Python libraries: pandas, geopy, numpy, enlighten, ipython-sql, psycopg2, psycopg2-binary
4. In tasklet 1 (the "Connect to Greenplum" cell):
   - Modify the database connection parameters.
   - Modify `root_dir` to point to the MADlibFlowClient install folder.
5. Modify the MADlibFlow model deployment JSON file `$RTSMADLIB_HOME/samples/credit_transactions/credit_transaction_flow.json`:
   1. Replace {DATABASE_HOST}, {PORT}, {DATABASE}, {USER}, {PASSWORD} with the Greenplum connection information.
   2. Replace {REDIST_HOST} with the Redis host name.

### The notebook performs the following tasks

1. Connect to Greenplum and set up the session
2. Create the schema and all required tables
3. Run the data generator
4. Load the data into Greenplum
5. Feature Engineering
6. Build and train the model
7. Batch-score the model on test data
8. Operationalize the model with MADlibFlow


# Connect to Greenplum

In [None]:
import psycopg2               # Python-PostgreSQL Database Adapter - https://pypi.python.org/pypi/psycopg2
import pandas as pd           # Python Data Analysis Library - https://pandas.pydata.org/
import math  
import json

%load_ext sql

# PLEASE MODIFY THE BELOW AS PER YOUR GREENPLUM CLUSTER SETTINGS
database_host = '{HOST}'
database_databasename = '{DATABASE}'
database_username = '{USER}'
database_password = '{PASSWORD}'
database_port = '{PORT}'


# PLEASE MODIFY "root_dir" PATH TO YOUR RTS4MADlib INSTALL DIRECTORY
root_dir="/Users/sridharpaladugu/RTS4MADlib"
data_generator_path=root_dir+"/samples/CreditCardTransactionGenerator"
transactions_path=data_generator_path+"/data/"

try:
    connString = "host='{}' dbname='{}' user='{}' password='{}' port={}".format(database_host,database_databasename,database_username,database_password,database_port)
    # print connString
    conn = psycopg2.connect(connString)
    cur = conn.cursor()
    conn.autocommit = True
        
    connection = 'postgresql://{}:{}@{}:{}/{}'.format(database_username,database_password,database_host,database_port,database_databasename)
    %sql $connection

    message = "<span style='color:green'>**Connection successful!**</span>"
    print(message)
except Exception as e:
    message = "<span style='color:red'>**ERROR: Unable to connect to the database ({})**</span>".format(e)
    print(message) 
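The connection strings above are built with plain string formatting, so a password containing characters such as `@`, `:` or `/` (or a single quote in the libpq form) produces an invalid connection string. Below is a minimal sketch of a safer builder, assuming the same five settings; `build_connection_strings` is a hypothetical helper, not part of psycopg2 or ipython-sql, and the host and password in the example are made up.

```python
from urllib.parse import quote_plus


def _libpq_quote(value):
    """Quote a value for a libpq key='value' connection string."""
    # libpq requires backslashes and single quotes inside quoted values to be backslash-escaped
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_connection_strings(host, dbname, user, password, port):
    """Return (libpq DSN for psycopg2.connect, postgresql:// URL for %sql), escaped."""
    dsn = " ".join(
        "{}={}".format(key, _libpq_quote(val))
        for key, val in [("host", host), ("dbname", dbname), ("user", user),
                         ("password", password), ("port", port)]
    )
    # quote_plus percent-encodes characters such as @ : / that would otherwise break the URL
    url = "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(str(user)), quote_plus(str(password)), host, port, dbname)
    return dsn, url


# Hypothetical example values
dsn, url = build_connection_strings("gpdb.example.com", "gpadmin", "gpadmin", "p@ss:w/rd", 5432)
print(dsn)
print(url)
```

In the connection cell this could replace the two `format` calls, e.g. `connString, connection = build_connection_strings(database_host, database_databasename, database_username, database_password, database_port)`.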

## Create Schema

In [None]:
%%sql

-- create working schema
DROP SCHEMA IF EXISTS credit_trans CASCADE;

CREATE SCHEMA credit_trans;

DROP TABLE IF EXISTS credit_trans.raw_accounts;

CREATE TABLE credit_trans.raw_accounts (
    raw_accounts json
);

DROP TABLE IF EXISTS credit_trans.accounts;

CREATE TABLE credit_trans.accounts (
    account_number text,
    expiration_date text,
    cvv text,
    card_type text,
    account_city text,
    account_city_alias text,
    account_state text,
    account_long float,
    account_lat float,
    account_transaction_radius integer,
    trxn_mean float,
    trxn_std float,
    account_id integer
);

DROP TABLE IF EXISTS credit_trans.raw_locations;

CREATE TABLE credit_trans.raw_locations (
    raw_locations json
);

DROP TABLE IF EXISTS credit_trans.locations;
CREATE TABLE credit_trans.locations (
    rlb_location_key text,
    merchant_name text,
    merchant_trxn_mean float,
    merchant_trxn_std float,
    merchant_city text,
    merchant_state varchar(2),
    merchant_long float,
    merchant_lat float,
    merchant_city_alias text,
    transaction_id integer,
    location_id integer
);

DROP TABLE IF EXISTS credit_trans.raw_transactions;

CREATE TABLE credit_trans.raw_transactions (
    account_id integer,
    account_lat double precision,
    account_long double precision,
    account_number text,
    card_type text,
    fraud_flag boolean,
    location_id integer,
    merchant_city text,
    merchant_city_alias text,
    merchant_lat double precision,
    merchant_long double precision,
    merchant_name text,
    merchant_state text,
    posting_date text,
    rlb_location_key text,
    transaction_amount double precision,
    transaction_date text,
    transaction_id integer
);


# Generate Test Data
This step generates synthetic credit transactions for model development. By default the generator creates 100000 transactions; to generate more, change the parameter "transactionNumber: 100000" in the file $RTSMADLIB_HOME/samples/CreditCardTransactionGenerator/myConfigs.xml

In [None]:
! cd $RTSMADLIB_HOME/samples/CreditCardTransactionGenerator; python Generator.py

# Load data from files

In [None]:
import os

# psycopg2 >= 2.9 quotes the table name passed to copy_from(), so a schema-qualified
# name such as 'credit_trans.raw_accounts' is rejected. copy_expert() with an explicit
# COPY statement works across psycopg2 versions.
cur = conn.cursor()

# accounts.json and locations.json are loaded as-is into a single json column
with open(os.path.join(data_generator_path, "accounts.json"), 'r') as accountsFile:
    cur.copy_expert("COPY credit_trans.raw_accounts FROM STDIN", accountsFile)
    conn.commit()

with open(os.path.join(data_generator_path, "locations.json"), 'r') as locationsFile:
    cur.copy_expert("COPY credit_trans.raw_locations FROM STDIN", locationsFile)
    conn.commit()

# Load every transactions_* file produced by the generator (comma-delimited, with a header row)
for datafile in sorted(os.listdir(transactions_path)):
    if datafile.startswith("transactions_"):
        print("loading file " + datafile + " ............")
        with open(os.path.join(transactions_path, datafile), 'r') as f:
            f.readline()  # Skip the header row.
            cur.copy_expert("COPY credit_trans.raw_transactions FROM STDIN WITH DELIMITER ','", f)
            conn.commit()


# ELT

In [None]:
%%sql

INSERT INTO credit_trans.accounts (
    SELECT raw_accounts->>'account_number' AS account_number
          ,raw_accounts->>'expiration_date' AS expiration_date
          ,raw_accounts->>'cvv' AS cvv
          ,raw_accounts->>'card_type' AS card_type
          ,raw_accounts->>'city' AS account_city
          ,raw_accounts->>'city_alias' AS account_city_alias
          ,(raw_accounts->>'state')::varchar(2) AS account_state 
          ,(raw_accounts->>'long')::float AS account_long
          ,(raw_accounts->>'lat')::float AS account_lat
          ,(raw_accounts->>'transaction_radius')::integer AS account_transaction_radius
          ,(raw_accounts->>'trxn_mean')::float AS trxn_mean
          ,(raw_accounts->>'trxn_std')::float AS trxn_std
          ,(raw_accounts->>'account_id')::integer AS account_id
    FROM (
        SELECT json_array_elements(raw_accounts)AS raw_accounts
        FROM credit_trans.raw_accounts
    ) foo
);
SELECT count(*) FROM credit_trans.accounts;

INSERT INTO credit_trans.locations (
    SELECT raw_locations->>'rlb_location_key' AS rlb_location_key
          ,raw_locations->>'merchant_name' AS merchant_name
          ,(raw_locations->>'merchant_trxn_mean')::float AS merchant_trxn_mean
          ,(raw_locations->>'merchant_trxn_std')::float AS merchant_trxn_std
          ,raw_locations->>'merchant_city' AS merchant_city
          ,(raw_locations->>'merchant_state')::varchar(2) AS merchant_state
          ,(raw_locations->>'merchant_long')::float AS merchant_long 
          ,(raw_locations->>'merchant_lat')::float AS merchant_lat
          ,raw_locations->>'merchant_city_alias' AS merchant_city_alias
          ,(raw_locations->>'transaction_id')::integer AS transaction_id
          ,(raw_locations->>'location_id')::integer AS location_id
    FROM (
        SELECT json_array_elements(raw_locations) AS raw_locations
        FROM credit_trans.raw_locations
    ) foo
);

SELECT count(*) FROM credit_trans.locations;

DROP TABLE IF EXISTS credit_trans.transactions;

CREATE TABLE credit_trans.transactions AS
SELECT 
    account_id,
    account_number,
    card_type,
    fraud_flag,
    location_id,
    merchant_city,
    merchant_city_alias,
    merchant_lat,
    merchant_long,
    merchant_name,
    merchant_state,
    to_timestamp(posting_date::float) AT TIME ZONE 'EST' AS posting_date,
    rlb_location_key,
    CASE WHEN transaction_amount < 0 THEN 0 ELSE transaction_amount END AS transaction_amount,
    to_timestamp(transaction_date::float) AT TIME ZONE 'EST' AS transaction_date,
    transaction_id
FROM credit_trans.raw_transactions
DISTRIBUTED RANDOMLY;

select count(*) from credit_trans.transactions
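To make the ELT step concrete, here is a small pure-Python sketch of what it does to individual records: `json_array_elements` unpacks the JSON array, `->>` extracts each field as text before the cast, epoch strings become timestamps via `to_timestamp(...) AT TIME ZONE 'EST'`, and negative amounts are clamped to 0. It covers only a few account columns and assumes the same field names the SQL uses; note that PostgreSQL treats `'EST'` as a fixed UTC-5 offset, so these timestamps do not follow daylight saving time.

```python
import json
from datetime import datetime, timedelta, timezone

# PostgreSQL's 'EST' abbreviation is a fixed UTC-5 offset (no daylight saving)
EST = timezone(timedelta(hours=-5))


def flatten_accounts(raw_json):
    """Mirror json_array_elements + ->> casts for a subset of credit_trans.accounts columns."""
    rows = []
    for rec in json.loads(raw_json):
        state = rec.get("state")
        rows.append({
            "account_number": rec.get("account_number"),
            # an explicit ::varchar(2) cast truncates longer values to 2 characters
            "account_state": state[:2] if state is not None else None,
            "account_long": float(rec["long"]),
            "account_lat": float(rec["lat"]),
            "account_id": int(rec["account_id"]),
        })
    return rows


def clean_transaction(raw):
    """Mirror the CREATE TABLE credit_trans.transactions AS SELECT ... step for one row."""
    row = dict(raw)
    for col in ("posting_date", "transaction_date"):
        # to_timestamp(x::float) AT TIME ZONE 'EST' -> wall-clock time at UTC-5, without a zone
        row[col] = datetime.fromtimestamp(float(raw[col]), tz=EST).replace(tzinfo=None)
    # CASE WHEN transaction_amount < 0 THEN 0 ELSE transaction_amount END
    amount = float(raw["transaction_amount"])
    row["transaction_amount"] = 0.0 if amount < 0 else amount
    return row
```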


# Data Audit

## Summary Statistics

https://madlib.apache.org/docs/latest/group__grp__summary.html

In [None]:
%%sql

--drop existing table & run madlib summary stats function

DROP TABLE IF EXISTS credit_trans.transactions_summary;
SELECT madlib.summary('credit_trans.transactions','credit_trans.transactions_summary');

-- grab results from gpdb
SELECT * FROM credit_trans.transactions_summary;

--drop existing table & run madlib summary stats function
DROP TABLE IF EXISTS credit_trans.locations_summary;
SELECT madlib.summary('credit_trans.locations','credit_trans.locations_summary');

-- grab results from gpdb
SELECT * FROM credit_trans.locations_summary;


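For readers unfamiliar with `madlib.summary`, the sketch below computes a small subset of the per-column statistics it reports (row count, distinct values, missing values, and mean/min/max for numeric columns), using keys named like the corresponding summary output columns. It is a pure-Python illustration of what the numbers mean, not a reimplementation; MADlib reports many more statistics.

```python
import statistics


def summarize_column(name, values):
    """Compute a subset of the per-column statistics that madlib.summary reports."""
    present = [v for v in values if v is not None]
    # booleans are ints in Python, so exclude them explicitly from numeric stats
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    stats = {
        "target_column": name,
        "row_count": len(values),
        "distinct_values": len(set(present)),
        "missing_values": len(values) - len(present),
    }
    # only report mean/min/max when every non-null value is numeric
    if numeric and len(numeric) == len(present):
        stats.update(mean=statistics.mean(numeric), min=min(numeric), max=max(numeric))
    return stats
```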

# Data Exploration

### Create Joined Table For Exploration

In [None]:
%%sql

DROP TABLE IF EXISTS credit_trans.transactions_accounts;
CREATE TABLE credit_trans.transactions_accounts AS
SELECT t.*
      ,a.account_city
      ,a.account_city_alias 
      ,a.account_state 
      ,a.account_long
      ,a.account_lat
      ,a.account_transaction_radius
FROM credit_trans.transactions AS t
JOIN credit_trans.accounts AS a
USING (account_id);
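One exploration the joined columns make possible is checking whether a merchant lies outside the account's usual transaction radius. The sketch below uses the haversine great-circle formula with the standard library only (geopy's `geopy.distance.great_circle` would do the same); it assumes `account_transaction_radius` is in miles, which the notebook does not state.

```python
import math

# Mean Earth radius in miles; using miles is an assumption about account_transaction_radius
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, long) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def outside_radius(row):
    """True when the merchant is farther from the account than account_transaction_radius."""
    d = haversine_miles(row["account_lat"], row["account_long"],
                        row["merchant_lat"], row["merchant_long"])
    return d > row["account_transaction_radius"]
```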


# Feature Engineering

In [None]:
%%sql

-- transaction features

DROP TABLE IF EXISTS credit_trans.transactions_features;
CREATE TABLE credit_trans.transactions_features AS
SELECT transaction_id
      ,transaction_amount
      ,account_id
      ,location_id
      ,fraud_flag
FROM credit_trans.transactions
WHERE transaction_date > now() - interval '30 days';

-- merchant features

DROP TABLE IF EXISTS credit_trans.merchant_features;
CREATE TABLE credit_trans.merchant_features AS
SELECT l.merchant_name
      ,t.*
FROM credit_trans.locations AS l
JOIN (
  SELECT location_id
        ,merchant_state
        ,count(CASE WHEN fraud_flag = True THEN 1 ELSE null END) AS m_fraud_cases
        ,min(extract('hour' from transaction_date)) AS m_min_hour
        ,max(extract('hour' from transaction_date)) AS m_max_hour
        ,avg(transaction_amount) AS m_avg_transaction_amount
        ,min(transaction_amount) AS m_min_transaction_amount
        ,max(transaction_amount) AS m_max_transaction_amount
        ,stddev(transaction_amount) AS m_stddev_transaction_amount
        ,count(*) AS m_number_transactions
        ,count(distinct account_id) AS m_unique_accounts
        ,coalesce(count(CASE WHEN card_type = 'Discover' THEN 1 ELSE null END) / (count(*))::float,0) AS m_prop_discover_transactions
        ,coalesce(count(CASE WHEN card_type = 'Amex' THEN 1 ELSE null END) / (count(*))::float,0) AS m_prop_amex_transactions
        ,coalesce(count(CASE WHEN card_type = 'Diners' THEN 1 ELSE null END) / (count(*))::float,0) AS m_prop_diners_transactions
        ,coalesce(count(CASE WHEN card_type = 'MasterCard' THEN 1 ELSE null END) / (count(*))::float,0) AS m_prop_mastercard_transactions
        ,coalesce(count(CASE WHEN card_type = 'Visa' THEN 1 ELSE null END) / (count(*))::float,0) AS m_prop_visa_transactions
  FROM credit_trans.transactions
  WHERE transaction_date > now() - interval '30 days'
  GROUP BY 1, 2
) AS t
USING (location_id);

-- Account features

DROP TABLE IF EXISTS credit_trans.account_features;
CREATE TABLE credit_trans.account_features AS
SELECT a.account_state
      ,a.card_type
      ,t.*
FROM credit_trans.accounts AS a
JOIN (
  SELECT account_id
        ,min(extract('hour' from transaction_date)) AS a_min_hour
        ,max(extract('hour' from transaction_date)) AS a_max_hour
        ,avg(transaction_amount) AS a_avg_transaction_amount
        ,min(transaction_amount) AS a_min_transaction_amount
        ,max(transaction_amount) AS a_max_transaction_amount
        ,stddev(transaction_amount) AS a_stddev_transaction_amount
        ,count(*) AS a_number_transactions
  FROM credit_trans.transactions
  WHERE transaction_date > now() - interval '30 days'
  GROUP BY 1
) AS t
USING (account_id);


-- combined features

DROP TABLE IF EXISTS credit_trans.all_features;
CREATE TABLE credit_trans.all_features AS
SELECT *
      ,abs(t.transaction_amount - m.m_avg_transaction_amount) AS m_transaction_delta
      ,abs(t.transaction_amount - a.a_avg_transaction_amount) AS a_transaction_delta
FROM credit_trans.transactions_features AS t
JOIN credit_trans.merchant_features AS m
USING (location_id)
JOIN credit_trans.account_features AS a
USING (account_id);
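The merchant aggregates and the delta features above can be read as a group-by followed by simple arithmetic. The sketch below mirrors a few of the `credit_trans.merchant_features` columns in plain Python on in-memory rows; for brevity it groups by `location_id` only (the SQL also groups by `merchant_state`) and leaves out the 30-day window filter.

```python
from collections import defaultdict

CARD_TYPES = ["Discover", "Amex", "Diners", "MasterCard", "Visa"]


def merchant_features(transactions):
    """Per-location aggregates mirroring part of credit_trans.merchant_features."""
    groups = defaultdict(list)
    for t in transactions:
        groups[t["location_id"]].append(t)

    features = {}
    for location_id, rows in groups.items():
        n = len(rows)
        amounts = [r["transaction_amount"] for r in rows]
        feats = {
            "m_fraud_cases": sum(1 for r in rows if r["fraud_flag"]),
            "m_avg_transaction_amount": sum(amounts) / n,
            "m_number_transactions": n,
            "m_unique_accounts": len({r["account_id"] for r in rows}),
        }
        # share of each card type at this merchant, e.g. m_prop_visa_transactions
        for card in CARD_TYPES:
            feats["m_prop_{}_transactions".format(card.lower())] = (
                sum(1 for r in rows if r["card_type"] == card) / n)
        features[location_id] = feats
    return features


def transaction_delta(amount, average):
    """abs(t.transaction_amount - avg), as in m_transaction_delta / a_transaction_delta."""
    return abs(amount - average)
```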


## One Hot Encode Transaction Features

https://madlib.apache.org/docs/latest/group__grp__encode__categorical.html

In [None]:
categoricalFeatures = ['merchant_name','merchant_state','m_min_hour','m_max_hour','account_state','card_type','a_min_hour','a_max_hour']
continuosFeatues = ['transaction_amount','m_fraud_cases','m_avg_transaction_amount','m_min_transaction_amount','m_max_transaction_amount','m_stddev_transaction_amount','m_number_transactions','m_unique_accounts','m_prop_discover_transactions','m_prop_amex_transactions','m_prop_diners_transactions','m_prop_mastercard_transactions','m_prop_visa_transactions','a_avg_transaction_amount','a_min_transaction_amount','a_max_transaction_amount','a_stddev_transaction_amount','a_number_transactions','m_transaction_delta','a_transaction_delta']
idColumns = ['account_id', 'location_id', 'transaction_id', 'fraud_flag']


# encode categorical features
query = """
    DROP TABLE IF EXISTS credit_trans.all_features_onehot, credit_trans.all_features_onehot_dictionary;
    SELECT madlib.encode_categorical_variables (
        'credit_trans.all_features',
        'credit_trans.all_features_onehot',
        '{}',
        NULL,
        '{}',
        NULL,
        'merchant_name=TravelCenters Of America, card_type=Diners',
        NULL,
        NULL,
        TRUE
    );
""".format(','.join(categoricalFeatures),','.join(continuosFeatues + idColumns))
cur.execute(query)

query = """
    SELECT *
    FROM credit_trans.all_features_onehot
    LIMIT 5
"""
cur.execute(query)

colnames = [desc[0] for desc in cur.description]

pivotFeatures = [c for c in colnames if c not in categoricalFeatures + continuosFeatues + idColumns] 

print(pivotFeatures)

pd.DataFrame(cur.fetchall(), columns=colnames)
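Judging from the `all_features_onehot_dictionary` table dropped above and the `merchant_state_2` feature used later, setting the last argument (`output_dictionary`) to TRUE appears to give the encoded columns index suffixes, with a dictionary table recording which value each index stands for. The sketch below illustrates that idea in plain Python, including dropping a reference level as `value_to_drop` does; `one_hot_with_dictionary` is hypothetical, and its sorted 1-based index assignment may not match MADlib's, so check the dictionary table before relying on a specific column such as `merchant_state_2`.

```python
def one_hot_with_dictionary(rows, categorical_cols, value_to_drop=None):
    """Index-suffixed one-hot encoding plus a dictionary {encoded column: (column, value)}.

    Hypothetical helper: the <col>_<index> naming and sorted 1-based indexes are assumptions.
    """
    value_to_drop = value_to_drop or {}
    levels, dictionary = {}, {}
    for col in categorical_cols:
        # the dropped reference value gets no column: it is encoded as all zeros
        values = sorted({r[col] for r in rows
                         if r[col] is not None and r[col] != value_to_drop.get(col)})
        levels[col] = values
        for idx, value in enumerate(values, start=1):
            dictionary["{}_{}".format(col, idx)] = (col, value)

    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k not in categorical_cols}
        for col in categorical_cols:
            for idx, value in enumerate(levels[col], start=1):
                out["{}_{}".format(col, idx)] = int(r[col] == value)
        encoded.append(out)
    return encoded, dictionary
```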


# Model Training

## Training & Validation Sample Split

https://madlib.apache.org/docs/latest/group__grp__train__test__split.html

In [None]:
%%sql
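-- split into training and test sets: train_proportion = 0.3, so about 30% of the rows
-- go to credit_trans.model_train and the remaining 70% to credit_trans.model_test.
-- grouping_cols is NULL, so rows are sampled independently and the same account
-- can appear in both sets.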

DROP TABLE IF EXISTS credit_trans.model
                    ,credit_trans.model_train
                    ,credit_trans.model_test;                        
SELECT madlib.train_test_split(
    'credit_trans.all_features_onehot',
    'credit_trans.model',
    0.3,
    NULL,
    NULL,
    '*',
    FALSE,
    TRUE
);

SELECT fraud_flag
      ,count(*)
FROM credit_trans.model_train
GROUP BY 1
ORDER BY 1;
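`madlib.train_test_split` samples rows, and its `grouping_cols` argument samples within each group (stratification) rather than keeping groups apart. If each account should fall entirely in one set, one option is to split by account instead of by row. A minimal sketch, assuming rows are available as dicts with an `account_id` key; `split_by_account` is a hypothetical helper and `train_proportion` here applies to accounts, not rows.

```python
import random


def split_by_account(rows, train_proportion=0.3, seed=42):
    """Assign whole accounts to train or test so no account_id appears in both sets."""
    accounts = sorted({r["account_id"] for r in rows})
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(accounts)
    n_train = round(len(accounts) * train_proportion)
    train_accounts = set(accounts[:n_train])
    train = [r for r in rows if r["account_id"] in train_accounts]
    test = [r for r in rows if r["account_id"] not in train_accounts]
    return train, test
```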

# Random Forest (MADlib)

## Train model

https://madlib.apache.org/docs/latest/group__grp__random__forest.html

In [None]:
%%sql

-- train random forest model
DROP TABLE IF EXISTS credit_trans.rf_model, credit_trans.rf_model_summary, credit_trans.rf_model_group;
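SELECT madlib.forest_train(
            'credit_trans.model_train',   -- training table
            'credit_trans.rf_model',      -- output model table
            'transaction_id',             -- id column
            'fraud_flag',                 -- dependent variable
            'transaction_amount, m_fraud_cases, m_transaction_delta, a_transaction_delta, merchant_state_2',  -- features
            null,                         -- features to exclude
            null,                         -- grouping columns
            10::integer,                  -- num_trees
            4::integer,                   -- num_random_features
            true::boolean,                -- importance
            5::integer,                   -- num_permutations
            10::integer,                  -- max_tree_depth
            3::integer,                   -- min_split
            1::integer,                   -- min_bucket
            10::integer                   -- num_splits
        );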



## Score Validation Data

https://madlib.apache.org/docs/latest/group__grp__random__forest.html

In [None]:
%%sql

-- Create merchants table for caching
DROP FUNCTION IF EXISTS credit_trans.mf_merchants();
CREATE OR REPLACE FUNCTION credit_trans.mf_merchants()
RETURNS Integer AS
$BODY$

DECLARE
    query TEXT;
BEGIN
    query := $$
                DROP TABLE IF EXISTS credit_trans.mf_merchants;
                CREATE TABLE credit_trans.mf_merchants AS
                SELECT location_id
                      ,count(CASE WHEN fraud_flag = true THEN 1 ELSE null END) AS m_fraud_cases
                      ,avg(transaction_amount) AS m_avg_transaction_amount
                FROM credit_trans.locations
                LEFT JOIN credit_trans.transactions
                USING (location_id)
                WHERE transaction_date > now() - interval '30 days'
                GROUP BY location_id;
            $$;

    EXECUTE query;
    RETURN 1;

EXCEPTION
    WHEN OTHERS THEN
        RETURN -1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;

-- test function
SELECT credit_trans.mf_merchants();


In [None]:
%%sql

-- Create accounts table for caching
DROP FUNCTION IF EXISTS credit_trans.mf_accounts();
CREATE OR REPLACE FUNCTION credit_trans.mf_accounts()
RETURNS Integer AS
$BODY$
DECLARE
    query TEXT;
BEGIN
    query := $$
                DROP TABLE IF EXISTS credit_trans.mf_accounts;
                CREATE TABLE credit_trans.mf_accounts AS
                SELECT account_id
                      ,avg(transaction_amount) AS a_avg_transaction_amount
                FROM credit_trans.accounts
                LEFT JOIN credit_trans.transactions
                USING (account_id)
                WHERE transaction_date > now() - interval '30 days'
                GROUP BY account_id;
            $$;

    EXECUTE query;
    RETURN 1;

EXCEPTION
    WHEN OTHERS THEN
        RETURN -1;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;

-- test function
SELECT credit_trans.mf_accounts();


In [None]:
%%sql
DROP TABLE IF EXISTS credit_trans.feature_engine_test;
CREATE TABLE credit_trans.feature_engine_test AS

SELECT t.*
      ,m.m_fraud_cases
      ,abs(t.transaction_amount - m.m_avg_transaction_amount) AS m_transaction_delta
      ,abs(t.transaction_amount - a.a_avg_transaction_amount) AS a_transaction_delta
      ,CASE WHEN t.merchant_state = 'RS' THEN 1 ELSE 0 END AS merchant_state_2
FROM credit_trans.transactions AS t
JOIN credit_trans.mf_merchants AS m USING (location_id)
JOIN credit_trans.mf_accounts AS a USING (account_id);
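For a single incoming transaction, the query above reduces to two key lookups (`location_id` into `mf_merchants`, `account_id` into `mf_accounts`) plus arithmetic, which is what makes these features cacheable for low-latency scoring. A minimal sketch of that per-row computation, assuming the cached rows are passed in as dicts; `realtime_features` is a hypothetical name.

```python
def realtime_features(txn, merchant, account):
    """Scoring features for one transaction from cached mf_merchants / mf_accounts rows."""
    if merchant is None or account is None:
        # the inner JOINs drop transactions with no recent merchant or account history
        return None
    return {
        **txn,
        "m_fraud_cases": merchant["m_fraud_cases"],
        "m_transaction_delta": abs(txn["transaction_amount"] - merchant["m_avg_transaction_amount"]),
        "a_transaction_delta": abs(txn["transaction_amount"] - account["a_avg_transaction_amount"]),
        # same hard-coded state value as the SQL CASE expression
        "merchant_state_2": 1 if txn["merchant_state"] == "RS" else 0,
    }
```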

In [None]:
%%sql

-- Score out-of-sample
DROP TABLE IF EXISTS credit_trans.model_test_scored;
SELECT madlib.forest_predict('credit_trans.rf_model',
                             'credit_trans.feature_engine_test',
                             'credit_trans.model_test_scored',
                             'prob');
                
DROP TABLE IF EXISTS credit_trans.model_test_scored_tmp;

CREATE TABLE credit_trans.model_test_scored_tmp AS
SELECT *
FROM credit_trans.model_test_scored
JOIN credit_trans.model_test
USING (transaction_id);
DROP TABLE credit_trans.model_test_scored;
ALTER TABLE credit_trans.model_test_scored_tmp RENAME TO model_test_scored;
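The scored table holds class probabilities but no evaluation. A minimal sketch of precision and recall at a chosen threshold, assuming you fetch the true labels and fraud probabilities, e.g. with `%sql SELECT fraud_flag, estimated_prob_true FROM credit_trans.model_test_scored` (the probability column name is an assumption; check the table's columns first).

```python
def confusion_counts(labels, probs, threshold=0.5):
    """Return (tp, fp, tn, fn) when predicting fraud for prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted = p >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def precision_recall(labels, probs, threshold=0.5):
    """Precision and recall for the fraud class; 0.0 when undefined."""
    tp, fp, _, fn = confusion_counts(labels, probs, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Because fraud is rare, lowering the threshold typically trades precision for recall; comparing a few thresholds is cheaper than retraining.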


# MADlibFlow

## Operationalize the Credit Fraud model 

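The input files required for the model, feature engine, and feature cache are in the $MADLIBFLOW_CLIENT/samples/credit_transactions folder.
The file "credit_transaction_flow.json" defines the deployment workflow. The cells below assemble a deployment configuration in Python and write it to config.json.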

In [None]:
! rts4madlib --help

In [None]:
#  deployment specification for madlib model
model=  {
		"modeldb-datasource.jdbc-url": "jdbc:postgresql://{HOST}:{PORT}/{DATABASE}",
		"modeldb-datasource.userName": "{USER}",
		"modeldb-datasource.password": "{PASSWORD}",
		"madlibrest.modelname": "Credit_fraud_randomforest_model",
		"madlibrest.modeldescription": "Random Forest Classification Example credit transactions",
		"madlibrest.modelschema": "credit_trans",
		"madlibrest.modeltables": [
			"rf_model",
			"rf_model_group",
			"rf_model_summary"
		],
		"madlibrest.modelinputtable": "feature_engine_test",
		"madlibrest.modelquery": "SELECT madlib.forest_predict('credit_trans.rf_model', 'credit_trans.feature_engine_test', 'credit_trans.model_test_scored', 'prob')",
		"madlibrest.resultstable": "model_test_scored",
		"madlibrest.resultsquery": "SELECT * from  credit_trans.model_test_scored, credit_trans.feature_engine_test"
	}

In [None]:
#  deployment specification for madlib feature engine

featuresengine = {
    "spring.profiles.active": "redis",
    "modeldb-datasource.jdbc-url": "jdbc:postgresql://{HOST}:{PORT}/{DATABASE}",
    "modeldb-datasource.userName": "{USER}",
    "modeldb-datasource.password": "{PASSWORD}",
    "redis": {
        "clustertype": "standlone",
        "hostname": "{REDIS__HOST}",
        "port": 6379
    },
    "feature-engine": {
        "featurename": "CreditTransactionFeatures",
        "featuredescription": "Credit Transaction Features",
        "featuresschema": "credit_trans",
        "payloadtable": "message",
        "featurequery": "SELECT t.* ,m.m_fraud_cases ,abs(t.transaction_amount - m.m_avg_transaction_amount) AS m_transaction_delta,abs(t.transaction_amount - a.a_avg_transaction_amount) AS a_transaction_delta, CASE WHEN t.merchant_state = 'RS' THEN 1 ELSE 0 END AS merchant_state_2 FROM credit_trans.message AS t JOIN credit_trans.mf_merchants AS m USING (location_id) JOIN credit_trans.mf_accounts AS a USING (account_id)",
        "cacheenabled": "true",
        "cacheentities": {
            "mf_accounts": "account_id",
            "mf_merchants": "location_id"
        }
    }
}


In [None]:
#  deployment specification for madlib feature cache 

featurecache =  {
		"spring.profiles.active": "redis",
		"redis": {
			"clustertype": "standlone",
			"hostname": "{REDIS__HOST}",
			"port": 6379
		},
		"modeldb-datasource": {
			"jdbc-url": "jdbc:postgresql://{HOST}:{PORT}/{DATABASE}",
			"username": "{USER}",
			"password": "{PASSWORD}"
		},
		"feature-cache": {
			"featurename": "CreditTransactionFeaturesCache",
			"featuresourceschema": "credit_trans",
            "featurefunctions": [
                "mf_accounts",
                "mf_merchants"
            ],
			"featuresourcetables": {
				"mf_accounts": "account_id",
				"mf_merchants": "location_id"
			}
		}
	}
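In this spec, `featurefunctions` lists the SQL functions defined earlier that rebuild the `mf_*` tables, and `featuresourcetables` names the key column for each table. Conceptually, the cache then holds each row indexed by that key so the feature engine can look it up per transaction. The sketch below illustrates that idea with a plain dict standing in for Redis; the `"table:key"` layout is a hypothetical stand-in, not the cache manager's actual key format.

```python
def build_feature_cache(source_tables, rows_by_table):
    """Index each table's rows by its key column, e.g. {"mf_accounts": "account_id", ...}.

    The "table:key" string layout is hypothetical; a dict stands in for Redis.
    """
    cache = {}
    for table, key_col in source_tables.items():
        for row in rows_by_table.get(table, []):
            cache["{}:{}".format(table, row[key_col])] = row
    return cache


def lookup(cache, table, key):
    """Return the cached row for (table, key), or None on a cache miss."""
    return cache.get("{}:{}".format(table, key))
```

With the spec above, `build_feature_cache(featurecache["feature-cache"]["featuresourcetables"], rows)` would index `mf_accounts` by `account_id` and `mf_merchants` by `location_id`.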


In [None]:
myconfig=json.dumps ({
	"deploy-packages": [
		"model",
		"feature-engine",
		"featurecache"
	],
	"model": model,
	"feature-engine": featuresengine,
	"featurecache": featurecache
}
)


with open("config.json", "w") as f:
    f.write(myconfig)
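Because the specs above still contain placeholders such as `{HOST}` and `{REDIS__HOST}`, it is worth checking `config.json` before deploying. A minimal sketch that lists any remaining `{UPPER_CASE}` placeholders and any `deploy-packages` entry without a matching section; both helpers are hypothetical and assume placeholders are written in upper case inside braces, as they are here.

```python
import json
import re

PLACEHOLDER = re.compile(r"\{[A-Z_]+\}")


def unreplaced_placeholders(config_json):
    """Return the sorted {UPPER_CASE} placeholders still present in any string value."""
    found = set()

    def walk(node):
        if isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)
        elif isinstance(node, str):
            found.update(PLACEHOLDER.findall(node))

    walk(json.loads(config_json))
    return sorted(found)


def missing_packages(config_json):
    """Return deploy-packages entries that have no matching top-level section."""
    config = json.loads(config_json)
    return [p for p in config.get("deploy-packages", []) if p not in config]
```

For example, `print(unreplaced_placeholders(myconfig), missing_packages(myconfig))` should print two empty lists once every placeholder has been filled in.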
    

In [None]:
! echo "Deploying Credit Model to Docker......"
# Each "!" line runs in its own shell, so load the profile in the same command as rts4madlib
! . ~/.bash_profile && rts4madlib --name credit_fraud --type flow --action deploy --target docker --inputJson config.json

## Testing the MADlibFlow containers

The deployment log shows the service endpoint for each container; the tests below use those endpoints.
The next cell calls the information endpoint (`/actuator/info`) on the model container.

## Inspect model container
Please use the port from the above docker output

In [None]:
! curl -v -H "Content-Type:application/json" -X GET http://127.0.0.1:8085/actuator/info

## Inspect feature_engine container
Please use the port from the above docker output

In [None]:
! curl -v -H "Content-Type:application/json" -X GET http://127.0.0.1:8185/actuator/info

## Inspect feature_cache container
Please use the port from the above docker output

In [None]:
! curl -v -H "Content-Type:application/json" -X GET http://127.0.0.1:8285/actuator/info

##### Create a test payload to test the prediction

In [None]:
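# Select random records to build a test payload for scoring
df = %sql SELECT * FROM credit_trans.transactions ORDER BY RANDOM() LIMIT 5;
df = df.DataFrame()
# Convert timestamps to strings; otherwise to_json would emit them as epoch milliseconds
df['posting_date'] = df['posting_date'].astype('str')
df['transaction_date'] = df['transaction_date'].astype('str')
# Use the third of the five sampled records as the payload
js = json.dumps(json.loads(df.to_json(orient='records'))[2])
print(js)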

### Call the REST endpoint for prediction
Please use the port from the above docker output

In [None]:
! curl -v -H "Content-Type:application/json" -X POST http://localhost:8494/predict  -d '{js}'
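The curl call embeds the payload inside single quotes in a shell command, so a value containing an apostrophe (for example a merchant name) would break it. A minimal sketch of the same POST from Python with the standard library; `build_predict_request` is a hypothetical helper, and the host and port default to the values in the curl command, which must match your deployment output.

```python
import json
import urllib.request


def build_predict_request(payload, host="localhost", port=8494):
    """Build a POST /predict request equivalent to the curl call (it is not sent here)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        "http://{}:{}/predict".format(host, port),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it: `with urllib.request.urlopen(build_predict_request(json.loads(js))) as resp: print(resp.read().decode())`.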

## Undeploy model

In [None]:
! docker ps

In [83]:
! rts4madlib --name credit_fraud --type flow --action undeploy --target docker 
! rts4madlib --name credit_fraud --type featurecache --action undeploy --target docker 

deployName => credit_fraud
un-deploying container credit_fraud_rts-for-madlib-model_1
fa03c2387fe3
fa03c2387fe3
un-deploying container credit_fraud_rts-for-madlib-featuresengine_1
385ef2eb0388
385ef2eb0388
un-deploying container credit_fraud_rts-for-madlib-mlmodelflow_1
6133d189a8cf
6133d189a8cf
deployName => credit_fraud
un-deploying container credit_fraud_rts-for-madlib-featurescachemanager_1
4c10431e3163
4c10431e3163


In [84]:
!docker ps

CONTAINER ID        IMAGE               COMMAND                   CREATED             STATUS              PORTS                            NAMES
2ad10245a5df        redis               "docker-entrypoint.s…"    2 hours ago         Up 2 hours          6379/tcp                         my-redis
d1a1296fa7b3        gpdb5214:1.0        "/bin/sh -c 'echo \"1…"   11 hours ago        Up 11 hours         22/tcp, 0.0.0.0:9432->5432/tcp   beautiful_zhukovsky
