
AutoML Proof of Concept

This is a Docker Compose based sandbox for learning how to operationalize a Google Cloud hosted, tabular ML classification model. The example problem is credit card fraud detection. The model was trained on a dataset generated by https://github.com/namebrandon/Sparkov_Data_Generation (with minor modifications), and the "real" transactions used to test the operationalized system come from the same source.

The main components are as follows:

| File | Description |
| --- | --- |
| compose.yaml | Docker Compose file describing how to run all components locally |
| retrieve_gcp_creds.sh | Logs in to Google Cloud and retrieves a credentials file, application_default_credentials.json. Do not check this file in. |
| card-fraud.proto | Protobuf definition of an authorization request |
| event-sender | Contains the Python program eventsender.py, which sends events to Hazelcast via map.put() (see the sketch below this table) |
| config/hazelcast.yaml | The configuration used by the Hazelcast instance |
| scoring-pipeline | Java code for the prediction pipeline |
| submitjob.sh | Helper script that deploys the pipeline to Hazelcast via the Hazelcast CLI |
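
For orientation, sending one event through the Hazelcast Python client looks roughly like the following sketch. The map name, key format, and value here are illustrative assumptions; the actual logic lives in event-sender/eventsender.py.

import hazelcast

# Connect to the locally running cluster (the address is an assumption;
# check compose.yaml for the actual port mapping).
client = hazelcast.HazelcastClient(cluster_members=["localhost:5701"])

# The map name and key format are illustrative, not necessarily what
# eventsender.py uses.
transactions = client.get_map("transactions").blocking()

# In the real sender, the value would be one transaction record from the
# generated CSV files (or a serialized card-fraud.proto message).
transactions.put("txn-000001", "placeholder transaction record")

client.shutdown()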

Instructions

This assumes a tabular classification ML model has already been trained and deployed to Google Cloud.

Build the Java project.

cd scoring-pipeline
mvn clean package

Generate the card transaction data.

This is the data that will be sent to Hazelcast for fraud detection. Clone https://github.com/wrmay/Sparkov_Data_Generation, a fork of the original that uses commas instead of pipes to separate fields; this was a requirement of Google's AutoML.

Generate the data with something similar to the following:

python -m venv venv
. venv/bin/activate
pip install -r requirements.txt
python datagen.py -n 10000 -o data 01-01-2022 01-31-2022

Note: attempting to generate fewer than 8 days of data will fail; pick the start and end dates accordingly.

Use event-sender/csv_sort.py to sort each of the *nnnnn.csv files by the unix timestamp column and remove the header row (this script still needs documentation). A sketch of the idea appears below.
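
For reference, a minimal sketch of that sort step might look like the following. The directory glob and the timestamp column name ("unix_time") are assumptions; check csv_sort.py and your generated files for the real names.

import csv
import glob

# Order each generated CSV by its unix timestamp column and drop the header.
# The column name "unix_time" is an assumption about the Sparkov output.
for path in glob.glob("data/*.csv"):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        ts_col = header.index("unix_time")
        rows = sorted(reader, key=lambda row: float(row[ts_col]))
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)  # header intentionally omitted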

Copy all of the generated data files into the data/transactions_for_generator directory of this project.

Start the Simulation

docker compose up -d

The Hazelcast Management Center should be available at http://localhost:8080

Preparation

Obtain Google Cloud credentials. By default, any authenticated user can access the model.

./retrieve_gcp_creds.sh

This should create a file, application_default_credentials.json, which you should not check in to GitHub.

Obtain the project, location, and endpoint ID of the tabular classification model endpoint you will access (for example: "hazelcast-33", "us-central1", "4731246912831750144") and edit submitjob.sh accordingly.
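
Before wiring the endpoint into the pipeline, you can sanity-check it with the Vertex AI Python SDK. This is only a sketch: the feature names in the instance dict are assumptions based on the Sparkov schema, so substitute whatever features your model was actually trained on.

from google.cloud import aiplatform

# Values from the step above; replace with your own project, location,
# and endpoint ID.
aiplatform.init(project="hazelcast-33", location="us-central1")
endpoint = aiplatform.Endpoint("4731246912831750144")

# Tabular models take one dict per instance, keyed by feature name.
# These feature names are illustrative assumptions.
response = endpoint.predict(instances=[{
    "category": "grocery_pos",
    "amt": "101.37",
    "merchant": "fraud_Rippin Kub and Mann",
}])
print(response.predictions)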

Submit the Pipeline

./submitjob.sh 

