Skip to content

kiconiaworks/igata

Repository files navigation

igata README

This project provides a inference infrastructure for easily preparing and deploying a trained model and wrapping the model in a way that easily allows input/output to various services (for example: SQS, S3, DyanamoDB).

The goal is to separate where/how data is stored with how it's processed.

Usage

  1. Build and train your model,

  2. Implement your model for prediction by sub-classing igata.predictors.PredictorBase and implementing the required methods.

    -Available Methods: - preprocess_input(input_record, meta) - predict(input_record, meta) [REQUIRED] - postprocess_output(_prediction_result)

    NOTE: Currently igata only supports images (png|jpg) as inputs from S3, or SQS/S3 input_record is provided to the preprocess_input() method as a numpy.array.

Execution

Once you have a Predictor class sub-classing igata.predictors.PredictorBase, prepare a DockerFile to build the combined image.

Execution is performed through the igata.cli entry point. Environment Variables are used to control the input/output managers to use. (See sections below)

The entry point for execution is through igata.cli

Example:

PREDICTOR_MODULE=dummypredictor.predictors PREDICTOR_CLASS_NAME=DummyPredictorNoInputNoOutput OUTPUT_CTXMGR_SQS_QUEUE_URL=http://localhost:4576/queue/test-queue pipenv run python -m igata.cli s3://test-bucket/720503_273_2014960_tn.jpg

Environment Variables

Environment variables are used to control the input/output for a given predictor.

The following environment variables can be used to control a built image executor.

General Environment Variables

  • LOG_LEVEL set output log level, DEBUG, INFO, WARNING
  • PREDICTOR_MODULE: Dotted path to module containing user-defined Predictor class (Ex: 'mypackage.submodule')
  • PREDICTOR_CLASS_NAME: [DEFAULT="Predictor"] User-defined Predictor class name that subclasses igata.predictors.PredictorBase (Ex: "MyPredictor")

Input Context Manager Environment Variables

  • INPUT_CONTEXT_MANAGER

Available Input Context Manager(s):

  • 'S3BucketImageInputCtxManager': [DEFAULT] Pulls IMAGE inputs from s3 bucket/key given a list of s3Uris (Ex: s3://bucket/my/key.png)

    • Required Option(s) Environment Variables: None
  • 'SQSMessageS3InputImageCtxManager':

    • Required Option(s) Environment Variables:
      • INPUT_CTXMANAGER_SQS_QUEUE_URL: Queue Url form which to retrieve messages from
  • 'SQSMessageS3InputCSVCtxManager':

    • Required Option(s) Environment Variables:
      • INPUT_CTXMANAGER_SQS_QUEUE_URL: Queue Url form which to retrieve messages from

SQSMessageS3InputImageCtxManager SQS message Format

  schema:
    type: array
    items:
      properties:
        collection_id:
          type: string
          description: 親ID
          example: 'events:1234'
        image_id:
          type: string
          description: 画像ID
          example: 'images:1234'
        s3_uri:
          type: string
          description: 画像のS3オブジェクトURI
          format: url
          example: 's3://bucket/image.jpg'
        sns_topic_arn:
          type: string
          description: 解析処理の完了を通知するSNSトピックのARN
          example: 'arn:aws:sns:*:123456789012:notify_complete'
      required:
        - collection_id
        - image_id
        - s3_uri

SQSMessageS3InputCSVCtxManager SQS message Format

  schema:
    type: array
    items:
      properties:
        collection_id:
          type: string
          description: 親ID
          example: 'cf2609fe-20d8-44a4-8386-3d925926c512'
        file_id:
          type: string
          description: ファイル特定ID
          example: '4c1bec6e-34ae-4917-a96f-1cdc298cba65'
        s3_uri:
          type: string
          description: 画像のS3オブジェクトURI
          format: url
          example: 's3://bucket/image.jpg'
        sns_topic_arn:
          type: string
          description: 解析処理の完了を通知するSNSトピックのARN
          example: 'arn:aws:sns:*:123456789012:notify_complete'
      required:
        - collection_id
        - image_id
        - s3_uri

Output Context Manager Environment Variables

  • OUTPUT_CONTEXT_MANAGER: Defines the OutputCtxManager to use. (See 'Available Output Context Managers below)

  • RESULT_RECORD_CHUNK_SIZE: Defines the number of records that are cached before being sent to the OutputCtxManager's put_records() method.

Available Output Context Manager(s):

  • 'SQSRecordOutputCtxManager': [DEFAULT] Output Predictor results to an SQS Message Queue

    • Required Option(s) Environment Variables:
      • OUTPUT_CTXMGR_SQS_QUEUE_URL: (str) Url to the result output sqs queue
  • 'S3BucketCsvFileOutputCtxManager'

    • Required Option(s) Environment Variables:
      • OUTPUT_CTXMGR_OUTPUT_S3_BUCKET: (str) Bucket name of output bucket
      • OUTPUT_CTXMGR_FIELDNAMES: (str) comma separated list of values defining the header fieldnames (Ex: "header1,header2,header3"
  • 'DynamodbOutputCtxManager'

    • Required Option(s) Environment Variables:
      • RESULTS_ADDITIONAL_PARENT_FIELDS: (str) comma separated fields to include from parent record to include in result
      • RESULTS_SORTKEY_KEYNAME: (str) The field name of the dynamodb RESULTS Table sort-key (required to output to the result to the dynamodb results table)
      • REQUESTS_TABLE_HASHKEY_KEYNAME: (str) field name of the dynamodb REQUESTS Table hash-key.
      • REQUESTS_TABLE_RESULTS_KEYNAME: (str) field name that defines the JSON results field content
      • OUTPUT_CTXMGR_REQUESTS_TABLENAME: (str) Dynamodb REQUESTS Table name, 'state' field will be updated
      • OUTPUT_CTXMGR_RESULTS_TABLENAME: (str) Dynamodb RESULTS Table name. Will be populated with flattened results of the model result dictionary

DynamodbOutputCtxManager Table(s) Structure

REQUESTS Table
AttributeName Type Is HASHKEY Is RANGEKEY GSI HASH_KEY GSI RANGEKEY
request_id S
collection_id S
state S

GSI projection_type = ALL

RESULTS Table
AttributeName Type Is HASHKEY Is RANGEKEY GSI HASH_KEY GSI RANGEKEY
hashkey S
s3_uri S
collection_id S
valid_number S

GSI projection_type = ALL

Local Development

Python: 3.7

Requires pipenv for dependency management Install with pip install pipenv --user

Install the local development environment

  1. Setup pre-commit hooks (black, isort):

    # assumes pre-commit is installed on system via: `pip install pre-commit`
    pre-commit install
  2. The following command installs project and development dependencies:

    pipenv install --dev
    

Run code checks

To run linters:

# runs flake8, pydocstyle
make check

To run type checker:

make mypy

Running tests

This project uses pytest for running testcases.

NOTE: localstack is used for local aws service tests

.env for local testing:

S3_ENDPOINT=http://localhost:4572
SQS_ENDPOINT=http://localhost:4576
SQS_OUTPUT_QUEUE_NAME=test-output-queue
SNS_ENDPOINT=http://localhost:4575
DYNAMODB_ENDPOINT=http://localhost:4569
LOG_LEVEL=DEBUG
SQS_VISIBILITYTIMEOUT_SECONDS_ON_EXCEPTION=0

Tests cases are written and placed in the tests directory.

To run the tests use the following command:

docker-compose up -d
pytest -v

In addition the following make command is available:

make test-local

CI/CD Required Environment Variables

The following are required for this project to be integrated with auto-deploy using the github flow branching strategy.

With github flow master is the release branch and features are added through Pull-Requests (PRs) On merge to master the code will be deployed to the production environment.

S3_ENDPOINT=http://localhost:4572
SQS_ENDPOINT=http://localhost:4576
SQS_OUTPUT_QUEUE_NAME=test-output-queue