Contextual Multi-Armed Bandit Reward Tracker & Model Trainer

improve-ai/tracker-trainer

Contextual Multi-Armed Bandit Item/Reward Tracker & Model Trainer

The Improve AI Tracker/Trainer is a stack of serverless components that tracks JSON items and their rewards from Improve AI libraries and trains updated contextual multi-armed bandit models for scoring, ranking, and decisions. The stack runs on AWS, where tracking is cheap and easy to operate. Rewards are joined with the tracked items they are associated with, and the joined records are used as input for training new scoring and ranking models.
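To make the join step concrete, here is a minimal sketch of pairing rewards with tracked items. It is not the trainer's actual implementation: the record fields (`id`, `item_id`, `reward`) and the choice to sum multiple rewards per item are assumptions made for illustration only.

```python
def join_rewards(items, rewards):
    """Attach a total reward to each tracked item.

    Field names ("id", "item_id", "reward") are hypothetical, and summing
    rewards per item is an assumption, not the documented behavior.
    """
    totals = {}
    for reward in rewards:
        item_id = reward["item_id"]
        totals[item_id] = totals.get(item_id, 0.0) + reward["reward"]
    # Items that never received a reward get 0.0.
    return [{**item, "reward": totals.get(item["id"], 0.0)} for item in items]


tracked = [{"id": "a1", "item": {"text": "Hello"}},
           {"id": "b2", "item": {"text": "Howdy"}}]
received = [{"item_id": "a1", "reward": 1.0},
            {"item_id": "a1", "reward": 0.5}]
joined = join_rewards(tracked, received)
print(joined[0]["reward"], joined[1]["reward"])  # 1.5 0.0
```

Each joined record pairs an item with its observed reward, which is the shape a regression trainer needs as input.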

Deployment

Fork this repo

Make a private fork of this repo. This way your model configuration is stored in revision control.

Install the Serverless Framework

$ npm install -g serverless

Install NPM Dependencies

$ npm install

Configure Models and Training Parameters

$ nano config/config.yml

Deploy the Stack

Deploy a new dev stage in us-east-1

$ serverless deploy --stage dev
Deploying improveai-acme-demo to stage dev (us-east-1)

✔ Service deployed to stack improveai-acme-demo-dev (111s)

endpoint: https://xxxx.lambda-url.us-east-1.on.aws/

The deployment output lists the track endpoint URL, for example https://xxxx.lambda-url.us-east-1.on.aws/. Client SDKs can use this URL directly to track decisions and rewards. Alternatively, a CDN may be configured in front of the track endpoint URL for greater administrative control.

The deployment will also create a models S3 bucket in the form of improveai-{organization}-{project}-{stage}-models. After each round of training, updated models are automatically uploaded to the models bucket.
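As a small illustration of the naming template, the helper below builds the expected bucket name. The function name is hypothetical, and the example values assume that the `improveai-acme-demo` service shown in the deploy output corresponds to organization `acme` and project `demo`.

```python
def models_bucket_name(organization: str, project: str, stage: str) -> str:
    """Build the models bucket name from the template
    improveai-{organization}-{project}-{stage}-models."""
    parts = {"organization": organization, "project": project, "stage": stage}
    for label, value in parts.items():
        if not value:
            raise ValueError(f"{label} must be a non-empty string")
    return f"improveai-{organization}-{project}-{stage}-models"


# Assumed values, inferred from the example deploy output above.
print(models_bucket_name("acme", "demo", "dev"))  # improveai-acme-demo-dev-models
```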

The models bucket is private by default. Make the '/models/latest/' directory public to serve models directly from S3. Alternatively, a CDN may be configured in front of the models S3 bucket.

Model URLs follow the template of https://{modelsBucket}.s3.amazonaws.com/models/latest/{modelName}.{mlmodel|xgb}.gz. The Android and Python SDKs use .xgb.gz models and the iOS SDK uses .mlmodel.gz models.
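The URL template can be sketched as a small helper that picks the file extension for each SDK. The function name, the SDK keys, and the model name `greetings` are hypothetical; only the URL template and the SDK-to-format mapping come from the text above.

```python
# Model file extension per SDK, as described above.
MODEL_EXTENSIONS = {"ios": "mlmodel", "android": "xgb", "python": "xgb"}


def model_url(models_bucket: str, model_name: str, sdk: str) -> str:
    """Build a model URL from the documented template."""
    try:
        extension = MODEL_EXTENSIONS[sdk.lower()]
    except KeyError:
        raise ValueError(f"unknown SDK: {sdk!r}") from None
    return (
        f"https://{models_bucket}.s3.amazonaws.com"
        f"/models/latest/{model_name}.{extension}.gz"
    )


# "greetings" is a hypothetical model name used only for illustration.
print(model_url("improveai-acme-demo-dev-models", "greetings", "ios"))
```

These URLs only resolve if the '/models/latest/' directory has been made public or a CDN has been placed in front of the bucket.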

Integrate a Ranker Library

Improve AI libraries are currently available for Swift, Java, and Python.

Algorithm

The reinforcement learning algorithm is a contextual multi-armed bandit with XGBoost acting as the core regression algorithm. As such, it is well suited to making decisions on structured data, such as JSON or native objects in Swift/Objective-C, Java/Kotlin, and Python. Unlike deep reinforcement learning algorithms, which often require simulator environments and hundreds of millions of decisions, this algorithm performs well with the more modest amounts of data found in real-world applications. Compared to A/B testing, it requires exponentially less data for good results.
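The general shape of a contextual bandit can be illustrated with a toy sketch. This is not the Improve AI algorithm: a per-(context, item) running mean stands in for the XGBoost regressor, and epsilon-greedy exploration is an assumption chosen for simplicity. All class and method names are hypothetical.

```python
import random
from collections import defaultdict


class ToyContextualBandit:
    """Epsilon-greedy bandit over (context, item) pairs.

    A running mean per (context, item) replaces the XGBoost regressor
    purely for illustration; it does not generalize across features.
    """

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.reward_sum = defaultdict(float)
        self.count = defaultdict(int)

    def score(self, context, item):
        """Estimated reward for an item in a context (0.0 if unseen)."""
        key = (context, item)
        n = self.count.get(key, 0)
        return self.reward_sum[key] / n if n else 0.0

    def choose(self, context, items):
        """Explore with probability epsilon, otherwise pick the best score."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(items)
        return max(items, key=lambda item: self.score(context, item))

    def add_reward(self, context, item, reward):
        """Record an observed reward for a past choice."""
        key = (context, item)
        self.reward_sum[key] += reward
        self.count[key] += 1


bandit = ToyContextualBandit(epsilon=0.0)  # no exploration, for a deterministic demo
bandit.add_reward("morning", "Good morning", 1.0)
bandit.add_reward("morning", "Hello", 0.0)
print(bandit.choose("morning", ["Hello", "Good morning"]))  # Good morning
```

A learned regressor such as XGBoost replaces the lookup table in `score`, which lets the model estimate rewards for contexts and items it has not seen exactly before.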