This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

heroku/predictionio-engine-ur

⚠️ This project is no longer active. No further updates are planned.

PredictionIO Universal Recommender for Heroku

A fork of the Universal Recommender version 0.5.0 deployable with the PredictionIO buildpack for Heroku. Due to substantial revisions to support Elasticsearch on Heroku, this fork lags behind the main UR; conceptual differences beyond version 0.5.0 are listed in the UR release log.

The Universal Recommender (UR) is a new type of collaborative filtering recommender based on an algorithm that can use data from a wide variety of user taste indicators; it is called the Correlated Cross-Occurrence algorithm. …CCO is able to ingest any number of user actions, events, profile data, and contextual information. It then serves results in a fast and scalable way. It also supports item properties for filtering and boosting recommendations and can therefore be considered a hybrid collaborative filtering and content-based recommender.

Source: upstream docs

The Heroku app depends on:

Demo Story

This engine demonstrates recommendation of items for a mobile phone user based on their purchase history. The model is trained with a small example data set.

How To 📚

✏️ Throughout this document, code terms that start with $ represent a value (shell variable) that should be replaced with a customized value, e.g. $ENGINE_NAME…

  1. ⚠️ Requirements
  2. 🚀 Demo Deployment
    1. Create the app
    2. Configure the app
    3. Provision Elasticsearch
    4. Provision Postgres
    5. Import data
    6. Deploy the app
    7. Scale up
    8. Retry release
    9. Diagnostics
  3. 🎯 Query for predictions
  4. 🛠 Local development
    1. Import sample data
    2. Run pio
    3. Query the local engine
  5. 🎛 Configuration options

Requirements

Demo Deployment

This is an adaptation of the normal PIO engine deployment.

Create the app

git clone \
  https://github.com/heroku/predictionio-engine-ur.git \
  pio-engine-ur

cd pio-engine-ur

heroku create $ENGINE_NAME
heroku buildpacks:add https://github.com/heroku/predictionio-buildpack.git
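Because later commands and query URLs reuse $ENGINE_NAME, it may help to fail early when the placeholder was never set. The following is a minimal sketch; `require_var` is a hypothetical helper name, not part of this repo or the Heroku CLI, and it assumes the argument is a valid shell variable name.

```shell
# Fail early when a placeholder variable such as ENGINE_NAME is unset or empty.
# require_var is a hypothetical helper; pass only a valid variable name.
require_var() {
  name="$1"
  # Indirect lookup of the variable whose name is stored in $name.
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "error: \$$name is not set" >&2
    return 1
  fi
}

# Example (not run here):
#   ENGINE_NAME=my-ur-demo
#   require_var ENGINE_NAME && heroku create "$ENGINE_NAME"
```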

Configure the app

heroku config:set \
  PIO_EVENTSERVER_APP_NAME=ur \
  PIO_EVENTSERVER_ACCESS_KEY=$RANDOM-$RANDOM-$RANDOM-$RANDOM-$RANDOM-$RANDOM \
  PIO_UR_ELASTICSEARCH_CONCURRENCY=1
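Each bash $RANDOM value is an integer from 0 to 32767 and is not drawn from a cryptographic source, so the key above is fine for a demo but fairly guessable. As an alternative sketch, the key can be read from the kernel's random device. `generate_access_key` is a hypothetical helper name, and this assumes /dev/urandom, od, and tr are available and that the event server accepts a hex string as an access key.

```shell
# Print a 48-character lowercase hex key read from /dev/urandom.
# generate_access_key is a hypothetical helper, not part of PredictionIO.
generate_access_key() {
  # 24 random bytes as hex; tr strips everything except hex digits.
  od -An -N24 -tx1 /dev/urandom | tr -dc '0-9a-f'
}

# Example (not run here):
#   heroku config:set PIO_EVENTSERVER_ACCESS_KEY="$(generate_access_key)"
```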

Provision Elasticsearch

heroku addons:create bonsai --as PIO_ELASTICSEARCH --version 5.4

Ensure the --version you specify is a currently supported version.

In the Bonsai add-on's dashboard, verify that Elasticsearch is really the requested version. Only versions greater than 5.1 will work with this Heroku app. Caution: it's easy to accidentally provision the wrong version.
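The version check can also be scripted. The sketch below compares a version string against the 5.1 threshold; `es_version_supported` is a hypothetical helper, and it reads "greater than 5.1" as a major.minor pair strictly above 5.1, which is an assumption (adjust it if 5.1.x patch releases should count).

```shell
# Return 0 when the given Elasticsearch version (e.g. 5.4.0) has a
# major.minor above 5.1, 1 when it does not, 2 when it cannot be parsed.
# es_version_supported is a hypothetical helper name.
es_version_supported() {
  version="$1"
  case "$version" in
    *.*) ;;
    *) return 2 ;;
  esac
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  case "$major$minor" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -gt 1 ]; }
}

# Example (not run here): Elasticsearch reports its version in the
# version.number field of the JSON returned by its root URL, so the value
# shown by the cluster or the Bonsai dashboard can be passed in:
#   es_version_supported 5.4.0 && echo "version OK"
```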

Provision Postgres

heroku addons:create heroku-postgresql:hobby-dev
  • Use a higher-level, paid plan for anything but a small demo.
  • hobby-basic is the smallest paid heroku-postgresql plan.

Import data

Initial training data is automatically imported from data/initial-events.json.

👓 When you're ready to begin working with your own data, read about strategies for importing data.

Deploy the app

git push heroku master

# Follow the logs to see training & web start-up
#
heroku logs -t

⚠️ Initial deploy will probably fail due to memory constraints. Proceed to scale up.

Scale up

Once deployed, scale up the processes to avoid memory issues:

heroku ps:scale \
  web=1:Standard-2X \
  release=0:Performance-L \
  train=0:Performance-L

💵 These are paid, professional dyno types.

Retry release

When the release (pio train) fails due to memory constraints or another transient error, you can use the Heroku CLI releases:retry plugin to rerun the release without pushing a new deployment:

# First time, install it.
heroku plugins:install heroku-releases-retry

# Re-run the release & watch the logs
heroku releases:retry
heroku logs -t

Query for predictions

Once deployment completes, the engine is ready to recommend items for a mobile phone user based on their purchase history.

Get all recommendations for a user:

# an Android user
curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
     -H "Content-Type: application/json" \
     -d $'{"user": "100"}'
# an iPhone user
curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
     -H "Content-Type: application/json" \
     -d $'{"user": "200"}'

Get recommendations for a user, excluding phones:

curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
     -H "Content-Type: application/json" \
     -d $'{
            "user": "100",
            "fields": [{
              "name": "category",
              "values": ["phone"],
              "bias": 0
            }]
          }'

Get accessory recommendations for a user excluding phones & boosting power-related items:

curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
     -H "Content-Type: application/json" \
     -d $'{
            "user": "100",
            "fields": [{
              "name": "category",
              "values": ["phone"],
              "bias": 0
            },{
              "name": "category",
              "values": ["power"],
              "bias": 1.5
            }]
          }'
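Hand-written JSON with several field entries is easy to leave unbalanced, so the body can be assembled by small shell functions instead. This is a rough sketch: `ur_field` and `ur_query_body` are hypothetical helper names, only the "user" and "fields" keys shown in the queries above are covered, and no JSON escaping is done, so values must not contain quotes or backslashes.

```shell
# Print one "fields" entry: name, a single value, and a bias
# (0 to exclude, 1.5 to boost, as in the queries above).
ur_field() {
  printf '{"name":"%s","values":["%s"],"bias":%s}' "$1" "$2" "$3"
}

# Print a query body for a user ID followed by zero or more field entries.
ur_query_body() {
  user="$1"; shift
  fields=""
  for f in "$@"; do
    # Join entries with commas, with no leading comma before the first.
    fields="${fields:+$fields,}$f"
  done
  printf '{"user":"%s","fields":[%s]}' "$user" "$fields"
}

# Example (not run here): exclude phones and boost power-related items.
#   curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
#        -H "Content-Type: application/json" \
#        -d "$(ur_query_body 100 \
#               "$(ur_field category phone 0)" \
#               "$(ur_field category power 1.5)")"
```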

For a user with no purchase history, the recommendations will be based on popularity:

curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
     -H "Content-Type: application/json" \
     -d $'{"user": "000"}'

Get recommendations based on similarity with an item:

curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
     -H "Content-Type: application/json" \
     -d $'{"item": "101"}'

Get recommendations for a user boosting on similarity with an item:

curl -X "POST" "http://$ENGINE_NAME.herokuapp.com/queries.json" \
     -H "Content-Type: application/json" \
     -d $'{
            "user": "100",
            "item": "101"
          }'

👓 See the main Universal Recommender query docs for more parameters. Please note those docs have been updated for the newest version 0.6.0, but this repo provides version 0.5.0. Differences are listed in the UR release log.

Local Development

Clone Engine Template

Start in this repo's working directory. If you don't already have it cloned, then do it now:

git clone \
  https://github.com/heroku/predictionio-engine-ur.git \
  pio-engine-ur

cd pio-engine-ur

Set-up PredictionIO

➡️ Set up local development, including Elasticsearch.

bin/pio status should succeed when this setup is complete.

Import sample data

bin/pio app new ur
PIO_EVENTSERVER_APP_NAME=ur data/import-events -f data/initial-events.json

Run pio

bin/pio build
bin/pio train -- --driver-memory 2500m
bin/pio deploy

Query the local engine

curl -X "POST" "http://127.0.0.1:8000/queries.json" \
     -H "Content-Type: application/json" \
     -d $'{
            "user": "100",
            "fields": [{
              "name": "category",
              "values": ["phone"],
              "bias": 0
            }]
          }'

Configuration

  • PIO_UR_ELASTICSEARCH_CONCURRENCY
    • may increase in line with the Bonsai Add-on plan's value for Concurrent Indexing
    • the max for a dedicated Elasticsearch cluster is "unlimited", but in practice set it to match the number of Spark executor cores
  • PIO_UR_ELASTICSEARCH_INDEX_REPLICAS
    • more replicas may improve concurrent search performance
    • should increase in line with the number of Elasticsearch nodes (n-1) in the cluster
    • takes effect after the next training, when a new index is inserted
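The n-1 rule for replicas is simple arithmetic: a three-node cluster would get two replicas, and a single-node cluster none. A minimal sketch follows; `ur_index_replicas` is a hypothetical helper name, and the node count has to be supplied by hand from the Elasticsearch plan.

```shell
# Print the replica count for a cluster of the given size: nodes - 1,
# never below 0. ur_index_replicas is a hypothetical helper name.
ur_index_replicas() {
  nodes="$1"
  if [ "$nodes" -le 1 ]; then
    echo 0
  else
    echo $((nodes - 1))
  fi
}

# Example (not run here), for a three-node cluster:
#   heroku config:set PIO_UR_ELASTICSEARCH_INDEX_REPLICAS="$(ur_index_replicas 3)"
```

As noted in the list above, the new value only applies once the next training creates a fresh index.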