Prediction Nodes

Noah Harasz edited this page Oct 21, 2021 · 4 revisions

A Prediction Node represents a cloud-based model that is encapsulated such that Numerai can trigger it for predictions; it is designed to be scalable, resource efficient, and easy to configure and debug. This section details what each file does in the Numerai Examples and the important components of a Prediction Node.

Numerai Examples

The examples show how to structure one Prediction Node. You can review the layout of an example after running numerai copy-example to copy the files to your computer.

Python

numerai-python3
├── Dockerfile
├── .dockerignore
├── predict.py
├── requirements.txt
└── train.py
  • Dockerfile: Used during numerai node deploy to build a Docker image that's used to run your code in the cloud. It copies all files in its directory, installs Python requirements from requirements.txt, and runs python predict.py by default.

  • .dockerignore: This file uses glob-style patterns (not regular expressions) to match files that should not be included in the Docker image.

  • train.py: This is an extra entry point specifically for training. It's used when running numerai node test --local --command "python train.py".

  • requirements.txt: Defines the Python packages required to run the code.

  • predict.py: Runs by default, both locally and in the cloud, when you run numerai node test without the --command|-c option.
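
The split between train.py and predict.py can be sketched as two functions that share a saved model file. This is a minimal illustration under stated assumptions, not the code shipped in the example: the file name model.json, the mean-based "model", and the toy inputs are all hypothetical and exist only to show the two entry points.

```python
import json
from pathlib import Path

MODEL_PATH = Path("model.json")  # hypothetical file name, not from the example


def train(targets, path=MODEL_PATH):
    """What a train.py entry point might do: fit once and save the result."""
    # Toy "model": remember the mean of the training targets.
    params = {"mean": sum(targets) / len(targets)}
    path.write_text(json.dumps(params))
    return params


def predict(rows, path=MODEL_PATH):
    """What a predict.py entry point might do: load the saved model and score rows."""
    params = json.loads(path.read_text())
    # Every row gets the stored mean; a real model would use the row's features.
    return [params["mean"] for _ in rows]


if __name__ == "__main__":
    train([0.25, 0.5, 0.75])
    print(predict(["row_a", "row_b"]))
```

Keeping training in its own entry point means the default command only has to load a saved model and produce predictions, which is the cheaper of the two steps.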

RLang

numerai-rlang
├── Dockerfile
├── .dockerignore
├── install_packages.R
└── main.R
  • Dockerfile: Used during numerai node deploy to build a Docker image that's used to run your code in the cloud. It copies all files in its directory, installs the required R packages using install_packages.R, and runs main.R by default.

  • .dockerignore: This file uses glob-style patterns (not regular expressions) to match files that should not be included in the Docker image.

  • install_packages.R: Installs dependencies necessary for running the example.

  • main.R: Runs by default, both locally and in the cloud, when you run numerai node test without the --command|-c option.

Prediction Nodes

Prediction Nodes use a few important components like a Dockerfile, a Trigger, a Container, and a Compute Cluster, all of which can be created when following the quickstart in the main README.

The Dockerfile

This is the most important component of deploying a Prediction Node. It is a program (much like a bash script) that packages up your code as an image; this image contains everything your code needs to run in the cloud. The most typical use of a Dockerfile is demonstrated in the Numerai Examples. If you're not sure how to use a Dockerfile, first copy an example with numerai copy-example, then read the documentation in the Dockerfile to learn the basics.

These files are very flexible. The default Dockerfile just copies everything in the directory it is in, but you can customize this if you'd like to share code between models. For example, suppose you have a Python project set up like so:

numerai_models
├── common
│   ├── __init__.py
│   ├── data.py
│   └── setup.py
└── model_1
    ├── Dockerfile
    ├── .dockerignore
    ├── predict.py
    ├── requirements.txt
    └── train.py

Here, common is an installable Python package that you want to use in multiple models. To use it, add this line to model_1/Dockerfile: RUN pip install ../common/. Then run numerai node deploy from the numerai_models directory so that the package is installed in the image and available to your model code.
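
As an illustration of the kind of code that might be shared this way, the sketch below shows a helper that could live in common/data.py. The function name rank_normalize and its behavior are assumptions made for this example; nothing here comes from the Numerai Examples.

```python
def rank_normalize(values):
    """Map values to evenly spaced ranks strictly between 0 and 1.

    Hypothetical shared helper: the smallest value gets the lowest rank
    and the largest value gets the highest rank.
    """
    # Indices of `values`, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order):
        # Center each rank inside its slot so results never hit 0 or 1.
        ranks[index] = (position + 0.5) / len(values)
    return ranks
```

Once the package is installed in the image, model_1/predict.py could import the helper. The exact import path (for example, from common.data import rank_normalize) depends on how setup.py declares the package, so treat it as an assumption.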

If you want to learn more about how to customize this file, [check out the Dockerfile reference](https://docs.docker.com/engine/reference/builder/).

Cloud Components

The CLI uses Terraform to provision cloud resources. These cloud resources generally provide the following components on all cloud providers:

  • Trigger: A small function that schedules a "task" on your Compute Cluster. This "task" handles pulling the image that was created by the Dockerfile and running it as a Container on your Compute Cluster.

  • Container: The thing that actually contains and runs your code on a computer provisioned by the Compute Cluster. The --size (or -s) flag on the numerai node config command sets the CPU and memory limits for a Container.

  • Compute Cluster: A handler that accepts scheduled "tasks" and spins up and down computers to run Containers.
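
The relationship between the three components can be sketched as a toy in-memory model. This is only a conceptual illustration: the class names, fields, and resource numbers are hypothetical, and none of it reflects the actual resources that Terraform provisions or any cloud provider's API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Container:
    """Runs one image with fixed CPU and memory limits."""
    image: str
    cpu: int        # hypothetical unit, e.g. number of cores
    memory_mb: int  # hypothetical unit

    def run(self):
        return f"ran {self.image} with {self.cpu} cpu / {self.memory_mb} MB"


@dataclass
class ComputeCluster:
    """Accepts scheduled tasks and starts a Container for each one."""
    tasks: deque = field(default_factory=deque)

    def schedule(self, task):
        self.tasks.append(task)

    def run_pending(self):
        results = []
        while self.tasks:
            task = self.tasks.popleft()
            # Stand-in for "spin up a computer, pull the image, run it".
            container = Container(**task)
            results.append(container.run())
        return results


def trigger(cluster, image, cpu, memory_mb):
    """Small function that only schedules a task; it never runs model code."""
    cluster.schedule({"image": image, "cpu": cpu, "memory_mb": memory_mb})
```

In this sketch, calling trigger(cluster, "my-model:latest", 1, 2048) only queues a task; the code runs later when the cluster processes it, which mirrors why the Trigger can stay small while the Container carries the resource limits.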

For detailed architecture breakdowns of our supported Cloud providers go to their individual Wiki pages:
