Here is the prototype for JellyBean, a system for serving and optimizing machine learning inference workflows on heterogeneous infrastructures.
Below we provide an example of running JellyBean's query optimizer for the medium infrastructure setup on the VQA dataset, and then executing the resulting workflow with the query processor.
We provide the sampled dataset and the pretrained models for all the model variants we used.
The dataset is a processed version of the following two datasets:
The dataset can be obtained from:
Place the dataset in VQA/data/VQA_Workflow_Datasets
The models for VQA are trained using the following code repo:
Our pretrained models can be obtained from:
The models should be placed in VQA/python/trained_models/variants/
The input data for the query optimizer are located in VQA/examples/optimizer_inputs
The optimizer inputs already include the profiling results for our medium infrastructure setup. If you want to profile the models on a worker of your own, first compute the average message (model output) sizes:
cd VQA
python python/models_profiler/ --dataset data/VQA_Workflow_Datasets --output ~/VQAWorkflowProfile
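The profiler's internals are not shown here. As a rough illustration of what an "average message size" measures, the sketch below serializes each model output with Python's pickle and averages the byte counts. Using pickle as the encoding is an assumption, and average_message_size is a hypothetical helper rather than part of models_profiler; the messages exchanged between workers may be encoded differently.

```python
import pickle
from statistics import mean
from typing import Any, Iterable


def average_message_size(outputs: Iterable[Any]) -> float:
    """Return the mean serialized size, in bytes, of a collection of model outputs.

    Assumption: outputs are serialized with pickle; the actual message encoding
    used between workers may differ, so treat the result as an approximation.
    """
    sizes = [len(pickle.dumps(out)) for out in outputs]
    if not sizes:
        raise ValueError("no outputs to measure")
    return mean(sizes)


if __name__ == "__main__":
    # Hypothetical outputs, e.g. transcripts produced by a speech-recognition model.
    transcripts = ["what color is the car", "how many people are in the picture"]
    print(f"average message size: {average_message_size(transcripts):.1f} bytes")
```

Averaging serialized sizes, rather than in-memory object sizes, approximates what actually travels over the network between pipeline stages.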
Then profile the execution latency of the models:
cd VQA
python python/models_profiler/ --dataset data/VQA_Workflow_Datasets --output ~/VQAWorkflowProfile --device {DEVICE}
python python/models_profiler/ --output ~/VQAWorkflowProfile --device {DEVICE}
python python/models_profiler/ --dataset data/VQA_Workflow_Datasets --output ~/VQAWorkflowProfile --device {DEVICE}
Here {DEVICE} is the device identifier on the worker, e.g., cuda:0 or cpu.
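For intuition, a minimal latency-profiling loop could look like the sketch below. It assumes the model is a plain Python callable and times each call with time.perf_counter after a few warm-up calls; the function name profile_latency is hypothetical and does not come from the models_profiler code.

```python
import time
from statistics import mean, median
from typing import Any, Callable, Dict, Sequence


def profile_latency(model: Callable[[Any], Any], inputs: Sequence[Any],
                    warmup: int = 3) -> Dict[str, Any]:
    """Measure per-call latency of `model` over `inputs`, in seconds.

    Assumption: `model` is a plain callable. For a GPU device such as cuda:0,
    the device would also need to be synchronized before each clock read,
    because GPU kernels run asynchronously.
    """
    if not inputs:
        raise ValueError("need at least one input")
    # Warm-up calls absorb one-time costs (lazy initialization, caches).
    for x in inputs[:warmup]:
        model(x)
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        model(x)
        latencies.append(time.perf_counter() - start)
    return {"mean_s": mean(latencies), "median_s": median(latencies),
            "n": len(latencies)}


if __name__ == "__main__":
    # Stand-in "model": a CPU-bound function instead of a real network.
    stats = profile_latency(lambda n: sum(range(n)), [100_000] * 20)
    print(stats)
```

Reporting the median alongside the mean makes the estimate less sensitive to occasional outliers such as scheduler hiccups.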
To run the query optimizer on our provided input data:
python optimizer/ --inputs VQA/examples/optimizer_inputs/ --config VQA/examples/optimizer_inputs/config.yaml
It outputs the model and worker assignments, together with the estimated compute and communication costs and the end-to-end accuracy:
{
  "communication_cost": 1.2667783071018048,
  "compute_cost": 6.242,
  "end_to_end_accuracy": 56.59,
  "model_assignment": {
    "ExtractImageFeature": "resnet18",
    "SpeechRecognition": "wav2vec2-large-960h-lv60-self",
    "VQA": "default"
  },
  "worker_assignment": {
    "ExtractImageFeature": [ ... ],
    "SpeechRecognition": [ ... ],
    "VQA": [ ... ]
  }
}
These outputs can then be used to configure the query processor.
contains our query processor built on timely dataflow.
contains the code that wraps the Python models and the query processor to distribute the VQA workflow.
is the query processor's config, which can be constructed from the query optimizer's output. It contains the worker assignment and specifies how data should be distributed among the workers (load balancing).
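The exact schema of the query processor's config is not reproduced here, so the following is only a hypothetical sketch of how an optimizer output like the one shown earlier could be turned into per-stage worker lists with an even load split. The field names stages, workers, and load_fraction, the helper build_processor_config, and the worker names are all invented for illustration and do not reflect the real config.json format.

```python
import json
from typing import Any, Dict


def build_processor_config(optimizer_output: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical schema: NOT the real format of the query processor's config.
    config: Dict[str, Any] = {"stages": {}}
    for stage, workers in optimizer_output["worker_assignment"].items():
        if not workers:
            raise ValueError(f"stage {stage!r} has no assigned workers")
        share = 1.0 / len(workers)  # even split as a simple load-balancing policy
        config["stages"][stage] = {
            "model": optimizer_output["model_assignment"][stage],
            "workers": [{"id": w, "load_fraction": share} for w in workers],
        }
    return config


if __name__ == "__main__":
    # Model names come from the optimizer output above; worker names are made up.
    output = {
        "model_assignment": {
            "ExtractImageFeature": "resnet18",
            "SpeechRecognition": "wav2vec2-large-960h-lv60-self",
            "VQA": "default",
        },
        "worker_assignment": {
            "ExtractImageFeature": ["worker-a"],
            "SpeechRecognition": ["worker-b", "worker-c"],
            "VQA": ["worker-a"],
        },
    }
    print(json.dumps(build_processor_config(output), indent=2))
```

An even split is the simplest policy; a real deployment would more likely weight each worker by its measured throughput.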
cd VQA
cargo run --release --bin workflow -- -c examples/config.json -p [PIPELINE_INDEX] -i [WORKER_INDEX]
is 0/1/2/3/4, which correspond to:
- 0, 1: the data emitters, which read the image and audio files
- 2: speech recognition
- 3: image feature extractor
- 4: VQA inference
represents the worker ID with respect to the pipeline stage.
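Because every (pipeline stage, worker) pair runs as its own process, it can help to generate the full set of launch commands up front. The sketch below does this with Python's shlex.join. The per-stage worker counts are hypothetical inputs, launch_commands is a hypothetical helper, and 0-based worker indices within each stage are an assumption.

```python
import shlex
from typing import Dict, List

# Pipeline index -> role, as listed above.
STAGES = {
    0: "data emitter",
    1: "data emitter",
    2: "speech recognition",
    3: "image feature extractor",
    4: "VQA inference",
}


def launch_commands(workers_per_stage: Dict[int, int],
                    config: str = "examples/config.json") -> List[str]:
    """Return one `cargo run` command per (pipeline index, worker index) pair.

    Assumptions: worker indices are 0-based within each stage, and each command
    is run from the VQA directory on the machine hosting that worker.
    """
    commands = []
    for pipeline, count in sorted(workers_per_stage.items()):
        if pipeline not in STAGES:
            raise ValueError(f"unknown pipeline index {pipeline}")
        for worker in range(count):
            args = ["cargo", "run", "--release", "--bin", "workflow", "--",
                    "-c", config, "-p", str(pipeline), "-i", str(worker)]
            commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Hypothetical deployment: two speech-recognition workers, one of everything else.
    for cmd in launch_commands({0: 1, 1: 1, 2: 2, 3: 1, 4: 1}):
        print(cmd)
```

Building the argument list first and joining it with shlex.join keeps the commands safe to paste into a shell even if the config path contains spaces.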