boris is a distributed processing tool that uses "serverless" cloud infrastructure to run programs in parallel. AWS is currently the only supported provider. boris borrows heavily from the lithops implementation.

boris executes a function with one or more parameter combinations. This makes it a good fit for hyper-parameter optimization tasks such as grid searches, or for training many machine-learning models in parallel.
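For example, a grid search over two hyper-parameters expands into one parameter dictionary per invocation. A minimal sketch in plain Python (the parameter names here are hypothetical, and no boris API is involved):

```python
from itertools import product

# Hypothetical hyper-parameter grid; each combination would become
# one parallel invocation of the target function.
grid = {"x": [1, 2, 3], "y": [10, 20]}

# Expand the grid into a list of parameter dictionaries,
# e.g. {"x": 1, "y": 10}, {"x": 1, "y": 20}, ...
args = [dict(zip(grid, values)) for values in product(*grid.values())]

len(args)  # 3 x 2 = 6 combinations
```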
To get started with boris, run the appropriate make command to build and deploy your cloud resources. Later, we will invoke a function using this infrastructure.

Note: The AWS backend requires the AWS SAM CLI executable (available for Linux and macOS).

Note: The AWS backend requires an ECR image repository.
```shell
make build_aws
make build_deploy
```
boris invokes a Python function multiple times, each time with different parameter values. The target function may depend on built-in or third-party libraries.
```python
import time

def add(data):
    time.sleep(1)
    return data["x"] + data["y"]
```
Create an Executor instance from a boris configuration object, then call the executor's map method to begin execution. boris packages your function and its dependencies, uploads the package to storage, and starts execution on the configured compute backend.
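A minimal sketch of that serialize-ship-execute flow, using the standard library's pickle as a stand-in for the payload boris uploads (boris pins cloudpickle, which additionally handles lambdas, closures, and functions defined in `__main__`; this is an illustration, not boris's internal implementation):

```python
import pickle

def add(data):
    return data["x"] + data["y"]

# Serialize the function together with one parameter set, as a
# stand-in for the payload uploaded to storage.
payload = pickle.dumps((add, {"x": 1, "y": 1}))

# On the compute backend, the payload is deserialized and executed.
fn, params = pickle.loads(payload)
result = fn(params)  # 2
```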
```python
import boris

config = boris.Config(
    backend=boris.Backend.Aws,
    aws_secret_access_key="..."
)
bex = boris.Executor(config=config)

args = [
    {"x": 1, "y": 1},
    {"x": 2, "y": 2},
    {"x": 3, "y": 3}
]

futures = bex.map(add, *args)
futures[0].result()  # 2
```
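The map pattern above mirrors Python's own concurrent.futures API. A purely local stand-in for the same fan-out (this uses the standard library, not boris, so it runs without any cloud resources):

```python
from concurrent.futures import ThreadPoolExecutor

def add(data):
    return data["x"] + data["y"]

args = [
    {"x": 1, "y": 1},
    {"x": 2, "y": 2},
    {"x": 3, "y": 3},
]

# Submit one call per parameter set and collect the futures,
# analogous to bex.map(add, *args) running on the cloud backend.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(add, a) for a in args]
    results = [f.result() for f in futures]  # [2, 4, 6]
```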
Todo: describe flow
- Libraries that use special initialization logic (e.g. django) will likely cause boris to fail.
- To prevent errors when packaging functions that reference libraries with C extensions, boris pins the following versions:
| package | version |
|---|---|
| cloudpickle | 1.6.0 |
| joblib | 0.17.0 |
| numpy | 1.19.4 |
| pandas | 1.1.4 |
| psycopg2-binary | 2.8.6 |
| pydantic | 1.7.3 |
| python-dateutil | 2.8.1 |
| pytz | 2020.4 |
| scikit-learn | 0.23.2 |
| scipy | 1.5.4 |
| six | 1.15.0 |
| threadpoolctl | 2.1.0 |
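To keep a local development environment aligned with these pinned versions, one option is a pip constraints file built from the table above (a sketch; boris may manage these pins for you during packaging):

```
cloudpickle==1.6.0
joblib==0.17.0
numpy==1.19.4
pandas==1.1.4
psycopg2-binary==2.8.6
pydantic==1.7.3
python-dateutil==2.8.1
pytz==2020.4
scikit-learn==0.23.2
scipy==1.5.4
six==1.15.0
threadpoolctl==2.1.0
```

Saved as `constraints.txt`, it can be applied with `pip install -c constraints.txt <package>`.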
- 3 Lambda functions per Python version
- 2 S3 buckets