This is a microservice for creating materialized versions of an analysis database. It merges spatially bound annotations with segmentation data stored in a chunkedgraph, frozen at a specific point in time. The data is stored in a PostgreSQL database, where the spatial annotations use PostGIS point types. The materialization engine can create new time-locked versions periodically on a defined schedule, as well as one-off versions for specific use cases.
Present functionality:
- A Flask microservice exposing REST API endpoints for creating and querying the materialized databases.
- A backend powered by workflows running on Celery workers. Celery is a task queue implementation used to execute work asynchronously.
This service is intended to be deployed to a Kubernetes cluster as a series of pods. Local deployment is currently best done with Docker. The included docker-compose file installs all required packages and creates a local PostgreSQL database and a Redis broker, which the Celery workers use for running tasks.
Docker compose example:
$ docker-compose build
$ docker-compose up
Alternatively, one can set up a Docker container running a PostgreSQL database and a separate Redis container, then create a Python virtual environment and run the following commands:
Set up a Redis instance:
$ docker run -p 6379:6379 redis
Set up a Postgres database (with PostGIS):
$ docker run --name db -v /my/own/datadir:/var/lib/postgresql/data -e POSTGRES_PASSWORD=materialize postgis/postgis
Set up the Flask microservice:
$ cd materializationengine
$ python3 -m venv mat_engine
$ source mat_engine/bin/activate
(mat_engine) $ python setup.py install
(mat_engine) $ python run.py
In another terminal, start a Celery worker for processing tasks:
$ source mat_engine/bin/activate
(mat_engine) $ celery worker --app=run.celery --pool=prefork --hostname=worker.process@%h --queues=processcelery --concurrency=4 --loglevel=INFO -Ofair
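If the service or the worker fails at startup, one thing worth checking is whether Redis and PostgreSQL are reachable from the host. The following is a minimal sketch, not part of this repository, that tests whether the two ports accept TCP connections. The endpoints are assumptions: Redis on `localhost:6379` as published by the command above, and PostgreSQL on its default port `5432`, which is only reachable from the host if the container publishes it (for example with `-p 5432:5432`). The names `port_is_open` and `check_services` are hypothetical helpers.

```python
import socket

# Assumed endpoints (not taken from the project's configuration).
SERVICES = {
    "redis": ("localhost", 6379),
    "postgres": ("localhost", 5432),
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {
        name: port_is_open(host, port)
        for name, (host, port) in services.items()
    }
```

Calling `check_services()` returns a dictionary such as `{"redis": True, "postgres": False}`, which only shows that something is listening on the port, not that the credentials or database are correct.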
The materialization engine runs Celery workflows that create snapshots of spatial annotation data, where each spatial point is linked to the segmentation id that is valid at a specific point in time.
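To illustrate what "valid at a specific point in time" means, here is a small self-contained sketch. It is not the engine's implementation and does not use the chunkedgraph API. It assumes a hypothetical in-memory edit history per supervoxel, stored as `(timestamp, root_id)` pairs sorted by timestamp, and the names `root_id_at` and `materialize` are invented for this example.

```python
from bisect import bisect_right
from datetime import datetime


def root_id_at(history, when):
    """Return the root id valid at `when`.

    `history` is a list of (timestamp, root_id) pairs sorted by timestamp;
    each root id is valid from its timestamp until the next entry.
    Returns None if `when` is earlier than the first entry.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, when)
    return history[idx - 1][1] if idx else None


def materialize(points, histories, when):
    """Attach to each annotation point the root id valid at `when`."""
    return [
        {**point, "root_id": root_id_at(histories[point["supervoxel_id"]], when)}
        for point in points
    ]


# Hypothetical data: one supervoxel whose root id changed twice.
histories = {
    7: [
        (datetime(2021, 1, 1), 100),
        (datetime(2021, 3, 1), 200),
        (datetime(2021, 6, 1), 300),
    ]
}
points = [{"position": (10, 20, 30), "supervoxel_id": 7}]
snapshot = materialize(points, histories, datetime(2021, 4, 1))
```

With this data, the snapshot taken on 2021-04-01 links the point to root id `200`, even though the root id later changed to `300`.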
There are a few workflows that make up the materialization engine:
- Bulk Upload (Load large spatial and segmentation datasets into a PostgreSQL database)
- Ingest New Annotations (Query and insert underlying segmentation data on spatial points with missing segmentation data)
- Update Root Ids (Query the chunkedgraph for root ids that expired within a given time interval and update them)
- Create Frozen Database (Creates a time-locked database for all tables)
- Complete Workflow (Combines the Ingest New Annotations, Update Root Ids and Create Frozen Database workflows into one, run in series)
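The way the Complete Workflow runs its three parts in series can be sketched with plain Python functions. This is an illustration under stated assumptions rather than the engine's code: the step functions, the `state` dictionary, and its keys (`rows`, `lookup_root`, `expired`, `frozen`) are all hypothetical. In Celery, this kind of sequencing is typically expressed with the `chain` primitive; plain functions are used here so the example runs without a broker.

```python
def ingest_new_annotations(state):
    """Fill in root ids for rows that have none yet."""
    for row in state["rows"]:
        if row["root_id"] is None:
            # `lookup_root` stands in for a segmentation query.
            row["root_id"] = state["lookup_root"](row["supervoxel_id"])
    return state


def update_root_ids(state):
    """Replace expired root ids using an old -> new mapping."""
    for row in state["rows"]:
        row["root_id"] = state["expired"].get(row["root_id"], row["root_id"])
    return state


def create_frozen_database(state):
    """Store an immutable copy of the rows as the frozen version."""
    state["frozen"] = tuple(dict(row) for row in state["rows"])
    return state


def run_in_series(steps, state):
    """Run each step in order, passing the result of one to the next."""
    for step in steps:
        state = step(state)
    return state


COMPLETE_WORKFLOW = [
    ingest_new_annotations,
    update_root_ids,
    create_frozen_database,
]

# Hypothetical input: one row missing its root id, one with an expired id.
state = {
    "rows": [
        {"supervoxel_id": 1, "root_id": None},
        {"supervoxel_id": 2, "root_id": 50},
    ],
    "lookup_root": lambda supervoxel_id: supervoxel_id * 100,
    "expired": {50: 51},
}
result = run_in_series(COMPLETE_WORKFLOW, state)
```

The order matters in this sketch: root ids are filled in and refreshed before the frozen copy is taken, so later changes to `rows` do not affect `result["frozen"]`.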
Distributed under the MIT license. See LICENSE for more information.
- Fork it (https://github.com/seung-lab/MaterializationEngine/fork)
- Create your feature branch (git checkout -b feature/fooBar)
- Commit your changes (git commit -am 'Add some fooBar')
- Push to the branch (git push origin feature/fooBar)
- Create a new Pull Request