This repository has been archived by the owner on Apr 15, 2019. It is now read-only.

Orchestrate ScienceBeam tasks for multiple datasets and tools (mostly for evaluation purposes)

elifesciences/sciencebeam-orchester

ScienceBeam Orchester

Configuration

An example configuration is provided in the example-config directory. Please copy it to config.
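
The copy step can be sketched as follows. This is a minimal example, assuming it is run from the repository root; the guard against overwriting an existing config is an addition for safety and not something the project prescribes.

```shell
#!/bin/sh
# Copy the example configuration to ./config.
# Assumption: run from the repository root, where example-config lives.
# The guard avoids overwriting a config that already exists.
if [ -d example-config ] && [ ! -e config ]; then
  cp -r example-config config
fi
```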

Datasets

A dataset describes where the data to be converted comes from. In general, it describes a set of files.

Datasets are configured in ./config/datasets, with each .sh file describing one dataset.

Tools

Tools are used to convert files. Currently they configure the ScienceBeam pipeline.

Tools are configured in ./config/tools, with each .sh file describing one tool.
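
Because each .sh file describes one dataset or one tool, the configured names can be listed from the file names. The following is a minimal sketch. It assumes that the name passed to --dataset or --tool matches the file name without the .sh extension, which is not stated here and should be checked against your config. The helper name list_config_names is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one name per .sh file in the given directory.
# Assumption: the config name equals the file name without ".sh".
list_config_names() {
  dir="$1"
  for f in "$dir"/*.sh; do
    # Skip the unexpanded glob when the directory has no .sh files
    [ -e "$f" ] || continue
    basename "$f" .sh
  done
}

echo "Datasets:"
list_config_names ./config/datasets
echo "Tools:"
list_config_names ./config/tools
```

If a directory is missing or empty, the helper prints nothing for it.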

Run All

By default, the container for the selected tool is started and stopped from within the sciencebeam-orchester container.

docker-compose run --rm sciencebeam-orchester ./run-all.sh convert

For an individual dataset and conversion tool:

docker-compose run --rm sciencebeam-orchester \
  ./run-all.sh \
  --dataset pmc-1943-cc-by-sample \
  --tool grobid-tei \
  --force \
  --limit 1000 \
  --workers 10 \
  convert
docker-compose run --rm sciencebeam-orchester ./run-all.sh evaluation-report
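
To convert one dataset with several tools in turn, the command above can be wrapped in a loop. This is a sketch only: the function name convert_with_tools and the DRY_RUN variable are hypothetical, and it assumes that both tool names shown are configured in ./config/tools and accepted by run-all.sh. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run "convert" for one dataset against several tools.
# DRY_RUN is a hypothetical switch: 1 (default) prints the commands,
# 0 executes them through docker-compose.
DRY_RUN="${DRY_RUN:-1}"

convert_with_tools() {
  dataset="$1"
  shift
  for tool in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "docker-compose run --rm sciencebeam-orchester ./run-all.sh --dataset $dataset --tool $tool convert"
    else
      docker-compose run --rm sciencebeam-orchester \
        ./run-all.sh --dataset "$dataset" --tool "$tool" convert
    fi
  done
}

# Assumption: both tools are configured for this dataset.
convert_with_tools pmc-1943-cc-by-sample grobid-tei scienceparse-v2
```

Set DRY_RUN=0 to execute the conversions once the printed commands look right.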

Running individual containers

Build containers:

docker-compose up --no-start

Start:

docker-compose start sciencebeam-orchester
docker-compose start scienceparse-v2
docker-compose run --rm sciencebeam-orchester ./run.sh \
  --dataset pmc-1943 --tool scienceparse-v2 convert
