
Celery runner #235

Merged
merged 48 commits into develop from celery_runner on Jun 6, 2017

Conversation

@Kirill888 (Member) commented May 15, 2017

Overview

New executor that uses Celery (with Redis as broker and data backend).

This provides an alternative to the current setup (dask.distributed). The problem with dask.distributed is that it requires tasks to be idempotent, since it will sometimes schedule the same task in parallel on different nodes. With many tasks doing I/O this creates problems.

Celery, in comparison, has a much simpler execution model and doesn't have the same constraints.

Redis backend

Celery supports a number of backends; of these, two are fully supported: RabbitMQ and Redis. I have picked Redis as it is the simplest to get running without root access (NCI environment).

data_task_options

Adding a celery option to the --executor command line flag; the same host:port argument is used. The Celery executor will connect to the Redis instance at the given address; if the address is localhost and Redis is not running, it will be launched for the duration of the execution. Workers don't get launched however, so in most cases the app will stall until workers are added to the processing pool (see datacube-worker).
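A minimal sketch of that auto-launch behaviour (illustrative only, not the PR's actual code; assumes the redis-py client package, and `ensure_redis` is a hypothetical name):

```
# Sketch: connect to Redis at the given address and, if the address is localhost
# and nothing is listening, launch a redis-server for the duration of the run.
import subprocess

import redis


def ensure_redis(host='localhost', port=6379, password=None):
    """Return a Popen handle if redis-server had to be launched locally, else None."""
    client = redis.StrictRedis(host=host, port=port, password=password)
    try:
        client.ping()
        return None  # Redis already running at the given address
    except redis.ConnectionError:
        if host not in ('localhost', '127.0.0.1'):
            raise  # only auto-launch on the local machine
        # Launch redis-server for the duration of the execution; the caller
        # shuts it down once processing finishes.
        return subprocess.Popen(['redis-server', '--port', str(port)])
```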

$HOME/.datacube-redis contains the Redis password; if this file doesn't exist, it will be created with a randomly generated password when the Redis server is launched.
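The password-file handling might look roughly like the sketch below; the helper names (gen_password, write_user_secret_file) come from the commit messages further down this thread, but the bodies here are illustrative, not the PR's actual code:

```
import binascii
import os
import stat


def gen_password(num_random_bytes=12):
    return binascii.hexlify(os.urandom(num_random_bytes)).decode('ascii')


def write_user_secret_file(text, path):
    # Create the file with owner-only read/write permissions
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, stat.S_IRUSR | stat.S_IWUSR)
    with os.fdopen(fd, 'w') as f:
        f.write(text)


def redis_password(path=os.path.expanduser('~/.datacube-redis')):
    try:
        with open(path) as f:
            return f.read().strip()  # password already generated, re-use it
    except IOError:
        password = gen_password()
        write_user_secret_file(password, path)
        return password
```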

Also adding the executor alias dask, equivalent to distributed. However, now that we have two distributed backends, we should probably favour dask as the name for the dask.distributed backend.

datacube-worker

New app datacube-worker was added to support launching workers in either celery or dask mode. It accepts the same --executor option as the task app.

@coveralls
Coverage decreased (-1.6%) to 79.787% when pulling 5464640 on celery_runner into cdff103 on develop.

@coveralls
Coverage decreased (-1.6%) to 79.787% when pulling 7b41e17 on celery_runner into cdff103 on develop.

Convenience method for interactive development/debugging.
Sample app that uses executor to run tasks in parallel
Currently one cannot set the redis address on the command line, but it can be set via the
environment variable `REDIS=redis://<host>:<port>/<db>`.
Redis server IP/port can now be configured from the command line.
Also shuts down celery workers and redis when done, including the redis-server launched as
part of the environment.
When not supplied on the command line it should use the one from the config; instead it was
being set to None.
`datacube-worker` accepts the same `--executor` option as task apps and launches the
appropriate worker: a `celery::redis` or `dask::distributed` based worker.
This one launches the worker and app on the local machine only.
Allowing remote connections to redis without a password for testing, for now.
When launching `datacube-worker`, allow `dask` as an alias for `distributed`.
Also adding a shell script for launching on PBS.
2 spaces between functions
- gen_password
- slurp, to read the content of a file
- write_user_secret_file, to create files that are readable by the user only
By default the redis server will now be launched with a randomly generated password that
is stored in a text file in the user's home directory that only the user has permission
to read. Celery workers load the password from that file, and so need access to the home
directory (fine on NCI), but also need to be launched after the file has been created.

The password is generated once and then re-used unless the user manually deletes the
file where it is stored, `.datacube-redis` at the moment.
In this case IOError should be enough.
Removed the `params` option; just use `key=value` syntax. For redis config options that
have `-` in the name, replace it with `_` to fit Python syntax constraints (see the
sketch just below).
Also updated setup.py to include the celery+redis dependency.
conda doesn't have celery 4 yet.
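A sketch of the `key=value` handling described above (illustrative only; `parse_redis_options` is a hypothetical name, not the PR's code):

```
# Turn `key=value` pairs into redis config options, mapping the Python-friendly
# underscores back to the dashed redis option names.
def parse_redis_options(pairs):
    """['maxmemory=100mb', 'maxmemory_policy=noeviction'] ->
       {'maxmemory': '100mb', 'maxmemory-policy': 'noeviction'}
    """
    opts = {}
    for pair in pairs:
        key, _, value = pair.partition('=')
        opts[key.replace('_', '-')] = value
    return opts
```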
@coveralls
Coverage decreased (-1.6%) to 79.787% when pulling 76feeaa on celery_runner into cdff103 on develop.

@Kirill888 changed the title from "[almost ready] Celery runner" to "Celery runner" on May 16, 2017
@coveralls
Coverage decreased (-1.6%) to 79.787% when pulling 61d913b on celery_runner into 4bfdced on develop.

# If no change detected sleep for a bit
# TODO: this is sub-optimal, not sure what other options are
# though?
sleep(0.1)
Contributor:

It is a shame that celery.ResultSet doesn't have a collect() function like AsyncResult http://docs.celeryproject.org/en/latest/reference/celery.result.html#celery.result.AsyncResult.collect
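For context, the polling approach the snippet above implies might look like this (a sketch using ResultSet's documented ready() / completed_count() / join() methods, not the PR's exact code):

```
from time import sleep


def drain(result_set, poll_interval=0.1):
    done = 0
    while not result_set.ready():
        n = result_set.completed_count()
        if n == done:
            # If no change detected sleep for a bit
            sleep(poll_interval)
        done = n
    return result_set.join()  # gather all results, re-raising task errors
```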

}

EXECUTOR_TYPES['dask'] = EXECUTOR_TYPES['distributed'] # Add alias "dask" for distributed
Contributor:

This could make things more confusing, as dask also has synchronous, multi-threaded and multi-process schedulers/executors.

Member Author:

Ok, but 'distributed' is too generic as well, since celery is also "distributed" across machines, and 'dask[.-_]distributed' is hard to type, and it's hard to remember the separator token.

Contributor:

Good point!

def wrap_task(f, *args, **kwargs):
'turn function `f(task, *args, **kwargs)` into `g(task)` in pickle-able fashion'
return functools.partial(_wrap_impl, f, args, kwargs)

Contributor:

We should aim for PEP 257 with docstrings - Triple quotes are used even though the string fits on one line. This makes it easy to later expand it.
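For illustration, the quoted function in the suggested PEP 257 form (same code; only the docstring quoting changes, and `_wrap_impl` is the PR's existing helper):

```
import functools


def wrap_task(f, *args, **kwargs):
    """Turn function `f(task, *args, **kwargs)` into `g(task)` in a pickle-able fashion."""
    # _wrap_impl is the module-level helper from the PR that applies f to the task
    return functools.partial(_wrap_impl, f, args, kwargs)
```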

Addressing code review; also deleted an unused import.
chartostring now always returns a decoded string, so the numpy.char.decode step is no
longer needed. The code should still work with the older lib.
Two things changed in 1.2.8:
- chartostring
- the cython version dependency

The cython change somehow broke the compliance checkers: it cannot load any checkers, as
they fail with

```
DistributionNotFound: The 'cython>=0.19' distribution was not found and is required by netCDF4
```

in `load_all_available_checkers`, so no checkers are loaded, which then asserts in
the `run` method.
@coveralls
Coverage decreased (-0.6%) to 81.387% when pulling 0ce2711 on celery_runner into ee53c48 on develop.

Adding a new function `netcdf_extract_string` that takes care of the possible ways
strings can be stored in netcdf. This fixes test failures when using the 1.3.8
version, which switched to returning unicode from the `chartostring` method.
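Such a helper might look roughly like the sketch below (illustrative only; the PR's actual implementation may differ):

```
# Normalise the ways a string can come back from a netCDF file: already unicode,
# raw bytes, or a character array that needs chartostring.
import numpy
from netCDF4 import chartostring


def netcdf_extract_string(chars):
    if isinstance(chars, str):
        return chars
    if isinstance(chars, bytes):
        return chars.decode('utf-8')
    value = chartostring(chars)  # newer netCDF4 returns unicode, older returns bytes
    if isinstance(value, numpy.ndarray):
        value = value.item()
    return value if isinstance(value, str) else value.decode('utf-8')
```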
cython is a compulsory dependency of netcdf4 starting from 1.2.8, but since it's a
build-time dependency conda doesn't install it; netcdf4 is compiled externally. The
lack of cython doesn't break the netcdf4 library, but it does break the compliance
checker, because it uses the `pkg_resources` lib to check whether the dependencies
for plugins are installed, and `pkg_resources` reports that cython is needed to
import `netcdf4`, which isn't true.
@coveralls
Coverage decreased (-0.6%) to 81.383% when pulling 5f0fa4b on celery_runner into ee53c48 on develop.

@woodcockr (Member) commented:

Hi Folks,
I've been watching this branch with interest. Looks good; I like the switching between different execution models. When you are ready, I would love a chat/doc about how this design all comes together and where you might be heading next, to help with some of our future contributions.
Thanks for the effort!

@Kirill888 (Member Author) commented:

Hi Rob,

thank you for the feedback. In the very short term we are hoping to have this branch merged in its current state. The changes are backwards compatible with the processing apps we currently have, and we will likely leave them as they are for now. Longer term, I think we can all agree that writing "task apps" could be a lot simpler, and you can consider the work in this branch as an exploration of the pain points. I do not claim to address any of them just yet.

Ideally I'd like us to move away from writing custom launch scripts with hard-coded environment setup in bash and invoking qsub on the command line; I think all this logic can be done in Python and re-used by all processing apps with an import and a decorator.

I want to be able to write a processing app that can be easily tested locally and then, without writing any scripts, deployed on the NCI or any other compute provider we might support in the future.

  source activate my-env
  # run locally to test
  ./my-processing-app.py --my-app-option1 --my-option2=small-dataset

  # run on raijin using qsub under the hood
  ./my-processing-app.py --pbs="num-nodes=10,time=1h" --my-app-option1 --my-option2=large-dataset

  # run on AWS using docker or whatever
  ./my-processing-app.py --aws="num-nodes=10,region=oregon" --my-app-option1 --my-option2=large-dataset

I will be away in June and everyone else has their plate full with higher-priority tasks as it is, so I do not expect any more work to be done in that direction for a while.

Also removed the netcdf4 version peg in `setup.py`; should have done it in the last commit.
@coveralls
Coverage decreased (-0.2%) to 81.851% when pulling 30bcee5 on celery_runner into ee53c48 on develop.

@woodcockr (Member) commented:

Great. That is exactly what we wanted to achieve with the Execution Engine: write your app, run it on any of the target compute resource environments, preferably with it figuring out the best way to divide up the tasking.

I've made a note in our planning to link up with this once we clear the current development tasks.

somewhat higher coverage for executor classes.
@coveralls
Coverage increased (+0.1%) to 82.123% when pulling 543043f on celery_runner into ee53c48 on develop.

Running out of disk on the Travis test machines; see if this helps.
To be feature-compatible between the dask, celery and serial executors. Now you can
submit lambdas and inner functions, not just top-level functions.
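One common way to achieve this is to serialise the callable with something like cloudpickle before dispatch; a sketch under that assumption (the PR may use a different mechanism):

```
# Serialise an arbitrary callable so only a plain top-level function plus a
# bytes blob ever needs to cross the wire.
import cloudpickle


def run_pickled(blob):
    func, args, kwargs = cloudpickle.loads(blob)
    return func(*args, **kwargs)


def submit_any(submit, func, *args, **kwargs):
    """`submit` is the executor's usual submit(top_level_fn, *args) entry point."""
    blob = cloudpickle.dumps((func, args, kwargs))
    return submit(run_pickled, blob)


# e.g. submit_any(executor.submit, lambda x: x * 2, 21)
```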
@coveralls
Coverage increased (+0.1%) to 82.146% when pulling a9119f6 on celery_runner into ee53c48 on develop.

Fixing `ResourceWarning` (file handle leak) in `read_documents`
@coveralls
Coverage increased (+0.1%) to 82.136% when pulling 7d95ae8 on celery_runner into ee53c48 on develop.

Still running out of disk on Travis; conda clean should happen after setting up the
agdc environment.
@coveralls
Coverage increased (+0.1%) to 82.136% when pulling 1f54ee6 on celery_runner into ee53c48 on develop.

Fixing pylint complaints, also minor flake8 formatting corrections.
@coveralls
Coverage increased (+0.1%) to 82.136% when pulling eeda1f6 on celery_runner into ee53c48 on develop.

@omad (Member) left a comment:

Looks good enough to merge. Since Kirill is away for the rest of June, I'm not going to let this wait any longer.

@omad merged commit 12cbe5a into develop on Jun 6, 2017
@Kirill888 deleted the celery_runner branch on July 12, 2017