
Improve edxapp celery workers #207

Merged: 2 commits into master from fix-edxapp-celery-workers, Dec 18, 2018
Conversation

jmaupetit (Contributor)

Purpose

Celery workers deployed with edxapp consume too many resources by default to process too few tasks on small instances. Moreover, they are not blue-green compatible, since workers still running from an old stack (previous) can consume tasks belonging to other stacks (current or next). This last point is a major issue.

Proposal

  • deploy celery workers only if their queue differs from the default queue (except for the default-queue worker itself)
  • suffix celery queues with the current deployment stamp to avoid task collisions between deployed stacks (see the sketch below)
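
As an illustration (the variable and queue names below are hypothetical, not necessarily the ones used in this repository), stamping the queues could look like:

# Hypothetical variable, for illustration only: each queue name gets the
# deployment stamp appended so that stacks never share a queue.
edxapp_celery_lms_queues:
  lms_default: "edx.lms.core.default-{{ deployment_stamp }}"
  lms_high: "edx.lms.core.high-{{ deployment_stamp }}"
  lms_high_mem: "edx.lms.core.high_mem-{{ deployment_stamp }}"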

apps/edxapp/vars/all/main.yml: 3 review threads (outdated, resolved)
sampaccoud (Contributor) commented Dec 15, 2018

Having second thoughts: actually this is not the right approach. We should keep the queues separate and differentiated.

What we want is not a single queue but a single Celery worker that consumes all the queues. It can be done with the --queues or -Q option: http://docs.celeryproject.org/en/latest/userguide/workers.html#queues
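
For example (a sketch only; the Celery app module and queue names are assumptions for illustration, not necessarily this repository's values), a single worker consuming several queues could be declared as:

# Sketch of a container command; "lms.celery" and the queue names are
# illustrative assumptions.
command:
  - celery
  - worker
  - "--app=lms.celery"
  - "--queues=edx.lms.core.default,edx.lms.core.high,edx.lms.core.low"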

jmaupetit (Contributor, Author)

From a theoretical point of view, you have a point, but in the end the proposed solution and your alternative lead to the same behavior: one consumer for all queues by default.

I need to think a bit more about how to make consumer deployment toggleable in your proposed scenario.

sampaccoud (Contributor)

@jmaupetit but in your solution the queues are mixed in Redis. Maybe we need a setting defining which workers we want and which queues each of them must consume. We could also define in this setting how many pods to spawn for each worker. WDYT?

jmaupetit (Contributor, Author)

Following your remarks, I've refactored the whole thing. And I think it's lovely now 😍

When waiting for a stack deployment, instead of expecting as many pods
as the number of DeploymentConfigs to create, be more accurate by
calculating the total number of replicas we expect.

This prevents the wait loop from failing when deploying more or fewer
than one replica per DC.

The current work improves celery workers & queues management by making
it possible to:

* define one or more queues to be consumed by a worker
* scale the number of replicas for each celery worker up or down
* make celery queues stack-specific
jmaupetit (Contributor, Author)

Ready for review @sampaccoud @lunika 🙏

- *lms_low
- *lms_high_mem
replicas: 1
edxapp_celery_lms_low_priority_worker:
sampaccoud (Contributor) commented Dec 17, 2018

Have you thought of wrapping this in a single workers variable? Then you could use a loop in the templates (I'm assuming it's possible to include several objects in a k8s definition file?). Something like:

celery_workers:
  edxapp_celery_lms_high_mem_worker:
    name: "lms-high-mem"
    queues: [*lms_high_mem]
    replicas: 0
  edxapp_celery_lms_default_priority_worker:
    name: "lms-default"
    queues:
      - *lms_high
      - *lms_default
      - *lms_low
      - *lms_high_mem
    replicas: 1
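
A template could then loop over this variable to emit one DeploymentConfig per worker. A rough sketch (it assumes several objects can live in a single definition file, which is exactly the open question here; field layout is illustrative):

# workers.yml.j2 (sketch only)
{% for key, worker in celery_workers.items() %}
---
apiVersion: v1
kind: DeploymentConfig
metadata:
  name: "edxapp-celery-{{ worker.name }}"
spec:
  replicas: {{ worker.replicas }}
  template:
    spec:
      containers:
        - name: "{{ worker.name }}"
          command:
            - celery
            - worker
            - "--queues={{ worker.queues | join(',') }}"
{% endfor %}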

jmaupetit (Contributor, Author) commented Dec 17, 2018

This is interesting, but too edxapp-specific IMO. If we break the one file/one object paradigm, it will require changing many things.

jmaupetit (Contributor, Author)

🤔 Afterthought: I am not so sure about what I wrote. I need to think a bit more about side effects, but having a single generic DC template for workers might be neat.

jmaupetit (Contributor, Author)

Unfortunately it seems that we cannot define multiple objects in a single template (for now):

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 138, in run
    res = self._execute()
  File "/usr/local/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 511, in _execute
    self._task.post_validate(templar=templar)
  File "/usr/local/lib/python2.7/dist-packages/ansible/playbook/task.py", line 260, in post_validate
    super(Task, self).post_validate(templar)
  File "/usr/local/lib/python2.7/dist-packages/ansible/playbook/base.py", line 390, in post_validate
    value = templar.template(getattr(self, name))
  File "/usr/local/lib/python2.7/dist-packages/ansible/template/__init__.py", line 525, in template
    disable_lookups=disable_lookups,
  File "/usr/local/lib/python2.7/dist-packages/ansible/template/__init__.py", line 481, in template
    disable_lookups=disable_lookups,
  File "/usr/local/lib/python2.7/dist-packages/ansible/template/__init__.py", line 717, in do_template
    res = j2_concat(rf)
  File "<template>", line 12, in root
  File "/usr/local/lib/python2.7/dist-packages/ansible/plugins/filter/core.py", line 223, in from_yaml
    return yaml.safe_load(data)
  File "/usr/local/lib/python2.7/dist-packages/yaml/__init__.py", line 93, in safe_load
    return load(stream, SafeLoader)
  File "/usr/local/lib/python2.7/dist-packages/yaml/__init__.py", line 71, in load
    return loader.get_single_data()
  File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 37, in get_single_data
    node = self.get_single_node()
  File "/usr/local/lib/python2.7/dist-packages/yaml/composer.py", line 43, in get_single_node
    event.start_mark)
ComposerError: expected a single document in the stream
  in "<unicode string>", line 6, column 1:
    apiVersion: v1
    ^
but found another document
  in "<unicode string>", line 102, column 1:
    ---
    ^
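
(For the record: the from_yaml filter is backed by PyYAML's yaml.safe_load, which only accepts a single document per stream; parsing a multi-document template would require something like yaml.safe_load_all, exposed in more recent Ansible versions as the from_yaml_all filter.)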

I'll postpone this improvement and roll back to my previous implementation.

jmaupetit added a commit that referenced this pull request Dec 18, 2018
As suggested by @sampaccoud in [1], service variant workers are now
listed in a single edxapp variable. Using this data structure, we are
now able to define all worker deployments in a single template per
service variant (CMS & LMS).

[1] #207 (comment)
@jmaupetit jmaupetit requested review from sampaccoud and lunika and removed request for lunika December 18, 2018 15:44
@jmaupetit jmaupetit merged commit 8b07bbf into master Dec 18, 2018
@jmaupetit jmaupetit deleted the fix-edxapp-celery-workers branch December 18, 2018 16:31
jmaupetit added a commit that referenced this pull request Dec 19, 2018
Added:

- Edxapp Celery workers are now configurable (number of replicas and queues to
  consume) (#207)

Fixed:

- Pods deployment wait loop is now more robust by relying on the number of pod
  replicas instead of the number of deployments (#207)
- Add missing ask-vault-pass option in bin/built_images script (#210)
- Prevent the htpasswd file from being overwritten when running the create
  htpasswd playbook multiple times (#211)