Improve edxapp celery workers #207
Conversation
Having second thoughts: actually, this is not the right approach. We should keep the queues separate and differentiated. What we want is not a single queue but a single Celery worker that consumes all the queues. It can be done with the
From a theoretical point of view, you have a point, but in the end the proposed solution and your alternative will lead to the same behavior: one consumer for all queues by default. I need to think a bit more about how to make consumer deployment toggleable in your proposed scenario.
@jmaupetit but in your solution the queues are mixed in Redis. Maybe we need a setting defining which workers we want and which queues they must consume. We could also define in this setting how many pods we want to spawn for each worker. WDYT?
Following your remarks, I've refactored the whole thing. And I think it's lovely now 😍
When waiting for a stack deployment, instead of expecting as many pods as the number of DeploymentConfigs to create, be more accurate by calculating the total number of replicas we expect. This prevents the wait loop from failing when deploying more or fewer than one replica per DC.
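The replica-counting logic described in this commit message can be sketched as follows. This is a minimal illustration with a hypothetical data structure, not the actual Ansible code: it sums the declared replicas across worker definitions instead of counting one pod per DeploymentConfig.

```python
# Sketch of the wait-loop fix: sum declared replicas across all worker
# definitions rather than expecting one pod per DeploymentConfig.
# The "workers" dictionary below is hypothetical, for illustration only.

def expected_pod_count(workers):
    """Total number of pods to wait for across all worker deployments."""
    return sum(worker.get("replicas", 1) for worker in workers.values())

workers = {
    "edxapp_celery_lms_high_mem_worker": {"replicas": 0},
    "edxapp_celery_lms_default_priority_worker": {"replicas": 1},
}

# Two DeploymentConfigs, but only one pod expected overall.
print(expected_pod_count(workers))  # 1
```

With the old logic, two DCs would have meant waiting for two pods, and the loop would fail whenever a worker was scaled to zero.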
The current work improves celery workers & queues management by allowing to:
* define one or more queues to be consumed by a worker
* scale down/up the number of replicas of a celery worker
* make celery queues stack-specific
Ready for review @sampaccoud @lunika 🙏
      - *lms_low
      - *lms_high_mem
    replicas: 1
  edxapp_celery_lms_low_priority_worker:
Have you thought of wrapping this in a single workers variable? Then you could use a loop in the templates (I'm making the assumption that it's possible to include several objects in a k8s definition file?). Something like:
celery_workers:
  edxapp_celery_lms_high_mem_worker:
    name: "lms-high-mem"
    queues: [*lms_high_mem]
    replicas: 0
  edxapp_celery_lms_default_priority_worker:
    name: "lms-default"
    queues:
      - *lms_high
      - *lms_default
      - *lms_low
      - *lms_high_mem
    replicas: 1
This is interesting but too edxapp-specific IMO. If we break the one file/one object paradigm, this will require changing many things.
🤔 afterthought: I am not so sure of what I wrote. I need to think a bit more about side effects, but having a single generic DC template for workers might be neat.
Unfortunately it seems that we cannot define multiple objects in a single template (for now):
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 138, in run
res = self._execute()
File "/usr/local/lib/python2.7/dist-packages/ansible/executor/task_executor.py", line 511, in _execute
self._task.post_validate(templar=templar)
File "/usr/local/lib/python2.7/dist-packages/ansible/playbook/task.py", line 260, in post_validate
super(Task, self).post_validate(templar)
File "/usr/local/lib/python2.7/dist-packages/ansible/playbook/base.py", line 390, in post_validate
value = templar.template(getattr(self, name))
File "/usr/local/lib/python2.7/dist-packages/ansible/template/__init__.py", line 525, in template
disable_lookups=disable_lookups,
File "/usr/local/lib/python2.7/dist-packages/ansible/template/__init__.py", line 481, in template
disable_lookups=disable_lookups,
File "/usr/local/lib/python2.7/dist-packages/ansible/template/__init__.py", line 717, in do_template
res = j2_concat(rf)
File "<template>", line 12, in root
File "/usr/local/lib/python2.7/dist-packages/ansible/plugins/filter/core.py", line 223, in from_yaml
return yaml.safe_load(data)
File "/usr/local/lib/python2.7/dist-packages/yaml/__init__.py", line 93, in safe_load
return load(stream, SafeLoader)
File "/usr/local/lib/python2.7/dist-packages/yaml/__init__.py", line 71, in load
return loader.get_single_data()
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 37, in get_single_data
node = self.get_single_node()
File "/usr/local/lib/python2.7/dist-packages/yaml/composer.py", line 43, in get_single_node
event.start_mark)
ComposerError: expected a single document in the stream
in "<unicode string>", line 6, column 1:
apiVersion: v1
^
but found another document
in "<unicode string>", line 102, column 1:
---
^
I'll postpone this improvement and roll back to my previous implementation.
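The ComposerError in the traceback above comes from PyYAML's safe_load, which rejects streams containing more than one YAML document; safe_load_all accepts them. A minimal reproduction of the limitation (PyYAML assumed available; document contents are illustrative):

```python
import yaml  # PyYAML, the library raising the ComposerError above

multi_doc = """\
apiVersion: v1
kind: DeploymentConfig
---
apiVersion: v1
kind: DeploymentConfig
"""

# yaml.safe_load (used by Ansible's from_yaml filter, per the traceback)
# expects exactly one document and fails on a multi-document stream.
try:
    yaml.safe_load(multi_doc)
    single_ok = True
except yaml.YAMLError:
    single_ok = False

# yaml.safe_load_all handles multi-document streams, one object per document.
documents = list(yaml.safe_load_all(multi_doc))

print(single_ok)       # False
print(len(documents))  # 2
```

This is why a single template rendering several `---`-separated k8s objects cannot pass through a filter built on safe_load.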
As suggested by @sampaccoud in [1], service variant workers are now listed in a single edxapp variable. By using this data structure, we are now able to define all worker deployments in a single template per service variant (CMS & LMS). [1] #207 (comment)
Added:
- Edxapp Celery workers are now configurable (number of replicas and queues to consume) (#207)
Fixed:
- Pods deployment wait loop is now more robust by relying on the number of pod replicas instead of the number of deployments (#207)
- Add missing ask-vault-pass option in bin/built_images script (#210)
- Prevent htpasswd file overriding when running the create htpasswd playbook multiple times (#211)
Purpose
Celery workers deployed with edxapp consume too many resources by default to process too few tasks on small instances. Moreover, they are not blue-green compatible, since workers still running from an old stack (previous) can process tasks related to other stacks (current or next). This last point is a major issue.
Proposal
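One way the blue-green issue could be addressed, as the changelog's "stack-specific queues" item suggests, is to namespace queue names with a stack identifier so that workers from a previous stack cannot consume tasks queued by the current or next stack. A hypothetical sketch (function and queue names invented for illustration):

```python
# Hypothetical sketch: prefix each Celery queue name with a stack identifier
# so each stack (previous/current/next) gets its own isolated queues.

BASE_QUEUES = ["lms_default", "lms_high", "lms_low", "lms_high_mem"]

def stack_queues(stack_id, base_queues=BASE_QUEUES):
    """Return queue names namespaced by a deployment stack identifier."""
    return ["{}-{}".format(stack_id, queue) for queue in base_queues]

print(stack_queues("next"))
# ['next-lms_default', 'next-lms_high', 'next-lms_low', 'next-lms_high_mem']
```

Workers in each stack would then be started against their own stack's queue names only, so an old stack's workers never see another stack's tasks.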