
feat: Celery worker concurrency setting #1010

Open
wants to merge 1 commit into master


@arbrandes commented Feb 29, 2024

This allows the user to configure how many Celery workers are spawned independently of how many CPUs there are in the system. The default is to spawn as many workers as there are CPUs, which in some cases can consume too many resources.

The setting should be particularly useful to people running Tutor for development on Linux machines, where reducing the concurrency to "1" can reduce RAM usage significantly.

Testing

  1. Before running this branch, launch a Tutor environment and count how many Celery processes there are, with something like:

    pgrep celery | wc -l
    

    You should get twice the number of CPUs on the system (one set of worker processes for each of LMS and CMS), plus two parent processes. (On my machine, which has 12 real cores + 12 virtual ones, the number comes out to 26.)

  2. Stop the environment, install this branch, and set:

    tutor config save --set OPENEDX_LMS_CELERY_WORKERS=1 --set OPENEDX_CMS_CELERY_WORKERS=1
    
  3. Relaunch the environment. The worker containers should be recreated.

  4. Check the number of Celery processes again. There should now be just 4: one worker process each for LMS and CMS, plus the two parent processes.

It's worth checking RAM usage, too. Before, my dev environment would take up 8 GB of RAM or more. After, it takes less than 3 GB.
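For hosts without `pgrep`, the counts from steps 1 and 4 can be reproduced in Python. This is a minimal, Linux-only sketch: it assumes the `/proc/<pid>/comm` process-name files that `pgrep` itself matches against, and the expected-count formula (the configured worker processes for LMS and CMS, plus one parent per service) comes from the steps above. All three helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Iterator


def read_process_names(proc_root: str = "/proc") -> Iterator[str]:
    """Yield the command name (comm) of every process under /proc. Linux only."""
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/meminfo
        try:
            yield (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited while we were scanning


def count_matching(names: Iterable[str], pattern: str) -> int:
    """Count names containing `pattern` (a plain substring, unlike pgrep's regex)."""
    return sum(1 for name in names if pattern in name)


def expected_celery_processes(lms_workers: int, cms_workers: int) -> int:
    """LMS and CMS worker processes, plus one parent process per service."""
    return lms_workers + cms_workers + 2
```

On a Linux host, `count_matching(read_process_names(), "celery")` should track `pgrep celery | wc -l`, and `expected_celery_processes(1, 1)` gives the 4 expected in step 4. Scanning the host's `/proc` only sees container processes when Docker runs natively on Linux; Docker Desktop on macOS runs containers inside a VM.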

@arbrandes changed the base branch from nightly to master on February 29, 2024 20:43
@arbrandes force-pushed the celery-concurrency branch 2 times, most recently from 1dd4b5a to 0382877 on March 1, 2024 14:34
@arbrandes (Author)

@regisb, mind taking a look?

@@ -141,7 +141,7 @@ spec:
containers:
- name: cms-worker
image: {{ DOCKER_IMAGE_OPENEDX }}
-args: ["celery", "--app=cms.celery", "worker", "--loglevel=info", "--hostname=edx.cms.core.default.%%h", "--max-tasks-per-child", "100", "--exclude-queues=edx.lms.core.default"]
+args: ["celery", "--app=cms.celery", "worker", "--loglevel=info", "--hostname=edx.cms.core.default.%%h", "--concurrency={{ OPENEDX_CMS_CELERY_WORKERS }}", "--max-tasks-per-child", "100", "--exclude-queues=edx.lms.core.default"]
@dkaliberda Mar 7, 2024

The --concurrency argument specifies the number of worker processes. Relying on it to scale workers is not ideal practice in Kubernetes environments, because:

  1. Kubernetes prefers to manage scalability and replication at the container orchestration level, using the replicas field of a Deployment to manage the number of pod instances.
  2. Setting --concurrency inside a container limits scalability to the process level inside the pod, rather than letting Kubernetes spread multiple pods across nodes for better fault tolerance and load distribution.
  3. It violates the "one process per container" principle. This matters because, with multiple processes in the same container, the container is harder to troubleshoot (logs from different processes are mixed together) and the processes' lifecycles are harder to manage.

So it's better to just hardcode --concurrency=1.

Author

That all makes sense, but it might be counter-intuitive to have a configuration item that works for one deployment scenario but not another.

I mean, we have OPENEDX_CMS_UWSGI_WORKERS, and that's also configurable for Kubernetes. 🤷🏼


I think it would be appropriate to mention in the documentation that --concurrency=1 is recommended for K8s, not to save resources but for proper resource management. What do you think? It would be useful for DevOps to pay attention to this.

@@ -250,7 +250,7 @@ spec:
containers:
- name: lms-worker
image: {{ DOCKER_IMAGE_OPENEDX }}
-args: ["celery", "--app=lms.celery", "worker", "--loglevel=info", "--hostname=edx.lms.core.default.%%h", "--max-tasks-per-child=100", "--exclude-queues=edx.cms.core.default"]
+args: ["celery", "--app=lms.celery", "worker", "--loglevel=info", "--hostname=edx.lms.core.default.%%h", "--concurrency={{ OPENEDX_LMS_CELERY_WORKERS }}", "--max-tasks-per-child=100", "--exclude-queues=edx.cms.core.default"]


The same problem

@@ -158,7 +158,7 @@ services:
environment:
SERVICE_VARIANT: lms
DJANGO_SETTINGS_MODULE: lms.envs.tutor.production
-command: celery --app=lms.celery worker --loglevel=info --hostname=edx.lms.core.default.%%h --max-tasks-per-child=100 --exclude-queues=edx.cms.core.default
+command: celery --app=lms.celery worker --loglevel=info --hostname=edx.lms.core.default.%%h --concurrency={{ OPENEDX_LMS_CELERY_WORKERS }} --max-tasks-per-child=100 --exclude-queues=edx.cms.core.default
@dkaliberda Mar 7, 2024

Docker Compose also provides mechanisms for managing replicas, so it is better to use --concurrency=1 here too.

Author

In theory, yes. I'd be glad to review a PR that does that instead. I just need a way to reduce RAM usage for development. ;)

@@ -177,7 +177,7 @@ services:
environment:
SERVICE_VARIANT: cms
DJANGO_SETTINGS_MODULE: cms.envs.tutor.production
-command: celery --app=cms.celery worker --loglevel=info --hostname=edx.cms.core.default.%%h --max-tasks-per-child 100 --exclude-queues=edx.lms.core.default
+command: celery --app=cms.celery worker --loglevel=info --hostname=edx.cms.core.default.%%h --concurrency={{ OPENEDX_CMS_CELERY_WORKERS }} --max-tasks-per-child 100 --exclude-queues=edx.lms.core.default
@dkaliberda Mar 7, 2024

Docker Compose also provides mechanisms for managing replicas, so it is better to use --concurrency=1 here too.

@DawoudSheraz self-requested a review on March 12, 2024 09:17
@DawoudSheraz (Contributor)

Not sure if it's a Mac thing, but running pgrep celery via bash in the LMS/CMS worker containers shows nothing for me (both dev and local). However, using celery --app=lms.celery inspect stats, I can see that max_concurrency is set to the number of allocated CPUs for both the LMS and CMS workers.
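The same check can be scripted. A minimal sketch, assuming the reply shape of Celery's `app.control.inspect().stats()`: a dict keyed by worker node name, where the prefork pool reports its size under `pool` → `max-concurrency`. The function name, node names and numbers below are made up for illustration.

```python
from typing import Any, Dict, Optional


def max_concurrency_by_worker(
    stats: Optional[Dict[str, Dict[str, Any]]],
) -> Dict[str, int]:
    """Map each worker node name to its pool size from an inspect().stats() reply.

    Assumes the prefork pool reports its size as stats[node]["pool"]["max-concurrency"];
    nodes without that key are skipped. inspect().stats() may return None when no
    worker replies, so None is treated as "no workers".
    """
    result: Dict[str, int] = {}
    for node, node_stats in (stats or {}).items():
        pool = node_stats.get("pool", {})
        if "max-concurrency" in pool:
            result[node] = pool["max-concurrency"]
    return result


# Made-up reply, trimmed to the relevant keys. Against running workers it would
# come from something like `app.control.inspect().stats()`, where `app` is the
# Celery application that --app=lms.celery refers to.
sample_stats = {
    "celery@lms-worker": {"pool": {"max-concurrency": 12}},
    "celery@cms-worker": {"pool": {"max-concurrency": 1}},
}

for node, size in max_concurrency_by_worker(sample_stats).items():
    print(f"{node}: {size} worker process(es)")
```

With this PR applied and both settings at 1, each entry should report 1 instead of the CPU count.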

docs/dev.rst (outdated)
--set OPENEDX_CMS_CELERY_WORKERS=1 \
--set OPENEDX_LMS_CELERY_WORKERS=1 \
--set OPENEDX_CMS_UWSGI_WORKERS=1 \
--set OPENEDX_LMS_UWSGI_WORKERS=1 \
Contributor

I'd rather avoid asking users to set these values manually. Instead, we should automatically default to workers=1 in development. Can we do that? For instance, by overriding the Celery config in development?

docs/dev.rst (outdated)
--set OPENEDX_LMS_CELERY_WORKERS=1 \
--set OPENEDX_CMS_UWSGI_WORKERS=1 \
--set OPENEDX_LMS_UWSGI_WORKERS=1 \
--set ELASTICSEARCH_HEAP_SIZE=100m
Contributor

Same here: can we automatically set this value in development?

@@ -149,6 +149,11 @@ This defines the version that will be pulled from just the Open edX platform git

By default, there are 2 `uwsgi worker processes <https://uwsgi-docs.readthedocs.io/en/latest/Options.html#processes>`__ to serve requests for the LMS and the CMS. However, each worker requires upwards of 500 Mb of RAM. You should reduce this value to 1 if your computer/server does not have enough memory.

- ``OPENEDX_LMS_CELERY_WORKERS`` (default: ``"0"``)
- ``OPENEDX_CMS_CELERY_WORKERS`` (default: ``"0"``)
Contributor

Adding new configuration settings to Tutor core is a personal trigger of mine 🧨 Do we really want to make changes to the default production values? If yes, can we:

  1. propose better defaults?
  2. make these custom changes possible via a patch instead of two new configuration settings?

Author

@arbrandes Apr 3, 2024

Yeah, I understand the reluctance to add new config items. It's just that in this case, it was the most straightforward way to achieve what I was after. There's precedent, too: OPENEDX_LMS_UWSGI_WORKERS is there for very similar reasons.

Regarding the defaults, I'm not actually changing them: I'm just making them explicit, where before they were implicit. (The implicit default is to scale the workers to however many CPUs you have, and that's what "0" means.)
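The "0 means scale to the CPU count" rule can be sketched as a small resolver. This is an illustration of the fallback as described above, not Celery's actual code: `resolve_concurrency` is a hypothetical helper, `os.cpu_count()` stands in for Celery's own CPU detection, and the string input mirrors the documented defaults being the string `"0"`.

```python
import os
from typing import Union


def resolve_concurrency(setting: Union[str, int, None]) -> int:
    """Return how many worker processes a --concurrency value implies.

    Mirrors the fallback described above: a positive value is used as-is,
    while 0 (or no value at all) means "one process per CPU". os.cpu_count()
    stands in for Celery's own CPU detection here.
    """
    value = 0 if setting in (None, "") else int(setting)
    if value < 0:
        raise ValueError(f"concurrency must be >= 0, got {value}")
    return value or (os.cpu_count() or 1)


print(resolve_concurrency("1"))  # explicit setting, e.g. OPENEDX_LMS_CELERY_WORKERS=1
print(resolve_concurrency("0"))  # the documented default: one process per CPU
```

Keeping `"0"` as the default therefore leaves existing deployments unchanged, while any positive value pins the pool size regardless of the host's CPU count.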

As for using patches, I wouldn't mind except for the fact that, as mentioned above, this is just doing what OPENEDX_LMS_UWSGI_WORKERS does, except for Celery workers. If we have that configuration, I don't see why we shouldn't have this one.

All of this said, I really like the idea of changing certain things automatically for development environments, whether or not they have corresponding config items. For instance, after I opened this PR it came to my attention that Tutor's development settings import * from devstack.py, which means we aren't using Celery workers at all: devstack.py sets CELERY_ALWAYS_EAGER = True, so tasks run in-process instead of being sent to a worker. (See https://github.com/openedx/edx-platform/blob/master/lms/envs/devstack.py#L35.) So why is tutor dev even firing up workers?

In any case, the latter sounds like it warrants a separate PR. My question regarding this PR, though, is whether we do or do not want OPENEDX_LMS_CELERY_WORKERS. It might not make sense to change it in a Kubernetes setting, but I'm willing to argue that it does on any Docker deployment where you have more CPUs than you have RAM (so to speak).

Author

How about this for not starting workers at all in dev mode? #1041

The question remains whether we still want to let people configure the number of Celery workers manually. (I say we let them.)

Contributor

I love the fact that we can disable workers in dev. I commented on #1041.

Let's now focus on the possibility of customizing the number of Celery workers. I agree that this would be a useful feature. If we really have to, we'll introduce new configuration values, but I'd like to see if we can avoid it. For instance, could we avoid that by creating a Celery config file? This file would include a {{ patch("edx-platform-celery-config") }} statement. That way, we wouldn't have to create new configuration settings for every Celery parameter.

@DawoudSheraz (Contributor)

@arbrandes Hi, there are a few outstanding comments from Régis. Please take a look when you get a chance. Thanks.

@DawoudSheraz requested a review from regisb on April 9, 2024 06:50
Labels: none yet
Projects: Status: In review
Development: no linked issues yet
4 participants