
Investigate implementation of "background tasks" #1097

Closed
SchrodingersGat opened this issue Nov 3, 2020 · 18 comments · Fixed by #1398

Comments

@SchrodingersGat (Member)

There will most likely be a future requirement for processing "background" tasks. These may be either:

  • One-off long-running tasks (e.g. user requests deletion of 1,000 records)
  • Periodic / scheduled tasks

Some of the features implemented in #1063 almost needed this until I worked out a way to speed them up. But, with more complexity being built into InvenTree, it may not be long before a task takes too long to process and the user experience really degrades.

Also, with integrations into other services (DigiKey / OpenCart / Shopify / etc) we'll need something in the background running API calls, etc.

Options

Front-runner options seem to be:

Celery

https://docs.celeryproject.org/en/stable/django/first-steps-with-django.html
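
For context, the linked guide wires Celery into a Django project via a small `celery.py` module next to `settings.py`. A hedged sketch of that wiring (the `inventree` module name is an assumption):

```python
# Sketch of the standard Celery/Django wiring from the linked guide.
# "inventree" as the project module name is an assumption.
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "inventree.settings")

app = Celery("inventree")
# Read any CELERY_* settings from the Django settings module.
app.config_from_object("django.conf:settings", namespace="CELERY")
# Find tasks.py modules in each installed app.
app.autodiscover_tasks()
```

A broker (Redis/RabbitMQ) still has to be running for workers to receive tasks, which is the main cost discussed in the comments.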

Django Background Tasks

https://django-background-tasks.readthedocs.io/en/latest/
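
By contrast, django-background-tasks needs no broker; per its docs, a function is marked with the `@background` decorator and a management command processes the queue. A hedged sketch (the task body is a hypothetical example):

```python
# Sketch of the django-background-tasks usage pattern from its docs.
# The task itself (bulk record deletion) is a hypothetical example.
from background_task import background

@background(schedule=60)  # run roughly 60 seconds after being queued
def delete_records(record_ids):
    # Long-running work happens here, outside the request/response cycle.
    for record_id in record_ids:
        ...  # delete each record
```

Calling `delete_records([...])` only stores a task row in the database; a separate `python manage.py process_tasks` process actually executes pending tasks.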

@SchrodingersGat (Member, Author)

Could also leverage this concept to write "janitor" functions which run in the background at fixed intervals to make sure the business-logic rules are being observed?

@eeintech (Contributor) commented Nov 3, 2020

I read that Django 3.x is pushing towards asynchronous routines; this seems to be the current doc:
https://docs.djangoproject.com/en/3.1/topics/async/

Maybe relying on this core Django feature could actually be enough? Would have to investigate.

My main grief with Celery is that you need to concurrently run a message broker (e.g. Redis, RabbitMQ), and while it's not so hard to install, it adds extra setup, configuration, and running system processes. It's also a potential point of failure. I think it would make it harder for many users to run InvenTree properly and profit from this feature.
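
The concurrency win behind the Django async docs can be illustrated with the standard library alone: two slow I/O waits overlap instead of running back to back. This is only an illustrative sketch, not InvenTree code:

```python
# Stdlib-only illustration of the async approach the Django docs describe:
# two slow I/O waits overlap instead of running sequentially.
import asyncio
import time

async def fake_api_call(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for a slow network request
    return name

async def main() -> float:
    start = time.monotonic()
    # Both "requests" run concurrently, so total time ~= max(delay), not the sum.
    await asyncio.gather(fake_api_call("a", 0.2), fake_api_call("b", 0.2))
    return time.monotonic() - start

if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a Django 3.1+ async view, `asyncio.sleep` would be replaced by real awaitable I/O (e.g. external API calls to DigiKey/Shopify).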

@SchrodingersGat (Member, Author)

I agree completely with your appraisal of Celery. I had not heard of it before, but after looking through the docs it seems like it would add a lot of difficulty to installing and configuring InvenTree. Worse, if something goes wrong, it's an opaque background process that could be very hard to debug (especially for novice users).

django-background-tasks seemed a bit more friendly: it's Python only, and it stores pending tasks in the same main database. The only quirk is working out how to process the tasks automatically (it requires a separate command to check for tasks every few seconds).

If the django async features meet the requirements here, I would 100% rather use that, instead of relying on an external dependency.

@eeintech (Contributor) commented Nov 3, 2020

I have not "played" with the async features yet, should add it to my todo list 😄

@amishHammer

IMHO Celery is the way to go. The reliability of the broker is a moot point; it's very similar to the database. When deploying into production, most people will be deploying to a real external DB service (MySQL/PostgreSQL/etc.).

While django-background-tasks seems like a nice, simple option for some items, it lacks certain things that should be considered for wide-scale/large adoption of InvenTree. These include scaling background/async tasks out to additional back-end servers, and scheduled tasks.

django_celery_beat seems like a nice way to add a django admin interface to manage scheduled tasks.
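
For illustration, django_celery_beat stores schedules in the database via its documented models, so they can be edited in the admin or created from code. A hedged fragment (task name and dotted path are hypothetical), to be run inside a Django context:

```python
# Register a periodic task using django_celery_beat's documented models.
# The task name and dotted path below are hypothetical examples.
from django_celery_beat.models import IntervalSchedule, PeriodicTask

schedule, _ = IntervalSchedule.objects.get_or_create(
    every=1,
    period=IntervalSchedule.HOURS,
)
PeriodicTask.objects.get_or_create(
    name="check-stock-levels",       # hypothetical
    task="part.tasks.check_stock",   # hypothetical dotted task path
    interval=schedule,
)
```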

Things you could do with celery:

  • Accept messages from the broker from an IIoT device that is used to, say, scan built products and automatically update the build and assign output
  • Emit messages to other systems when events happen: Build created/stock received

Other things to think about with django-background-tasks:

  • It's relatively unmaintained; the last release was ~2 years ago
  • Last commit was 11 months ago

There are a few other options, but the go-to is Celery.

It may be possible to make the Celery dependency an extension that other extensions can depend on if they need it.

@SchrodingersGat (Member, Author)

Having spent some more time reading about this I would tend to agree that Celery is the way to go. Your suggestions for potential uses are very nice too!

I would like to find a way to ensure that it can be easily installed and spun up by even "novice" users.

Perhaps a script to run a development server to get people going, and then comprehensive instructions on how to set up a production environment.

@amishHammer

I just created #1113 which adds the base framework for Celery.

@eeintech (Contributor) commented Nov 6, 2020

When deploying into production most people will be deploying to a real external DB service (MySQL/Postgresql/etc).

@amishHammer I'm a little concerned about users NOT wanting to deploy a production setup and instead running InvenTree locally, on all available platforms/OSes. In #1113 you seem to point to RabbitMQ as the message broker; any idea if it is cross-platform?
How would a single user, with no experience whatsoever in backend installation and relying solely on an SQLite database, handle the installation?

@lookme2 commented Nov 6, 2020

I agree with @eeintech

@amishHammer

@eeintech

Both Redis and RabbitMQ are available for almost every Linux distro via system packages, and there are also Docker images for both. Both are pervasive in their deployment in the wild.

I have provided tasks for installing the required system packages the same as has been done for MySQL and PostgreSQL.

As of this point in time I see no need for core InvenTree background tasks, so no broker is required; you can run the Django webserver without a broker running and InvenTree will still function.

I will be committing some updates to the docs project as well to document this and the installation of optional dependencies.

@eeintech (Contributor) commented Nov 6, 2020

As of this point in time I see no need for core InvenTree background tasks, so no broker is required, you can run the Django webserver without a broker running and InvenTree will still function.

Right now I agree. In the future, when "background tasks" are put in place, I believe it would be important to keep this option open (e.g. keep the settings compatible with a no-broker setup). The downside is that, instead of creating a new Celery task, InvenTree should still be able to run a synchronous routine instead.

During installation/deployment user should have the choice to:

  • run an "easy-to-install" InvenTree environment with synchronous tasks (currently it is only that guy 😄)
  • run an "experienced-users-install" InvenTree environment with asynchronous tasks (the future!)

The former is for single/low-count users with few or no plugins; the latter is for multiple power users with many plugins.

I'm definitely including myself in the second category, but I do believe the first category of users would much rather keep their setup as minimal as possible.

Your effort to enable Celery is great (I personally love it as a fan of Celery), but IMO we should keep in mind that in the InvenTree settings, one should have the choice to define a broker or not (which will drive the type of task/routine, e.g. sync vs async). Adding packages/dependencies shouldn't be the problem, but when we start creating tasks/routines relying solely on Celery and the message broker, that is where it can become problematic.

@amishHammer

The problem with having 2 long-running task subsystems is code duplication and the added complexity of maintaining both systems side by side. If that could be adequately abstracted, that may make it simpler.

As far as complexity, at least on Ubuntu it's as simple as `apt-get install rabbitmq-server`; the default configuration is enough to make Celery work. Personally I don't find this a huge leap, but I understand wanting to keep it minimal.

@rcludwick (Contributor) commented Jan 11, 2021

As far as complexity, at least on Ubuntu its as simple as apt-get install rabbitmq-server

I've been running Django Q for a while and it seems pretty functional; the only requirement it has is a database. It's even simpler: no `apt-get install` required!

That said, if you want, it can run with a message broker, and supports several. https://github.com/Koed00/django-q

Also, RabbitMQ has care-and-feeding cycles that are non-trivial to debug and maintain. If the goal is to get this running in a simple Docker container while allowing the user to expand if he wants, then I much prefer Django Q.
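
For reference, the broker-less mode is a documented configuration: django-q can use the Django ORM itself as the broker. A sketch of the relevant `settings.py` fragment (the values shown are illustrative, not recommendations):

```python
# settings.py fragment: run django-q with the ORM as the broker
# (no Redis/RabbitMQ). Worker counts and timeouts are illustrative.
Q_CLUSTER = {
    "name": "inventree",   # hypothetical cluster name
    "workers": 4,
    "timeout": 90,         # seconds a task may run
    "retry": 120,          # must be longer than timeout
    "orm": "default",      # use the default database connection as the queue
}
```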

@SchrodingersGat (Member, Author)

django-q looks interesting; I'll have to investigate that.

@eeintech (Contributor)

I would also vote in favor of Django Q if it can run without a message broker, as it relies only on Python dependencies; much less to worry about!

@SchrodingersGat (Member, Author)

I've recently been looking more seriously into django-q

The tasks I wish to use this for right now are:

  • Checking for InvenTree updates (every week or so)
  • Sending emails (e.g. for new users, password reset, etc)
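
Those two tasks map naturally onto django-q's documented `schedule()` and `async_task()` helpers; a hedged sketch (the dotted task paths are hypothetical placeholders):

```python
# Hedged sketch: registering the tasks above with django-q.
# The dotted function paths are hypothetical placeholders.
from django_q.tasks import async_task, schedule
from django_q.models import Schedule

# Periodic: check for InvenTree updates once a week.
schedule(
    "InvenTree.tasks.check_for_updates",  # hypothetical
    schedule_type=Schedule.WEEKLY,
)

# One-off: send an email in the background.
async_task(
    "InvenTree.tasks.send_email",         # hypothetical
    "user@example.com",
)
```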

It looks "fit for purpose", but there are a couple of outstanding questions I need to answer first. These are probably "easy" questions for someone with more system-admin experience than myself.

Essentially the question is how to start the django-q process, and ensure that it is running. Also, when the server is stopped, the django-q process should be brought down too.

Currently you have to "manually" start the cluster with `python manage.py qcluster` (see the docs). This starts the cluster process running.

Ideally I want this process to be run automatically when the server is started, and brought down when the server stops.

Admittedly, it is possible that InvenTree is now at the point of complexity where it needs to be managed externally (e.g. systemctl) and we just have to write some more comprehensive documentation.

I would also like to continue to support Windows, although with the ability to use WSL under Windows (which is how I dev), this is less of an issue.

Perhaps we have to move to Linux only, and provide documentation on how to set up a systemd service for setup / teardown.

For those with experience using django-q - is this the "best" way to approach this? Any other suggestions?
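
One stdlib-only approach to the lifecycle question, sketched below: spawn the worker from the server process and register cleanup so it is torn down when the server exits. The command used here is a stand-in, not the real invocation:

```python
# Hypothetical sketch, stdlib only: spawn a worker process (stand-in for
# "python manage.py qcluster") alongside the web server and make sure it
# is terminated when the parent process exits.
import atexit
import subprocess
import sys

def start_worker(cmd):
    """Spawn the background worker and register cleanup on interpreter exit."""
    proc = subprocess.Popen(cmd)

    def stop():
        if proc.poll() is None:  # worker still running
            proc.terminate()
            proc.wait(timeout=10)

    atexit.register(stop)
    return proc

if __name__ == "__main__":
    # Stand-in command; a real server might run ["python", "manage.py", "qcluster"].
    worker = start_worker([sys.executable, "-c", "import time; time.sleep(60)"])
    print(worker.poll())  # None while the worker is alive
```

This handles a clean shutdown but not crashes of either process, which is where an external supervisor (systemd, supervisord) earns its keep.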

@rcludwick (Contributor) commented Mar 11, 2021

I like Celery for massive scalability, but it doesn't make a lot of sense for lightweight tasking, since you then have to do the care and feeding of a RabbitMQ server. You can still use RabbitMQ with Django Q, but it doesn't force you into it. You can use the DB as the queue, and because of that you get scalability instantly, as long as you can share a DB connection.

Most Celery workers I've seen in production were still hitting a single database, even though they were split across multiple worker machines. So the benefit of separating messages into a message broker didn't really gain anything other than the administration of RabbitMQ. And if you still need a database, you might just as well use Django Q with the ORM.

I'm assuming that Django Q is just multiprocessing the workers, though I haven't looked, to be honest. I would expect that to work just fine on Windows, and since it's all Python, there shouldn't be any real surprises keeping it from running there, I think.

In practice I had more issues with Celery running on Windows (because I had to develop on Windows; you know, security). Celery had more dependencies that broke more often. They ran just fine on Linux.

I was thinking that, if you wanted to combine them easily, you would use multiprocessing to launch 4 server instances (uwsgi/gunicorn/etc.) and one django-q cluster manager process; the django-q cluster manager would then launch its own workers. From there you're just monitoring the processes, and killing any that don't get cleaned up.

Or just use supervisord (which is Python). Supervisord works with Cygwin, or it looks like you can write a Windows service to do it, which may be better anyway to give native Windows support.

Inside the Docker container, if you use it, use supervisord to start the web instance (or 4, with an nginx proxy running in front) and the django-q workers, and then provide a sample supervisord.conf for people who don't want to use Docker. If you're supporting Windows, I'd say it's the responsibility of a package installer to install the Windows service easily, or a script that can do it with Administrator privileges and then let Windows be responsible for starting and restarting it. Apparently there's support for running Python in a Windows service, though I haven't tried it.

My opinion, though, is that supervisord > systemctl, since it can handle multiple services easily and bring them all up and down. But having multiple options for people is not a bad idea either.
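
As a concrete example of the supervisord approach, a minimal `supervisord.conf` fragment might look like this; the program names, commands, and layout are all assumptions:

```ini
; Hypothetical fragment: run the web server and the task worker together.
[program:inventree-web]
command=gunicorn inventree.wsgi
autostart=true
autorestart=true

[program:inventree-worker]
command=python manage.py qcluster
autostart=true
autorestart=true
```

supervisord then brings both programs up and down together (e.g. `supervisorctl start all` / `supervisorctl stop all`).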

@SchrodingersGat (Member, Author)

@rcludwick thanks for the input here. Sounds like you have a lot of experience in this area; I'm coming from an embedded-systems background, so I'm having to do a lot of reading to keep maintaining and scaling this project :)

I might throw a few more questions your way - thanks for sharing your knowledge!

5 participants