
Investigate implementation of "background tasks" #1097

Closed
SchrodingersGat opened this issue Nov 3, 2020 · 18 comments · Fixed by #1398

Comments

@SchrodingersGat (Member)

There will most likely be a future requirement for processing "background" tasks. These may be either:

  • One-off long-running tasks (e.g. user requests deletion of 1,000 records)
  • Periodic / scheduled tasks

Some of the features implemented in #1063 almost needed this until I worked out a way to speed them up. But, with more complexity being built into InvenTree, it may not be long before a task takes too long to process and the user experience really degrades.

Also, with integrations into other services (DigiKey / OpenCart / Shopify / etc) we'll need something in the background running API calls, etc.

Options

Front-runner options seem to be:

Celery

https://docs.celeryproject.org/en/stable/django/first-steps-with-django.html
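
For context, the linked guide wires Celery into a Django project via a small `celery.py` module next to `settings.py`. A hedged sketch of that wiring (the `inventree` module name is an assumption):

```python
# Sketch of the standard Celery/Django wiring from the linked guide.
# "inventree" as the project module name is an assumption.
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "inventree.settings")

app = Celery("inventree")
# Read any CELERY_* settings from the Django settings module.
app.config_from_object("django.conf:settings", namespace="CELERY")
# Find tasks.py modules in each installed app.
app.autodiscover_tasks()
```

A broker (Redis/RabbitMQ) still has to be running for workers to receive tasks, which is the main cost discussed in the comments.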

Django Background Tasks

https://django-background-tasks.readthedocs.io/en/latest/
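
By contrast, django-background-tasks needs no broker; per its docs, a function is marked with the `@background` decorator and a management command processes the queue. A hedged sketch (the task body is a hypothetical example):

```python
# Sketch of the django-background-tasks usage pattern from its docs.
# The task itself (bulk record deletion) is a hypothetical example.
from background_task import background

@background(schedule=60)  # run roughly 60 seconds after being queued
def delete_records(record_ids):
    # Long-running work happens here, outside the request/response cycle.
    for record_id in record_ids:
        ...  # delete each record
```

Calling `delete_records([...])` only stores a task row in the database; a separate `python manage.py process_tasks` process actually executes pending tasks.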

@SchrodingersGat (Member, Author)

Could also leverage this concept to write "janitor" functions which run in the background at fixed intervals to make sure the business-logic rules are being observed?

@eeintech (Contributor) commented Nov 3, 2020

I read that Django 3.x is pushing towards asynchronous routines; this seems to be the current doc:
https://docs.djangoproject.com/en/3.1/topics/async/

Maybe relying on this core Django feature could actually be enough? Would have to investigate.

My main grief with Celery is that you need to concurrently run a message broker (e.g. Redis, RabbitMQ), and while it's not so hard to install, it adds extra setup, configuration, and running system processes. It's also a potential point of failure. I think it would make it harder for many users to run InvenTree properly and profit from this feature.
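
The concurrency win behind the Django async docs can be illustrated with the standard library alone: two slow I/O waits overlap instead of running back to back. This is only an illustrative sketch, not InvenTree code:

```python
# Stdlib-only illustration of the async approach the Django docs describe:
# two slow I/O waits overlap instead of running sequentially.
import asyncio
import time

async def fake_api_call(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for a slow network request
    return name

async def main() -> float:
    start = time.monotonic()
    # Both "requests" run concurrently, so total time ~= max(delay), not the sum.
    await asyncio.gather(fake_api_call("a", 0.2), fake_api_call("b", 0.2))
    return time.monotonic() - start

if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a Django 3.1+ async view, `asyncio.sleep` would be replaced by real awaitable I/O (e.g. external API calls to DigiKey/Shopify).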

@SchrodingersGat (Member, Author)

I agree completely with your appraisal of Celery. I had not heard of it before, but after looking through the docs it seems like it would add a lot of difficulty to installing and configuring InvenTree. Worse, if something goes wrong, it's an opaque background process that could be very hard to debug (especially for novice users).

django-background-tasks seemed a bit more friendly: it's Python only, and it stores pending tasks in the same main database. The only quirk is working out how to process the tasks automatically (it requires a separate command to check for tasks every few seconds).

If the django async features meet the requirements here, I would 100% rather use that, instead of relying on an external dependency.

@eeintech (Contributor) commented Nov 3, 2020

I have not "played" with the async features yet, should add it to my todo list 😄

@amishHammer

IMHO Celery is the way to go. The reliability of the broker is a moot point; it's very similar to the database. When deploying into production, most people will be deploying to a real external DB service (MySQL/PostgreSQL/etc.).

While django-background-tasks seems like a nice, simple option for some items, it lacks certain things that should be considered for wide-scale/large adoption of InvenTree. These include scaling background/async tasks out to additional back-end servers, and scheduled tasks.

django_celery_beat seems like a nice way to add a django admin interface to manage scheduled tasks.
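
For illustration, django_celery_beat stores schedules in the database via its documented models, so they can be edited in the admin or created from code. A hedged fragment (task name and dotted path are hypothetical), to be run inside a Django context:

```python
# Register a periodic task using django_celery_beat's documented models.
# The task name and dotted path below are hypothetical examples.
from django_celery_beat.models import IntervalSchedule, PeriodicTask

schedule, _ = IntervalSchedule.objects.get_or_create(
    every=1,
    period=IntervalSchedule.HOURS,
)
PeriodicTask.objects.get_or_create(
    name="check-stock-levels",       # hypothetical
    task="part.tasks.check_stock",   # hypothetical dotted task path
    interval=schedule,
)
```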

Things you could do with celery:

  • Accept messages from the broker from an IIoT device that is used to, say, scan built products and automatically update the build and assign output
  • Emit messages to other systems when events happen: Build created/stock received

Other things to think about with django-background-tasks:

  • It's relatively unmaintained; the last release was ~2 years ago
  • Last commit was 11 months ago

There are a few other options, but the go-to is Celery.

It may be possible to make the Celery dependency an extension that other extensions can depend on if they need it.

@SchrodingersGat (Member, Author)

Having spent some more time reading about this I would tend to agree that Celery is the way to go. Your suggestions for potential uses are very nice too!

I would like to find a way to ensure that it can be easily installed and spun up by even "novice" users.

Perhaps a script to run a development server to get people going, and then comprehensive instructions on how to set up a production environment.

@amishHammer

I just created #1113 which adds the base framework for Celery.

@eeintech (Contributor) commented Nov 6, 2020

When deploying into production most people will be deploying to a real external DB service (MySQL/Postgresql/etc).

@amishHammer I'm a little concerned about users NOT wanting to deploy a production setup and instead running InvenTree locally, on all available platforms/OSes. In #1113 you seem to point to RabbitMQ as the message broker; any idea if it is cross-platform?
How would a single user, with no experience whatsoever in backend installation and relying solely on an SQLite database, handle the installation?

@lookme2 commented Nov 6, 2020

I agree with @eeintech

@amishHammer

@eeintech

Both Redis and RabbitMQ are available for almost every Linux distro via system packages, and there are also Docker images for both. Both are pervasive in their deployment in the wild.

I have provided tasks for installing the required system packages the same as has been done for MySQL and PostgreSQL.

As of this point in time I see no need for core InvenTree background tasks, so no broker is required; you can run the Django webserver without a broker running and InvenTree will still function.

I will be committing some updates to the docs project as well to document this and the installation of optional dependencies.

@eeintech (Contributor) commented Nov 6, 2020

As of this point in time I see no need for core InvenTree background tasks, so no broker is required, you can run the Django webserver without a broker running and InvenTree will still function.

Right now I agree. In the future, when "background tasks" are put in place, I believe it would be important to keep this option open (e.g. keep the settings compatible with a no-broker setup). The downside is that, instead of creating a new Celery task, InvenTree should still be able to run a synchronous routine instead.

During installation/deployment user should have the choice to:

  • run an "easy-to-install" InvenTree environment with synchronous tasks (currently it is only that guy 😄)
  • run an "experienced-users-install" InvenTree environment with asynchronous tasks (the future!)

The former is for single/low-count users with few or no plugins; the latter is for multiple power users with many plugins.

I'm definitely including myself in the second category, but I do believe the first category of users would much rather keep their setup as minimal as possible.

Your effort to enable Celery is great (I personally love it as a fan of Celery), but IMO we should keep in mind that in the InvenTree settings, one should have the choice to define a broker or not (which will drive the type of task/routine, e.g. sync vs async). Adding packages/dependencies shouldn't be the problem, but when we start creating tasks/routines relying solely on Celery and the message broker, that is where it can become problematic.

@amishHammer

The problem with having 2 long-running task subsystems is code duplication and the added complexity of maintaining both systems side by side. If that could be adequately abstracted, that may make it simpler.

As far as complexity, at least on Ubuntu it's as simple as `apt-get install rabbitmq-server`; the default configuration is enough to make Celery work. Personally I don't find this a huge leap, but I understand wanting to keep it minimal.

@rcludwick (Contributor) commented Jan 11, 2021

As far as complexity, at least on Ubuntu its as simple as apt-get install rabbitmq-server

I've been running Django Q for a while and it seems pretty functional; the only requirement it has is a database. It's even simpler: no `apt-get install` required!

That said, if you want, it can run with a message broker, and supports several. https://github.com/Koed00/django-q

Also, RabbitMQ has care-and-feeding cycles that are non-trivial to debug and maintain. If the goal is to get this running in a simple Docker container while allowing the user to expand if he wants, then I much prefer Django Q.
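
For reference, the broker-less mode is a documented configuration: django-q can use the Django ORM itself as the broker. A sketch of the relevant `settings.py` fragment (the values shown are illustrative, not recommendations):

```python
# settings.py fragment: run django-q with the ORM as the broker
# (no Redis/RabbitMQ). Worker counts and timeouts are illustrative.
Q_CLUSTER = {
    "name": "inventree",   # hypothetical cluster name
    "workers": 4,
    "timeout": 90,         # seconds a task may run
    "retry": 120,          # must be longer than timeout
    "orm": "default",      # use the default database connection as the queue
}
```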

@SchrodingersGat (Member, Author)

django-q looks interesting; I'll have to investigate that.

@eeintech (Contributor)

I would also vote in favor of Django Q if it can run without a message broker, as it relies only on Python dependencies; much less to worry about!

@SchrodingersGat (Member, Author)

I've recently been looking more seriously into django-q

The tasks I wish to use this for right now are:

  • Checking for InvenTree updates (every week or so)
  • Sending emails (e.g. for new users, password reset, etc)
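
Those two tasks map naturally onto django-q's documented `schedule()` and `async_task()` helpers; a hedged sketch (the dotted task paths are hypothetical placeholders):

```python
# Hedged sketch: registering the tasks above with django-q.
# The dotted function paths are hypothetical placeholders.
from django_q.tasks import async_task, schedule
from django_q.models import Schedule

# Periodic: check for InvenTree updates once a week.
schedule(
    "InvenTree.tasks.check_for_updates",  # hypothetical
    schedule_type=Schedule.WEEKLY,
)

# One-off: send an email in the background.
async_task(
    "InvenTree.tasks.send_email",         # hypothetical
    "user@example.com",
)
```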

It looks "fit for purpose", but there are a couple of outstanding questions I need to answer first. These are probably "easy" questions for someone with more system-admin experience than myself.

Essentially the question is how to start the django-q process, and ensure that it is running. Also, when the server is stopped, the django-q process should be brought down too.

Currently you have to "manually" start the cluster with `python manage.py qcluster` (see the docs). This starts the cluster process running.

Ideally I want this process to be run automatically when the server is started, and brought down when the server stops.

Admittedly, it is possible that InvenTree is now at the point of complexity where it needs to be managed externally (e.g. systemctl) and we just have to write some more comprehensive documentation.

I would also like to continue to support Windows, although with the ability to use WSL under Windows (which is how I dev), this is less of an issue.

Perhaps we have to move to Linux only, and provide documentation on how to set up a systemd service for setup / teardown.

For those with experience using django-q - is this the "best" way to approach this? Any other suggestions?
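
One stdlib-only approach to the lifecycle question, sketched below: spawn the worker from the server process and register cleanup so it is torn down when the server exits. The command used here is a stand-in, not the real invocation:

```python
# Hypothetical sketch, stdlib only: spawn a worker process (stand-in for
# "python manage.py qcluster") alongside the web server and make sure it
# is terminated when the parent process exits.
import atexit
import subprocess
import sys

def start_worker(cmd):
    """Spawn the background worker and register cleanup on interpreter exit."""
    proc = subprocess.Popen(cmd)

    def stop():
        if proc.poll() is None:  # worker still running
            proc.terminate()
            proc.wait(timeout=10)

    atexit.register(stop)
    return proc

if __name__ == "__main__":
    # Stand-in command; a real server might run ["python", "manage.py", "qcluster"].
    worker = start_worker([sys.executable, "-c", "import time; time.sleep(60)"])
    print(worker.poll())  # None while the worker is alive
```

This handles a clean shutdown but not crashes of either process, which is where an external supervisor (systemd, supervisord) earns its keep.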

@rcludwick (Contributor) commented Mar 11, 2021

I like Celery for massive scalability, but it doesn't make a lot of sense for lightweight tasking, since you then have to do the care and feeding of a RabbitMQ server. You can still use RabbitMQ with Django Q, but it doesn't force you into it. You can use the DB as the queue, and because of that you get scalability instantly, as long as you can share a DB connection.

Most Celery workers I've seen in production were still hitting a single database, even though they were split across multiple worker machines. So the benefit of separating messages into a message broker didn't really gain anything other than the administration of RabbitMQ. And if you still need a database, you might just as well use Django Q with the ORM.

I'm assuming that Django Q is just multiprocessing the workers, though I haven't looked, to be honest. I would expect that to work just fine on Windows, and since it's all Python, there shouldn't be any real surprises keeping it from running there, I think.

In practice I had more issues with Celery running on Windows (because I had to develop on Windows; you know, security). Celery had more dependencies that broke more often. They ran just fine on Linux.

I was thinking that, if you wanted to combine them easily, you would use multiprocessing to launch 4 server instances (uwsgi/gunicorn/etc.) and one django-q cluster manager process; the django-q cluster manager would then launch its own workers. From there you're just monitoring the processes, and killing any that don't get cleaned up.

Or just use supervisord (which is Python). Supervisord works with Cygwin, or it looks like you can write a Windows service to do it, which may be better anyway to give native Windows support.

Inside the Docker container, if you use it, use supervisord to start the web instance (or 4, with an nginx proxy running in front) and the django-q workers, and then provide a sample supervisord.conf for people who don't want to use Docker. If you're supporting Windows, I'd say it's the responsibility of a package installer to install the Windows service easily, or a script that can do it with Administrator privileges and then let Windows be responsible for starting and restarting it. Apparently there's support for running Python in a Windows service, though I haven't tried it.

My opinion, though, is that supervisord > systemctl, since it can handle multiple services easily and bring them all up and down. But having multiple options for people is not a bad idea either.
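
As a concrete example of the supervisord approach, a minimal `supervisord.conf` fragment might look like this; the program names, commands, and layout are all assumptions:

```ini
; Hypothetical fragment: run the web server and the task worker together.
[program:inventree-web]
command=gunicorn inventree.wsgi
autostart=true
autorestart=true

[program:inventree-worker]
command=python manage.py qcluster
autostart=true
autorestart=true
```

supervisord then brings both programs up and down together (e.g. `supervisorctl start all` / `supervisorctl stop all`).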

@SchrodingersGat (Member, Author)

@rcludwick thanks for the input here. Sounds like you have a lot of experience in this area; I'm coming from an embedded-systems background, so I'm having to do a lot of reading to keep maintaining and scaling this project :)

I might throw a few more questions your way - thanks for sharing your knowledge!

5 participants