This repository has been archived by the owner on Dec 7, 2022. It is now read-only.

Replace Celery with RQ #3454

Merged
merged 1 commit into 3.0-dev from switch-to-rq on May 14, 2018

Conversation

bmbouter
Member

@bmbouter bmbouter commented Apr 20, 2018

This PR does the following:

  • ports Travis to RQ
  • removes all Celery references
  • updates the install and systemd docs
  • creates a custom RQ worker method for Pulp's needs
  • ports the status API to report on the Redis connection
  • ports the orphan cleanup tasks to work with RQ
  • ports all Pulp tasks to RQ
  • handles fatal error exception recording on tasks
  • replaces apply_async_with_reservation with enqueue_with_reservation
  • ensures task cancellation kills a running task correctly
  • tests work discovery, normal shutdown, and crash shutdown
  • adds settings for the Redis connection to the settings files
  • removes celery as a dependency
  • adds rq as a dependency

There is a devel repo PR here to be used by vagrant up when testing
this PR: pulp/devel#146

Required PR: pulp/pulp-smash#960
Required PR: pulp/pulp_file#72
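The apply_async_with_reservation → enqueue_with_reservation change centers on resource locking: no two tasks that claim the same resource may execute concurrently, so a task is routed to whichever worker already holds its resources. A minimal, self-contained model of that routing rule (illustrative only; the class and method bodies here are invented, not Pulp's actual implementation):

```python
class ReservationRouter:
    """Toy model of reservation-based task routing."""

    def __init__(self, workers):
        self.workers = list(workers)
        # resource URL -> name of the worker currently holding it
        self.reservations = {}

    def enqueue_with_reservation(self, resources):
        """Return the worker the task should run on, honoring reservations."""
        holders = {self.reservations[r] for r in resources
                   if r in self.reservations}
        if len(holders) > 1:
            # Conflicting holders: in a real system the task would wait
            # until the conflicting reservations are released.
            return None
        worker = holders.pop() if holders else self._free_worker()
        for r in resources:
            self.reservations[r] = worker
        return worker

    def _free_worker(self):
        busy = set(self.reservations.values())
        for w in self.workers:
            if w not in busy:
                return w
        return None

    def release(self, resources):
        """Release reservations when a task finishes."""
        for r in resources:
            self.reservations.pop(r, None)
```

Two tasks claiming the same repository land on the same worker and therefore run serially; tasks on disjoint resources may run in parallel on different workers.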

@pep8speaks

pep8speaks commented Apr 20, 2018

Hello @bmbouter! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on May 14, 2018 at 16:57 UTC

bmbouter pushed a commit to bmbouter/pulp-smash that referenced this pull request Apr 20, 2018
@bmbouter bmbouter force-pushed the switch-to-rq branch 3 times, most recently from c5676d3 to 81d4c0d on April 20, 2018 19:18
bmbouter pushed a commit to bmbouter/devel that referenced this pull request Apr 20, 2018
This commit:

* Removes RabbitMQ and Qpid installation
* Installs, enables, and starts Redis
* Updates the systemd assets to use the RQ worker syntax

This PR should be tested against these core and pulp_file PRs:

pulp/pulp#3454
pulp/pulp_file#72
bmbouter pushed a commit to bmbouter/devel that referenced this pull request Apr 20, 2018
This commit:

* Removes RabbitMQ and Qpid installation
* Installs, enables, and starts Redis
* Updates the systemd assets to use the RQ worker syntax

This PR modifies the environment for these changes:

pulp/pulp#3454
pulp/pulp_file#72
bmbouter pushed a commit to bmbouter/pulp-smash that referenced this pull request Apr 20, 2018
The attribute name in the status API changed when RQ with Redis replaced
AMQP brokers like RabbitMQ.

This PR is meant to be used with these core and pulp_file PRs:

pulp/pulp#3454
pulp/pulp_file#72
@bmbouter bmbouter added the 3.0 label Apr 20, 2018
inner_task_id = str(uuid.uuid4())
task_name = self.name
This method provides normal enqueue functionality, while also requesting necessary locks for
serialized urls No two tasks that claim the same resource can execute concurrently. It
Contributor

serialized urls No two tasks

missing a period

Member Author

ty. fixed in next push.


from pulpcore.app.models import Task

from .constants import TASKING_CONSTANTS
Contributor

Absolute imports are preferable

Member Author

ty fixed in next push.


return super().perform_job(job, queue)

def handle_job_failure(self, job, **kwargs):
Contributor

These next couple of methods need docstrings

Member Author

fixed in next push, ty.

bmbouter pushed a commit to bmbouter/pulp-smash that referenced this pull request Apr 23, 2018
The attribute name in the status API changed when RQ with Redis replaced
AMQP brokers like RabbitMQ.

This PR is meant to be used with these core and pulp_file PRs:

pulp/pulp#3454
pulp/pulp_file#72
@bmbouter bmbouter force-pushed the switch-to-rq branch 3 times, most recently from 297feff to ae3e46d on April 23, 2018 17:55
@dralley
Contributor

dralley commented Apr 23, 2018

This is definitely not an "issue", but an annoyance we may want to address at some point post-merge:

When restarted, workers usually come up first and they immediately complain about the lack of a resource manager, just a fraction of a second before the resource manager comes online. It unnecessarily clutters the logs with warnings, but is otherwise innocuous.

Apr 23 18:16:52 pulp3.dev pulp[20649]: pulpcore.tasking.services.worker_watcher:INFO: Cleaning up shutdown worker 'resource_manager@pulp3.dev'.                                
Apr 23 18:16:52 pulp3.dev pulp[20648]: pulpcore.tasking.services.worker_watcher:INFO: Worker 'reserved_resource_worker_2@pulp3.dev' is back online.                            
Apr 23 18:16:52 pulp3.dev pulp[20647]: pulpcore.tasking.services.worker_watcher:INFO: Worker 'reserved_resource_worker_1@pulp3.dev' is back online.                            
Apr 23 18:16:52 pulp3.dev pulp[20648]: pulpcore.tasking.services.worker_watcher:ERROR: There are 0 pulp_resource_manager processes running. Pulp will not operate correctly without at least one pulp_resource_mananger process running.                                                
Apr 23 18:16:52 pulp3.dev pulp[20648]: rq.worker:INFO: Cleaning registries for queue: reserved_resource_worker_2@pulp3.dev                  
Apr 23 18:16:52 pulp3.dev pulp[20647]: pulpcore.tasking.services.worker_watcher:ERROR: There are 0 pulp_resource_manager processes running. Pulp will not operate correctly without at least one pulp_resource_mananger process running.                                                
Apr 23 18:16:52 pulp3.dev pulp[20647]: rq.worker:INFO: Cleaning registries for queue: reserved_resource_worker_1@pulp3.dev                  
Apr 23 18:16:52 pulp3.dev pulp[20647]: pulpcore.tasking.services.worker_watcher:ERROR: There are 0 pulp_resource_manager processes running. Pulp will not operate correctly without at least one pulp_resource_mananger process running.                                                
Apr 23 18:16:52 pulp3.dev pulp[20648]: pulpcore.tasking.services.worker_watcher:ERROR: There are 0 pulp_resource_manager processes running. Pulp will not operate correctly without at least one pulp_resource_mananger process running.                                                
Apr 23 18:16:52 pulp3.dev pulp[20649]: rq.worker:INFO: RQ worker 'rq:worker:resource_manager@pulp3.dev' started, version 0.10.0             
Apr 23 18:16:52 pulp3.dev pulp[20649]: rq.worker:INFO: *** Listening on resource_manager...                                                 
Apr 23 18:16:52 pulp3.dev pulp[20649]: pulpcore.tasking.services.worker_watcher:INFO: Worker 'resource_manager@pulp3.dev' is back online.   
Apr 23 18:16:52 pulp3.dev pulp[20649]: rq.worker:INFO: Cleaning registries for queue: resource_manager                     
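One way to quiet that startup race, sketched below: only report a missing resource manager after it has been absent for longer than a grace period. This is a hypothetical post-merge fix, not code from this PR; the class and parameter names are invented.

```python
import time

class ResourceManagerWatcher:
    """Suppress 'missing resource manager' errors during the brief window
    between worker startup and resource manager startup."""

    def __init__(self, grace_seconds=5.0, clock=time.monotonic):
        self.grace = grace_seconds
        self.clock = clock        # injectable for testing
        self._missing_since = None

    def check(self, manager_count):
        """Return an error message only after the grace period elapses."""
        if manager_count > 0:
            self._missing_since = None  # manager is back; reset the timer
            return None
        now = self.clock()
        if self._missing_since is None:
            self._missing_since = now   # first miss starts the timer
            return None
        if now - self._missing_since >= self.grace:
            return ("There are 0 pulp_resource_manager processes running. "
                    "Pulp will not operate correctly without at least one.")
        return None
```

With a few-second grace window, the one-off warnings at service restart disappear while a genuinely absent resource manager is still reported.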

@dralley
Contributor

dralley commented Apr 23, 2018

Restarting the workers during task execution causes the task to be forever in "waiting" state instead of "cancelled".

Given how troublesome this is with Celery, I wouldn't consider it a blocker, but we should look into it later.

[vagrant@pulp3 ~]$ http POST $REMOTE_HREF'sync/' repository=$REPO_HREF; prestart                                                            
HTTP/1.1 202 Accepted                                                                                                                       
Allow: POST, OPTIONS                                                                                                                        
Content-Type: application/json                                                                                                              
Date: Mon, 23 Apr 2018 18:25:02 GMT                                                                                                         
Server: WSGIServer/0.2 CPython/3.6.2                                                                                                        
Vary: Accept                                                                                                                                
                                                                                                                                            
{                                                                                                                                           
    "_href": "http://localhost:8000/api/v3/tasks/97532571-0e8f-4fb0-be10-d7211f4b122d/",                                                    
    "task_id": "97532571-0e8f-4fb0-be10-d7211f4b122d"                                                                                       
}                                                                                                                                           
systemctl restart pulp_worker@1 pulp_worker@2 pulp_resource_manager 
[vagrant@pulp3 ~]$ http GET :8000/api/v3/tasks/
HTTP/1.1 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Date: Mon, 23 Apr 2018 18:25:59 GMT
Server: WSGIServer/0.2 CPython/3.6.2
Vary: Accept

{
    "next": null,
    "previous": null,
    "results": [
        {
            "_href": "http://localhost:8000/api/v3/tasks/97532571-0e8f-4fb0-be10-d7211f4b122d/",
            "created": "2018-04-23T18:25:02.943956Z",
            "finished_at": null,
            "parent": null,
            "started_at": null,
            "state": "waiting",
            "worker": null
        },
        {
            "_href": "http://localhost:8000/api/v3/tasks/6820e2e6-3e0f-4303-bfb6-228d59ece111/",
            "created": "2018-04-23T18:23:55.632709Z",
            "finished_at": "2018-04-23T18:23:56.770609Z",
            "parent": null,
            "started_at": "2018-04-23T18:23:55.783521Z",
            "state": "completed",
            "worker": "http://localhost:8000/api/v3/workers/48050c6f-9928-47c6-8155-8da62a15bff9/"
        }
    ]
}

[vagrant@pulp3 ~]$

Luckily, it does not block other work from going forwards, so it's still a net win...

[vagrant@pulp3 ~]$ http GET :8000/api/v3/tasks/
HTTP/1.1 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Date: Mon, 23 Apr 2018 18:26:40 GMT
Server: WSGIServer/0.2 CPython/3.6.2
Vary: Accept

{
    "next": null,
    "previous": null,
    "results": [
        {
            "_href": "http://localhost:8000/api/v3/tasks/8234ad6d-95d4-4bf7-8c10-3b336c931fd3/",
            "created": "2018-04-23T18:26:37.658708Z",
            "finished_at": "2018-04-23T18:26:38.003875Z",
            "parent": null,
            "started_at": "2018-04-23T18:26:37.772027Z",
            "state": "completed",
            "worker": "http://localhost:8000/api/v3/workers/fc14551f-1460-4b91-8118-d4eecb918755/"
        },
        {
            "_href": "http://localhost:8000/api/v3/tasks/97532571-0e8f-4fb0-be10-d7211f4b122d/",
            "created": "2018-04-23T18:25:02.943956Z",
            "finished_at": null,
            "parent": null,
            "started_at": null,
            "state": "waiting",
            "worker": null
        },
        {
            "_href": "http://localhost:8000/api/v3/tasks/6820e2e6-3e0f-4303-bfb6-228d59ece111/",
            "created": "2018-04-23T18:23:55.632709Z",
            "finished_at": "2018-04-23T18:23:56.770609Z",
            "parent": null,
            "started_at": "2018-04-23T18:23:55.783521Z",
            "state": "completed",
            "worker": "http://localhost:8000/api/v3/workers/48050c6f-9928-47c6-8155-8da62a15bff9/"
        }
    ]
}

[vagrant@pulp3 ~]$ 
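A later cleanup for those stranded tasks could mark any "waiting" task dispatched before the last worker restart as canceled, since no worker will ever claim it again. This is a sketch of that idea only; the function and field names below mirror the JSON above but are otherwise illustrative, not Pulp code.

```python
def cancel_orphaned_tasks(tasks, workers_restarted_at):
    """Flip pre-restart 'waiting' tasks to 'canceled'.

    tasks: list of dicts shaped like the task-list API results above.
    workers_restarted_at: timestamp of the most recent worker restart.
    Returns the hrefs of the tasks that were canceled.
    """
    canceled = []
    for task in tasks:
        if task["state"] == "waiting" and task["created"] < workers_restarted_at:
            task["state"] = "canceled"
            canceled.append(task["_href"])
    return canceled
```

Tasks created after the restart are left alone, so a sweep like this is safe to run right after the workers come back up.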

@dralley
Contributor

dralley commented Apr 23, 2018

If you kill the redis server while a task is running, the workers go down (good!) and cancel the task (also good!). But if you continue to send tasks you will get no indication that anything is wrong, and these tasks will be lost forever because there is no queue or worker to put them on.
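The silent-loss failure mode suggests probing the broker before accepting a task, which is also why this PR ports the status API to report on the Redis connection. A minimal sketch of that guard, using invented names and fake objects (redis-py does expose `ping()` and RQ exposes `Queue.enqueue()`, but `enqueue_or_fail` itself is hypothetical, and redis-py's real ConnectionError is its own exception class):

```python
def enqueue_or_fail(connection, queue, func, *args):
    """Refuse a task when Redis is unreachable instead of losing it silently.

    In a real deployment this check would live in the view that creates
    the task, returning HTTP 503 rather than raising.
    """
    try:
        connection.ping()  # cheap liveness probe against the broker
    except ConnectionError:
        raise RuntimeError("503: tasking system unavailable (Redis is down)")
    return queue.enqueue(func, *args)
```

The caller gets an immediate, visible failure while Redis is down, rather than a task id that will never resolve.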

When I did a prestart, the workers came back up but the resource manager did not. systemctl status provided:

Apr 23 18:36:07 pulp3.dev rq[21046]:     self.register_birth()
Apr 23 18:36:07 pulp3.dev rq[21046]:   File "/home/vagrant/devel/pulp/pulpcore/pulpcore/tasking/worker.py", line 148, in register_birth
Apr 23 18:36:07 pulp3.dev rq[21046]:     return super().register_birth(*args, **kwargs)
Apr 23 18:36:07 pulp3.dev rq[21046]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/rq/worker.py", line 278, in regis
Apr 23 18:36:07 pulp3.dev rq[21046]:     raise ValueError(msg.format(self.name))
Apr 23 18:36:07 pulp3.dev rq[21046]: ValueError: There exists an active worker named 'resource_manager@pulp3.dev' already
Apr 23 18:36:07 pulp3.dev systemd[1]: pulp_resource_manager.service: Main process exited, code=exited, status=1/FAILURE
Apr 23 18:36:07 pulp3.dev systemd[1]: pulp_resource_manager.service: Unit entered failed state.
Apr 23 18:36:07 pulp3.dev systemd[1]: pulp_resource_manager.service: Failed with result 'exit-code'.

Subsequent restarts did not resolve the situation. ps aux | grep resource reveals that there are no resource manager processes running, so it's purely a bookkeeping issue.
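The ValueError comes from RQ's worker-name bookkeeping: register_birth() refuses a name that is still registered from the crashed process. A simplified model of that registry, plus the kind of stale-entry sweep that would make the resource manager startable again (real RQ keeps this state in Redis keys; the model below is illustrative only):

```python
class WorkerRegistry:
    """Toy model of RQ's active-worker name registry."""

    def __init__(self):
        self.active = set()

    def register_birth(self, name):
        """Refuse duplicate names, mirroring the ValueError in the log above."""
        if name in self.active:
            raise ValueError(
                "There exists an active worker named {!r} already".format(name))
        self.active.add(name)

    def clean_stale(self, live_process_names):
        """Drop registered names with no corresponding live process."""
        self.active &= set(live_process_names)
```

After an unclean shutdown, running the sweep with the actual process list clears the stale 'resource_manager@pulp3.dev' entry, and registration succeeds again.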

@dralley
Contributor

dralley commented Apr 23, 2018

All in all, I don't see any hard blockers (except for maybe the last issue mentioned, where resource_manager was rendered permanently unstartable), mostly because everything that was an issue here was already an issue before. We do have some things to sort out, though, which hopefully will be easier to do under this stack.

@codecov

codecov bot commented Apr 23, 2018

Codecov Report

Merging #3454 into 3.0-dev will increase coverage by 0.28%.
The diff coverage is 16.47%.

Impacted file tree graph

@@             Coverage Diff             @@
##           3.0-dev    #3454      +/-   ##
===========================================
+ Coverage    57.05%   57.33%   +0.28%     
===========================================
  Files           59       59              
  Lines         2510     2426      -84     
===========================================
- Hits          1432     1391      -41     
+ Misses        1078     1035      -43
Impacted Files Coverage Δ
pulpcore/pulpcore/app/settings.py 98.38% <ø> (ø) ⬆️
pulpcore/pulpcore/app/tasks/base.py 25% <ø> (-8.34%) ⬇️
pulpcore/pulpcore/exceptions/base.py 45.45% <ø> (ø) ⬆️
pulpcore/pulpcore/tasking/constants.py 100% <ø> (ø) ⬆️
pulpcore/pulpcore/app/tasks/orphan.py 11.76% <ø> (-9.29%) ⬇️
pulpcore/pulpcore/app/tasks/repository.py 32.25% <ø> (-5.98%) ⬇️
pulpcore/pulpcore/app/models/repository.py 54.28% <ø> (ø) ⬆️
...lpcore/pulpcore/tasking/services/manage_workers.py 0% <0%> (ø) ⬆️
pulpcore/pulpcore/app/response.py 66.66% <0%> (ø) ⬆️
...lpcore/pulpcore/tasking/services/worker_watcher.py 0% <0%> (ø) ⬆️
... and 13 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2565468...c069db5. Read the comment docs.

@bmbouter bmbouter force-pushed the switch-to-rq branch 2 times, most recently from c9b10f8 to aaa145c on April 23, 2018 21:41
@@ -56,6 +58,9 @@ else
# upload coverage report to codecov
codecov
fi

cat ~/resource_manager.log
cat ~/reserved_worker-1.log
Contributor

It looks like these lines are being called twice (although reserved_workers-1.log has changed to reserved_worker-1.log in your PR).

Member Author

They were. It's fixed now ty.

@dralley
Contributor

dralley commented May 7, 2018

@bmbouter
Member Author

bmbouter commented May 7, 2018

I applied the patch locally, and I added a breaking change release note. I am testing it now, and I will push all the rebased/patched/tested branches soon so Travis can try also.

bmbouter pushed a commit to bmbouter/pulp-smash that referenced this pull request May 7, 2018
The attribute name in the status API changed when RQ with Redis replaced
AMQP brokers like RabbitMQ.

This PR is meant to be used with these core and pulp_file PRs:

pulp/pulp#3454
pulp/pulp_file#72
bmbouter pushed a commit to bmbouter/devel that referenced this pull request May 7, 2018
This commit:

* Removes RabbitMQ and Qpid installation
* Installs, enables, and starts Redis
* Updates the systemd assets to use the RQ worker syntax
* Installs a specific commit of RQ to workaround a temporary RQ release
  issue

This PR modifies the environment for these changes:

pulp/pulp#3454
pulp/pulp_file#72
@bmbouter bmbouter force-pushed the switch-to-rq branch 4 times, most recently from 292f22b to 2b8bfd8 on May 7, 2018 22:07
@daviddavis
Contributor

I reviewed the code and it LGTM. I'll let @dralley have final approval though. Thanks @bmbouter.

bmbouter pushed a commit to bmbouter/devel that referenced this pull request May 8, 2018
This commit:

* Removes RabbitMQ and Qpid installation
* Installs, enables, and starts Redis
* Updates the systemd assets to use the RQ worker syntax
* Installs a specific commit of RQ to workaround a temporary RQ release
  issue

This PR modifies the environment for these changes:

pulp/pulp#3454
pulp/pulp_file#72
@bmbouter bmbouter force-pushed the switch-to-rq branch 2 times, most recently from b91c0d4 to 8be2ae6 on May 8, 2018 17:14
@bmbouter
Member Author

bmbouter commented May 8, 2018

Here is the blog post draft that I plan to merge today or tomorrow ahead of the code being merged on May 15th. I will then use that link to replace the example.com link in the release notes for this PR.

dralley added a commit to dralley/pulp_python that referenced this pull request May 12, 2018
dralley added a commit to dralley/pulp_python that referenced this pull request May 14, 2018
This PR does the following:

* ports Travis to RQ
* removes all Celery references
* updates the install and systemd docs
* creates a custom RQ worker method for Pulp's needs
* ports the status API to report on the Redis connection
* ports the orphan cleanup tasks to work with RQ
* ports all Pulp tasks to RQ
* handles fatal error exception recording on tasks
* replaces apply_async_with_reservation with enqueue_with_reservation
* ensures task cancellation kills a running task correctly
* tests work discovery, normal shutdown, and crash shutdown
* adds settings for the Redis connection to the settings files
* removes celery as a dependency
* adds rq as a dependency

There is a devel repo PR here to be used by `vagrant up` when testing
this PR: pulp/devel#146

Required PR: pulp/pulp-smash#960
Required PR: pulp/pulp_file#72
dralley added a commit to dralley/pulp_python that referenced this pull request May 14, 2018
@bmbouter bmbouter merged commit 2ce1974 into pulp:3.0-dev May 14, 2018
@bmbouter bmbouter deleted the switch-to-rq branch May 14, 2018 18:18
bmbouter pushed a commit to pulp/pulp-smash that referenced this pull request May 14, 2018
The attribute name in the status API changed when RQ with Redis replaced
AMQP brokers like RabbitMQ.

This PR is meant to be used with these core and pulp_file PRs:

pulp/pulp#3454
pulp/pulp_file#72
CodeHeeler pushed a commit to CodeHeeler/pulp_python that referenced this pull request Jun 26, 2018
daviddavis pushed a commit to daviddavis/pulp_file that referenced this pull request May 14, 2019
* Updates Travis to use RQ worker runners
* Remove usage of Celery
* port apply_async_with_reservation to enqueue_with_reservation

Required PR: pulp/pulp-smash#960
Required PR: pulp/pulp#3454