
RQ: periodically clear failed jobs #4306

Merged
rauchy merged 11 commits into master from clean-failed-jobs on Nov 7, 2019

Conversation

@rauchy (Contributor) commented Nov 6, 2019

As per rq/rq#1143, failed jobs stay in Redis forever. If this is true, we should implement our own periodic cleanup of these jobs.
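For context, periodic execution could be wired up with rq-scheduler's cron-style scheduling along these lines. This is only an illustrative sketch, not necessarily the mechanism this PR uses; it assumes rq_redis_connection is the redis-py client and purge_failed_jobs is the cleanup function added here.

```python
# Illustrative sketch: run purge_failed_jobs once a day via rq-scheduler.
# Assumes rq_redis_connection (redis-py client) and purge_failed_jobs exist.
from rq_scheduler import Scheduler

scheduler = Scheduler(connection=rq_redis_connection)
scheduler.cron(
    '0 0 * * *',            # every day at midnight
    func=purge_failed_jobs,
)
```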

@arikfr arikfr added the Backend label Oct 28, 2019
@arikfr arikfr added this to To do in Switch from Celery to RQ via automation Oct 28, 2019
@rauchy rauchy moved this from To do to In progress in Switch from Celery to RQ Nov 4, 2019
@rauchy rauchy requested review from arikfr and jezdez November 6, 2019 11:30
def purge_failed_jobs():
    jobs = rq_redis_connection.scan_iter('rq:job:*')

    is_idle = lambda key: rq_redis_connection.object('idletime', key) > settings.JOB_DEFAULT_FAILURE_TTL
Member:

This subcommand is available when maxmemory-policy is set to an LRU policy or noeviction.

(The subcommand in question being OBJECT IDLETIME.)

Is this the default config for Redis?

rauchy (Author):

The default is noeviction. On AWS, for example, it defaults to volatile-lru.
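(For reference, a quick way to verify the configured policy before relying on OBJECT IDLETIME is sketched below, assuming rq_redis_connection is a redis-py client. Note that some managed Redis offerings restrict the CONFIG command, in which case the policy has to be checked in the service's configuration instead.)

```python
# Sketch: fail fast if the eviction policy makes OBJECT IDLETIME unreliable.
# Assumes rq_redis_connection is a redis.Redis instance.
policy = rq_redis_connection.config_get('maxmemory-policy')['maxmemory-policy']
if policy != 'noeviction' and not policy.endswith('-lru'):
    raise RuntimeError('OBJECT IDLETIME is not reliable with maxmemory-policy=%s' % policy)
```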

    stale_jobs = [key for key in jobs if is_idle(key) and has_failed(key) and not_in_any_failed_registry(key)]

    for key in stale_jobs:
        rq_redis_connection.delete(key)
Member:
Maybe worth removing it from the FailedJobRegistry while we're at it?

rauchy (Author):

We could do that, but the point is that we want to let the FailedJobRegistry handle its own state. From what I can tell, there aren't any dire consequences to having ghost job IDs in the FailedJobRegistry (it is only used for requeueing, and in that case those jobs will simply not get requeued).

rauchy (Author):

If we delete from FailedJobRegistry, we might as well just do that and avoid checking for job inclusion (and avoid the whole comment+bypass at the top of the function).

🤔

Member:

I'm just worried that over a year it could accumulate quite a lot of job IDs there, which might have some consequences for performance, or at least memory usage.

Re. avoiding the job-inclusion check: I guess we can skip it.

rauchy (Author):

Yeah, 02555b9 makes things simpler.
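For readers following along, here is a minimal sketch of the registry-driven direction discussed above. It is not the code from 02555b9; it assumes stock rq >= 1.0 APIs, a redis-py connection, and a failure TTL in seconds, and the function and parameter names are illustrative.

```python
# Sketch only: purge failed jobs via FailedJobRegistry instead of scanning rq:job:* keys.
from rq import Queue
from rq.job import Job
from rq.exceptions import NoSuchJobError


def purge_failed_jobs_sketch(connection, queue_names, failure_ttl):
    for name in queue_names:
        registry = Queue(name, connection=connection).failed_job_registry
        for job_id in registry.get_job_ids():
            try:
                job = Job.fetch(job_id, connection=connection)
            except NoSuchJobError:
                # Ghost id: the job hash is already gone, so just drop the registry entry.
                connection.zrem(registry.key, job_id)
                continue
            # OBJECT IDLETIME reports seconds since the rq:job:<id> key was last touched.
            if connection.object('idletime', job.key) > failure_ttl:
                registry.remove(job)
                job.delete()
```

Deleting through the registry keeps the job hash and the FailedJobRegistry entry in sync, which is the concern raised in this thread.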

Review comment on redash/settings/__init__.py (outdated, resolved).
Co-Authored-By: Arik Fraimovich <arik@arikfr.com>
@arikfr (Member) left a comment:

👍

@rauchy merged commit a33d11d into master on Nov 7, 2019
Switch from Celery to RQ automation moved this from In progress to Done on Nov 7, 2019
@rauchy deleted the clean-failed-jobs branch on November 7, 2019