Conversation

@dcramer (Member) commented Feb 24, 2016

WIP

Basic wire-up of scheduled jobs.

Refs GH-2730



@dcramer (Member, Author) commented Feb 24, 2016

@tkaemming @mattrobenolt thoughts?

@codecov-io

Current coverage is 82.93%

Merging #2743 into master will decrease coverage by -0.47% as of 7dd9293

@@            master   #2743   diff @@
======================================
  Files          881     883     +2
  Stmts        33665   33695    +30
  Branches         0       0       
  Methods          0               
======================================
- Hit          28079   27945   -134
  Partial          0       0       
- Missed        5586    5750   +164

Review entire Coverage Diff as of 7dd9293


Uncovered Suggestions

  1. +0.07% via ...try/utils/apidocs.py#432...454
  2. +0.07% via ...try/utils/apidocs.py#117...139
  3. +0.06% via ...gs/sentry_helpers.py#191...211
  4. See 7 more...

Powered by Codecov. Updated on successful CI builds.

with Lock(lock_key, nowait=True, timeout=60):
    queryset = list(ScheduledJob.objects.filter(
        date_scheduled__lte=timezone.now(),
    )[:100])
Contributor

If we had a ton of scheduled jobs, it's possible this would start backing up, since it's only consuming 100 at a time and then waiting 1 minute between runs.

In practice, I don't think this is an issue since it'll be pretty lightly utilized.

And to be pedantic, this isn't a queryset. ;)

Member Author

I explicitly chose to limit it to make sure we don't have the same issue we have in the resolution cleanup (that it's unbounded). If it gets backlogged, that's fine. We could obviously be smarter and have it requeue itself, but I don't foresee this being an issue.

@mattrobenolt (Contributor)

Overall, this is what I had in my head (you just beat me to it). +1

@tkaemming (Contributor)

Seems reasonable — I don't have any background on what this is for, though.

@mattrobenolt mattrobenolt mentioned this pull request Mar 18, 2016
@mattrobenolt mattrobenolt force-pushed the master branch 2 times, most recently from 03928d0 to 59cc451 on March 27, 2016 at 21:25
@dcramer dcramer force-pushed the scheduled-jobs branch 2 times, most recently from 3c507a7 to 1736236 on May 17, 2016 at 20:51
@dcramer (Member, Author) commented May 17, 2016

Going to merge once green as it's needed in getsentry.

@mattrobenolt mattrobenolt self-assigned this Sep 12, 2016
@mattrobenolt (Contributor)

I'm going to take this over.

@ehfeng ehfeng requested a review from tkaemming June 16, 2017 00:15

lock_key = 'scheduler:process'
try:
    locks = LockManager(RedisLockBackend(redis.clusters.get('default')))
Contributor

Nah. Just use:

from sentry.app import locks
from sentry.utils.retries import TimedRetryPolicy
...

lock = locks.get('scheduler:process', duration=60)
with TimedRetryPolicy(5)(lock.acquire):
  ...

You don't wanna do all the stuff manually.

Contributor

What @mattrobenolt said.

If you want to maintain the nowait=True behavior from the previous change, you don't have to wrap the lock.acquire in the retry policy and can instead do this:

with locks.get('scheduler:process', duration=60).acquire():

Contributor

OK, fixed. I'm sticking with Matt's behavior, as it looks like it's relatively common in the codebase, and some of the scheduled tasks include sending email, so I assume there's potential for third-party glitches I'd want to retry on.

Contributor

nvm, going with Ted's behavior.

@ehfeng ehfeng force-pushed the scheduled-jobs branch 2 times, most recently from 856007b to 50f7783 on June 22, 2017 at 22:48
@tkaemming (Contributor) left a comment

> I'm sticking with Matt's behavior, as it looks like it's relatively common in the codebase, and some of the scheduled tasks include sending email, so I assume there's potential for third-party glitches I'd want to retry on.

I'm not sure I understand your rationale here, since this is going to retry every 60 seconds anyway but whatever.

@@ -0,0 +1,33 @@

Contributor

Extra newline

Contributor

Fixed


lock = locks.get('scheduler:process', duration=60)
with TimedRetryPolicy(5)(lock.acquire):
    queryset = list(ScheduledJob.objects.filter(
Contributor

Not a queryset, like @mattrobenolt said.

Contributor

I couldn't find @mattrobenolt's comment? Just cmd-F'd over this page...

Contributor

I tried...something? Not sure if it's what you wanted fixed though.

Contributor

Not sure where it went (victim of force push at some point maybe) but the gist of the comment is that the variable name doesn't actually represent what the value is, since casting it to list causes it not to be a QuerySet.

Contributor

Ah, that's what I thought. I moved some things around (as part of the warning when there are >100 jobs), so queryset should be appropriate now.

)[:100])

for job in queryset:
    logger.info('Sending scheduled job %s with payload %r',
Contributor

This probably shouldn't be info; I'll defer to @JTCunning's judgment here if he's got opinions.

Contributor

This is a debug statement.

Contributor

Fixed.

with TimedRetryPolicy(5)(lock.acquire):
    queryset = list(ScheduledJob.objects.filter(
        date_scheduled__lte=timezone.now(),
    )[:100])
Contributor

Might be worth logging something if the size of this is > 100 since that could get out of control.

Contributor

Added.

from sentry.tasks.base import instrumented_task
from sentry.utils.retries import TimedRetryPolicy

logger = logging.getLogger('sentry')
Contributor

logger = logging.getLogger('sentry.scheduler')

Contributor

Fixed

)[:100])

for job in queryset:
    logger.info('Sending scheduled job %s with payload %r',
Contributor

This is a debug statement.


ScheduledJob.objects.filter(
    id__in=[o.id for o in queryset],
).delete()
Contributor

We should probably be deleting objects one by one after a successful enqueuing. If for some reason we error out on send_task, the next time this runs we will reschedule all jobs that were enqueued before the error.
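
A minimal sketch of that ordering, for illustration only. It assumes the task dispatches via Celery's app.send_task and that each ScheduledJob carries name and payload fields (as the log line in the surrounding diff suggests); this is not necessarily the final implementation:

for job in job_list:
    logger.debug('Sending scheduled job %s with payload %r', job.name, job.payload)
    app.send_task(job.name, kwargs=job.payload)
    # Delete right after a successful enqueue, so a failure partway through
    # the loop doesn't re-send the already-enqueued jobs on the next run.
    job.delete()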

Contributor

👌

Contributor

Fixed.

@JTCunning (Contributor) left a comment

Logging stuff is good. Deferring rest of this to someone else.

@JTCunning JTCunning dismissed their stale review June 27, 2017 23:19

Logging stuff is good. Deferring rest of this to someone else.

@tkaemming (Contributor) left a comment

More nitpicky stuff but not going to block on it

    date_scheduled__lte=timezone.now(),
)
job_count = queryset.count()
if job_count > 100:
Contributor

Probably better to just select 101 and see if the size of the result exceeds 100, to avoid making the database do the query twice. I'm not sure the exact number matters; we just want to be able to figure out whether we're lagging behind or not.

Contributor

Good idea. Done.

)
job_count = queryset.count()
if job_count > 100:
    logger.debug('More than 100 ScheduledJob\'s: %d jobs found.' % job_count)
Contributor

This should use parameterized logging instead of string formatting like the log statement below.
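
For reference, the difference being pointed out, using the same message both ways (this is standard library logging behavior, not code from the PR):

# %-formatting builds the full message string eagerly, even if DEBUG is disabled:
logger.debug('More than 100 scheduled jobs: %d jobs found.' % job_count)

# Parameterized logging passes the arguments through and lets the logging
# framework interpolate them only when the record is actually emitted:
logger.debug('More than 100 scheduled jobs: %d jobs found.', job_count)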

Contributor

Ah, I did not realize it worked like that. Done.

from sentry.celery import app

with locks.get('scheduler.process', duration=60).acquire():
    job_list = ScheduledJob.objects.filter(
Contributor

I forget exactly how the query cache works, but it'd be safer at this point to just coerce this result to a list so it's explicit that you don't intend for it to run twice.
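
For illustration, the coercion being suggested, applied to the query from the surrounding diff (a sketch, not the final code):

job_list = list(ScheduledJob.objects.filter(
    date_scheduled__lte=timezone.now(),
)[:101])
# list() evaluates the queryset exactly once; the len() check and the for loop
# below then reuse the same rows instead of potentially re-running the query.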

)[:101]

if len(job_list) > 100:
    logger.debug('More than 100 ScheduledJob\'s: %r jobs found.', len(job_list))
Contributor

This will always be 101 if it logs now.

Contributor

Also ScheduledJob's is a possessive, which this is not.
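
Taking both nitpicks together, a hedged sketch of how the warning could read (the wording is only a suggestion):

if len(job_list) > 100:
    # len(job_list) is capped at 101 by the slice, so don't report it as a real
    # count; just note that the scheduler appears to be falling behind.
    logger.debug('More than 100 pending ScheduledJobs found; the scheduler may be falling behind.')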

@getsentry getsentry deleted a comment from getsentry-bot Jun 29, 2017
@ghost commented Jun 29, 2017

1 Warning
⚠️ PR includes migrations

Migration Checklist

  • new columns need to be nullable (unless table is new)
  • migration with any new index needs to be done concurrently
  • data migrations should not be done inside a transaction
  • before merging, check to make sure there aren't conflicting migration ids

Generated by 🚫 danger

@ehfeng ehfeng merged commit a5f8e7f into master Jun 29, 2017
@ehfeng ehfeng deleted the scheduled-jobs branch June 29, 2017 22:39
@github-actions github-actions bot locked and limited conversation to collaborators Dec 22, 2020