Conversation

@dcramer (Member) commented Feb 24, 2016

WIP

Basic wire-up of scheduled jobs.

Refs GH-2730



@dcramer (Member, Author) commented Feb 24, 2016

@tkaemming @mattrobenolt thoughts?

@codecov-io

Current coverage is 82.93%

Merging #2743 into master will decrease coverage by -0.47% as of 7dd9293

@@            master   #2743   diff @@
======================================
  Files          881     883     +2
  Stmts        33665   33695    +30
  Branches         0       0       
  Methods          0               
======================================
- Hit          28079   27945   -134
  Partial          0       0       
- Missed        5586    5750   +164

Review entire Coverage Diff as of 7dd9293


Uncovered Suggestions

  1. +0.07% via ...try/utils/apidocs.py#432...454
  2. +0.07% via ...try/utils/apidocs.py#117...139
  3. +0.06% via ...gs/sentry_helpers.py#191...211
  4. See 7 more...

Powered by Codecov. Updated on successful CI builds.

with Lock(lock_key, nowait=True, timeout=60):
    queryset = list(ScheduledJob.objects.filter(
        date_scheduled__lte=timezone.now(),
    )[:100])
Contributor

If we had a ton of scheduled jobs, it's possible this would start backing up, since it's only consuming 100 at a time and then waiting 1 minute between runs.

In practice, I don't think this is an issue since it'll be pretty lightly utilized.

And to be pedantic, this isn't a queryset. ;)

Member Author

I explicitly chose to limit it to make sure we don't have the same issue we have in the resolution cleanup (that it's unbounded). If it gets backlogged, that's fine. We could obviously be smarter and have it requeue itself, but I don't foresee this being an issue.

@mattrobenolt (Contributor)

Overall, this is what I had in my head (you just beat me to it). +1

@tkaemming (Contributor)

Seems reasonable — I don't have any background on what this is for, though.

@mattrobenolt mattrobenolt mentioned this pull request Mar 18, 2016
@mattrobenolt mattrobenolt force-pushed the master branch 2 times, most recently from 03928d0 to 59cc451 on March 27, 2016 at 21:25
@dcramer dcramer force-pushed the scheduled-jobs branch 2 times, most recently from 3c507a7 to 1736236 on May 17, 2016 at 20:51
@dcramer (Member, Author) commented May 17, 2016

Going to merge once green as it's needed in getsentry.

@mattrobenolt mattrobenolt self-assigned this Sep 12, 2016
@mattrobenolt (Contributor)

I'm going to take this over.

@ehfeng ehfeng requested a review from tkaemming June 16, 2017 00:15

lock_key = 'scheduler:process'
try:
    locks = LockManager(RedisLockBackend(redis.clusters.get('default')))
Contributor

Nah. Just use:

from sentry.app import locks
from sentry.utils.retries import TimedRetryPolicy
...

lock = locks.get('scheduler:process', duration=60)
with TimedRetryPolicy(5)(lock.acquire):
  ...

You don't wanna do all the stuff manually.

Contributor

What @mattrobenolt said.

If you want to maintain the nowait=True behavior from the previous change, you don't have to wrap the lock.acquire in the retry policy and can instead do this:

with locks.get('scheduler:process', duration=60).acquire():

Contributor

OK, fixed. I'm sticking with Matt's behavior, as it looks like it's relatively common in the codebase, and some of the scheduled tasks include sending email, so I assume there's potential for third-party glitches I'd want to retry on.

Contributor

nvm, going with Ted's behavior.

@ehfeng ehfeng force-pushed the scheduled-jobs branch 2 times, most recently from 856007b to 50f7783 on June 22, 2017 at 22:48
@tkaemming (Contributor) left a comment

> I'm sticking with Matt's behavior, as it looks like it's relatively common in the codebase, and some of the scheduled tasks include sending email, so I assume there's potential for third-party glitches I'd want to retry on.

I'm not sure I understand your rationale here, since this is going to retry every 60 seconds anyway but whatever.

@@ -0,0 +1,33 @@

Contributor

Extra newline

Contributor

Fixed


lock = locks.get('scheduler:process', duration=60)
with TimedRetryPolicy(5)(lock.acquire):
    queryset = list(ScheduledJob.objects.filter(
Contributor

Not a queryset, like @mattrobenolt said.

Contributor

I couldn't find @mattrobenolt's comment? Just cmd-F'd over this page...

Contributor

I tried...something? Not sure if it's what you wanted fixed though.

Contributor

Not sure where it went (victim of force push at some point maybe) but the gist of the comment is that the variable name doesn't actually represent what the value is, since casting it to list causes it not to be a QuerySet.

Contributor

Ah, that's what I thought. I moved some things around (as part of the warning when there are >100 jobs), so queryset should be appropriate now.

)[:100])

for job in queryset:
    logger.info('Sending scheduled job %s with payload %r',
Contributor

This probably shouldn't be info; I'll defer to @JTCunning's judgment here if he's got opinions.

Contributor

This is a debug statement.

Contributor

Fixed.

with TimedRetryPolicy(5)(lock.acquire):
    queryset = list(ScheduledJob.objects.filter(
        date_scheduled__lte=timezone.now(),
    )[:100])
Contributor

Might be worth logging something if the size of this is > 100 since that could get out of control.

Contributor

Added.

from sentry.tasks.base import instrumented_task
from sentry.utils.retries import TimedRetryPolicy

logger = logging.getLogger('sentry')
Contributor

logger = logging.getLogger('sentry.scheduler')

Contributor

Fixed

)[:100])

for job in queryset:
    logger.info('Sending scheduled job %s with payload %r',
Contributor

This is a debug statement.


ScheduledJob.objects.filter(
    id__in=[o.id for o in queryset],
).delete()
Contributor

We should probably be deleting objects one by one after a successful enqueuing. If for some reason we error out on send_task, the next time this runs we will reschedule all jobs that were enqueued before the error.
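
A minimal sketch of that ordering, for illustration only. It assumes the task dispatches via Celery's app.send_task and that each ScheduledJob carries name and payload fields (as the log line in the surrounding diff suggests); this is not necessarily the final implementation:

for job in job_list:
    logger.debug('Sending scheduled job %s with payload %r', job.name, job.payload)
    app.send_task(job.name, kwargs=job.payload)
    # Delete right after a successful enqueue, so a failure partway through
    # the loop doesn't re-send the already-enqueued jobs on the next run.
    job.delete()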

Contributor

👌

Contributor

Fixed.

@JTCunning (Contributor) left a comment

Logging stuff is good. Deferring rest of this to someone else.

@JTCunning JTCunning dismissed their stale review June 27, 2017 23:19

Logging stuff is good. Deferring rest of this to someone else.

@tkaemming (Contributor) left a comment

More nitpicky stuff but not going to block on it

    date_scheduled__lte=timezone.now(),
)
job_count = queryset.count()
if job_count > 100:
Contributor

Probably better to just select 101 and see if the size of the result exceeds 100, to avoid making the database do the query twice. I'm not sure the exact number matters; we just want to be able to figure out whether we're lagging behind or not.

Contributor

Good idea. Done.

)
job_count = queryset.count()
if job_count > 100:
    logger.debug('More than 100 ScheduledJob\'s: %d jobs found.' % job_count)
Contributor

This should use parameterized logging instead of string formatting like the log statement below.
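
For reference, the difference being pointed out, using the same message both ways (this is standard library logging behavior, not code from the PR):

# %-formatting builds the full message string eagerly, even if DEBUG is disabled:
logger.debug('More than 100 scheduled jobs: %d jobs found.' % job_count)

# Parameterized logging passes the arguments through and lets the logging
# framework interpolate them only when the record is actually emitted:
logger.debug('More than 100 scheduled jobs: %d jobs found.', job_count)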

Contributor

Ah, I did not realize it worked like that. Done.

from sentry.celery import app

with locks.get('scheduler.process', duration=60).acquire():
    job_list = ScheduledJob.objects.filter(
Contributor

I forget exactly how the query cache works, but it'd be safer at this point to just coerce this result to a list so it's explicit that you don't intend for it to run twice.
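
For illustration, the coercion being suggested, applied to the query from the surrounding diff (a sketch, not the final code):

job_list = list(ScheduledJob.objects.filter(
    date_scheduled__lte=timezone.now(),
)[:101])
# list() evaluates the queryset exactly once; the len() check and the for loop
# below then reuse the same rows instead of potentially re-running the query.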

)[:101]

if len(job_list) > 100:
    logger.debug('More than 100 ScheduledJob\'s: %r jobs found.', len(job_list))
Contributor

This will always be 101 if it logs now.

Contributor

Also ScheduledJob's is a possessive, which this is not.
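
Taking both nitpicks together, a hedged sketch of how the warning could read (the wording is only a suggestion):

if len(job_list) > 100:
    # len(job_list) is capped at 101 by the slice, so don't report it as a real
    # count; just note that the scheduler appears to be falling behind.
    logger.debug('More than 100 pending ScheduledJobs found; the scheduler may be falling behind.')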

@getsentry getsentry deleted a comment from getsentry-bot Jun 29, 2017
@ghost commented Jun 29, 2017

1 Warning
⚠️ PR includes migrations

Migration Checklist

  • new columns need to be nullable (unless table is new)
  • migration with any new index needs to be done concurrently
  • data migrations should not be done inside a transaction
  • before merging, check to make sure there aren't conflicting migration ids

Generated by 🚫 danger

@ehfeng ehfeng merged commit a5f8e7f into master Jun 29, 2017
@ehfeng ehfeng deleted the scheduled-jobs branch June 29, 2017 22:39
@github-actions github-actions bot locked and limited conversation to collaborators Dec 22, 2020