Merged
16 changes: 10 additions & 6 deletions src/sentry/tasks/relocation.py
@@ -302,14 +302,18 @@ def uploading_start(uuid: UUID, replying_region_name: str | None, org_slug: str
 name="sentry.relocation.fulfill_cross_region_export_request",
 queue="relocation",
 autoretry_for=(Exception,),
-max_retries=MAX_FAST_TASK_RETRIES,
-retry_backoff=RETRY_BACKOFF,
+# So the 1st retry is after ~0.5 min, 2nd after ~1 min, 3rd after ~2 min, 4th after ~4 min.
+max_retries=4,

Contributor

Should we make these constants instead of magic numbers?

Contributor Author

This is the only task that uses these specific numbers, so we'd just end up with max_retries=MAX_RETRIES_FOR_FULFILL.., which seems a bit redundant. I agree that if we used this constant in multiple places, that would be a good way to go.

+retry_backoff=30,
 retry_backoff_jitter=True,
-# Setting `acks_late` here allows us to retry the potentially long-lived task if the k8s pod if
-# the worker received SIGKILL/TERM/QUIT. We have a timeout check in the task itself to make sure
-# it does not loop indefinitely.
+# Setting `acks_late` + `task_reject_on_worker_lost` here allows us to retry the potentially
+# long-lived task if the k8s pod of the worker received SIGKILL/TERM/QUIT (or we ran out of some
+# other resource, leading to the same outcome). We have a timeout check at the very start of the
+# task itself to make sure it does not loop indefinitely.
 acks_late=True,
-soft_time_limit=MEDIUM_TIME_LIMIT,
+task_reject_on_worker_lost=True,
+# 10 minutes per try.
+soft_time_limit=60 * 10,
 silo_mode=SiloMode.REGION,
 )
 def fulfill_cross_region_export_request(
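The retry schedule described in the "So the 1st retry is after ~0.5 min..." comment can be reproduced with a few lines of Python. This is a minimal sketch, assuming the decorator options reach Celery's `autoretry_for` handling unchanged and that Celery's default `retry_backoff_max` of 600 seconds applies (the diff does not set it). `backoff_cap` and `jittered_delay` are illustrative helpers, not Sentry or Celery APIs. With jitter enabled, each real delay is drawn uniformly between 0 and the cap, so the `~` figures in the comment are upper bounds rather than typical waits.

```python
import random

# Values taken from the diff; RETRY_BACKOFF_MAX is an assumption (Celery's
# default retry_backoff_max), since the diff does not override it.
RETRY_BACKOFF = 30       # retry_backoff=30 (seconds)
MAX_RETRIES = 4          # max_retries=4
RETRY_BACKOFF_MAX = 600  # assumed Celery default, in seconds


def backoff_cap(retries: int, factor: int = RETRY_BACKOFF, maximum: int = RETRY_BACKOFF_MAX) -> int:
    """Upper bound in seconds on the wait before retry number `retries + 1`.

    Exponential backoff: factor * 2**retries, clamped to `maximum`.
    """
    return min(maximum, factor * (2 ** retries))


def jittered_delay(retries: int, rng: random.Random) -> int:
    """Full jitter: the actual wait is uniform over the integers 0..cap."""
    return rng.randrange(backoff_cap(retries) + 1)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled waits are reproducible
    for retries in range(MAX_RETRIES):
        cap = backoff_cap(retries)
        print(
            f"retry {retries + 1}: up to {cap} s (~{cap / 60:g} min), "
            f"sampled wait {jittered_delay(retries, rng)} s"
        )
```

The caps come out to 30, 60, 120 and 240 seconds, which matches the 0.5/1/2/4 minute figures in the diff comment; none of them reaches the assumed 600-second ceiling.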