We should probably catch the exception and tell the Celery task to retry.
[2022-01-15 17:34:53,694 ERROR celery.app.trace] Task puzzles.tasks.create_puzzle_sheet_and_channel[40bdd864-4629-4c74-9d9a-fcbee95565a5] raised unexpected: <HttpError 403 when requesting https://www.googleapis.com/drive/v3/files/1rWyhGIkyx4h64us3BGeFrNY_xe01cxnKCNB4AxRYTO4/copy?alt=json returned "User rate limit exceeded.">
Traceback (most recent call last):
File "/app/.heroku/python/lib/python3.8/site-packages/celery/app/trace.py", line 385, in trace_task
R = retval = fun(*args, **kwargs)
File "/app/.heroku/python/lib/python3.8/site-packages/celery/app/trace.py", line 650, in __protected_call__
return self.run(*args, **kwargs)
File "/app/herring/puzzles/tasks.py", line 164, in create_puzzle_sheet_and_channel
sheet_id = make_sheet(sheet_title)
File "/app/herring/puzzles/spreadsheets.py", line 35, in make_sheet
got = service.files().copy(fileId=settings.HERRING_SECRETS['gapps-doc-to-clone'], body=body).execute()
File "/app/.heroku/python/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
return wrapped(*args, **kwargs)
File "/app/.heroku/python/lib/python3.8/site-packages/googleapiclient/http.py", line 856, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/drive/v3/files/1rWyhGIkyx4h64us3BGeFrNY_xe01cxnKCNB4AxRYTO4/copy?alt=json returned "User rate limit exceeded."> (worker.1, v203)
This also caused the Discord channel not to be created, because the spreadsheet is created first (so its link can go in the channel topic).
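A catch-and-retry wrapper could look something like the sketch below. To keep it self-contained, `RateLimitError` stands in for the `HttpError` 403 above, and the callable is retried in-process; in the real Celery task you'd raise `self.retry(countdown=...)` instead of sleeping, so the worker isn't blocked.

```python
import time


class RateLimitError(Exception):
    """Stand-in for googleapiclient.errors.HttpError with a 403 status."""


def call_with_retry(fn, max_retries=5, base_delay=60, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff
    (60s, 120s, 240s, ...). Re-raises after max_retries so the task
    still fails loudly instead of looping forever."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

In Celery specifically, roughly the same effect is available declaratively on the task decorator via `autoretry_for=(HttpError,)` together with `retry_backoff=True` and a `max_retries` cap.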
Just want to flag the obvious but easily-overlooked fact that retrying in the face of a rate limit is dangerous unless we have insight into what the limit actually is. Otherwise you can end up in the meltdown we had with Zulip that one year, where each failed retry counted against the rate limit and we never recovered.
Do we know why we got rate-limited? Was it just some kind of glitch on Google's end? I can't imagine we were doing anything intensive at the time.
"Better recovery" need not mean retrying; it could be a way to manually re-attempt or fix things after the fact. Also, when creating the sheet fails, creating the Discord channel fails too. At minimum, better error handling would let us end up in a known state without a sheet.
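One shape that "known state" could take: decouple the two steps so a sheet failure is recorded as "sheet pending" instead of aborting channel creation. This is only a sketch under assumptions — `make_sheet`/`make_channel` are passed in as parameters here, and the `SheetCreationError` and pending-topic convention are invented for illustration:

```python
class SheetCreationError(Exception):
    """Illustrative stand-in for whatever make_sheet raises on failure."""


def sheet_url(sheet_id):
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}"


def create_puzzle_sheet_and_channel(puzzle_name, make_sheet, make_channel):
    """Create the sheet and channel such that a sheet failure leaves us
    in a known state (channel exists, sheet marked pending) rather than
    an unknown one (neither exists, task dead)."""
    try:
        sheet_id = make_sheet(puzzle_name)
    except SheetCreationError:
        sheet_id = None  # known state: puzzle exists, sheet can be re-attempted
    topic = sheet_url(sheet_id) if sheet_id else "(sheet pending)"
    channel = make_channel(puzzle_name, topic)
    return sheet_id, channel
```

A stored `None` sheet id is also exactly the hook a manual "re-attempt sheet creation" action would need.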
Yeah, I would ideally like to see us have better handling of those kinds of states. Right now I think our model is not very robust to anything in the world being in an unexpected state.