-
Notifications
You must be signed in to change notification settings - Fork 59
retry on failure #381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
retry on failure #381
Conversation
Pull Request Test Coverage Report for Build 1384
💛 - Coveralls |
arostamianfar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this much more robust! Just a few restructuring comments to make it a bit more readable.
| # Cloud Storage allows to compose up to 32 objects. | ||
| _MAX_NUM_OF_BLOBS_PER_COMPOSE = 32 | ||
| _NUMBER_OF_API_CALL_RETRIES = 5 | ||
| _TIMEOUT = 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/_TIMEOUT/_COMPOSE_TIMEOUT_SECONDS
In general, always include the unit in time variables as this can easily be interpreted as 30ms :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tip! Done.
| new_blob_prefix = filesystems.FileSystems.join(blob_prefix, 'composed_') | ||
|
|
||
| proc_pool = multiprocessing.Pool() | ||
| arguments = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/arguments/blobs_to_compose_args
"arguments" is a very generic name, so it's not clear what it's representing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
| arguments.append( | ||
| (self._project, self._bucket_name, blob_names, new_blob_name)) | ||
|
|
||
| retry = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/retry/num_retries (retry usually indicates a boolean variable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| while arguments: | ||
| proc_pool = multiprocessing.Pool(processes=8) | ||
| results = [] | ||
| failed_composing_arguments = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please move this to just after proc_pool.close() ... in general, it's best to declare variables as closes to where they're first used (we sometimes don't do this for unit tests as it's more readable to group the 'expected' cases together, but we should try to do this in the actual code).
Also, I think we can rename this to failed_blobs_to_compose_args to match the above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
|
|
||
| # Cloud Storage allows to compose up to 32 objects. | ||
| _MAX_NUM_OF_BLOBS_PER_COMPOSE = 32 | ||
| _NUMBER_OF_API_CALL_RETRIES = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/_NUMBER_OF_API_CALL_RETRIES/_MAX_NUM_COMPOSE_RETRIES
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| 'timeout.', argument[2][0], argument[2][-1]) | ||
| failed_composing_arguments.append(argument) | ||
|
|
||
| arguments = failed_composing_arguments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more readable to do:
while num_retries <= _MAX_RETRIES:
...
if failed_blobs_to_compose_args:
num_retries += 1
blobs_to_compose_args = failed_blobs_to_compose_args
logging.warning(...)
else:
break
else:
raise RuntimeError(...)
The control flow is more obvious that way (e.g. the while loop clearly defines that it's based on number of retries rather than having arguments; the failed conditions are also more explicitly stated). Note that it uses pythonic 'while-else' statement.
But lemme know what you think or if you have other ideas!
| else: | ||
| logging.warning( | ||
| '%d jobs of composing of blobs failed due to timeout. Retry for ' | ||
| 'the %d time.', len(arguments), retry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/Retry for ... / Retrying %d of %d times. ... so it becomes, "Retrying 1 of 3 times."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done.
allieychen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Asha for the review. It seems much cleaner! PTAL :)
|
|
||
| # Cloud Storage allows to compose up to 32 objects. | ||
| _MAX_NUM_OF_BLOBS_PER_COMPOSE = 32 | ||
| _NUMBER_OF_API_CALL_RETRIES = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| # Cloud Storage allows to compose up to 32 objects. | ||
| _MAX_NUM_OF_BLOBS_PER_COMPOSE = 32 | ||
| _NUMBER_OF_API_CALL_RETRIES = 5 | ||
| _TIMEOUT = 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tip! Done.
| new_blob_prefix = filesystems.FileSystems.join(blob_prefix, 'composed_') | ||
|
|
||
| proc_pool = multiprocessing.Pool() | ||
| arguments = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
| arguments.append( | ||
| (self._project, self._bucket_name, blob_names, new_blob_name)) | ||
|
|
||
| retry = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| while arguments: | ||
| proc_pool = multiprocessing.Pool(processes=8) | ||
| results = [] | ||
| failed_composing_arguments = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
| else: | ||
| logging.warning( | ||
| '%d jobs of composing of blobs failed due to timeout. Retry for ' | ||
| 'the %d time.', len(arguments), retry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done.
arostamianfar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! just two nits...
| else: | ||
| break | ||
| else: | ||
| raise RuntimeError('Composing of blobs fails after {} ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/fails/failed
| num_retries += 1 | ||
| blobs_to_compose_args = failed_blobs_to_compose_args | ||
| logging.warning( | ||
| '%d jobs of composing of blobs failed due to timeout. Retrying for ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just "Retrying %d of %d." is cleaner (I think it should be "times" as "time" has a different meaning, but we can just remove it :) ).
When there are a large number of blobs, the composing may fail. Retry when this happens.
Issues: 85
Tested: manually ran BQ to VCF.