Conversation

@allieychen (Contributor):

When there is a large number of blobs, composing them may fail. Retry when this happens.

Issues: 85
Tested: manually ran BQ to VCF.
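
For context: Cloud Storage's compose operation accepts at most 32 source objects per call, so a large number of blobs has to be composed in batches, and it is these batched compose jobs that this PR retries on timeout. A minimal sketch of the batching step, assuming the google-cloud-storage client (the helper name compose_in_batches is illustrative, not the PR's actual code):

from google.cloud import storage

# Cloud Storage allows to compose up to 32 objects.
_MAX_NUM_OF_BLOBS_PER_COMPOSE = 32

def compose_in_batches(bucket, blob_names, new_blob_prefix):
  # Composes blob_names in groups of at most 32 and returns the names
  # of the intermediate composed blobs.
  composed_names = []
  for i in range(0, len(blob_names), _MAX_NUM_OF_BLOBS_PER_COMPOSE):
    batch = blob_names[i:i + _MAX_NUM_OF_BLOBS_PER_COMPOSE]
    new_name = '{}{}'.format(new_blob_prefix,
                             i // _MAX_NUM_OF_BLOBS_PER_COMPOSE)
    bucket.blob(new_name).compose([bucket.blob(name) for name in batch])
    composed_names.append(new_name)
  return composed_names

# Usage: bucket = storage.Client(project).bucket(bucket_name)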

@coveralls commented Oct 17, 2018

Pull Request Test Coverage Report for Build 1384

  • 3 of 26 (11.54%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.2%) to 87.628%

Changes Missing Coverage:
  gcp_variant_transforms/libs/vcf_file_composer.py: 3 of 26 changed/added lines covered (11.54%)

Files with Coverage Reduction:
  gcp_variant_transforms/libs/vcf_file_composer.py: 1 new missed line (22.73%)

Totals:
  Change from base Build 1381: -0.2%
  Covered Lines: 6261
  Relevant Lines: 7145

💛 - Coveralls

@arostamianfar (Contributor) left a comment:

Thanks for making this much more robust! Just a few restructuring comments to make it a bit more readable.

# Cloud Storage allows to compose up to 32 objects.
_MAX_NUM_OF_BLOBS_PER_COMPOSE = 32
_NUMBER_OF_API_CALL_RETRIES = 5
_TIMEOUT = 30

@arostamianfar:
s/_TIMEOUT/_COMPOSE_TIMEOUT_SECONDS
In general, always include the unit in time variables as this can easily be interpreted as 30ms :)

@allieychen (author):
Nice tip! Done.

new_blob_prefix = filesystems.FileSystems.join(blob_prefix, 'composed_')

proc_pool = multiprocessing.Pool()
arguments = []

@arostamianfar:
s/arguments/blobs_to_compose_args
"arguments" is a very generic name, so it's not clear what it's representing.

@allieychen (author):
Thanks! Done.

arguments.append(
    (self._project, self._bucket_name, blob_names, new_blob_name))

retry = 0

@arostamianfar:
s/retry/num_retries (retry usually indicates a boolean variable)

@allieychen (author):
Done.

while arguments:
  proc_pool = multiprocessing.Pool(processes=8)
  results = []
  failed_composing_arguments = []

@arostamianfar:
nit: please move this to just after proc_pool.close() ... in general, it's best to declare variables as close as possible to where they're first used (we sometimes don't do this in unit tests, since it's more readable to group the 'expected' cases together, but we should try to do it in the actual code).
Also, I think we can rename this to failed_blobs_to_compose_args to match the above.

@allieychen (author):
Thanks! Done.


# Cloud Storage allows to compose up to 32 objects.
_MAX_NUM_OF_BLOBS_PER_COMPOSE = 32
_NUMBER_OF_API_CALL_RETRIES = 5

@arostamianfar:
nit: s/_NUMBER_OF_API_CALL_RETRIES/_MAX_NUM_COMPOSE_RETRIES

@allieychen (author):
Done.

        'timeout.', argument[2][0], argument[2][-1])
    failed_composing_arguments.append(argument)

  arguments = failed_composing_arguments

@arostamianfar:
I think it's more readable to do:

while num_retries <= _MAX_RETRIES:
  ...
  if failed_blobs_to_compose_args:
    num_retries += 1
    blobs_to_compose_args = failed_blobs_to_compose_args
    logging.warning(...)
  else:
    break
else:
  raise RuntimeError(...)

The control flow is more obvious that way (e.g. the while loop clearly states that it's based on the number of retries rather than on having arguments; the failure conditions are also stated more explicitly). Note that it uses the pythonic 'while-else' statement.
But lemme know what you think or if you have other ideas!
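
Filled in, the suggested structure might look something like the sketch below (constant and variable names follow the threads above; _compose_blobs is a stand-in for the real per-batch compose call, and the pool handling is illustrative, not the PR's actual code):

import logging
import multiprocessing

_MAX_NUM_COMPOSE_RETRIES = 5
_COMPOSE_TIMEOUT_SECONDS = 30

def _compose_blobs(project, bucket_name, blob_names, new_blob_name):
  # Stand-in for the actual compose call in vcf_file_composer.py.
  raise NotImplementedError

def compose_all_with_retries(blobs_to_compose_args):
  num_retries = 0
  while num_retries <= _MAX_NUM_COMPOSE_RETRIES:
    proc_pool = multiprocessing.Pool(processes=8)
    async_results = [(args, proc_pool.apply_async(_compose_blobs, args))
                     for args in blobs_to_compose_args]
    proc_pool.close()
    failed_blobs_to_compose_args = []
    for args, result in async_results:
      try:
        result.get(timeout=_COMPOSE_TIMEOUT_SECONDS)
      except multiprocessing.TimeoutError:
        failed_blobs_to_compose_args.append(args)
    proc_pool.terminate()  # Drop any jobs still running past the timeout.
    if failed_blobs_to_compose_args:
      num_retries += 1
      blobs_to_compose_args = failed_blobs_to_compose_args
      logging.warning('%d compose jobs timed out. Retrying %d of %d.',
                      len(blobs_to_compose_args), num_retries,
                      _MAX_NUM_COMPOSE_RETRIES)
    else:
      break
  else:
    # The while condition became false without a break: retries exhausted.
    raise RuntimeError('Composing of blobs failed after %d retries.' %
                       _MAX_NUM_COMPOSE_RETRIES)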

else:
  logging.warning(
      '%d jobs of composing of blobs failed due to timeout. Retry for '
      'the %d time.', len(arguments), retry)

@arostamianfar:
nit: s/Retry for ... / Retrying %d of %d times. ... so it becomes, "Retrying 1 of 3 times."

@allieychen (author):
Thanks. Done.

@allieychen (author) left a comment:

Thanks Asha for the review. It seems much cleaner! PTAL :)

@arostamianfar (Contributor) left a comment:

LGTM! just two nits...

  else:
    break
else:
  raise RuntimeError('Composing of blobs fails after {} '

@arostamianfar:
nit: s/fails/failed

num_retries += 1
blobs_to_compose_args = failed_blobs_to_compose_args
logging.warning(
    '%d jobs of composing of blobs failed due to timeout. Retrying for '

@arostamianfar:
nit: just "Retrying %d of %d." is cleaner (I think it should be "times" as "time" has a different meaning, but we can just remove it :) ).
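
With both nits applied, the warning in the snippet above would presumably end up reading along these lines (names as in the surrounding snippets):

logging.warning(
    '%d jobs of composing of blobs failed due to timeout. Retrying %d of '
    '%d.', len(blobs_to_compose_args), num_retries, _MAX_NUM_COMPOSE_RETRIES)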

@allieychen allieychen merged commit 11f7d44 into googlegenomics:master Oct 17, 2018
@allieychen allieychen deleted the fix branch October 17, 2018 18:16