Parallel exports gcloud compose [prototype] #639
Conversation
I looked at using
I've mitigated this with a rolling merge: start off with the list of all fragments, condense that in blocks of 30, then keep condensing the list until you end up with a single product.
```python
# Rolling squash of the chunks; should enable infinite-ish scaling.
# Each pass merges the current list of fragment paths into fewer, larger
# objects, looping until a single composed product remains.
temp_chunk_prefix_num = 1
condense_job = None
while len(chunk_paths) > 1:
    condense_temp = join(temp, f'temp_chunk_{temp_chunk_prefix_num}')
    temp_chunk_prefix_num += 1
    chunk_paths, condense_job = compose_condense_fragments(
        chunk_paths,
        condense_temp,
        depends=condense_job,  # chain each round on the previous one
    )
```
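For context, here is a minimal sketch of what one round of `compose_condense_fragments` might look like, assuming the fragments are `gs://bucket/key` paths and using the `google-cloud-storage` client's `Blob.compose`, which concatenates up to 32 source objects server-side (hence the blocks of 30). The path handling, output naming, and the pass-through of `depends` are placeholders, not the PR's actual implementation:

```python
from google.cloud import storage

COMPOSE_BATCH = 30  # stay under the 32-source limit of a single GCS compose


def compose_condense_fragments(chunk_paths, condense_temp, depends=None):
    """One condensing round: merge every 30 fragments into one object.

    Returns the (shorter) list of merged object paths plus a job handle.
    The real pipeline would schedule batch jobs; this sketch runs inline.
    """
    client = storage.Client()
    merged_paths = []
    for i in range(0, len(chunk_paths), COMPOSE_BATCH):
        batch = chunk_paths[i:i + COMPOSE_BATCH]
        # Assumes all paths in a batch share one bucket (hypothetical).
        bucket_name = batch[0].removeprefix('gs://').split('/', 1)[0]
        bucket = client.bucket(bucket_name)
        sources = [
            bucket.blob(p.removeprefix(f'gs://{bucket_name}/')) for p in batch
        ]
        dest_key = f'{condense_temp}/merged_{i // COMPOSE_BATCH}.vcf.bgz'
        destination = bucket.blob(dest_key)
        destination.compose(sources)  # server-side; nothing is downloaded
        merged_paths.append(f'gs://{bucket_name}/{dest_key}')
    return merged_paths, depends  # job handle unchanged in this inline sketch
```

With that shape, the `while` loop above shrinks the list by roughly a factor of 30 per pass, so even very large fragment counts converge to a single product in a handful of rounds.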
Replaces #629
This upgrades the previous solution by incorporating the `gcloud storage objects compose` functionality to concatenate the separate fragments in GCP without needing to localise them first. For what it's worth, this same logic could be used to replace the current 'gather VCFs from fragments' step here (so long as the files are block-gzipped, which they are).
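The block-gzip caveat matters because BGZF files are sequences of independent gzip members, so the byte-wise concatenation that compose performs is itself a valid gzip stream. A quick local sketch of that property (filenames are hypothetical; Python's `gzip` module reads multi-member streams transparently):

```python
import gzip
import shutil

# Byte-wise concatenation of two block-gzipped fragments, mimicking
# what a server-side compose produces.
with open('combined.vcf.bgz', 'wb') as out:
    for part in ('fragment_1.vcf.bgz', 'fragment_2.vcf.bgz'):
        with open(part, 'rb') as src:
            shutil.copyfileobj(src, out)

# The result decompresses cleanly across the member boundary.
with gzip.open('combined.vcf.bgz', 'rt') as fh:
    print(fh.readline())
```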