Temporary MT files not being found when using run_combiner #11891

Closed
jsarro13 opened this issue Jun 3, 2022 · 1 comment · Fixed by #11962

Comments


jsarro13 commented Jun 3, 2022

To report a bug, fill in the information below.
For support and feature requests, please use the discussion forum:
https://discuss.hail.is/

Please include the full Hail version and as much detail as possible.


I am aggregating gVCF files with the Hail function run_combiner, using the command below:

```python
import hail as hl

hl.experimental.run_combiner(inputs, out_file=output_file, tmp_path=temp_bucket, branch_factor=105, batch_size=100, reference_genome='GRCh38', use_genome_default_intervals=True)
```

I am finding that when phase 1 of the function creates 10 or 100 temporary MT files, phase 2 does not compute the temporary MT file names correctly. For example, when aggregating 1k gVCF files, phase 1 creates 10 temporary MT files, named 0.mt to 9.mt, but phase 2 looks for a two-digit file name, i.e. 00.mt. The same thing happens when phase 1 creates 100 temporary MT files while aggregating 10k gVCF files: the files are named 00.mt to 99.mt, but phase 2 looks for the three-digit name 000.mt. I assume this is because there are exactly 10 and 100 files, respectively. I do not see the issue when, for example, 11 temporary MT files are created (00.mt to 10.mt).
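The reported names are consistent with phase 1 and phase 2 computing the zero-padding width with two different rules that only disagree when the file count is an exact power of ten. The sketch below is a hypothetical reconstruction, not Hail's actual source; the helpers `writer_width`, `reader_width`, and `mt_name` are illustrative names chosen to reproduce the symptom:

```python
# Hypothetical reconstruction of the naming mismatch. These helpers are
# illustrative and are NOT taken from Hail's implementation.

def writer_width(n_files: int) -> int:
    # Phase 1 (writer): pad to the number of digits in the largest index, n_files - 1.
    return len(str(n_files - 1))

def reader_width(n_files: int) -> int:
    # Phase 2 (reader): pad to the number of digits in the count itself.
    # This is one digit too wide exactly when n_files is a power of ten (10, 100, ...).
    return len(str(n_files))

def mt_name(index: int, width: int) -> str:
    # Zero-pad the index to the given width, e.g. mt_name(0, 2) == "00.mt".
    return f"{index:0{width}d}.mt"

for n in (10, 11, 100, 105):
    written = mt_name(0, writer_width(n))
    looked_up = mt_name(0, reader_width(n))
    status = "ok" if written == looked_up else "MISSING"
    print(f"{n:>4} files: wrote {written}, reader opens {looked_up} -> {status}")
```

For 10 and 100 files, the first name written (0.mt, 00.mt) differs from the first name the reader opens (00.mt, 000.mt), matching the reported lookup of 00.mt; for 11 and 105 files the two agree.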

An example of the error this throws:

```
Hail version: 0.2.81-edeb70bc789c
Error summary: HailException: No file or directory found at gs://<path>//combiner-temporary/040b0721-5359-430d-9fe9-019f7eb263f8/_phase1_job1/00.mt
```

Is there a flag I can add to work around this?

Thank you.

chrisvittal added a commit to chrisvittal/hail that referenced this issue Jun 27, 2022
We were not properly handling cases where there were exactly 10 or 100
files to write. We would handle all other cases correctly.

Closes hail-is#11891
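The commit message describes the fix only at a high level. One way to rule out this class of bug, sketched here as an assumption rather than as the actual change in #11962, is to have both phases derive file names from a single shared helper. `index_width` and `temp_mt_names` are hypothetical names, not Hail functions:

```python
# Hypothetical sketch: phase 1 and phase 2 both call the same helper,
# so they cannot disagree on the padding width. Not Hail's actual code.

def index_width(n_files: int) -> int:
    """Digits needed to print the largest index, n_files - 1 (minimum 1)."""
    return len(str(max(n_files - 1, 0)))

def temp_mt_names(n_files: int) -> list[str]:
    """Names 0..n_files-1, zero-padded to one shared width so that
    lexicographic order matches index order."""
    width = index_width(n_files)
    return [f"{i:0{width}d}.mt" for i in range(n_files)]

# Phase 1 writes these names; phase 2 regenerates the same list to read them.
written = set(temp_mt_names(10))
to_read = temp_mt_names(10)
print(all(name in written for name in to_read), to_read[0], to_read[-1])
```

Because the width is computed in one place, the boundary cases of exactly 10 or 100 files produce the same names on both sides.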
@chrisvittal (Collaborator)

Thank you for reporting this. I've fixed the issue. You can also lower the batch size to 50 to avoid this problem.

danking pushed a commit that referenced this issue Jun 27, 2022
We were not properly handling cases where there were exactly 10 or 100
files to write. We would handle all other cases correctly.

Closes #11891