
Error parsing bubbles file during polishing #584

Closed
GuillaumeHolley opened this issue Mar 28, 2023 · 8 comments

@GuillaumeHolley

Hi,

I get the following error during the polishing step:

flye --nano-hq lr.fastq.gz -o flye.hap -t 64 --keep-haplotypes --scaffold --asm-coverage 50 --genome-size 3.1g --resume

[2023-03-27 09:54:55] INFO: Starting Flye 2.9.1-b1780
[2023-03-27 09:54:55] INFO: Resuming previous run
[2023-03-27 09:54:55] INFO: >>>STAGE: polishing
[2023-03-27 09:54:55] INFO: Polishing genome (1/1)
[2023-03-27 09:54:55] INFO: Running minimap2
[2023-03-27 14:48:58] INFO: Separating alignment into bubbles
[2023-03-27 14:50:07] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:08] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:08] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:08] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:08] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:08] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:08] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:08] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:08] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:08] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:09] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:09] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:09] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:09] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:09] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:09] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:09] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:09] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:09] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:10] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:10] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:10] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:10] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:10] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:10] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:10] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:10] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:10] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:10] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:11] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:11] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:11] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:11] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:11] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:11] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:12] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:12] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:12] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:12] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:13] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:13] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:13] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:14] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:14] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:14] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:15] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:15] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:15] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:15] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:15] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:17] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:17] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:17] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:18] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:19] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:19] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:20] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:21] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:23] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:24] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:26] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:29] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:30] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 14:50:30] WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
[2023-03-27 16:35:16] INFO: Alignment error rate: 0.002858
[2023-03-27 16:35:16] INFO: Correcting bubbles
0% 10% 20% terminate called after throwing an instance of 'std::runtime_error'
  what():  Error parsing bubbles file
[2023-03-27 18:09:31] ERROR: Command '['flye-modules', 'polisher', '--bubbles', 'flye.hap/40-polishing/bubbles_1.fasta', '--subs-mat', 'flye/config/bin_cfg/nano_r94_substitutions.mat', '--hopo-mat', 'flye/config/bin_cfg/nano_r94_g36_homopolymers.mat', '--out', 'flye.hap/40-polishing/consensus_1.fasta', '--threads', '64']' died with <Signals.SIGABRT: 6>.
[2023-03-27 18:09:31] ERROR: Pipeline aborted

The error first happened on a machine with 48 threads and 386G of RAM allocated to the job. I resumed the run from the last completed stage (see log above) on a machine with 64 threads and 480G of RAM allocated to the job, but I got the same error. I've run the exact same command many times on similar data and never hit this issue before. When inspecting the 40-polishing sub-directory, the minimap2 log shows no issue. The bubbles file is quite large: 358GB.

Thank you for your help,
Guillaume

@mikolmogorov
Owner

This may have something to do with the warning in the log. Flye does not expect non-ACGT characters in FASTA input in general and may not handle them correctly. Do you think you may have those in your reads? Did you get this warning earlier in the run as well? If you don't expect these characters but see the warning, this may be a FASTA/Q formatting error. Flye's parser is supposed to catch those, but maybe not in 100% of cases. Otherwise, I have not seen this error in a released version before.
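To rule out a FASTA/Q formatting problem in the input before another long rerun, a quick structural check can help. The following is a minimal Python sketch (not part of Flye) that assumes strict 4-line FASTQ records with no line wrapping; the helper names `open_reads` and `check_fastq` are hypothetical.

```python
import gzip
from typing import Iterable, List, TextIO, Tuple


def open_reads(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def check_fastq(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (record_index, problem) pairs for malformed FASTQ records.

    Assumes the strict 4-line layout (header, sequence, '+', quality) with
    no line wrapping; wrapped FASTQ would be reported as malformed.
    """
    it = iter(lines)
    problems: List[Tuple[int, str]] = []
    # enumerate() pulls each header line; next() pulls the other three lines
    # of the same record from the same underlying iterator.
    for index, header in enumerate(it):
        seq = next(it, "").rstrip("\r\n")
        sep = next(it, "")
        qual = next(it, "").rstrip("\r\n")
        if not header.startswith("@"):
            problems.append((index, "header does not start with '@'"))
        if not sep.startswith("+"):
            problems.append((index, "separator does not start with '+'"))
        if len(seq) != len(qual):
            problems.append(
                (index, f"sequence length {len(seq)} != quality length {len(qual)}")
            )
    return problems
```

For example, `with open_reads("lr.fastq.gz") as handle: print(check_fastq(handle)[:10])` would list the first ten problems in the input used above. An empty result only means the records are structurally consistent under the 4-line assumption; it says nothing about how Flye's own parser treats them.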

@GuillaumeHolley
Author

GuillaumeHolley commented Mar 29, 2023

I should probably have mentioned this earlier, but my input long reads are Illumina-corrected ONT (R9.4) reads. I don't use the --nano-corr preset, as it was giving me much worse results than --nano-hq. That being said, because the reads are Illumina-corrected, they do contain non-ACGT characters, so I got this warning on the first run too. I have run Flye on similar data for 40+ other genomes and none had this issue :(
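Since corrected reads can carry several kinds of non-ACGT symbols, it may help to know exactly which ones occur, and how often, before deciding whether to clean them. This Python sketch (hypothetical helper `tally_non_acgt`, again assuming strict 4-line FASTQ) counts every sequence character outside a configurable `allowed` set. Treating lowercase `acgt` as valid by default is an assumption; the thread does not say how Flye handles lowercase bases, so pass `allowed="ACGT"` to count them too.

```python
from collections import Counter
from typing import Iterable


def tally_non_acgt(lines: Iterable[str], allowed: str = "ACGTacgt") -> Counter:
    """Count every sequence character not in `allowed` across a 4-line FASTQ.

    Only the second line of each 4-line record (the sequence) is inspected;
    headers and quality strings are skipped.
    """
    allowed_set = set(allowed)
    counts: Counter = Counter()
    for line_number, line in enumerate(lines):
        if line_number % 4 == 1:  # sequence line of the current record
            counts.update(ch for ch in line.rstrip("\r\n") if ch not in allowed_set)
    return counts
```

Usage: `with gzip.open("lr.fastq.gz", "rt") as handle: print(tally_non_acgt(handle).most_common(10))` (with `import gzip`) prints the ten most frequent unexpected symbols.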

@GuillaumeHolley
Author

I just tried Flye 2.9.2 and got exactly the same error at the same location.

@mikolmogorov
Owner

@GuillaumeHolley thanks. I will need an input example that reproduces the problem to fix it. Can you come up with a BAM file that could be provided as input to --polish-target and results in the crash? You mentioned that the input is quite large, so it would be helpful if you could narrow it down to a smaller example. E.g., you can split the BAM into two roughly equal parts, keep the part that still gives you the error, and repeat.
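The halving procedure suggested here can be sketched generically. In this hypothetical Python helper, `fails` stands in for whatever actually reproduces the crash (for example, writing the subset of alignments to a BAM and running Flye with `--polish-target` on it); that step is not shown and is left to the caller as an assumption.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def shrink_failing_input(
    items: Sequence[T],
    fails: Callable[[Sequence[T]], bool],
    min_size: int = 1,
) -> Sequence[T]:
    """Repeatedly halve `items`, keeping whichever half still fails.

    Stops when the input reaches `min_size`, or when neither half fails on
    its own (the trigger may need records from both halves).
    """
    current = items
    while len(current) > min_size:
        mid = len(current) // 2
        left, right = current[:mid], current[mid:]
        if fails(left):
            current = left
        elif fails(right):
            current = right
        else:
            break  # failure needs records from both halves; stop here
    return current
```

If neither half fails on its own, the loop returns the current input unchanged, which matches the case where a problem only appears on the full dataset; a finer strategy such as delta debugging would be needed to go further.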

Misha

@mikolmogorov
Owner

Just wanted to check if you were able to fix the problem?

@GuillaumeHolley
Author

Hi @fenderglass,

Unfortunately not. The problem is that even though the issue has occurred on multiple occasions, it is still a rare occurrence. Any attempt to reproduce the issue using a smaller input resulted in Flye finishing the job as expected.

@mikolmogorov
Owner

Very strange. But the whole run (the problematic one) is failing consistently? Could it be that you have disk data corruption somewhere?
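One way to test the disk-corruption hypothesis, assuming a known-good copy of the input exists elsewhere (for example on another filesystem or in a backup), is to compare checksums of the two copies; `sha256sum` does this from the shell. The Python sketch below does the same with hypothetical helpers `sha256_of_file` and `files_match`, reading in chunks so memory use stays flat even for very large files.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter(callable, sentinel) keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a: str, path_b: str) -> bool:
    """True if two copies of a file have identical SHA-256 digests."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

Hashing the same file twice and getting different digests would also point to unreliable reads from the storage, independently of Flye.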

@mikolmogorov
Owner

Closing the thread because of inactivity. If the issue is still unresolved, feel free to reopen!
