Flye does not generate any output ("No disjointigs were assembled" message) #128

Open
StefanoLonardi opened this issue Jun 24, 2019 · 90 comments

@StefanoLonardi

StefanoLonardi commented Jun 24, 2019

I have been trying to assemble a 10Mb genome with uncorrected nanopore data (3-4 chromosomes expected). We have a lot of data; could that be the reason Flye fails at the end?

[2019-06-22 11:00:05] INFO: >>>STAGE: configure
[2019-06-22 11:00:05] INFO: Configuring run
[2019-06-22 11:00:27] INFO: Total read length: 10964270213
[2019-06-22 11:00:27] INFO: Input genome size: 10000000
[2019-06-22 11:00:27] INFO: Estimated coverage: 1096
[2019-06-22 11:00:27] WARNING: Expected read coverage is 1096, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2019-06-22 11:00:27] INFO: Reads N50/N90: 29675 / 9753
[2019-06-22 11:00:27] INFO: Minimum overlap set to 5000
[2019-06-22 11:00:27] INFO: Selected k-mer size: 15
[2019-06-22 11:00:27] INFO: >>>STAGE: assembly
[2019-06-22 11:00:27] INFO: Assembling disjointigs
[2019-06-22 11:00:27] INFO: Reading sequences
[2019-06-22 11:01:01] INFO: Generating solid k-mer index
[2019-06-22 11:01:17] INFO: Counting k-mers (1/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-06-22 11:02:49] INFO: Counting k-mers (2/2):
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-06-22 11:08:39] INFO: Filling index table
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-06-22 11:13:50] INFO: Extending reads
[2019-06-22 12:54:29] INFO: Overlap-based coverage: 1177
[2019-06-22 12:54:29] INFO: Median overlap divergence: 0.119637
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2019-06-23 17:20:11] INFO: Assembled 0 disjointigs
[2019-06-23 17:20:23] INFO: Generating sequence
[2019-06-23 17:22:11] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct

flye --nano-raw one.fastq --out-dir flye --genome-size 10m --threads 20
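
The "Estimated coverage" value in the log above is consistent with the total read length divided by the supplied genome size, truncated to an integer. A minimal Python sketch of that arithmetic follows; the function name is illustrative and is not part of Flye.

```python
def estimated_coverage(total_read_length: int, genome_size: int) -> int:
    """Integer coverage estimate: sequenced bases per genome base (truncated)."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return total_read_length // genome_size


# Values taken from the "Total read length" and "Input genome size" log lines.
print(estimated_coverage(10_964_270_213, 10_000_000))  # 1096
```

A result this far above typical assembly coverage is what triggers the WARNING line in the log.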

@mikolmogorov
Owner

Interesting: it looks like a lot of overlaps were indeed found, but no disjointigs were assembled. Could you send me the full flye.log? I also suggest trying --meta mode; it is more robust in solid k-mer selection when there is any contamination or artificial sequence from the instrument.

@StefanoLonardi
Author

[2019-06-22 11:00:05] root: INFO: Starting Flye 2.4.2-release
[2019-06-22 11:00:05] root: DEBUG: Cmd: /home/stelo/miniconda2/bin/flye --nano-raw Bduncani_06182019_pass.fastq --out-dir babesia_flye --genome-size 10m --threads 20
[2019-06-22 11:00:05] root: INFO: >>>STAGE: configure
[2019-06-22 11:00:05] root: INFO: Configuring run
[2019-06-22 11:00:27] root: INFO: Total read length: 10964270213
[2019-06-22 11:00:27] root: INFO: Input genome size: 10000000
[2019-06-22 11:00:27] root: INFO: Estimated coverage: 1096
[2019-06-22 11:00:27] root: WARNING: Expected read coverage is 1096, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2019-06-22 11:00:27] root: INFO: Reads N50/N90: 29675 / 9753
[2019-06-22 11:00:27] root: INFO: Minimum overlap set to 5000
[2019-06-22 11:00:27] root: INFO: Selected k-mer size: 15
[2019-06-22 11:00:27] root: INFO: >>>STAGE: assembly
[2019-06-22 11:00:27] root: INFO: Assembling disjointigs
[2019-06-22 11:00:27] root: DEBUG: -----Begin assembly log------
[2019-06-22 11:00:27] root: DEBUG: Running: flye-assemble -l /24-2/home/stelo/babesia/babesia_flye/flye.log -t 20 -v 5000 -k 15 Bduncani_06182019_pass.fastq /24-2/home/stelo/babesia/babesia_flye/00-assembly/draft_assembly.fasta 10000000 /home/stelo/miniconda2/lib/python2.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg
[2019-06-22 11:00:27] DEBUG: Build date: Apr 7 2019 02:34:37
[2019-06-22 11:00:27] DEBUG: Total RAM: 251 Gb
[2019-06-22 11:00:27] DEBUG: Available RAM: 245 Gb
[2019-06-22 11:00:27] DEBUG: Total CPUs: 40
[2019-06-22 11:00:27] DEBUG: Parameters:
[2019-06-22 11:00:27] DEBUG: big_genome_threshold=29000000
[2019-06-22 11:00:27] DEBUG: low_cutoff_warning=1
[2019-06-22 11:00:27] DEBUG: hard_min_coverage_rate=10
[2019-06-22 11:00:27] DEBUG: assemble_kmer_sample=1
[2019-06-22 11:00:27] DEBUG: repeat_graph_kmer_sample=1
[2019-06-22 11:00:27] DEBUG: read_align_kmer_sample=1
[2019-06-22 11:00:27] DEBUG: maximum_jump=1500
[2019-06-22 11:00:27] DEBUG: maximum_overhang=1500
[2019-06-22 11:00:27] DEBUG: repeat_kmer_rate=100
[2019-06-22 11:00:27] DEBUG: assemble_ovlp_divergence=0.30
[2019-06-22 11:00:27] DEBUG: repeat_graph_ovlp_divergence=0.15
[2019-06-22 11:00:27] DEBUG: repeat_graph_ovlp_end_adjust=0.00
[2019-06-22 11:00:27] DEBUG: read_align_ovlp_divergence=0.25
[2019-06-22 11:00:27] DEBUG: max_coverage_drop_rate=5
[2019-06-22 11:00:27] DEBUG: chimera_window=100
[2019-06-22 11:00:27] DEBUG: min_reads_in_disjointig=4
[2019-06-22 11:00:27] DEBUG: max_inner_reads=10
[2019-06-22 11:00:27] DEBUG: max_inner_fraction=0.25
[2019-06-22 11:00:27] DEBUG: add_unassembled_reads=0
[2019-06-22 11:00:27] DEBUG: max_separation=500
[2019-06-22 11:00:27] DEBUG: tip_length_threshold=100000
[2019-06-22 11:00:27] DEBUG: unique_edge_length=50000
[2019-06-22 11:00:27] DEBUG: min_repeat_res_support=0.51
[2019-06-22 11:00:27] DEBUG: out_paths_ratio=5
[2019-06-22 11:00:27] DEBUG: graph_cov_drop_rate=10
[2019-06-22 11:00:27] DEBUG: coverage_estimate_window=100
[2019-06-22 11:00:27] DEBUG: extend_contigs_with_repeats=1
[2019-06-22 11:00:27] DEBUG: Running with k-mer size: 15
[2019-06-22 11:00:27] DEBUG: Running with minimum overlap 5000
[2019-06-22 11:00:27] DEBUG: Metagenome mode: N
[2019-06-22 11:00:27] INFO: Reading sequences
[2019-06-22 11:01:01] DEBUG: Building positional index
[2019-06-22 11:01:01] DEBUG: Total sequence: 10964270213 bp
[2019-06-22 11:01:01] DEBUG: Expected read coverage: 1096
[2019-06-22 11:01:01] INFO: Generating solid k-mer index
[2019-06-22 11:01:01] DEBUG: Hard threshold set to 5
[2019-06-22 11:01:01] DEBUG: Started k-mer counting
[2019-06-22 11:01:17] INFO: Counting k-mers (1/2):
[2019-06-22 11:02:49] INFO: Counting k-mers (2/2):
[2019-06-22 11:08:39] DEBUG: Estimated minimum kmer coverage: 155
[2019-06-22 11:08:39] DEBUG: Filtered 301351751 erroneous k-mers
[2019-06-22 11:08:39] DEBUG: Repetitive k-mer frequency: 55681
[2019-06-22 11:08:39] DEBUG: Filtered 897 repetitive k-mers (8.98678e-05)
[2019-06-22 11:08:39] INFO: Filling index table
[2019-06-22 11:08:44] DEBUG: Sampling rate: 1
[2019-06-22 11:08:44] DEBUG: Solid k-mers: 9980428
[2019-06-22 11:08:44] DEBUG: K-mer index size: 5380562281
[2019-06-22 11:08:44] DEBUG: Mean k-mer frequency: 539.111
[2019-06-22 11:12:31] DEBUG: Sorting k-mer index
[2019-06-22 11:13:50] DEBUG: Peak RAM usage: 28 Gb
[2019-06-22 11:13:50] INFO: Extending reads
[2019-06-22 11:13:50] DEBUG: Estimating overlap coverage
[2019-06-22 12:54:29] INFO: Overlap-based coverage: 1177
[2019-06-22 12:54:29] INFO: Median overlap divergence: 0.119637
[2019-06-22 12:54:29] DEBUG: Sequence divergence distribution:

|                      *
|                      *
|                    * *
|                   ** **
|                   *****
|                   ******
|                   ********
|                   ********
|                  *********
|                  *********
|                  ***********
|                 ************
|                 ************* *
|                 ************* *
|                 ************* *
|                *****************  *
|                *********************
|                **********************
|               *************************
|             **************************************** * *     ** *
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%

Q25 = 0.1, Q50 = 0.12, Q75 = 0.14

[2019-06-23 17:20:11] INFO: Assembled 0 disjointigs
[2019-06-23 17:20:23] INFO: Generating sequence
[2019-06-23 17:20:23] DEBUG: Writing FASTA
[2019-06-23 17:20:23] DEBUG: Peak RAM usage: 78 Gb
-----------End assembly log------------
[2019-06-23 17:22:11] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct

@mikolmogorov
Owner

Thank you, this indeed looks strange. The high coverage may be confusing Flye, but I also suspect there might be some non-target reads in the sample.

I suggest trying two more runs: (i) metagenome mode (--meta), and (ii) normal mode with --asm-coverage 50, which uses the longest reads amounting to 50x coverage for disjointig assembly. Please post the corresponding logs as well.
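
The effect of --asm-coverage 50 described above (keeping only the longest reads that add up to about 50x coverage) can be illustrated with a short Python sketch. This is not Flye's implementation; `select_longest_reads` is a hypothetical helper, and it assumes reads are ranked purely by length.

```python
def select_longest_reads(read_lengths, genome_size, target_coverage):
    """Return the longest read lengths whose sum reaches the target coverage."""
    target_bases = genome_size * target_coverage
    selected, total = [], 0
    for length in sorted(read_lengths, reverse=True):
        if total >= target_bases:
            break  # enough bases collected; shorter reads are left out
        selected.append(length)
        total += length
    return selected


# Toy example: 1 kb genome, 3x target coverage.
reads = [500, 1200, 300, 900, 1000, 200]
print(select_longest_reads(reads, 1000, 3))  # [1200, 1000, 900]
```

With roughly 1000x input coverage, a selection like this would discard the large majority of reads before disjointig assembly.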

@StefanoLonardi
Author

StefanoLonardi commented Jul 18, 2019

I just finished the two Flye runs that you suggested. Both of them completed, but the assembly with --asm-coverage 50 seems better (in terms of N50, total size, etc.).
Thank you

@mikolmogorov
Owner

Glad that it helped!

@dgiguer

dgiguer commented Nov 14, 2019

Running normal mode with --asm-coverage 50 also helped in a similar case, where many overlaps were found but no disjointigs were assembled for a plasmid!

@ptrebert

@fenderglass
Could you please take a quick look at the log output for the sample where flye fails to assemble disjointigs:
gist.github.com/ptrebert/3964d66cd60af3e7a19d95d166707ed2

Since I am running flye with --asm-coverage 50 by default, I am a bit unsure how to proceed with this sample.

@mikolmogorov
Owner

@ptrebert Seems strange. My only guess would be that the PacBio reads might not be properly split into subreads (we had a couple of cases like that before). Try processing the reads with https://github.com/fenderglass/pbclip - it should tell you if there is a significant amount of "chimeric" subreads.

Alternatively, you can also try running with the --meta option if the reads turn out to be OK.

mikolmogorov reopened this Feb 13, 2020
mikolmogorov changed the title from "No disjointigs were assembled" to "Flye does not generate any output ("No disjointigs were assembled" message)" Feb 13, 2020
@ptrebert

@fenderglass
Ok, thanks for pointing out your tool, I'll check that and get back to you.

@ptrebert

ping: testing Flye 2.7b-b1562 on sample with no disjointigs assembled - still running...

@ptrebert

@fenderglass
For my problematic sample, flye 2.7b did not solve the issue (same "no disjointigs assembled"). I followed your suggestion and used your pbclip tool, which finished and reported the following:

Good: 15725667 chopped: 409754 bad: 662955

Could you help with interpreting these numbers (I may want to get in touch with the seq lab about this sample)? I'll try to assemble the output FASTA now with flye v2.7b, let's see what happens.

@mikolmogorov
Owner

@ptrebert

pbclip finds PacBio reads that were not properly split into subreads. Depending on the DNA library, the polymerase might make multiple passes over the fragment (which is used to produce high quality CCS reads). However, fragments in CLR libraries (at least from the assembly perspective) are not expected to be read multiple times to produce longer reads. When multiple passes do happen, such reads should be split into subreads (each subread is a single polymerase pass). Typically this is handled by the PacBio software at the FASTQ generation stage.

The numbers suggest that ~40% of your reads have multiple polymerase passes. This is a lot (a typical value could be 1-2%) and suggests that there is indeed an issue with subread splitting. The chopped reads are those that pbclip was able to split into parts successfully. The bad reads are the reads with the same pattern that pbclip was not able to recover.

Feel free to run the latest Flye version on the output produced by pbclip - I think it should work now. You can also double check with the lab whether they performed subread splitting or have raw PacBio files to regenerate valid FASTQ files.

@ptrebert

@fenderglass
Thanks a lot for your detailed explanation. I am not sure, however, I can follow your argument about the 40% "bad" reads:
Total: 16798376
Bad = chopped + bad = 409754 + 662955 = 1072709
% bad = 1072709 / 16798376 ~6.4%
Am I missing something, or did you just misread the "bad" number as 6 million instead of 600k?
In either case, thanks again for all your input, that is very valuable. I'll update this issue as soon as I have the 2.7b results for the corrected reads.
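
The percentage worked out above can be reproduced with a few lines of Python. The function name is illustrative; the three counts are the ones pbclip reported earlier in this thread.

```python
def flagged_fraction(good: int, chopped: int, bad: int) -> float:
    """Fraction of reads flagged as multi-pass (chopped + bad) among all reads."""
    total = good + chopped + bad
    return (chopped + bad) / total


# Counts from the pbclip output: Good / chopped / bad.
fraction = flagged_fraction(15_725_667, 409_754, 662_955)
print(f"{fraction:.1%}")  # 6.4%
```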

@ptrebert

ptrebert commented Mar 2, 2020

probably last comment regarding this: even with the corrected reads (FASTA input now), flye 2.7b fails to assemble disjointigs. Seems like there is something else off about this data...

@mikolmogorov
Owner

@ptrebert I see - this could be tricky sometimes. Did you have any luck with other assemblers? Wtdbg2 might be a fast way to check.

@ptrebert

ptrebert commented Mar 3, 2020

@fenderglass If I find the time, I'll try another assembler. For now, I asked the sequencing centre to double-check everything about this particular sample, let's see if they find something...

@ptrebert

@fenderglass
A postdoc in the sequencing center that produced the problematic data in the first place ran a couple of tests with different input combinations, and also with wtdbg2 as a comparison. Since none of those test runs produced an assembly, it seems fairly clear that the problem is the data. Just out of curiosity, since we have all the flye logs for the different runs, is there any statistic in those log files that could tell us anything about the problem(s) in the data? To me, they all look pretty similar (well, they all failed), so just being thorough here...

@mikolmogorov
Owner

@ptrebert good to know, thanks for the update! At this early stage of assembly, not much can be inferred from the logs, I think. I guess if the log shows that "Overlap-based coverage" is reasonable (let's say, >10), but no disjointigs are produced, then there is a problem somewhere.
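
The check described above can be sketched as a small Python script that reads the two relevant INFO lines from a flye.log. This is only an illustration: the helper names are hypothetical, the threshold of 10 is the rough figure mentioned above, and the message for the low-coverage case is an assumption rather than anything Flye reports.

```python
import re


def summarize_flye_log(log_text):
    """Extract overlap-based coverage and disjointig count, or None if absent."""
    cov = re.search(r"Overlap-based coverage: (\d+)", log_text)
    dis = re.search(r"Assembled (\d+) disjointigs", log_text)
    return (int(cov.group(1)) if cov else None,
            int(dis.group(1)) if dis else None)


def diagnose(log_text, min_coverage=10):
    """Rough triage of a failed run based on the two log values."""
    coverage, disjointigs = summarize_flye_log(log_text)
    if coverage is None or disjointigs is None:
        return "incomplete log"
    if disjointigs > 0:
        return "disjointigs assembled"
    if coverage > min_coverage:
        return "overlaps found but no disjointigs"
    return "few or no overlaps"  # assumption: points at the input reads


# Lines copied from the first log in this thread.
sample = """[2019-06-22 12:54:29] INFO: Overlap-based coverage: 1177
[2019-06-23 17:20:11] INFO: Assembled 0 disjointigs"""
print(diagnose(sample))  # overlaps found but no disjointigs
```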

@ptrebert

No, they all show a zero for the "overlap-based coverage". Whatever the problem is, it's in the data then... thanks for all your support!

@vappiah

vappiah commented May 21, 2020

Hello All, I am working on a Mycobacterium ulcerans genome which was sequenced with Oxford Nanopore technology. I am trying to do de novo assembly with flye, but I run into a warning and the pipeline stops. The command I used is
flye --nano-raw filename.fa -o outdir -g 0.05m -t 34 -i 2

I get this message below

WARNING: Expected read coverage is 4744, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
Pipeline aborted

@mikolmogorov
Owner

@jotes35 your expected genome size is 50kb (0.05 Mb). It needs to be "5m", not "0.05m" (assuming you are aiming for a 5 Mb genome).
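
To make the unit mix-up concrete, here is a minimal Python sketch of how a size string such as "0.05m" expands to base pairs. `parse_genome_size` is a hypothetical helper written for this illustration (the suffix set k/m/g is an assumption), not Flye's own parser.

```python
def parse_genome_size(text: str) -> int:
    """Convert a size string like '5m' or '0.05m' into a number of base pairs."""
    multipliers = {"k": 10**3, "m": 10**6, "g": 10**9}
    text = text.strip().lower()
    if text and text[-1] in multipliers:
        return int(round(float(text[:-1]) * multipliers[text[-1]]))
    return int(text)


print(parse_genome_size("0.05m"))  # 50000
print(parse_genome_size("5m"))     # 5000000

# Rescaling the reported coverage (4744 at 50 kb) to a 5 Mb genome.
# Approximate, because the reported coverage is already truncated.
rescaled = 4744 * parse_genome_size("0.05m") / parse_genome_size("5m")
print(round(rescaled))  # 47
```

With the corrected size, the same reads correspond to roughly 47x coverage instead of 4744x.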

@vappiah

vappiah commented May 23, 2020

Is there a way to know the expected genome size beforehand?

@vappiah

vappiah commented May 26, 2020

@fenderglass is there a way to know the expected genome size before starting the assembly?

@mikolmogorov
Owner

@jotes35 Please check the FAQ - it provides some answers to your question. Let me know if anything is unclear.

@eyayd

eyayd commented Jun 3, 2020

Hello, I have the same problem: "No disjointigs were assembled". The expected genome size is 110M and my expected coverage is about 49. I tried --meta and different --asm-coverage values (since my overall coverage is below 50x), but that didn't solve the issue. My N50 is quite high; could that be the reason I am getting the error?
P40.pdf

@mikolmogorov
Owner

@rajithadp If it is old PacBio data, I would try pbclip as described in the manual. Otherwise, I don't have any suggestions beyond that. I would also try other assemblers in case it is a specific Flye issue.

@matteo1313

Hello @fenderglass

Thank you for the amazing software <3

I am getting a similar issue. I tried --meta, and also -g 1.6m --asm-coverage 50, and still get no output. I have attached the logs for both of those attempts below. What do you recommend? I am dealing with a bacterial dataset that I just ran through the latest version of Guppy, which is why I am using --nano-hq. I have also tried each of the other --nano options, and none of them worked.

[2022-09-09 13:56:54] root: INFO: Starting Flye 2.9.1-b1780
[2022-09-09 13:56:54] root: DEBUG: Cmd: /usr/local/bin/flye --asm-coverage 50 --genome-size 1.7m --nano-hq /home/matteo/datasets/Lg/pass/lg_longest_10.fastq --out-dir /home/matteo/datasets/Lg/Flye
[2022-09-09 13:56:54] root: DEBUG: Python version: 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0]
[2022-09-09 13:56:54] root: INFO: >>>STAGE: configure
[2022-09-09 13:56:54] root: INFO: Configuring run
[2022-09-09 13:56:54] root: INFO: Total read length: 858801
[2022-09-09 13:56:54] root: INFO: Input genome size: 1700000
[2022-09-09 13:56:54] root: INFO: Estimated coverage: 0
[2022-09-09 13:56:54] root: WARNING: Expected read coverage is 0, the assembly is not guaranteed to be optimal in this setting. Are you sure that the genome size was entered correctly?
[2022-09-09 13:56:54] root: INFO: Reads N50/N90: 77551 / 71344
[2022-09-09 13:56:54] root: INFO: Minimum overlap set to 10000
[2022-09-09 13:56:54] root: INFO: >>>STAGE: assembly
[2022-09-09 13:56:54] root: INFO: Assembling disjointigs
[2022-09-09 13:56:54] root: DEBUG: -----Begin assembly log------
[2022-09-09 13:56:54] root: DEBUG: Running: flye-modules assemble --reads /home/matteo/datasets/Lg/pass/lg_longest_10.fastq --out-asm /home/matteo/datasets/Lg/Flye/00-assembly/draft_assembly.fasta --config /usr/local/lib/python3.10/dist-packages/flye/config/bin_cfg/asm_nano_hq.cfg --log /home/matteo/datasets/Lg/Flye/flye.log --threads 1 --genome-size 1700000 --min-ovlp 10000
[2022-09-09 13:56:54] DEBUG: Build date: Aug 17 2022 12:31:00
[2022-09-09 13:56:54] DEBUG: Total RAM: 31 Gb
[2022-09-09 13:56:54] DEBUG: Available RAM: 27 Gb
[2022-09-09 13:56:54] DEBUG: Total CPUs: 8
[2022-09-09 13:56:54] DEBUG: Loading /usr/local/lib/python3.10/dist-packages/flye/config/bin_cfg/asm_nano_hq.cfg
[2022-09-09 13:56:54] DEBUG: Loading /usr/local/lib/python3.10/dist-packages/flye/config/bin_cfg/asm_defaults.cfg
[2022-09-09 13:56:54] DEBUG: big_genome_threshold=29000000
[2022-09-09 13:56:54] DEBUG: meta_read_filter_kmer_freq=100
[2022-09-09 13:56:54] DEBUG: chain_large_gap_penalty=2
[2022-09-09 13:56:54] DEBUG: chain_small_gap_penalty=0.5
[2022-09-09 13:56:54] DEBUG: chain_gap_jump_threshold=100
[2022-09-09 13:56:54] DEBUG: max_coverage_drop_rate=5
[2022-09-09 13:56:54] DEBUG: max_extensions_drop_rate=5
[2022-09-09 13:56:54] DEBUG: chimera_window=100
[2022-09-09 13:56:54] DEBUG: chimera_overhang=1000
[2022-09-09 13:56:54] DEBUG: min_reads_in_disjointig=4
[2022-09-09 13:56:54] DEBUG: max_inner_reads=10
[2022-09-09 13:56:54] DEBUG: max_inner_fraction=0.25
[2022-09-09 13:56:54] DEBUG: max_separation=500
[2022-09-09 13:56:54] DEBUG: unique_edge_length=50000
[2022-09-09 13:56:54] DEBUG: min_repeat_res_support=0.51
[2022-09-09 13:56:54] DEBUG: out_paths_ratio=5
[2022-09-09 13:56:54] DEBUG: graph_cov_drop_rate=5
[2022-09-09 13:56:54] DEBUG: coverage_estimate_window=100
[2022-09-09 13:56:54] DEBUG: max_bubble_length=50000
[2022-09-09 13:56:54] DEBUG: loop_coverage_rate=1.5
[2022-09-09 13:56:54] DEBUG: repeat_edge_cov_mult=1.75
[2022-09-09 13:56:54] DEBUG: weak_detach_rate=5
[2022-09-09 13:56:54] DEBUG: tip_coverage_rate=2
[2022-09-09 13:56:54] DEBUG: tip_length_rate=2
[2022-09-09 13:56:54] DEBUG: output_gfa_before_rr=0
[2022-09-09 13:56:54] DEBUG: remove_alt_edges=0
[2022-09-09 13:56:54] DEBUG: low_cutoff_warning=0
[2022-09-09 13:56:54] DEBUG: kmer_size=17
[2022-09-09 13:56:54] DEBUG: use_minimizers=1
[2022-09-09 13:56:54] DEBUG: minimizer_window=5
[2022-09-09 13:56:54] DEBUG: reads_base_alignment=1
[2022-09-09 13:56:54] DEBUG: meta_read_top_kmer_rate=0.75
[2022-09-09 13:56:54] DEBUG: maximum_jump=1500
[2022-09-09 13:56:54] DEBUG: maximum_overhang=1500
[2022-09-09 13:56:54] DEBUG: repeat_kmer_rate=100
[2022-09-09 13:56:54] DEBUG: assemble_ovlp_divergence=0.05
[2022-09-09 13:56:54] DEBUG: assemble_divergence_relative=1
[2022-09-09 13:56:54] DEBUG: repeat_graph_ovlp_divergence=0.05
[2022-09-09 13:56:54] DEBUG: read_align_ovlp_divergence=0.10
[2022-09-09 13:56:54] DEBUG: hpc_scoring_on=1
[2022-09-09 13:56:54] DEBUG: add_unassembled_reads=0
[2022-09-09 13:56:54] DEBUG: extend_contigs_with_repeats=0
[2022-09-09 13:56:54] DEBUG: min_read_cov_cutoff=3
[2022-09-09 13:56:54] DEBUG: short_tip_length=20000
[2022-09-09 13:56:54] DEBUG: long_tip_length=100000
[2022-09-09 13:56:54] DEBUG: Running with k-mer size: 17
[2022-09-09 13:56:54] DEBUG: Running with minimum overlap 10000
[2022-09-09 13:56:54] DEBUG: Metagenome mode: N
[2022-09-09 13:56:54] DEBUG: Short mode: N
[2022-09-09 13:56:54] INFO: Reading sequences
[2022-09-09 13:56:54] DEBUG: Building positional index
[2022-09-09 13:56:54] DEBUG: Total sequence: 858801 bp
[2022-09-09 13:56:54] INFO: Building minimizer index
[2022-09-09 13:56:54] INFO: Pre-calculating index storage
[2022-09-09 13:56:54] DEBUG: Mean k-mer frequency: 1.08596
[2022-09-09 13:56:54] DEBUG: Repetitive k-mer frequency: 108
[2022-09-09 13:56:54] DEBUG: Filtered 0 repetitive k-mers (0)
[2022-09-09 13:56:54] INFO: Filling index
[2022-09-09 13:56:54] DEBUG: Sorting k-mer index
[2022-09-09 13:56:54] DEBUG: Selected k-mers: 263260
[2022-09-09 13:56:54] DEBUG: K-mer index size: 285891
[2022-09-09 13:56:54] DEBUG: Mean k-mer frequency: 1.08596
[2022-09-09 13:56:54] DEBUG: Minimizer rate: 3.00395
[2022-09-09 13:56:54] DEBUG: Peak RAM usage: 0 Gb
[2022-09-09 13:56:54] DEBUG: Estimating k-mer identity bias
[2022-09-09 13:57:16] DEBUG: Initial divergence estimate : 0.0481303
[2022-09-09 13:57:16] DEBUG: Relative threshold: Y
[2022-09-09 13:57:16] DEBUG: Max divergence threshold set to 0.0981303
[2022-09-09 13:57:16] INFO: Extending reads
[2022-09-09 13:57:16] DEBUG: Estimating overlap coverage
[2022-09-09 13:57:16] INFO: Overlap-based coverage: 0
[2022-09-09 13:57:16] INFO: Median overlap divergence: 0.0481303
[2022-09-09 13:57:16] DEBUG: Sequence divergence distribution:

|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0.048, Q50 = 0.048, Q75 = 0.048

[2022-09-09 13:57:16] INFO: Assembled 0 disjointigs
[2022-09-09 13:57:16] INFO: Generating sequence
[2022-09-09 13:57:16] DEBUG: Building positional index
[2022-09-09 13:57:17] DEBUG: Mean k-mer frequency: 0
[2022-09-09 13:57:17] DEBUG: Repetitive k-mer frequency: 0
[2022-09-09 13:57:17] DEBUG: Filtered 0 repetitive k-mers (-nan)
[2022-09-09 13:57:17] DEBUG: Sorting k-mer index
[2022-09-09 13:57:17] DEBUG: Selected k-mers: 0
[2022-09-09 13:57:17] DEBUG: K-mer index size: 0
[2022-09-09 13:57:17] DEBUG: Mean k-mer frequency: -nan
[2022-09-09 13:57:17] DEBUG: Minimizer rate: -nan
[2022-09-09 13:57:17] INFO: Filtering contained disjointigs
[2022-09-09 13:57:17] DEBUG: Computing transitive closure for overlaps
[2022-09-09 13:57:17] DEBUG: Found 0 overlaps
[2022-09-09 13:57:17] DEBUG: Left 0 overlaps after filtering
[2022-09-09 13:57:17] INFO: Contained seqs: 0
[2022-09-09 13:57:17] DEBUG: Writing FASTA
[2022-09-09 13:57:17] DEBUG: Peak RAM usage: 0 Gb
-----------End assembly log------------
[2022-09-09 13:57:17] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2022-09-09 13:57:17] root: ERROR: Pipeline aborted
[2022-09-09 13:57:50] root: INFO: Starting Flye 2.9.1-b1780
[2022-09-09 13:57:50] root: DEBUG: Cmd: /usr/local/bin/flye --meta --nano-hq /home/matteo/datasets/Lg/pass/lg_longest_10.fastq --out-dir /home/matteo/datasets/Lg/Flye
[2022-09-09 13:57:50] root: DEBUG: Python version: 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0]
[2022-09-09 13:57:50] root: INFO: >>>STAGE: configure
[2022-09-09 13:57:50] root: INFO: Configuring run
[2022-09-09 13:57:50] root: INFO: Total read length: 858801
[2022-09-09 13:57:50] root: INFO: Reads N50/N90: 77551 / 71344
[2022-09-09 13:57:50] root: INFO: Minimum overlap set to 10000
[2022-09-09 13:57:50] root: INFO: >>>STAGE: assembly
[2022-09-09 13:57:50] root: INFO: Assembling disjointigs
[2022-09-09 13:57:50] root: DEBUG: -----Begin assembly log------
[2022-09-09 13:57:50] root: DEBUG: Running: flye-modules assemble --reads /home/matteo/datasets/Lg/pass/lg_longest_10.fastq --out-asm /home/matteo/datasets/Lg/Flye/00-assembly/draft_assembly.fasta --config /usr/local/lib/python3.10/dist-packages/flye/config/bin_cfg/asm_nano_hq.cfg --log /home/matteo/datasets/Lg/Flye/flye.log --threads 1 --meta --min-ovlp 10000
[2022-09-09 13:57:50] DEBUG: Build date: Aug 17 2022 12:31:00
[2022-09-09 13:57:50] DEBUG: Total RAM: 31 Gb
[2022-09-09 13:57:50] DEBUG: Available RAM: 27 Gb
[2022-09-09 13:57:50] DEBUG: Total CPUs: 8
[2022-09-09 13:57:50] DEBUG: Loading /usr/local/lib/python3.10/dist-packages/flye/config/bin_cfg/asm_nano_hq.cfg
[2022-09-09 13:57:50] DEBUG: Loading /usr/local/lib/python3.10/dist-packages/flye/config/bin_cfg/asm_defaults.cfg
[2022-09-09 13:57:50] DEBUG: big_genome_threshold=29000000
[2022-09-09 13:57:50] DEBUG: meta_read_filter_kmer_freq=100
[2022-09-09 13:57:50] DEBUG: chain_large_gap_penalty=2
[2022-09-09 13:57:50] DEBUG: chain_small_gap_penalty=0.5
[2022-09-09 13:57:50] DEBUG: chain_gap_jump_threshold=100
[2022-09-09 13:57:50] DEBUG: max_coverage_drop_rate=5
[2022-09-09 13:57:50] DEBUG: max_extensions_drop_rate=5
[2022-09-09 13:57:50] DEBUG: chimera_window=100
[2022-09-09 13:57:50] DEBUG: chimera_overhang=1000
[2022-09-09 13:57:50] DEBUG: min_reads_in_disjointig=4
[2022-09-09 13:57:50] DEBUG: max_inner_reads=10
[2022-09-09 13:57:50] DEBUG: max_inner_fraction=0.25
[2022-09-09 13:57:50] DEBUG: max_separation=500
[2022-09-09 13:57:50] DEBUG: unique_edge_length=50000
[2022-09-09 13:57:50] DEBUG: min_repeat_res_support=0.51
[2022-09-09 13:57:50] DEBUG: out_paths_ratio=5
[2022-09-09 13:57:50] DEBUG: graph_cov_drop_rate=5
[2022-09-09 13:57:50] DEBUG: coverage_estimate_window=100
[2022-09-09 13:57:50] DEBUG: max_bubble_length=50000
[2022-09-09 13:57:50] DEBUG: loop_coverage_rate=1.5
[2022-09-09 13:57:50] DEBUG: repeat_edge_cov_mult=1.75
[2022-09-09 13:57:50] DEBUG: weak_detach_rate=5
[2022-09-09 13:57:50] DEBUG: tip_coverage_rate=2
[2022-09-09 13:57:50] DEBUG: tip_length_rate=2
[2022-09-09 13:57:50] DEBUG: output_gfa_before_rr=0
[2022-09-09 13:57:50] DEBUG: remove_alt_edges=0
[2022-09-09 13:57:50] DEBUG: low_cutoff_warning=0
[2022-09-09 13:57:50] DEBUG: kmer_size=17
[2022-09-09 13:57:50] DEBUG: use_minimizers=1
[2022-09-09 13:57:50] DEBUG: minimizer_window=5
[2022-09-09 13:57:50] DEBUG: reads_base_alignment=1
[2022-09-09 13:57:50] DEBUG: meta_read_top_kmer_rate=0.75
[2022-09-09 13:57:50] DEBUG: maximum_jump=1500
[2022-09-09 13:57:50] DEBUG: maximum_overhang=1500
[2022-09-09 13:57:50] DEBUG: repeat_kmer_rate=100
[2022-09-09 13:57:50] DEBUG: assemble_ovlp_divergence=0.05
[2022-09-09 13:57:50] DEBUG: assemble_divergence_relative=1
[2022-09-09 13:57:50] DEBUG: repeat_graph_ovlp_divergence=0.05
[2022-09-09 13:57:50] DEBUG: read_align_ovlp_divergence=0.10
[2022-09-09 13:57:50] DEBUG: hpc_scoring_on=1
[2022-09-09 13:57:50] DEBUG: add_unassembled_reads=0
[2022-09-09 13:57:50] DEBUG: extend_contigs_with_repeats=0
[2022-09-09 13:57:50] DEBUG: min_read_cov_cutoff=3
[2022-09-09 13:57:50] DEBUG: short_tip_length=20000
[2022-09-09 13:57:50] DEBUG: long_tip_length=100000
[2022-09-09 13:57:50] DEBUG: Running with k-mer size: 17
[2022-09-09 13:57:50] DEBUG: Running with minimum overlap 10000
[2022-09-09 13:57:50] DEBUG: Metagenome mode: Y
[2022-09-09 13:57:50] DEBUG: Short mode: N
[2022-09-09 13:57:50] INFO: Reading sequences
[2022-09-09 13:57:50] DEBUG: Building positional index
[2022-09-09 13:57:50] DEBUG: Total sequence: 858801 bp
[2022-09-09 13:57:50] INFO: Building minimizer index
[2022-09-09 13:57:50] INFO: Pre-calculating index storage
[2022-09-09 13:57:50] DEBUG: Mean k-mer frequency: 1.08596
[2022-09-09 13:57:50] DEBUG: Repetitive k-mer frequency: 108
[2022-09-09 13:57:50] DEBUG: Filtered 0 repetitive k-mers (0)
[2022-09-09 13:57:50] INFO: Filling index
[2022-09-09 13:57:50] DEBUG: Sorting k-mer index
[2022-09-09 13:57:50] DEBUG: Selected k-mers: 263260
[2022-09-09 13:57:50] DEBUG: K-mer index size: 285891
[2022-09-09 13:57:50] DEBUG: Mean k-mer frequency: 1.08596
[2022-09-09 13:57:50] DEBUG: Minimizer rate: 3.00395
[2022-09-09 13:57:50] DEBUG: Peak RAM usage: 0 Gb
[2022-09-09 13:57:50] DEBUG: Estimating k-mer identity bias
[2022-09-09 13:58:14] DEBUG: Initial divergence estimate : 0.0481303
[2022-09-09 13:58:14] DEBUG: Relative threshold: Y
[2022-09-09 13:58:14] DEBUG: Max divergence threshold set to 0.0981303
[2022-09-09 13:58:14] INFO: Extending reads
[2022-09-09 13:58:14] DEBUG: Estimating overlap coverage
[2022-09-09 13:58:14] INFO: Overlap-based coverage: 0
[2022-09-09 13:58:14] INFO: Median overlap divergence: 0.0481303
[2022-09-09 13:58:14] DEBUG: Sequence divergence distribution:

|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
|         *         |                                                                                
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0.048, Q50 = 0.048, Q75 = 0.048

[2022-09-09 13:58:14] INFO: Assembled 0 disjointigs
[2022-09-09 13:58:14] INFO: Generating sequence
[2022-09-09 13:58:14] DEBUG: Building positional index
[2022-09-09 13:58:14] DEBUG: Mean k-mer frequency: 0
[2022-09-09 13:58:14] DEBUG: Repetitive k-mer frequency: 0
[2022-09-09 13:58:14] DEBUG: Filtered 0 repetitive k-mers (-nan)
[2022-09-09 13:58:14] DEBUG: Sorting k-mer index
[2022-09-09 13:58:14] DEBUG: Selected k-mers: 0
[2022-09-09 13:58:14] DEBUG: K-mer index size: 0
[2022-09-09 13:58:14] DEBUG: Mean k-mer frequency: -nan
[2022-09-09 13:58:14] DEBUG: Minimizer rate: -nan
[2022-09-09 13:58:14] INFO: Filtering contained disjointigs
[2022-09-09 13:58:14] DEBUG: Computing transitive closure for overlaps
[2022-09-09 13:58:14] DEBUG: Found 0 overlaps
[2022-09-09 13:58:14] DEBUG: Left 0 overlaps after filtering
[2022-09-09 13:58:14] INFO: Contained seqs: 0
[2022-09-09 13:58:14] DEBUG: Writing FASTA
[2022-09-09 13:58:14] DEBUG: Peak RAM usage: 0 Gb
-----------End assembly log------------
[2022-09-09 13:58:14] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2022-09-09 13:58:14] root: ERROR: Pipeline aborted

@mikolmogorov
Owner

@matteo1313 It seems that you have ~800 kb of reads for a bacterium with a 1.6 Mb genome (about 0.5x), so there is simply not enough coverage to assemble. You typically need at least 10x, and 30x+ is recommended.

Also, your read N50 is 70 kb, which seems too good to be true for a bacterium; something might be wrong with the input data formatting.
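
The coverage estimate in that reply is just total read bases divided by genome size. A minimal sketch, assuming the `Total sequence: 858801 bp` figure from the log above and the 1.6 Mb genome size mentioned in the reply; the function name is hypothetical and not part of Flye:

```python
def estimated_coverage(total_read_bp: int, genome_size_bp: int) -> float:
    """Return average depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bp / genome_size_bp


# Values taken from the log above and the genome size stated in the reply.
total_read_bp = 858_801       # "Total sequence: 858801 bp"
genome_size_bp = 1_600_000    # ~1.6 Mb bacterial genome

cov = estimated_coverage(total_read_bp, genome_size_bp)
print(f"coverage: {cov:.2f}x")  # coverage: 0.54x

# Read bases needed to reach the 10x minimum and the 30x recommendation.
for target in (10, 30):
    needed_mb = target * genome_size_bp / 1e6
    print(f"{target}x needs about {needed_mb:.0f} Mb of reads")
```

For this genome size that works out to roughly 16 Mb of reads for 10x and 48 Mb for 30x, against the ~0.86 Mb actually present.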

@Scott-Godwin

I'm also encountering this error. I'm running Flye as a plugin in Geneious Prime. My data consists of Nanopore reads generated from a cDNA library produced from RNA extracted from a cell culture infected with a virus. I'm trying to assemble the viral genome. I've filtered my reads by mapping against the host transcriptome, but this process is imperfect. I think that of the ~100,000 unmapped reads I have left, about 90% are viral. The virus has a segmented genome consisting of eight segments, with a total size of about 15 Kb. I've tried setting the genome size to various values including 15k, 100k and 2.4g (the approximate size of the host genome), but I keep getting the same error message.

ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct

Failed to run: C:\WINDOWS\System32\bash.exe -c '/mnt/c/Users/sgodwin/AppData/Local/Geneious/plugins/Flye/resources/Windows/bin/flye' --nano-corr input_0_Unpaired.fastq --threads 24 --genome-size 15k --meta --iterations 1 --out-dir out >stdout.txt 2>stderr.txt, exit code: 1

Flye reported the following errors:
[2022-09-30 17:43:19] INFO: Starting Flye 2.7-b1585
[2022-09-30 17:43:19] INFO: >>>STAGE: configure
[2022-09-30 17:43:19] INFO: Configuring run
[2022-09-30 17:43:19] INFO: Total read length: 4464863
[2022-09-30 17:43:19] INFO: Input genome size: 15000
[2022-09-30 17:43:19] INFO: Estimated coverage: 297
[2022-09-30 17:43:19] INFO: Reads N50/N90: 699 / 191
[2022-09-30 17:43:19] INFO: Minimum overlap set to 1000
[2022-09-30 17:43:19] INFO: Selected k-mer size: 17
[2022-09-30 17:43:19] INFO: >>>STAGE: assembly
[2022-09-30 17:43:19] INFO: Assembling disjointigs
[2022-09-30 17:43:19] INFO: Reading sequences
[2022-09-30 17:43:20] INFO: Generating solid k-mer index
[2022-09-30 17:43:31] INFO: Counting k-mers (1/2): 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-09-30 17:43:31] INFO: Counting k-mers (2/2): 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

[2022-09-30 17:43:31] INFO: Filling index table (1/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-09-30 17:43:31] INFO: Filling index table (2/2) 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-09-30 17:43:32] INFO: Extending reads
[2022-09-30 17:43:51] INFO: Overlap-based coverage: 66
[2022-09-30 17:43:51] INFO: Median overlap divergence: 0.0697123
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-09-30 17:43:52] INFO: Assembled 0 disjointigs
[2022-09-30 17:43:52] INFO: Generating sequence
[2022-09-30 17:43:52] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2022-09-30 17:43:52] ERROR: Pipeline aborted

@mikolmogorov
Owner

@Scott-Godwin you are using an outdated version of Flye. The latest release (2.9+) was optimized for viral assembly and should work better for you.

@lucyintheskyzzz

I stopped using Flye because it did not work on all my virus fastq files. What commands are people using for viral assembly now? I want to try it again. I remember my error was with the genome size.

thanks!

@Scott-Godwin

@fenderglass Can I run Flye 2.9 from a bash terminal on a Windows machine? I'm a wet lab guy and a total beginner when it comes to all things bioinformatics.

@tolot27

tolot27 commented Oct 5, 2022

@Scott-Godwin No, you can't. But you can install WSL (Windows Subsystem for Linux) and a Linux distribution like Ubuntu.

@lucyintheskyzzz

Hi, I installed the new version of Flye and I'm still getting "Pipeline aborted".
Thanks!

Also, do you know why Canu can assemble contigs with this fastq file but Flye cannot? I am trying to understand the theory behind different long-read de novo assemblers and why some can assemble and some cannot, even though I am using the same fastq file.

Thanks!

flye --nano-raw barcode01.fastq --out-dir barcode01.flye --meta --threads 20
[2022-11-19 15:57:26] INFO: Starting Flye 2.9.1-b1780
[2022-11-19 15:57:26] INFO: >>>STAGE: configure
[2022-11-19 15:57:26] INFO: Configuring run
[2022-11-19 15:57:26] INFO: Total read length: 2427265
[2022-11-19 15:57:26] INFO: Reads N50/N90: 760 / 486
[2022-11-19 15:57:26] INFO: Minimum overlap set to 1000
[2022-11-19 15:57:26] INFO: >>>STAGE: assembly
[2022-11-19 15:57:26] INFO: Assembling disjointigs
[2022-11-19 15:57:26] INFO: Reading sequences
[2022-11-19 15:59:56] INFO: Counting k-mers:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-11-19 16:00:54] INFO: Filling index table (1/2)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-11-19 16:00:54] INFO: Filling index table (2/2)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-11-19 16:00:55] INFO: Extending reads
[2022-11-19 16:00:56] INFO: Overlap-based coverage: 59
[2022-11-19 16:00:56] INFO: Median overlap divergence: 0.191868
0% 100%
[2022-11-19 16:00:56] INFO: Assembled 0 disjointigs
[2022-11-19 16:00:56] INFO: Generating sequence
[2022-11-19 16:00:56] INFO: Filtering contained disjointigs
[2022-11-19 16:00:57] INFO: Contained seqs: 0
[2022-11-19 16:00:57] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2022-11-19 16:00:57] ERROR: Pipeline aborted
(/lustre/project/taw/share/conda-envs/flye) [kvigil@cypress2 Fastq_Concat]$ flye --nano-raw barcode01.fastq --out-dir barcode01.flye --meta --threads 20
[2022-11-19 16:03:02] INFO: Starting Flye 2.9.1-b1780
[2022-11-19 16:03:02] INFO: >>>STAGE: configure
[2022-11-19 16:03:02] INFO: Configuring run
[2022-11-19 16:03:02] INFO: Total read length: 2427265
[2022-11-19 16:03:02] INFO: Reads N50/N90: 760 / 486
[2022-11-19 16:03:02] INFO: Minimum overlap set to 1000
[2022-11-19 16:03:02] INFO: >>>STAGE: assembly
[2022-11-19 16:03:02] INFO: Assembling disjointigs
[2022-11-19 16:03:02] INFO: Reading sequences
[2022-11-19 16:05:35] INFO: Counting k-mers:
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-11-19 16:06:36] INFO: Filling index table (1/2)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-11-19 16:06:36] INFO: Filling index table (2/2)
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[2022-11-19 16:06:37] INFO: Extending reads
[2022-11-19 16:06:38] INFO: Overlap-based coverage: 59
[2022-11-19 16:06:38] INFO: Median overlap divergence: 0.191868
0% 100%
[2022-11-19 16:06:38] INFO: Assembled 0 disjointigs
[2022-11-19 16:06:38] INFO: Generating sequence
[2022-11-19 16:06:39] INFO: Filtering contained disjointigs
[2022-11-19 16:06:39] INFO: Contained seqs: 0
[2022-11-19 16:06:39] ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2022-11-19 16:06:39] ERROR: Pipeline aborted

@lucyintheskyzzz

Looks like my N50 is <1kb, so Flye can't assemble anything where the N50 is <1kb? What does N50 mean?
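
For reference, read N50 is the length L such that reads of length L or longer contain at least half of all sequenced bases. With `Reads N50/N90: 760 / 486` and `Minimum overlap set to 1000` in the log above, fewer than half of the bases sit in reads long enough to form a 1000 bp overlap at all. A minimal sketch of the calculation; the function name and the toy lengths are made up for illustration:

```python
def read_n50(lengths):
    """Return N50: the length L such that reads of length >= L
    together contain at least half of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest read down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


# Toy example (made-up lengths, not taken from the log above).
lengths = [200, 400, 600, 800, 1000]
print(read_n50(lengths))  # 800: the 1000 and 800 bp reads hold 1800 of 3000 bases
```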

@ChristopherRichie

ChristopherRichie commented Nov 23, 2022 via email

@lucyintheskyzzz

@ChristopherRichie Thank you! I figured out that metaFlye is based on a de Bruijn graph and Canu is an overlap graph (OLC) based method.

@jolespin

I've had this issue when using --nano-hq (my guppy version was 6.4.6+ae70e8f). When I changed the input to --nano-raw it ran to completion.

@PavithraV0223

Hello, I'm working with Nanopore data from alpacas. I have tried all the different parameters, but each run gives the same error, and I'm unsure what the problem is. I have been using the adapter- and barcode-trimmed fastq file as input to --nano-raw. I have tried all the troubleshooting steps mentioned above in the discussion but end up with the same error.
I have provided my log file for your reference. I have tried both the meta and the normal mode. Your help would be much appreciated.

[2023-04-27 12:58:27] root: INFO: Starting Flye 2.9.2-b1786
[2023-04-27 12:58:27] root: DEBUG: Cmd: /home/pavi/miniconda3/bin/flye --nano-raw /home/pavi/flye/fitered3_MinIONadapt.fastq --out-dir ./flye_output
[2023-04-27 12:58:27] root: DEBUG: Python version: 3.7.16 (default, Jan 17 2023, 22:20:44)
[GCC 11.2.0]
[2023-04-27 12:58:27] root: INFO: >>>STAGE: configure
[2023-04-27 12:58:27] root: INFO: Configuring run
[2023-04-27 12:58:28] root: INFO: Total read length: 252562133
[2023-04-27 12:58:28] root: INFO: Reads N50/N90: 1137 / 994
[2023-04-27 12:58:28] root: INFO: Minimum overlap set to 1000
[2023-04-27 12:58:28] root: INFO: >>>STAGE: assembly
[2023-04-27 12:58:28] root: INFO: Assembling disjointigs
[2023-04-27 12:58:28] root: DEBUG: -----Begin assembly log------
[2023-04-27 12:58:28] root: DEBUG: Running: flye-modules assemble --reads /home/pavi/flye/fitered3_MinIONadapt.fastq --out-asm /home/pavi/flye/flye_output/00-assembly/draft_assembly.fasta --config /home/pavi/miniconda3/lib/python3.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg --log /home/pavi/flye/flye_output/flye.log --threads 1 --min-ovlp 1000
[2023-04-27 12:58:28] DEBUG: Build date: Mar 27 2023 14:17:04
[2023-04-27 12:58:28] DEBUG: Total RAM: 22 Gb
[2023-04-27 12:58:28] DEBUG: Available RAM: 19 Gb
[2023-04-27 12:58:28] DEBUG: Total CPUs: 7
[2023-04-27 12:58:28] DEBUG: Loading /home/pavi/miniconda3/lib/python3.7/site-packages/flye/config/bin_cfg/asm_raw_reads.cfg
[2023-04-27 12:58:28] DEBUG: Loading /home/pavi/miniconda3/lib/python3.7/site-packages/flye/config/bin_cfg/asm_defaults.cfg
[2023-04-27 12:58:28] DEBUG: big_genome_threshold=29000000
[2023-04-27 12:58:28] DEBUG: meta_read_filter_kmer_freq=100
[2023-04-27 12:58:28] DEBUG: chain_large_gap_penalty=2
[2023-04-27 12:58:28] DEBUG: chain_small_gap_penalty=0.5
[2023-04-27 12:58:28] DEBUG: chain_gap_jump_threshold=100
[2023-04-27 12:58:28] DEBUG: max_coverage_drop_rate=5
[2023-04-27 12:58:28] DEBUG: max_extensions_drop_rate=5
[2023-04-27 12:58:28] DEBUG: chimera_window=100
[2023-04-27 12:58:28] DEBUG: chimera_overhang=1000
[2023-04-27 12:58:28] DEBUG: min_reads_in_disjointig=4
[2023-04-27 12:58:28] DEBUG: max_inner_reads=10
[2023-04-27 12:58:28] DEBUG: max_inner_fraction=0.25
[2023-04-27 12:58:28] DEBUG: max_separation=500
[2023-04-27 12:58:28] DEBUG: unique_edge_length=50000
[2023-04-27 12:58:28] DEBUG: min_repeat_res_support=0.51
[2023-04-27 12:58:28] DEBUG: out_paths_ratio=5
[2023-04-27 12:58:28] DEBUG: graph_cov_drop_rate=5
[2023-04-27 12:58:28] DEBUG: coverage_estimate_window=100
[2023-04-27 12:58:28] DEBUG: max_bubble_length=50000
[2023-04-27 12:58:28] DEBUG: loop_coverage_rate=1.5
[2023-04-27 12:58:28] DEBUG: repeat_edge_cov_mult=1.75
[2023-04-27 12:58:28] DEBUG: weak_detach_rate=5
[2023-04-27 12:58:28] DEBUG: tip_coverage_rate=2
[2023-04-27 12:58:28] DEBUG: tip_length_rate=2
[2023-04-27 12:58:28] DEBUG: output_gfa_before_rr=0
[2023-04-27 12:58:28] DEBUG: remove_alt_edges=0
[2023-04-27 12:58:28] DEBUG: low_cutoff_warning=1
[2023-04-27 12:58:28] DEBUG: kmer_size=17
[2023-04-27 12:58:28] DEBUG: use_minimizers=0
[2023-04-27 12:58:28] DEBUG: reads_base_alignment=0
[2023-04-27 12:58:28] DEBUG: meta_read_top_kmer_rate=0.40
[2023-04-27 12:58:28] DEBUG: maximum_jump=1500
[2023-04-27 12:58:28] DEBUG: maximum_overhang=1500
[2023-04-27 12:58:28] DEBUG: repeat_kmer_rate=100
[2023-04-27 12:58:28] DEBUG: assemble_ovlp_divergence=0.10
[2023-04-27 12:58:28] DEBUG: assemble_divergence_relative=1
[2023-04-27 12:58:28] DEBUG: repeat_graph_ovlp_divergence=0.08
[2023-04-27 12:58:28] DEBUG: read_align_ovlp_divergence=0.25
[2023-04-27 12:58:28] DEBUG: hpc_scoring_on=0
[2023-04-27 12:58:28] DEBUG: add_unassembled_reads=0
[2023-04-27 12:58:28] DEBUG: extend_contigs_with_repeats=0
[2023-04-27 12:58:28] DEBUG: min_read_cov_cutoff=3
[2023-04-27 12:58:28] DEBUG: short_tip_length=20000
[2023-04-27 12:58:28] DEBUG: long_tip_length=100000
[2023-04-27 12:58:28] DEBUG: Running with k-mer size: 17
[2023-04-27 12:58:28] DEBUG: Running with minimum overlap 1000
[2023-04-27 12:58:28] DEBUG: Metagenome mode: N
[2023-04-27 12:58:28] DEBUG: Short mode: N
[2023-04-27 12:58:28] INFO: Reading sequences
[2023-04-27 12:58:29] DEBUG: Building positional index
[2023-04-27 12:58:29] DEBUG: Total sequence: 224735072 bp
[2023-04-27 12:58:31] INFO: Counting k-mers:
[2023-04-27 12:59:01] DEBUG: Updating k-mer histogram
[2023-04-27 12:59:39] DEBUG: Hash size: 1033102
[2023-04-27 12:59:39] DEBUG: Total k-mers 40609435
[2023-04-27 12:59:39] INFO: Filling index table (1/2)
[2023-04-27 13:00:49] DEBUG: Mean k-mer frequency: 340.156
[2023-04-27 13:00:49] DEBUG: Repetitive k-mer frequency: 34015
[2023-04-27 13:00:49] DEBUG: Filtered 28293692 repetitive k-mers (0.319157)
[2023-04-27 13:00:49] INFO: Filling index table (2/2)
[2023-04-27 13:01:59] DEBUG: Sorting k-mer index
[2023-04-27 13:02:00] DEBUG: Selected k-mers: 354076
[2023-04-27 13:02:00] DEBUG: Index size: 60427371
[2023-04-27 13:02:00] DEBUG: Mean k-mer index frequency: 170.662
[2023-04-27 13:02:00] DEBUG: Peak RAM usage: 8 Gb
[2023-04-27 13:02:00] DEBUG: Estimating k-mer identity bias
[2023-04-27 13:04:53] DEBUG: Initial divergence estimate : 0.234128
[2023-04-27 13:04:53] DEBUG: Relative threshold: Y
[2023-04-27 13:04:53] DEBUG: Max divergence threshold set to 0.334128
[2023-04-27 13:04:53] INFO: Extending reads
[2023-04-27 13:04:53] DEBUG: Estimating overlap coverage
[2023-04-27 13:07:48] INFO: Overlap-based coverage: 205
[2023-04-27 13:07:48] INFO: Median overlap divergence: 0.234818
[2023-04-27 13:07:48] DEBUG: Sequence divergence distribution:

(histogram of overlap divergence, x-axis from 0% to 45%; column alignment was lost in the post)

Q25 = 0.21, Q50 = 0.23, Q75 = 0.26
[2023-04-27 22:07:06] INFO: Assembled 0 disjointigs
[2023-04-27 22:07:06] INFO: Generating sequence
[2023-04-27 22:07:06] DEBUG: Building positional index
[2023-04-27 22:07:06] DEBUG: Mean k-mer frequency: 0
[2023-04-27 22:07:06] DEBUG: Repetitive k-mer frequency: 0
[2023-04-27 22:07:06] DEBUG: Filtered 0 repetitive k-mers (-nan)
[2023-04-27 22:07:06] DEBUG: Sorting k-mer index
[2023-04-27 22:07:06] DEBUG: Selected k-mers: 0
[2023-04-27 22:07:06] DEBUG: K-mer index size: 0
[2023-04-27 22:07:06] DEBUG: Mean k-mer frequency: -nan
[2023-04-27 22:07:06] DEBUG: Minimizer rate: -nan
[2023-04-27 22:07:06] INFO: Filtering contained disjointigs
[2023-04-27 22:07:06] DEBUG: Computing transitive closure for overlaps
[2023-04-27 22:07:06] DEBUG: Found 0 overlaps
[2023-04-27 22:07:06] DEBUG: Left 0 overlaps after filtering
[2023-04-27 22:07:06] INFO: Contained seqs: 0
[2023-04-27 22:07:06] DEBUG: Writing FASTA
[2023-04-27 22:07:06] DEBUG: Peak RAM usage: 8 Gb
-----------End assembly log------------
[2023-04-27 22:07:06] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2023-04-27 22:07:06] root: ERROR: Pipeline aborted
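
The `Relative threshold: Y` and `Max divergence threshold` lines in the log above are consistent with a simple rule: the configured `assemble_ovlp_divergence` (0.10 here) is added to the initial divergence estimate. This is an inference from the logged numbers, not a statement about Flye's source code, and the function below is a hypothetical sketch of that arithmetic:

```python
def max_divergence_threshold(initial_estimate: float,
                             ovlp_divergence: float,
                             relative: bool = True) -> float:
    """Assumed rule: with a relative threshold, the configured overlap
    divergence is added on top of the initial estimate; otherwise the
    configured value is used as an absolute cutoff."""
    if relative:
        return initial_estimate + ovlp_divergence
    return ovlp_divergence


# Numbers copied from the log above.
initial = 0.234128          # "Initial divergence estimate : 0.234128"
configured = 0.10           # "assemble_ovlp_divergence=0.10"

threshold = max_divergence_threshold(initial, configured)
print(round(threshold, 6))  # 0.334128, matching "Max divergence threshold set to 0.334128"
```

Under this reading, the threshold tracks the data: a read set whose overlaps diverge by about 23% still gets a cutoff of about 33%.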

@mikolmogorov
Owner

@PavithraV0223 could you tell us more about your sample? And please attach a log from a --meta run. In general, the read length seems to be very short (1 kb N50). Is it some kind of amplicon sequencing?

@emmannaemeka

Hello, I am having similar issues. I have tried the --meta mode and the --asm-coverage 50 without success.

[2023-05-22 09:29:27] root: INFO: Starting Flye 2.9-b1778
[2023-05-22 09:29:27] root: DEBUG: Cmd: /Users/pamluka/Desktop/programs_bioinformatics/Flye/bin/flye --meta --nano-raw /Users/pamluka/Desktop/UNGSM/sample_6/Sample-06-X-2022_fastq.fastq.gz -o /Users/pamluka/Desktop/UNGSM
[2023-05-22 09:29:27] root: DEBUG: Python version: 3.6.15 | packaged by conda-forge | (default, Dec 3 2021, 18:49:43)
[GCC Clang 11.1.0]
[2023-05-22 09:29:27] root: INFO: >>>STAGE: configure
[2023-05-22 09:29:27] root: INFO: Configuring run
[2023-05-22 09:29:37] root: INFO: Total read length: 229301908
[2023-05-22 09:29:37] root: INFO: Reads N50/N90: 353 / 282
[2023-05-22 09:29:37] root: INFO: Minimum overlap set to 1000
[2023-05-22 09:29:37] root: INFO: >>>STAGE: assembly
[2023-05-22 09:29:37] root: INFO: Assembling disjointigs
[2023-05-22 09:29:37] root: DEBUG: -----Begin assembly log------
[2023-05-22 09:29:37] root: DEBUG: Running: flye-modules assemble --reads /Users/pamluka/Desktop/UNGSM/sample_6/Sample-06-X-2022_fastq.fastq.gz --out-asm /Users/pamluka/Desktop/UNGSM/00-assembly/draft_assembly.fasta --config /Users/pamluka/Desktop/programs_bioinformatics/Flye/flye/config/bin_cfg/asm_raw_reads.cfg --log /Users/pamluka/Desktop/UNGSM/flye.log --threads 1 --meta --min-ovlp 1000
[2023-05-22 09:29:37] DEBUG: Build date: Jun 7 2022 09:22:15
[2023-05-22 09:29:37] DEBUG: Total RAM: 16 Gb
[2023-05-22 09:29:37] DEBUG: Available RAM: 0 Gb
[2023-05-22 09:29:37] DEBUG: Total CPUs: 8
[2023-05-22 09:29:37] DEBUG: Loading /Users/pamluka/Desktop/programs_bioinformatics/Flye/flye/config/bin_cfg/asm_raw_reads.cfg
[2023-05-22 09:29:37] DEBUG: Loading /Users/pamluka/Desktop/programs_bioinformatics/Flye/flye/config/bin_cfg/asm_defaults.cfg
[2023-05-22 09:29:37] DEBUG: big_genome_threshold=29000000
[2023-05-22 09:29:37] DEBUG: meta_read_filter_kmer_freq=100
[2023-05-22 09:29:37] DEBUG: chain_large_gap_penalty=2
[2023-05-22 09:29:37] DEBUG: chain_small_gap_penalty=0.5
[2023-05-22 09:29:37] DEBUG: chain_gap_jump_threshold=100
[2023-05-22 09:29:37] DEBUG: max_coverage_drop_rate=5
[2023-05-22 09:29:37] DEBUG: max_extensions_drop_rate=5
[2023-05-22 09:29:37] DEBUG: chimera_window=100
[2023-05-22 09:29:37] DEBUG: chimera_overhang=1000
[2023-05-22 09:29:37] DEBUG: min_reads_in_disjointig=4
[2023-05-22 09:29:37] DEBUG: max_inner_reads=10
[2023-05-22 09:29:37] DEBUG: max_inner_fraction=0.25
[2023-05-22 09:29:37] DEBUG: max_separation=500
[2023-05-22 09:29:37] DEBUG: unique_edge_length=50000
[2023-05-22 09:29:37] DEBUG: min_repeat_res_support=0.51
[2023-05-22 09:29:37] DEBUG: out_paths_ratio=5
[2023-05-22 09:29:37] DEBUG: graph_cov_drop_rate=5
[2023-05-22 09:29:37] DEBUG: coverage_estimate_window=100
[2023-05-22 09:29:37] DEBUG: max_bubble_length=50000
[2023-05-22 09:29:37] DEBUG: loop_coverage_rate=1.5
[2023-05-22 09:29:37] DEBUG: repeat_edge_cov_mult=1.75
[2023-05-22 09:29:37] DEBUG: weak_detach_rate=5
[2023-05-22 09:29:37] DEBUG: tip_coverage_rate=2
[2023-05-22 09:29:37] DEBUG: tip_length_rate=2
[2023-05-22 09:29:37] DEBUG: output_gfa_before_rr=0
[2023-05-22 09:29:37] DEBUG: remove_alt_edges=0
[2023-05-22 09:29:37] DEBUG: low_cutoff_warning=1
[2023-05-22 09:29:37] DEBUG: kmer_size=17
[2023-05-22 09:29:37] DEBUG: use_minimizers=0
[2023-05-22 09:29:37] DEBUG: reads_base_alignment=0
[2023-05-22 09:29:37] DEBUG: meta_read_top_kmer_rate=0.40
[2023-05-22 09:29:37] DEBUG: maximum_jump=1500
[2023-05-22 09:29:37] DEBUG: maximum_overhang=1500
[2023-05-22 09:29:37] DEBUG: repeat_kmer_rate=100
[2023-05-22 09:29:37] DEBUG: assemble_ovlp_divergence=0.10
[2023-05-22 09:29:37] DEBUG: assemble_divergence_relative=1
[2023-05-22 09:29:37] DEBUG: repeat_graph_ovlp_divergence=0.08
[2023-05-22 09:29:37] DEBUG: read_align_ovlp_divergence=0.25
[2023-05-22 09:29:37] DEBUG: hpc_scoring_on=0
[2023-05-22 09:29:37] DEBUG: add_unassembled_reads=0
[2023-05-22 09:29:37] DEBUG: extend_contigs_with_repeats=0
[2023-05-22 09:29:37] DEBUG: min_read_cov_cutoff=3
[2023-05-22 09:29:37] DEBUG: short_tip_length=20000
[2023-05-22 09:29:37] DEBUG: long_tip_length=100000
[2023-05-22 09:29:37] DEBUG: Running with k-mer size: 17
[2023-05-22 09:29:37] DEBUG: Running with minimum overlap 1000
[2023-05-22 09:29:37] DEBUG: Metagenome mode: Y
[2023-05-22 09:29:37] DEBUG: Short mode: N
[2023-05-22 09:29:37] INFO: Reading sequences
[2023-05-22 09:29:42] DEBUG: Building positional index
[2023-05-22 09:29:42] DEBUG: Total sequence: 3440345 bp
[2023-05-22 09:29:46] INFO: Counting k-mers:
[2023-05-22 09:29:47] DEBUG: Updating k-mer histogram
[2023-05-22 09:30:31] DEBUG: Hash size: 10893
[2023-05-22 09:30:31] DEBUG: Total k-mers 1848766
[2023-05-22 09:30:31] INFO: Filling index table (1/2)
[2023-05-22 09:30:32] DEBUG: Mean k-mer frequency: 7.46855
[2023-05-22 09:30:32] DEBUG: Repetitive k-mer frequency: 746
[2023-05-22 09:30:32] DEBUG: Filtered 5983 repetitive k-mers (0.00455754)
[2023-05-22 09:30:32] INFO: Filling index table (2/2)
[2023-05-22 09:30:34] DEBUG: Sorting k-mer index
[2023-05-22 09:30:34] DEBUG: Selected k-mers: 220513
[2023-05-22 09:30:34] DEBUG: Index size: 1350695
[2023-05-22 09:30:34] DEBUG: Mean k-mer index frequency: 6.12524
[2023-05-22 09:30:34] DEBUG: Peak RAM usage: 8 Gb
[2023-05-22 09:30:34] DEBUG: Estimating k-mer identity bias
[2023-05-22 09:30:35] DEBUG: Initial divergence estimate : 0.0703537
[2023-05-22 09:30:35] DEBUG: Relative threshold: Y
[2023-05-22 09:30:35] DEBUG: Max divergence threshold set to 0.170354
[2023-05-22 09:30:35] INFO: Extending reads
[2023-05-22 09:30:35] DEBUG: Estimating overlap coverage
[2023-05-22 09:30:37] INFO: Overlap-based coverage: 1
[2023-05-22 09:30:37] INFO: Median overlap divergence: 0.0717406
[2023-05-22 09:30:37] DEBUG: Sequence divergence distribution:

|              *                   |                                                                 
|              *                   |                                                                 
|              *                   |                                                                 
|              *                   |                                                                 
|              *                   |                                                                 
|              *                   |                                                                 
|             **                   |                                                                 
|             **                   |                                                                 
|            ***                   |                                                                 
|           ****                   |                                                                 
|           ****                   |                                                                 
|           *****                  |                                                                 
|           *****                  |                                                                 
|           ******                 |                                                                 
|           ****** *               |                                                                 
|           ********               |                                                                 
|          *********               |                                                                 
|          *********  **           |                                                                 
|       *  ********* ***           |                 *                                               
|      ** ************** * *       | *           *   *  *   *                                        
----------------------------------------------------------------------------------------------------
0%        5%        10%       15%       20%       25%       30%       35%       40%       45%       

Q25 = 0.064, Q50 = 0.072, Q75 = 0.083

[2023-05-22 09:30:42] INFO: Assembled 0 disjointigs
[2023-05-22 09:30:42] INFO: Generating sequence
[2023-05-22 09:30:42] DEBUG: Building positional index
[2023-05-22 09:30:42] DEBUG: Mean k-mer frequency: 0
[2023-05-22 09:30:42] DEBUG: Repetitive k-mer frequency: 0
[2023-05-22 09:30:42] DEBUG: Filtered 0 repetitive k-mers (nan)
[2023-05-22 09:30:42] DEBUG: Sorting k-mer index
[2023-05-22 09:30:42] DEBUG: Selected k-mers: 0
[2023-05-22 09:30:42] DEBUG: K-mer index size: 0
[2023-05-22 09:30:42] DEBUG: Mean k-mer frequency: nan
[2023-05-22 09:30:42] DEBUG: Minimizer rate: nan
[2023-05-22 09:30:42] INFO: Filtering contained disjointigs
[2023-05-22 09:30:42] DEBUG: Computing transitive closure for overlaps
[2023-05-22 09:30:42] DEBUG: Found 0 overlaps
[2023-05-22 09:30:42] DEBUG: Left 0 overlaps after filtering
[2023-05-22 09:30:42] INFO: Contained seqs: 0
[2023-05-22 09:30:42] DEBUG: Writing FASTA
[2023-05-22 09:30:42] DEBUG: Peak RAM usage: 8 Gb
-----------End assembly log------------
[2023-05-22 09:30:42] root: ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct
[2023-05-22 09:30:42] root: ERROR: Pipeline aborted

@mikolmogorov
Owner

mikolmogorov commented May 29, 2023

@emmannaemeka it seems like you're assembling very short reads. Flye really needs reads of a few kb to work.
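
One way to check this before running Flye is to measure how much of the input is at least as long as the minimum overlap (1000 bp in the log above). The log reports `Total read length: 229301908` but only `Total sequence: 3440345 bp` once reads are loaded, which would be consistent with most reads being too short to use, though that interpretation is an assumption. A minimal sketch with hypothetical helper names, assuming a standard four-line-per-record FASTQ:

```python
import gzip


def read_lengths(fastq_path):
    """Yield the length of each read in a 4-line-per-record FASTQ file
    (plain text or gzipped)."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence is the second line of each record
                yield len(line.strip())


def fraction_usable(lengths, min_overlap=1000):
    """Return (fraction of reads, fraction of bases) in reads that are
    at least min_overlap long."""
    lengths = list(lengths)
    total_bases = sum(lengths)
    if total_bases == 0:
        return 0.0, 0.0
    long_reads = [n for n in lengths if n >= min_overlap]
    return len(long_reads) / len(lengths), sum(long_reads) / total_bases


# Toy lengths for illustration; for real data use
# fraction_usable(read_lengths("reads.fastq.gz")) with your own file path.
reads_frac, bases_frac = fraction_usable([300, 350, 400, 1200, 2500])
print(f"{reads_frac:.0%} of reads, {bases_frac:.0%} of bases are >= 1000 bp")
```

If only a small fraction of bases survives this check, the effective coverage is far lower than the total read length suggests.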

granek added a commit to granek/wf-bacterial-genomes that referenced this issue Aug 4, 2023
Trying to fix Error "No disjointigs were assembled", based on mikolmogorov/Flye#128
granek added a commit to granek/wf-bacterial-genomes that referenced this issue Aug 4, 2023
trying --meta, the other suggestion from mikolmogorov/Flye#128
--asm-coverage requires genome size estimate
@miniluphy

I encountered a similar issue.
Using the latest version (2.9.3) with a PacBio HiFi file as input, I received the following error message:

INFO: Overlap-based coverage: 0
ERROR: No disjointigs were assembled - please check if the read type and genome size parameters are correct.

I have attached the log file.

Upon checking the fastq.gz file via pbclip, the result shows:
Good: 152693 chopped: 5824 bad: 1897.

The --meta result is similar. How should I resolve this issue?
Thank you
flye.log
flye-meta.log

@mikolmogorov
Owner

@miniluphy your read error rate is ~13%, so these are not HiFi reads. If the data is PacBio, use --pacbio-raw instead.
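
The figures quoted in the replies throughout this thread (read N50, overlap-based coverage, median overlap divergence) all appear on INFO lines of the Flye log. A minimal sketch that pulls them out of a log's text; the regular expressions are based only on log lines quoted in this thread, the function name is hypothetical, and the sample text is copied from an earlier post because the attached logs are not reproduced here:

```python
import re

# Patterns for INFO lines as they appear in the logs quoted in this thread.
PATTERNS = {
    "total_read_length": re.compile(r"Total read length: (\d+)"),
    "read_n50": re.compile(r"Reads N50/N90: (\d+) / \d+"),
    "min_overlap": re.compile(r"Minimum overlap set to (\d+)"),
    "overlap_coverage": re.compile(r"Overlap-based coverage: (\d+)"),
    "median_divergence": re.compile(r"Median overlap divergence: ([0-9.]+)"),
}


def summarize_flye_log(text):
    """Return the first value found for each metric in the log text."""
    summary = {}
    for line in text.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match and key not in summary:
                summary[key] = float(match.group(1))
    return summary


sample = """\
[2023-05-22 09:29:37] root: INFO: Total read length: 229301908
[2023-05-22 09:29:37] root: INFO: Reads N50/N90: 353 / 282
[2023-05-22 09:29:37] root: INFO: Minimum overlap set to 1000
[2023-05-22 09:30:37] INFO: Overlap-based coverage: 1
[2023-05-22 09:30:37] INFO: Median overlap divergence: 0.0717406
"""

summary = summarize_flye_log(sample)
for key, value in summary.items():
    print(f"{key}: {value:g}")

# Simple sanity check taken from the discussion: reads shorter than the
# minimum overlap cannot contribute overlaps.
if summary["read_n50"] < summary["min_overlap"]:
    print("read N50 is below the minimum overlap")
```

Comparing the median overlap divergence against what is expected for the chosen read type is presumably how a mismatch such as non-HiFi reads passed to --pacbio-hifi shows up.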
