What is *.unitigs.fasta? #286

Closed
stomk opened this issue Nov 7, 2016 · 10 comments

Comments

@stomk

stomk commented Nov 7, 2016

Hi,

I'm very excited about the Canu assembler, as I'm trying to look into repeat-rich regions in a fish genome.

I found *.unitigs.fasta in the output directory. What is it, and how does it relate to *.contigs.fasta, *.unassembled.fasta and *.bubbles.fasta?

Thank you for your help.

@brianwalenz
Member

'contigs' will span repeats, as long as the repeat is unambiguous.

'unitigs' are derived from contigs. Wherever a contig end intersects the middle of another contig, the contig is split.

'bubbles' are deprecated and will be removed in the next release. Treat them as contigs for now.

'unassembled' contains mostly reads that failed to assemble into a contig. There will be some assembled sequences, but these will be short and nearly the same as the longest read in them.

The relevant section in the docs is http://canu.readthedocs.io/en/latest/quick-start.html#find-the-output, though it is out of date and will probably move when it is updated.
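The unitig description above (a contig is split wherever another contig's end intersects its middle) can be pictured with a toy sketch. This is only an illustration, not Canu's implementation: the function name `split_contig` and the idea of passing explicit breakpoint positions are assumptions made for the example.

```python
def split_contig(sequence, breakpoints):
    """Split a sequence at the given 0-based positions and return the pieces.

    Hypothetical helper for illustration; Canu derives its split points
    internally and does not expose a function like this.
    """
    # Keep only interior positions: a cut at 0 or at the end would
    # produce an empty piece.
    cuts = sorted({p for p in breakpoints if 0 < p < len(sequence)})
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


# One contig intersected at two interior positions yields three pieces.
pieces = split_contig("ACGTACGTAC", [4, 7])
print(pieces)
```

Joining the pieces back together gives the original sequence, so splitting alone does not change the total number of bases.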

@StefanoLonardi

Brian, you say that 'unitigs' are derived from 'contigs', but how come the unitigs in my case total almost 900 Mb, while the contigs total only 510 Mb? My genome is 620 Mb.

@mictadlo

Hi,
I would like to run PBJelly.

  1. Should I combine any of the above files, or should I use only the contigs file as input?
  2. Which reads do you recommend that I use:
  • trimmedReads or
  • correctedReads?

Thank you in advance.

Michal

@skoren
Member

skoren commented Mar 17, 2017

@StefanoLonardi: The unitigs are unfiltered contigs. There is some filtering on the contigs to remove ones composed primarily of a single read. There is no such filter on the unitigs (see this option: http://canu.readthedocs.io/en/latest/faq.html#my-asm-contigs-fasta-is-empty-why).
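To compare the sizes of the output files directly, a small script can count the sequences and total bases in each FASTA file. The sketch below is a minimal example that assumes plain, uncompressed FASTA; the function name `fasta_stats` is made up for the example, and the in-memory records stand in for a real file.

```python
import io


def fasta_stats(handle):
    """Return (number of sequences, total bases) for FASTA text.

    `handle` is any iterable of lines, such as an open file.
    """
    n_seqs = 0
    total_bases = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            n_seqs += 1  # a header line starts a new record
        else:
            total_bases += len(line)  # sequence may wrap over several lines
    return n_seqs, total_bases


# Tiny in-memory example: two records, 6 + 2 bases.
demo = io.StringIO(">tig1\nACGT\nAC\n>tig2\nGG\n")
print(fasta_stats(demo))
```

On real output, the same function can be called on an opened file, for example `with open("asm.contigs.fasta") as fh: fasta_stats(fh)`, and again on the unitigs file, where `asm` is a placeholder for the assembly prefix.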

@mictadlo PBJelly is primarily designed to close gaps in scaffolds, and Canu currently produces only contigs, not scaffolds. PBJelly can join some contigs, but it is unlikely to make much difference. If you have scaffolded the contigs with another technology, then you can run PBJelly. For it, you would use the uncorrected fastq reads (the same ones you input to Canu), not the trimmedReads or the correctedReads.

@mictadlo

What other technology would you recommend to scaffold contigs?

@skoren
Member

skoren commented Mar 19, 2017

There are many options; here's a brief, non-exhaustive list. See our goat publication (http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3802.html) for a few of them: Bionano optical maps and Phase HiC. There is also Dovetail, which provides both HiC and Chicago libraries for scaffolding, and 10X can be used for scaffolding as well.

@mictadlo

Thank you

@mictadlo

P.S. Why would you not use the trimmedReads or the correctedReads for PBJelly?

@mictadlo

mictadlo commented Apr 25, 2017

Hi,
I read your Nature paper and tried to run Kraken on the assembly as described here.

Did you use the unclassified-out or the classified-out output file for your final assembly?

You wrote in your paper: "A total of 183 unplaced contigs and 1 scaffold were flagged as contaminant and removed. An additional two unplaced contigs were flagged as vector by NCBI and removed." How did you do it?

Thank you in advance

Michal

@sekhwal

sekhwal commented May 15, 2019

How can I annotate intergenic variants and variants from horizontally-acquired genes in the DBGWAS visualization, so that we can distinguish different types of variants easily?
