CANU is failing - bogart issue #1323

Closed
anja999 opened this issue Apr 11, 2019 · 8 comments

anja999 commented Apr 11, 2019

Dear all,
I am trying to run Canu on my data and it keeps failing before the assembly step (bogart failed), so all I get as output are the corrected and trimmed reads. I have direct RNA data and I am taking a metagenomic approach to see what is in a sample of plant material infected with viruses/viroids. The transcripts and genomes should not be longer than 20 kb.
The command I was running was:
/canu -d run1 -p run1 genomeSize=20k -nanopore-raw /DATA/run1.fa overlapper=mhap utgReAlign=true corOutCoverage=10000 corMhapSensitivity=high minReadLength=100 minOverlapLength=100 corMinCoverage=0 minMemory=100 maxMemory=200 maxThreads=24

I have also tried some of the other options that I found by reading through the various "bogart failed" issues, but without success. Also, the assembly takes 10 days (we have a server with 36 threads and 250 GB of memory); is this normal?

Many thanks for any help!

skoren (Member) commented Apr 11, 2019

Can you provide more details on how it is failing and post the unitigger.err log? I would guess it is similar to #1281: because the genome size is so low, bogart tries to load lots of overlaps and doesn't have enough memory (note that the metagenomic FAQ parameters explicitly increase bogart's memory to avoid this). If it is the same issue, you can edit unitigger.sh similarly (increase the genome size, increase the -M option) and resume canu.
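
For reference, one way to make that edit and resume is sketched below, assuming the standard Canu run layout (run1/unitigging/4-unitigger/unitigger.sh). The sed patterns and new values are illustrative, -M is bogart's memory cap in GB, and -gs is assumed here to be the flag carrying the genome size, so inspect your own unitigger.sh before editing:

    # Illustrative sketch only; the current values in your unitigger.sh may differ.
    cd run1/unitigging/4-unitigger
    sed -i 's/-M 16/-M 64/' unitigger.sh            # raise bogart's memory cap, in GB (adjust the pattern to whatever value is currently there)
    sed -i 's/-gs 20000/-gs 1000000/' unitigger.sh  # optionally raise the genome size, in bases (flag name assumed)

    # Resume by re-running the original canu command from the parent directory;
    # completed correction/trimming stages are skipped and the run restarts at bogart.
    cd ../../..
    /canu -d run1 -p run1 genomeSize=20k -nanopore-raw /DATA/run1.fa overlapper=mhap utgReAlign=true corOutCoverage=10000 corMhapSensitivity=high minReadLength=100 minOverlapLength=100 corMinCoverage=0 minMemory=100 maxMemory=200 maxThreads=24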

As for the total runtime, it depends on how much data you have; you're also dropping the minimum overlap and read lengths much lower than the defaults, which will add to the runtime.

brianwalenz changed the title from "CANU is failing - bogard issue" to "CANU is failing - bogart issue" on Apr 12, 2019

anja999 (Author) commented Apr 18, 2019

Many thanks. I will try to change genomeSize and minOverlapLength. The thing is that the genome sizes really are around 10 kb; if I increase it, might the assembly be incorrect? The unitigger.err says, as you assumed, that there is not enough memory. Do you have any idea how much memory would be enough? I could use 500 GB.

unitigger.err

==> PARAMETERS.

Resources:
Memory 16 GB
Compute Threads 4 (command line)

Lengths:
Minimum read 0 bases
Minimum overlap 500 bases

Overlap Error Rates:
Graph 0.120 (12.000%)
Max 0.120 (12.000%)

Deviations:
Graph 6.000
Bubble 6.000
Repeat 3.000

Edge Confusion:
Absolute 2100
Percent 200.0000

Unitig Construction:
Minimum intersection 500 bases
Maxiumum placements 2 positions

Debugging Enabled:
(none)

==> LOADING AND FILTERING OVERLAPS.

ReadInfo()-- Using 140318 reads, no minimum read length used.

OverlapCache()-- limited to 16384MB memory (user supplied).

OverlapCache()-- 1MB for read data.
OverlapCache()-- 5MB for best edges.
OverlapCache()-- 13MB for tigs.
OverlapCache()-- 3MB for tigs - read layouts.
OverlapCache()-- 5MB for tigs - error profiles.
OverlapCache()-- 4096MB for tigs - error profile overlaps.
OverlapCache()-- 0MB for other processes.
OverlapCache()-- ---------
OverlapCache()-- 4128MB for data structures (sum of above).
OverlapCache()-- ---------
OverlapCache()-- 2MB for overlap store structure.
OverlapCache()-- 12253MB for overlap data.
OverlapCache()-- ---------
OverlapCache()-- 16384MB allowed.
OverlapCache()--
OverlapCache()-- Retain at least 10012 overlaps/read, based on 5006.05x coverage.
OverlapCache()-- Initial guess at 5722 overlaps/read.
OverlapCache()--
OverlapCache()-- Not enough memory to load the minimum number of overlaps; increase -M.

skoren (Member) commented Apr 18, 2019

The genome size won't make the assembly wrong; it's only used to compute some statistics and to guess at the coverage in your dataset.

You can of course increase the memory; I expect 500 GB will be enough. However, the result won't be very different whether you increase the genome size or the memory; you don't really need 5000 overlaps per read to assemble the amplicon. I would increase the genome size to 1 Mb and see if it runs within the current memory.
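
As a rough back-of-the-envelope check of why this works (the factor of two is inferred from the numbers in the unitigger.err above, not from the bogart source):

    coverage          = total read bases / genomeSize  ≈ 5006x at genomeSize=20k
    min overlaps/read ≈ 2 x coverage                   ≈ 10012 (matches the log)
    at genomeSize=1m:   coverage ≈ 5006 / 50 ≈ 100x,   so min ≈ 200 overlaps/read

so raising the genome size shrinks the number of overlaps bogart insists on keeping by the same factor, and the existing 16 GB limit becomes more than enough.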

anja999 (Author) commented Jun 6, 2019 via email

skoren (Member) commented Jun 6, 2019

Sure, you can upload your full run directory, or just the reads, following the instructions in the FAQ. It shouldn't take days to re-run anything in the bogart step; that's all you need to re-run to test changing the memory/genome size.
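
If you are retrying locally, the bogart step can usually be re-run on its own by invoking the generated script directly. A sketch, assuming the standard layout and that unitigger.sh follows canu's usual job-script convention of taking a job index as its first argument (check the script itself if unsure):

    cd run1/unitigging/4-unitigger
    ./unitigger.sh 1 > unitigger.err 2>&1   # single bogart job; output replaces unitigger.err

Only bogart is re-run this way, which is far faster than restarting the whole pipeline.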

skoren (Member) commented Jun 13, 2019

Did you ever upload any data? I don't see anything on the FTP site.

anja999 (Author) commented Jun 13, 2019 via email

skoren (Member) commented Jun 17, 2019

The docs are going to be more up to date, since there are GitHub issues referring to canu versions that no longer exist. Your second command seems reasonable, though the metagenomic options also increase the bat memory, which you've omitted (obtOverlapper=mhap obtReAlign=raw utgOverlapper=mhap utgReAlign=raw corOutCoverage=10000 corMhapSensitivity=high minReadLength=100 minOverlapLength=100 corMinCoverage=0 'redMemory=32' 'oeaMemory=32' 'batMemory=200'). I was able to run an assembly of your data using the above options, with the genome size set to 20k, without error. I do see a few contigs in the 17-20 kb range.
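
Putting those options together with the original input might look like the sketch below; the read path /DATA/run1.fa is carried over from the first command in this thread, the run directory/prefix names are placeholders (use a fresh -d directory if you want to keep the earlier run), and the memory values are the FAQ's starting points rather than hard requirements:

    canu -d run1_meta -p run1_meta genomeSize=20k -nanopore-raw /DATA/run1.fa \
      obtOverlapper=mhap obtReAlign=raw utgOverlapper=mhap utgReAlign=raw \
      corOutCoverage=10000 corMhapSensitivity=high \
      minReadLength=100 minOverlapLength=100 corMinCoverage=0 \
      redMemory=32 oeaMemory=32 batMemory=200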

However, I am not sure what assembling direct RNA means. Aren't these already full-length transcripts, so is there anything to assemble? Are you trying to see if you can assemble RNA viruses?

skoren reopened this on Jun 17, 2019
skoren closed this as completed on Jun 17, 2019