CANU is failing - bogart issue #1323

Closed
anja999 opened this issue Apr 11, 2019 · 8 comments

anja999 commented Apr 11, 2019

Dear all,
I am trying to run Canu on my data and it keeps failing before the assembly step (bogart failed), so all I get as output are the corrected and trimmed reads. I have direct RNA data and I am taking a metagenomic approach to see what is in a sample of plant material infected with viruses/viroids. The transcripts and genomes should not be longer than 20 kb.
The command I was running was:
/canu -d run1 -p run1 genomeSize=20k -nanopore-raw /DATA/run1.fa overlapper=mhap utgReAlign=true corOutCoverage=10000 corMhapSensitivity=high minReadLength=100 minOverlapLength=100 corMinCoverage=0 minMemory=100 maxMemory=200 maxThreads=24

I have also tried some of the other options that I found by reading through the various "bogart failed" issues, but without success. Also, the assembly takes 10 days (we have a server with 36 threads and 250 GB of memory); is this normal?

Many thanks for any help!

skoren (Member) commented Apr 11, 2019

Can you provide more details on how it is failing and post the unitigger.err log? I would guess it is similar to #1281: because the genome size is so low, bogart tries to load lots of overlaps and doesn't have enough memory (note that the metagenomic FAQ parameters explicitly increase bogart's memory to avoid this). If it is the same issue, you can edit unitigger.sh similarly (increase the genome size, increase the -M option) and resume canu.
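
For reference, one way to make that edit and resume is sketched below, assuming the standard Canu run layout (run1/unitigging/4-unitigger/unitigger.sh). The sed patterns and new values are illustrative, -M is bogart's memory cap in GB, and -gs is assumed here to be the flag carrying the genome size, so inspect your own unitigger.sh before editing:

    # Illustrative sketch only; the current values in your unitigger.sh may differ.
    cd run1/unitigging/4-unitigger
    sed -i 's/-M 16/-M 64/' unitigger.sh            # raise bogart's memory cap, in GB (adjust the pattern to whatever value is currently there)
    sed -i 's/-gs 20000/-gs 1000000/' unitigger.sh  # optionally raise the genome size, in bases (flag name assumed)

    # Resume by re-running the original canu command from the parent directory;
    # completed correction/trimming stages are skipped and the run restarts at bogart.
    cd ../../..
    /canu -d run1 -p run1 genomeSize=20k -nanopore-raw /DATA/run1.fa overlapper=mhap utgReAlign=true corOutCoverage=10000 corMhapSensitivity=high minReadLength=100 minOverlapLength=100 corMinCoverage=0 minMemory=100 maxMemory=200 maxThreads=24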

As for the total runtime, it depends on how much data you have; you're also dropping the minimum overlap and read lengths much lower than the defaults, which will add to the runtime.

brianwalenz changed the title from "CANU is failing - bogard issue" to "CANU is failing - bogart issue" on Apr 12, 2019

anja999 (Author) commented Apr 18, 2019

Many thanks. I will try to change genomeSize and minOverlapLength. The thing is that the genome sizes really are around 10 kb; if I increase it, might the assembly be incorrect? The unitigger.err says, as you assumed, that there is not enough memory. Do you have any idea how much memory would be enough? I could use 500 GB.

unitigger.err

==> PARAMETERS.

Resources:
Memory 16 GB
Compute Threads 4 (command line)

Lengths:
Minimum read 0 bases
Minimum overlap 500 bases

Overlap Error Rates:
Graph 0.120 (12.000%)
Max 0.120 (12.000%)

Deviations:
Graph 6.000
Bubble 6.000
Repeat 3.000

Edge Confusion:
Absolute 2100
Percent 200.0000

Unitig Construction:
Minimum intersection 500 bases
Maxiumum placements 2 positions

Debugging Enabled:
(none)

==> LOADING AND FILTERING OVERLAPS.

ReadInfo()-- Using 140318 reads, no minimum read length used.

OverlapCache()-- limited to 16384MB memory (user supplied).

OverlapCache()-- 1MB for read data.
OverlapCache()-- 5MB for best edges.
OverlapCache()-- 13MB for tigs.
OverlapCache()-- 3MB for tigs - read layouts.
OverlapCache()-- 5MB for tigs - error profiles.
OverlapCache()-- 4096MB for tigs - error profile overlaps.
OverlapCache()-- 0MB for other processes.
OverlapCache()-- ---------
OverlapCache()-- 4128MB for data structures (sum of above).
OverlapCache()-- ---------
OverlapCache()-- 2MB for overlap store structure.
OverlapCache()-- 12253MB for overlap data.
OverlapCache()-- ---------
OverlapCache()-- 16384MB allowed.
OverlapCache()--
OverlapCache()-- Retain at least 10012 overlaps/read, based on 5006.05x coverage.
OverlapCache()-- Initial guess at 5722 overlaps/read.
OverlapCache()--
OverlapCache()-- Not enough memory to load the minimum number of overlaps; increase -M.

skoren (Member) commented Apr 18, 2019

The genome size won't make the assembly wrong; it's only used to compute some statistics and to guess at the coverage in your dataset.

You can of course increase the memory; I expect 500 GB will be enough. However, the result won't be very different whether you increase the genome size or the memory; you don't really need 5000 overlaps per read to assemble the amplicon. I would increase the genome size to 1 Mb and see if it runs within the current memory.
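
As a rough back-of-the-envelope check of why this works (the factor of two is inferred from the numbers in the unitigger.err above, not from the bogart source):

    coverage          = total read bases / genomeSize  ≈ 5006x at genomeSize=20k
    min overlaps/read ≈ 2 x coverage                   ≈ 10012 (matches the log)
    at genomeSize=1m:   coverage ≈ 5006 / 50 ≈ 100x,   so min ≈ 200 overlaps/read

so raising the genome size shrinks the number of overlaps bogart insists on keeping by the same factor, and the existing 16 GB limit becomes more than enough.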

anja999 (Author) commented Jun 6, 2019 via email

skoren (Member) commented Jun 6, 2019

Sure, you can upload your full run directory, or just the reads, following the instructions in the FAQ. It shouldn't take days to re-run anything in the bogart step; that's all you need to re-run to test changing the memory/genome size.
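
If you are retrying locally, the bogart step can usually be re-run on its own by invoking the generated script directly. A sketch, assuming the standard layout and that unitigger.sh follows canu's usual job-script convention of taking a job index as its first argument (check the script itself if unsure):

    cd run1/unitigging/4-unitigger
    ./unitigger.sh 1 > unitigger.err 2>&1   # single bogart job; output replaces unitigger.err

Only bogart is re-run this way, which is far faster than restarting the whole pipeline.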

skoren (Member) commented Jun 13, 2019

Did you ever upload any data? I don't see anything on the FTP site.

anja999 (Author) commented Jun 13, 2019 via email

skoren (Member) commented Jun 17, 2019

The docs are going to be more up to date, since there are GitHub issues referring to canu versions that no longer exist. Your second command seems reasonable, though the metagenomic options also increase the bat memory, which you've omitted (obtOverlapper=mhap obtReAlign=raw utgOverlapper=mhap utgReAlign=raw corOutCoverage=10000 corMhapSensitivity=high minReadLength=100 minOverlapLength=100 corMinCoverage=0 'redMemory=32' 'oeaMemory=32' 'batMemory=200'). I was able to run an assembly of your data using the above options, with the genome size set to 20k, without error. I do see a few contigs in the 17-20 kb range.
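
Putting those options together with the original input might look like the sketch below; the read path /DATA/run1.fa is carried over from the first command in this thread, the run directory/prefix names are placeholders (use a fresh -d directory if you want to keep the earlier run), and the memory values are the FAQ's starting points rather than hard requirements:

    canu -d run1_meta -p run1_meta genomeSize=20k -nanopore-raw /DATA/run1.fa \
      obtOverlapper=mhap obtReAlign=raw utgOverlapper=mhap utgReAlign=raw \
      corOutCoverage=10000 corMhapSensitivity=high \
      minReadLength=100 minOverlapLength=100 corMinCoverage=0 \
      redMemory=32 oeaMemory=32 batMemory=200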

However, I am not sure what assembling direct RNA means. Aren't these already full-length transcripts, so is there anything to assemble? Are you trying to see if you can assemble RNA viruses?

skoren reopened this on Jun 17, 2019
skoren closed this as completed on Jun 17, 2019