Question about eventalign parallelization at file level #770

mmiladi · 2020-04-25T08:51:37Z

Hi,

Is it possible to speed up eventalign computations by splitting the input files and/or windowing by region?

For example, to speed up nanopolish eventalign --reads all.fastq --bam all.bam --genome genome.fa > all.tsv, one could split the fastq file and then run:

nanopolish eventalign --reads half1.fastq --bam all.bam --genome genome.fa > half1.tsv
nanopolish eventalign --reads half2.fastq --bam all.bam --genome genome.fa > half2.tsv
cat half1.tsv half2.tsv > all.tsv

Best,
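A note on the cat step proposed above: if each eventalign output file starts with its own header line (an assumption worth checking on your own output), plain cat would repeat that header in the middle of all.tsv. A minimal sketch of a header-aware merge follows; merge_tsv is a hypothetical helper name, not part of nanopolish.

```shell
#!/bin/sh
# merge_tsv: concatenate TSV files, keeping only the first header line.
# Assumption: every non-empty input file begins with the same one-line header.
merge_tsv() {
    # FNR is the line number within the current file, NR the global line number.
    # Drop line 1 of every file except the very first line seen overall.
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Example usage (not run automatically):
# merge_tsv half1.tsv half2.tsv > all.tsv
```

Empty input files are skipped naturally, since awk never reads a first line from them.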


jts · 2020-04-25T11:48:07Z

Yes, that is the recommended way to speed it up. Jared


mmiladi · 2020-04-25T12:55:19Z

Great, Thanks.
Would this also work with the window option '-w'? For the data I am using, -w seems to be ineffective, as I can see positions outside the requested range within the .tsv table.

jts · 2020-04-25T13:00:21Z

Sorry, I misread your issue initially (I shouldn't try to answer emails first thing in the morning...).

Splitting the fastq would work, but it isn't the recommended way: each run will still iterate over every read in the bam and skip the ones whose signal data it can't find. You should instead provide a coordinate range as the last argument (without -w though):

nanopolish eventalign --reads all.fastq --bam all.bam --genome genome.fa chrA:0-1,000,000
nanopolish eventalign --reads all.fastq --bam all.bam --genome genome.fa chrA:1,000,000-2,000,000
[...]
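The per-window commands above can be generated rather than typed by hand. The following is a minimal sketch, assuming a samtools-style genome.fa.fai index exists next to the reference (column 1 is the contig name, column 2 its length). make_windows is a hypothetical helper, and it emits 1-based, non-overlapping windows, which differs slightly from the shared boundaries in the example above; how eventalign treats reads that span a window boundary is not settled in this thread, so check the merged output for duplicates.

```shell
#!/bin/sh
# make_windows: print one region string (contig:start-end) per window
# for every contig listed in a .fai index.
#   $1 = path to the .fai file, $2 = window size in bases
make_windows() {
    fai=$1
    size=$2
    awk -v size="$size" '{
        # $1 = contig name, $2 = contig length
        for (start = 1; start <= $2; start += size) {
            end = start + size - 1
            if (end > $2) end = $2      # clip the last window to the contig end
            printf "%s:%d-%d\n", $1, start, end
        }
    }' "$fai"
}

# Example driver (not run automatically): one eventalign call per window.
# File names are the ones used in this thread; output names are an assumption.
# make_windows genome.fa.fai 1000000 | while read -r region; do
#     out=$(printf '%s' "$region" | tr ':' '_').tsv
#     nanopolish eventalign --reads all.fastq --bam all.bam \
#         --genome genome.fa "$region" > "$out"
# done
```

The loop in the comment runs the windows one after another; the same region list could be handed to any job scheduler or parallel runner to run them concurrently.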

mmiladi · 2020-04-25T19:56:01Z

Thanks a lot for your prompt support. The coordinate option hint will be a real life (time) saver :-)

mmiladi · 2020-05-07T11:32:17Z

Hi @jts ,

I have stumbled over the expected input of the eventalign range option. There are cases where the output tsv is empty and no errors are reported:

nanopolish eventalign --reads seq.fastq.gz --bam align.bam --genome ref.fa --samples --print-read-names --scale-events chr:21000-22000

[bam process] iterating over region:chr:21000-22000

[post-run summary] total reads: 17556, unparseable: 0, qc fail: 2, could not calibrate: 0, no alignment: 1, bad fast5: 0

Here, I have spliced reads whose 5' ends lie upstream of position 21000, but all of the reads fully cover the range 21000-22000. It seems, though I am not sure, that I only get the aligned events if the start of the range covers the 5' end of the read. Is this the expected behavior?
Is there a way to parallelize over a region so that all reads with (partial or complete) bases aligned to the region are included?
Best,
-M

mmiladi closed this as completed Apr 25, 2020

mmiladi reopened this May 7, 2020

matthewstuartedwards mentioned this issue Apr 23, 2023

Parallelize nanopolish eventalign nf-core/nanoseq#236

Open
