Lane Merging option? #91

apeltzer · 2018-09-13T07:08:45Z

We quite often have cases where there is more than one FastQ file per condition, e.g. when samples have been sequenced on more than one lane:

blabla_L001_R1.fastq.gz
blabla_L001_R1.fastq.gz
blabla_L002_R1.fastq.gz
blabla_L002_R1.fastq.gz

In this case the data are single-end.
I was thinking of treating such files as a single sample based on their file names, with an option in the RNA-seq pipeline to specify a lane pattern, for example. Would this be of general interest?
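A lane-pattern option could work roughly as sketched below, assuming Illumina-style names where the lane appears as `_L` followed by three digits. The helper `group_by_lane_pattern` and its default regex are hypothetical illustrations, not an existing pipeline option.

```python
import re
from collections import defaultdict


def group_by_lane_pattern(filenames, lane_pattern=r"_L\d{3}"):
    """Group FastQ names by the key that remains after removing the lane token.

    Hypothetical helper: `lane_pattern` is an assumed regex, not a real flag.
    """
    lane_re = re.compile(lane_pattern)
    groups = defaultdict(list)
    for name in filenames:
        # Remove only the first lane token; names without one form their own group.
        key = lane_re.sub("", name, count=1)
        groups[key].append(name)
    # Sort each group so lanes would be concatenated in a stable order.
    return {key: sorted(names) for key, names in groups.items()}
```

For example, `blabla_L001_R1.fastq.gz` and `blabla_L002_R1.fastq.gz` both map to the key `blabla_R1.fastq.gz` and end up in one group.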

Cheers,
Alex


ewels · 2018-09-13T07:17:49Z

We've talked about the same thing a few times before (SciLifeLab#29, SciLifeLab#98). Although the idea is nice, I'm concerned that it could be easy for such automatic functionality to go wrong silently. We're currently doing this step in our parent master pipeline tool, which launches and manages the nextflow runs, instead. This is safer for us, as we already have the details of which samples are split in our LIMS and so can use that directly instead of guessing from filenames.

Having said this - with the right sanity checks in place and no default (or off by default), it should be fine and would certainly be a useful feature for others.
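The kind of sanity checks mentioned above could look roughly like this sketch, which fails loudly rather than merging silently. It assumes a hypothetical naming scheme `<sample>_L<lane>_R<1|2>.fastq.gz`; real pipelines may use different conventions.

```python
import re

# Assumed naming scheme (hypothetical): <sample>_L<3-digit lane>_R<1|2>.fastq.gz
LANE_READ_RE = re.compile(r"^(?P<sample>.+)_L(?P<lane>\d{3})_R(?P<read>[12])\.fastq\.gz$")


def check_lanes(filenames):
    """Validate the lane layout before merging; return {sample: sorted lane IDs}.

    Raises ValueError if a name does not match the assumed pattern, a lane
    appears twice for the same read, or R1 and R2 cover different lanes.
    """
    lanes = {}  # (sample, read) -> list of lane IDs
    for name in filenames:
        m = LANE_READ_RE.match(name)
        if m is None:
            raise ValueError(f"unrecognised FastQ name: {name}")
        lanes.setdefault((m["sample"], m["read"]), []).append(m["lane"])

    per_sample = {}  # sample -> {read: sorted lane IDs}
    for (sample, read), ids in lanes.items():
        if len(ids) != len(set(ids)):
            raise ValueError(f"duplicate lane for {sample} R{read}: {sorted(ids)}")
        per_sample.setdefault(sample, {})[read] = sorted(ids)

    for sample, reads in per_sample.items():
        if "1" not in reads:
            raise ValueError(f"{sample}: R2 files without matching R1 files")
        if "2" in reads and reads["1"] != reads["2"]:
            raise ValueError(f"{sample}: R1 lanes {reads['1']} != R2 lanes {reads['2']}")
    return {sample: reads["1"] for sample, reads in per_sample.items()}
```

Keeping such a check separate from the grouping step means it can be switched on even when the merge itself is off by default.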

This could potentially even be added to the template repo, as I'm sure many pipelines could benefit from the same thing.

apeltzer · 2018-09-13T07:36:19Z

I guess I could have a go at it: make it off by default so it doesn't interfere with other options, try to generalise it and then submit it to our template.

apeltzer · 2018-09-13T07:37:49Z

I agree that it can be an issue and that one could just as well have a separate "mini pipeline" that simply merges FastQ files together, for example. Just a thought, but I guess I'll give it a try and check how much effort this takes.
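At its core, such a merge step is plain concatenation, as with `cat a.fastq.gz b.fastq.gz > merged.fastq.gz`. Gzip permits multiple members in one file, so the concatenated output still decompresses to the reads in input order. A minimal Python sketch of the same idea (the function name `cat_fastq` is hypothetical):

```python
import shutil
from pathlib import Path


def cat_fastq(inputs, output):
    """Concatenate FastQ files byte-for-byte, like `cat a b > out`.

    Concatenated .fastq.gz files form a valid multi-member gzip stream,
    so no decompression or recompression is needed.
    """
    # Mixing gzipped and plain inputs would produce a corrupt output file.
    if len({Path(p).suffix == ".gz" for p in inputs}) > 1:
        raise ValueError("mix of gzipped and plain FastQ inputs")
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(output)
```

The input order matters only for the order of reads in the output; for paired-end data, R1 and R2 lanes should be passed in the same order so that mates stay aligned.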

maxulysse · 2018-09-13T07:44:18Z

In Sarek, we merge such FastQ files, but every sample's path is defined in a TSV file, so I'm guessing that won't apply here.

apeltzer · 2018-09-13T07:59:21Z

Yes, I saw that, but the intent here would be to specify something like the normal --reads option together with a --laneregex or similar :-)

apeltzer · 2018-09-14T09:31:47Z

Merge and then map fastq files

However, I'm not sure whether we can correctly merge stats files (e.g. STAR/HISAT2 log files) in such cases. @ewels Does MultiQC handle such things, or can I concatenate stats files for that use case? If that doesn't work, I'll just implement the "non-ideal" solution now...

maxulysse · 2018-09-14T09:36:48Z

Not completely unrelated, but I have heard people talking about uBAM.
Could it be a solution to consider at some point?

apeltzer · 2018-09-14T10:37:43Z

I think so - certainly for things like Sarek and ExoSeq!

ewels · 2018-09-16T14:08:35Z

Non-ideal for now, I think. I don't think that mapping split RNA-seq FastQ files will make much difference to the speed in practice: projects typically contain quite a lot of samples (more than WGS, I'd argue), so they are already parallelised well. And yes, it'll make reporting and everything quite a lot trickier. MultiQC can't really handle these cases well currently.

lconde-ucl · 2018-11-14T12:25:38Z

> I agree that it can be an issue and that one could as well have a separate "mini pipeline" that simply merged FastQ files together for example. Just a thought but I guess I'll give it a try and check how much effort this takes.

Hi, do you know if there is such a "mini pipeline" currently available in nextflow? Thanks!

ewels · 2018-12-17T12:30:50Z

NB: This could be added in with the tsv sample input described in #123

ewels · 2018-12-17T12:32:20Z

> Ideal solution:
>
> map independent lanes and merge the BAM files

@apeltzer - I'm pretty sure that we can't do this. Usually RNA-seq aligners start by doing non-spliced alignments and building a gene / exon model from this, to be used for a second round of spliced alignments. As such, you want to use as much data as possible in that first step, so the lanes should be merged prior to alignment.

apeltzer · 2018-12-17T12:36:30Z

Yes, you're right. For DNA it makes sense as a way to speed up computation (mapping etc.), but for RNA-seq alignment it doesn't.

I edited my comment on top...

apeltzer · 2019-12-19T14:11:55Z

So, I will close this now. For future use cases there are tools that do this, e.g. https://github.com/czbiohub/fastqcat, and with the upcoming Nextflow modules we could even let users perform the merging by adding optional subworkflows for such specific use cases.

Doesn't make any sense to implement it here then ;-)

drpatelh · 2020-02-20T17:28:54Z

Reopening this issue. I swear it was here but couldn't find it 😅 This has come up again at The Crick, so we should probably keep it open until we have an alternative solution 👍 cc @lDesiree

apeltzer · 2020-03-06T09:32:43Z

Maybe we should shift this over to the demultiplexing pipeline? What are your thoughts on that?

drpatelh · 2020-03-06T15:26:42Z

I think we should still have this functionality in the pipeline at some point because it will also allow users to supply pre-demultiplexed data in this format.

drpatelh · 2020-08-24T16:27:10Z

Functionality to add design file input has now been added in #459, so it should now be relatively straightforward to cat FastQ files within the pipeline using an approach similar to this.
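With a design file, the lanes to merge no longer have to be guessed from file names: repeated rows for the same sample can simply be collected and concatenated. The sketch below assumes a hypothetical CSV layout with columns `sample,fastq_1,fastq_2` (empty `fastq_2` for single-end); it is not necessarily the format introduced in #459.

```python
import csv
import io


def fastqs_per_sample(design_text):
    """Collect FastQ paths per sample from a design CSV.

    Hypothetical columns: sample,fastq_1,fastq_2. Repeated sample rows are
    treated as lanes to be concatenated, in the order they appear.
    """
    groups = {}
    for row in csv.DictReader(io.StringIO(design_text)):
        sample = row["sample"].strip()
        r1 = row["fastq_1"].strip()
        r2 = (row.get("fastq_2") or "").strip()
        entry = groups.setdefault(sample, {"r1": [], "r2": []})
        # Refuse to mix single-end and paired-end rows within one sample.
        if entry["r1"] and bool(entry["r2"]) != bool(r2):
            raise ValueError(f"{sample}: mix of single-end and paired-end rows")
        entry["r1"].append(r1)
        if r2:
            entry["r2"].append(r2)
    return groups
```

Each resulting `r1` (and `r2`) list could then be passed to a concatenation step before alignment, which keeps the merge explicit and driven by the user's sample sheet.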

drpatelh · 2020-08-26T11:32:15Z

Functionality for this has now been added here -> 5b2e4ca

ewels added the feature-request label Sep 13, 2018

apeltzer closed this as completed Dec 19, 2019

drpatelh reopened this Feb 20, 2020

drpatelh added this to the 1.5 milestone Aug 24, 2020

drpatelh closed this as completed Aug 26, 2020
