Lane Merging option? #91
We've talked about the same thing a few times before (SciLifeLab#29, SciLifeLab#98). Although the idea is nice, I'm concerned that such automatic functionality could easily go wrong silently. We currently do this step in our parent master pipeline tool, which launches and manages the Nextflow runs. This is safer for us, as we already have the details of which samples are split in our LIMS, so we can use that directly instead of guessing from filenames. Having said this - with the right sanity checks in place and no default (or off by default), it should be fine and would certainly be a useful feature for others. This could potentially even be added to the template repo, as I'm sure many pipelines could benefit from the same thing.
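To make the "right sanity checks" concrete, here is a minimal Python sketch of filename-based lane grouping that fails loudly instead of guessing. Everything in it is an assumption for illustration: the function name `group_lanes`, the Illumina-style `_L001_R1.fastq.gz` naming, and the default regular expression are hypothetical and are not taken from this pipeline.

```python
import re
from collections import defaultdict

# Hypothetical default: Illumina-style names such as sampleA_L001_R1.fastq.gz.
# A real option would let the user supply their own pattern.
LANE_PATTERN = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d{3})_(?P<read>R[12])\.fastq\.gz$"
)


def group_lanes(filenames, pattern=LANE_PATTERN):
    """Group FastQ file names by (sample, read), raising on anything ambiguous."""
    groups = defaultdict(dict)
    for name in filenames:
        match = pattern.match(name)
        if match is None:
            # Never guess: an unmatched file is an error, not a new sample.
            raise ValueError(f"{name!r} does not match the lane pattern")
        key = (match["sample"], match["read"])
        lane = match["lane"]
        if lane in groups[key]:
            raise ValueError(f"duplicate lane {lane} for {key}")
        groups[key][lane] = name

    # Paired-end check: R2 files must cover exactly the same lanes as R1.
    for sample in {sample for sample, _ in groups}:
        r1 = groups.get((sample, "R1"), {})
        r2 = groups.get((sample, "R2"), {})
        if r2 and set(r1) != set(r2):
            raise ValueError(f"R1 and R2 lanes differ for sample {sample!r}")

    # Return the files of each group sorted by lane number.
    return {
        key: [lanes[lane] for lane in sorted(lanes)]
        for key, lanes in groups.items()
    }
```

Calling `group_lanes` on the file names of one run returns a dict keyed by `(sample, read)`, with each value listing that sample's files in lane order. Any file that does not fit the pattern stops the run, which is the behaviour one would want from an opt-in feature.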
I guess I could have a go at this: simply make it turned off by default so it doesn't interfere with other options, try to generalize it, and then submit that to our template.
I agree that it can be an issue, and that one could just as well have a separate "mini pipeline" that simply merges FastQ files together, for example. Just a thought, but I guess I'll give it a try and check how much effort this takes.
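For reference, the merge step of such a "mini pipeline" can be very small. The sketch below is only an illustration under stated assumptions: `merge_fastq` is a hypothetical helper, not part of this pipeline or of Nextflow. It relies on the fact that concatenated gzip members form a valid gzip stream, so compressed lanes can be joined byte-for-byte without decompressing.

```python
import shutil
from pathlib import Path


def merge_fastq(inputs, output):
    """Concatenate the FastQ files of one sample/read into one file, in the given order."""
    inputs = [Path(p) for p in inputs]
    if not inputs:
        raise ValueError("no input files given")
    # Refuse to mix compressed and uncompressed inputs in one output.
    if len({p.suffix == ".gz" for p in inputs}) != 1:
        raise ValueError("inputs mix gzipped and plain FastQ files")
    output = Path(output)
    if output.exists():
        # Never overwrite silently.
        raise FileExistsError(f"{output} already exists")
    with output.open("wb") as merged:
        for path in inputs:
            with path.open("rb") as source:
                # Raw byte copy: valid for plain text and for gzip members alike.
                shutil.copyfileobj(source, merged)
    return output
```

The order of `inputs` matters for paired-end data: R1 and R2 must be merged in the same lane order so that read pairs stay in sync.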
In Sarek, we merge such FastQ files, but every sample's path is defined in a TSV file, so I'm guessing that won't apply here.
Yes, I saw that, but specifying something like the normal
Merge and then map FastQ files. However, I'm not sure whether we can correctly merge stats files (e.g. STAR/HISAT2 log files) in such cases. @ewels Does MultiQC handle such things, or can I concatenate stats files for that use case? If that doesn't work I'll just implement the "non-ideal" solution now...
Not completely unrelated, but I have heard people talking about uBAM.
I think so - certainly for things like Sarek and ExoSeq!
Non-ideal for now, I think. I don't think that mapping split RNA-seq FastQ files will make much difference to the speed in practice - projects typically have quite a lot of samples (more than WGS, I'd argue), so they are already parallelised well. And yes, it'll make reporting and everything quite a lot trickier. MultiQC can't really handle these cases well currently.
Hi, do you know if there is such a "mini pipeline" currently available in Nextflow? Thanks!
NB: This could be added in with the TSV sample input described in #123
@apeltzer - I'm pretty sure that we can't do this. Usually RNA-seq aligners start by doing non-spliced alignments and building a gene / exon model from this, to be used for a second round of spliced alignments. As such, you want to use as much data as possible in that first step, so the lanes should be merged prior to alignment.
Yes, you're right. For DNA it makes sense in order to speed up computation (mapping etc.), but for RNA-seq alignment it doesn't. I edited my comment above...
So, I will close this now. For future use cases, something like this already exists, and with the upcoming Nextflow modules we can even let users perform merging by adding optional subworkflows for such specific use cases, e.g. this one (https://github.com/czbiohub/fastqcat). It doesn't make any sense to implement it here then ;-)
Reopening this issue. I swear it was here but I couldn't find it 😅 This has come up again at The Crick, so we should probably wait until we have an alternative solution before closing 👍 cc @lDesiree
Maybe we should shift this over to the
I think we should still have this functionality in the pipeline at some point because it will also allow users to supply pre-demultiplexed data in this format.
Functionality for this has now been added here -> 5b2e4ca
We quite often have cases where there is more than one FastQ file per condition, e.g. if samples have been sequenced on more than one lane:
in this case single-ended.
I thought about treating these as a single sample based on the extension, with an option in RNAseq that can be used to specify a lane pattern, for example. Would this be of general interest?
Cheers,
Alex