Combining multiple libraries #20

Closed
harish0201 opened this issue Dec 16, 2019 · 5 comments
@harish0201

Hi!

I have a couple of older Illumina datasets (both paired-end (PE) and mate-pair (MP)) split across multiple insert sizes and libraries.

Is it possible to pass them all as a single argument? I think that would make life easier.

Would a FIFO sort of approach work? Or should I give a file of files (FOF)? I believe the FOF approach only works for single-end (SE) reads, not paired-end (PE) reads.

If needed, I can pass the MP libraries later, for scaffolding only.

What would you suggest?

@rchikhi
Member

rchikhi commented Jan 18, 2020

Hi! Apologies for the delayed answer. Indeed, the FOF strategy doesn't work as-is.
You'll need to:

  1. if applicable, for each insert size, concatenate all the left (resp. right) files from that insert size into a single left (resp. right) file;
  2. pass each insert-size library as its own argument, in order from smallest to largest insert size, e.g. -1 left_insertA.fq.gz -2 right_insertA.fq.gz --mp-1 left_insertB.fq.gz --mp-2 right_insertB.fq.gz (i.e. in this example, insert size A is smaller than B).
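The two steps above can be sketched in Python. This is a hedged illustration, not part of gatb-pipeline: the helper names `concat_files` and `build_library_args` are hypothetical, insert sizes are assumed to be known in advance, and only the `-1`/`-2` and `--mp-1`/`--mp-2` flags quoted above are used. Byte-wise concatenation of `.gz` files yields a valid multi-member gzip file (the same result as `cat`). The left and right files of a library must be concatenated in the same order so that read pairs stay aligned.

```python
import shutil
from pathlib import Path


def concat_files(inputs, output):
    """Concatenate files byte-for-byte (equivalent to `cat a b > out`).

    For gzip inputs the result is a multi-member gzip file, which zcat
    and Python's gzip module read as one continuous stream.
    Call it with the same file order for the left and the right files.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def build_library_args(libraries):
    """Build command-line arguments ordered from smallest to largest insert.

    `libraries` maps an insert size in bp (hypothetical, user-supplied)
    to a (left_file, right_file) tuple. Only the two flag pairs quoted
    in the thread are known here, so more than two libraries is rejected.
    """
    flag_pairs = [("-1", "-2"), ("--mp-1", "--mp-2")]
    ordered = sorted(libraries.items())  # sort by insert size
    if len(ordered) > len(flag_pairs):
        raise ValueError("this sketch only knows the two flag pairs from the thread")
    args = []
    for (left_flag, right_flag), (_, (left, right)) in zip(flag_pairs, ordered):
        args += [left_flag, str(Path(left)), right_flag, str(Path(right))]
    return args
```

For example, `build_library_args({8000: ("mp_L.fq.gz", "mp_R.fq.gz"), 300: ("pe_L.fq.gz", "pe_R.fq.gz")})` puts the 300 bp library on `-1`/`-2` and the 8 kb library on `--mp-1`/`--mp-2`, regardless of dictionary order.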

Alternatively:

  1. run gatb-pipeline on a FOF of all libraries, in any order. This will produce contigs but no scaffolds. Note that these will be the same contigs as if you had specified that the libraries were paired-end/mate-pair, as Minia does not care about pairing when building contigs. You can then run BESST stand-alone (via the gatb-pipeline script with the -c argument, or just the BESST program, or any other scaffolder) manually on the contigs produced by gatb-pipeline.
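A FOF for this alternative could be generated as below. This is a sketch under an assumption: it takes the FOF format to be a plain text file with one read-file path per line, and the helper name `write_fof` is hypothetical. Paths are resolved to absolute form so the file list stays valid whatever directory the assembler is launched from. Since Minia ignores pairing when building contigs, the order of the entries does not matter here. The exact gatb-pipeline flag for passing the FOF is not given in this thread and is not shown.

```python
from pathlib import Path


def write_fof(read_files, fof_path):
    """Write a file of files: one absolute read-file path per line.

    Assumes the one-path-per-line FOF format; returns the written paths.
    """
    lines = [str(Path(p).resolve()) for p in read_files]
    Path(fof_path).write_text("\n".join(lines) + "\n")
    return lines
```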

@harish0201
Author

Thank you for the suggestion and apologies for the delayed response!

I did use a mixture of the two options, though: I concatenated the smaller-insert FASTQs and passed the mate-pair libraries separately.

@rchikhi
Member

rchikhi commented Jan 30, 2020

Sounds good, did it work?

@harish0201
Author

Yup, it did! I got decent contiguity as well.

Got the 2.5 Gb genome into 9,876 (no joke) scaffolds over 1 kb in length, with an N50 of 1.83 Mb. Having 8 kb and 20 kb insert MP libraries helped a lot, combined with dual scaffolding.
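For readers comparing their own assemblies against figures like these: N50 is the length L such that scaffolds of length L or longer together cover at least half of the total assembly length. The minimal Python sketch below implements that standard definition, with an optional cutoff mirroring the "over 1 kb" filter. The function name `n50` is illustrative; whether the N50 reported above was computed before or after the 1 kb filter is not stated in the thread.

```python
def n50(lengths, min_length=0):
    """Return the N50 of all sequences at least `min_length` long.

    Sorts lengths in decreasing order and returns the first length at
    which the running sum reaches half of the filtered total.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no sequences pass the length filter")
    half = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half:
            return length
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8: the total is 54, and 10 + 9 + 8 = 27 is the first running sum to reach half of it.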

@rchikhi
Member

rchikhi commented Feb 1, 2020

Very nice!
