Add modules outlined in the pipeline proposal #8

kedhammar · 2024-03-19T13:59:29Z

kedhammar · 2024-03-19T13:59:55Z

#6 PR draft to start adding modules

mahesh-panchal · 2024-03-19T14:34:54Z

For WGS data intended for assembly, I'd suggest GenomeScope (https://github.com/nf-core/modules/blob/master/modules/nf-core/genomescope2/main.nf). The database is built using Meryl (also on nf-core).

But there is also a container-only version, GeneScopeFK, that's a little bit faster and has extra tools that might be useful (https://github.com/nf-core/modules/blob/master/modules/nf-core/genescopefk/main.nf).
The databases for Merquryfk/KATGC, Merquryfk/KATCOMP, Merquryfk/Ploidyplot, and GeneScopeFK are built using FastK.
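
As a rough sketch of the Meryl route mentioned above, the following Python helper assembles the three commands usually involved: counting k-mers, exporting a histogram, and running GenomeScope 2 on it. The file names, the k-mer size of 21, the helper name, and the `genomescope2` executable name (as installed by the conda package) are assumptions for illustration; the nf-core modules wrap these steps with their own arguments.

```python
from typing import List


def genomescope_commands(reads: str, prefix: str, k: int = 21) -> List[List[str]]:
    """Return command lines for a Meryl + GenomeScope 2 run (nothing is executed)."""
    db = f"{prefix}.meryl"    # hypothetical output name for the k-mer database
    hist = f"{prefix}.hist"   # hypothetical output name for the histogram
    return [
        # 1. Count k-mers in the reads into a Meryl database.
        ["meryl", f"k={k}", "count", "output", db, reads],
        # 2. Print the k-mer histogram; the caller must redirect stdout to `hist`.
        ["meryl", "histogram", db],
        # 3. Fit the GenomeScope 2 model to the histogram.
        ["genomescope2", "-i", hist, "-o", f"{prefix}_genomescope", "-k", str(k)],
    ]


if __name__ == "__main__":
    for cmd in genomescope_commands("reads.fastq.gz", "sample"):
        print(" ".join(cmd))
```

The helper only builds the argument lists so that the steps can be inspected or handed to a workflow engine; in a pipeline each step would be its own process.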

remiolsen · 2024-03-19T14:55:50Z

> Preseq complexity (which subtool?).

I've used preseq lc_extrap before, and there is an nf-core module for it (https://nf-co.re/modules/preseq_lcextrap). However, it often fails outright or refuses to give a complexity estimate.

Another option would be Picard EstimateLibraryComplexity (https://gatk.broadinstitute.org/hc/en-us/articles/360037591931-EstimateLibraryComplexity-Picard). I've never used it: for the application where I worry about library complexity (Hi-C), the tool I use (pairtools) implements its own complexity estimate, so I have had no need. As far as I can see, there is no nf-core module for it.
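
For context on what such a complexity estimate computes: Picard documents its estimated library size as the solution X of the Lander-Waterman relation C/X = 1 - exp(-N/X), where N is the number of read pairs examined and C is the number of unique pairs. A minimal sketch of solving that relation by bisection is shown below; the function name and the bisection approach are illustrative choices, not Picard's or pairtools' actual code.

```python
import math


def estimate_library_size(read_pairs: float, unique_pairs: float) -> float:
    """Solve C = X * (1 - exp(-N / X)) for X, with N=read_pairs and C=unique_pairs."""
    if not 0 < unique_pairs < read_pairs:
        # With no duplicates (C == N) the relation has no finite solution.
        raise ValueError("need 0 < unique_pairs < read_pairs")

    def f(x: float) -> float:
        # -expm1(-t) equals 1 - exp(-t) but stays accurate for small t.
        return x * -math.expm1(-read_pairs / x) - unique_pairs

    lo = unique_pairs          # f(lo) < 0 always holds at X = C
    hi = unique_pairs
    while f(hi) < 0:           # f tends to N - C > 0 as X grows, so this stops
        hi *= 2
    for _ in range(200):       # plain bisection on the bracket [lo, hi]
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

A library with few duplicates gives a large estimate, and the estimate approaches the number of unique pairs as the duplication rate grows.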

kedhammar · 2024-03-19T15:23:56Z

> Preseq complexity (which subtool?).
>
> I've used preseq lc_extrap before, and there is an nf-core module for it (https://nf-co.re/modules/preseq_lcextrap). However, it often fails outright or refuses to give a complexity estimate.
>
> Another option would be Picard EstimateLibraryComplexity (https://gatk.broadinstitute.org/hc/en-us/articles/360037591931-EstimateLibraryComplexity-Picard). I've never used it: for the application where I worry about library complexity (Hi-C), the tool I use (pairtools) implements its own complexity estimate, so I have had no need. As far as I can see, there is no nf-core module for it.

@remiolsen any idea why preseq lc_extrap tends to refuse?

remiolsen · 2024-03-19T15:34:53Z

> @remiolsen any idea why preseq lc_extrap tends to refuse?

I'm fairly certain this is the error I used to see most often. Quoting from the preseq manual:

> Q: When running lc_extrap, I receive the error
> ERROR: too many iterations, poor sample
>
> A: Most commonly this is due to the presence of defects in the approximation which cause the estimates to be unstable. Setting the step size larger (with the flag -s) will help to avoid the defects. The default step size is 1M reads or 0.05% of the input sample size rounded up to the nearest million, whichever is larger. A consequence of this action will be a reduction in the observed smoothness of the curve.

In my experience, though, setting the step size with the -s flag was hit or miss.
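
The workaround from the manual can be sketched as a retry loop that reruns lc_extrap with progressively larger step sizes. This is a hedged illustration, not a tested recipe: it assumes the input is a sorted BAM (`-B`), that a failure produces a non-zero exit code, and that the "too many iterations" message appears on stderr. The function names and the injectable `runner` argument are hypothetical and exist only for illustration and testing.

```python
import subprocess
from typing import Callable, Iterable, Optional


def default_step(n_reads: int) -> int:
    """Default step as described in the FAQ: 1M reads, or 0.05% of the
    sample rounded up to the nearest million, whichever is larger."""
    million = 1_000_000
    # 0.05% == 5 / 10_000; negated floor division gives a ceiling in whole millions.
    rounded = -(-(n_reads * 5) // (10_000 * million)) * million
    return max(million, rounded)


def lc_extrap_with_retries(
    bam: str,
    out: str,
    steps: Iterable[int],
    runner: Callable = subprocess.run,
) -> Optional[int]:
    """Try each step size in turn; return the first that succeeds, else None."""
    for step in steps:
        cmd = ["preseq", "lc_extrap", "-B", "-s", str(step), "-o", out, bam]
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return step
        # Assumption: the unstable-estimate failure is reported on stderr.
        if "too many iterations" not in result.stderr:
            raise RuntimeError(f"preseq failed for another reason: {result.stderr}")
    return None
```

For example, `lc_extrap_with_retries("sample.bam", "sample.lc_extrap.txt", [1_000_000, 2_000_000, 5_000_000])` would stop at the first step size that yields an estimate. As noted above, larger steps are not guaranteed to help, so a `None` result still has to be handled downstream.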

kedhammar · 2024-05-13T12:37:05Z

Closed #6 due to being too broad and unspecific. Feel free to start new PRs addressing more specific implementations.

kedhammar added the enhancement New feature or request label Mar 19, 2024

kedhammar mentioned this issue Mar 19, 2024

Add modules #6

Closed

11 tasks


kedhammar commented Mar 19, 2024 (edited)

Functionalities and modules

Mentioned in the pipeline proposal

Standard QC

Duplication + Complexity

Adapter and Artifact detection

Contamination detection

Mentioned in the pipeline Slack channel
