nf-core radseq design #1

Open
18 of 27 tasks
remiolsen opened this issue Aug 6, 2018 · 0 comments


remiolsen commented Aug 6, 2018

This is built on https://github.com/remiolsen/NGI-RADseqQC

This is a major rewrite to bring the pipeline in line with nf-core and to update the tools used (move to Stacks 2.0, remove read-joining, etc.). It also covers a few other points that have been on my wishlist for improving usability.

Main tasks

  • Think of a cool new name
  • Make cookiecutter template
  • Port over some of the processes from https://github.com/remiolsen/NGI-RADseqQC
  • Use Stacks 2.0
  • Write a tool to scrape the Stacks logfiles for useful stats
  • Others

Core pipeline tasks

  • Remove FLASH
  • Make a Dockerfile. Is Stacks 2.0 on Bioconda?
  • Write a Python script to parse de novo Stacks output to get: coverage, raw number of loci per sample, catalog loci per sample, and a histogram of "shared" loci. Parse process_radtags output as well?
  • Make a MultiQC configuration to import this data
  • Get publicly available data from ENA. Make proper test data.
  • Make a MultiQC module for Stacks >= 2.0
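As a rough illustration of the "shared" loci histogram mentioned above, here is a minimal Python sketch. It assumes the catalog locus IDs seen in each sample have already been extracted into a dictionary; it does not read any Stacks file, and the function name `shared_loci_histogram` and the example data are hypothetical.

```python
from collections import Counter


def shared_loci_histogram(sample_loci):
    """Count how many catalog loci are present in exactly k samples.

    sample_loci: dict mapping sample name -> iterable of catalog locus IDs
    (assumed to be extracted beforehand; no Stacks file is parsed here).
    Returns a dict {k: number of loci found in exactly k samples}.
    """
    # For each locus, count the number of samples that contain it.
    samples_per_locus = Counter()
    for loci in sample_loci.values():
        samples_per_locus.update(set(loci))  # set() guards against duplicate IDs

    # Histogram over those per-locus sample counts, sorted by k.
    histogram = Counter(samples_per_locus.values())
    return dict(sorted(histogram.items()))


# Hypothetical example data: three samples and six catalog loci.
example = {
    "sample_A": [1, 2, 3, 4],
    "sample_B": [2, 3, 4, 5],
    "sample_C": [3, 4, 6],
}
print(shared_loci_histogram(example))  # {1: 3, 2: 1, 3: 2}
```

A dictionary like this could be dumped to a small table for MultiQC to plot, but the exact import format is left open here.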

Polish

  • Make a GH release
  • Documentation, documentation, documentation
  • Travis-CI
  • Python3 support for in silico digest helper script
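For reference, the core of an in silico digest is small enough to sketch in Python 3. This is not the existing helper script, only an illustration of the idea under simple assumptions: a single enzyme, a palindromic recognition site, and a scan of the given strand only. The function name `in_silico_digest` and its parameters are hypothetical; the default site is EcoRI (G^AATTC).

```python
def in_silico_digest(sequence, site="GAATTC", cut_offset=1):
    """Return the fragments produced by cutting `sequence` at every `site`.

    cut_offset is the position inside the recognition site where the cut
    falls (1 for EcoRI, which cuts between G and AATTC). Only the given
    strand is scanned, which is sufficient for palindromic sites.
    """
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    # Whatever follows the last cut is the final fragment.
    fragments.append(sequence[start:])
    return fragments


fragments = in_silico_digest("AAGAATTCCCGAATTCTT")
print(fragments)                     # ['AAG', 'AATTCCCG', 'AATTCTT']
print([len(f) for f in fragments])   # [3, 8, 7]
```

The fragment length distribution from such a digest is what would typically be used to estimate the expected number of RAD loci for a given enzyme.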

Others -- Stretch goals

  • Think about what output files Stacks should be creating by default.
  • Let the user specify which output files to create -- Nah the defaults are probably fine -- Nuh-uh we need more!
    • genepop
    • structure
  • Scripts for running the Stacks web UI -- It's been removed in v >= 2.0
  • Pick a set of “best practice” parameters for Stacks and run all of these.
  • Clearly report the r80 statistic of each run, i.e. the # of polymorphic loci shared by at least 80% of individuals in the population -- http://doi.org/10.1111/2041-210X.12775
  • Support running Stacks with a reference genome
  • Support for premade population map file
  • Support for already processed reads (skipping trimming and process_radtags)
  • Option to not output trimmed and/or processed fastq files
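
To make the r80 definition above concrete, here is a minimal Python sketch that counts polymorphic loci present in at least 80% of individuals. It assumes the locus-to-samples mapping and the set of polymorphic loci have already been obtained from a Stacks run; how to extract them is not shown, and the name `count_r80_loci` and the example data are hypothetical.

```python
def count_r80_loci(locus_samples, polymorphic_loci, n_samples, min_percent=80):
    """Count polymorphic loci found in at least `min_percent` % of samples.

    locus_samples: dict mapping locus ID -> iterable of sample names
    polymorphic_loci: set of locus IDs that contain at least one SNP
    n_samples: total number of individuals in the population
    """
    count = 0
    for locus, samples in locus_samples.items():
        if locus not in polymorphic_loci:
            continue
        # Integer comparison avoids floating-point rounding at the threshold.
        if len(set(samples)) * 100 >= min_percent * n_samples:
            count += 1
    return count


# Hypothetical example: five individuals, four loci.
locus_samples = {
    1: ["s1", "s2", "s3", "s4", "s5"],  # polymorphic, 5/5 samples
    2: ["s1", "s2", "s3", "s4"],        # polymorphic, 4/5 samples
    3: ["s1", "s2", "s3"],              # polymorphic, 3/5 samples
    4: ["s1", "s2", "s3", "s4", "s5"],  # fixed (not polymorphic)
}
print(count_r80_loci(locus_samples, polymorphic_loci={1, 2, 3}, n_samples=5))  # 2
```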