3. QC and parameters

Marit Hetland edited this page Apr 1, 2021 · 15 revisions

Quality check

In the output report, each genome is given a QC_status of PASS, WARN or FAIL.

PASS

For a genome to pass the QC, it must:

  • Have at least 97% of bases confidently called
    • Where a confident call means a read depth of at least 20X at that base
  • As a general rule, come from a sample with a Ct of 30 or below (we do not sequence samples with Ct >30)
  • Contain a stretch of more than 10Kb of sequence without N
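
The two measurable criteria above can be sketched in Python as follows. This is only an illustration: the function names `pct_confident` and `longest_no_n_run` and their inputs (a list of per-base depths and a consensus sequence string) are assumptions made for this example, and they are not necessarily how scripts/articQC.py computes the values.

```python
import re

MIN_DEPTH = 20  # a confident call needs at least 20X at that base


def pct_confident(depths, min_depth=MIN_DEPTH):
    """Percentage of positions whose read depth is at least min_depth.

    `depths` is a hypothetical list with one depth value per reference base.
    """
    if not depths:
        return 0.0
    confident = sum(1 for depth in depths if depth >= min_depth)
    return 100.0 * confident / len(depths)


def longest_no_n_run(sequence):
    """Length of the longest stretch of the consensus sequence without N."""
    runs = re.findall(r"[^Nn]+", sequence)
    return max((len(run) for run in runs), default=0)
```

For a real genome, `depths` would hold one value per position of the reference, and `sequence` would be the consensus sequence without its FASTA header line.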

WARN

A WARN is given instead of FAIL if any of the following apply:

  • 90-97% of bases are confidently called
  • There is no stretch of more than 10Kb of sequence without N
  • The nextclade report scores QC as 'bad'

If a genome has been given QC_status WARN, you should investigate it manually to check that it is OK to use. WARN does not necessarily mean the sample has to be resequenced, but it may mean that the genome has an unexpectedly high number of mutations or non-ACGTN bases (for more details, see nextclade's "Quality Control (QC)"). For manual investigation, it can be useful to load your genome into https://clades.nextstrain.org/ and view its mutations and its placement in the tree with the reference genomes.

We have found that when a genome's longest stretch of sequence without Ns is below 10Kb, it is often only just below 10Kb if coverage is above 90%, and often far lower if coverage is <90%. This case is therefore largely captured by the coverage threshold.

We have added WARN for genomes with 90-97% coverage so the results may still be viewed (i.e. QC, pangolin and nextclade), and so that any sample with 90-97% coverage may be re-analysed without normalising (with flag --renormalise=on).

FAIL

  • <90% of bases confidently called
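
The PASS/WARN/FAIL rules above can be summarised in a short Python sketch. The function `qc_status` and its argument names are hypothetical and do not reproduce the code in scripts/articQC.py. The sketch also assumes that FAIL is checked first, that 10Kb means 10,000 bases, and that exactly 97% counts as PASS.

```python
def qc_status(pct_confident, longest_no_n_run, nextclade_qc=None):
    """Return 'PASS', 'WARN' or 'FAIL' for one genome.

    pct_confident    -- percentage of bases called at 20X or more
    longest_no_n_run -- longest stretch of sequence without N, in bases
    nextclade_qc     -- overall nextclade QC score, e.g. 'good' or 'bad' (optional)
    """
    # Assumption: fewer than 90% confident bases is always a FAIL.
    if pct_confident < 90:
        return "FAIL"
    # Any single WARN condition is enough to downgrade from PASS.
    if pct_confident < 97 or longest_no_n_run <= 10_000 or nextclade_qc == "bad":
        return "WARN"
    return "PASS"
```

For example, `qc_status(93, 29000)` gives WARN because of the 90-97% coverage rule, while `qc_status(99, 29000, "bad")` gives WARN because of the nextclade score alone.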

NOTES

When your genomes have passed QC, they should be ready for downstream analyses (e.g. lineage assignment, phylogeny) and uploading to GISAID.

Note: These QC thresholds are based on recommendations from the ARTIC network and nextclade. You can change or add parameters as you wish in scripts/articQC.py; afterwards, copy the file to ncov2019-artic-nf/bin/qc.py

The articQC.py script in this repo is a modified version of the ncov2019-artic-nf/bin/qc.py script. It is modified to include the parameters described above, and scripts/articQC.py needs to be copied to ncov2019-artic-nf/bin/qc.py for the run report to come out correctly.
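
As an illustration, the copy step could be scripted in Python as below. The function `install_qc_script` and its two arguments are made up for this example; `repo_dir` is assumed to be the root of this repo and `pipeline_dir` the location of your ncov2019-artic-nf checkout. A plain file copy does the same job.

```python
import shutil
from pathlib import Path


def install_qc_script(repo_dir, pipeline_dir):
    """Copy scripts/articQC.py over ncov2019-artic-nf/bin/qc.py.

    repo_dir     -- root of this repo (contains scripts/articQC.py)
    pipeline_dir -- path to the ncov2019-artic-nf directory
    Both paths are assumptions; adjust them to your install location.
    """
    source = Path(repo_dir) / "scripts" / "articQC.py"
    target = Path(pipeline_dir) / "bin" / "qc.py"
    shutil.copyfile(source, target)  # overwrites the existing qc.py
    return target
```

Re-run the copy every time you edit scripts/articQC.py, otherwise the run report will still use the old thresholds.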

Parameters

This pipeline runs artic guppyplex and the artic minion pipeline using nanopolish.

  • For artic guppyplex, --min-length 400 --max-length 700 are used (as default in the artic nextflow pipeline)
  • For artic minion, normalising is enabled with value --normalise 500 (default: --normalise 100)
    • We found that 500 produces better consensus sequences than 100 and does not add an unreasonable amount of running time
  • The most recent release of artic will be used when installing with ./scripts/install.sh
  • All other parameters keep the defaults stated in the https://github.com/connor-lab/ncov2019-artic-nf/conf/*.config files
  • If you want to change any of the parameters that are run with artic guppyplex or artic minion, you can do so in the nanopore.config file which you can find in <install_location>/ncov2019-artic-nf/conf/nanopore.config.
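
For quick reference, the non-default values listed above can be written down as a small Python sketch. Only the flags named on this page are included; the dictionaries and the helper `render_flags` are illustrative and not part of the pipeline, and the other arguments that artic guppyplex and artic minion require are deliberately left out.

```python
# Flags taken from the list above; everything else is left at its default.
GUPPYPLEX_FLAGS = {"--min-length": 400, "--max-length": 700}
MINION_FLAGS = {"--normalise": 500}  # artic default is 100


def render_flags(flags):
    """Turn a flag dictionary into a command-line fragment."""
    parts = []
    for name, value in flags.items():
        parts.extend([name, str(value)])
    return " ".join(parts)
```

Here `render_flags(GUPPYPLEX_FLAGS)` yields the fragment `--min-length 400 --max-length 700`, which matches the values set in nanopore.config as described above.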

Updates:

  • 2021-04-01: Added WARN instead of only PASS/FAIL to several parameters - see notes above.
  • 2021-03-12: Updated the threshold for QC PASS from >90% of bases confidently called (with 20X reads) to >97%, based on GISAID and FHI's recommendations.