2. Usage Instructions

Shyam Rallapalli edited this page Jul 24, 2018 · 14 revisions

Usage

Table of Contents

  1. Input and output parameters
  2. How to run CHERIPIC - command line
  3. How to run CHERIPIC - web app
  4. Details about parameters
  5. Output
  6. Test Data

Input and output parameters

Running cheripic at the command line without any arguments prints the following help text:


Cheripic v1.2.6
Authors: Shyam Rallapalli Martin Page and Dan MacLean

Description: Candidate mutation and closely linked marker selection for non reference genomes
Uses bulk segregant data from non-reference sequence genomes

Inputs:
1. Needs a reference fasta file of asssembly use for variant analysis
2. Pileup/Bam files for mutant (phenotype of interest) bulks and background (wildtype phenotype) bulks
3. If providing bam files, you have to include vcf files for the respective bulks
4. If polyploid species, include pileup/bam files from one or both parents

USAGE:
cheripic <options>

OPTIONS:
  -f, --assembly=<s>               Assembly file in FASTA format
  -F, --input-format=<s>           bulk and parent alignment file format types - set either pileup or bam or vcf (default: pileup)
  -a, --mut-bulk=<s>               Pileup or sorted BAM file alignments from mutant/trait of interest bulk 1
  --mut-bulk-vcf=<s>               vcf file for variants from mutant/trait of interest bulk 1 (default: )
  -b, --bg-bulk=<s>                Pileup or sorted BAM file alignments from background/wildtype bulk 2
  --bg-bulk-vcf=<s>                vcf file for variants from background/wildtype bulk 2 (default: )
  --output=<s>                     custom name tag to include in the output file name (default: cheripic_results)
  --loglevel=<s>                   Choose any one of "info / warn / debug" level for logs generated (default: debug)
  --hmes-adjust=<f>                factor added to snp count of each contig to adjust for hme score calculations (default: 0.5)
  --htlow=<f>                      lower level for categorizing heterozygosity (default: 0.2)
  --hthigh=<f>                     high level for categorizing heterozygosity (default: 0.9)
  --mindepth=<i>                   minimum read depth at a position to consider for variant calls (default: 6)
  --max-d-multiple=<i>             multiplication factor for average coverage to calculate maximum read coverage
                                   if set zero no calculation will be made from bam file.
                                   setting this value will override user set max depth (Default: 5)
  --maxdepth=<i>                   maximum read depth at a position to consider for variant calls
                                   if set to zero no user max depth will be used (default: 0)
  --min-non-ref-count=<i>          minimum read depth supporting non reference base at each position (default: 3)
  --min-indel-count-support=<i>    minimum read depth supporting an indel at each position (default: 3)
  --ambiguous-ref-bases=<s>        including variant at completely ambiguous bases in the reference (default: false)
  -q, --mapping-quality=<i>        minimum mapping quality of read covering the position (default: 20)
  -Q, --base-quality=<i>           minimum base quality of bases covering the position (default: 15)
  --noise=<f>                      praportion of reads for a variant to conisder as noise (default: 0.1)
  --cross-type=<s>                 type of cross used to generated mapping population - back or out (default: back)
  --use-all-contigs=<s>            option to select all contigs or only contigs containing variants for analysis (default: false)
  --include-low-hmes=<s>           option to include or discard variants from contigs with
                                   low hme-score or bfr score to list in the final output (default: false)
  --polyploidy=<s>                 Set if the data input is from polyploids (default: false)
  -p, --mut-parent=<s>             Pileup or sorted BAM file alignments from mutant/trait of interest parent (default: )
  -r, --bg-parent=<s>              Pileup or sorted BAM file alignments from background/wildtype parent (default: )
  -R, --repeats-file=<s>           repeat masker output file for the assembly  (default: )
  --bfr-adjust=<f>                 factor added to hemi snp frequency of each parent to adjust for bfr calculations (default: 0.05)
  --sel-seq-len=<i>                sequence length to print from either side of selected variants (default: 50)
  --examples                       shows some example commands with explanation

How to run CHERIPIC [command line]

Simple use case

cheripic -f assembly.fa -a mutbulk.pileup -b bgbulk.pileup --output=cheripic_output

or

cheripic --assembly assembly.fa --mut-bulk mutbulk.pileup --bg-bulk bgbulk.pileup --output cheripic_results

A few more parameters to experiment with

Use of vcf file as input

cheripic --assembly assembly.fa --input-format vcf --mut-bulk mutbulk.vcf --bg-bulk bgbulk.vcf --output cheripic_results

Experimental feature - using polyploid data

cheripic --assembly assembly.fa --mut-bulk mutbulk.pileup --bg-bulk bgbulk.pileup \
        --mut-parent mutparent.pileup --bg-parent bgparent.pileup --polyploidy true --output cheripic_results

How to run CHERIPIC [web app]

The CHERIPIC web app is available at http://cheripic.tsl.ac.uk/

Web app inputs

Parameter details

-f, --assembly

Assembly file in FASTA format

-F, --input-format

File format of the bulk and parent alignment inputs - one of pileup, vcf or bam (default: pileup)

-a, --mut-bulk

Pileup or sorted BAM file alignments from mutant/trait of interest bulk 1

-b, --bg-bulk

Pileup or sorted BAM file alignments from background/wild type bulk 2

--mut-bulk-vcf

vcf file for variants from mutant/trait of interest bulk 1. Required when --input-format is bam

--bg-bulk-vcf

vcf file for variants from background/wild type bulk 2. Required when --input-format is bam
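
The pairing of bam inputs with vcf files is easy to get wrong when commands are assembled by a script. The sketch below is a hypothetical helper (`build_cheripic_args` is not part of CHERIPIC) that only builds an argument list from the options documented on this page and refuses bam input without both vcf files. It does not run cheripic itself.

```python
def build_cheripic_args(assembly, mut_bulk, bg_bulk, input_format="pileup",
                        mut_bulk_vcf=None, bg_bulk_vcf=None):
    """Build a cheripic argument list (hypothetical helper, not part of CHERIPIC)."""
    if input_format not in ("pileup", "bam", "vcf"):
        raise ValueError(f"unsupported input format: {input_format}")
    args = ["cheripic", "--assembly", assembly,
            "--input-format", input_format,
            "--mut-bulk", mut_bulk, "--bg-bulk", bg_bulk]
    if input_format == "bam":
        # The help text requires vcf files for both bulks when bam files are given.
        if not (mut_bulk_vcf and bg_bulk_vcf):
            raise ValueError("bam input needs --mut-bulk-vcf and --bg-bulk-vcf")
        args += ["--mut-bulk-vcf", mut_bulk_vcf, "--bg-bulk-vcf", bg_bulk_vcf]
    return args


# File names are taken from the test data folder described on this page.
print(" ".join(build_cheripic_args(
    "test_data/input_assembly_file.fa",
    "test_data/mutant_bam_file.bam",
    "test_data/wildtype_bam_file.bam",
    input_format="bam",
    mut_bulk_vcf="test_data/mutant_vcf_file.vcf",
    bg_bulk_vcf="test_data/wildtype_vcf_file.vcf",
)))
```

The resulting list can be passed to `subprocess.run` once cheripic is installed and on the PATH.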

--output

custom name tag to include in the output file name (default: cheripic_results)

--loglevel

Choose any one of "info / warn / debug" level for logs generated (default: debug)

--hmes-adjust

factor added to the variant count of each contig when calculating HMES, mainly to avoid division by zero (default: 0.5)
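
This page does not spell out the HMES formula, so the following is only an illustration of why an additive factor helps. It assumes, purely for the sake of the example, that the score is a ratio of homozygous to heterozygous variant counts on a contig; `adjusted_ratio` is a hypothetical name and not CHERIPIC's implementation.

```python
def adjusted_ratio(hom_count, het_count, adjust=0.5):
    """Ratio of two variant counts with the same additive adjustment on both sides.

    Assumption: a homozygous/heterozygous ratio stands in for the real HMES here.
    """
    return (hom_count + adjust) / (het_count + adjust)


# A contig with 4 homozygous and no heterozygous variants would divide by zero
# without the adjustment; with it the ratio stays finite.
print(adjusted_ratio(4, 0))  # 9.0
# A contig with no variants at all gets a neutral ratio.
print(adjusted_ratio(0, 0))  # 1.0
```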

--htlow

Lower limit of allele frequency to categorise a variant as heterozygous (default: 0.2)

--hthigh

Upper limit of allele frequency to categorise a variant as heterozygous (default: 0.9)
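
A minimal sketch of how the two thresholds partition allele frequencies. Whether CHERIPIC treats the limits as inclusive or exclusive is not stated on this page, so the boundary handling below is an assumption, and `classify_allele_frequency` is a hypothetical name.

```python
def classify_allele_frequency(freq, htlow=0.2, hthigh=0.9):
    """Label an allele frequency using the --htlow / --hthigh defaults.

    Assumption: htlow <= freq <= hthigh counts as heterozygous.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    if freq > hthigh:
        return "homozygous"
    if freq >= htlow:
        return "heterozygous"
    return "below_htlow"


for freq in (0.95, 0.5, 0.1):
    print(freq, classify_allele_frequency(freq))
```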

--mindepth

Minimum read depth at a position for it to be considered as a variant in downstream analysis (default: 6)

--max-d-multiple

Multiplication factor applied to the average coverage to calculate the maximum read coverage. If set to zero, no calculation will be made from the bam file. Setting this value overrides any user-set max depth (default: 5)

--maxdepth

Maximum read depth at a position to consider for variant calls. If set to zero no user max depth will be used (default: 0)
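
The interaction between --max-d-multiple and --maxdepth can be summarised as a small decision function. This is a sketch of the precedence described above, not CHERIPIC's code; `effective_max_depth` and its `avg_coverage` argument are hypothetical names, and how the average coverage is obtained is left out.

```python
def effective_max_depth(avg_coverage, max_d_multiple=5, maxdepth=0):
    """Return the maximum read depth to apply, or None when no limit is in force."""
    if max_d_multiple > 0:
        # A non-zero multiple takes precedence over any user-set max depth.
        return max_d_multiple * avg_coverage
    if maxdepth > 0:
        return maxdepth
    # Both options set to zero: no upper depth limit.
    return None


print(effective_max_depth(30))                                # 150
print(effective_max_depth(30, max_d_multiple=0, maxdepth=200))  # 200
print(effective_max_depth(30, max_d_multiple=0, maxdepth=0))    # None
```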

--min-non-ref-count

minimum read depth supporting non reference base at each position (default: 3)

--min-indel-count-support

minimum read depth supporting an indel at each position (default: 3)
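
Taken together, --mindepth, --min-non-ref-count and --min-indel-count-support act as per-position read support filters. The sketch below shows one plausible way to combine them, using the documented defaults. The function name and the order of the checks are assumptions for illustration.

```python
def passes_support_filters(depth, supporting_reads, is_indel=False,
                           mindepth=6, min_non_ref_count=3,
                           min_indel_count_support=3):
    """Check a position against the minimum depth and read-support thresholds."""
    if depth < mindepth:
        return False
    # Indels and base substitutions have separate support thresholds.
    required = min_indel_count_support if is_indel else min_non_ref_count
    return supporting_reads >= required


print(passes_support_filters(depth=10, supporting_reads=4))   # True
print(passes_support_filters(depth=5, supporting_reads=4))    # False
print(passes_support_filters(depth=10, supporting_reads=2))   # False
```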

--ambiguous-ref-bases

include variants at completely ambiguous bases in the reference (default: false)

-q, --mapping-quality

minimum mapping quality of read covering the position (default: 20)

-Q, --base-quality

minimum base quality of bases covering the position (default: 15)

--noise

proportion of reads for a variant to consider as noise (default: 0.1)

--cross-type

type of cross used to generate the mapping population - back or out (default: back)

--use-all-contigs

option to select all contigs or only contigs containing variants for analysis (default: false)

--include-low-hmes

option to include or discard variants from contigs with a low HME score or BFR score in the final output (default: false)

--polyploidy

Set if the data input is from polyploids (default: false)

-p, --mut-parent

Pileup or sorted BAM file alignments from mutant/trait of interest parent (default: )

-r, --bg-parent

Pileup or sorted BAM file alignments from background/wildtype parent (default: )

-R, --repeats-file

repeat masker output file for the assembly (default: )

--bfr-adjust

factor added to hemi snp frequency of each parent to adjust for bfr calculations (default: 0.05)

--sel-seq-len

sequence length to print from either side of selected variants (default: 50)

--examples

shows some example commands with explanation

Output File

The output from CHERIPIC is a tab-delimited file with the following 12 columns of information about the selected variants:

  1. HMES - homozygosity enrichment score of the selected variant
  2. AlleleFreq - allele frequency at variant position
  3. length - length of the contig
  4. seq_id - id of the contig
  5. position - variant position in the contig
  6. ref_base - reference base
  7. coverage - read coverage
  8. bases - read bases
  9. base_quals - base qualities
  10. sequence_left - sequence on the left side of variant
  11. Alt_seq - variant allele
  12. sequence_right - sequence on the right side of variant

The left and right sequences are provided to make marker design easier. By default, 50 bases on either side of the variant are reported; use --sel-seq-len to retrieve longer sequences.
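
For downstream scripting, the 12 columns can be loaded with a few lines of Python. This is a minimal sketch: the column names come from the list above, while the presence of a header row is an assumption (toggle `has_header` if the file has none), `read_cheripic_output` is a hypothetical helper, and the sample row is invented purely to make the example runnable.

```python
import csv
import io

COLUMNS = ["HMES", "AlleleFreq", "length", "seq_id", "position", "ref_base",
           "coverage", "bases", "base_quals", "sequence_left", "Alt_seq",
           "sequence_right"]


def read_cheripic_output(handle, has_header=True):
    """Read tab-delimited variant rows into dicts keyed by the 12 column names."""
    # QUOTE_NONE keeps quote characters in base quality strings intact.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)
    return [dict(zip(COLUMNS, row)) for row in reader if row]


# Invented sample content, for illustration only.
sample_row = ["9.0", "1.0", "1200", "contig_1", "345", "C", "12",
              "TTTTTTTTTTTT", "IIIIIIIIIIII", "ACGT", "T", "TGCA"]
sample = "\t".join(COLUMNS) + "\n" + "\t".join(sample_row) + "\n"

rows = read_cheripic_output(io.StringIO(sample))
top = max(rows, key=lambda r: float(r["HMES"]))
print(top["seq_id"], top["position"])  # contig_1 345
```

With a real result file, replace the `io.StringIO(sample)` handle with `open(path, newline="")`.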

Test data set

Download test data

  • The test data folder has the following files:

    1. input_assembly_file.fa
    2. mutant_bam_file.bam
    3. mutant_pileup_file.pileup
    4. mutant_vcf_file.vcf
    5. wildtype_bam_file.bam
    6. wildtype_pileup_file.pileup
    7. wildtype_vcf_file.vcf
  • input_assembly_file.fa, as the name suggests, is the assembly file to use

  • Mutant and wild type bulk files start with mutant_ and wildtype_, respectively, in their file names.

  • Pileup, vcf and bam files are provided for both the mutant and wild type bulks so that the different bulk input options can be tested

  • Example commands using the data in this folder:

pileup inputs

./cheripic -f test_data/input_assembly_file.fa -a test_data/mutant_pileup_file.pileup -b test_data/wildtype_pileup_file.pileup

vcf inputs

./cheripic -F vcf -f test_data/input_assembly_file.fa -a test_data/mutant_vcf_file.vcf -b test_data/wildtype_vcf_file.vcf 