toil_cnacs

toil pipeline for CNACS

Usage

The toil_cnacs CLI is divided into 3 steps: generate_pool, finalise_pool, and run.

  • generate_pool to create the reference files for a pool of normals
  • finalise_pool to confirm the thresholds for a pool of normals
  • run to run copy number analysis for tumor samples

Note that each sub-command requires its own jobstore. For the full list of options, see:

toil_cnacs --help

Currently, only targeted panels and hg19 FASTA files are supported. BAM files can be aligned to either GRCh37 or hg19.

Docker and Singularity are supported:

# run with docker
toil_cnacs [STEP] [TOIL-OPTIONS] [PIPELINE-OPTIONS] \
    --docker papaemmelab/docker-cnacs \
    --volumes <local path> <container path>

# run with singularity
toil_cnacs [STEP] [TOIL-OPTIONS] [PIPELINE-OPTIONS] \
    --singularity docker://papaemmelab/docker-cnacs \
    --volumes <local path> <container path>

Installation

To install:

git clone git@github.com:papaemmelab/toil_cnacs.git
cd toil_cnacs
pip install .

Generate Pool of Normals

This sub-command creates a pool of normals for a specific panel. Use 5-10 normal samples that include both genders. Example:

toil_cnacs generate_pool \
    {pool_dir}/jobstore_generate_pool \
    --stats \
    --writeLogs {pool_dir}/toil_logs \
    --logFile {pool_dir}/toil_logs.txt \
    --outdir {pool_dir} \
    --probe_bed {panel bed} \
    --fasta {hg19 reference fasta} \
    --pool_samp {normal1 bam} {normal1 gender} \
    --pool_samp {normal2 bam} {normal2 gender} \
    ...
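
With 5-10 normals, typing each `--pool_samp` pair by hand is error-prone. The following is a minimal sketch, not part of toil_cnacs, that builds the repeated `--pool_samp` arguments from a manifest file. The helper name `pool_samp_args`, the file name `normals.txt`, and the manifest layout (one normal per line, `<bam path> <gender>`, whitespace-separated) are assumptions made for illustration. The gender value is passed through unchanged, so it must already be in the form the CLI expects.

```shell
#!/bin/sh
# pool_samp_args (hypothetical helper): read a manifest with one normal per
# line ("<bam path> <gender>") and print the matching CLI arguments,
# one argument per line.
pool_samp_args() {
    # The "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
    while read -r bam gender || [ -n "$bam" ]; do
        # Skip blank lines and comment lines starting with '#'.
        case "$bam" in
            ''|\#*) continue ;;
        esac
        printf '%s\n' --pool_samp "$bam" "$gender"
    done < "$1"
}

# Example usage (commented out so the sketch has no side effects).
# xargs appends the generated arguments to the end of the command;
# this assumes BAM paths contain no spaces or quotes.
#
# pool_samp_args normals.txt | xargs toil_cnacs generate_pool \
#     "$pool_dir/jobstore_generate_pool" \
#     --outdir "$pool_dir" \
#     --probe_bed "$panel_bed" \
#     --fasta "$fasta"
```

Keeping the manifest in a file also leaves a record of which normals went into the pool.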

Once you have generated your pool, use the PDF images in outdir/stats to set the thresholds in outdir/stats/threshold.txt.

Finalise Pool of Normals

This sub-command finalises the thresholds for your pool of normals. Before running it, make sure you have reviewed the images in outdir/stats and set the thresholds in outdir/stats/threshold.txt.

toil_cnacs finalise_pool \
    {pool_dir}/jobstore_finalise_pool \
    --stats \
    --writeLogs {pool_dir}/toil_logs \
    --logFile {pool_dir}/toil_logs.txt \
    --outdir {pool_dir} \
    --fasta {hg19 reference fasta}

Run CN Analysis

After you have generated and finalised the pool of normals for your panel, you can run the main pipeline on any number of tumors. Set --pool_dir to the location of your pool output directory. Use the --samp flag to specify tumor BAMs and/or --samp_file to pass a file with a list of BAMs.

toil_cnacs run \
    {outdir}/jobstore \
    --stats \
    --writeLogs {outdir}/toil_logs \
    --logFile {outdir}/toil_logs.txt \
    --outdir {outdir} \
    --pool_dir {pool_dir} \
    --fasta {hg19 reference fasta} \
    --samp {tumor1 bam}
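
For a large batch of tumors, the list passed to `--samp_file` can be generated rather than written by hand. The sketch below is an illustration only: the helper name `write_samp_file` and the file name `tumors.txt` are hypothetical, and the one-path-per-line layout is an assumption about the format `--samp_file` expects, so check `toil_cnacs run --help` before relying on it.

```shell
#!/bin/sh
# write_samp_file (hypothetical helper): list every *.bam file under a
# directory, sorted, one path per line (assumed --samp_file layout).
#   $1 = directory to search, $2 = output list file
write_samp_file() {
    find "$1" -type f -name '*.bam' | sort > "$2"
}

# Example usage (commented out so the sketch has no side effects).
# Passing an absolute directory yields absolute BAM paths.
#
# write_samp_file /path/to/tumor_bams "$outdir/tumors.txt"
# toil_cnacs run \
#     "$outdir/jobstore" \
#     --outdir "$outdir" \
#     --pool_dir "$pool_dir" \
#     --fasta "$fasta" \
#     --samp_file "$outdir/tumors.txt"
```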

Contributing

Contributions are welcome and greatly appreciated. Check our contributing guidelines!

Credits

CNACS core developers: Yusuke Shiozawa and Ryunosuke Saiki

CNACS has been described in Yoshizato et al., Blood 2017.

This package was created using Cookiecutter and the papaemmelab/cookiecutter-toil project template.