
instructions for each snakefile

gavinha committed Aug 7, 2018
1 parent 316292a commit e908825be33f262e78eea3ef5884aa0a3bfb46fe
Showing with 22 additions and 12 deletions.
  1. +22 −12 README.md
@@ -8,7 +8,7 @@ Viswanathan SR*, Ha G*, Hoff A*, et al. Structural Alterations Driving Castratio
Gavin Ha
Fred Hutchinson Cancer Research Center
contact: <gavinha@gmail.com> or <gha@fredhutch.org>
Date: August 3, 2018
Date: August 7, 2018
## Requirements
### Software packages or libraries
@@ -65,34 +65,44 @@ snakemake -s TitanCNA.snakefile -np
# run the workflow locally using 5 cores
snakemake -s TitanCNA.snakefile --cores 5
```
Users can launch the jobs on a cluster.
An implementation that works with Broad UGER (qsub) is provided. Parameters for memory, runtime, and parallel environment can be specified directly in the snakemake files; default values for each rule have already been set in `params` within the [config.yaml](config/config.yaml) and the command below can be used as-is. Other cluster parameters can be set directly in [cluster.sh](config/cluster.sh).
Note: users will need to adjust these for use with their cluster-specific settings
Users can launch the snakemake jobs to a cluster.
An implementation that works with Broad UGER (qsub) is provided.
Parameters for memory, runtime, and parallel environment can be specified directly in the snakemake files; default values for each rule have already been set in `params` within the [config.yaml](config/config.yaml) and the command below can be used as-is.
Other cluster parameters can be set directly in [cluster.sh](config/cluster.sh).
*Note: users will need to adjust these for use with their cluster-specific settings*
```
snakemake -s TitanCNA.snakefile --cluster-sync "qsub -l h_vmem={params.mem},h_rt={params.runtime} {params.pe}" -j 50 --jobscript config/cluster.sh
```
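As a rough illustration of how the `{params.mem}`, `{params.runtime}`, and `{params.pe}` placeholders in the command above end up in the submitted `qsub` call, the sketch below fills the same template by hand. The values `4G`, `2:00:00`, and `-pe smp 2` are hypothetical, and `render_qsub` is a helper made up for this example; the real per-rule defaults come from `params` in [config.yaml](config/config.yaml), and the parallel environment name depends on your cluster.

```shell
# Hypothetical per-rule values; the real defaults live in config/config.yaml.
mem="4G"
runtime="2:00:00"
pe="-pe smp 2"   # parallel environment name is cluster-specific (assumption)

# Illustrative helper (not part of the workflow): fill the qsub template
# the same way the {params.*} placeholders are filled for each job.
render_qsub() {
  printf 'qsub -l h_vmem=%s,h_rt=%s %s\n' "$1" "$2" "$3"
}

render_qsub "$mem" "$runtime" "$pe"
# prints: qsub -l h_vmem=4G,h_rt=2:00:00 -pe smp 2
```

Printing the rendered string this way can help check that memory and runtime values are in the format your scheduler expects before submitting real jobs.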
## 2. Invoking individual steps in the workflow
Users can run the snakemake files individually. This can be helpful for testing each step or if you only wish to generate results for a particular step. The snakefiles need to be run in this same order since input files are generated by the previous steps.
# a. [moleculeCoverage.snakefile](moleculeCoverage.snakefile)
This part of the workflow
### a. [moleculeCoverage.snakefile](moleculeCoverage.snakefile)
i. Run [bxtools](https://github.com/walaj/bxtools) to compute counts of unique molecules in each window.
ii. Perform GC-content bias correction for barcode counts.
iii. Perform ichorCNA analysis to generate initial molecule coverage-based copy number. For male samples, chrX results will be used from this step.
```
snakemake -s moleculeCoverage.snakefile -np
snakemake -s moleculeCoverage.snakefile --cores 5
# OR
snakemake -s moleculeCoverage.snakefile --cluster-sync "qsub -l h_vmem={params.mem},h_rt={params.runtime} {params.pe}" -j 50 --jobscript config/cluster.sh
```
# b. [getPhasedAlleleCounts.snakefile](getPhasedAlleleCounts.snakefile)
### b. [getPhasedAlleleCounts.snakefile](getPhasedAlleleCounts.snakefile)
i. Read the Long Ranger output file `*phased_variants.vcf.gz` and extract heterozygous SNP sites (that overlap a SNP database, e.g. [hapmap_3.3.hg38.vcf.gz](https://storage.cloud.google.com/genomics-public-data/resources/broad/hg38/v0/hapmap_3.3.hg38.vcf.gz?_ga=2.110868357.-1633399588.1531762721)). You can find all the hg38 reference files here https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0
ii. Extract the allelic read counts from the Long Ranger tumor bam file `phased_possorted_bam.bam` for each chromosome.
iii. Concatenate the allelic read counts from each chromosome file into a single counts file.
```
snakemake -s getPhasedAlleleCounts.snakefile -np
snakemake -s getPhasedAlleleCounts.snakefile --cores 5
# OR
snakemake -s getPhasedAlleleCounts.snakefile --cluster-sync "qsub -l h_vmem={params.mem},h_rt={params.runtime} {params.pe}" -j 50 --jobscript config/cluster.sh
```
### c. [TitanCNA.snakefile](TitanCNA.snakefile)
i. Run the [TitanCNA](https://github.com/gavinha/TitanCNA) analysis and generate solutions for different ploidy initializations and each clonal cluster.
ii. Merge results with the ichorCNA output generated by [moleculeCoverage.snakefile](moleculeCoverage.snakefile) and post-process the copy number results.
iii. Select the optimal solution for each sample and copy it to a new folder. The parameters are compiled in a text file.
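
Because each snakefile reads files produced by the previous one, the three steps can be chained so that a failure stops the sequence before a later step runs on missing inputs. The following is a minimal sketch, assuming it is run from the repository root; `run_workflow`, `SNAKEMAKE`, and `CORES` are names made up for this example and are not part of the workflow itself.

```shell
# Hypothetical overrides for this sketch; defaults match the local-run examples.
SNAKEMAKE="${SNAKEMAKE:-snakemake}"
CORES="${CORES:-5}"

# Run the three snakefiles in dependency order, stopping at the first failure.
run_workflow() {
  for snakefile in moleculeCoverage.snakefile \
                   getPhasedAlleleCounts.snakefile \
                   TitanCNA.snakefile; do
    echo "Running ${snakefile}"
    $SNAKEMAKE -s "${snakefile}" --cores "${CORES}" || return 1
  done
}

# Usage: run_workflow
```

Calling `run_workflow` runs the steps locally with 5 cores by default; setting `CORES=10` beforehand changes the core count.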
