![Snakemake](https://img.shields.io/badge/snakemake- >=3.8.0-brightgreen.svg?style=flat-square)
This is the implementation of KhanLab NGS Pipeline using Snakemake.
The easiest way to get this pipeline is to clone the repository:

    git clone https://github.com/patidarr/ngs_pipeline.git

This pipeline is available on the NIH Biowulf cluster; contact me if you would like to do a test run. The data from this pipeline can be ported directly into OncoGenomics-DB, an application created to visualize NGS data and available to NIH users.

Requirements:

- mutt
- GNU parallel
- SLURM or PBS for resource management
- Bioinformatics tools listed in the config files

Conventions:

- Sample names cannot have "/" or "." in them
- Fastq files end in ".fastq.gz"
- Fastq files are stored in DATA_DIR (set as an environment variable)
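
The conventions above can be checked before a run with a few lines of Python. This is a minimal sketch, not part of the pipeline: the function names are hypothetical, and because the directory layout under DATA_DIR is not described here, the search simply looks for any file ending in ".fastq.gz" anywhere below the given directory.

```python
from pathlib import Path
from typing import List

FASTQ_SUFFIX = ".fastq.gz"  # convention: Fastq files end in ".fastq.gz"


def is_valid_sample_name(name: str) -> bool:
    """Return True if the name is non-empty and contains neither '/' nor '.'."""
    return bool(name) and "/" not in name and "." not in name


def find_fastqs(data_dir: Path) -> List[Path]:
    """Return every file below data_dir whose name ends in '.fastq.gz'.

    Assumption: the layout under DATA_DIR is unknown, so search recursively.
    """
    return sorted(p for p in data_dir.rglob("*" + FASTQ_SUFFIX) if p.is_file())
```

To use it against the real data, pass `Path(os.environ["DATA_DIR"])` to `find_fastqs` (after `import os`); a missing variable then raises `KeyError` rather than silently searching the wrong place.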

DNA-seq:

- QC
- BWA, Novoalign
- Broad Standard Practices on bwa bam
- Haplotype Caller, Platypus, Bam2MPG, MuTect, Strelka
- snpEff, Annovar, SIFT, pph2, Custom Annotation
- Coverage Plot, Circos Plot, Hotspot Coverage Box Plot
- Create input format for oncogenomics database (Patient Level)
- Make Actionable Classification for Germline and Somatic Mutations
- Copy number based on a simple tumor/normal (T/N) log ratio (normal coverage >= 30), corrected for the total number of reads
- Copy number, tumor purity using sequenza
- LRR adjusted to center
- Contamination using conpair
- HLA Typing
- Neoantigen Prediction
  - pVAC-Seq
    - methods: NNalign, NetMHC, NetMHCIIpan, NetMHCcons, NetMHCpan, PickPocket, SMM, SMMPMBEC, SMMalign
    - epitope length: 8, 9, 10, 11
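
The read-depth copy-number step listed above (T/N log ratio, normal coverage >= 30, corrected for total reads) can be illustrated with a short sketch. This is one possible reading, not the pipeline's actual code: the log base (2), the handling of zero tumor coverage, and median centering as the meaning of "LRR adjusted to center" are all assumptions, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import List, Optional, Sequence

MIN_NORMAL_COVERAGE = 30  # from the list above: normal coverage must be >= 30


def log_ratios(
    tumor_cov: Sequence[float],
    normal_cov: Sequence[float],
    tumor_total_reads: float,
    normal_total_reads: float,
) -> List[Optional[float]]:
    """Per-region log2(T/N), each coverage scaled by its library's total reads.

    Regions with normal coverage below the cutoff, or with no tumor coverage
    (assumption: log of zero is skipped), are reported as None.
    """
    ratios: List[Optional[float]] = []
    for t, n in zip(tumor_cov, normal_cov):
        if n < MIN_NORMAL_COVERAGE or t <= 0:
            ratios.append(None)
            continue
        scaled_t = t / tumor_total_reads
        scaled_n = n / normal_total_reads
        ratios.append(math.log2(scaled_t / scaled_n))
    return ratios


def center(ratios: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Shift log ratios so the median of usable regions is 0 (assumed centering)."""
    usable = [r for r in ratios if r is not None]
    if not usable:
        return list(ratios)
    m = median(usable)
    return [None if r is None else r - m for r in ratios]
```

Scaling by total reads keeps a tumor library sequenced twice as deeply as its normal from appearing as a genome-wide gain.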

RNA-seq:

- QC
- Tophat, STAR
- Broad Standard Practices on STAR bam
- fusion-catcher, tophat-fusion, deFuse
- Cufflinks (ENS and UCSC)
- Rsubread TPM (ENS, UCSC), Gene, Transcript and Exon Level
- In-house Exon Expression (ENS and UCSC)
- Haplotype Caller
- snpEff, Annovar, SIFT, pph2, Custom Annotation
- Actionable Fusion classification
- Genotyping on patient: 1000 Genomes (1000g) sites are evaluated for every library and then compared (all vs. all). If two libraries come from the same patient, the match should be high (>80%).
- Still to develop: if the match is below a certain threshold, stop the pipeline for that patient.
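
The all-vs-all genotype comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's implementation: genotypes are represented as a hypothetical mapping from site ID to call string, only sites called in both libraries are compared, and the 80% figure from the text is used as the cutoff.

```python
from itertools import combinations
from typing import Dict, List, Tuple

MATCH_THRESHOLD = 0.80  # from the text: same-patient libraries should match > 80%

# Hypothetical representation: site ID -> genotype call, e.g. "chr1:12345" -> "0/1"
Genotypes = Dict[str, str]


def match_fraction(a: Genotypes, b: Genotypes) -> float:
    """Fraction of sites called in both libraries where the genotypes agree."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0  # assumption: no shared sites counts as no match
    return sum(a[s] == b[s] for s in shared) / len(shared)


def all_vs_all(libraries: Dict[str, Genotypes]) -> Dict[Tuple[str, str], float]:
    """Match fraction for every unordered pair of libraries."""
    return {
        (x, y): match_fraction(libraries[x], libraries[y])
        for x, y in combinations(sorted(libraries), 2)
    }


def mismatched_pairs(
    libraries: Dict[str, Genotypes], threshold: float = MATCH_THRESHOLD
) -> List[Tuple[str, str]]:
    """Pairs whose match fraction does not exceed the threshold."""
    return [pair for pair, frac in all_vs_all(libraries).items() if frac <= threshold]
```

A non-empty result from `mismatched_pairs` for the libraries of one patient is the condition under which the planned check would stop the pipeline for that patient.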
Rulegraph