Reproducing results from neoepiscope paper

This repository holds scripts and instructions for reproducing the benchmarking of neoepiscope as described in our manuscript [1].


Requirements:

Software:

Anaconda2

BWA v0.7.9a-r786

bgzip v1.4 (from htslib)

GATK v3.5

GATK v4.0.0.0

HapCUT2 (we used commit 0f4ef9a0ea07d7eb32b306debee96c4ad00a13c2)

MuPeXI v1.2.0

neoepiscope v0.3.2

NeoPredPipe (we used commit 6a1ad97029c76ad16280a121d0ecc26768e3664e)

NetMHCpan v4.0

Picard Tools v1.110

pvactools v1.3.2

Samtools v1.3.1

tabix v1.4 (from htslib)

TSNAD v1.1

Variant Effect Predictor (VEP) v91.3

Reference files:

hg38 reference fasta (available from the Broad Institute resource bundle)

hg38 dbSNP VCF (available from the Broad Institute resource bundle)

VEP GRCh38 cache (available from Ensembl)

Downstream VEP plugin

Data files:

We used WES paired-end fastq files from Bassani-Sternberg et al. [2] for benchmarking. The files are available at the European Genome-phenome Archive (EGA) under accession number EGAS00001002050. We used matched tumor and normal samples from patients Mel5, Mel8, Mel12, Mel15, and Mel16.


Before you start:

  1. Run neoepiscope download, answering yes to downloading/indexing the GENCODE v29 annotation, and yes to downloading the hg38 bowtie index. When asked if you would like to integrate NetMHCpan 4.0, answer yes, and provide the path to the NetMHCpan 4.0 directory you installed above.

  2. Move the VEP cache data to a subdirectory called 'cache' in your VEP main directory

  3. Copy the 'chr_synonyms.txt' file in the VEP cache data to your VEP main directory

  4. Move the VEP Downstream plugin to a subdirectory called 'plugins' in your VEP main directory

  5. For each patient, make a config file for TSNAD. Using our template, replace 'INPUT_FILE_HERE' with the path to that patient's future TSNAD VEP output (e.g. /PATH/TO/THIS/REPOSITORY/Mel5_tumor_v_Mel5_normal.mutect.tumor.annotated.filtered.txt for Mel5). Replace OUTPUT_FOLDER_HERE with the patient-specific TSNAD output directory (e.g. /PATH/TO/THIS/REPOSITORY/Mel5_tsnad/ for Mel5). Replace NETMHC_HERE with the path to your netMHCpan 4.0 directory.

  6. For each patient, make an HLA file for NeoPredPipe. Using our template, replace the two 'PATIENT' slots in the first column of the second row with the patient ID (e.g. Mel5).


Running benchmarking scripts (Benchmarking section of MATERIALS AND METHODS)

We ran our benchmarking on an exclusive node of our institution's computer cluster, using the node's first four processors for multithreading (when possible). Each script benchmarks a different step in the neoepitope calling pipeline, and produces relevant output files from these steps along with files summarizing the CPU information and run time information (we used real time in our assessments). As an example, below are the commands to run the benchmarking for patient Mel5 from within the repository:
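Each pair of .cpu_data and .time_log files is produced by roughly the following pattern (a sketch reflecting our assumptions about the scripts; the bwa mem invocation is only illustrative, and the scripts themselves contain the exact commands):

# record the node's CPU specification alongside the benchmark (assumed source: /proc/cpuinfo)
cat /proc/cpuinfo > Mel5_tumor.bwa.cpu_data

# wrap the benchmarked command with the shell's time keyword; the "real" line is what we report
{ time BWA mem -t 4 REFERENCE_FASTA TUMOR_FASTQ1 TUMOR_FASTQ2 > Mel5_tumor.sam ; } 2> Mel5_tumor.bwa.time_log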

1) Benchmark BWA performance for tumor and normal samples:

SCRIPTS:

benchmark_bwa.sh

COMMANDS:

taskset -c 0,1,2,3 ./benchmark_bwa.sh HOME_DIRECTORY Mel5_tumor TUMOR_FASTQ1 TUMOR_FASTQ2 REFERENCE_FASTA BWA SAMTOOLS

taskset -c 0,1,2,3 ./benchmark_bwa.sh HOME_DIRECTORY Mel5_normal NORMAL_FASTQ1 NORMAL_FASTQ2 REFERENCE_FASTA BWA SAMTOOLS

INPUTS:

HOME_DIRECTORY is the path to this repository

TUMOR_FASTQ1/2 and NORMAL_FASTQ1/2 are the paired-end tumor and normal fastqs

REFERENCE_FASTA is the path to your reference fasta

BWA is the path to your BWA executable

SAMTOOLS is the path to your samtools executable

OUTPUTS:

CPU info is output in Mel5_tumor.bwa.cpu_data and Mel5_normal.bwa.cpu_data

Run time info is output in Mel5_tumor.bwa.time_log and Mel5_normal.bwa.time_log


2) Benchmark BAM processing for tumor and normal samples

SCRIPTS:

benchmark_markduplicates.sh

benchmark_baserecalibration.sh

COMMANDS:

taskset -c 0,1,2,3 ./benchmark_markduplicates.sh HOME_DIRECTORY Mel5_tumor GATK

taskset -c 0,1,2,3 ./benchmark_markduplicates.sh HOME_DIRECTORY Mel5_normal GATK

taskset -c 0,1,2,3 ./benchmark_baserecalibration.sh HOME_DIRECTORY Mel5_tumor REFERENCE_FASTA DBSNP GATK PICARD

taskset -c 0,1,2,3 ./benchmark_baserecalibration.sh HOME_DIRECTORY Mel5_normal REFERENCE_FASTA DBSNP GATK PICARD

INPUTS:

HOME_DIRECTORY is the path to this repository

REFERENCE_FASTA is the path to your reference fasta

DBSNP is the path to your DBSNP VCF

GATK is the path to your GATK jar file

PICARD is the path to your PICARD executable

OUTPUTS:

CPU info is output in Mel5_tumor.markduplicates.cpu_data, Mel5_normal.markduplicates.cpu_data, Mel5_tumor.baserecalibration.cpu_data, and Mel5_normal.baserecalibration.cpu_data

Run time info is output in Mel5_tumor.markduplicates.time_log, Mel5_normal.markduplicates.time_log, Mel5_tumor.baserecalibration.time_log, and Mel5_normal.baserecalibration.time_log


3) Benchmark somatic variant calling

SCRIPTS:

benchmark_mutect.sh

benchmark_filter_mutect.sh

COMMANDS:

taskset -c 0,1,2,3 ./benchmark_mutect.sh HOME_DIRECTORY Mel5_tumor Mel5_normal REFERENCE_FASTA DBSNP GATK

taskset -c 0,1,2,3 ./benchmark_filter_mutect.sh HOME_DIRECTORY Mel5_tumor Mel5_normal

INPUTS:

HOME_DIRECTORY is the path to this repository

REFERENCE_FASTA is the path to your reference fasta

DBSNP is the path to your DBSNP VCF

GATK is the path to your GATK jar file

OUTPUTS:

CPU info is output in Mel5_tumor_v_Mel5_normal.mutect.cpu_data and Mel5_tumor_v_Mel5_normal.filtermutect.cpu_data

Run time info is output in Mel5_tumor_v_Mel5_normal.mutect.time_log and Mel5_tumor_v_Mel5_normal.filtermutect.time_log


4) Benchmark germline variant calling

SCRIPTS:

benchmark_haplotypecaller.sh

COMMANDS:

taskset -c 0,1,2,3 ./benchmark_haplotypecaller.sh HOME_DIRECTORY Mel5_normal REFERENCE_FASTA DBSNP GATK

INPUTS:

HOME_DIRECTORY is the path to this repository

REFERENCE_FASTA is the path to your reference fasta

DBSNP is the path to your DBSNP VCF

GATK is the path to your GATK jar file

OUTPUTS:

CPU info is output in Mel5_normal.haplotypecaller.cpu_data

Run time info is output in Mel5_normal.haplotypecaller.time_log


5) Benchmark haplotype phasing for neoepiscope

SCRIPTS:

benchmark_hapcut2.sh

COMMANDS:

taskset -c 0,1,2,3 ./benchmark_hapcut2.sh HOME_DIRECTORY Mel5_tumor Mel5_normal HAPCUT2

INPUTS:

HOME_DIRECTORY is the path to this repository

HAPCUT2 is the path to your HapCUT2 build directory

OUTPUTS:

CPU info is output in Mel5_tumor_v_Mel5_normal.hapcut2.cpu_data

Run time info is output in Mel5_tumor_v_Mel5_normal.hapcut2.time_log


6) Benchmark HLA typing

SCRIPTS:

benchmark_optitype.sh

COMMANDS:

taskset -c 0,1,2,3 ./benchmark_optitype.sh HOME_DIRECTORY Mel5_tumor TUMOR_FASTQ1 TUMOR_FASTQ2 OPTITYPE CONFIG

INPUTS:

HOME_DIRECTORY is the path to this repository

TUMOR_FASTQ1/2 are the paired-end tumor fastqs

OPTITYPE is the path to your OptiType python script

CONFIG is the path to your OptiType config file

OUTPUTS:

CPU info is output in Mel5_tumor.optitype.cpu_data

Run time info is output in Mel5_tumor.optitype.time_log


7) Benchmark pVACseq

SCRIPTS:

benchmark_vep.sh

benchmark_pvacseq.sh

COMMANDS:

taskset -c 0,1,2,3 ./benchmark_vep.sh HOME_DIRECTORY Mel5_tumor Mel5_normal VEP_DIRECTORY GATK REFERENCE_FASTA REFERENCE_DICT PICARD BGZIP TABIX

taskset -c 0,1,2,3 ./benchmark_pvacseq.sh HOME_DIRECTORY Mel5_tumor Mel5_normal HLA-A*01:01,HLA-B*08:01,HLA-C*07:01

INPUTS:

HOME_DIRECTORY is the path to this repository

VEP_DIRECTORY is the path to your VEP directory

GATK is the path to your GATK jar file

REFERENCE_FASTA is the path to your reference fasta

REFERENCE_DICT is the path to your reference dictionary file

PICARD is the path to your Picard Tools executable

BGZIP is the path to your bgzip executable

TABIX is the path to your tabix executable

OUTPUTS:

CPU info is output in Mel5_tumor_v_Mel5_normal.vep.cpu_data and Mel5_tumor_v_Mel5_normal.pvacseq.cpu_data

Run time info is output in Mel5_tumor_v_Mel5_normal.vep.time_log and Mel5_tumor_v_Mel5_normal.pvacseq.time_log

pVACseq neoepitopes are output in Mel5_tumor_v_Mel5_normal.final.tsv


8) Benchmark TSNAD

SCRIPTS:

benchmark_tsnad_vep.sh

benchmark_tsnad.sh

COMMANDS:

taskset -c 0,1,2,3 ./benchmark_tsnad_vep.sh HOME_DIRECTORY Mel5_tumor Mel5_normal VEP_DIRECTORY

taskset -c 0,1,2,3 ./benchmark_tsnad.sh HOME_DIRECTORY Mel5 TSNAD CONFIG

INPUTS:

HOME_DIRECTORY is the path to this repository

VEP_DIRECTORY is the path to your VEP directory

TSNAD is the path to your TSNAD antigen_predicting_pipeline.py script

CONFIG is the path to your TSNAD config file for the patient (see above)

OUTPUTS:

CPU info is output in Mel5_tumor_v_Mel5_normal.vep_tsnad.cpu_data and Mel5_tumor_v_Mel5_normal.tsnad.cpu_data

Run time info is output in Mel5_tumor_v_Mel5_normal.vep_tsnad.time_log and Mel5_tumor_v_Mel5_normal.tsnad.time_log

TSNAD neoepitopes are output in Mel5_tsnad/predicted_neoantigen.txt


9) Benchmark MuPeXI

SCRIPTS:

benchmark_mupexi.sh

COMMANDS:

taskset -c 0,1,2,3 ./benchmark_mupexi.sh HOME_DIRECTORY Mel5_tumor Mel5_normal HLA-A01:01,HLA-B08:01,HLA-C07:01 MUPEXI

INPUTS:

HOME_DIRECTORY is the path to this repository

MUPEXI is the path to your MuPeXI python script

OUTPUTS:

CPU info is output in Mel5_tumor_v_Mel5_normal.mupexi.cpu_data

Run time info is output in Mel5_tumor_v_Mel5_normal.mupexi.time_log

MuPeXI neoepitopes are output in Mel5_tumor_v_Mel5_normal.mupexi


10) Benchmark NeoPredPipe

SCRIPTS:

benchmark_neopredpipe.sh

COMMANDS:

taskset -c 0,1,2,3 ./benchmark_neopredpipe.sh HOME_DIRECTORY Mel5_tumor Mel5_normal NEOPREDPIPE HLA_FILE

INPUTS:

HOME_DIRECTORY is the path to this repository

NEOPREDPIPE is the path to your NeoPredPipe main_netMHCpan_pipe.py script

HLA_FILE is the path to your NeoPredPipe HLA input file (see above)

OUTPUTS:

CPU info is output in Mel5_tumor_v_Mel5_normal.neopredpipe.cpu_data

Run time info is output in Mel5_tumor_v_Mel5_normal.neopredpipe.time_log

NeoPredPipe neoepitopes are output in Mel5_tumor_v_Mel5_normal.neoantigens.unfiltered.txt


11) Benchmark neoepiscope

SCRIPTS:

benchmark_neoepiscope.sh

COMMANDS:

taskset -c 0,1,2,3 ./benchmark_neoepiscope.sh HOME_DIRECTORY Mel5_tumor Mel5_normal HLA-A*01:01,HLA-B*08:01,HLA-C*07:01

INPUTS:

HOME_DIRECTORY is the path to this repository

OUTPUTS:

CPU info is output in Mel5_tumor_v_Mel5_normal.neoepiscope.cpu_data

Run time info is output in Mel5_tumor_v_Mel5_normal.neoepiscope.time_log

neoepiscope neoepitopes are output in Mel5_tumor_v_Mel5_normal.neoepiscope.out and Mel5_tumor_v_Mel5_normal.neoantigens.Indels.unfiltered.txt


Post-processing

The same commands above can be run for patients Mel8, Mel12, Mel15, and Mel16 by replacing 'Mel5' with 'Mel8', 'Mel12', 'Mel15', or 'Mel16' in the commands.

To compile which epitope sequences were enumerated by each caller and for each patient, you can run our python script to parse the output of each tool for each patient.

python epitope_comparison.py -d HOME_DIRECTORY

INPUTS:

HOME_DIRECTORY is the path to this repository

OUTPUTS:

Mel5.peptide_overlap.out, Mel8.peptide_overlap.out, Mel12.peptide_overlap.out, Mel15.peptide_overlap.out, Mel16.peptide_overlap.out and combined.peptide_overlap.out are tab-delimited text files summarizing the neoepitopes predicted by all tools and which tools predicted them (on a per-patient basis, or for all patients combined).


Identifying phased variants (Variant identification and phasing section of MATERIALS AND METHODS)

Data availability

We used paired tumor-normal WES data of melanoma patients from Amaria et al., Carreno et al., Gao et al., Hugo et al., Roh et al., Snyder et al., Van Allen et al., and Zaretsky et al. (3-10); NSCLC patients from Rizvi et al. (11); and colon, endometrial, and thyroid cancer patients from Le et al. (12).

Read alignment and BAM processing

We aligned WES reads to the GRCh37d5 genome and generated genome-coordinate-sorted alignments with duplicates marked using the Sanger cgpmap workflow (commit 0bacb0bee2e5c04b268c629d589ff1c551d34745). We realigned around indels and performed base recalibration using gatk-cocleaning-tool (commit d2bafc23221f6a8dceedd45a534163e0e1bf5c68). The relevant reference bundle is available here (see "Reference bundle"), and the necessary variant files can be found in the Broad Institute Resource Bundle.

The file fastq2bam.cwl.yaml can be used with sample_fastq2bam.sh and sample_fastq2bam.json to run the workflow. In the sample shell script and json file, change the paths to match the relevant paths for your computer.
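As a minimal sketch, assuming cwltool as the CWL runner (the provided sample shell script may wrap an equivalent call), the workflow can also be invoked directly:

cwltool fastq2bam.cwl.yaml sample_fastq2bam.json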

Somatic variant calling

We used the mc3 workflow to call somatic variants. The relevant reference bundle is available here (see "Reference bundle"), and the necessary variant files can be found in the Broad Institute Resource Bundle.

The file mc3_variant.cwl (commit 72a24b55544e3011ede1c46b13d531a7d05ef4e0) can be used with sample_bam2variants.sh and sample_bam2variants.json to run the workflow. In the sample shell script and json file, change the paths to match the relevant paths for your computer.
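As above, a minimal sketch assuming cwltool as the CWL runner:

cwltool mc3_variant.cwl sample_bam2variants.json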

Somatic variant processing/consensus calling

Following variant calling with mc3, we processed VCFs produced by MuSE, MuTect, Pindel, RADIA, SomaticSniper, and VarScan2 using vt:

First, we normalized each VCF:

vt normalize INPUT_VCF -n -r /PATH/TO/core_ref_GRCh37d5/genome.fa -o OUTPUT_VCF

Then we decomposed block substitutions:

vt decompose_blocksub -a -p INPUT_VCF -o OUTPUT_VCF

Then we decomposed multi-allelic variants:

vt decompose -s INPUT_VCF -o OUTPUT_VCF

Then we sorted variants:

vt sort INPUT_VCF -o OUTPUT_VCF

Then we removed duplicate variants:

vt uniq INPUT_VCF -o OUTPUT_VCF
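Taken together, the five vt steps can be chained per caller VCF; the sketch below uses intermediate file names of our choosing (the raw per-caller input names are also assumptions), with the final .uniq.vcf names matching those passed to VCF_parse.py below:

REF=/PATH/TO/core_ref_GRCh37d5/genome.fa
for CALLER in MuSE MuTect Pindel RADIA SomaticSniper VarScan2; do
    # normalize, decompose block substitutions and multi-allelic sites, sort, then deduplicate
    vt normalize ${CALLER}.vcf -n -r ${REF} -o ${CALLER}.norm.vcf
    vt decompose_blocksub -a -p ${CALLER}.norm.vcf -o ${CALLER}.bsub.vcf
    vt decompose -s ${CALLER}.bsub.vcf -o ${CALLER}.decomp.vcf
    vt sort ${CALLER}.decomp.vcf -o ${CALLER}.sorted.vcf
    vt uniq ${CALLER}.sorted.vcf -o ${CALLER}.uniq.vcf
done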

After processing VCFs with vt, we ran VCF_parse.py to produce a consensus call set of variants that were either called by at least 2 callers and not overlapped by a Pindel variant, or called by Pindel:

python2.7 VCF_parse.py -v MuSE.uniq.vcf,MuTect.uniq.vcf,Pindel.uniq.vcf,RADIA.uniq.vcf,SomaticSniper.uniq.vcf,VarScan2.uniq.vcf -c MuSE,MuTect,Pindel,RADIA,SomaticSniper,VarScan2 -o /PATH/TO/OUTPUT_DIRECTORY -n 2 -s SAMPLE_NAME

This outputs a file called SAMPLE_NAME.consensus.vcf in OUTPUT_DIRECTORY, which is used for downstream analyses.

Germline variant calling and processing

We used GATK v3.7 to call germline variants with HaplotypeCaller and to filter the results with VariantFiltration. The relevant reference bundle is available here (see "Reference bundle"), and the necessary variant files can be found in the Broad Institute Resource Bundle.

HaplotypeCaller was run using default options:

java -jar GenomeAnalysisTK.jar -R /PATH/TO/core_ref_GRCh37d5/genome.fa -T HaplotypeCaller -I /PATH/TO/NORMAL.realigned.cleaned.bam -o /PATH/TO/germline.vcf

VariantFiltration was run as below:

java -jar GenomeAnalysisTK.jar -R /PATH/TO/core_ref_GRCh37d5/genome.fa -T VariantFiltration --variant /PATH/TO/germline.vcf -o /PATH/TO/germline.filtered.vcf --clusterSize 3 --clusterWindowSize 15 --missingValuesInExpressionsShouldEvaluateAsFailing --filterName 'QDFilter' --filterExpression 'QD < 2.0' --filterName 'QUALFilter' --filterExpression 'QUAL < 100.0' --filterName DPFilter --filterExpression 'DP < 10.0'

Only variants passing all filters were retained for downstream analyses. We also ran vt to normalize, decompose, sort, and obtain a unique list of variants from the filtered germline VCFs (as described above for the VCFs from individual somatic variant callers).
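One minimal way to subset the passing records (a sketch; the tool used for this subsetting is not specified above):

awk -F'\t' '/^#/ || $7 == "PASS"' /PATH/TO/germline.filtered.vcf > /PATH/TO/germline.filtered.pass.vcf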

Genomic coverage

We used bedtools (v2.23.0-19-ge65c98b) genomecov to determine the Mbp of genome covered in each tumor sample:

bedtools genomecov -ibam /PATH/TO/TUMOR_BAM -bg > /PATH/TO/OUTPUT/BEDGRAPH

We processed the output BedGraph file to determine the number of base pairs covered by at least 3 reads (sufficient depth for somatic variant calling by SomaticSniper and VarScan2). These data were stored in a tab-separated text file with columns for the patient ID, tumor ID, and Mbp covered (referenced as coverage_summary.tsv in the post-processing R script).
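A minimal sketch of that BedGraph processing (our reconstruction; column 4 of the -bg output is the read depth, and intervals covered by at least 3 reads are summed and converted to Mbp):

awk '$4 >= 3 {covered += $3 - $2} END {printf "%.2f\n", covered / 1000000}' /PATH/TO/OUTPUT/BEDGRAPH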

Haplotype phasing

Before performing haplotype phasing, we swapped the sample columns in our consensus somatic variants VCF and merged our filtered germline and consensus somatic variants using neoepiscope:

neoepiscope swap -i SOMATIC_VCF -o SWAPPED_SOMATIC_VCF

neoepiscope merge -g GERMLINE_VCF -s SWAPPED_SOMATIC_VCF -o MERGED_VCF

Then, we predicted haplotypes using HapCUT2:

extractHAIRS --indels 1 --bam TUMOR_SAMPLE_NAME.realigned.cleaned.bam --VCF MERGED_VCF --out FRAGMENT_FILE

HAPCUT2 --fragments FRAGMENT_FILE --vcf MERGED_VCF --output HAPLOTYPES

Finally, we prepared our haplotype predictions for neoepiscope neoepitope prediction using neoepiscope:

neoepiscope prep -v MERGED_VCF -c HAPLOTYPES -o PREPPED_HAPLOTYPES

Neoepitope prediction

We predicted neoepitopes of 8-24 amino acids in length with neoepiscope, both accounting for phasing of germline and somatic variants and without accounting for phasing:

neoepiscope call -b hg19 -c PREPPED_HAPLOTYPES -o OUTPUT_FILE -k 8,24

neoepiscope call -b hg19 -c PREPPED_HAPLOTYPES -o OUTPUT_FILE -k 8,24 --isolate --germline exclude

To determine the effects of phasing on neoepitope prediction, we used the script epitope_count_comparison.py to identify variants unique to or shared between the two neoepitope calling modes. The neoepiscope output files should be formatted as PATIENT_ID.TUMOR_ID.neoepiscope.out and PATIENT_ID.TUMOR_ID.neoepiscope.somatic_unphased.out for compatibility with this script. The output file from this script (phasing_epitope_data.tsv) is processed by phasing_stats.R.
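One way to produce those file names directly is to set -o accordingly when running neoepiscope call as above (PATIENT_ID and TUMOR_ID are placeholders):

neoepiscope call -b hg19 -c PREPPED_HAPLOTYPES -o PATIENT_ID.TUMOR_ID.neoepiscope.out -k 8,24

neoepiscope call -b hg19 -c PREPPED_HAPLOTYPES -o PATIENT_ID.TUMOR_ID.neoepiscope.somatic_unphased.out -k 8,24 --isolate --germline exclude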

Phasing prevalence

We used a combination of two scripts to analyze phasing. First, we ran phasing_analysis.py to identify instances of variant phasing within 33, 72, or 94 bp across all patients and tumors:

python phasing_analysis.py -o OUTPUT_DIR -n NEOEPISCOPE_DATA_DIR -c HAPLOTYPE_DIR -d 33

python phasing_analysis.py -o OUTPUT_DIR -n NEOEPISCOPE_DATA_DIR -c HAPLOTYPE_DIR -d 72

python phasing_analysis.py -o OUTPUT_DIR -n NEOEPISCOPE_DATA_DIR -c HAPLOTYPE_DIR -d 94

The HAPLOTYPE_DIR is the directory containing the prepped haplotype files from HapCUT2/neoepiscope prep used to predict neoepitopes. For consistency with the script, the naming convention for these files should be PATIENT_ID.TUMOR_SAMPLE_ID.hapcut.out.prepped. The NEOEPISCOPE_DATA_DIR is the directory into which neoepiscope download saves its data; make sure that you have run the downloader and answered 'yes' to download the hg19 GTF and bowtie index files.
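The expected names can likewise be produced directly by pointing the -o of neoepiscope prep (described above) at HAPLOTYPE_DIR, e.g.:

neoepiscope prep -v MERGED_VCF -c HAPLOTYPES -o HAPLOTYPE_DIR/PATIENT_ID.TUMOR_SAMPLE_ID.hapcut.out.prepped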

To produce summary statistics and figures, we used the R script phasing_stats.R.

RNA-seq phasing evidence

For patients that had tumor RNA sequencing samples corresponding to tumor WES samples, we used STAR v2.6.1c to align reads to the hg19 genome.

First we indexed the genome:

STAR --runMode genomeGenerate --genomeDir STAR_INDEX_DIRECTORY --genomeFastaFiles REFERENCE_FASTA --sjdbGTFfile REFERENCE_GTF

Then we aligned reads:

STAR --runMode alignReads --outSAMattributes NH HI AS nM MD --outSAMstrandField intronMotif --outFileNamePrefix OUTPUT_DIRECTORY --genomeDir STAR_INDEX_DIRECTORY --readFilesCommand zcat --readFilesIn FASTQ1 FASTQ2

OUTPUT_DIRECTORY should be a directory named with the tumor RNA sample ID, as STAR derives its output file names from the --outFileNamePrefix rather than from user-specified names. We then sorted the resulting SAM files and converted them to BAMs using samtools, and indexed the BAMs:

samtools sort -O BAM -o OUTPUT_BAM OUTPUT_DIRECTORY/Aligned.out.sam

samtools index OUTPUT_BAM

OUTPUT_BAM should be formatted as TUMOR_RNA_SAMPLE_ID.sorted.bam

We then used the script paired_read_rna_support.py to identify variant pairs that were covered by the same RNA-seq read pair:

python paired_read_rna_support.py -p PATIENT_ID,TUMOR_WES_SAMPLE_ID,TUMOR_RNA_SAMPLE_ID -s STAR_OUTPUT_DIRECTORY -c HAPCUT_OUTPUT_DIRECTORY -o OUTPUT_DIRECTORY -d PICKLED_DICTIONARY

STAR_OUTPUT_DIRECTORY should be the directory containing your per-sample STAR output directories, HAPCUT_OUTPUT_DIRECTORY should be the directory containing your output from HapCUT2, OUTPUT_DIRECTORY is the directory to which output from this script is written, and PICKLED_DICTIONARY is created by running phasing_analysis.py with a distance of 72 bp; it should be in the output directory specified for that script, under the file name 'patient_variants_72.pickle'. This script generates pickled dictionaries for each patient storing variant pairs whose phasing is supported by, not supported by, or novel in RNA-seq. We used counts from these to generate a tab-separated text file for use in statistical analysis.


References:

  1. Wood MA, Nguyen A, Struck AJ, Ellrott K, Nellore A, Thompson RF. neoepiscope improves neoepitope prediction with multi-variant phasing. Bioinformatics. 2019;btz653.

  2. Bassani-Sternberg M, Bräunlein E, Klar R, Engleitner T, Sinitcyn P, Audehm S, et al. Direct identification of clinically relevant neoepitopes presented on native human melanoma tissue by mass spectrometry. Nat Commun. 2016;7: 13404.

  3. Amaria RN, Reddy SM, Tawbi HA, Davies MA, Ross MI, Glitza IC, et al. Neoadjuvant immune checkpoint blockade in high-risk resectable melanoma. Nat Med. 2018;24: 1649–1654.

  4. Carreno BM, Magrini V, Becker-Hapak M, Kaabinejadian S, Hundal J, Petti AA, et al. A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells. Science. 2015;348: 803–808.

  5. Gao J, Shi LZ, Zhao H, Chen J, Xiong L, He Q, et al. Loss of IFN-γ Pathway Genes in Tumor Cells as a Mechanism of Resistance to Anti-CTLA-4 Therapy. Cell. 2016;167: 397–404.e9.

  6. Hugo W, Zaretsky JM, Sun L, Song C, Moreno BH, Hu-Lieskovan S, et al. Genomic and Transcriptomic Features of Response to Anti-PD-1 Therapy in Metastatic Melanoma. Cell. 2017;168: 542.

  7. Roh W, Chen P-L, Reuben A, Spencer CN, Prieto PA, Miller JP, et al. Integrated molecular analysis of tumor biopsies on sequential CTLA-4 and PD-1 blockade reveals markers of response and resistance. Sci Transl Med. 2017;9.

  8. Snyder A, Makarov V, Merghoub T, Yuan J, Zaretsky JM, Desrichard A, et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N Engl J Med. 2014;371: 2189–2199.

  9. Van Allen EM, Miao D, Schilling B, Shukla SA, Blank C, Zimmer L, et al. Genomic correlates of response to CTLA-4 blockade in metastatic melanoma. Science. 2015;350: 207–211.

  10. Zaretsky JM, Garcia-Diaz A, Shin DS, Escuin-Ordinas H, Hugo W, Hu-Lieskovan S, et al. Mutations Associated with Acquired Resistance to PD-1 Blockade in Melanoma. N Engl J Med. 2016;375: 819–829.

  11. Rizvi NA, Hellmann MD, Snyder A, Kvistborg P, Makarov V, Havel JJ, et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science. 2015;348: 124–128.

  12. Le DT, Durham JN, Smith KN, Wang H, Bartlett BR, Aulakh LK, et al. Mismatch repair deficiency predicts response of solid tumors to PD-1 blockade. Science. 2017;357: 409–413.
