Commit

Merge pull request #126 from griffithlab/hotfix
Hotfix
susannasiebert committed Jul 28, 2016
2 parents b1628ab + d524c0f commit 2bfad20
Showing 34 changed files with 5,086 additions and 4,815 deletions.
9 changes: 7 additions & 2 deletions README.md
@@ -2,6 +2,12 @@
Cancer immunotherapy has gained significant momentum from recent clinical successes of checkpoint blockade inhibition. Massively parallel sequence analysis suggests a connection between mutational load and response to this class of therapy. Methods to identify which tumor-specific mutant peptides (neoantigens) can elicit anti-tumor T cell immunity are needed to improve predictions of checkpoint therapy response and to identify targets for vaccines and adoptive T cell therapies. Here, we provide a cancer immunotherapy pipeline for the identification of **p**ersonalized **V**ariant **A**ntigens by **C**ancer **Seq**uencing (pVAC-Seq) that integrates tumor mutation and expression data (DNA- and RNA-Seq).
http://www.genomemedicine.com/content/8/1/11

## New in version 3.0.3
<ul>
<li>Bugfix: The binding filter used to filter out all but the top peptide candidate for a variant even if the <code>--top-result-per-mutation</code> flag wasn't set. This is now fixed and the top-result-per-mutation filtering only happens when the flag is set.</li>
<li>Bugfix: For large input files the mutant protein sequence wasn't being correctly matched to a wildtype protein sequence. This issue has been corrected.</li>
</ul>

## New in version 3.0.2
<ul>
<li>Bugfix: Some allele names in the list of valid alleles were incorrect. The list has been updated.</li>
@@ -12,7 +18,7 @@ http://www.genomemedicine.com/content/8/1/11
<ul>
<li>pVAC-Seq now uses the IEDB RESTful interface for making epitope binding predictions. A local install of NetMHC3.4 is no longer required. By using IEDB the user now has a choice between several prediction algorithms, including NetMHC (3.4), NetMHCcons (1.1), NetMHCpan (2.8), PickPocket (1.1), SMM, and SMMPMBEC.</li>
<li>The user can now set the <code>--top-result-per-mutation</code> flag in order to only output the top scoring candidate per allele-length per mutation.</li>
<li>Since it is now possible to run mutliple epitope prediction algorithms at the same time, the scores for each candidate epitope are aggregated as <code>Median MT score All Methods</code>, which is the median mutant ic50 binding score of all chosen prediction methods, and the <code>Best MT score</code>, which is the lowest mutant ic50 binding score of all chosen preidction methods. For the Best MT score we also output the <code>Corresponding WT score</code> and the <code>Best MT Score Method</code>. Individual ic50 binding score for each prediction method are also outputted. The user can specify which metric to use for filtering by setting the <code>--top-score-metric</code> argument to either <code>lowest</code> or <code>median</code>.</li>
<li>Since it is now possible to run multiple epitope prediction algorithms at the same time, the scores for each candidate epitope are aggregated as <code>Median MT score All Methods</code>, which is the median mutant ic50 binding score of all chosen prediction methods, and the <code>Best MT score</code>, which is the lowest mutant ic50 binding score of all chosen prediction methods. For the Best MT score we also output the <code>Corresponding WT score</code> and the <code>Best MT Score Method</code>. Individual ic50 binding scores for each prediction method are also output. The user can specify which metric to use for filtering by setting the <code>--top-score-metric</code> argument to either <code>lowest</code> or <code>median</code>.</li>
</ul>
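
The aggregation described in this list can be sketched as follows. This is a minimal illustration, not pVAC-Seq's own code: the function names `aggregate_mt_scores` and `score_for_filtering` and the dictionary-based inputs are assumptions made for the example, and the output keys follow the column headers in `Test_filtered.tsv`. The sample values are the PickPocket and NetMHC scores from the PRICKLE4 row of that file.

```python
from statistics import median


def aggregate_mt_scores(mt_scores, wt_scores):
    """Summarise per-method ic50 scores for one candidate epitope.

    mt_scores and wt_scores map a prediction method name to its ic50
    score. Both arguments are hypothetical inputs for this sketch.
    """
    # The "best" method is the one with the lowest mutant ic50 score.
    best_method = min(mt_scores, key=mt_scores.get)
    return {
        'Median MT Score All Methods': median(mt_scores.values()),
        'Best MT Score': mt_scores[best_method],
        'Best MT Score Method': best_method,
        # WT score reported by the same method that gave the best MT score.
        'Corresponding WT Score': wt_scores.get(best_method),
    }


def score_for_filtering(summary, top_score_metric):
    """Pick the score used for filtering, mirroring --top-score-metric."""
    if top_score_metric == 'median':
        return summary['Median MT Score All Methods']
    if top_score_metric == 'lowest':
        return summary['Best MT Score']
    raise ValueError("top_score_metric must be 'lowest' or 'median'")


summary = aggregate_mt_scores(
    {'PickPocket': 2122.61, 'NetMHC': 152.0},
    {'PickPocket': 3272.12, 'NetMHC': 8255.0},
)
print(summary['Best MT Score Method'], score_for_filtering(summary, 'lowest'))
```

With these inputs the best method is NetMHC and the median is approximately 1137.305, which agrees with the PRICKLE4 row in `Test_filtered.tsv`.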

## New in version 2.0.2
@@ -159,7 +165,6 @@ Since the goal of the pVAC-Seq pipeline is to predict putative 'neo'antigens, we
<li>In the following figure, the amino acid FASTA sequence is built using 10 flanking amino acids on each side of the mutated amino acid. The preceding or succeeding 20 amino acids are taken if the mutation lies near the end or beginning of the transcript, respectively.</li>
<li>All predicted candidate peptides from epitope prediction software based on selected k-mer window size.</li>
<li>Only localized peptides (those containing the mutant amino acid) are considered to compare to wild-type counterpart.</li>
<li>The ‘best candidate’ (lowest MT binding score) per mutation is chosen across all specified k-mers that were installed as input.</li>
</ol>
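
The flanking-window rule in the first list item can be sketched as below. This is an illustrative reading of that rule, not the pipeline's implementation: `flanking_window` and its parameters are hypothetical names, and the handling of positions that are close to, but not exactly at, a transcript end is an assumption (the window is shifted so that it still spans 21 residues where the protein is long enough).

```python
def flanking_window(protein, position, flank=10):
    """Return the residues around a 1-based protein position.

    Normally the window holds `flank` residues on each side of the
    position. Near the start or end of the protein the window is
    shifted so that it still covers 2 * flank + 1 residues if possible.
    """
    index = position - 1              # convert to a 0-based index
    width = 2 * flank + 1
    start = max(0, index - flank)     # clamp at the start of the protein
    end = start + width
    if end > len(protein):            # clamp at the end, shifting left
        end = len(protein)
        start = max(0, end - width)
    return protein[start:end]


# Made-up 30-residue sequence, used only to exercise the function.
protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
print(flanking_window(protein, 15))   # position 15 sits in the middle
print(flanking_window(protein, 30))   # last residue: preceding 20 are taken
```

The missense entries in `Test_21.fa.split_1` fit this pattern: each is 21 residues long with the changed residue at position 11, except where the mutation sits at the end of the sequence, as in the LGALS2 entry.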
![alt text][logo]
[logo]:
4,690 changes: 2,345 additions & 2,345 deletions pvacseq/example_data/Test.combined.parsed.tsv

Large diffs are not rendered by default.

96 changes: 96 additions & 0 deletions pvacseq/example_data/Test_21.fa.split_1
@@ -0,0 +1,96 @@
>WT.IGFBP2_ENST00000233809_1.inframe_ins.20L/LLP
LPLPPPPLLPLLLLLLGASGG
>MT.IGFBP2_ENST00000233809_1.inframe_ins.20L/LLP
LPLPPPPLLPLLPLLLLLGASGG
>WT.RBM47_ENST00000381793_1.inframe_del.495-502AAAAAAAA/A
DPASAAAAAAAAAAAAAAVIPTVSTPPP
>MT.RBM47_ENST00000381793_1.inframe_del.495-502AAAAAAAA/A
DPASAAAAAAAVIPTVSTPPP
>WT.PRICKLE4_ENST00000458694_1.inframe_ins.287-288-/L
VNSATLSRTLLAAAGGSSLQ
>MT.PRICKLE4_ENST00000458694_1.inframe_ins.287-288-/L
VNSATLSRTLLLAAAGGSSLQ
>WT.TTBK1_ENST00000259750_1.inframe_del.750E/-
EDEEEEEEDEEEEEEEEEEEE
>MT.TTBK1_ENST00000259750_1.inframe_del.750E/-
EDEEEEEEDEEEEEEEEEEE
>WT.CECR2_ENST00000262608_1.missense.535R/H
GRSGGSHVWTRSRDPEGSSRK
>MT.CECR2_ENST00000262608_1.missense.535R/H
GRSGGSHVWTHSRDPEGSSRK
>WT.USP18_ENST00000215794_1.missense.124A/V
RQKAVRPLELAYCLQKCNVPL
>MT.USP18_ENST00000215794_1.missense.124A/V
RQKAVRPLELVYCLQKCNVPL
>WT.CLTCL1_ENST00000263200_1.missense.1469H/N
NNKSVNEALNHLLTEEEDYQG
>MT.CLTCL1_ENST00000263200_1.missense.1469H/N
NNKSVNEALNNLLTEEEDYQG
>WT.FAM230A_ENST00000434783_2.missense.322E/Q
KEDAVQGIANEDAAQGIAKED
>MT.FAM230A_ENST00000434783_2.missense.322E/Q
KEDAVQGIANQDAAQGIAKED
>WT.IGLV6-57_ENST00000390285_1.missense.43R/G
PGKTVTISCTRSSGSIASNYV
>MT.IGLV6-57_ENST00000390285_1.missense.43R/G
PGKTVTISCTGSSGSIASNYV
>WT.IGLV6-57_ENST00000390285_2.missense.63S/A
VQWYQQRPGSSPTTVIYEDNQ
>MT.IGLV6-57_ENST00000390285_2.missense.63S/A
VQWYQQRPGSAPTTVIYEDNQ
>WT.TPST2_ENST00000338754_1.missense.274P/H
VLHHEDLIGKPGGVSLSKIER
>MT.TPST2_ENST00000338754_1.missense.274P/H
VLHHEDLIGKHGGVSLSKIER
>WT.NEFH_ENST00000310624_2.missense.830P/T
VKSPVKEEEKPQEVKVKEPPK
>MT.NEFH_ENST00000310624_2.missense.830P/T
VKSPVKEEEKTQEVKVKEPPK
>WT.ELFN2_ENST00000402918_1.missense.186P/L
SLMVCELAGNPFNCECDLFGF
>MT.ELFN2_ENST00000402918_1.missense.186P/L
SLMVCELAGNLFNCECDLFGF
>WT.LGALS2_ENST00000215886_1.missense.132E/Q
SHLSYLSVRGGFNMSSFKLKE
>MT.LGALS2_ENST00000215886_1.missense.132E/Q
SHLSYLSVRGGFNMSSFKLKQ
>WT.GGA1_ENST00000343632_3.missense.484P/A
SLLHTVSPEPPRPPQQPVPTE
>MT.GGA1_ENST00000343632_3.missense.484P/A
SLLHTVSPEPARPPQQPVPTE
>WT.TRIOBP_ENST00000406386_2.FS.219
EDTGGGGRSAGQHWARLRGE
>MT.TRIOBP_ENST00000406386_2.FS.219
EDTGGGGRSAQHWARLRGESGLSLERHRSTLTQASSMTPHSGPRSTTSQASPAQRDTAQAASTREIPRASSPHRITQRDTSRASSTQQEISRASSTQQETSRASSTQEDTPRASSTQEDTPRASSTQWNTPRASSPSRSTQLDNPRTSSTQQDNPQTSFPTCTPQRENPRTPCVQQDDPRASSPNRTTQRENSRTSCAQRDNPKASRTSSPNRATRDNPRTSCAQRDNPRASSPSRATRDNPTTSCAQRDNPRASRTSSPNRATRDNPRTSCAQRDNPRASSPSRATRDNPTTSCAQRDNPRASRTSSPNRATRDNPRTSCAQRDNPRASSPNRAARDNPTTSCAQRDNPRASRTSSPNRATRDNPRTSCAQRDNPRASSPNRATRDNPTTSCAQRDNPRASRTSSPNRATRDNPRTSCAQRDNPRASSPNRTTQQDSPRTSCARRDDPRASSPNRTIQQENPRTSCALRDNPRASSPSRTIQQENPRTSCAQRDDPRASSPNRTTQQENPRTSCARRDNPRASSRNRTIQRDNPRTSCAQRDNPRASSPNRTIQQENLRTSCTRQDNPRTSSPNRATRDNPRTSCAQRDNLRASSPIRATQQDNPRTCIQQNIPRSSSTQQDNPKTSCTKRDNLRPTCTQRDRTQSFSFQRDNPGTSSSQCCTQKENLRPSSPHRSTQWNNPRNSSPHRTNKDIPWASFPLRPTQSDGPRTSSPSRSKQSEVPWASIALRPTQGDRPQTSSPSRPAQHDPPQSSFGPTQYNLPSRATSSSHNPGHQSTSRTSSPVYPAAYGAPLTSPEPSQPPCAVCIGHRDAPRASSPPRYLQHDPFPFFPEPRAPESEPPHHEPPYIPPAVCIGHRDAPRASSPPRHTQFDPFPFLPDTSDAEHQCQSPQHEPLQLPAPVCIGYRDAPRASSPPRQAPEPSLLFQDLPRASTESLVPSMDSLHECPHIPTPVCIGHRDAPSFSSPPRQAPEPSLFFQDPPGTSMESLAPSTDSLHGSPVLIPQVCIGHRDAPRASSPPRHPPSDLAFLAPSPSPGSSGGSRGSAPPGETRHNLEREEYTVLADLPPPRRLAQRQPGPQAQCSSGGRTHSPGRAEVERLFGQERRKSEAAGAFQAQDEGRSQQPSQGQSQLLRRQSSPAPSRQVTMLPAKQAELTRRSQAEPPHPWSPEKRPEGDRQLQGSPLPPRTSARTPERELRTQRPLESGQAGPRQPLGVWQSQEEPPGSQGPHRHLERSWSSQEGGLGPGGWWGCGEPSLGAAKAPEGAWGGTSREYKESWGQPEAWEEKPTHELPRELGKRSPLTSPPENWGGPAESSQSWHSGTPTAVGWGAEGACPYPRGSERRPELDWRDLLGLLRAPGEGVWARVPSLDWEGLLELLQARLPRKDPAGHRDDLARALGPELGPPGTNDVPEQESHSQPEGWAEATPVNGHSPALQSQSPVQLPSPACTSTQWPKIKVTRGPATATLAGLEQTGPLGSRSTAKGPSLPELQFQPEEPEESEPSRGQDPLTDQKQADSADKRPAEGKAGSPLKGRLVTSWRMPGDRPTLFNPFLLSLGVLRWRRPDLLNFKKGWMSILDEPGEPPSPSLTTTSTSQWKKHWFVLTDSSLKYYRDSTAEEADELDGEIDLRSCTDVTEYAVQRNYGFQIHTKDAVYTLSAMTSGIRRNWIEALRKTVRPTSAPDVTKLSDSNKENALHSYSTQKGPLKAGEQRAGSEVISRGGPRKADGQRQALDYVELSPLTQASPQRARTPARTPDRLAKQEELERDLAQRSEERRKWFEATDSRTPEVPAGEGPRRGLGAPLTEDQQNRLSEEIEKKWQELEKLPLRENKRVPLTALLNQSRGERRGPPSDGHEALEKEVQALRAQLEAWRLQGEAPQSALRSQEDGHIPPGYISQEACERSLAEMESSHQQVMEELQRHHERELQRLQQEKEWLLAEETAATASAIEAMKKAYQEELSRELSKTRSLQQGPDGLRKQHQSDVEALKRELQVLSEQY
SQKCLEIGALMRQAEEREHTLRRCQQEGQELLRHNQELHGRLSEEIDQLRGFIASQGMGNGCGRSNERSSCELEVLLRVKENELQYLKKEVQCLRDELQMMQKDKRFTSGKYQDVYVELSHIKTRSEREIEQLKEHLRLAMAALQEKESMRNSLAEYSTGQGSGEKAGCPWSGTGQH
>WT.CACNA1I_ENST00000402142_1.missense.107C/Y
GMYQPCDDMDCLSDRCKILQV
>MT.CACNA1I_ENST00000402142_1.missense.107C/Y
GMYQPCDDMDYLSDRCKILQV
>WT.ACO2_ENST00000216254_1.missense.33A/E
ASVLCQRAKVAMSHFEPNEYI
>MT.ACO2_ENST00000216254_1.missense.33A/E
ASVLCQRAKVEMSHFEPNEYI
>WT.ACO2_ENST00000216254_2.missense.510E/Q
AIAGTLKFNPETDYLTGTDGK
>MT.ACO2_ENST00000216254_2.missense.510E/Q
AIAGTLKFNPQTDYLTGTDGK
>WT.PKDREJ_ENST00000253255_3.missense.1875T/I
WLYYSYGLLHTYGSGGYALYF
>MT.PKDREJ_ENST00000253255_3.missense.1875T/I
WLYYSYGLLHIYGSGGYALYF
>WT.MOV10L1_ENST00000262794_2.missense.482A/T
TSAKTTVVVTAQKRNSRRQLP
>MT.MOV10L1_ENST00000262794_2.missense.482A/T
TSAKTTVVVTTQKRNSRRQLP
>WT.PANX2_ENST00000395842_1.missense.147S/F
VPALGWEFLASTRLTSELNFL
>MT.PANX2_ENST00000395842_1.missense.147S/F
VPALGWEFLAFTRLTSELNFL
>WT.TUBGCP6_ENST00000248846_2.missense.220H/R
TRVSLFGALVHSRTYDMDVRL
>MT.TUBGCP6_ENST00000248846_2.missense.220H/R
TRVSLFGALVRSRTYDMDVRL
>WT.PPP6R2_ENST00000395741_6.missense.414S/Y
AREERTEASGSESRVEPPHEN
>MT.PPP6R2_ENST00000395741_6.missense.414S/Y
AREERTEASGYESRVEPPHEN
48 changes: 48 additions & 0 deletions pvacseq/example_data/Test_21.fa.split_1.key
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
1 >WT.IGFBP2_ENST00000233809_1.inframe_ins.20L/LLP
2 >MT.IGFBP2_ENST00000233809_1.inframe_ins.20L/LLP
3 >WT.RBM47_ENST00000381793_1.inframe_del.495-502AAAAAAAA/A
4 >MT.RBM47_ENST00000381793_1.inframe_del.495-502AAAAAAAA/A
5 >WT.PRICKLE4_ENST00000458694_1.inframe_ins.287-288-/L
6 >MT.PRICKLE4_ENST00000458694_1.inframe_ins.287-288-/L
7 >WT.TTBK1_ENST00000259750_1.inframe_del.750E/-
8 >MT.TTBK1_ENST00000259750_1.inframe_del.750E/-
9 >WT.CECR2_ENST00000262608_1.missense.535R/H
10 >MT.CECR2_ENST00000262608_1.missense.535R/H
11 >WT.USP18_ENST00000215794_1.missense.124A/V
12 >MT.USP18_ENST00000215794_1.missense.124A/V
13 >WT.CLTCL1_ENST00000263200_1.missense.1469H/N
14 >MT.CLTCL1_ENST00000263200_1.missense.1469H/N
15 >WT.FAM230A_ENST00000434783_2.missense.322E/Q
16 >MT.FAM230A_ENST00000434783_2.missense.322E/Q
17 >WT.IGLV6-57_ENST00000390285_1.missense.43R/G
18 >MT.IGLV6-57_ENST00000390285_1.missense.43R/G
19 >WT.IGLV6-57_ENST00000390285_2.missense.63S/A
20 >MT.IGLV6-57_ENST00000390285_2.missense.63S/A
21 >WT.TPST2_ENST00000338754_1.missense.274P/H
22 >MT.TPST2_ENST00000338754_1.missense.274P/H
23 >WT.NEFH_ENST00000310624_2.missense.830P/T
24 >MT.NEFH_ENST00000310624_2.missense.830P/T
25 >WT.ELFN2_ENST00000402918_1.missense.186P/L
26 >MT.ELFN2_ENST00000402918_1.missense.186P/L
27 >WT.LGALS2_ENST00000215886_1.missense.132E/Q
28 >MT.LGALS2_ENST00000215886_1.missense.132E/Q
29 >WT.GGA1_ENST00000343632_3.missense.484P/A
30 >MT.GGA1_ENST00000343632_3.missense.484P/A
31 >WT.TRIOBP_ENST00000406386_2.FS.219
32 >MT.TRIOBP_ENST00000406386_2.FS.219
33 >WT.CACNA1I_ENST00000402142_1.missense.107C/Y
34 >MT.CACNA1I_ENST00000402142_1.missense.107C/Y
35 >WT.ACO2_ENST00000216254_1.missense.33A/E
36 >MT.ACO2_ENST00000216254_1.missense.33A/E
37 >WT.ACO2_ENST00000216254_2.missense.510E/Q
38 >MT.ACO2_ENST00000216254_2.missense.510E/Q
39 >WT.PKDREJ_ENST00000253255_3.missense.1875T/I
40 >MT.PKDREJ_ENST00000253255_3.missense.1875T/I
41 >WT.MOV10L1_ENST00000262794_2.missense.482A/T
42 >MT.MOV10L1_ENST00000262794_2.missense.482A/T
43 >WT.PANX2_ENST00000395842_1.missense.147S/F
44 >MT.PANX2_ENST00000395842_1.missense.147S/F
45 >WT.TUBGCP6_ENST00000248846_2.missense.220H/R
46 >MT.TUBGCP6_ENST00000248846_2.missense.220H/R
47 >WT.PPP6R2_ENST00000395741_6.missense.414S/Y
48 >MT.PPP6R2_ENST00000395741_6.missense.414S/Y
1 change: 1 addition & 0 deletions pvacseq/example_data/Test_filtered.tsv
@@ -1,3 +1,4 @@
Chromosome Start Stop Reference Variant Transcript Ensembl Gene ID Variant Type Mutation Protein Position Gene Name HLA Allele Peptide Length Sub-peptide Position MT Epitope Seq WT Epitope Seq Best MT Score Corresponding WT Score Fold Change Best MT Score Method Median MT Score All Methods PickPocket WT Score PickPocket MT Score NetMHC WT Score NetMHC MT Score
22 38119219 38119220 GA G ENST00000406386 ENSG00000100106 FS 219 TRIOBP HLA-G*01:09 9 2111 KYQDVYVEL NA 323.04 NA NA PickPocket 323.04 NA 323.04
6 41754573 41754573 C CCTT ENST00000458694 ENSG00000124593 inframe_ins -/L 287-288 PRICKLE4 HLA-E*01:01 9 4 ATLSRTLLL ATLSRTLLA 152.0 8255.0 54.309 NetMHC 1137.305 3272.12 2122.61 8255.0 152.0
22 38119219 38119220 GA G ENST00000406386 ENSG00000100106 FS 219 TRIOBP HLA-E*01:01 9 1542 RMPGDRPTL NA 259.0 NA NA NetMHC 311.43 NA 363.86 NA 259.0
39 changes: 10 additions & 29 deletions pvacseq/lib/binding_filter.py
@@ -31,12 +31,17 @@ def main(args_input = sys.argv[1:]):

    args = parser.parse_args(args_input)

    prediction = {}
    fieldnames = []

    reader = csv.DictReader(args.input_file, delimiter='\t')
    fieldnames = reader.fieldnames

    writer = csv.DictWriter(
        args.output_file,
        fieldnames,
        delimiter = '\t',
        lineterminator = '\n'
    )
    writer.writeheader()

    for entry in reader:
        name = entry['Gene Name']
        if args.top_score_metric == 'median':
@@ -48,34 +53,10 @@ def main(args_input = sys.argv[1:]):
        if score > args.binding_threshold or fold_change < args.minimum_fold_change:
            continue

        if (name not in prediction or
                score < prediction[name]['SCORE']):
            prediction[name] = {
                'GENES' : [entry],
                'SCORE' : score
            }
        elif score == prediction[name]['SCORE']:
            prediction[name]['GENES'].append(entry)
    args.input_file.close()

    writer = csv.DictWriter(
        args.output_file,
        fieldnames,
        delimiter = '\t',
        lineterminator = '\n'
    )

    writer.writeheader()

    writer.writerows(
        entry
        for gene in sorted(prediction)
        for entry in prediction[gene]['GENES']
    )
        writer.writerow(entry)

    args.input_file.close()
    args.output_file.close()



if __name__ == "__main__":
    main()
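
Because the page shows removed and added lines without their markers, the behaviour after this commit is easier to see as a standalone sketch: every row that passes the threshold checks is written straight to the output, and no per-gene best candidate is kept. The function below is a reconstruction under stated assumptions, not the file's actual contents. `filter_by_binding` and its parameters are hypothetical, the way `score` and `fold_change` are read from each row is not visible in the hunks and is assumed here, and treating an `NA` fold change as passing is a guess suggested by the `NA` rows that survive in `Test_filtered.tsv`.

```python
import csv
import io


def filter_by_binding(in_handle, out_handle, binding_threshold,
                      minimum_fold_change, top_score_metric='median'):
    """Write every row that passes the score and fold-change checks."""
    reader = csv.DictReader(in_handle, delimiter='\t')
    writer = csv.DictWriter(out_handle, reader.fieldnames,
                            delimiter='\t', lineterminator='\n')
    writer.writeheader()

    for entry in reader:
        # Assumption: the score column depends on the chosen metric.
        if top_score_metric == 'median':
            score = float(entry['Median MT Score All Methods'])
        else:
            score = float(entry['Best MT Score'])

        # Assumption: a missing fold change never fails the filter.
        raw_fold_change = entry['Fold Change']
        if raw_fold_change == 'NA':
            fold_change = float('inf')
        else:
            fold_change = float(raw_fold_change)

        if score > binding_threshold or fold_change < minimum_fold_change:
            continue

        # Each passing row is written as-is; rows for the same gene
        # are no longer collapsed to the single best candidate.
        writer.writerow(entry)


# Small made-up input: two passing rows for gene A, one failing on
# score (B), one with an NA fold change (C), one failing on fold change (D).
source = io.StringIO(
    "Gene Name\tBest MT Score\tMedian MT Score All Methods\tFold Change\n"
    "A\t100.0\t200.0\t5.0\n"
    "A\t150.0\t250.0\t4.0\n"
    "B\t900.0\t950.0\t3.0\n"
    "C\t50.0\t60.0\tNA\n"
    "D\t50.0\t60.0\t0.5\n"
)
result = io.StringIO()
filter_by_binding(source, result, 500, 1.0, top_score_metric='lowest')
print(result.getvalue())
```

In this sketch both rows for gene A reach the output, which is the behaviour the 3.0.3 release note describes when `--top-result-per-mutation` is not set.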
