Creating a protein sequence FASTA file

Pablo Cingolani edited this page Dec 6, 2017 · 1 revision

The SnpEff ann command has a command-line option, -fastaProt, that tells SnpEff to write the "original" and "resulting" protein sequences for each variant to a FASTA file.

This means that, for each variant, the output FASTA file contains an entry with the protein sequence that results from applying that variant to the reference sequence.

Here is an example:

$ cat z.vcf
1	889455	.	G	A	.	.	.

$ java -Xmx6g -jar snpEff.jar ann -fastaProt z.prot.fa hg19 z.vcf > z.ann.vcf

The resulting FASTA file z.prot.fa looks like this (lines edited for readability):

>NM_015658.3 Ref
MAAAGSR...LLFGKVAKDSSRMLQPSSSPLWGKLRVDIKAYLGS...

>NM_015658.3 Variant 1:889455-889455 Ref:G Alt:A HGVS.p:p.Gln236*
MAAAGSR...LLFGKVAKDSSRML*PSSSPLWGKLRVDIKAYLGS...
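
To inspect a file like this programmatically, each "Variant" entry can be paired with the "Ref" entry of the same transcript, and the first residue that differs can be reported. The following is a minimal Python sketch, not part of SnpEff. It assumes the header layout shown in the example above (transcript ID first, then either "Ref" or "Variant") and that a transcript's "Ref" entry appears before its "Variant" entries. The function names and the toy transcript "TX1" are hypothetical and used only for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            # Sequences may be wrapped over several lines; join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def first_difference(ref, alt):
    """Return the 1-based position of the first differing residue, or None."""
    for pos, (a, b) in enumerate(zip(ref, alt), start=1):
        if a != b:
            return pos
    if len(ref) != len(alt):
        # One sequence is a prefix of the other.
        return min(len(ref), len(alt)) + 1
    return None


def compare_to_reference(lines):
    """Pair each 'Variant' entry with the 'Ref' entry of its transcript.

    Assumes headers look like '<transcript> Ref' or '<transcript> Variant ...',
    as in the example output, with 'Ref' listed before its variants.
    """
    refs = {}
    results = []
    for header, seq in read_fasta(lines):
        fields = header.split()
        if len(fields) < 2:
            continue  # Header does not match the assumed layout.
        transcript, kind = fields[0], fields[1]
        if kind == "Ref":
            refs[transcript] = seq
        elif kind == "Variant" and transcript in refs:
            results.append((header, first_difference(refs[transcript], seq)))
    return results


# Hypothetical toy data in the same layout as the example output.
demo = """\
>TX1 Ref
MAQK
>TX1 Variant 1:100-100 Ref:C Alt:T HGVS.p:p.Gln3*
MA*K
"""

for header, pos in compare_to_reference(demo.splitlines()):
    print(header, "-> first change at residue", pos)
```

To run the same comparison on real output, pass an open file handle instead of the toy data, for example compare_to_reference(open("z.prot.fa")).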