split multi-allelic calls into multiple entries in an uncompressed VCF
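The splitting step can be sketched in Python as below. This is a minimal illustration, not the repository's script: the function names and the `per_alt_keys` parameter are hypothetical, the set of per-allele INFO keys (`AC`, `AF`) is an assumption, and genotype columns are copied through unchanged rather than rewritten. For production use, a dedicated tool such as `bcftools norm` is likely a better choice.

```python
def split_multiallelic(line, per_alt_keys=("AC", "AF")):
    """Split one tab-delimited VCF data line into one line per ALT allele.

    Assumption: INFO keys named in per_alt_keys hold one value per ALT
    allele. Sample/genotype columns are NOT rewritten in this sketch.
    """
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return [line.rstrip("\n")]  # already bi-allelic, keep as-is

    info_items = fields[7].split(";") if len(fields) > 7 else []
    out = []
    for i, alt in enumerate(alts):
        new = list(fields)
        new[4] = alt
        if info_items:
            new_info = []
            for item in info_items:
                key, sep, value = item.partition("=")
                values = value.split(",")
                # Keep only the i-th value of per-allele INFO fields.
                if sep and key in per_alt_keys and len(values) == len(alts):
                    item = f"{key}={values[i]}"
                new_info.append(item)
            new[7] = ";".join(new_info)
        out.append("\t".join(new))
    return out


def split_vcf(lines):
    """Pass header lines through and split every data line."""
    for line in lines:
        if line.startswith("#"):
            yield line.rstrip("\n")
        else:
            yield from split_multiallelic(line)
```

Because `split_vcf` accepts any iterable of lines, it can be fed an open handle on an uncompressed VCF, e.g. `for row in split_vcf(open("in.vcf")): print(row)`, where `in.vcf` is a placeholder file name.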
search NCBI via E-utilities (eUtils) for locus tags listed in an input file (https://www.biostars.org/p/278614/)
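A lookup of this kind might be sketched as follows. The helper names are hypothetical, and the choice of the `gene` database is an assumption (the original script may query a different database or add field tags to the term). The sketch only builds the ESearch request and parses the JSON reply, so the network call is left to the caller.

```python
import json
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, email, db="gene"):
    """Build an ESearch URL for one search term (e.g. a locus tag).

    Assumption: db="gene" is only an illustrative default.
    """
    params = {"db": db, "term": term, "retmode": "json", "email": email}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def parse_uids(esearch_json):
    """Extract the list of matching UIDs from an ESearch JSON reply."""
    return json.loads(esearch_json)["esearchresult"]["idlist"]
```

A caller would typically loop over the locus tags in the input file, fetch each URL (for instance with `urllib.request.urlopen`), and pass the decoded body to `parse_uids`.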
retrieve protein sequences in FASTA format for protein IDs (e.g. UniProt accessions) listed in an input file (https://www.biostars.org/p/278747/)
cat ProteinFASTASearch.list
Q66LE6
Q9UKV3
python ProteinFASTASearch.py -f 1 -e kevin@clinicalbioinformatics.co.uk ProteinFASTASearch.list
>NP_060931.2 serine/threonine-protein phosphatase 2A 55 kDa regulatory subunit B delta isoform isoform a [Homo sapiens]
MAGAGGGGCPAGGNDFQWCFSQVKGAIDEDVAEADIISTVEFNYSGDLLATGDKGGRVVIFQREQENKSR
PHSRGEYNVYSTFQSHEPEFDYLKSLEIEEKINKIRWLPQQNAAHFLLSTNDKTIKLWKISERDKRAEGY
NLKDEDGRLRDPFRITALRVPILKPMDLMVEASPRRIFANAHTYHINSISVNSDHETYLSADDLRINLWH
LEITDRSFNIVDIKPANMEELTEVITAAEFHPHQCNVFVYSSSKGTIRLCDMRSSALCDRHSKFFEEPED
PSSRSFFSEIISSISDVKFSHSGRYMMTRDYLSVKVWDLNMESRPVETHQVHEYLRSKLCSLYENDCIFD
KFECCWNGSDSAIMTGSYNNFFRMFDRDTRRDVTLEASRESSKPRASLKPRKVCTGGKRRKDEISVDSLD
FNKKILHTAWHPVDNVIAVAATNNLYIFQDKIN
>NP_001158286.1 apoptotic chromatin condensation inducer in the nucleus isoform 2 [Homo sapiens]
MWRRKHPRTSGGTRGVLSGNRGVEYGSGRGHLGTFEGRWRKLPKMPEAVGTDPSTSRKMAELEEVTLDGK
PLQALRVTDLKAALEQRGLAKSGQKSALVKRLKGALMLENLQKHSTPHAAFQPNSQIGEEMSQNSFIKQY
LEKQQELLRQRLEREAREAAELEEASAESEDEMIHPEGVASLLPPDFQSSLERPELELSRHSPRKSSSIS
EEKGDSDDEKPRKGERRSSRVRQARAAKLSEGSQPAEEEEDQETPSRNLRVRADRNLKTEEEEEEEEEEE
EDDEEEEGDDEGQKSREAPILKEFKEEGEEIPRVKPEEMMDERPKTRSQEQEVLERGGRFTRSQEEARKS
HLARQQQEKEMKTTSPLEEEEREIKSSQGLKEKSKSPSPPRLTEDRKKASLVALPEQTASEEETPPPLLT
KEASSPPPHPQLHSEEEIEPMEGPAPPVLIQLSPPNTDADTRELLVSQHTVQLVGGLSPLSSPSDTKAES
PAEKVPEESVLPLVQKSTLADYSAQKDLEPESDRSAQPLPLKIEELALAKGITEECLKQPSLEQKEGRRA
SHTLLPSHRLKQSADSSSSRSSSSSSSSSRSRSRSPDSSGSRSHSPLRSKQRDVAQARTHANPRGRPKMG
SRSTSESRSRSRSRSRSASSNSRKSLSPGVSRDSSTSYTETKDPSSGQEVATPPVPQLQVCEPKERTSTS
SSSVQARRLSQPESAEKHVTQRLQPERGSPKKCEAEEAEPPAATQPQTSETQTSHLPESERIHHTVEEKE
EVTMDTSENRPENDVPEPPMPIADQVSNDDRPEGSVEDEEKKESSLPKSFKRKISVVSTKGVPAGNSDTE
GGQPGRKRRWGASTATTQKKPSISITTESLKEAVVDLHADDSRISEDETERNGDDGTHDKGLKICRTVTQ
VVPAEGQENGQREEEEEEKEPEAEPPVPPQVSVEVALPPPAEHEVKKVTLGDTLTRRSISQQKSGVSITI
DDPVRTAQVPSPPRGKISNIVHISNLVRPFTLGQLKELLGRTGTLVEEAFWIDKIKSHCFVTYSTVEEAV
ATRTALHGVKWPQSNPKFLCADYAEQDELDYHRGLLVDRPSETKTEEQGIPRPLHPPPPPPVQPPQHPRA
EQREQERAVREQWAEREREMERRERTRSEREWDRDKVREGPRSRSRSRDRRRKERAKSKEKKSEKKEKAQ
EEPPAKLLDDLFRKTKAAPCIYWLPLTDSQIVQKEAERAERAKEREKRRKEQEEEEQKEREKEAERERNR
QLEREKRREHSRERDRERERERERDRGDRDRDRERDRERGRERDRRDTKRHSRSRSRSTPVRDRGGRR
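The general idea behind a run like the one above can be sketched with NCBI EFetch. This is not the code of `ProteinFASTASearch.py`: the function names are hypothetical, and passing the listed IDs directly to the `protein` database is an assumption, so the records returned may differ from the RefSeq entries shown above.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_ids(lines):
    """Collect one ID per non-empty line (works on an open file handle)."""
    return [line.strip() for line in lines if line.strip()]


def efetch_fasta_url(ids, email):
    """Build an EFetch URL requesting FASTA text for the given protein IDs."""
    params = {
        "db": "protein",
        "id": ",".join(ids),
        "rettype": "fasta",
        "retmode": "text",
        "email": email,
    }
    return f"{EFETCH}?{urlencode(params)}"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without breaks
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

With the list file above, `read_ids(open("ProteinFASTASearch.list"))` would yield `["Q66LE6", "Q9UKV3"]`, and the text downloaded from the resulting URL can be handed to `parse_fasta`.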
retrieve FASTA sequences matching any keyword (https://www.biostars.org/p/279584/)
python ProteinFASTASearchByFASTATitle.py -e kevin@clinicalbioinformatics.co.uk -t "Bacillus anthracis"
>WP_154574506.1 IS3 family transposase, partial [Bacillus anthracis]
KKDEYSIKEICILIGIPRSTYYRWKNKEKDVKEAKLEQAILTICMTNHFRYGHRKVTALLKRKYNYHPNR
KTVQKIMQKKNLQCRVKRKRRTWINGESRIVVENLLNRNFQANKPNEKWVTDITYLPFGTEMLYLLSIMD
LYNNEIIAYEISNRQDVTLVLRTVEKAIKLQQKTQIILHSDQGAVYTSYAFQTLSKKMALPQVCPVKEIV
MIMP
>WP_154556816.1 DUF4180 domain-containing protein, partial [Bacillus anthracis]
FAIVGDFSMYTSKSLKDFIYECNKGKDIFYLATEQQAIEKLSTLK
>WP_154556815.1 helix-turn-helix transcriptional regulator, partial [Bacillus anthracis]
MEFYDLGITIKELRIKKNISQSELCHGICSQSQISKIEKGVIYPSSILLYQLSERLGIDPNNIFALTKNK
KFKYIENVKCIMKDCIRQHQ
>WP_154556814.1 DUF4180 domain-containing protein, partial [Bacillus anthracis]
MEIKKVVIDGINIAVIRNNKVLISDVQSALDTMATVQYEVNAKHIIIHKSLISEDFFDLKTRLAGDIL
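A keyword search like the one above usually takes two requests: ESearch to turn the keyword into record UIDs, then EFetch to download those records as FASTA. The sketch below chains the two. It is an illustration under assumptions, not the code of `ProteinFASTASearchByFASTATitle.py`: the function names, the `retmax` default, and the plain (untagged) search term are all hypothetical choices. The HTTP function is injectable so the logic can be exercised without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def http_get(url):
    """Download a URL and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def search_fasta(keyword, email, retmax=5, get=http_get):
    """Return FASTA text for up to retmax protein records matching keyword.

    Assumption: the keyword is sent as a plain term to the protein database.
    """
    # Step 1: ESearch converts the keyword into a list of UIDs.
    search = urlencode({"db": "protein", "term": keyword, "retmax": retmax,
                        "retmode": "json", "email": email})
    reply = json.loads(get(f"{EUTILS}/esearch.fcgi?{search}"))
    uids = reply["esearchresult"]["idlist"]
    if not uids:
        return ""  # nothing matched, so skip the second request

    # Step 2: EFetch downloads the matching records as FASTA text.
    fetch = urlencode({"db": "protein", "id": ",".join(uids),
                       "rettype": "fasta", "retmode": "text", "email": email})
    return get(f"{EUTILS}/efetch.fcgi?{fetch}")
```

Calling `search_fasta("Bacillus anthracis", "you@example.org")` would perform both requests against NCBI; the address shown is a placeholder.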