Requirements
- Python
- Biopython
- HMMER3
- MEME-Suite
- Pfam-A database
Instructions for running the lanthipeptide genome mining pipeline:
Note: replace text between < > with the values appropriate for your setup
Preparing GenBank files
- Download GenBank and nucleotide FASTA files from NCBI
- Move prepare_genbank/prepare-genbank-files.py to the directory containing the GenBank and nucleotide FASTA files
- Edit the range in prepare-genbank-files.py to reflect the number of files you are preparing
- Execute prepare-genbank-files.py
- Move prepare_genbank/index_gb.py to the directory containing the GenBank files
- Execute index_gb.py
Extracting potential lanthipeptide biosynthetic gene clusters
- Edit get_lanC.py so that fh = open('<list-of-lanC-like-protein-accession-numbers.txt>') and fh2 = open('<directory_with_genbank_files>/ref_seq-acc.csv')
- Execute get_lanC.py
- Edit lanthipeptide_gb_mining.py so that the variables at the beginning of the file match your system
- Execute lanthipeptide_gb_mining.py
- Execute split_classes.py (you may need to alter the Pfam family IDs if your Pfam database uses different versions)
- Edit pull-classes.py so that fh1 = open() and fh2 = open('lanC_like-acc.csv')
- Execute pull-classes.py; note that results are written to stdout
- Edit pull-classes.py so that fh2 = open('lanC-like-peps.csv')
- Execute pull-classes.py; note that results are written to stdout
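Because several scripts in this pipeline print their results to stdout, the output presumably needs to be saved to a file before later steps can read it. The following is a minimal Python sketch of one way to do that. The helper name run_to_file and the output file name in the usage comment are hypothetical and not part of the pipeline; a shell redirect (>) does the same job.

```python
import subprocess
import sys


def run_to_file(script_args, out_path):
    """Run a Python script and write everything it prints to stdout into out_path.

    script_args is a list: the script path followed by any arguments.
    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    with open(out_path, "w") as out:
        subprocess.run([sys.executable, *script_args], stdout=out, check=True)


# Example usage (the output file name is a placeholder, choose your own):
# run_to_file(["pull-classes.py"], "<pulled-classes-output.csv>")
```

The same helper can be reused for make-fasta.py, which also writes to stdout.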
Identifying potential precursor peptides in class I clusters
- Edit make-fasta.py so that fh = open()
- Execute make-fasta.py; note that results are written to stdout
- Execute fimo classI_leader-meme.txt
- Copy fimo.tsv from the output directory created by fimo to the current directory
- Execute get_params_classI.py <classI-peps-params.csv>
- Edit select-features.py so that fh2 = open(<classI-peps-params.csv>)
- Execute select-features.py
- Execute classI_svm_classify.py
- Execute fimo-compile.py
- Open classI-peps-params.csv with Excel
- Delete the amino acid pair feature columns
- Open classI-peps-fimo-scores.csv with Excel
- Copy the FIMO scores and paste them into the params file
- Open classI-peps-classification.csv
- Copy the classifications and paste them into the params file
- Calculate a score for each peptide (+5 for SVM classification, +5 for a match with the leader MEME motif, +2 for a core pI less than 9, +2 for 2 or more Cys in the core, +1 if the leader ends in GG)
- If a peptide's score is greater than 10, it is predicted to be a precursor peptide
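The class I scoring rule above can also be computed in Python instead of a spreadsheet. This is only a sketch: the function and argument names are hypothetical, the core pI is assumed to be precomputed (for example, taken from the params file), and the SVM and motif results are assumed to have been reduced to True/False values.

```python
def score_class_i(svm_positive, leader_motif_match, core_pi, core_seq, leader_seq):
    """Return the class I precursor score from the five criteria listed above."""
    score = 0
    if svm_positive:                          # +5 for SVM classification
        score += 5
    if leader_motif_match:                    # +5 for match with the leader MEME motif
        score += 5
    if core_pi < 9:                           # +2 for core pI less than 9
        score += 2
    if core_seq.upper().count("C") >= 2:      # +2 for 2 or more Cys in the core
        score += 2
    if leader_seq.upper().endswith("GG"):     # +1 if the leader ends in GG
        score += 1
    return score


def is_predicted_precursor(score, threshold=10):
    """A peptide is predicted to be a precursor only if its score exceeds the threshold."""
    return score > threshold
```

The maximum possible score is 15. A peptide that scores exactly 10 is not predicted to be a precursor, since the cutoff is strictly greater than 10.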
Identifying potential precursor peptides in class II clusters
- Edit make-fasta.py so that fh = open()
- Execute make-fasta.py; note that results are written to stdout
- Execute fimo classII_leader-meme_20190503.txt
- Copy fimo.tsv from the output directory created by fimo to the current directory
- Execute hmmscan -o leader-hmm.out --tblout leader-hmm.tbl classII-precursors.hmm classII-peps.fa
- Execute get_params_classII.py <classII-peps-params.csv>
- Edit select-features.py so that fh2 = open(<classII-peps-params.csv>)
- Execute select-features.py
- Execute classII_svm_classify.py
- Execute fimo-compile.py
- Execute hmmer-compile.py
- Open classII-peps-params.csv with Excel
- Delete the amino acid pair feature columns
- Open classII-peps-fimo-scores.csv with Excel
- Copy the FIMO scores and paste them into the params file
- Open classII-peps-classification.csv
- Copy the classifications and paste them into the params file
- Open classII-peps-HMM.csv with Excel
- Copy the HMM scores and paste them into the params file
- Calculate a score for each peptide (+5 for SVM classification, +5 for a match with the leader MEME motif, +5 for a match with the precursor HMM, +2 for 2 or more Cys in the core, +4 if the core ends in KRC)
- If a peptide's score is greater than 10, it is predicted to be a precursor peptide
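The class II scoring rule above can be written as a small rule table in Python, which keeps the point values in one place. This is a sketch under assumptions: the dictionary keys and function names are hypothetical, and the SVM, MEME motif, and HMM results are assumed to have been reduced to True/False values beforehand.

```python
# (label, points, test) for each class II criterion listed above
CLASS_II_RULES = [
    ("svm", 5, lambda p: p["svm_positive"]),
    ("leader_motif", 5, lambda p: p["leader_motif_match"]),
    ("precursor_hmm", 5, lambda p: p["hmm_match"]),
    ("two_or_more_cys", 2, lambda p: p["core_seq"].upper().count("C") >= 2),
    ("core_ends_krc", 4, lambda p: p["core_seq"].upper().endswith("KRC")),
]


def score_class_ii(peptide):
    """Sum the points of every rule that the peptide satisfies."""
    return sum(points for _, points, test in CLASS_II_RULES if test(peptide))


def is_predicted_class_ii_precursor(peptide, threshold=10):
    """Predicted precursor only if the score is strictly greater than the threshold."""
    return score_class_ii(peptide) > threshold
```

The maximum possible score is 21, and a peptide needs more than 10 points to be called a precursor.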