reimagined-octo-funicular

Requirements

- Python
- Biopython
- HMMER3
- MEME-Suite
- Pfam-A database

Instructions for running the lanthipeptide genome mining pipeline:

Note: replace text between < > with the value appropriate for your setup

Preparing GenBank files

Download GenBank and nucleotide FASTA files from NCBI
Move the file prepare_genbank/prepare-genbank-files.py to the directory with the GenBank and nucleotide FASTA files
Edit the range in prepare-genbank-files.py to reflect the number of files you are preparing
Execute prepare-genbank-files.py
Move the file prepare_genbank/index_gb.py to the directory with the GenBank files
Execute index_gb.py

Extracting potential lanthipeptide biosynthetic gene clusters

Edit get_lanC.py so fh = open('<list-of-lanC-like-protein-accession-numbers.txt>') and fh2 = open('<directory_with_genbank_files>/ref_seq-acc.csv')
Execute get_lanC.py
Edit lanthipeptide_gb_mining.py so the variables at the beginning of the file match your system
Execute lanthipeptide_gb_mining.py
Execute split_classes.py (you may need to alter the Pfam family IDs if your Pfam database version differs)
Edit pull-classes.py so fh1 = open() and fh2 = open('lanC_like-acc.csv')
Execute pull-classes.py (results are written to stdout)
Edit pull-classes.py so fh2 = open('lanC-like-peps.csv')
Execute pull-classes.py (results are written to stdout)
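
Because pull-classes.py writes its results to stdout, the output is lost unless it is redirected. A minimal Python sketch of capturing it in a file, assuming the script is run from the current directory. The helper name run_to_file and the output file name in the usage comment are hypothetical and not part of the pipeline:

```python
import subprocess
import sys


def run_to_file(command, out_path):
    """Run a command and write everything it prints to stdout into out_path."""
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if the script exits with an error
        subprocess.run(command, stdout=out, check=True)


# Example usage (hypothetical output file name):
# run_to_file([sys.executable, "pull-classes.py"], "pull-classes-output.csv")
```

The same helper works for the other steps that print to stdout, such as make-fasta.py. On a command line, shell redirection with > achieves the same thing.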

Identifying potential precursor peptides in class I clusters

Edit make-fasta.py so fh = open()
Execute make-fasta.py (results are written to stdout)
Execute fimo classI_leader-meme.txt
Copy fimo.tsv from created directory to current directory
Execute get_params_classI.py <classI-peps-params.csv>
Edit select-features.py so fh2 = open(<classI-peps-params.csv>)
Execute select-features.py
Execute classI_svm_classify.py
Execute fimo-compile.py
Open classI-peps-params.csv with Excel
Delete the amino acid pair feature columns
Open classI-peps-fimo-scores.csv with Excel
Copy the FIMO scores and paste them into the params file
Open classI-peps-classification.csv
Copy the classifications and paste them into the params file
Calculate a score for each peptide (+5 for SVM classification, +5 for a match with the leader MEME motif, +2 for core pI less than 9, +2 for 2 or more Cys in core, +1 if leader ends in GG)
If the peptide score is greater than 10, the peptide is predicted to be a precursor peptide
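
The class I scoring rule can be sketched in Python as follows. This is an illustration rather than part of the pipeline: the function names and arguments are hypothetical, the core pI is assumed to be already computed, and the SVM and MEME results are passed in as booleans because the criteria for a positive call are decided upstream.

```python
def score_class_i(svm_positive, meme_match, core_pi, core_seq, leader_seq):
    """Score a candidate class I precursor peptide with the weights listed above."""
    score = 0
    if svm_positive:                        # +5 for SVM classification
        score += 5
    if meme_match:                          # +5 for a match with the leader MEME motif
        score += 5
    if core_pi < 9:                         # +2 for core pI less than 9
        score += 2
    if core_seq.upper().count("C") >= 2:    # +2 for 2 or more Cys in core
        score += 2
    if leader_seq.upper().endswith("GG"):   # +1 if leader ends in GG
        score += 1
    return score


def is_predicted_precursor(score, threshold=10):
    """A peptide is predicted to be a precursor if its score exceeds the threshold."""
    return score > threshold


# Made-up sequences, for illustration only.
example = score_class_i(True, True, 8.5, core_seq="ACDCK", leader_seq="MKLGG")
print(example, is_predicted_precursor(example))  # 15 True
```

The maximum possible score is 15, so a peptide with only the SVM and MEME hits (10 points) still needs at least one of the smaller features to pass the threshold.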

Identifying potential precursor peptides in class II clusters

Edit make-fasta.py so fh = open()
Execute make-fasta.py (results are written to stdout)
Execute fimo classII_leader-meme_20190503.txt
Copy fimo.tsv from created directory to current directory
Execute hmmscan -o leader-hmm.out --tblout leader-hmm.tbl classII-precursors.hmm classII-peps.fa
Execute get_params_classII.py <classII-peps-params.csv>
Edit select-features.py so fh2 = open(<classII-peps-params.csv>)
Execute select-features.py
Execute classII_svm_classify.py
Execute fimo-compile.py
Execute hmmer-compile.py
Open classII-peps-params.csv with Excel
Delete the amino acid pair feature columns
Open classI-peps-fimo-scores.csv with Excel
Copy the FIMO scores and paste them into the params file
Open classI-peps-classification.csv
Copy the classifications and paste them into the params file
Open classII-peps-HMM.csv with Excel
Copy the HMM scores into the params file
Calculate a score for each peptide (+5 for SVM classification, +5 for a match with the leader MEME motif, +5 for a match with the precursor HMM, +2 for 2 or more Cys in core, +4 if core ends in KRC)
If the peptide score is greater than 10, the peptide is predicted to be a precursor peptide
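
The class II scoring rule can be sketched the same way. The function name and arguments are hypothetical, the SVM, MEME and HMM results are passed in as booleans, and "core ends in KRC" is read literally as the last three residues of the core being K, R, C, which is an assumption about the intended meaning.

```python
def score_class_ii(svm_positive, meme_match, hmm_match, core_seq):
    """Score a candidate class II precursor peptide with the weights listed above."""
    core = core_seq.upper()
    score = 0
    if svm_positive:            # +5 for SVM classification
        score += 5
    if meme_match:              # +5 for a match with the leader MEME motif
        score += 5
    if hmm_match:               # +5 for a match with the precursor HMM
        score += 5
    if core.count("C") >= 2:    # +2 for 2 or more Cys in core
        score += 2
    if core.endswith("KRC"):    # +4 if core ends in KRC (literal reading, an assumption)
        score += 4
    return score


# Made-up core sequence, for illustration only.
example_ii = score_class_ii(True, False, True, core_seq="ACDCKRC")
print(example_ii, example_ii > 10)  # 16 True
```

The maximum possible score is 21. Unlike class I, two of the three 5-point features alone are not enough to exceed the threshold of 10.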
