GitHub - gamcil/synthaser_scripts: Python scripts used in synthaser manuscript

Scripts used in synthaser manuscript

| File | Description |
| --- | --- |
| `sum_bitscores.py` | Script to sum bitscores of identical query/target sequence pairs in BLAST results. |
| `extract_pks.py` | Script to extract PKS/NRPS sequences from MIBiG GenBank/JSON files. |
| `mibig/*` | synthaser output for MIBiG synthases. |
| `ks_domains/*` | Files generated during KS domain network construction. |

Methods

Comparison to MIBiG domain architectures

Download MIBiG database GenBank and JSON dumps and extract contents:

wget https://dl.secondarymetabolites.org/mibig/mibig_json_2.0.tar.gz
wget https://dl.secondarymetabolites.org/mibig/mibig_gbk_2.0.tar.gz
tar xzvf mibig_json_2.0.tar.gz
tar xzvf mibig_gbk_2.0.tar.gz

Retrieve all annotated PKS sequences using extract_pks.py:

# Positional arguments: GenBank folder, JSON folder, output table with MIBiG metadata.
# --fasta names the output FASTA file with PKS sequences.
python3 extract_pks.py \
	mibig_gbk_2.0/ \
	mibig_json_2.0/ \
	mibig_table.tsv \
	--fasta mibig_synthases.fasta

Set up synthaser:

pip install --user synthaser

Run synthaser on PKS sequences, saving HTML plot and search session:

synthaser search \
	--query_file mibig_synthases.fasta \
	--json_file mibig_synthases.json \
	--output mibig_predictions.txt \
	--plot mibig.html \
	--long_form

The MIBiG metadata table (mibig_table.tsv) was then merged with the synthaser predictions table (mibig_predictions.txt). MIBiG domain architectures were copied from the 'NRPS/PKS domains' tab of each MIBiG entry, added to the table and compared to the predictions in the synthaser output.
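
The merge step can be sketched in Python as a left join on a shared sequence identifier. This is only an illustration, not the procedure used for the manuscript: it assumes both tables are tab-delimited with a header row, and the column names `Sequence`, `BGC` and `Architecture` are hypothetical placeholders for whatever the real files contain.

```python
import csv
import io


def read_tsv(handle):
    """Parse a tab-delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def merge_tables(metadata_rows, prediction_rows, key="Sequence"):
    """Left-join prediction rows onto metadata rows using a shared ID column."""
    predictions = {row[key]: row for row in prediction_rows}
    merged = []
    for row in metadata_rows:
        match = predictions.get(row[key], {})
        # Metadata values win on column name clashes; rows without a
        # prediction keep only their metadata columns.
        merged.append({**match, **row})
    return merged


# Toy data standing in for mibig_table.tsv and mibig_predictions.txt
# (hypothetical column names and values).
metadata = read_tsv(io.StringIO("Sequence\tBGC\nseq1\tBGC0000001\nseq2\tBGC0000002\n"))
predictions = read_tsv(io.StringIO("Sequence\tArchitecture\nseq1\tKS-AT-ACP\n"))
merged = merge_tables(metadata, predictions)
```

A left join keeps every MIBiG entry in the merged table, so sequences without a synthaser prediction remain visible for manual comparison.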

Creation of the Aspergillus KS network

Retrieve sequences from NCBI containing the cond_enzymes conserved domain family, removing any unnecessary information from FASTA description lines:

esearch -db cdd -query 238201 |\
	elink -target protein |\
	efilter -query "Aspergillus"[ORGN] -source genbank |\
	efetch -format fasta |\
	sed 's/ .*$//g' - > synthases.faa
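
For readers unfamiliar with the `sed` expression, the following Python sketch mimics its effect: everything from the first space to the end of each line is dropped, which leaves only the identifier on FASTA description lines and leaves sequence lines (which contain no spaces) unchanged. The accession and description in the example are made up.

```python
import re


def strip_descriptions(lines):
    """Remove everything from the first space onward on each line."""
    return [re.sub(r" .*$", "", line.rstrip("\n")) for line in lines]


# Hypothetical FASTA record: a header with a description, then a sequence line.
records = [">XP_000001.1 polyketide synthase [Aspergillus sp.]", "MKTLLVA"]
cleaned = strip_descriptions(records)
```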

Analyse sequences using synthaser:

synthaser search \
	--query_file synthases.faa \
	--json_file synthases.json \
	--output architectures.txt \
	--long_form

Extract KS domains from the search session:

# Positional arguments: session file, then output file prefix (e.g. synthases_KS.faa).
# --mode domain specifies domain extraction; --types KS specifies KS domains.
synthaser extract \
	synthases.json \
	synthases_ \
	--mode domain \
	--types KS

Build DIAMOND database from extracted KS domains:

diamond makedb --in synthases_KS.faa --db KS

Perform all vs all alignments:

diamond blastp --query synthases_KS.faa \
	--db KS.dmnd \
	--more-sensitive \
	--outfmt "6 qseqid sseqid bitscore" \
	--out KS_alignments.tsv

Sum bitscores of all non-overlapping high-scoring segment pairs (HSPs):

python3 sum_bitscores.py KS_alignments.tsv summed.tsv
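
The idea behind this step can be sketched as follows. This is not the code of `sum_bitscores.py` itself, only a minimal illustration that assumes the three-column table produced by the DIAMOND command above (`qseqid`, `sseqid`, `bitscore`) and adds up every row that shares the same query/target pair. Because that output format carries no alignment coordinates, the sketch does not check HSPs for overlap.

```python
import csv
import io
from collections import defaultdict


def sum_bitscores(rows):
    """Sum bitscores over all rows sharing the same (query, target) pair."""
    totals = defaultdict(float)
    for query, target, bitscore in rows:
        totals[(query, target)] += float(bitscore)
    return dict(totals)


# Toy alignment table standing in for KS_alignments.tsv (made-up IDs and scores).
table = io.StringIO("a\tb\t100.5\na\tb\t50.0\nb\ta\t120\n")
summed = sum_bitscores(csv.reader(table, delimiter="\t"))
```

Note that the pairs are directional in this sketch: `(a, b)` and `(b, a)` are accumulated separately.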

The summed alignment table (summed.tsv) was then imported into Cytoscape v3.7.2 to build a similarity network. Domain architecture predictions from synthaser (architectures.txt) were imported and connected to their corresponding nodes, which were then coloured based on an alphabetical ordering of architectures.
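
The colouring itself was done inside Cytoscape, but the ordering it relies on is easy to reproduce. The sketch below, with made-up architecture strings, assigns each distinct architecture an index according to its alphabetical position; such an index could then be mapped to a colour palette.

```python
def rank_architectures(architectures):
    """Map each distinct architecture to its position in alphabetical order."""
    ordered = sorted(set(architectures))
    return {arch: index for index, arch in enumerate(ordered)}


# Hypothetical architectures, one per node; duplicates share a rank.
ranks = rank_architectures(["KS-AT-ACP", "A-T-C", "KS-AT-ACP", "KS-AT-DH-ACP"])
```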
