# Discovering Antibacterial Peptides in Bacterial and Fungal Genomes Using antiSMASH
Input files are the "contigs.fasta" or "scaffolds.fasta" files generated by genome assembly software.  
For this practical workshop we will use antiSMASH (v.7).  
A complete guide to installing and using antiSMASH is available at https://docs.antismash.secondarymetabolites.org/install/  
antiSMASH can also be run online through its web interface: https://antismash.secondarymetabolites.org/#!/start  
  

## Installing antiSMASH
If Miniconda is not installed in your environment, follow this link to install it: https://www.anaconda.com/docs/getting-started/miniconda/install#macos-linux-installation:how-do-i-verify-my-installers-integrity  
Once Miniconda is installed, I can recreate a virtual environment with all the dependencies antiSMASH requires from an "environment.yml" file that I previously created with `conda env export` on another machine where everything was already set up.  


In [None]:
%%bash

# Install Miniconda *if not already installed*
wget -P ~ https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh # download the latest installer into the home directory
bash ~/Miniconda3-latest-Linux-x86_64.sh    # run the installer

# Make "conda activate" usable inside this non-interactive bash cell
source "$(conda info --base)/etc/profile.d/conda.sh"

# Command used on the source machine to create the .yml file:
#conda env export --name antismash_env --no-builds > environment.yml
conda env create -f environment.yml

# If the .yml file is not available, install with the following lines instead
# (antismash is distributed through the bioconda channel)
conda create -n antismash -c conda-forge -c bioconda antismash
conda activate antismash
download-antismash-databases
conda deactivate

# Activate the environment and download the databases
conda activate antismash6 # antismash6 is the environment created from environment.yml (its "name:" field); adjust if yours differs
download-antismash-databases # download the antiSMASH databases


antiSMASH 8.0.1


## Getting My Workbench Ready and Launching antiSMASH
First we create a folder in which the output files of each sample will be gathered.  
We then launch antiSMASH on each sample, passing the Prokka annotation for gene finding and using several CPU cores (`--cpus 8`) to shorten the execution time.  

In [None]:
%%bash

output_00="/home/ramzi/Desktop/JOB" # Folder where the other output folders are located

date
echo "Your current location is \"$(pwd)\""
echo "Your output folder is $output_00"
mkdir -p "$output_00/ANTISMASH__test"

# Show the antiSMASH version
antismash --version

# Read sample names (one per line) into an array
mapfile -t sample_names < "$output_00/samples.txt"
total_samples=${#sample_names[@]}

#for ((i=0; i<total_samples; i++)); do    # <--------> Uncomment this line and comment out the line below to run all the samples
for ((i=0; i<1; i++)); do                 # <--------> Test with one sample
    sample="${sample_names[$i]}"
    OUTPUT_ANTI="$output_00/ANTISMASH__test/$sample"
    mkdir -p "$OUTPUT_ANTI"
    # Quote every path so that sample names with unusual characters do not break the command
    antismash "$output_00/assembly_spades/$sample/contigs.fasta" \
        --output-dir "$OUTPUT_ANTI" \
        --genefinding-gff3 "$output_00/Prokka_annotation/$sample/$sample.gff" \
        --output-basename "$sample" \
        --taxon bacteria \
        --cpus 8
    sleep 20
done
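
Before launching the loop on all the samples, it can help to check that every sample listed in samples.txt actually has its assembly and annotation files. The following is a minimal sketch, not part of the original workflow: the function name `check_inputs` is hypothetical, and the folder layout (assembly_spades/ and Prokka_annotation/ under the job folder) is assumed from the cell above.

```shell
# check_inputs JOB_DIR: report samples whose assembly or annotation file is
# missing or empty. The folder layout is assumed from the cell above.
check_inputs() {
  job_dir=$1
  missing=0
  if [ ! -f "$job_dir/samples.txt" ]; then
    echo "No samples.txt found in $job_dir"
    return 0
  fi
  # "|| [ -n ... ]" also handles a last line without a trailing newline
  while IFS= read -r sample || [ -n "$sample" ]; do
    [ -n "$sample" ] || continue                       # skip blank lines
    for f in "$job_dir/assembly_spades/$sample/contigs.fasta" \
             "$job_dir/Prokka_annotation/$sample/$sample.gff"; do
      if [ ! -s "$f" ]; then                           # -s: exists and is non-empty
        echo "MISSING: $f"
        missing=$((missing + 1))
      fi
    done
  done < "$job_dir/samples.txt"
  echo "missing=$missing"
}

check_inputs "$HOME/Desktop/JOB"
```

Each problematic file is printed on its own line, followed by a final `missing=N` count, so the antiSMASH loop can be started only once the count is zero.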

Several output files are generated, including an "index.html" file. Double-clicking it opens a webpage that shows an overview of the results. The overview lists the contigs and regions in which a secondary metabolite biosynthetic region has been identified.  
Selecting a region shows more details about it, such as its protein families, gene names, and coding sequence features.  
We can check visually in the .html file which contigs contain regions for a metabolite of interest (for example, a lassopeptide).  
Alternatively, we can search from the command line for the contigs that contain a lassopeptide region, as follows:  

In [1]:
%%bash

for folder in /home/ramzi/Desktop/JOB/ANTISMASH/*/; do
  name=$(basename "$folder")
  # Remember the ACCESSION of the current GenBank record and print it once
  # for every record (contig) that contains a line matching "lasso"
  awk -v sample="$name" '
    /^ACCESSION/ { acc = $2; seen = 0 }
    /lasso/ && !seen { print acc "\t" sample; seen = 1 }
  ' "${folder}${name}.gbk" >> ~/tmp.tab
done
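
As an alternative to scanning the combined GenBank file, the per-region files can be searched directly. The sketch below assumes that antiSMASH wrote one `<record>.regionNNN.gbk` file per region into each sample folder; the function name `list_lasso_regions` is hypothetical, and the match on the word "lasso" mirrors the pattern used in the cell above.

```shell
# list_lasso_regions DIR: print "sample<TAB>region_file" for every region
# GenBank file under DIR/<sample>/ that mentions "lasso" (case-insensitive).
list_lasso_regions() {
  base_dir=$1
  for gbk in "$base_dir"/*/*.region*.gbk; do
    [ -e "$gbk" ] || continue                  # the glob matched nothing
    if grep -qi 'lasso' "$gbk"; then
      sample=$(basename "$(dirname "$gbk")")
      printf '%s\t%s\n' "$sample" "$(basename "$gbk")"
    fi
  done
}

list_lasso_regions "$HOME/Desktop/JOB/ANTISMASH"
```

Because this lists region files rather than whole contigs, its output points at exactly the files that a cluster comparison would need.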

# Comparing Biosynthetic Gene Clusters Using clinker
Now that we have the names of all the contigs that contain lassopeptide regions, I'll filter and sort the file so that each line has the contig name in the 1st field and the sample name in the 2nd.

In [None]:
%%bash

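# Keep only the lines whose second field is non-empty, and drop duplicates
awk 'NF >= 2 && $2 != ""' ~/tmp.tab | sort -u > ~/tmp2.tab
mv ~/tmp2.tab ~/tmp.tab

# Show the file content with a header
echo -e "Contigs\tSamples"
while read -r contigs samples; do
  echo -e "$contigs\t$samples"
done < ~/tmp.tab

# Link the region files of each listed contig into one folder for clinker
mkdir -p ~/Desktop/links_clinker
while read -r contig sample; do
  # The glob (*) must stay outside the quotes, otherwise it is not expanded
  for gbk in "/home/ramzi/Desktop/JOB/ANTISMASH/${sample}/${contig}".region*.gbk; do
    [[ -e "$gbk" ]] || continue
    ln -sf "$gbk" ~/Desktop/links_clinker/"${sample}_$(basename "$gbk")"
  done
done < ~/tmp.tab

# Launch clinker; -p opens the interactive plot
clinker ~/Desktop/links_clinker/*.gbk -p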

After launching clinker (which was installed in a separate virtual environment), the biosynthetic gene cluster comparison appears in an interactive webpage with the relevant details.

# Using Abricate to Set Up a Custom Database and Search for Genes of Interest
Abricate can be useful for searching our genomes for a custom set of genes.  
A complete guide to Abricate usage and installation is available at https://github.com/tseemann/abricate  
As with antiSMASH and clinker, we will use a dedicated conda virtual environment so that the package dependencies of each program do not conflict.
Here is an overview of the workflow:

In [None]:
%%bash
# Activate conda
conda activate

# Update conda if necessary
conda update -n base -c conda-forge conda

# Activate conda environment 
conda activate abricate

# Update abricate if necessary
conda update abricate

# Check that everything is installed
abricate --check

# Show available databases
abricate --list

# Set up my own database     ------->    read the how-to in the GitHub page "https://github.com/tseemann/abricate"

# Loop through the genomes
mkdir -p Abricate_Output
genomes=$(find My_FTP/* -type d -name "GCA*" | sed 's/My_FTP\///g') # work on all genomes
for name in ${genomes}; do
    fasta_file="My_FTP/${name}/${name}_contigs.fa"

    #abricate --db vfdb "$fasta_file" > "Abricate_Output/abricate_vfdb_${name}.tab"
    #abricate --db resfinder "$fasta_file" > "Abricate_Output/abricate_resfinder_${name}.tab"
    #abricate --db card "$fasta_file" > "Abricate_Output/abricate_card_${name}.tab"
    abricate --db Micol "$fasta_file" > "Abricate_Output/abricate_Micol_${name}.tab"
done


# Now launch the same search for my 39 E. coli samples
while IFS=, read -r name; do
    echo "Treating sample $name"
    fasta_file="/media/ramzi/One Touch/JOB/temporary/anvio_39/${name}/E__coli_${name}_contigs.fa"

    #grep -c "^>" "$fasta_file"

    # ">" overwrites each per-sample report, so re-running the cell does not duplicate rows
    #abricate --db vfdb "$fasta_file" > "Abricate_Output/abricate_vfdb_${name}.tab"
    #abricate --db resfinder "$fasta_file" > "Abricate_Output/abricate_resfinder_${name}.tab"
    #abricate --db card "$fasta_file" > "Abricate_Output/abricate_card_${name}.tab"
    abricate --db Micol "$fasta_file" > "Abricate_Output/abricate_Micol_${name}.tab"

done < "/media/ramzi/One Touch/JOB/samples.txt"

# Summarize the VFDB reports of the GCA genomes in a presence/absence table
# (the glob picks up the per-genome files abricate_vfdb_GCA*.tab)
mkdir -p Abricate_vfdb
abricate --summary Abricate_Output/abricate_vfdb_GCA*.tab > Abricate_vfdb/summary_vfdb_GCA.tab

# Summarize the resfinder reports of the GCA genomes in a presence/absence table
abricate --summary Abricate_Output/abricate_resfinder_GCA*.tab > Abricate_vfdb/summary_resfinder_GCA.tab
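
The cell above queries a custom database (`--db Micol`) without showing how such a database is built. The sketch below follows the layout described in the Abricate README: one folder per database containing a FASTA file named `sequences`, with headers of the form `>DB~~~GENE~~~ACCESSION~~~PRODUCT`. Every name in it (`db_root`, `mydb`, `geneA`, `ACC0001`) and the sequence itself are hypothetical placeholders, and the use of `--datadir` together with `--setupdb` is an assumption to verify against `abricate --help`.

```shell
# All names below (db_root, mydb, geneA, ACC0001) are hypothetical placeholders.
db_root="${ABRICATE_DB_ROOT:-$(mktemp -d)}"   # point this at the real Abricate db folder
db_name="mydb"
mkdir -p "$db_root/$db_name"

# One FASTA file named "sequences"; the sequence is a placeholder, not a real gene
cat > "$db_root/$db_name/sequences" <<'EOF'
>mydb~~~geneA~~~ACC0001~~~hypothetical_product placeholder description
ATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGA
EOF

# Index and list the database only if abricate is on the PATH
if command -v abricate >/dev/null 2>&1; then
  abricate --setupdb --datadir "$db_root"
  abricate --list --datadir "$db_root"
else
  echo "abricate not found; wrote $db_root/$db_name/sequences only"
fi
```

Once indexed, the new database should appear in `abricate --list` and can be selected with `--db mydb`, in the same way `Micol` is used above.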

At the end, the summary table contains one row per sample and one column per database gene; each cell holds the percent coverage (%COVERAGE) of that gene in the sample, or "." when the gene is absent. This table can later be used to draw heatmaps and calculate statistics.
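
For heatmaps or statistics that only need presence or absence, the summary table can be reduced to a 0/1 matrix. This is a minimal sketch, assuming the tab-separated layout written by `abricate --summary` (file name, hit count, then one column per gene, with `.` for an absent gene); the function name `to_presence_absence` and the demo table are hypothetical.

```shell
# to_presence_absence FILE: turn an `abricate --summary` table into a 0/1 matrix.
to_presence_absence() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 { print; next }               # keep the header line unchanged
       {
         for (i = 3; i <= NF; i++)           # columns 1-2: file name and hit count
           $i = ($i == ".") ? 0 : 1
         print
       }' "$1"
}

# Hypothetical demo table with two samples and two genes
demo=$(mktemp)
printf '#FILE\tNUM_FOUND\tgeneA\tgeneB\n' > "$demo"
printf 'sample1.tab\t1\t100.00\t.\n' >> "$demo"
printf 'sample2.tab\t2\t98.50\t99.10;100.00\n' >> "$demo"
to_presence_absence "$demo"
```

Any cell that is not `.` counts as present, including cells that list several hits separated by `;`.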