# Moving window analysis (MWA) of cyano BGC

This document outlines the workflow for a moving window analysis of selected modules from biosynthetic gene clusters (BGCs) that produce cyanobacterial natural products.

## Creating the software environment for the analysis

This prologue sets up the bioinformatics software environment required for the moving window analysis.

Note: this needs to be done only once. It does not have to be repeated every time the workflow is run.

In [1]:
# Initialise the conda environment if not already initialised
source activate

In [6]:
conda env create -f 00-conda_recipe/MWA_cyano_env.yml

Collecting package metadata (repodata.json): done
Solving environment: done

Downloading and Extracting Packages
r-cli-2.0.2          | 396 KB    | ##################################### | 100% 
r-plogr-0.2.0        | 19 KB     | ##################################### | 100% 
r-optparse-1.6.6     | 79 KB     | ##################################### | 100% 
r-generics-0.0.2     | 75 KB     | ##################################### | 100% 
r-purrr-0.3.4        | 407 KB    | ##################################### | 100% 
r-rprojroot-1.3_2    | 94 KB     | ##################################### | 100% 
r-whisker-0.4        | 80 KB     | ##################################### | 100% 
r-lattice-0.20_41    | 1.1 MB    | ##################################### | 100% 
freetype-2.10.2      | 874 KB    | ##################################### | 100% 
r-backports-1.1.7    | 80 KB     | ##################################### | 100% 
r-fs-1.4.1           | 290 KB    | ##################################### | 10

r-haven-2.2.0        | 344 KB    | ##################################### | 100% 
r-modelr-0.1.8       | 218 KB    | ##################################### | 100% 
r-tidyr-1.0.3        | 742 KB    | ##################################### | 100% 
r-lifecycle-0.2.0    | 112 KB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate MWA_cyano_env
#
# To deactivate an active environment, use
#
#     $ conda deactivate

In [7]:
# Activate the environment
conda activate MWA_cyano_env

In [8]:
# Record the installed package versions for reproducibility
conda list > 00-conda_recipe/env_package_list.txt

## Starting the analysis of module pairs

We use the pyfasta package to split each sequence into custom windows: the window size is set to `50` residues and the overlap to `25` residues. The pyfasta help documentation describes the tool's usage.
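With a window size of 50 and an overlap of 25, consecutive windows start 25 residues apart (step = window size - overlap), so in the sketch below a 100-residue sequence yields windows covering residues 1-50, 26-75 and 51-100. The awk function is a minimal, hypothetical re-implementation of the windowing idea for illustration only. It is not pyfasta, and the `_0`, `_1`, ... header suffixes and the handling of the final, shorter window are assumptions that may not match pyfasta's output.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: split every sequence in a FASTA file into overlapping windows.
# Illustrates the windowing concept only; pyfasta's naming and edge handling may differ.
set -euo pipefail

# fragment_fasta FILE [WINDOW] [OVERLAP]  (function name is illustrative)
fragment_fasta() {
  local file=$1 window=${2:-50} overlap=${3:-25}
  awk -v w="$window" -v o="$overlap" '
    # Print all windows of one sequence; windows start every (w - o) residues.
    # Sequences no longer than the overlap produce no windows in this sketch.
    function emit(name, seq,   step, pos, n) {
      step = w - o
      n = 0
      for (pos = 1; pos <= length(seq) - o; pos += step) {
        printf ">%s_%d\n%s\n", name, n, substr(seq, pos, w)
        n++
      }
    }
    # A new header: flush the previous sequence, then start collecting a new one.
    /^>/ { if (name != "") emit(name, seq); name = substr($1, 2); seq = ""; next }
    # Sequence lines: strip whitespace and append.
    { gsub(/[ \t\r]/, ""); seq = seq $0 }
    END { if (name != "") emit(name, seq) }
  ' "$file"
}
```

For example, `fragment_fasta M1_example.fa 50 25 > M1_example.windows.fa`, where `M1_example.fa` is a hypothetical input file.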

### Steps performed by the bash script below
- creates the output folders
- aligns the input sequences
- replaces the gap characters (hyphens) in the alignment with X
- demultiplexes the alignment into individual FASTA files, one per sequence
- fragments each demultiplexed FASTA file into 50-aa windows with a 25-aa overlap
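The gap-replacement and demultiplexing steps can be sketched with awk as below. This is an assumption-laden illustration, not the contents of `gen_blast_input.sh`: it assumes the aligner writes a multi-FASTA alignment, names each output file after the first word of its header, and replaces `-` with `X`, presumably so that windows taken from each sequence cover the same alignment columns.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of two steps in gen_blast_input.sh (not the actual script):
#   1) replace alignment gap characters ("-") with "X"
#   2) write each aligned sequence to its own FASTA file
set -euo pipefail

# demultiplex_alignment ALIGNMENT OUTDIR  (function name and file layout are assumptions)
demultiplex_alignment() {
  local aln=$1 outdir=$2
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    /^>/ {
      if (out != "") close(out)        # finish the previous sequence file
      id = substr($1, 2)               # first word of the header, without ">"
      out = dir "/" id ".fa"
      print ">" id > out
      next
    }
    {
      if (out == "") next              # ignore any lines before the first header
      gsub(/-/, "X")                   # gaps become X (assumed purpose: keep alignment coordinates)
      print > out
    }
  ' "$aln"
}
```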

In [65]:
bash 02-bash_scripts/gen_blast_input.sh

creating new files:
fragmented_pairs/p1/M1_nostopeptinB.split.50mer.25overlap.fa
creating new files:
fragmented_pairs/p1/M1_nostophycin.split.50mer.25overlap.fa
creating new files:
fragmented_pairs/p10/M6_micropeptin88A.split.50mer.25overlap.fa
creating new files:
fragmented_pairs/p10/M7_cyanopeptolin963A.split.50mer.25overlap.fa
creating new files:
fragmented_pairs/p11/M3_oscillapeptinG.split.50mer.25overlap.fa
creating new files:
fragmented_pairs/p11/M6_oscillapeptinG.split.50mer.25overlap.fa
creating new files:
fragmented_pairs/p12/M5_oscillapeptinE.split.50mer.25overlap.fa
creating new files:
fragmented_pairs/p12/M6_oscillapeptinG.split.50mer.25overlap.fa
creating new files:
fragmented_pairs/p13/M7_cyanopeptolin963A.split.50mer.25overlap.fa
creating new files:
fragmented_pairs/p13/M8_oscillapeptinG.split.50mer.25overlap.fa
creating new files:
fragmented_pairs/p14/M2_oscillagininB.split.50mer.25overlap.fa
creating new files:
fragmented_pairs/p14/M7_cyanopeptolin963A.split.50mer.25ov

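## Running BLAST for the pairs

Run `blastp` on each pair, using the fragments of the first sequence in the pair as the subject and the fragments of the second as the query. Results are written in BLAST tabular format (`-outfmt 6`).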

In [66]:
# Create the output directory if it does not already exist
mkdir -p blast_out

# Pair1 M1_nostopeptinB v M1_nostophycin
blastp -subject fragmented_pairs/p1/M1_nostopeptinB.split.50mer.25overlap.fa \
-query fragmented_pairs/p1/M1_nostophycin.split.50mer.25overlap.fa \
-out blast_out/pair1_blast_out.txt -outfmt 6 

#Pair 2 M1_nostopeptinB v M1_nostopeptinE
blastp -subject fragmented_pairs/p2/M1_nostopeptinB.split.50mer.25overlap.fa \
-query fragmented_pairs/p2/M1_nostopeptinE.split.50mer.25overlap.fa \
-out blast_out/pair2_blast_out.txt -outfmt 6

# Pair 3 M5_micropeptinK139 v M6_micropeptin88A
blastp -subject fragmented_pairs/p3/M5_micropeptinK139.split.50mer.25overlap.fa \
-query fragmented_pairs/p3/M6_micropeptin88A.split.50mer.25overlap.fa \
-out blast_out/pair3_blast_out.txt -outfmt 6

# Pair 4 M5_micropeptinK139 v M7_micropeptin139
blastp -subject fragmented_pairs/p4/M5_micropeptinK139.split.50mer.25overlap.fa \
-query fragmented_pairs/p4/M7_micropeptin139.split.50mer.25overlap.fa \
-out blast_out/pair4_blast_out.txt -outfmt 6

#Pair 5 M5_nostopeptinB v M5_nostopeptinE
blastp -subject fragmented_pairs/p5/M5_nostopeptinB.split.50mer.25overlap.fa \
-query fragmented_pairs/p5/M5_nostopeptinE.split.50mer.25overlap.fa \
-out blast_out/pair5_blast_out.txt -outfmt 6

#Pair 6 M5_nostopeptinB v M7_nostopeptinB
blastp -subject fragmented_pairs/p6/M5_nostopeptinB.split.50mer.25overlap.fa \
-query fragmented_pairs/p6/M7_nostopeptinB.split.50mer.25overlap.fa \
-out blast_out/pair6_blast_out.txt -outfmt 6

#Pair 7 M5_cyanopeptolin984 v M5_oscillapeptinE
blastp -subject fragmented_pairs/p7/M5_cyanopeptolin984.split.50mer.25overlap.fa \
-query fragmented_pairs/p7/M5_oscillapeptinE.split.50mer.25overlap.fa \
-out blast_out/pair7_blast_out.txt -outfmt 6

#Pair 8 M5_oscillapeptinE v M7_cyanopeptolin984
blastp -subject fragmented_pairs/p8/M5_oscillapeptinE.split.50mer.25overlap.fa \
-query fragmented_pairs/p8/M7_cyanopeptolin984.split.50mer.25overlap.fa \
-out blast_out/pair8_blast_out.txt -outfmt 6

#Pair 9 M5_cyanopeptolin963A v M6_micropeptin88A
blastp -subject fragmented_pairs/p9/M5_cyanopeptolin963A.split.50mer.25overlap.fa \
-query fragmented_pairs/p9/M6_micropeptin88A.split.50mer.25overlap.fa \
-out blast_out/pair9_blast_out.txt -outfmt 6

#Pair 10 M6_micropeptin88A v M7_cyanopeptolin963A
blastp -subject fragmented_pairs/p10/M6_micropeptin88A.split.50mer.25overlap.fa \
-query fragmented_pairs/p10/M7_cyanopeptolin963A.split.50mer.25overlap.fa \
-out blast_out/pair10_blast_out.txt -outfmt 6

#Pair 11 M3_oscillapeptinG v M6_oscillapeptinG
blastp -subject fragmented_pairs/p11/M3_oscillapeptinG.split.50mer.25overlap.fa \
-query fragmented_pairs/p11/M6_oscillapeptinG.split.50mer.25overlap.fa \
-out blast_out/pair11_blast_out.txt -outfmt 6

#Pair 12 M5_oscillapeptinE v M6_oscillapeptinG
blastp -subject fragmented_pairs/p12/M5_oscillapeptinE.split.50mer.25overlap.fa \
-query fragmented_pairs/p12/M6_oscillapeptinG.split.50mer.25overlap.fa \
-out blast_out/pair12_blast_out.txt -outfmt 6

#Pair 13 M7_cyanopeptolin963A v M8_oscillapeptinG
blastp -subject fragmented_pairs/p13/M7_cyanopeptolin963A.split.50mer.25overlap.fa \
-query fragmented_pairs/p13/M8_oscillapeptinG.split.50mer.25overlap.fa \
-out blast_out/pair13_blast_out.txt -outfmt 6

#Pair 14 M2_oscillagininB v M7_cyanopeptolin963A
blastp -subject fragmented_pairs/p14/M2_oscillagininB.split.50mer.25overlap.fa \
-query fragmented_pairs/p14/M7_cyanopeptolin963A.split.50mer.25overlap.fa \
-out blast_out/pair14_blast_out.txt -outfmt 6

# Pair15 -- M2_anabaenolysinA v M2_anabaenolysinC
blastp -subject fragmented_pairs/p15/M2_anabaenolysinA.split.50mer.25overlap.fa \
-query fragmented_pairs/p15/M2_anabaenolysinC.split.50mer.25overlap.fa \
-out blast_out/pair15_blast_out.txt -outfmt 6

# Pair16 -- M2_anabaenolysinC v M6_hassallidin
blastp -subject fragmented_pairs/p16/M2_anabaenolysinC.split.50mer.25overlap.fa \
-query fragmented_pairs/p16/M6_hassallidin.split.50mer.25overlap.fa \
-out blast_out/pair16_blast_out.txt -outfmt 6

# Pair17 -- M2_microginin v M2_oscillaginin
blastp -subject fragmented_pairs/p17/M2_microginin.split.50mer.25overlap.fa \
-query fragmented_pairs/p17/M2_oscillaginin.split.50mer.25overlap.fa \
-out blast_out/pair17_blast_out.txt -outfmt 6

# Pair18 -- M2_oscillaginin v M7_cyanopeptolin963A
blastp -subject fragmented_pairs/p18/M2_oscillaginin.split.50mer.25overlap.fa \
-query fragmented_pairs/p18/M7_cyanopeptolin963A.split.50mer.25overlap.fa \
-out blast_out/pair18_blast_out.txt -outfmt 6

# Pair19 -- M1_nostopeptolide v M2_speudospumigin
blastp -subject fragmented_pairs/p19/M1_nostopeptolide.split.50mer.25overlap.fa \
-query fragmented_pairs/p19/M2_speudospumigin.split.50mer.25overlap.fa \
-out blast_out/pair19_blast_out.txt -outfmt 6

# Pair20 -- M2_speudospumigin v M2_spumigin
blastp -subject fragmented_pairs/p20/M2_speudospumigin.split.50mer.25overlap.fa \
-query fragmented_pairs/p20/M2_spumigin.split.50mer.25overlap.fa \
-out blast_out/pair20_blast_out.txt -outfmt 6

# Pair21 -- M3_microginin v M3_oscillaginin
blastp -subject fragmented_pairs/p21/M3_microginin.split.50mer.25overlap.fa \
-query fragmented_pairs/p21/M3_oscillaginin.split.50mer.25overlap.fa \
-out blast_out/pair21_blast_out.txt -outfmt 6

# Pair22 -- M3_microginin v M3_oscillaginin -- Mislabeled File
blastp -subject fragmented_pairs/p22/M3_microginin.split.50mer.25overlap.fa \
-query fragmented_pairs/p22/M3_oscillaginin.split.50mer.25overlap.fa \
-out blast_out/pair22_mislabeledFasta_blast_out.txt -outfmt 6

# Pair23 -- M4_bacillomycinD v M4_bacillomycinF 
blastp -subject fragmented_pairs/p23/M4_bacillomycinD.split.50mer.25overlap.fa \
-query fragmented_pairs/p23/M4_bacillomycinF.split.50mer.25overlap.fa \
-out blast_out/pair23_blast_out.txt -outfmt 6

# Pair24 -- M4_bacillomycinD v M7_fengycin 
blastp -subject fragmented_pairs/p24/M4_bacillomycinD.split.50mer.25overlap.fa \
-query fragmented_pairs/p24/M7_fengycin.split.50mer.25overlap.fa \
-out blast_out/pair24_blast_out.txt -outfmt 6

# Pair25 -- M4_microcystinLR v M4_microcystinRR 
blastp -subject fragmented_pairs/p25/M4_microcystinLR.split.50mer.25overlap.fa \
-query fragmented_pairs/p25/M4_microcystinRR.split.50mer.25overlap.fa \
-out blast_out/pair25_blast_out.txt -outfmt 6

# Pair26 -- M4_microcystinRR v M6_microcystinRR 
blastp -subject fragmented_pairs/p26/M4_microcystinRR.split.50mer.25overlap.fa \
-query fragmented_pairs/p26/M6_microcystinRR.split.50mer.25overlap.fa \
-out blast_out/pair26_blast_out.txt -outfmt 6

# Pair27 -- M4_anabaenopeptin915 v M5_anabaenopeptin915 
blastp -subject fragmented_pairs/p27/M4_anabaenopeptin915.split.50mer.25overlap.fa \
-query fragmented_pairs/p27/M5_anabaenopeptin915.split.50mer.25overlap.fa \
-out blast_out/pair27_blast_out.txt -outfmt 6

# Pair28 -- M5_anabaenopeptin915 v M5_anabaenopeptinA 
blastp -subject fragmented_pairs/p28/M5_anabaenopeptin915.split.50mer.25overlap.fa \
-query fragmented_pairs/p28/M5_anabaenopeptinA.split.50mer.25overlap.fa \
-out blast_out/pair28_blast_out.txt -outfmt 6

# Pair29 -- M1_nostopeptolide v M6_anabaenopeptin915 
blastp -subject fragmented_pairs/p29/M1_nostopeptolide.split.50mer.25overlap.fa \
-query fragmented_pairs/p29/M6_anabaenopeptin915.split.50mer.25overlap.fa \
-out blast_out/pair29_blast_out.txt -outfmt 6

# Pair30 -- M6_anabaenopeptin915 v M6_anabaenopeptinA 
blastp -subject fragmented_pairs/p30/M6_anabaenopeptin915.split.50mer.25overlap.fa \
-query fragmented_pairs/p30/M6_anabaenopeptinA.split.50mer.25overlap.fa \
-out blast_out/pair30_blast_out.txt -outfmt 6

# Pair31 -- M6_mojavensin v M6_mycosubtilin 
blastp -subject fragmented_pairs/p31/M6_mojavensin.split.50mer.25overlap.fa \
-query fragmented_pairs/p31/M6_mycosubtilin.split.50mer.25overlap.fa \
-out blast_out/pair31_blast_out.txt -outfmt 6

# Pair32 -- M6_mojavensin v M7_mojavensin 
blastp -subject fragmented_pairs/p32/M6_mojavensin.split.50mer.25overlap.fa \
-query fragmented_pairs/p32/M7_mojavensin.split.50mer.25overlap.fa \
-out blast_out/pair32_blast_out.txt -outfmt 6

# Pair33 -- M1_fusaricidin v M7_bacillomycinF 
blastp -subject fragmented_pairs/p33/M1_fusaricidin.split.50mer.25overlap.fa \
-query fragmented_pairs/p33/M7_bacillomycinF.split.50mer.25overlap.fa \
-out blast_out/pair33_blast_out.txt -outfmt 6

# Pair34 -- M7_bacillomycinF v M7_iturin 
blastp -subject fragmented_pairs/p34/M7_bacillomycinF.split.50mer.25overlap.fa \
-query fragmented_pairs/p34/M7_iturin.split.50mer.25overlap.fa \
-out blast_out/pair34_blast_out.txt -outfmt 6

# Pair35 -- M6_mycosubtilin v M7_iturin 
blastp -subject fragmented_pairs/p35/M6_mycosubtilin.split.50mer.25overlap.fa \
-query fragmented_pairs/p35/M7_iturin.split.50mer.25overlap.fa \
-out blast_out/pair35_blast_out.txt -outfmt 6

# Pair36 -- M7_iturin v M7_mojavensin 
blastp -subject fragmented_pairs/p36/M7_iturin.split.50mer.25overlap.fa \
-query fragmented_pairs/p36/M7_mojavensin.split.50mer.25overlap.fa \
-out blast_out/pair36_blast_out.txt -outfmt 6

# Pair37 -- M2_polymyxinP v M7_polymyxinP 
blastp -subject fragmented_pairs/p37/M2_polymyxinP.split.50mer.25overlap.fa \
-query fragmented_pairs/p37/M7_polymyxinP.split.50mer.25overlap.fa \
-out blast_out/pair37_blast_out.txt -outfmt 6

# Pair38 -- M7_polymyxinB v M7_polymyxinP
blastp -subject fragmented_pairs/p38/M7_polymyxinB.split.50mer.25overlap.fa \
-query fragmented_pairs/p38/M7_polymyxinP.split.50mer.25overlap.fa \
-out blast_out/pair38_blast_out.txt -outfmt 6
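The explicit per-pair commands above all follow one pattern, so they could also be generated with a loop. The function below is a hypothetical sketch (`run_pair_blasts` and `DRY_RUN` are illustrative names, not part of this workflow). It assumes every `fragmented_pairs/pN/` directory holds exactly two fragment files and, matching the commands above, that the alphabetically first file is the subject; bash sorts glob matches according to the current locale.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build the per-pair blastp commands with a loop instead of by hand.
# Assumes each pN/ directory holds exactly two fragment files and that the
# alphabetically first one is the subject. Set DRY_RUN=1 to print instead of run.
set -euo pipefail

run_pair_blasts() {
  local pairs_dir=${1:-fragmented_pairs} out_dir=${2:-blast_out}
  local dir n files cmd
  mkdir -p "$out_dir"
  for dir in "$pairs_dir"/p*/; do
    n=${dir%/}; n=${n##*/p}                       # ".../p12/" -> "12"
    files=( "$dir"*.split.50mer.25overlap.fa )    # glob matches come back sorted
    if [ "${#files[@]}" -ne 2 ]; then
      echo "skipping $dir: expected 2 fragment files" >&2
      continue
    fi
    cmd=( blastp -subject "${files[0]}" -query "${files[1]}"
          -out "$out_dir/pair${n}_blast_out.txt" -outfmt 6 )
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"                   # show the command only
    else
      "${cmd[@]}"
    fi
  done
}
```

Running `DRY_RUN=1 run_pair_blasts` prints the commands for review. Note that this sketch would write pair 22 to `pair22_blast_out.txt`, not the `pair22_mislabeledFasta_blast_out.txt` name used above.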




## Parsing the BLAST output

The script `gen_results_MWA_from_blastOut.sh` generates:
- output tables with the pairwise fragment BLAST results
- PDF plots of percent identity by fragment number

In [67]:
bash 02-bash_scripts/gen_results_MWA_from_blastOut.sh
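The contents of `gen_results_MWA_from_blastOut.sh` are not shown here. As a rough, hypothetical illustration of one reduction such a script might perform, the function below keeps, for each query fragment, the hit with the highest bit score and reports its percent identity. It relies only on the default `-outfmt 6` column order (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the function name and output columns are assumptions, and fragments with no hit simply do not appear.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: reduce a BLAST tabular (-outfmt 6) file to one row per query
# fragment, keeping the hit with the highest bit score (column 12) and its pident (column 3).
set -euo pipefail

best_hit_identity() {
  local blast_tsv=$1
  printf 'query_fragment\tsubject_fragment\tpident\tbitscore\n'
  awk -F'\t' '
    # Record query fragments in first-seen order.
    !($1 in best) { order[++n] = $1; best[$1] = -1 }
    # Keep the row with the highest bit score seen so far for this query.
    $12 + 0 > best[$1] { best[$1] = $12 + 0; row[$1] = $1 "\t" $2 "\t" $3 "\t" $12 }
    END { for (i = 1; i <= n; i++) print row[order[i]] }
  ' "$blast_tsv"
}
```

For example, `best_hit_identity blast_out/pair1_blast_out.txt > pair1_best_hits.tsv` (the output file name is hypothetical).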
