## **Project**: Natural products from the Palaeolithic

## **Section**: Plasmid assemblies

Anan Ibrahim, 01.01.2022

**Contents**
 - **Step1**: Create conda environment with required dependencies if not already installed
 - **Step2**: Download sequencing results from Eurofins using the command line
 - **Step3**: Plasmid assembly

##########

**Step1**: Create conda environment with required dependencies if not already installed

##########

In [None]:
# All conda envs can be found in EMN001_Paleofuran/02-scripts/ENVS_*.yml
conda env create -f plasmid_assembly.yml
conda env create -f samtools.yml
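
The "if not already installed" condition can be checked before calling `conda env create`. The following is a minimal sketch, not part of the original workflow: the helper names `env_in_list` and `create_env_if_missing` are hypothetical, and it assumes that each environment is registered under the same name as its yml file (for example `plasmid_assembly`). Environments that are addressed by path rather than by name would need a different check.

```shell
# Succeed if an environment named $1 appears in `conda env list` output read from stdin.
# The first column of each listing line is the environment name.
env_in_list() {
    awk -v n="$1" '$1 == n { found = 1 } END { exit !found }'
}

# Create the environment from a yml file only when it is not listed yet.
# $1 = environment name (assumed to match the name inside the yml), $2 = yml file
create_env_if_missing() {
    if conda env list | env_in_list "$1"; then
        echo "Environment '$1' already exists, skipping"
    else
        conda env create -f "$2"
    fi
}

# Usage (uncomment on a machine where conda is available):
# create_env_if_missing plasmid_assembly plasmid_assembly.yml
# create_env_if_missing samtools samtools.yml
```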

##########

**Step2**: Download sequencing results from Eurofins using the command line

##########

*Manually*: Add the plasmid reference sequences (.fasta and .gff files) to folders named after the samples

In [None]:
mkdir /Net/Groups/ccdata/projects/ancientDNA/Plasmid-assembly/Input/Second_batch_ref_seq/
cd /Net/Groups/ccdata/projects/ancientDNA/Plasmid-assembly/Input/Second_batch_ref_seq/

*Manually*: Add the fastq sequences of the plasmid retrieved from Eurofins (in the downloads folder below) to folders named after the samples

In [None]:
mkdir /Net/Groups/ccdata/projects/ancientDNA/Plasmid-assembly/Input/Second_batch/
cd /Net/Groups/ccdata/projects/ancientDNA/Plasmid-assembly/Input/Second_batch/

In [None]:
cd /Net/Groups/ccdata/projects/ancientDNA/Plasmid-assembly/eurofins-downloads

wget -m --ftp-user=######### --ftp-password=######## ftp://ftp.gatc-biotech.com/2021-11-16/ 
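
Once the mirror has finished, it may be worth confirming that the downloaded archives are intact before sorting them into sample folders. The sketch below is an optional addition rather than part of the original workflow: `check_gzip_files` is a hypothetical helper that runs `gzip -t` on every `.fastq.gz` file under a directory and reports the ones that fail.

```shell
# Test every .fastq.gz file under directory $1 with `gzip -t`.
# Prints one "CORRUPT: <path>" line per failing file and returns 1 if any failed.
check_gzip_files() {
    local dir="$1" bad
    bad=$(find "$dir" -type f -name '*.fastq.gz' \
        -exec sh -c 'gzip -t "$1" 2>/dev/null || echo "CORRUPT: $1"' _ {} \;)
    if [ -n "$bad" ]; then
        echo "$bad"
        return 1
    fi
    echo "All .fastq.gz files under $dir passed gzip -t"
}

# Usage (path taken from the cell above):
# check_gzip_files /Net/Groups/ccdata/projects/ancientDNA/Plasmid-assembly/eurofins-downloads
```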

How to run the Jupyter notebook on your local computer

 - If Miniconda is not installed, install it in your local directory, then close and reopen the terminal

 - Create an env by typing in the terminal: conda create -n jupyter-notebook -c anaconda jupyter

 - To modify the Jupyter notebook after downloading the most recent copy of it via FileZilla:

 - Activate the env: conda activate jupyter-notebook

 - Run by typing: jupyter-notebook

##########

**Step3**: Plasmid assembly

##########

*NOTE:* Before running, make sure you change the IN, OUT and REF directory paths according to the project.

*NOTE:* Before running, make sure the file names in the input (IN) directory always match `*_3_1.fastq.gz` and `*_3_2.fastq.gz`.
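
The naming requirement can be verified before starting the script. The following is a minimal sketch with hypothetical helper names (`count_matches`, `check_read_pairs`). It assumes that each sample folder should contain exactly one forward and one reverse file, which is inferred from the way the script below passes both glob patterns directly to trimmomatic in paired-end mode.

```shell
# Print the number of regular files in directory $1 whose name matches pattern $2.
count_matches() {
    find "$1" -maxdepth 1 -type f -name "$2" | wc -l | tr -d ' '
}

# Check every sample folder inside input directory $1.
# Reports folders that do not hold exactly one *_3_1.fastq.gz and one *_3_2.fastq.gz,
# and returns 1 if any such folder was found.
check_read_pairs() {
    local in_dir="$1" status=0 sample r1 r2
    for sample in "$in_dir"/*/; do
        [ -d "$sample" ] || continue
        r1=$(count_matches "$sample" '*_3_1.fastq.gz')
        r2=$(count_matches "$sample" '*_3_2.fastq.gz')
        if [ "$r1" -ne 1 ] || [ "$r2" -ne 1 ]; then
            echo "PROBLEM: $(basename "$sample") has $r1 forward and $r2 reverse file(s)"
            status=1
        fi
    done
    return "$status"
}

# Usage (IN as defined in the script below):
# check_read_pairs "$IN" && echo "All sample folders look fine"
```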

In [None]:
#!/bin/bash

# Before running the script, always put the reads in folders named after the samples in
# "/Net/Groups/ccdata/projects/ancientDNA/Plasmid-assembly/Input/XXX_batch/" 
# and the reference sequences in folders named after the samples in
# "/Net/Groups/ccdata/projects/ancientDNA/Plasmid-assembly/Input/XXX_batch_ref_seq/" 

IN=/Net/Groups/ccdata/projects/ancientDNA/Plasmid-assembly/Input/Third_batch #change the names accordingly 
OUT=/Net/Groups/ccdata/projects/ancientDNA/Plasmid-assembly/Outputs/Third_batch1 #change the names accordingly
REF=/Net/Groups/ccdata/projects/ancientDNA/Plasmid-assembly/Input/Third_batch_ref_seq #change the names accordingly

eval "$(conda shell.bash hook)"
conda activate plasmid_assembly

mkdir $OUT
# (1) Create log file with all the tools versions
mkdir  $OUT/00_Log_files
conda list > $OUT/00_Log_files/plasmid_assembly_env_log.txt

# (2) Run FASTQC on raw data
mkdir  $OUT/01_FastQC_results
for F in $IN/*; do 
N=$(basename $F) ;
mkdir $OUT/01_FastQC_results/$N ;
cd "$F"; 
fastqc $F/*_3_1.fastq.gz $F/*_3_2.fastq.gz -t 30 -o $OUT/01_FastQC_results/$N;
done 

# (3) Run trimmomatic
mkdir  $OUT/02_Trimmomatic_results 
for F in $IN/*; do
N=$(basename $F) ;
mkdir $OUT/02_Trimmomatic_results/$N; 
cd "$F";
trimmomatic PE -threads 30 \
-trimlog $N.trimlog.txt \
-summary $N.stats.txt \
$F/*_3_1.fastq.gz $F/*_3_2.fastq.gz \
-baseout $N.filtered100.fastq.gz \
MINLEN:100 SLIDINGWINDOW:9:35 ; 
mv *.txt $OUT/02_Trimmomatic_results/$N
mv *_1P.fastq.gz $OUT/02_Trimmomatic_results/$N
mv *_2P.fastq.gz $OUT/02_Trimmomatic_results/$N
mv *_1U.fastq.gz $OUT/02_Trimmomatic_results/$N
mv *_2U.fastq.gz $OUT/02_Trimmomatic_results/$N
done 

# (4) Run FastQC again 
mkdir $OUT/03_FastQC_results_post_trim
for F in $OUT/02_Trimmomatic_results/*; do 
N=$(basename $F) ;
mkdir $OUT/03_FastQC_results_post_trim/$N ;
cd "$F"; 
fastqc $F/*_1P.fastq.gz $F/*_2P.fastq.gz -t 30 -o $OUT/03_FastQC_results_post_trim/$N;
done 

# (6) Run Unicycler in bold mode (most likely to produce a complete assembly, but carries a greater risk of misassembly)
mkdir $OUT/05_Unicycler_results_bold
for F in $OUT/02_Trimmomatic_results/*; do 
N=$(basename $F) ;
cd "$F";
unicycler -1 $F/*_1P.fastq.gz -2 $F/*_2P.fastq.gz \
-t 30 \
--depth_filter 0.25 \
--mode bold \
--no_pilon \
--no_rotate \
-o $OUT/05_Unicycler_results_bold/$N;
done 

# (8) Run Quast on the assembled files (stats)
mkdir  $OUT/07_Quast_results

for F in $IN/*; do 
N=$(basename $F) ;
mkdir $OUT/07_Quast_results/$N ;
done

for F in $IN/*; do 
N=$(basename $F) ;
# Only the bold assembly is passed to Quast, so a single label is given
quast.py -o $OUT/07_Quast_results/$N \
-r $REF/$N/*.fasta \
-l bold \
-t 30 \
--min-contig 1000 \
-g $REF/$N/*.gff \
$OUT/05_Unicycler_results_bold/$N/assembly.fasta
done 

# (09) Run Minimap2 alignment of the bold assembly against the reference
mkdir  $OUT/08_Minimap2_results_bold
for F in $OUT/05_Unicycler_results_bold/*; do 
N=$(basename $F) ;
mkdir  $OUT/08_Minimap2_results_bold/$N
cd $OUT/08_Minimap2_results_bold/$N
minimap2 -ax asm5 $REF/$N/*.fasta $OUT/05_Unicycler_results_bold/$N/assembly.fasta > aln.sam ;
done 

# (10) Reference based alignment 
mkdir  $OUT/09_Minimap2_samtools_ref_alignment

for F in $OUT/02_Trimmomatic_results/*; do 
N=$(basename $F) ;
mkdir  $OUT/09_Minimap2_samtools_ref_alignment/$N
cd $OUT/09_Minimap2_samtools_ref_alignment/$N
minimap2 -ax sr $REF/$N/*.fasta $OUT/02_Trimmomatic_results/$N/*_1P.fastq.gz $OUT/02_Trimmomatic_results/$N/*_2P.fastq.gz > $N.minimap2.sam;  
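# Switch to the samtools env for sorting and consensus calling
conda activate /Net/Groups/ccdata/apps/conda_envs/samtools
samtools sort $N.minimap2.sam -O SAM --threads 28 > $N.minimap2_sorted.sam;
samtools consensus -a --show-ins --show-del --low-MQ 20 --threads 29 $N.minimap2_sorted.sam -o $N.consensus_minimap2.fa;
# Return to plasmid_assembly so the next iteration runs minimap2 from that env again
conda deactivate
done 

# Leave the plasmid_assembly env
conda deactivate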

echo "Plasmid assembly DONE" | mail -s "Plasmid assembly DONE" #######@outlook.com
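
After the run, a quick per-sample summary of the Unicycler output can help spot failed or fragmented assemblies before opening the Quast reports. The sketch below is an optional addition: `fasta_stats` is a hypothetical helper that prints the number of sequences and the total number of bases in a FASTA file, separated by a tab. It relies only on the `assembly.fasta` file name and the output folder used in the script above.

```shell
# Print "<number of sequences><TAB><total bases>" for the FASTA file given as $1.
# Header lines start with ">"; whitespace inside sequence lines is ignored.
fasta_stats() {
    awk '/^>/ { n++; next }
         { gsub(/[ \t\r]/, ""); len += length($0) }
         END { printf "%d\t%d\n", n, len }' "$1"
}

# Usage (OUT as defined in the script above):
# for F in "$OUT"/05_Unicycler_results_bold/*; do
#     printf '%s\t' "$(basename "$F")"
#     fasta_stats "$F/assembly.fasta"
# done
```

A plasmid that assembled completely would typically show up as a single sequence whose length is close to that of its reference.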