## Trim the unwanted sequences with Trimmomatic and fastp
Layal Abo Khayal, PhD ***** 30 Sep 2024

## Trimmomatic

First, trim with Trimmomatic: use a sliding window of 4 bp with a minimum average quality score of 15, trim bases at the ends of the reads whose Phred quality score is lower than 3, drop reads that end up shorter than 36 bp after trimming, and clip the adapters.
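To make the `SLIDINGWINDOW:4:15` setting concrete, here is a minimal sketch of the rule applied to a single Phred+33 quality string. It is a simplified reading of the sliding-window idea (cut at the start of the first window whose average quality falls below the threshold); Trimmomatic's own implementation may treat the bases inside that window differently, and the function name `sliding_window_keep` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper, not part of Trimmomatic: print how many leading bases
# a simplified sliding-window rule would keep.
#   $1 = Phred+33 quality string, $2 = window size, $3 = minimum average quality
sliding_window_keep() {
    local qual=$1 win=${2:-4} min=${3:-15}
    local len=${#qual} i j sum q
    for (( i = 0; i + win <= len; i++ )); do
        sum=0
        for (( j = i; j < i + win; j++ )); do
            # Convert the ASCII character to its Phred score (offset 33)
            printf -v q '%d' "'${qual:j:1}"
            sum=$(( sum + q - 33 ))
        done
        # Comparing sums avoids fractional averages: sum / win < min
        if (( sum < win * min )); then
            echo "$i"
            return 0
        fi
    done
    echo "$len"
}

# 'I' encodes Q40 and '#' encodes Q2 in Phred+33
sliding_window_keep 'IIIIIIII####' 4 15   # prints 7
```

Under `MINLEN:36`, a read whose kept length falls below 36 would then be discarded entirely, which is how heavy quality trimming can translate into a low survival rate.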

In [None]:
%%bash
#!/bin/bash

# This script calls Trimmomatic to trim the RNA-seq reads

# Base path to the data
BASE='/home/layal/Documents/EOC'

TRIMMOMATIC='/home/layal/Trimmomatic-0.39'

pathRNA=${BASE}/Raw_RNA

pathTrimRNAout=${BASE}/RNA_afterTrimming

## Create the output folder for the trimmed RNA-seq reads if it does not exist
mkdir -p "${pathTrimRNAout}"

for f in "$pathRNA"/*.fastq.gz
do
	echo "The full path of the file is: $f"
	# Build the sample name from fields 1, 2 and 5 of the '_'-separated file name
	i=$(basename "$f" | cut -f1,2,5 -d'_')
	echo "Calling Trimmomatic on $i"

	# Single-end mode, Phred+33 qualities; the log is printed and saved per sample
	java -jar "${TRIMMOMATIC}/trimmomatic-0.39.jar" SE -phred33 \
		"$f" "$pathTrimRNAout/${i}_trimmed.fq.gz" \
		ILLUMINACLIP:"${TRIMMOMATIC}/adapters/TruSeq3-SE.fa":2:30:10 \
		LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 \
		2>&1 | tee "$pathTrimRNAout/log_$i.txt"
done

**Unfortunately, the trimming output shows that only about 2-30% of the reads survived in most of the samples!**

| Sample | Input Reads | Survived Reads | Dropped |
| --- | --- | --- | --- |
| OB_8_S1 | 44 290 907 | 2 705 511 (6.11%) | 41 585 396 (93.89%) |
| OB_9_S2 | 34 740 411 | 6 509 569 (18.74%) | 28 230 842 (81.26%) |
| OB_10_S3 | 44 484 737 | 1 030 815 (2.32%) | 43 453 922 (97.68%) |
| OB_11_S4 | 27 450 130 | 3 047 825 (11.10%) | 24 402 305 (88.90%) |
| OB_12_S5 | 31 592 577 | 3 484 619 (11.03%) | 28 107 958 (88.97%) |
| OB_13_S6 | 36 860 123 | 4 002 367 (10.86%) | 32 857 756 (89.14%) |
| YA_10_S7 | 32 437 191 | 4 118 618 (12.70%) | 28 318 573 (87.30%) |
| YA_11_S8 | 37 542 279 | 2 036 733 (5.43%) | 35 505 546 (94.57%) |
| YA_12_S9 | 59 503 826 | 2 177 410 (3.66%) | 57 326 416 (96.34%) |
| YA_13_S10 | 24 395 368 | 6 574 715 (26.95%) | 17 820 653 (73.05%) |
| YA_14_S11 | 22 238 665 | 5 894 828 (26.51%) | 16 343 837 (73.49%) |
| YA_15_S12 | 14 656 862 | 8 507 504 (58.04%) | 6 149 358 (41.96%) |
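
The rows of the table above can be assembled from the per-sample logs that the trimming loop writes with `tee`. The sketch below assumes each log contains Trimmomatic's single-end summary line in the form `Input Reads: N Surviving: N (P%) Dropped: N (P%)`; the function name `trimmomatic_row` is a hypothetical helper, and the path in the commented example is taken from the script above.

```shell
#!/bin/bash
# Hypothetical helper: turn one Trimmomatic SE log into a Markdown table row.
#   $1 = sample name, $2 = path to the log file
# Assumes the summary line looks like:
#   Input Reads: 44290907 Surviving: 2705511 (6.11%) Dropped: 41585396 (93.89%)
trimmomatic_row() {
    awk -v sample="$1" '/^Input Reads:/ {
        # $3 = input count, $5 $6 = surviving count and percent,
        # $8 $9 = dropped count and percent
        printf "| %s | %s | %s %s | %s %s |\n", sample, $3, $5, $6, $8, $9
    }' "$2"
}

# Example use on the logs written by the trimming loop:
# for log in /home/layal/Documents/EOC/RNA_afterTrimming/log_*.txt; do
#     name=$(basename "$log" .txt)
#     trimmomatic_row "${name#log_}" "$log"
# done
```

Reading the numbers straight from the logs avoids transcription slips when the table is typed by hand.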

## fastp tool for trimming

In [None]:
%%bash
# download the latest build
wget http://opengene.org/fastp/fastp
chmod a+x ./fastp


In [None]:
%%bash
#!/bin/bash

# This script calls fastp to trim the RNA-seq reads

# Base path to the data
BASE='/home/layal/Documents/EOC'

pathRNA=${BASE}/Raw_RNA

trimmed=${BASE}/fastP_trimmed

## Create the output folder for the trimmed RNA-seq reads if it does not exist
mkdir -p "${trimmed}"

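for f in "$pathRNA"/*.fastq.gz
do
	echo "The full path of the file is: $f"
	# Build the sample name from fields 1, 2 and 5 of the '_'-separated file name
	i=$(basename "$f" | cut -f1,2,5 -d'_')
	echo "Calling fastp on $i"
	# Default fastp filters; write the trimmed reads and an HTML report per sample
	./fastp -i "$f" -o "${trimmed}/${i}_trimmed.fq" -h "${trimmed}/${i}_report.html"
done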

| Sample | Input Reads | Survived Reads | Dropped |
| --- | --- | --- |---|
| OB_8_S1 | 44.290907 M | 3.502568 M (7.908%) | 40.775650 M (92.063%) |
| OB_9_S2 | 34.740411 M | 6.996079 M (20.138%) | 27.724597 M (79.805%) |
| OB_10_S3 | 44.484737 M | 1.658911 M (3.729%) | 42.812523 M (96.241%) |
| OB_11_S4 | 27.450130 M | 3.922357 M (14.289%) | 23.519486 M (85.681%) |
| OB_12_S5 | 31.592577 M | 3.937689 M (12.464%) | 27.644989 M (87.505%) |
| OB_13_S6 | 36.860123 M | 4.459100 M (12.097%) | 32.383973 M (87.856%) |
| YA_10_S7 | 32.437191 M | 4.604432 M (14.195%) | 27.821014 M (85.769%) |
| YA_11_S8 | 37.542279 M | 2.513386 M (6.695%) | 35.020735 M (93.284%) |
| YA_12_S9 | 59.503826 M | 3.445588 M (5.791%) | 56.046559 M (94.190%) |
| YA_13_S10 | 24.395368 M | 6.983627 M (28.627%) | 17.397050 M (71.313%) |
| YA_14_S11 | 22.238665 M | 6.230978 M (28.019%) | 15.997131 M (71.934%) |
| YA_15_S12 | 14.656862 M | 8.744052 M (59.658%) | n/a |
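
As a quick sanity check, the percentages in both tables can be recomputed from the read counts. This is a minimal sketch; `pct` is a hypothetical helper name, and the example counts are taken from the OB_8_S1 rows of the Trimmomatic and fastp tables.

```shell
#!/bin/bash
# Hypothetical helper: percentage of reads (part) relative to the input (total).
#   $1 = part, $2 = total, $3 = number of decimal places (default 2)
pct() {
    awk -v part="$1" -v total="$2" -v dp="${3:-2}" 'BEGIN {
        fmt = "%." dp "f\n"          # e.g. "%.2f\n"
        printf fmt, 100 * part / total
    }'
}

pct 2705511 44290907        # Trimmomatic, OB_8_S1 survived: prints 6.11
pct 41585396 44290907       # Trimmomatic, OB_8_S1 dropped:  prints 93.89
pct 3.502568 44.290907 3    # fastp, OB_8_S1 survived (in M): prints 7.908
```

A dropped count that exceeds the input count, or a survived and dropped pair whose percentages add up to well over 100%, points to a value copied into the wrong row.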