![image](silene.jpeg)

# sRNA profiles of flower dimorphism in _Silene latifolia_

#### Eddy J. Mendoza-Galindo
#### Advisor: Aline Muyle, CEFE Montpellier
April 2023

#### Exploration of sRNA abundance by length and species, restricted to 21-, 22- and 24-nt reads

In [5]:
! cat scripts/count.sh

cd raw/sRNA_MGX/trimmed/
for file in *.fastq
do
echo "working with $file"
perl -e ' $count=0; $len=0; while(<>) { s/\r?\n//; s/\t/ /g; if (s/^@//) { if ($. != 1) { print "\n" } s/ |$/\t/; $count++; $_ .= "\t"; } else { s/ //g; $len += length($_) } print $_; } print "\n"; ' $file | sed -E 's/^.+\t(\w+)\+.*$/\1/g' | perl -e ' $col=0; while (<>) { s/\r?\n//; @F = split /\t/, $_; $len = length($F[$col]); print "$_\t$len\n" }; ' | awk '$2 ~ /^(21|22|24)$/ ' > ${file}_count.tsv
done
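
The length tally done by `count.sh` can also be sketched in a few lines of Python. This is only an illustration, not the script used above: the function names `read_fastq` and `length_counts` are hypothetical, and the sketch assumes well-formed four-line FASTQ records. It compares lengths exactly, so a 121-nt read is never mistaken for a 21-nt one.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\r\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\r\n")
        yield header[1:].split()[0], seq, qual


def length_counts(lines: Iterable[str], keep=(21, 22, 24)) -> Dict[int, int]:
    """Count reads per length, reporting only the lengths listed in `keep`."""
    counts = Counter(len(seq) for _, seq, _ in read_fastq(lines))
    return {n: counts.get(n, 0) for n in keep}


# Small in-memory example (made-up reads, not project data)
example = [
    "@r1", "A" * 21, "+", "I" * 21,
    "@r2", "C" * 22, "+", "I" * 22,
    "@r3", "G" * 24, "+", "I" * 24,
    "@r4", "T" * 121, "+", "I" * 121,  # exact match: not counted as 21
    "@r5", "A" * 21, "+", "I" * 21,
]
print(length_counts(example))  # {21: 2, 22: 1, 24: 1}
```

For a real file, the same function could be called on an open file handle, e.g. `length_counts(open(path))`.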


In [6]:
! bash scripts/count.sh

working with F1B_final_trimming.fastq
working with F1L_final_trimming.fastq
working with F2B_final_trimming.fastq
working with F2L_final_trimming.fastq
working with F3B_final_trimming.fastq
working with F3L_final_trimming.fastq
working with M1B_final_trimming.fastq
working with M1L_final_trimming.fastq
working with M2B_final_trimming.fastq
working with M2L_final_trimming.fastq
working with M4B_final_trimming.fastq
working with M4L_final_trimming.fastq


#### Select reads that are 21, 22 or 24 nt long

In [1]:
! cat scripts/filter_size.sh

cd raw/
rm -rf fastq/
mkdir fastq

out=fastq
files=sRNA_MGX/trimmed/*.fastq


for file in $files
do

name=$(echo $file | sed -E 's/^sR.*ed\/(\w+)_f.*/\1/g')

echo "WORKING WITH $name"

# Select reads of 21, 22 and 24 nt in length
seqtk comp $file | awk '$2 == 21' | cut -f 1 > 21.list
seqtk subseq $file 21.list > $out/${name}_21.fq
echo "21-nt sRNAS done"

seqtk comp $file | awk '$2 == 22' | cut -f 1 > 22.list
seqtk subseq $file 22.list > $out/${name}_22.fq
echo "22-nt sRNAS done"

seqtk comp $file | awk '$2 == 24' | cut -f 1 > 24.list
seqtk subseq $file 24.list > $out/${name}_24.fq
echo "24-nt sRNAS done"

rm 21.list 22.list 24.list
# Merge the three size classes into one file inside $out, then remove the intermediates
cat $out/${name}_21.fq $out/${name}_22.fq $out/${name}_24.fq > $out/${name}_dicer.fq
rm $out/${name}_21.fq $out/${name}_22.fq $out/${name}_24.fq

done
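
As a point of comparison, the same size selection can be sketched as a single pass in Python. This is a hedged illustration rather than the method used here: `filter_fastq_by_length` is a hypothetical helper, it assumes four-line FASTQ records, and unlike the `seqtk` script it keeps reads in their original order instead of grouping them by size class.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def filter_fastq_by_length(
    lines: Iterable[str], lengths=frozenset({21, 22, 24})
) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records whose sequence length is in `lengths`."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        if len(record[1]) in lengths:
            yield record


# Made-up reads for illustration only
reads = [
    "@a", "A" * 20, "+", "I" * 20,
    "@b", "C" * 21, "+", "I" * 21,
    "@c", "G" * 23, "+", "I" * 23,
    "@d", "T" * 24, "+", "I" * 24,
]
kept = list(filter_fastq_by_length(reads))
print([rec[0] for rec in kept])  # ['@b', '@d']
```

Each yielded record can be written back out with `"\n".join(record)` to produce a combined file equivalent in content to `${name}_dicer.fq`.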

In [4]:
! bash scripts/filter_size.sh

WORKING WITH F1B
21-nt sRNAS done
22-nt sRNAS done
24-nt sRNAS done
WORKING WITH F1L
21-nt sRNAS done
22-nt sRNAS done
24-nt sRNAS done
WORKING WITH F2B
21-nt sRNAS done
22-nt sRNAS done
24-nt sRNAS done
WORKING WITH F2L
21-nt sRNAS done
22-nt sRNAS done
24-nt sRNAS done
WORKING WITH F3B
21-nt sRNAS done
22-nt sRNAS done
24-nt sRNAS done
WORKING WITH F3L
21-nt sRNAS done
22-nt sRNAS done
24-nt sRNAS done
WORKING WITH M1B
21-nt sRNAS done
22-nt sRNAS done
24-nt sRNAS done
WORKING WITH M1L
21-nt sRNAS done
22-nt sRNAS done
24-nt sRNAS done
WORKING WITH M2B
21-nt sRNAS done
22-nt sRNAS done
24-nt sRNAS done
WORKING WITH M2L
21-nt sRNAS done
22-nt sRNAS done
24-nt sRNAS done
WORKING WITH M4B
21-nt sRNAS done
22-nt sRNAS done
24-nt sRNAS done
WORKING WITH M4L
21-nt sRNAS done
22-nt sRNAS done
24-nt sRNAS done


In [6]:
#check lengths after filtering
! seqtk comp raw/fastq/M2B_dicer.fq | cut -f 2 | sort | uniq 
! seqtk comp raw/fastq/F1L_dicer.fq | cut -f 2 | sort | uniq 

21
22
24
21
22
24


### Alignment and quantification

We followed the ShortStack workflow.
With `--mmap u`, only uniquely aligned reads are used as weights for placement of multi-mapped reads.
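
The idea behind unique-weighted placement can be shown with a toy calculation. The sketch below is not ShortStack's implementation; `placement_probabilities` and the locus names are hypothetical, and the uniform fallback when no unique reads are available is an assumption made for the example. Each candidate location of a multi-mapped read is weighted by the number of uniquely aligned reads nearby.

```python
from typing import Dict, List


def placement_probabilities(
    candidates: List[str], unique_counts: Dict[str, int]
) -> Dict[str, float]:
    """Toy model: weight each candidate locus by its count of uniquely aligned reads.

    `candidates` are the distinct loci where one multi-mapped read aligns.
    `unique_counts` maps a locus to the number of uniquely aligned reads near it.
    """
    weights = [unique_counts.get(c, 0) for c in candidates]
    total = sum(weights)
    if total == 0:
        # Assumption for this sketch: with no unique reads, all loci are equally likely
        return {c: 1 / len(candidates) for c in candidates}
    return {c: w / total for c, w in zip(candidates, weights)}


probs = placement_probabilities(["locusA", "locusB"], {"locusA": 30, "locusB": 10})
print(probs)  # {'locusA': 0.75, 'locusB': 0.25}
```

In this toy example the read would be assigned to `locusA` three times out of four, because that locus has three times as many uniquely aligned reads.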

conda activate ShortStack4

ShortStack --genomefile ../genome/silat.fa --readfile fastq/*.fq --threads 8 --knownRNAs caryophyllaceae_mirnas.fa --mmap u

### We also ran the analysis for 21-22 nt and 24 nt reads independently:
ShortStack --genomefile ../genome/silat.fa --bamfile ShortStack_results/merged_alignments.bam --threads 8 --outdir only_21-22 --dicermax 22 --mmap u
##### To identify microRNAs among the 21-22 nt reads
ShortStack --genomefile ../genome/silat.fa --bamfile ShortStack_results/merged_alignments.bam --threads 8 --outdir only_21-22 --dicermax 22 --mmap u --knownRNAs caryophyllaceae_mirnas.fa 
#### 24-nt
ShortStack --genomefile ../genome/silat.fa --bamfile ShortStack_results/merged_alignments.bam --threads 8 --outdir only_24 --dicermin 23 --mmap u

#### We also ran a de novo search for microRNAs, which found none:
ShortStack --genomefile ../genome/silat.fa --bamfile ShortStack_results/merged_alignments.bam --threads 8 --outdir de_novo --mmap u --dn_mirna

No microRNA loci were found!


### Depth quantification

In [17]:
! bash scripts/mapping_depth.sh # Output files are very large

WORKING WITH F1B
Extracting depths for the Forward strand
Extracting depths for the Reverse strand
WORKING WITH F1L
Extracting depths for the Forward strand
Extracting depths for the Reverse strand
WORKING WITH F2B
Extracting depths for the Forward strand
Extracting depths for the Reverse strand
cat: write error: No space left on device
WORKING WITH F2L
Extracting depths for the Forward strand
Extracting depths for the Reverse strand
cat: write error: No space left on device
WORKING WITH F3B
Extracting depths for the Forward strand
Extracting depths for the Reverse strand
^C
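
The run above stopped because the per-base depth files filled the disk. One possible way to shrink such output, sketched here under assumptions, is to collapse consecutive positions with the same depth into intervals. `scripts/mapping_depth.sh` is not shown in this notebook, so the input format is assumed to be three columns (chromosome, 1-based position, depth) as written by tools such as `samtools depth`; `collapse_depth` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Tuple


def collapse_depth(
    rows: Iterable[Tuple[str, int, int]]
) -> Iterator[Tuple[str, int, int, int]]:
    """Collapse sorted (chrom, pos, depth) rows into (chrom, start, end, depth).

    Positions are 1-based on input; output intervals are 0-based, half-open
    (bedGraph-style). Zero-depth intervals are dropped.
    """
    current = None  # [chrom, start, end, depth] of the interval being built
    for chrom, pos, depth in rows:
        if (
            current is not None
            and current[0] == chrom
            and current[2] == pos - 1
            and current[3] == depth
        ):
            current[2] = pos  # adjacent position with same depth: extend
        else:
            if current is not None and current[3] > 0:
                yield tuple(current)
            current = [chrom, pos - 1, pos, depth]
    if current is not None and current[3] > 0:
        yield tuple(current)


# Made-up depths for illustration only
rows = [
    ("chr1", 1, 5), ("chr1", 2, 5), ("chr1", 3, 5),
    ("chr1", 4, 2), ("chr1", 10, 2),
    ("chr2", 1, 0), ("chr2", 2, 7),
]
intervals = list(collapse_depth(rows))
print(intervals)
# [('chr1', 0, 3, 5), ('chr1', 3, 4, 2), ('chr1', 9, 10, 2), ('chr2', 1, 2, 7)]
```

Seven input rows become four intervals here; over long runs of constant depth the reduction would be much larger.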


# RNA-seq analysis

In [23]:
# Check quality
#! mkdir rna-seq/fastqc
#! rna-seq/FastQC/fastqc -t 16 -o rna-seq/fastqc rna-seq/*.gz
! multiqc --outdir rna-seq/fastqc rna-seq/fastqc

  /// MultiQC 🔍 | v1.14

|           multiqc | Search path : /home/eddy/silene/rna-seq/fastqc
|         searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 24/24  rna-seq/fastqc/49_male_2_fastqc.html
|            fastqc | Found 12 reports
|           multiqc | Compressing plot data
|           multiqc | Report      : rna-seq/fastqc/multiqc_report.html
|           multiqc | Data        : rna-seq/fastqc/multiqc_data
|           multiqc | MultiQC complete
