Subsample reads file and run rmseq

Set variables in your bash terminal

sampled_reads=0
increment=50000
total_reads_number=1000000 # do not exceed the number of reads in your fastq file
fastq_file_R1=my_R1.fq # replace with path to your R1 fastq file
fastq_file_R2=my_R2.fq # replace with path to your R2 fastq file
refnuc=my_ref_nuc.fa # replace with path to your reference nucleotide fasta file
refprot=ref_prot.fa # replace with path to your reference protein fasta file

Subsample reads with seqtk and run rmseq pipeline

while [ "$sampled_reads" -lt "$total_reads_number" ]
do
  sampled_reads=$((sampled_reads+increment))
  echo "oooo Subsampling $sampled_reads reads with seqtk oooo"
  seqtk sample -s100 $fastq_file_R1 $sampled_reads > $sampled_reads_R1.fq
  seqtk sample -s100 $fastq_file_R2 $sampled_reads > $sampled_reads_R2.fq
  echo "oooo Running rmseq on $sampled_reads subsampled reads"
  rmseq run $sampled_reads_R1.fq $sampled_reads_R2.fq $refnuc $refprot $sampled_reads
done

Concatenate all amplicons.effect files obtained from different reads subsamplings

cat ./*/amplicons.effect >> all.amplicons.tab

Plot the number of consensus reads (from 10 reads) versus the number of reads (depth of sequencing) with Rstudio

Install and load the tidyverse package

install.packages("tidyverse")
library(tidyverse)

Import the concatenated consensus amplicon table as a dataframe

df_reads_subsampling <- read.table(file = "all_amplicons.tab", header = T)

Count the number of consensus amplicon obtained at the different sequencing depth (reads subsampling)

df_barcode_count <- df_reads_subsampling %>%
  group_by(sample) %>%
  count(sample, sort = T) %>%
  mutate(barcode_nb = n, reads_nb = as.integer(sample))

plot consensus reads versus number of reads

ggplot() +
stat_smooth(data = df_barcode_count, aes(x=reads_nb, y = barcode_nb)) +
xlab(label = "number of reads")+
ylab(label = "number of conseq\n(consensus of 10 reads)")

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consensus_reads_nb_vs_reads.md

consensus_reads_nb_vs_reads.md

Subsample reads file and run rmseq

Set variables in your bash terminal

Subsample reads with seqtk and run rmseq pipeline

Concatenate all amplicons.effect files obtained from different reads subsamplings

Plot the number of consensus reads (from 10 reads) versus the number of reads (depth of sequencing) with Rstudio

Install and load the tidyverse package

Import the concatenated consensus amplicon table as a dataframe

Count the number of consensus amplicon obtained at the different sequencing depth (reads subsampling)

plot consensus reads versus number of reads

Files

consensus_reads_nb_vs_reads.md

Latest commit

History

consensus_reads_nb_vs_reads.md

File metadata and controls

Subsample reads file and run rmseq

Set variables in your bash terminal

Subsample reads with seqtk and run rmseq pipeline

Concatenate all amplicons.effect files obtained from different reads subsamplings

Plot the number of consensus reads (from 10 reads) versus the number of reads (depth of sequencing) with Rstudio

Install and load the tidyverse package

Import the concatenated consensus amplicon table as a dataframe

Count the number of consensus amplicon obtained at the different sequencing depth (reads subsampling)

plot consensus reads versus number of reads