PHYLOFLASH

Examining the diversity of rRNA genes within the quality-filtered reads.

# Define path to reads
PATH_READS2="01_QC"

# Define path to the database
PATH_DB="/export/dahlefs/work/Databases/Phyloflash_db/138/"

# Run phyloFlash on the reads
# I also wanted to rename the files at this step, so the library name is derived from each file name.
for file in "${PATH_READS2}"/*R1.fastq
do
    # Drop the last dash-separated field (QUALITY_PASSED_R1.fastq), then the first one
    nam=$(basename "${file}" | rev | cut -f2- -d"-" | rev | cut -f2- -d"-")
    # Keep the first dash-separated field of the file name
    firstbit=$(basename "${file}" | cut -f1 -d"-")

    echo "analyzing.... ${nam}"

    phyloFlash.pl -lib "${nam}" \
        -read1 "${PATH_READS2}/${firstbit}-${nam}-QUALITY_PASSED_R1.fastq" \
        -read2 "${PATH_READS2}/${firstbit}-${nam}-QUALITY_PASSED_R2.fastq" \
        -almosteverything -CPUs 10 -readlength 150 -dbhome "${PATH_DB}"
done
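
To see what the two `cut` pipelines in the loop above extract, here is a minimal sketch on a made-up file name. The name `S01-GS19-ROV17-BS06-QUALITY_PASSED_R1.fastq` is hypothetical (the `S01` prefix in particular is an assumption); only the `<prefix>-<library>-QUALITY_PASSED_R1.fastq` pattern comes from the loop. The second half shows an equivalent that uses only shell parameter expansion.

```shell
# Hypothetical file name following <prefix>-<library>-QUALITY_PASSED_R1.fastq
file="01_QC/S01-GS19-ROV17-BS06-QUALITY_PASSED_R1.fastq"
base=$(basename "$file")

# Same extraction as in the loop: drop the last field, then the first
nam=$(echo "$base" | rev | cut -f2- -d"-" | rev | cut -f2- -d"-")
firstbit=$(echo "$base" | cut -f1 -d"-")

# Equivalent using parameter expansion only (no rev/cut needed)
firstbit2=${base%%-*}   # strip everything from the first "-" onward
rest=${base#*-}         # strip the prefix and its "-"
nam2=${rest%-*}         # strip the last "-" and everything after it

echo "prefix=$firstbit library=$nam"
```

Because the library name itself may contain dashes, both variants trim from the two ends of the name rather than picking a fixed field number.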


# Compare different samples to one another, either in a heatmap
# or a barplot format.
phyloFlash_compare.pl --allzip --task heatmap,barplot

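# Can also create long-table outputs of the phyloFlash data from various comparisons.
# Here I compared groups of samples that I picked out using my pick_input3.py program.
# Assumed layout of for_pfcompare_groups: group name, a space, then a comma-separated list of CSV files.
while read -r groups
do
    set=$(echo "$groups" | cut -f2 -d' ')
    nam=$(echo "$groups" | cut -f1 -d' ')
    # ${nam}_SpecLvl needs the braces; $nam_SpecLvl would be read as a different (empty) variable
    phyloFlash_compare.pl --csv "$set" --task ntu_table --out "${nam}_SpecLvl" --level 7
done < for_pfcompare_groups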
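
As a dry run of the grouping loop, the sketch below builds a small groups file and prints the commands that would be launched instead of running them. The file layout (group name, one space, comma-separated CSV list) is an assumption, and the group and CSV names are hypothetical. It lets `read` split the two fields directly, which is one possible alternative to the `cut` calls.

```shell
# Hypothetical groups file: <group name> <comma-separated CSV list>
groups_file=$(mktemp)
printf '%s\n' \
    'vents_A sampleA.csv,sampleB.csv' \
    'vents_B sampleC.csv,sampleD.csv' > "$groups_file"

n=0
last_cmd=""
while read -r nam set
do
    # Braces keep the shell from reading $nam_SpecLvl as one variable name
    last_cmd="phyloFlash_compare.pl --csv $set --task ntu_table --out ${nam}_SpecLvl --level 7"
    echo "$last_cmd"   # print only; nothing is executed
    n=$((n + 1))
done < "$groups_file"
rm -f "$groups_file"
```

Checking the printed commands first is a cheap way to catch an empty `--out` prefix or a mis-split line before any comparison is run.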

Note:

I repeated the phyloFlash analysis after human decontamination.

Fishing for a certain phylogenetic group in your phyloFlash data? Say no more!


# Example with Zetaproteobacteria:

# Taxa match & Abundance
# First I got the info about Zetaproteobacteria that were in some samples. 
grep "Zeta" {GS19-ROV16*extracted*,GS19-ROV14*extracted*,GS19-ROV17*extracted*,GS19-ROV18*extracted*} > Zetas_good.txt

Each line in Zetas_good.txt looks like:
GS19-ROV17-BS06.PFspades_27,453,5.795771,JQLW01000012.87223.88743,Bacteria;Proteobacteria;Zetaproteobacteria ....
In the line above, the number 453 after the OTU name (PFspades_27) is the abundance of this OTU in the sample.
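
A minimal sketch of pulling the OTU name and its abundance out of such a line, using the example above with the truncated tail left off. It assumes the OTU name is the first comma-separated field and the abundance the second, as described. Lines in the real file may also start with a `filename:` prefix from grep, which would need to be stripped first.

```shell
# Example line from above (tail omitted where the original is truncated)
line='GS19-ROV17-BS06.PFspades_27,453,5.795771,JQLW01000012.87223.88743,Bacteria;Proteobacteria;Zetaproteobacteria'

# Field 1 is the OTU name, field 2 its abundance
otu=$(printf '%s\n' "$line" | cut -d',' -f1)
abundance=$(printf '%s\n' "$line" | cut -d',' -f2)

printf '%s\t%s\n' "$otu" "$abundance"
```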

# OTU seqs
### For each line in Zetas_good.txt, take the OTU name: the text between the first ":" and the first ",".
### Grep for that name in the rRNA FASTA files and save all results to one file.
### In grep: -A 1 also prints the one line after each match (that line holds the sequence).
### In grep: -h stops grep from prefixing its output with the name of the file the match came from.
while read -r line
do
    OTU=$(echo "$line" | cut -d":" -f2 | cut -d"," -f1)
    grep -A 1 -h "${OTU}_" {GS19-ROV16*rRNAs.final*,GS19-ROV14*rRNAs.final*,GS19-ROV17*rRNAs.final*,GS19-ROV18*rRNAs.final*}
done < Zetas_good.txt > Zeta_OTUs.fa
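
The trailing `"_"` in the grep pattern matters: without it, an OTU name such as `PFspades_2` would also match `PFspades_27`. The sketch below shows this on a toy FASTA file. The file name, the headers (OTU name followed by `_` and a number), and the sequences are all hypothetical; real phyloFlash headers may differ.

```shell
# Toy FASTA with hypothetical headers and sequences
tmpdir=$(mktemp -d)
cat > "$tmpdir/toy.rRNAs.final.fasta" <<'EOF'
>GS19-ROV17-BS06.PFspades_2_1.5
AAAA
>GS19-ROV17-BS06.PFspades_27_5.795771
CCCC
EOF

OTU="GS19-ROV17-BS06.PFspades_2"

# Without the trailing "_" the pattern also hits the PFspades_27 header
loose=$(grep -c "$OTU" "$tmpdir"/*rRNAs.final*)
# With it, only the intended header matches; -A 1 adds the sequence line
strict=$(grep -A 1 -h "${OTU}_" "$tmpdir"/*rRNAs.final*)

echo "loose matches: $loose"
echo "$strict"
rm -rf "$tmpdir"
```

Note that the `.` in the OTU name acts as a regular-expression wildcard here; `grep -F` would match the name literally. GNU grep also prints a `--` separator between non-adjacent match groups when `-A` is used, so it is worth checking Zeta_OTUs.fa for stray `--` lines.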

In 2020, the Dahle group sent 60 samples for sequencing from various chimneys across the AMOR. This wiki shares the pipeline I used to process that dataset. The intent is to be specific about every step involved, so that other lab members do not have to repeat the same time-consuming processes. Hosting it on my Git page adds accountability and gives you someone to email if something doesn't work for you. :)
