
PHYLOFLASH

eolesin edited this page Mar 24, 2021 · 13 revisions

Examining the diversity of rRNA genes within the quality-filtered reads.

# Define path to reads
PATH_READS2="01_QC"

# Define path to the database
PATH_DB="/export/dahlefs/work/Databases/Phyloflash_db/138/"

# Run phyloFlash on each sample's paired quality-filtered reads
# I also shortened the sample names at this step: the library name (-lib) is derived from the file name.
for file in "${PATH_READS2}"/*R1.fastq; do
    # Sample name: drop the last "-"-delimited field, then the first one
    nam=$(basename "$file" | rev | cut -f2- -d"-" | rev | cut -f2- -d"-")
    # Leading prefix (first "-"-delimited field), needed to rebuild the read file paths
    firstbit=$(basename "$file" | cut -f1 -d"-")

    echo "analyzing.... ${nam}"

    phyloFlash.pl -lib "${nam}" \
        -read1 "${PATH_READS2}/${firstbit}-${nam}-QUALITY_PASSED_R1.fastq" \
        -read2 "${PATH_READS2}/${firstbit}-${nam}-QUALITY_PASSED_R2.fastq" \
        -almosteverything -CPUs 10 -readlength 150 -dbhome "${PATH_DB}"
done
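The rev/cut pipeline in the loop drops the last hyphen-delimited field of the file name (the `QUALITY_PASSED_R1.fastq` part) and then the first one. A minimal sketch of the same parsing with bash parameter expansion, handy for checking what `nam` and `firstbit` will be before launching a long run. The file name below is a hypothetical example, not one of the real sample files.

```bash
# Hypothetical example file name; substitute a real one from 01_QC
file="01_QC/GS19-ROV17-BS06-QUALITY_PASSED_R1.fastq"

base=$(basename "$file")    # GS19-ROV17-BS06-QUALITY_PASSED_R1.fastq
stem=${base%-*}             # drop the last "-" field  -> GS19-ROV17-BS06
firstbit=${base%%-*}        # keep only the first field -> GS19
nam=${stem#*-}              # drop the first "-" field -> ROV17-BS06

echo "firstbit=${firstbit} nam=${nam}"   # prints: firstbit=GS19 nam=ROV17-BS06
```

Parameter expansion avoids spawning `basename`, `rev` and `cut` for every file, though for a handful of samples the difference is negligible.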


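# Compare the samples to one another as a heatmap and a barplot.
# --allzip uses every *.phyloFlash.tar.gz archive in the current directory;
# the --task values are a comma-separated list with no spaces.
phyloFlash_compare.pl --allzip --task heatmap,barplot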

# Can also create long table outputs of the phyloFlash data from various comparisons.
files=$(ls *.NTUfull_abundance.csv| tr '\n' ',') 
phyloFlash_compare.pl --csv ${files::-1} --task ntu_table --out PhyloFlash_To_Genus --level 6
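`${files::-1}` trims the trailing comma that `tr` leaves behind (negative lengths need bash 4.2 or later). A sketch of the same comma-joined list built from a glob instead of parsing `ls`, written as a hypothetical helper `join_ntu_csvs` that fails loudly when no CSVs are present instead of passing an empty `--csv` argument.

```bash
# Hypothetical helper: print the *.NTUfull_abundance.csv files in the current
# directory as one comma-separated string (the form --csv expects).
join_ntu_csvs() {
    local -a csvs
    csvs=( *.NTUfull_abundance.csv )
    # With no match, bash leaves the literal pattern, which does not exist as a file
    if [ ! -e "${csvs[0]}" ]; then
        echo "join_ntu_csvs: no *.NTUfull_abundance.csv files here" >&2
        return 1
    fi
    local IFS=,             # "${csvs[*]}" joins elements on the first IFS character
    echo "${csvs[*]}"
}

# Usage:
# phyloFlash_compare.pl --csv "$(join_ntu_csvs)" --task ntu_table --out PhyloFlash_To_Genus --level 6
```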

# Here I compared groups of samples I picked out using my pick_input3.py program

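# Each line of for_pfcompare_groups holds an output name and a comma-separated
# list of NTUfull_abundance.csv files, separated by a space.
while read -r groups; do
    set=$(echo "$groups" | cut -f2 -d' ')
    nam=$(echo "$groups" | cut -f1 -d' ')
    phyloFlash_compare.pl --csv "$set" --task ntu_table --out "$nam" --level 7
done < for_pfcompare_groups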
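A dry-run sketch of the group loop, assuming (as the `cut` fields imply) that each line of `for_pfcompare_groups` is an output name, a space, and a comma-separated CSV list. `read -r nam set` splits the line directly, and the hypothetical function `print_compare_commands` only prints each `phyloFlash_compare.pl` call, so the groups can be checked before anything runs.

```bash
# Hypothetical dry-run helper: print one phyloFlash_compare.pl command per group line.
# Assumed line format: <output_name> <csv1,csv2,...>
print_compare_commands() {
    local groups_file=$1 nam set
    while read -r nam set; do
        # Skip blank lines or lines missing the CSV list
        if [ -z "$nam" ] || [ -z "$set" ]; then
            continue
        fi
        printf 'phyloFlash_compare.pl --csv %s --task ntu_table --out %s --level 7\n' "$set" "$nam"
    done < "$groups_file"
}

# Usage: print_compare_commands for_pfcompare_groups
```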

Note:

I repeated the phyloFlash analysis after human decontamination. For the Loki 2019 samples, I only created the human-decontaminated files and then counted the reads in each human-read file with the loop below. Because very few reads turned out to be human, I deleted the cleaned and human files to save space on the server, and we will move forward with the existing co-assembly without a human-cleaning step.

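for k in *-human.fq; do
    # Count reads as lines / 4 (standard 4-line FASTQ). grep -c "@" can overcount,
    # because "@" is also a valid quality character and can start a quality line.
    human_tot=$(( $(wc -l < "$k") / 4 ))
    echo "$k $human_tot" >> Human_Totals.txt
done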
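Why counting `@` lines is risky for FASTQ: in Phred+33 encoding `@` is the quality character for Q31, so a quality line can begin with it. A minimal demonstration on a made-up two-record FASTQ, with a hypothetical helper `count_fastq_records` that assumes unwrapped 4-line records.

```bash
# Hypothetical helper: number of records in an unwrapped 4-line FASTQ file
count_fastq_records() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

demo=$(mktemp)
# Made-up reads; the second record's quality line is "@@@@" (Q31 at every base)
printf '@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n@@@@\n' > "$demo"

echo "grep -c '@': $(grep -c '@' "$demo")"          # prints 3 (two headers plus one quality line)
echo "lines / 4  : $(count_fastq_records "$demo")"  # prints 2
rm -f "$demo"
```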

Fishing for a certain phylogenetic group in your phyloFlash data? Say no more!


# Example with Zetaproteobacteria:

# Taxa match & Abundance
# First I got the info about Zetaproteobacteria that were in some samples. 
grep "Zeta" {GS19-ROV16*extracted*,GS19-ROV14*extracted*,GS19-ROV17*extracted*,GS19-ROV18*extracted*} > Zetas_good.txt

Each line in Zetas_good.txt looks like:
GS19-ROV17-BS06.PFspades_27,453,5.795771,JQLW01000012.87223.88743,Bacteria;Proteobacteria;Zetaproteobacteria ....
Here, the number 453 after the OTU name (PFspades_27) is the abundance of this OTU in the sample.
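To get a quick overview of the matches, a sketch that lists each OTU name with its abundance (second comma-separated field) and sums the abundances. It assumes the line layout shown above, and it also strips the `filename:` prefix that grep adds when it searches several files, if one is present. `summarize_zetas` is a hypothetical helper name.

```bash
# Hypothetical helper: print "OTU<TAB>abundance" per line, then a TOTAL line.
summarize_zetas() {
    awk '{
        sub(/^[^:]*:/, "")       # drop a leading "filename:" added by grep, if present
        split($0, f, ",")        # f[1] = OTU name, f[2] = abundance
        print f[1] "\t" f[2]
        total += f[2]
    } END { print "TOTAL\t" total + 0 }' "$1"
}

# Usage: summarize_zetas Zetas_good.txt
```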

# OTU seqs
### Take the OTU name from each line of Zetas_good.txt (the text between ":" and the first ","), grep for it, and write the matches to Zeta_OTUs.fa.
### In grep: -A 1 also prints the line after each match, which holds the sequence (this assumes single-line, unwrapped FASTA records).
### In grep: -h suppresses the file-name prefix that grep otherwise adds when searching several files.
while read -r line; do
    OTU=$(echo "$line" | cut -d":" -f2 | cut -d"," -f1)
    # The trailing "_" keeps e.g. PFspades_27 from also matching PFspades_270
    grep -h -A 1 "${OTU}_" {GS19-ROV16*rRNAs.final*,GS19-ROV14*rRNAs.final*,GS19-ROV17*rRNAs.final*,GS19-ROV18*rRNAs.final*}
done < Zetas_good.txt > Zeta_OTUs.fa
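`grep -A 1` returns only one line after each header, so a FASTA file with wrapped sequences would be truncated, and multiple matches in one call are separated by `--` lines. A sketch of an awk alternative that keeps every sequence line of a matching record. It assumes a plain-text file with one OTU name per line (the hypothetical `zeta_otu_names.txt` below) and, like the grep above, matches `<OTU>_` anywhere in a header line, but as a fixed string rather than a regex. `extract_otus` is a hypothetical helper name.

```bash
# Hypothetical helper: print FASTA records whose header contains "<OTU>_"
# for any OTU listed (one per line) in the first file argument.
extract_otus() {
    local ids=$1
    shift
    awk '
        NR == FNR { want[$1 "_"] = 1; next }    # first file: OTU names, stored with a trailing "_"
        /^>/ {                                   # new record: decide whether to keep it
            keep = 0
            for (id in want)
                if (index($0, id) > 0) { keep = 1; break }
        }
        keep                                     # print header and all sequence lines of kept records
    ' "$ids" "$@"
}

# Usage (same OTU extraction as above):
# cut -d":" -f2 Zetas_good.txt | cut -d"," -f1 > zeta_otu_names.txt
# extract_otus zeta_otu_names.txt {GS19-ROV16*rRNAs.final*,GS19-ROV14*rRNAs.final*,GS19-ROV17*rRNAs.final*,GS19-ROV18*rRNAs.final*} > Zeta_OTUs.fa
```

If the ID file is empty, awk treats the FASTA file as the ID list and prints nothing useful, so check that the list is non-empty first.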
