PHYLOFLASH
# Define path to reads
PATH_READS2="01_QC"
# Define path to the database
PATH_DB="/export/dahlefs/work/Databases/Phyloflash_db/138/"
# Run phyloFlash on each pair of quality-filtered reads
# I also wanted to shorten the sample names at this step, so -lib is built from each file name
for file in "${PATH_READS2}"/*R1.fastq
do
    # Library name: drop the last and then the first dash-separated field of the file name
    nam=$(basename "${file}" | rev | cut -f2- -d"-" | rev | cut -f2- -d"-")
    # Leading field of the file name, needed to rebuild the read paths
    firstbit=$(basename "${file}" | cut -f1 -d"-")
    echo "analyzing.... ${nam}"
    phyloFlash.pl -lib "${nam}" \
        -read1 "${PATH_READS2}/${firstbit}-${nam}-QUALITY_PASSED_R1.fastq" \
        -read2 "${PATH_READS2}/${firstbit}-${nam}-QUALITY_PASSED_R2.fastq" \
        -almosteverything -CPUs 10 -readlength 150 -dbhome "${PATH_DB}"
done
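The name parsing at the top of the loop is easiest to check on a single file name. This is a minimal sketch that wraps the same `rev`/`cut` commands in two small functions. The function names and the example file name are hypothetical. The example assumes one extra leading field before the library name, so adjust it to match your own files.

```bash
#!/usr/bin/env bash
# Sketch: the same rev/cut name parsing used in the phyloFlash loop, on one file name.

# Library name: drop the last dash-separated field (e.g. "QUALITY_PASSED_R1.fastq"),
# then drop the first one.
lib_name() {
    basename "$1" | rev | cut -f2- -d"-" | rev | cut -f2- -d"-"
}

# Leading field: everything before the first dash.
first_field() {
    basename "$1" | cut -f1 -d"-"
}

# Hypothetical file name, for illustration only.
f="01_QC/S01-GS19-ROV17-BS06-QUALITY_PASSED_R1.fastq"
echo "firstbit: $(first_field "$f")"   # firstbit: S01
echo "nam:      $(lib_name "$f")"      # nam:      GS19-ROV17-BS06
```

`cut -f2-` removes only the first field, so any dashes inside the library name survive. Joining `firstbit`, `nam` and the `QUALITY_PASSED` suffix therefore rebuilds the original file name.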
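# Compare the samples to one another as a heatmap and a barplot.
# --allzip uses every phyloFlash .tar.gz archive in the current directory;
# --task takes a comma-separated list with no spaces.
phyloFlash_compare.pl --allzip --task heatmap,barplot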
# phyloFlash_compare.pl can also write long-table outputs of the NTU abundances; --level 6 summarizes to genus.
# Join the file names with commas, then drop the trailing comma.
files=$(ls *.NTUfull_abundance.csv | tr '\n' ',')
phyloFlash_compare.pl --csv "${files%,}" --task ntu_table --out PhyloFlash_To_Genus --level 6
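`--csv` expects all the CSV files as a single comma-separated argument. This is a minimal sketch that builds that list directly from a glob. It avoids parsing `ls`, and there is no trailing comma to trim. The function name `join_csv` is hypothetical.

```bash
#!/usr/bin/env bash
# Join all arguments with commas ("$*" uses the first character of IFS as the separator).
join_csv() {
    local IFS=,
    printf '%s\n' "$*"
}

# Example with hypothetical file names:
join_csv A.phyloFlash.NTUfull_abundance.csv B.phyloFlash.NTUfull_abundance.csv
# A.phyloFlash.NTUfull_abundance.csv,B.phyloFlash.NTUfull_abundance.csv

# In a real run the glob supplies the names:
# files=$(join_csv *.NTUfull_abundance.csv)
# phyloFlash_compare.pl --csv "$files" --task ntu_table --out PhyloFlash_To_Genus --level 6
```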
# Here I compared groups of samples that I picked out using my pick_input3.py program
while read -r nam set _
do
    phyloFlash_compare.pl --csv "$set" --task ntu_table --out "$nam" --level 7
done < for_pfcompare_groups
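Before launching the comparisons, it can help to print the commands the loop would run. This sketch assumes that each line of `for_pfcompare_groups` holds an output name followed by a comma-separated list of NTU CSV files, separated by a space. That is the layout implied by the field splitting in the loop. The function name and the example file contents are hypothetical.

```bash
#!/usr/bin/env bash
# Print, without running, the phyloFlash_compare.pl command for each group line.
# Assumed line format: "<output name> <comma-separated list of NTU CSV files>"
dry_run_groups() {
    local nam csv_list rest
    while read -r nam csv_list rest; do
        [ -z "$nam" ] && continue    # skip blank lines
        echo "phyloFlash_compare.pl --csv $csv_list --task ntu_table --out $nam --level 7"
    done < "$1"
}

# Hypothetical group file with a single group.
tmp=$(mktemp)
printf '%s\n' "groupA s1.NTUfull_abundance.csv,s2.NTUfull_abundance.csv" > "$tmp"
dry_run_groups "$tmp"
rm -f "$tmp"
```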
I repeated the phyloFlash analysis after human decontamination. For the Loki 2019 samples, I only created the human-decontaminated files and then counted the reads in each human-reads file with the loop below. Because so few of the reads were human, I then deleted the cleaned and human files to save space on the server. We will therefore move forward with the existing co-assembly without a human-cleaning step.
# Each FASTQ record spans 4 lines, so reads = lines / 4
for k in *-human.fq
do
    human_tot=$(( $(wc -l < "$k") / 4 ))
    echo "$k $human_tot" >> Human_Totals.txt
done
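Counting lines that contain `@` (as `grep -c "@"` does) can overcount reads. In Phred+33 FASTQ, `@` is also a valid quality character (Q31). This is a minimal sketch comparing the two approaches on a tiny two-read file. It assumes uncompressed FASTQ with unwrapped 4-line records. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Count reads in an uncompressed FASTQ file with 4-line records
# (header, sequence, "+", qualities): reads = lines / 4.
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Two reads; the first quality string is all '@' (Phred 31).
tmp=$(mktemp)
printf '@r1\nACGT\n+\n@@@@\n@r2\nACGT\n+\nIIII\n' > "$tmp"
echo "grep -c '@': $(grep -c '@' "$tmp")"        # 3
echo "lines / 4:   $(count_fastq_reads "$tmp")"  # 2
rm -f "$tmp"
```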
# Example with Zetaproteobacteria:
# Taxa match & Abundance
# First, I pulled out the Zetaproteobacteria hits from the *extracted* classification files of selected samples (ROV14, ROV16, ROV17 and ROV18).
grep "Zeta" {GS19-ROV16*extracted*,GS19-ROV14*extracted*,GS19-ROV17*extracted*,GS19-ROV18*extracted*} > Zetas_good.txt
Each line in Zetas_good.txt looks like this:
GS19-ROV17-BS06.PFspades_27,453,5.795771,JQLW01000012.87223.88743,Bacteria;Proteobacteria;Zetaproteobacteria ....
In the example above, 453 (the number after the OTU name, PFspades_27) is the abundance of this OTU in the sample.
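To tabulate OTU names and abundances from Zetas_good.txt, the first two comma-separated fields are enough. This is a minimal sketch that assumes the column layout of the example line above. When grep searches several files, it prefixes each match with `filename:`. The sketch also strips that prefix. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print "<OTU>\t<abundance>" for each line of a Zetas_good.txt-style file.
# Assumes field 1 is the OTU name and field 2 its abundance, as in the example line.
otu_abundance() {
    awk -F',' '{
        otu = $1
        sub(/^.*:/, "", otu)    # drop a "filename:" prefix added by grep, if present
        print otu "\t" $2
    }' "$1"
}

# Example with the line shown above (truncated as in the example):
tmp=$(mktemp)
echo "GS19-ROV17-BS06.PFspades_27,453,5.795771,JQLW01000012.87223.88743,Bacteria;Proteobacteria;Zetaproteobacteria" > "$tmp"
otu_abundance "$tmp"    # GS19-ROV17-BS06.PFspades_27<TAB>453
rm -f "$tmp"
```

Piping the result through `sort -t$'\t' -k2,2nr` would list the most abundant OTUs first.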
# OTU seqs
### For each line of Zetas_good.txt, take the OTU name: the text between the ":" that grep adds after the file name and the first ",".
### Grep for that name plus "_" (so that PFspades_27 does not also match PFspades_270) in the rRNAs.final files and save the hits.
### In grep: -F treats the name as a fixed string, so the "." in it is not a regex wildcard.
### In grep: -A 1 also prints the line after each match, i.e. the sequence (each sequence sits on a single line in these files).
### In grep: -h stops grep from prefixing its output with the name of the file the match came from.
### In grep: --no-group-separator (GNU grep) stops grep from writing "--" lines between matches, which would corrupt the FASTA.
while read -r line; do
    OTU=$(echo "$line" | cut -d":" -f2 | cut -d"," -f1)
    grep -h -F -A 1 --no-group-separator "${OTU}_" {GS19-ROV16*rRNAs.final*,GS19-ROV14*rRNAs.final*,GS19-ROV17*rRNAs.final*,GS19-ROV18*rRNAs.final*}
done < Zetas_good.txt > Zeta_OTUs.fa
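`grep -A 1` only works here because each sequence in the rRNAs.final files sits on a single line. If a FASTA file wraps its sequences, a record-aware filter is safer. This is a minimal sketch using awk that matches a fixed string anywhere in the header, like the grep above. The function name and the example records are hypothetical.

```bash
#!/usr/bin/env bash
# Print every FASTA record (header plus all its sequence lines) whose header
# contains the given fixed string. Works for wrapped and unwrapped FASTA.
fasta_records_matching() {
    local pattern=$1
    shift
    awk -v pat="$pattern" '
        /^>/ { keep = (index($0, pat) > 0) }   # decide once per record, at its header
        keep                                    # print the lines of kept records
    ' "$@"
}

# Hypothetical FASTA: the first record is wrapped over two lines.
tmp=$(mktemp)
printf '>A.PFspades_1_x\nACGT\nACGT\n>A.PFspades_10_y\nGGGG\n' > "$tmp"
fasta_records_matching "A.PFspades_1_" "$tmp"   # prints only the first record, both sequence lines
rm -f "$tmp"
```

The example also shows why the trailing `_` matters: `A.PFspades_1_` does not match the header `A.PFspades_10_y`.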
In 2020, the Dahle group sent 60 samples from various chimneys across the AMOR (Arctic Mid-Ocean Ridge) for sequencing. This wiki shares the pipeline I used to process this dataset. The aim is to document every step in detail so that other lab members do not have to repeat the same time-consuming processes. Hosting it on my GitHub page has the added benefit of accountability, and you have someone to email if something doesn't work for you. :)