02_HUMAN_REF_DECONTAM
Human contamination was most pronounced in the low-DNA samples, although the phyloFlash results gave only an unclear indication of how heavily the sequences were contaminated.
To remove it, we use the BBTools package by JGI bioinformatician Brian Bushnell, together with a masked human reference genome (hg19) that he developed. He has made this reference available for download here: https://drive.google.com/u/0/uc?id=0B3llHR93L14wd0pSSnFULUlhcUk&export=download
He explains how he developed the mask here: http://seqanswers.com/forums/showthread.php?t=42552
# Setting up where the executables and reference are.
bbmap='/export/dahlefs/work/Emily/myApps/bbmap/'
human_ref='/export/dahlefs/work/Emily/myApps/bbmap/hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz'
path_qc='/export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/01_QC'
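# The process, following Brian's SEQanswers post.
# First, index the reference. BBMap writes the index to a ref/ folder in the current
# directory, and the mapping loop below looks for it with path=., so run both from the same place.
$bbmap/bbmap.sh ref=$human_ref -Xmx23g # Indexing takes a while, so you might want to use 'screen'.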
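Because indexing is slow, it can help to check whether an index is already in place before starting it again. The following is only a sketch: the helper name `bbmap_index_present` is made up for illustration, and the `ref/genome/1` and `ref/index/1` folder names are an assumption about BBMap's default index layout that should be checked against your own installation.

```shell
# Hypothetical helper: succeed if a BBMap index appears to exist under directory $1.
# ASSUMPTION: BBMap's default layout of ref/genome/1 and ref/index/1.
bbmap_index_present() {
    [ -d "$1/ref/genome/1" ] && [ -d "$1/ref/index/1" ]
}

# Check the current directory, which is where path=. will look.
if bbmap_index_present .; then
    echo "Index found in $(pwd); the ref= step can be skipped."
else
    echo "No index in $(pwd); run the ref= step first."
fi
```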
# So, this is the final command line. outu1/outu2 receive only the read pairs that did
# not map to the human reference (plain out1/out2 would write every read, mapped or not);
# outm receives the pairs that did map.
for file in "$path_qc"/*R1.fastq; do
    fname=$(basename "$file" | rev | cut -f2- -d"-" | rev)
    $bbmap/bbmap.sh minid=0.95 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 \
        path=. qtrim=rl trimq=10 untrim -Xmx23g \
        in1="$path_qc/$fname-QUALITY_PASSED_R1.fastq" \
        in2="$path_qc/$fname-QUALITY_PASSED_R2.fastq" \
        outu1="$fname-cleanR1.fq" outu2="$fname-cleanR2.fq" \
        outm="$fname-human.fq"
done
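The loop above derives the sample prefix by cutting everything after the last dash in the file name. A minimal sketch of that step on its own, using made-up file names and a hypothetical helper called `sample_prefix`, may make it easier to check the prefix before launching a long mapping run. It uses shell parameter expansion instead of `rev | cut | rev`, which gives the same result for these names.

```shell
# Hypothetical helper: strip the directory and the last dash-delimited field
# from a file name, e.g. "-QUALITY_PASSED_R1.fastq".
sample_prefix() {
    base=$(basename "$1")
    # ${base%-*} removes the shortest suffix that starts with "-".
    printf '%s\n' "${base%-*}"
}

# Made-up file names, for illustration only.
sample_prefix "/some/dir/sampleA-QUALITY_PASSED_R1.fastq"   # prints sampleA
sample_prefix "/some/dir/Loki-2-QUALITY_PASSED_R1.fastq"    # prints Loki-2
```

Sample names that themselves contain dashes are kept intact, because only the final field is removed.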
# After the process finishes, we can look at the file sizes of the human data files
# to see how much was cleaned, and save this output to a file to integrate into our
# sample sheets later.
ls -l 02_HUMAN_Decontam/*human.fq | awk -v OFS='\t' '{print $5, $9}' > humanfilesizes.csv
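The column positions in `ls -l` output can shift between systems, and file names with spaces break the `awk` fields. As a hedged alternative, the sketch below collects the same size and name pairs with `wc -c`; the helper name `file_sizes_tsv` and the output file name `humanfilesizes.tsv` are ours, not part of the original pipeline. Keep in mind that file size is only a rough proxy for the amount of human data.

```shell
# Hypothetical helper: print "<bytes><TAB><path>" for each existing file given.
file_sizes_tsv() {
    for f in "$@"; do
        [ -f "$f" ] || continue              # skip unmatched globs and missing files
        size=$(wc -c < "$f" | tr -d ' ')     # byte count, padding removed
        printf '%s\t%s\n' "$size" "$f"
    done
}

# Same file pattern as the ls command above; the output name is an assumption.
file_sizes_tsv 02_HUMAN_Decontam/*human.fq > humanfilesizes.tsv
```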
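# Later we wanted to compare the number of human reads with the rest of each sample.
# Set up two loops, one for the human data and one for the cleaned data (the cleaned loop is shown).
# A FASTQ record is four lines, so reads = lines / 4; grep -c "@" overcounts because
# "@" can also appear in quality strings.
for clean in 02_HUMAN_Decontam/*cleanR1.fq; do
    reads=$(( $(wc -l < "$clean") / 4 ))
    echo "$clean" "$reads"
done > clean_read_numbers.txt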
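For the human side of the comparison, a sketch along the following lines could work. Counting by line count rather than by matching "@" avoids overcounting when a quality string happens to contain "@". The helper names `count_fastq_reads` and `percent_human` and the output file name are hypothetical. We also assume that the single human file holds both mates of each pair interleaved, so its count is not directly comparable with a count taken from a cleaned R1 file alone.

```shell
# Hypothetical helper: count reads in an uncompressed FASTQ file with
# four lines per record.
count_fastq_reads() {
    lines=$(wc -l < "$1" | tr -d ' ')
    echo $(( lines / 4 ))
}

# Hypothetical helper: percent human, given human and clean counts in the same units.
percent_human() {
    awk -v h="$1" -v c="$2" 'BEGIN {
        if (h + c == 0) print "NA"; else printf "%.2f\n", 100 * h / (h + c)
    }'
}

# Human loop, mirroring the cleaned-data loop above.
# ASSUMPTION: *human.fq is interleaved, so this counts R1 and R2 reads together.
for human in 02_HUMAN_Decontam/*human.fq; do
    [ -e "$human" ] || continue
    echo "$human" "$(count_fastq_reads "$human")"
done > human_read_numbers.txt
```

With both counts expressed in the same units (for example, halving the interleaved human count to get pairs), `percent_human 1 3` prints `25.00`.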
# We ran the same check on the 2019 Loki samples to make sure they did not have high human contamination. The loop now iterates over the 2019 QC folder ($path_qc) and reuses the index built in the 2020 folder.
screen
bbmap='/export/dahlefs/work/Emily/myApps/bbmap/'
path_qc='/export/dahlefs/work/Shotgun/Metagenomes_chimneys_2019/01_QC'
human_ref='/export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/02_HUMAN_Decontam/hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz'
out_dir='/export/dahlefs/work/Metagenomes_chimneys_2019_workfolder/HUMAN_DECONTAM'
for file in "$path_qc"/*R1.fastq; do fname=$(basename "$file" | rev | cut -f2- -d"-" | rev)
    $bbmap/bbmap.sh minid=0.95 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 \
        path=/export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/02_HUMAN_Decontam/ qtrim=rl trimq=10 untrim -Xmx23g \
        in1="$path_qc/$fname-QUALITY_PASSED_R1.fastq" in2="$path_qc/$fname-QUALITY_PASSED_R2.fastq" \
        outu1="$out_dir/$fname-cleanR1.fq" outu2="$out_dir/$fname-cleanR2.fq" outm="$out_dir/$fname-human.fq"
done
In 2020 the Dahle group sent 60 samples from various chimneys across the AMOR for sequencing. This wiki shares the pipeline I used to process that dataset. The intent is to be specific about every step involved, so that other lab members do not have to repeat the same time-consuming processes. Hosting it on my Git page adds accountability and gives you someone to email if something doesn't work for you. :)