
02_HUMAN_REF_DECONTAM

eolesin edited this page Mar 23, 2021 · 9 revisions

We'd like to remove apparent human contamination from our samples.

The low-DNA samples in particular suffered from human contamination, although the phyloFlash results give only an unclear indication of how many of the sequences are contaminated.

We use a masked human reference genome developed by JGI bioinformatician Brian Bushnell, the author of the BBTools package (which includes BBMap). He has made the masked reference available for download here: https://drive.google.com/u/0/uc?id=0B3llHR93L14wd0pSSnFULUlhcUk&export=download

He also explains how he developed the mask here: http://seqanswers.com/forums/showthread.php?t=42552

# Setting up where the executables and reference are.
bbmap='/export/dahlefs/work/Emily/myApps/bbmap/'
human_ref='/export/dahlefs/work/Emily/myApps/bbmap/hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz'
path_qc='/export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/01_QC'

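# The process, following Brian's SEQanswers post.
# First, index the reference. BBMap writes the index to a ref/ folder in the current
# directory (here 02_HUMAN_Decontam), so run the mapping from that same folder.
# -Xmx23g gives Java 23 GB of memory. Indexing takes some time, so consider running it inside 'screen'.
$bbmap/bbmap.sh ref=$human_ref -Xmx23g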
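Because the mapping command uses `path=.`, it only works when started from the folder that holds the index. A small guard like the one below can catch a wrong working directory before a long run. It is a minimal sketch: `has_bbmap_index` is a hypothetical helper, not part of BBTools, and it assumes BBMap keeps its index in a `ref/` folder under the `path=` directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BBTools): check that a BBMap index exists.
# Assumption: BBMap stores the index in a "ref/" folder under the directory
# given by path= (the current directory when path= is not set).
has_bbmap_index() {
    local index_dir="${1:-.}"   # the directory you pass to bbmap.sh as path=
    if [ -d "$index_dir/ref" ]; then
        return 0
    fi
    echo "No BBMap index in $index_dir/ref; run the indexing step first." >&2
    return 1
}

# Usage, before starting the mapping loop from the same directory:
# has_bbmap_index . || exit 1
```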

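# The final command line. Run it from the folder holding the ref/ index (path=.).
# Each sample gives cleanR1/cleanR2 files plus one human.fq with the reads that mapped
# to the masked human reference (including unmapped mates of mapped reads).
# Note: check bbmap.sh's usage text before treating the clean files as human-free:
# out=/out1= writes ALL reads, while outu= writes only unmapped (clean) reads,
# which is what Brian's post uses.
for file in "$path_qc"/*R1.fastq; do
    fname=$(basename "$file" | rev | cut -f2- -d"-" | rev)
    $bbmap/bbmap.sh minid=0.95 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 \
        path=. qtrim=rl trimq=10 untrim -Xmx23g \
        in1=$path_qc/$fname-QUALITY_PASSED_R1.fastq in2=$path_qc/$fname-QUALITY_PASSED_R2.fastq \
        out1=$fname-cleanR1.fq out2=$fname-cleanR2.fq outm=$fname-human.fq
done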
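The sample name in the loop comes from `basename ... | rev | cut -f2- -d"-" | rev`, which drops everything after the last hyphen, so hyphens inside a sample name survive. The sketch below wraps that pipeline in a hypothetical function, shows a pure-bash equivalent (assuming every R1 file ends in `-QUALITY_PASSED_R1.fastq`, as the `in1=` argument implies), and adds a dry run that only prints each sample's input pair. All three function names are illustrative, not from the original workflow.

```shell
#!/usr/bin/env bash
# Same pipeline as the loop: strip the directory, then drop the text after
# the LAST "-" (reverse, cut off the first field, reverse back).
sample_name_rev() {
    basename "$1" | rev | cut -f2- -d"-" | rev
}

# Pure-bash alternative. Assumption: every R1 file ends in
# "-QUALITY_PASSED_R1.fastq".
sample_name_bash() {
    local base
    base=$(basename "$1")
    printf '%s\n' "${base%-QUALITY_PASSED_R1.fastq}"
}

# Dry run: print the R1/R2 inputs each sample would use, without BBMap.
dry_run() {
    local path_qc="$1" file fname
    for file in "$path_qc"/*R1.fastq; do
        [ -e "$file" ] || continue   # glob matched nothing
        fname=$(sample_name_rev "$file")
        echo "$fname: $path_qc/$fname-QUALITY_PASSED_R1.fastq $path_qc/$fname-QUALITY_PASSED_R2.fastq"
    done
}
```

Running `dry_run "$path_qc"` before a multi-hour BBMap run is a cheap way to confirm that the glob points at the intended folder and that every derived R2 file name actually exists.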

# After the run finishes, the sizes of the human files give a rough idea of how much was
# removed. Save them (size in bytes, then file name, tab-separated) so they can be
# integrated into our sample sheets later.

ls -l 02_HUMAN_Decontam/*human.fq | awk -v OFS='\t' '{print $5, $9}' > humanfilesizes.csv

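# We later wanted to compare human reads with the rest of each sample, using two
# loops: one for the cleaned data (below) and an identical one over *human.fq.
# A FASTQ record is 4 lines, so count lines / 4: grep -c "@" overcounts because
# "@" is also a valid quality character.
for clean in 02_HUMAN_Decontam/*cleanR1.fq; do
    echo "$clean $(( $(wc -l < "$clean") / 4 ))"
done > clean_read_numbers.txt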
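Counting "@" characters is not a reliable FASTQ read count. The small demonstration below builds two made-up records whose second quality string starts with "@" (Phred 31 in Phred+33 encoding) and compares a naive grep with a line-based count. `count_fastq_reads` is a hypothetical helper, and it assumes unwrapped 4-line records (header, sequence, "+", quality).

```shell
#!/usr/bin/env bash
# Line-based FASTQ record count. Assumption: 4 lines per record, no wrapping.
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Two made-up records; the second quality line begins with "@".
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n@III\n' > "$demo"
echo "grep -c '@': $(grep -c '@' "$demo")"       # counts 3 lines
echo "wc -l / 4:   $(count_fastq_reads "$demo")"  # counts 2 records
rm -f "$demo"
```

Anchoring the pattern as `grep -c "^@"` does not help here either, because the bad quality line also starts with "@".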



# We ran the same check on the 2019 Loki samples to make sure they did not have high human contamination either.
# Start a 'screen' session first, since the run takes a long time.
screen
bbmap='/export/dahlefs/work/Emily/myApps/bbmap/'
path_qc='/export/dahlefs/work/Shotgun/Metagenomes_chimneys_2019/01_QC'
# human_ref is not used below: path= reuses the index already built for the 2020 samples.
human_ref='/export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/02_HUMAN_Decontam/hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz'
index_dir='/export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/02_HUMAN_Decontam/'
out_dir='/export/dahlefs/work/Metagenomes_chimneys_2019_workfolder/HUMAN_DECONTAM'
# Loop over the 2019 QC files in $path_qc.
for file in "$path_qc"/*R1.fastq; do
    fname=$(basename "$file" | rev | cut -f2- -d"-" | rev)
    $bbmap/bbmap.sh minid=0.95 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 \
        path=$index_dir qtrim=rl trimq=10 untrim -Xmx23g \
        in1=$path_qc/$fname-QUALITY_PASSED_R1.fastq in2=$path_qc/$fname-QUALITY_PASSED_R2.fastq \
        out1=$out_dir/$fname-cleanR1.fq out2=$out_dir/$fname-cleanR2.fq outm=$out_dir/$fname-human.fq
done
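To turn the check into one number per sample, a sketch like the following could report the share of reads flagged as human. Assumptions, all labeled: the file naming used in the loops above; unwrapped 4-line FASTQ records; human.fq holding both mates of each flagged pair (BBMap's `outm=` includes unmapped mates of mapped reads, and no second `outm` file is given), so the total is taken as twice the R1 record count. It counts the QC input rather than the clean files, so it does not depend on what `out1=`/`out2=` contain. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Line-based FASTQ record count. Assumption: 4 lines per record, no wrapping.
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Prints: sample <TAB> human reads <TAB> total reads <TAB> percent human
# Assumes <qc_dir>/<sample>-QUALITY_PASSED_R1.fastq and <out_dir>/<sample>-human.fq.
human_percent_table() {
    local qc_dir="$1" out_dir="$2" r1 fname human total
    for r1 in "$qc_dir"/*-QUALITY_PASSED_R1.fastq; do
        [ -e "$r1" ] || continue                        # glob matched nothing
        fname=$(basename "$r1")
        fname=${fname%-QUALITY_PASSED_R1.fastq}         # sample name
        [ -e "$out_dir/$fname-human.fq" ] || continue   # sample not mapped yet
        human=$(count_fastq_reads "$out_dir/$fname-human.fq")
        total=$(( 2 * $(count_fastq_reads "$r1") ))     # R1 + R2 reads
        awk -v s="$fname" -v h="$human" -v t="$total" \
            'BEGIN { printf "%s\t%d\t%d\t%.2f\n", s, h, t, (t > 0 ? 100 * h / t : 0) }'
    done
}

# Usage with the 2019 variables set above (output file name is illustrative):
# human_percent_table "$path_qc" "$out_dir" > human_percent_2019.tsv
```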
