02_HUMAN_REF_DECONTAM
The low-DNA samples in particular suffered from human contamination, although the phyloFlash results give only an unclear indication of how heavily the sequences are contaminated.
To remove the human reads, we use BBMap from JGI bioinformatician Brian Bushnell's BBTools package, together with a masked human reference genome (hg19) that he developed. He has made this reference available for download here: https://drive.google.com/u/0/uc?id=0B3llHR93L14wd0pSSnFULUlhcUk&export=download
He also explains how he developed the mask here: http://seqanswers.com/forums/showthread.php?t=42552
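Because the reference is a large file fetched from a shared drive, a download can sometimes end up truncated or saved as a web page instead of the archive. The sketch below is one way to check the file before spending time on indexing. The function name `check_reference` is hypothetical and not part of BBTools; it only assumes that the reference is a gzip-compressed FASTA file, as its `.fa.gz` name suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BBTools): sanity-check a downloaded reference.
# Returns 0 if the file is an intact gzip archive whose first byte is '>'
# (the start of a FASTA header), and non-zero otherwise.
check_reference() {
    local ref="$1"
    local first
    # The file must exist and be non-empty.
    [ -s "$ref" ] || { echo "missing or empty: $ref" >&2; return 1; }
    # gzip -t verifies the integrity of the compressed data.
    gzip -t "$ref" 2>/dev/null || { echo "not a valid gzip file: $ref" >&2; return 1; }
    # Decompress only far enough to read the first byte.
    first=$(gzip -dc "$ref" 2>/dev/null | head -c 1)
    [ "$first" = ">" ] || { echo "does not look like FASTA: $ref" >&2; return 1; }
}

# Usage, once human_ref has been set to the downloaded file:
# check_reference "$human_ref" && echo "reference looks OK"
```

Note that `gzip -t` reads the whole archive, so the check itself takes a little while on a genome-sized file.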
# Set up the locations of the BBMap executables, the masked reference, and the QC'd reads.
bbmap='/export/dahlefs/work/Emily/myApps/bbmap'
human_ref='/export/dahlefs/work/Emily/myApps/bbmap/hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz'
path_qc='/export/dahlefs/work/Metagenomes_chimneys_2020_workfolder/01_QC'
# The process, from Brian's SEQanswers page.
# First index the reference. BBMap writes the index to ./ref in the current directory.
# This takes some time, so you might want to run it inside 'screen'.
"$bbmap"/bbmap.sh ref="$human_ref" -Xmx23g
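# Then map every sample against the index (path=. points at the ./ref directory built above).
# fname drops the last "-"-delimited field: SAMPLE-QUALITY_PASSED_R1.fastq -> SAMPLE.
# outu1/outu2 receive the unmapped (clean) pairs and outm the reads that hit the human reference;
# plain out1/out2 would write all reads, mapped or not, as in Brian's outu=clean.fq example.
for file in "$path_qc"/*R1.fastq; do
    fname=$(basename "$file" | rev | cut -f2- -d"-" | rev)
    "$bbmap"/bbmap.sh minid=0.95 maxindel=3 bwr=0.16 bw=12 quickmatch fast minhits=2 \
        path=. qtrim=rl trimq=10 untrim -Xmx23g \
        in1="$path_qc/$fname-QUALITY_PASSED_R1.fastq" \
        in2="$path_qc/$fname-QUALITY_PASSED_R2.fastq" \
        outu1="$fname-cleanR1.fq" outu2="$fname-cleanR2.fq" outm="$fname-human.fq"
done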
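Since phyloFlash left the extent of contamination unclear, one rough way to quantify it is to compare the number of reads written to each `-human.fq` file with the number of reads in the quality-passed input. This is a minimal sketch rather than part of the original pipeline: `count_reads` and `human_percent` are hypothetical helper names, and it assumes uncompressed FASTQ files with exactly four lines per record and that the `outm` file holds both mates of each flagged pair.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of BBTools.

# Count the reads in an uncompressed FASTQ file, assuming four lines per record.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Print the percentage of reads flagged as human, to two decimal places.
# $1 = reads in the outm (human) file, $2 = total reads in the input files.
# Prints "NA" when the total is zero.
human_percent() {
    awk -v h="$1" -v t="$2" 'BEGIN {
        if (t == 0) print "NA"
        else printf "%.2f\n", 100 * h / t
    }'
}

# Usage, run in the output directory with path_qc set as in the setup:
# for human in *-human.fq; do
#     sample=${human%-human.fq}
#     total=$(( $(count_reads "$path_qc/$sample-QUALITY_PASSED_R1.fastq") \
#             + $(count_reads "$path_qc/$sample-QUALITY_PASSED_R2.fastq") ))
#     echo "$sample $(human_percent "$(count_reads "$human")" "$total")"
# done
```

The input files are used as the denominator so that the percentage does not depend on how the clean output files were written.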
In 2020, the Dahle group sent 60 samples from various chimneys across the AMOR for sequencing. This wiki shares the pipeline I used to process that dataset. The intent is to be specific about every step involved, so that other lab members do not have to repeat the same time-consuming processes. Hosting it on my Git page adds accountability and gives you someone to email if something doesn't work for you. :)