
02_HUMAN_REF_DECONTAM

eolesin edited this page Jan 11, 2021 · 9 revisions

We'd like to remove apparent human contamination from our samples.

The low-DNA samples suffered most from human contamination, although the phyloFlash results give only an unclear indication of how heavily the sequences are contaminated.

We turn to the BBTools package by JGI bioinformatician Brian Bushnell, and to the masked human genome reference he developed. He has made this reference available for download here: https://drive.google.com/u/0/uc?id=0B3llHR93L14wd0pSSnFULUlhcUk&export=download

He also explains how he developed the mask here: http://seqanswers.com/forums/showthread.php?t=42552

# Setting up where the executables and reference are.
bbmap_path='/export/dahlefs/work/Emily/myApps/bbmap/'
human_ref='/export/dahlefs/work/Emily/myApps/bbmap/hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz'

# The process, from Brian's SEQanswers post.
# First, index the reference. It takes some time, so consider running it inside 'screen'.
# The index is written to ./ref in the current directory unless path= is given.
"${bbmap_path}bbmap.sh" ref="${human_ref}" -Xmx23g

# Then map the reads. path= must point to the directory that holds the index.
# This is the final command line:
"${bbmap_path}bbmap.sh" minid=0.95 maxindel=3 bwr=0.16 bw=12 quickmatch fast \
minhits=2 path=/path/to/hg19masked/ qtrim=rl trimq=10 untrim \
-Xmx23g in=reads.fq outu=clean.fq outm=human.fq
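
Since we have several samples, the final command can be wrapped in a small helper and looped over the FASTQ files. This is only a sketch under assumptions: the function name `run_decontam`, the `DRY_RUN` switch, the `index_path` variable and the `*.clean.fq` / `*.human.fq` output names are made up for illustration, and `/path/to/hg19masked/` is still a placeholder for wherever the index was built. By default it only prints each command so it can be checked before anything is run.

```shell
#!/bin/sh
# Sketch: apply the decontamination command to each FASTQ file given as an argument.
# Hypothetical names: run_decontam, DRY_RUN, index_path, and the output file suffixes.

bbmap_path='/export/dahlefs/work/Emily/myApps/bbmap/'
index_path='/path/to/hg19masked/'   # placeholder: directory that holds the BBMap index

run_decontam() {
    reads="$1"
    base="${reads%.fq}"             # strip the .fq extension to name the outputs

    # Build the command as positional parameters so each argument stays one word.
    set -- "${bbmap_path}bbmap.sh" minid=0.95 maxindel=3 bwr=0.16 bw=12 quickmatch fast \
        minhits=2 "path=${index_path}" qtrim=rl trimq=10 untrim \
        -Xmx23g "in=${reads}" "outu=${base}.clean.fq" "outm=${base}.human.fq"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"                   # dry run (default): print the command only
    else
        "$@"                        # DRY_RUN=0: actually run bbmap.sh
    fi
}

# Loop over every file passed on the command line.
for fq in "$@"; do
    run_decontam "$fq"
done
```

Saved as, say, `decontam.sh`, running `sh decontam.sh sample1.fq sample2.fq` would print one command per file; `DRY_RUN=0 sh decontam.sh sample1.fq sample2.fq` would run them. Reads that do not map to the masked human reference end up in the `outu` file, and reads that do map end up in the `outm` file.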
