DNA Sequence K-mer-Based Search for Microbiome Identification

We implemented a k-mer search for DNA sequences and analyzed its effectiveness, compared with a full alignment search, in identifying microbiome DNA strands in Pittsburgh rivers.

DNA Preparation

Water samples were collected at six different locations (including pure water as a control). DNA was extracted and sequenced beforehand in the wet lab.

The sequence reads are available in data/Fall2018CleanReads.fa. Quality-control data and reports for the DNA reads are available in dna_quality_control/.

We also used bacterial_16s_genes.fa, which contains bacterial 16S genes, as the database to search against.
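Both the reads and the 16S database are FASTA files. As a minimal sketch of how such files can be loaded (this is not the loader used in microbiome_test.py; the function name `read_fasta` is hypothetical), the following parses FASTA-formatted lines into a dictionary:

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {header: sequence}.

    Hypothetical helper for illustration; not taken from the repository.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            # Sequences may span several lines; join them and normalize case.
            records[header] += line.upper()
    return records
```

An open file handle can be passed directly, for example `with open("data/Fall2018CleanReads.fa") as fh: reads = read_fasta(fh)`.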

Usage

Run microbiome_test.py to generate bacterial phylum plots in the fraction_plots/ directory. The k-mer search uses our pre-tuned parameters (K=8, thresh=.6 -> Acc 0.973); see details here.
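To illustrate the general idea of a k-mer search, here is a minimal sketch. It rests on an assumption that the README does not confirm: that `thresh` is the minimum fraction of a read's k-mers that must also occur in a reference sequence for the read to count as a match. The function names `kmers` and `kmer_search` are hypothetical; see microbiome_test.py for the actual implementation.

```python
from typing import Dict, Optional, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_search(read: str, database: Dict[str, str],
                k: int = 8, thresh: float = 0.6) -> Optional[str]:
    """Return the name of the best-matching reference, or None.

    ASSUMPTION: the score is the fraction of the read's k-mers that
    also appear in the reference, and a match requires score >= thresh.
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None  # read is shorter than k
    best_name, best_score = None, 0.0
    for name, ref in database.items():
        shared = len(read_kmers & kmers(ref, k))
        score = shared / len(read_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= thresh else None
```

In practice the reference k-mer sets would be computed once and reused across reads rather than rebuilt on every call; the sketch recomputes them only to stay short.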

An analysis report on the parameter search, including accuracy data, is provided in kmer_size_acc_reports/.

We ran a full alignment of the DNA reads against the 16S database on Google Cloud (see alignment.py for implementation details); the results are saved in data/*.txt. Calculations generated at runtime are saved in cache/*.pickle. You may use these files for further analysis.
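As a starting point for further analysis, the cached files can be loaded with the standard pickle module. This is a sketch only: the structure of the cached objects is not documented here, so the code just loads each file and keys it by file name. The function name `load_cached` is hypothetical. Note that unpickling can execute arbitrary code, so only load pickle files you trust.

```python
import pickle
from pathlib import Path
from typing import Any, Dict


def load_cached(cache_dir: str = "cache") -> Dict[str, Any]:
    """Load every *.pickle file in cache_dir, keyed by file stem.

    Hypothetical helper; the contents of each object are whatever
    the original run stored and are not interpreted here.
    """
    results: Dict[str, Any] = {}
    for path in sorted(Path(cache_dir).glob("*.pickle")):
        with path.open("rb") as fh:
            results[path.stem] = pickle.load(fh)
    return results
```

Calling `load_cached()` from the repository root and inspecting `type(obj)` for each value is a reasonable first step before building on the cached results.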

Contributors

Anupam Pokharel and Dr. Kangas

License & Rules

MIT

CMU Academic Policies
