Fungi Identify the Geographic Origin of Dust Samples

This repository provides R code to conduct spatial source prediction of dust samples relying solely on their dust-associated fungal communities. These methods mark a new approach to forensic biology that could be used by scientists to identify the origin of dust or soil samples found on objects, clothing, or archaeological artifacts.

For more information, please see our associated publication:

Grantham NS, Reich BJ, Pacifici K, Laber EB, Menninger HL, Henley JB, Barberán A, Leff JW, Fierer N, Dunn RR (2015) Fungi Identify the Geographic Origin of Dust Samples, PLOS ONE

Get started

Fork or clone this repository onto your computer.
Open R and set the working directory to the repository root, e.g., setwd("path/to/fungi-identify").
Run get-data.R to download the data from the 1000homes figshare repository (thanks Albert!) and convert it into csv files.

Note: You only need to run get-data.R once to download the files locally. It fills the data subdirectory with S.csv (lon, lat coordinates of each home), X.csv (covariate information for each home), and Y.csv (presence/absence of each taxon in each home). It also creates a further subdirectory, raw, that holds the original (pre-munged) txt, biom, and fa data files.
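All three files describe the same homes, so one would expect them to line up row for row: row i of S.csv, X.csv, and Y.csv refers to the same home. The sketch below builds hypothetical toy versions of S and Y and checks that alignment. The real files would be read with read.csv("data/S.csv") and so on. The column names lon and lat and the helper check_layout are assumptions for illustration, and the actual csv headers may differ. X could be checked the same way with nrow(X) == nrow(S).

```r
# Hypothetical toy data mirroring the described layout (real files: data/S.csv, data/Y.csv).
set.seed(1)
n <- 5  # homes (the full data has n = 1331)
m <- 4  # taxa  (the full data has m = 57304)
S <- data.frame(lon = runif(n, -125, -70), lat = runif(n, 25, 49))
Y <- matrix(rbinom(n * m, size = 1, prob = 0.3), nrow = n,
            dimnames = list(NULL, paste0("taxon", seq_len(m))))

# Hypothetical helper: stops with an error unless S and Y describe the same homes
# row for row, S has lon/lat columns, and Y holds only 0/1 values.
check_layout <- function(S, Y) {
  stopifnot(nrow(S) == nrow(Y),
            all(c("lon", "lat") %in% names(S)),
            all(Y %in% c(0, 1)))
  invisible(TRUE)
}

check_layout(S, Y)
```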

To set up your working environment, run set-workspace.R. This file loads the required R packages, sources the user-defined functions in functions.R, and loads and (slightly) reformats the csv data files in data.

Plot estimated fungi occurrence probabilities

Produce taxon-specific "hot spot" maps via kernel smoothing using plot-occurrence.R.
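Kernel smoothing turns scattered presence/absence records into a continuous occurrence surface. The estimated probability of a taxon at a location is a weighted average of its 0/1 values at the sampled homes, and the weights decay with distance. The following is a minimal Nadaraya-Watson sketch that uses a Gaussian kernel on great-circle distance. The kernel, the bandwidth h, and the names haversine_km and smooth_occurrence are assumptions. They are not necessarily what plot-occurrence.R or functions.R use.

```r
# Great-circle (haversine) distance in km; vectorized over its arguments.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Nadaraya-Watson smoother with a Gaussian kernel (an assumed choice).
# S: n x 2 sample coordinates (lon, lat); Y: n x m 0/1 occurrence matrix;
# grid: g x 2 prediction points (lon, lat); h: bandwidth in km.
# Returns a g x m matrix of estimated occurrence probabilities.
smooth_occurrence <- function(S, Y, grid, h) {
  S <- as.matrix(S); grid <- as.matrix(grid)
  D <- outer(seq_len(nrow(grid)), seq_len(nrow(S)), function(g, i)
    haversine_km(grid[g, 1], grid[g, 2], S[i, 1], S[i, 2]))  # g x n distances
  W <- exp(-0.5 * (D / h)^2)  # kernel weight of each sample at each grid point
  W <- W / rowSums(W)         # weights at each grid point sum to 1
  W %*% as.matrix(Y)
}

# Toy usage: two homes, each hosting a different taxon.
S <- cbind(lon = c(-80, -120), lat = c(35, 45))
Y <- cbind(taxonA = c(1, 0), taxonB = c(0, 1))
M <- smooth_occurrence(S, Y, grid = S, h = 500)
```

With a small bandwidth the surface closely follows individual homes. With a large one it flattens toward each taxon's overall prevalence. If h is so small that every weight underflows to zero at some grid point, the normalization divides by zero and returns NaN. For that reason, h should be chosen relative to the spacing of the sample locations.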

Demonstrate the model

demonstrate-model.R showcases the statistical analysis using a small subset of the taxa occurrence data over a single fold of the cross-validation.

Note: The purpose of this file is to demonstrate the steps behind our predictions in a computationally feasible manner. Unsurprisingly, the predictions produced by operating on the full data in cross-validate.R are much better than those produced here.
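A single fold of the cross-validation holds out one subset of samples for testing and estimates occurrence probabilities from the rest. One common way to assign samples to folds is sketched below. assign_folds is a hypothetical helper, and the repository's own fold assignment, including how the test2 set is formed, may differ.

```r
# Hypothetical helper: randomly assign n samples to nfold folds of near-equal size.
assign_folds <- function(n, nfold = 5) {
  sample(rep(seq_len(nfold), length.out = n))
}

set.seed(42)                        # for a reproducible split
folds <- assign_folds(n = 1331, nfold = 5)
table(folds)                        # fold 1 gets 267 samples, folds 2-5 get 266

# One fold of cross-validation: fold k is held out, the remaining folds train.
k <- 1
test  <- which(folds == k)
train <- which(folds != k)
```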

Replicate full analysis

The full analysis is conducted by cross-validate.R. We recommend running this file on a server with many cores. Be sure to set ncore to the number of available cores. With the current size of the data (n = 1331 samples, m = 57304 fungal taxa), five-fold cross-validation across 10 cores took nearly 5 hours to complete. (Note: the folds are not run in parallel. Instead, the taxa are split into ncore groups so that M, the kernel-smoothed matrix of estimated occurrence probabilities, and llike, the log-likelihood values, do not become prohibitively large.)
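Splitting taxa rather than folds across cores works if the taxa enter the log-likelihood as independent Bernoulli terms. This is an assumption of this sketch, not a statement of the repository's exact model. Under that assumption, the log-likelihood of a test sample at each grid point is a sum over taxa. It can therefore be accumulated group by group without ever holding the full grid-by-taxa probability matrix. The helpers below are hypothetical, and prob_fun stands in for whatever computes the smoothed probabilities for a block of taxa.

```r
# Hypothetical helper: split taxon indices 1..m into ncore contiguous groups.
split_taxa <- function(m, ncore) {
  split(seq_len(m), ceiling(seq_len(m) * ncore / m))
}

# Bernoulli log-likelihood contributed by one block of taxa.
# p: g x c occurrence probabilities for the block; y: k x c 0/1 test occurrences.
# Returns k x g. Probabilities are clamped to [eps, 1 - eps] so log() stays finite.
loglik_block <- function(p, y, eps = 1e-6) {
  p <- pmin(pmax(p, eps), 1 - eps)
  y %*% t(log(p)) + (1 - y) %*% t(log(1 - p))
}

# Accumulate the log-likelihood group by group. prob_fun(cols) must return the
# g x length(cols) probability matrix for those taxa only, so the full g x m
# matrix is never built at once.
total_loglik <- function(prob_fun, Ytest, ncore) {
  groups <- split_taxa(ncol(Ytest), ncore)
  parts <- lapply(groups, function(cols)
    loglik_block(prob_fun(cols), Ytest[, cols, drop = FALSE]))
  Reduce(`+`, parts)
}
```

On Unix-alikes, lapply could be replaced with parallel::mclapply(groups, ..., mc.cores = ncore) to process the groups concurrently. On Windows, mclapply does not fork and mc.cores must be 1.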

This file produces results.RData, containing Tgrid, a matrix of prediction points, and results, a list of length nfold. Each element of results contains:

pmf.test and pmf.test2, the probability mass function values over Tgrid for the samples assigned to the test and test2 sets,
Stest and Stest2, the true origins of the samples in test and test2, and
Stest.hat and Stest2.hat, the predicted geographic origins of the samples in test and test2.
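Each row of pmf.test can be read as a posterior distribution over Tgrid. If the prior over the grid points is uniform, the pmf is the exponentiated log-likelihood normalized to sum to one. A natural point prediction is then the grid point with the highest mass. The sketch below does this with the log-sum-exp shift. The shift matters here because raw log-likelihoods summed over tens of thousands of taxa are large negative numbers, and their exponentials underflow to zero. The function names and the choice of the mode as the prediction are assumptions, and the repository may summarize the pmf differently.

```r
# Turn a k x g log-likelihood matrix into k pmfs over the g grid points,
# assuming a uniform prior over Tgrid. Subtracting each row's maximum before
# exponentiating (the log-sum-exp trick) avoids underflow to 0/0.
loglik_to_pmf <- function(llike) {
  w <- exp(llike - apply(llike, 1, max))
  w / rowSums(w)
}

# Point prediction: the grid point carrying the most mass (the posterior mode).
predict_origin <- function(pmf, Tgrid) {
  Tgrid[apply(pmf, 1, which.max), , drop = FALSE]
}

# Toy example: exp(-1000) underflows to 0, but the shifted version does not.
llike <- rbind(c(-1000, -999, -1002),
               c(0, -5, -1))
Tgrid <- cbind(lon = c(-100, -90, -80), lat = c(30, 40, 50))
pmf <- loglik_to_pmf(llike)
predict_origin(pmf, Tgrid)
```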

Analyze predictions

After results.RData is produced, analyze-predictions.R loads and analyzes the predictions of the statistical model overall and across several covariates.
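One natural accuracy measure is the great-circle distance between each true origin (Stest) and its predicted origin (Stest.hat), pooled across folds. The sketch assumes that both are matrices with longitude in the first column and latitude in the second, and that results has the structure described above. prediction_error_km and errors_all_folds are hypothetical helpers, not functions from analyze-predictions.R.

```r
# Great-circle (haversine) distance in km; vectorized over its arguments.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: distance (km) from each true origin to its predicted origin.
# Assumes column 1 holds longitude and column 2 latitude in both matrices.
prediction_error_km <- function(Strue, Shat) {
  haversine_km(Strue[, 1], Strue[, 2], Shat[, 1], Shat[, 2])
}

# Hypothetical helper: pool the test-set errors from every fold of `results`.
errors_all_folds <- function(results) {
  unlist(lapply(results, function(r) prediction_error_km(r$Stest, r$Stest.hat)))
}

# Toy results list with two folds (not real output).
results <- list(
  list(Stest = cbind(0, 0),          Stest.hat = cbind(0, 1)),
  list(Stest = cbind(c(10, 20), 45), Stest.hat = cbind(c(10, 20), 45))
)
err <- errors_all_folds(results)
median(err)
```

Summaries such as median(err) give an overall view of accuracy. tapply(err, group, median) gives a per-covariate view, where group is a hypothetical grouping variable aligned with the pooled test samples.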

Questions or comments?

We would love to hear from you. If you would like to discuss the motivation, scope, and direction of this project, please contact our corresponding author, Robert R. Dunn (Rob_Dunn@ncsu.edu).

For questions about the specifics of the code provided here, please contact Neal S. Grantham (ngranth@ncsu.edu). If you would instead like to discuss the molecular sequencing methods and data provided at the 1000homes figshare repository, please contact Albert Barberán (albert.barberan@colorado.edu).
