Earth microbial network project
The repository provides R scripts for the microbial network analysis of the Earth Microbiome Project (EMP).
- The combined network file is stored as entire_network/entire.graphml.
- The original subnetworks for the 14 environments are stored in network_original.
- The subnetworks for the 12 environments inferred from the trimmed datasets are stored in network_trimmed.
Abundance table from the EMP
- We extracted 14 count matrices for 14 environmental categories at level 3 of the EMP ontology (Table S1) from the 90-bp Deblur BIOM table. We then filtered out OTUs with relative abundance less than 0.001% and present in less than 10% of samples in the corresponding environment's count matrix.
- The corresponding R script is rscript/otutable_subset.R.
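A minimal Python sketch of the abundance and prevalence filter, for illustration only; the repository implements this step in R (rscript/otutable_subset.R). It assumes that "relative abundance" means the mean of per-sample relative abundances and reads the filter as removing a taxon only when it fails both thresholds. The function name `filter_taxa` is hypothetical.

```python
def filter_taxa(counts, min_rel_abund=1e-5, min_prevalence=0.10):
    """Drop taxa below both thresholds (0.001% = 1e-5; 10% of samples).

    counts: dict taxon -> list of per-sample counts (same sample order).
    Assumption: a taxon is removed only if it is rare AND infrequent.
    """
    taxa = list(counts)
    n_samples = len(next(iter(counts.values())))
    # Per-sample library sizes, used to convert counts to relative abundance.
    totals = [sum(counts[t][j] for t in taxa) for j in range(n_samples)]
    kept = {}
    for t in taxa:
        rel = [c / tot if tot else 0.0 for c, tot in zip(counts[t], totals)]
        mean_rel = sum(rel) / n_samples
        prevalence = sum(1 for c in counts[t] if c > 0) / n_samples
        if mean_rel < min_rel_abund and prevalence < min_prevalence:
            continue  # rare and infrequent: filtered out
        kept[t] = counts[t]
    return kept
```

Under the alternative reading (remove a taxon that fails either threshold), the `and` in the test would become `or`.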
- To build the trimmed microbial community matrices, we kept the 400 most abundant ESVs and 360 randomly selected samples.
- The corresponding R script is rscript/trim.R.
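The trimming step can be sketched in Python as follows. This is an illustration, not the code in rscript/trim.R: it assumes abundance is ranked by total count across samples, that samples are drawn without replacement, and it uses a fixed seed (a hypothetical choice) so the subset is reproducible.

```python
import random


def trim_matrix(counts, n_taxa=400, n_samples=360, seed=1):
    """Keep the n_taxa most abundant taxa and a random subset of samples.

    counts: dict taxon -> list of per-sample counts.
    """
    # Rank taxa by total count (assumed abundance measure) and keep the top ones.
    top = sorted(counts, key=lambda t: sum(counts[t]), reverse=True)[:n_taxa]
    total_samples = len(next(iter(counts.values())))
    rng = random.Random(seed)
    # Sample column indices without replacement; keep them in original order.
    cols = sorted(rng.sample(range(total_samples), min(n_samples, total_samples)))
    return {t: [counts[t][j] for j in cols] for t in top}
```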
- Microbial taxon-taxon association networks were constructed using Spearman correlation and Bray-Curtis dissimilarity measures.
- The corresponding R script is rscript/netinfer.R.
- The impact of each environmental category on the Spearman correlation of each network edge was assessed by dividing the absolute omission score (OS), i.e. the Spearman correlation computed without that environmental category, by the absolute original Spearman correlation.
- The corresponding R script is rscript/os.R.
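The ratio |OS| / |ρ| for one edge can be sketched in Python as below. It assumes the OS is obtained by dropping all samples of one environmental category and recomputing Spearman's ρ on the rest; `omission_ratio` and its arguments are hypothetical names, not taken from rscript/os.R. A ratio below 1 means the correlation weakens when that category is left out.

```python
def _ranks(values):
    # Average 1-based rank per value (ties share the mean position).
    positions = {}
    for pos, val in enumerate(sorted(values), start=1):
        positions.setdefault(val, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman(x, y):
    # Pearson correlation of the rank vectors.
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def omission_ratio(x, y, env, omitted):
    """|OS| / |rho| for the edge between taxa with profiles x and y.

    env: environmental category of each sample; omitted: category left out.
    """
    rho = spearman(x, y)
    keep = [i for i, e in enumerate(env) if e != omitted]
    os_score = spearman([x[i] for i in keep], [y[i] for i in keep])
    return abs(os_score) / abs(rho)
```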
- Taxon–taxon counts at high taxonomic ranks were tested for significance using the hypergeometric distribution (stats::phyper in R).
- Mutual exclusion versus co-presence was tested using the binomial distribution (stats::pbinom in R), with the background probability estimated from the frequency of edges in the network.
- The corresponding R script is rscript/overrepresentation.R.
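The two tail probabilities can be written out explicitly, as in this Python sketch using only `math.comb`. The comments give the equivalent R calls; how N, K, n, k and p are derived from the networks in rscript/overrepresentation.R is not stated here, so the argument names are placeholders.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    Same as R: phyper(k - 1, K, N - K, n, lower.tail = FALSE).
    """
    hi = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, hi + 1))
    return hits / comb(N, n)


def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p). Same as R: pbinom(k, n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))
```

For enrichment (for example, of negative edges among n edges between two taxon groups, given a background frequency p), the upper tail P(X ≥ k) is `1 - binom_lower_tail(k - 1, n, p)`.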
- Topological features were computed with the igraph package.
- The corresponding R script is rscript/nettopo.R.
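The exact features computed in rscript/nettopo.R are not listed here. As an illustration only, this dependency-free Python sketch computes three common ones: vertex degree, edge density and global transitivity. In R igraph these correspond to `degree()`, `edge_density()` and `transitivity(type = "global")`. It assumes an undirected simple graph given as an edge list, so isolated vertices are not counted.

```python
from itertools import combinations


def topology(edges):
    """Return (degree dict, density, global transitivity) for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    degree = {v: len(nb) for v, nb in adj.items()}
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    # Global transitivity = closed connected triplets / all connected triplets.
    closed = triples = 0
    for v, nb in adj.items():
        k = len(nb)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nb, 2) if b in adj[a])
    transitivity = closed / triples if triples else 0.0
    return degree, density, transitivity
```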
Generalist and specialist edges
- Edges present in only one subnetwork were defined as specialist edges and were further divided into two groups: specialist edges linking a specialist vertex pair and specialist edges linking a generalist vertex pair.
- The corresponding R script is rscript/generalist.R.
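A Python sketch of the specialist-edge classification follows, under stated assumptions rather than the definitions in rscript/generalist.R. It treats a vertex as "specialist" if it occurs in only one subnetwork and as "generalist" otherwise. The text names only two groups, so specialist edges joining one specialist and one generalist vertex go into a separate `other` bucket here. All names are hypothetical.

```python
from collections import Counter


def classify_specialist_edges(subnetworks):
    """subnetworks: dict environment -> set of edges, each a frozenset({u, v})."""
    edge_count = Counter(e for edges in subnetworks.values() for e in edges)
    # Assumption: a vertex is specialist if it appears in exactly one subnetwork.
    vertex_envs = {}
    for env, edges in subnetworks.items():
        for e in edges:
            for v in e:
                vertex_envs.setdefault(v, set()).add(env)
    specialist_vertex = {v for v, envs in vertex_envs.items() if len(envs) == 1}
    groups = {"specialist_pair": [], "generalist_pair": [], "other": []}
    for e, c in edge_count.items():
        if c != 1:
            continue  # shared by several subnetworks: not a specialist edge
        n_spec = sum(v in specialist_vertex for v in e)
        key = {2: "specialist_pair", 0: "generalist_pair"}.get(n_spec, "other")
        groups[key].append(e)
    return groups
```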
- We identified the ten highest-degree vertices as hubs in each subnetwork inferred from the 12 trimmed datasets.
- The corresponding R script is rscript/hub.R.
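Selecting the top-degree hubs amounts to sorting vertices by degree. In the Python sketch below, the alphabetical tie-break is an assumption, since the text does not say how ties at the tenth position are resolved.

```python
from collections import Counter


def top_degree_hubs(edges, k=10):
    """Return the k highest-degree vertices (ties broken by vertex name)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [v for v, _ in ranked[:k]]
```

Applied to each of the 12 trimmed subnetworks, this gives one list of ten hubs per environment.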
- We counted the number and percentage of negative edges in each subnetwork inferred from the 12 trimmed datasets.
- The corresponding R script is rscript/negative.R.
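Counting negative edges is a simple tally once each edge carries a signed score. This Python sketch assumes an edge is negative when its Spearman correlation is below zero. How the repository assigns signs (for example, for edges supported by Bray-Curtis dissimilarity) is not stated here.

```python
def negative_edge_summary(edge_weights):
    """Return (number, percentage) of negative edges.

    edge_weights: dict edge -> signed association score (assumed Spearman rho).
    """
    n_total = len(edge_weights)
    n_neg = sum(1 for w in edge_weights.values() if w < 0)
    pct = 100.0 * n_neg / n_total if n_total else 0.0
    return n_neg, pct
```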