Microbiomes are inherently linked by their structural similarity, yet the global patterns and features of this similarity remain unclear. Here we propose a search-based microbiome transition network to probe microbiome similarity globally. By traversing a composition-similarity-based network of 177,022 microbiomes, we show that although their compositions are distinct by habitat, each microbiome is on average only seven neighbors away from any other microbiome on Earth, indicating the inherent homology of microbiomes at a global scale. This network is scale-free, suggesting a high degree of stability and robustness in microbiome transition. By tracking the minimum spanning tree in this network, we derived a global roadmap of microbiome dispersal that tracks the potential formulation of microbial diversity. Such search-based global microbiome networks, reconstructed within hours on just one computing node, provide a readily expandable reference for tracing the origin and evolution of existing or new microbiome datasets.
Microbiome Search Engine (MSE) is a microbiome database platform for searching query microbiomes against the global metagenome data space, based on whole-community-level similarity computed with the Meta-Storms algorithm. It contains 177,022 samples in total. We consider that a direct transition possibly exists between sample pairs whose similarity is significant (permutation p-value < 0.01), so a Meta-Storms similarity of 0.868 is defined as the threshold for direct transition between microbiomes. The search-based microbiome network is built using MSE, which is freely accessible as an online service via http://mse.ac.cn.
For each of the 177,022 input microbiomes, we searched it against all other samples for the top 100 matches and connected it with the matched samples whose similarity is higher than the threshold of direct transition (0.868). The output file of this search is "query.out". Moreover, for standalone searches of customized microbiome databases, the kernel and tutorial of MSE are provided on GitHub (https://github.com/qibebt-bioinfo/meta-storms).
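The network construction rule described above can be illustrated with a minimal Python sketch. It assumes the search results are already loaded into a dictionary of per-query hits; this in-memory layout and the names `build_edges` and `search_hits` are hypothetical and do not reflect the actual format of "query.out".

```python
THRESHOLD = 0.868  # direct-transition threshold stated above
TOP_N = 100        # number of top matches kept per query


def build_edges(search_hits, threshold=THRESHOLD, top_n=TOP_N):
    """Return undirected edges {(a, b): similarity} from per-query hits.

    search_hits is a hypothetical structure: a dict mapping each query
    sample ID to a list of (matched sample ID, similarity) tuples.
    """
    edges = {}
    for query, hits in search_hits.items():
        # Keep only the top_n most similar matches for this query.
        best = sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]
        for match, sim in best:
            # Skip self-matches and matches at or below the threshold.
            if match == query or sim <= threshold:
                continue
            key = tuple(sorted((query, match)))  # undirected edge
            edges[key] = max(sim, edges.get(key, 0.0))
    return edges
```

Storing each edge under a sorted pair of IDs means that a link found from either side of the search is recorded only once.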
The meta-data of the 177,022 samples is available in "meta-data".
| Sample type | Habitat | Number of samples |
|---|---|---|
| Human associated | Gut | 51,076 |
| | Skin | 19,455 |
| | Oral | 10,896 |
| | Other human body-site | 3,018 |
| | Urogenital | 1,204 |
| | Nose | 489 |
| Animal associated | Mammal animal | 29,918 |
| | Non-mammal animal | 11,172 |
| Environmental | Building | 11,248 |
| | Soil | 10,507 |
| | Marine water | 6,090 |
| | Lake | 4,234 |
| | Plant | 3,456 |
| | Freshwater | 3,112 |
| | River | 2,248 |
| | Milk | 1,636 |
| | Sand | 968 |
| | Food | 780 |
| Other | Other | 4,074 |
| | Mock | 811 |
| Total | | 177,022 |
This is an implementation of the Microbiomenetwork. This folder contains all of the scripts for the Closure, Dijkstra and MST (Minimum-cost Spanning Tree) analyses.
- g++ (GCC) >= 4.8.5
- Python3
A closure is a set of nodes (microbiomes), in which each microbiome can traverse to any other one by direct or indirect transitions (with finite steps).
a. Compile
g++ closure.cpp -o closure
b. Run
./closure query.out closure.out 0.868
in which "query.out" is the search results from MSE, "closure.out" is the closure result, and "0.868" is the statistical threshold of significantly high similarity that defines a direct transition.
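The closure definition above corresponds to a connected component of the thresholded network. The following Python sketch shows one way to compute such groups with a union-find structure. It is not the logic of closure.cpp, and the edge-tuple layout is an assumption made for illustration.

```python
def find(parent, x):
    """Return the representative of x, compressing the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def closures(edges, threshold=0.868):
    """Group nodes into closures (connected components).

    edges is an iterable of (node_a, node_b, similarity) tuples; this
    layout is an assumption, not the format of "query.out".
    """
    parent = {}
    for a, b, sim in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        if sim > threshold:  # only direct transitions merge closures
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[ra] = rb
    groups = {}
    for node in parent:
        groups.setdefault(find(parent, node), set()).add(node)
    # Largest closure first.
    return sorted(groups.values(), key=len, reverse=True)
```

With the result sorted by size, the first entry would presumably play the role of the main closure used in the later analyses.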
The Dijkstra algorithm is used to compute the shortest transition steps between all sample pairs in the main closure.
a. Python Environment
For statistical analysis of the microbiome transition network, the Python scripts require Python 3 and the "igraph" package (https://igraph.org/python/), which can be installed using pip:
pip install python-igraph
b. Run
python get_diameter.py query.out diameter.txt
in which "query.out" is the search results from MSE. The first line of "diameter.txt" is the diameter of the microbiome transition network (the maximum number of edges in the shortest path between any pair of its nodes), and the next line lists the nodes on that shortest path.
python Dijkstra.py query.out shortest_path
in which "query.out" is the search results from MSE. It produces two result files, "shortest_path.info" and "shortest_path.value", which contain, respectively, a matrix of the shortest path between every pair of nodes in the network and a matrix of the path lengths. If a pair of nodes is unconnected, it is represented by "oo" and "inf" in the two files.
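As an illustration of the shortest-step and diameter computations described above, here is a small pure-Python sketch. It does not reproduce Dijkstra.py or get_diameter.py (which rely on igraph). It assumes an adjacency dictionary in which every node appears as a key and every edge counts as one transition step.

```python
import heapq


def shortest_steps(adjacency, source):
    """Dijkstra from source; every edge counts as one transition step."""
    dist = {node: float("inf") for node in adjacency}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbor in adjacency[node]:
            if d + 1 < dist[neighbor]:
                dist[neighbor] = d + 1
                heapq.heappush(heap, (d + 1, neighbor))
    return dist


def diameter(adjacency):
    """Longest finite shortest path over all node pairs."""
    longest = 0
    for source in adjacency:
        for d in shortest_steps(adjacency, source).values():
            if d != float("inf"):
                longest = max(longest, d)
    return longest
```

Unreachable nodes keep a distance of `float("inf")`, which mirrors the "inf" entries mentioned for unconnected pairs.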
The “microbial dispersal” roadmap can be derived by parsing the Minimum Spanning Tree (MST) of the main closure using the Kruskal algorithm.
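The two-level MST procedure can be sketched in Python as follows. This is an illustrative sketch, not the code of Kruskal.cpp or mst-habitat.py: the edge-tuple layout, the function names and the treatment of edge lengths as distances to be minimized are all assumptions.

```python
from collections import defaultdict


def kruskal(edges):
    """Return the minimum spanning tree (or forest) as a list of edges.

    edges: iterable of (node_a, node_b, distance); this layout is an
    assumption for illustration, not the format of "sample.graph".
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    tree = []
    for a, b, dist in sorted(edges, key=lambda e: e[2]):
        ra, rb = find(a), find(b)
        if ra != rb:  # the edge joins two separate components: keep it
            parent[ra] = rb
            tree.append((a, b, dist))
    return tree


def habitat_graph(mst_edges, habitat_of):
    """Average the MST edge distances between each pair of habitats."""
    totals = defaultdict(lambda: [0.0, 0])  # habitat pair -> [sum, count]
    for a, b, dist in mst_edges:
        ha, hb = habitat_of[a], habitat_of[b]
        if ha == hb:
            continue  # the edge stays inside one habitat
        pair = tuple(sorted((ha, hb)))
        totals[pair][0] += dist
        totals[pair][1] += 1
    return [(h1, h2, total / count)
            for (h1, h2), (total, count) in totals.items()]
```

Under these assumptions, the sample-level tree would be `kruskal(sample_edges)`, and the habitat-level tree would be obtained by passing the output of `habitat_graph` back into `kruskal`.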
a. Compile
g++ Kruskal.cpp -o Kruskal -std=c++11
b. Run
python graph-query.py query.out sample.graph
./Kruskal sample.graph sample.mst
python mst-habitat.py sample.mst meta.txt habitat.graph
./Kruskal habitat.graph habitat.mst
in which "query.out" is the search results from MSE;
"sample.graph" is the search-based microbiome network, in which every line gives the start and end node of an edge together with its length (the similarity of the pair of samples);
"sample.mst" is the first-level MST, at "sample resolution";
"meta.txt" is the meta-data of the samples;
"habitat.graph" is the habitat-based network generated from "sample.mst", in which each node represents one habitat and the distance between two habitats is the average distance of all edges that link the two habitats in the MST;
"habitat.mst" is the second-level MST, at "habitat resolution".
This folder includes all the data necessary for generating the Figures.