A scale-free, fully connected global transition network underlies known microbiome diversity

Description

Microbiomes are inherently linked by their structural similarity, yet the global patterns and features of such similarity are not clear. Here we propose a search-based microbiome transition network to probe microbiome similarity globally. By traversing a composition-similarity-based network of 177,022 microbiomes, we show that although their compositions are distinct by habitat, each microbiome is on average only seven neighbors from any other microbiome on Earth, indicating the inherent homology of microbiomes at the global scale. This network is scale-free, suggesting a high degree of stability and robustness in microbiome transition. By tracking the minimum spanning tree in this network, we derived a global roadmap of microbiome dispersal that tracks the potential formation of microbial diversity. Such search-based global microbiome networks, reconstructed within hours on just one computing node, provide a readily expandable reference for tracing the origin and evolution of existing or new microbiome datasets.

About the `Data` folder

Microbiome Search Engine (MSE) is a microbiome database platform for searching query microbiomes against the global metagenome data space based on whole-community-level similarity, computed with the Meta-Storms algorithm. It contains 177,022 samples in total. We consider that a direct transition possibly exists between sample pairs whose similarity is significant (permutation p-value < 0.01); accordingly, a Meta-Storms similarity of 0.868 is defined as the threshold for direct transition between microbiomes. The search-based microbiome network is built using MSE, which is freely accessible as an online service at http://mse.ac.cn.

For each of the 177,022 input microbiomes, we searched it against all other samples for the top 100 matches and connected it with the matched samples whose similarity is higher than the direct-transition threshold (0.868). The output file of this search is "query.out". For standalone searches of customized microbiome databases, the kernel and tutorial of MSE are provided on GitHub (https://github.com/qibebt-bioinfo/meta-storms).
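The edge-building rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the MSE implementation: the format of "query.out" is not described here, so the sketch assumes the search hits have already been loaded into a hypothetical dictionary `hits` that maps each query sample ID to a list of `(match ID, similarity)` pairs. The function name `build_edges` is likewise hypothetical.

```python
from typing import Dict, List, Set, Tuple

THRESHOLD = 0.868  # direct-transition threshold stated in this README
TOP_N = 100        # number of top matches kept per query sample


def build_edges(
    hits: Dict[str, List[Tuple[str, float]]],
    threshold: float = THRESHOLD,
    top_n: int = TOP_N,
) -> Set[Tuple[str, ...]]:
    """Return undirected edges between samples whose similarity exceeds the threshold.

    `hits` is an assumed in-memory structure: query ID -> [(match ID, similarity), ...].
    """
    edges = set()
    for query, matches in hits.items():
        # Keep only the top_n most similar matches for this query.
        best = sorted(matches, key=lambda m: m[1], reverse=True)[:top_n]
        for match, similarity in best:
            if match != query and similarity > threshold:
                # Store each undirected edge once, endpoints in sorted order.
                edges.add(tuple(sorted((query, match))))
    return edges


# Toy data (invented for illustration only).
example_hits = {
    "S1": [("S2", 0.91), ("S3", 0.80)],
    "S2": [("S1", 0.91), ("S3", 0.87)],
    "S3": [("S2", 0.87)],
}
print(sorted(build_edges(example_hits)))
```

In the toy data, S1 and S3 stay unconnected directly (0.80 is below the threshold) but can still reach each other through S2.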

The meta-data of the 177,022 samples is available in the meta-data file.

The distribution of samples among the habitats

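Sample type	Habitat	Number of samples
Human associated	Gut	51,076
Human associated	Skin	19,455
Human associated	Oral	10,896
Human associated	Other human body-site	3,018
Human associated	Urogenital	1,204
Human associated	Nose	489
Animal associated	Mammal animal	29,918
Animal associated	Non-mammal animal	11,172
Environmental	Building	11,248
Environmental	Soil	10,507
Environmental	Marine water	6,090
Environmental	Lake	4,234
Environmental	Plant	3,456
Environmental	Freshwater	3,112
Environmental	River	2,248
Environmental	Milk	1,636
Environmental	Sand	968
Environmental	Food	780
Other	Other	4,074
Other	Mock	811
Total		177,022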

About the `Code` folder

This is an implementation of the Microbiomenetwork. This folder contains all the scripts for the Closure, Dijkstra and MST (Minimum-cost Spanning Tree) analyses.

Requirements

g++ (GCC) >= 4.8.5
Python3

Closure

A closure is a set of nodes (microbiomes), in which each microbiome can traverse to any other one by direct or indirect transitions (with finite steps).

a. Compile

g++ closure.cpp -o closure

b. Run

./closure query.out closure.out 0.868

in which "query.out" is the search result from MSE, "closure.out" is the closure result, and "0.868" is the statistical threshold of similarity that defines a direct transition.
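A closure as defined above corresponds to a connected component of the undirected transition network. The sketch below shows one common way to compute such components, a union-find (disjoint-set) structure. It is an illustration under assumptions, not a port of closure.cpp: the function name `find_closures` and the edge-list input are hypothetical, and samples without any edge do not appear in the result.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_closures(edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group nodes into closures (connected components), largest first."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen nodes as their own root, then follow parents to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two closures

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    # Sort members for stable output; the first entry is the main (largest) closure.
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)


# Toy edges (invented for illustration only).
example_edges = [("A", "B"), ("B", "C"), ("D", "E")]
print(find_closures(example_edges))
```

Here A, B and C form the main closure because C is reachable from A through B, while D and E form a separate, smaller closure.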

Dijkstra

The Dijkstra algorithm is used to compute the pairwise shortest transition steps of all sample pairs in the main closure.

a. Python Environment

For statistical analysis of the microbiome transition network, the Python scripts require Python 3 and the "igraph" package (https://igraph.org/python/), which can be installed using pip:

pip install python-igraph

b. Run

python get_diameter.py query.out diameter.txt

in which "query.out" is the search result from MSE. The first line of "diameter.txt" is the diameter of the microbiome transition network (the maximum number of edges in the shortest path between any pair of its nodes), and the next line lists the nodes on that shortest path.
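To make the definition of the diameter concrete, the sketch below computes it on a toy graph with a breadth-first search (BFS) from every node, which yields shortest step counts when every edge counts as one step. It is a standard-library illustration only and not the repository's get_diameter.py, which relies on igraph; the names `bfs` and `diameter_path` and the adjacency-dictionary input are assumptions. Running a BFS from every node is quadratic in the worst case, so this form is only suitable for small examples.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def bfs(adjacency: Dict[str, List[str]], source: str):
    """Return step counts and predecessors for all nodes reachable from source."""
    steps = {source: 0}
    previous: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in steps:
                steps[neighbour] = steps[node] + 1
                previous[neighbour] = node
                queue.append(neighbour)
    return steps, previous


def diameter_path(adjacency: Dict[str, List[str]]) -> Tuple[int, List[str]]:
    """Return the longest shortest-path length and one path that realises it."""
    best_len, best_path = 0, []
    for source in adjacency:
        steps, previous = bfs(adjacency, source)
        far = max(steps, key=steps.get)  # farthest reachable node from source
        if steps[far] > best_len or not best_path:
            best_len = steps[far]
            # Walk back from the farthest node to the source.
            path, node = [], far
            while node is not None:
                path.append(node)
                node = previous[node]
            best_path = path[::-1]
    return best_len, best_path


# Toy network (invented): a chain A-B-C-D with an extra leaf E attached to B.
example_adjacency = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}
print(diameter_path(example_adjacency))
```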

python Dijkstra.py query.out shortest_path

in which "query.out" is the search result from MSE. It produces two result files, "shortest_path.info" and "shortest_path.value", which respectively contain a matrix of the shortest path between every pair of nodes in the network and a matrix of the path lengths. If a pair of nodes is unconnected, it is represented by "oo" and "inf" in the two files, respectively.
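The following sketch illustrates the shortest-path computation for a single pair of nodes using Dijkstra's algorithm with a binary heap. It is not the repository's Dijkstra.py: the function names and the adjacency-dictionary input are hypothetical, and every direct transition is assumed to cost exactly one step. The "oo" and `inf` return values for unconnected pairs mirror the placeholders described above.

```python
import heapq
import math
from typing import Dict, List


def dijkstra(adjacency: Dict[str, List[str]], source: str):
    """Return shortest step counts and predecessors from source."""
    dist = {source: 0}
    previous = {source: None}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter route was already found
        for neighbour in adjacency.get(node, ()):
            candidate = d + 1  # assumption: each direct transition is one step
            if candidate < dist.get(neighbour, math.inf):
                dist[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    return dist, previous


def path_and_length(adjacency: Dict[str, List[str]], source: str, target: str):
    """Return (path, length), or ("oo", inf) when target is unreachable."""
    dist, previous = dijkstra(adjacency, source)
    if target not in dist:
        return "oo", math.inf
    path, node = [], target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1], dist[target]


# Toy network (invented): A-B-C is connected, D is isolated.
toy_adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
print(path_and_length(toy_adjacency, "A", "C"))
print(path_and_length(toy_adjacency, "A", "D"))
```

With unit step costs a plain breadth-first search gives the same distances; Dijkstra's algorithm becomes necessary only if edges carry different costs.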

MST (Minimum-cost Spanning Tree)

The “microbial dispersal” roadmap can be derived by parsing the Minimum Spanning Tree (MST) of the main closure using the Kruskal algorithm.
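As a rough illustration of the Kruskal algorithm mentioned above, the sketch below sorts the edges by length and adds each edge unless it would close a cycle, using a union-find structure to detect cycles. It is not a port of Kruskal.cpp. The function name `kruskal` and the `(node, node, length)` tuples are assumptions, and the sketch simply treats `length` as the cost to minimise without assuming how it is derived from similarity.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]


def kruskal(edges: Iterable[Edge]) -> List[Edge]:
    """Return the edges of a minimum spanning tree (or forest) in the order chosen."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    # Consider the shortest edges first.
    for u, v, length in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate subtrees, so no cycle
            parent[root_u] = root_v
            tree.append((u, v, length))
    return tree


# Toy graph (invented for illustration only).
toy_edges = [
    ("A", "B", 0.1),
    ("B", "C", 0.3),
    ("A", "C", 0.2),
    ("C", "D", 0.05),
]
print(kruskal(toy_edges))
```

In the toy graph the edge B-C is rejected because B and C are already connected through A by shorter edges.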

a. Compile

g++ Kruskal.cpp -o Kruskal -std=c++11

b. Run

python graph-query.py query.out sample.graph
./Kruskal sample.graph sample.mst
python mst-habitat.py sample.mst meta.txt habitat.graph
./Kruskal habitat.graph habitat.mst

in which "query.out" is the search result from MSE;

"sample.graph" is the search-based microbiome network, in which every line shows the start and end node of an edge with its length (similarity of the pair of samples);

"sample.mst" is the first-level MST at "sample resolution";

"meta.txt" is the meta-data of the samples;

"habitat.graph" is the habitat-based network generated from "sample.mst", in which each node represents one habitat and the distance between two habitats is the average distance of all edges that link the two habitats in the MST;

"habitat.mst" is the second-level MST at "habitat resolution".

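The habitat-level aggregation described above (average length of all sample-level MST edges that link two habitats) can be sketched as follows. This is an illustration under assumptions rather than the repository's mst-habitat.py: the function name `habitat_graph`, the edge tuples and the `habitat_of` mapping are hypothetical stand-ins for the contents of "sample.mst" and "meta.txt", whose file formats are not described here. Edges within a single habitat are skipped, which is also an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def habitat_graph(
    mst_edges: Iterable[Tuple[str, str, float]],
    habitat_of: Dict[str, str],
) -> Dict[Tuple[str, ...], float]:
    """Average the length of MST edges that connect two different habitats."""
    totals = defaultdict(lambda: [0.0, 0])  # habitat pair -> [sum of lengths, count]
    for u, v, length in mst_edges:
        habitat_u, habitat_v = habitat_of[u], habitat_of[v]
        if habitat_u == habitat_v:
            continue  # assumption: edges within one habitat are not used
        key = tuple(sorted((habitat_u, habitat_v)))  # undirected habitat pair
        totals[key][0] += length
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Toy sample-level MST and habitat labels (invented for illustration only).
toy_mst = [
    ("s1", "s2", 0.0625),  # Gut-Gut, skipped
    ("s2", "s3", 0.25),    # Gut-Oral
    ("s3", "s4", 0.5),     # Oral-Gut
    ("s1", "s5", 0.125),   # Gut-Skin
]
toy_habitats = {"s1": "Gut", "s2": "Gut", "s3": "Oral", "s4": "Gut", "s5": "Skin"}
print(habitat_graph(toy_mst, toy_habitats))
```

The resulting habitat pairs and average lengths form the edge list of a much smaller graph, on which a second spanning tree can then be computed.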
About the `Figure` folder

This folder includes all the data necessary for generating the Figures.
