A scale-free, fully connected global transition network underlies known microbiome diversity

Description

Microbiomes are inherently linked by their structural similarity, yet the global patterns and features of such similarity are not clear. Here we propose a search-based microbiome transition network to probe microbiome similarity globally. By traversing a composition-similarity-based network of 177,022 microbiomes, we show that although their compositions are distinct by habitat, each microbiome is on average only seven neighbors from any other microbiome on Earth, indicating the inherent homology of microbiomes at the global scale. This network is scale-free, suggesting a high degree of stability and robustness in microbiome transition. By tracking the minimum spanning tree in this network, we derived a global roadmap of microbiome dispersal that tracks the potential formation of microbial diversity. Such search-based global microbiome networks, reconstructed within hours on just one computing node, provide a readily expandable reference for tracing the origin and evolution of existing or new microbiome datasets.

About the Data folder

Microbiome Search Engine (MSE) is a microbiome database platform for searching query microbiomes against the global metagenome data space, based on whole-community-level similarity computed with the Meta-Storms algorithm. It contains 177,022 samples in total. We consider that a direct transition possibly exists between sample pairs whose similarity is significant (permutation p-value < 0.01), so a Meta-Storms similarity of 0.868 is defined as the threshold for direct transition between microbiomes. The search-based microbiome network is built using MSE, which is freely accessible as an online service via http://mse.ac.cn.

For each of the 177,022 input microbiomes, we searched it against all other samples for the top 100 matches and connected it to the matched samples whose similarity is higher than the direct-transition threshold (0.868). The output file is "query.out". For standalone searches of customized microbiome databases, the kernel and tutorial of MSE are provided on GitHub (https://github.com/qibebt-bioinfo/meta-storms).
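The edge-building rule described above can be sketched as follows. This is only an illustration: the layout of "query.out" is not documented here, so the sketch works on an in-memory dictionary of search hits, and the names `build_edges` and `hits` are hypothetical.

```python
THRESHOLD = 0.868  # direct-transition threshold stated above
TOP_N = 100        # number of top matches considered per query


def build_edges(hits, threshold=THRESHOLD, top_n=TOP_N):
    """Turn search hits into undirected edges.

    hits: dict mapping a query sample id to a list of
          (matched sample id, similarity) tuples (assumed layout).
    """
    edges = {}
    for query, matches in hits.items():
        # Keep only the top_n most similar matches for this query.
        best = sorted(matches, key=lambda m: m[1], reverse=True)[:top_n]
        for match, sim in best:
            # Connect only pairs above the direct-transition threshold.
            if match != query and sim > threshold:
                key = tuple(sorted((query, match)))  # undirected edge
                edges[key] = max(sim, edges.get(key, 0.0))
    return edges


hits = {
    "S1": [("S2", 0.95), ("S3", 0.80)],
    "S2": [("S1", 0.95), ("S3", 0.90)],
    "S3": [("S2", 0.90)],
}
edges = build_edges(hits)
print(edges)
```

In this toy input the S1 and S3 pair falls below 0.868, so the two samples are connected only indirectly through S2.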

The meta-data of the 177,022 samples is available in the meta-data file.

The distribution of samples among the habitats

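| Sample type | Habitat | Number of samples |
| --- | --- | --- |
| Human associated | Gut | 51,076 |
| Human associated | Skin | 19,455 |
| Human associated | Oral | 10,896 |
| Human associated | Other human body-site | 3,018 |
| Human associated | Urogenital | 1,204 |
| Human associated | Nose | 489 |
| Animal associated | Mammal animal | 29,918 |
| Animal associated | Non-mammal animal | 11,172 |
| Environmental | Building | 11,248 |
| Environmental | Soil | 10,507 |
| Environmental | Marine water | 6,090 |
| Environmental | Lake | 4,234 |
| Environmental | Plant | 3,456 |
| Environmental | Freshwater | 3,112 |
| Environmental | River | 2,248 |
| Environmental | Milk | 1,636 |
| Environmental | Sand | 968 |
| Environmental | Food | 780 |
| Other | Other | 4,074 |
| Other | Mock | 811 |
| Total | | 177,022 |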

About the Code folder

This folder contains the implementation of the microbiome network analysis: all scripts for the Closure, Dijkstra and MST (Minimum-cost Spanning Tree) analyses.

Requirements

  • g++ (GCC) >= 4.8.5
  • Python3

Closure

A closure is a set of nodes (microbiomes), in which each microbiome can traverse to any other one by direct or indirect transitions (with finite steps).

a. Compile

g++ closure.cpp -o closure

b. Run

./closure query.out closure.out 0.868

in which "query.out" is the search result from MSE, "closure.out" is the closure result, and "0.868" is the statistical threshold of similarity that defines a direct transition.
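A closure as defined above corresponds to a connected component of the undirected transition network. The following is a minimal Python sketch of that idea using a union-find structure. It is not the code in closure.cpp, and the function name `closures` and the example edges are hypothetical.

```python
def closures(edges):
    """Group nodes into closures (connected components).

    edges: iterable of (node_a, node_b) pairs that are already above
           the direct-transition threshold.
    Returns a list of node sets, largest closure first.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two closures

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)


example_edges = [("S1", "S2"), ("S2", "S3"), ("S4", "S5")]
result = closures(example_edges)
print(result)
```

Here S1, S2 and S3 form the largest ("main") closure, while S4 and S5 form a separate one.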

Dijkstra

The Dijkstra algorithm is used to compute the pairwise shortest transition steps of all sample pairs in the main closure.

a. Python Environment

For statistical analysis of the microbiome transition network, the Python scripts require Python 3 and the "igraph" package (https://igraph.org/python/), which can be installed using pip:

pip install python-igraph

b. Run

python get_diameter.py query.out diameter.txt

in which "query.out" is the search result from MSE and "diameter.txt" is the output file. The first line of "diameter.txt" is the diameter of the microbiome transition network (the maximum number of edges in the shortest path between any pair of its nodes), and the next line lists the nodes on that shortest path.
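To illustrate what the diameter means, the sketch below computes it for a small connected graph by running a breadth-first search from every node. It uses only the standard library rather than igraph and is not the code in get_diameter.py. The names `bfs`, `diameter` and the toy adjacency list are hypothetical.

```python
from collections import deque


def bfs(adj, source):
    """Return hop distances and predecessors from source."""
    dist = {source: 0}
    prev = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                prev[v] = u
                queue.append(v)
    return dist, prev


def diameter(adj):
    """Return (diameter, nodes on one longest shortest path)."""
    best_len, best_path = -1, []
    for source in adj:
        dist, prev = bfs(adj, source)
        target = max(dist, key=dist.get)  # farthest reachable node
        if dist[target] > best_len:
            path, node = [], target
            while node is not None:  # walk back to the source
                path.append(node)
                node = prev[node]
            best_len, best_path = dist[target], path[::-1]
    return best_len, best_path


adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
length, path = diameter(adj)
print(length)
print(" ".join(path))
```

The two printed lines mirror the layout described for "diameter.txt": the diameter first, then the nodes on the path.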

python Dijkstra.py query.out shortest_path

in which "query.out" is the search result from MSE. It produces two result files, "shortest_path.info" and "shortest_path.value", which contain, respectively, a matrix of the shortest path between every pair of nodes in the network and a matrix of the path lengths. If a pair of nodes is unconnected, it is represented by "oo" and "inf" in the two files, respectively.
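As a rough illustration of the pairwise computation, the sketch below runs Dijkstra's algorithm from every node with the standard-library `heapq` module and prints a matrix of path lengths, marking unconnected pairs as "inf". It assumes each edge costs one transition step, and the exact layout of the real output files is not reproduced. The names `dijkstra`, `fmt` and the toy graph are hypothetical.

```python
import heapq
import math


def dijkstra(adj, source):
    """Shortest distances from source; adj maps node -> [(neighbor, weight)]."""
    dist = {node: math.inf for node in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def fmt(d):
    # Unconnected pairs are written as "inf", as in "shortest_path.value".
    return "inf" if math.isinf(d) else str(d)


# Assumption: every edge counts as one transition step.
adj = {
    "A": [("B", 1)],
    "B": [("A", 1), ("C", 1)],
    "C": [("B", 1)],
    "D": [],  # isolated node, unreachable from the others
}
nodes = list(adj)
rows = []
for s in nodes:
    dist = dijkstra(adj, s)
    rows.append(" ".join(fmt(dist[t]) for t in nodes))
print("\n".join(rows))
```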

MST (Minimum-cost Spanning Tree)

The “microbial dispersal” roadmap can be derived by parsing the Minimum Spanning Tree (MST) of the main closure using the Kruskal algorithm.
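For readers unfamiliar with the Kruskal algorithm, a minimal Python sketch follows. It is not the code in Kruskal.cpp. It assumes each edge already carries a cost where smaller means closer, and how the repository converts similarity into such a cost is not stated here. The function name `kruskal` and the example weights are hypothetical.

```python
def kruskal(nodes, edges):
    """Return the edges of a minimum spanning tree (or forest).

    nodes: iterable of node ids.
    edges: iterable of (u, v, cost) tuples; smaller cost = closer (assumed).
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    # Visit edges from cheapest to most expensive.
    for u, v, cost in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two separate trees, so keep it
            parent[ru] = rv
            tree.append((u, v, cost))
    return tree


nodes = ["S1", "S2", "S3", "S4"]
edges = [
    ("S1", "S2", 0.05),
    ("S2", "S3", 0.10),
    ("S1", "S3", 0.12),  # would close a cycle, so it is skipped
    ("S3", "S4", 0.08),
]
mst = kruskal(nodes, edges)
for u, v, cost in mst:
    print(u, v, cost)
```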

a. Compile

g++ Kruskal.cpp -o Kruskal -std=c++11

b. Run

python graph-query.py query.out sample.graph
./Kruskal sample.graph sample.mst
python mst-habitat.py sample.mst meta.txt habitat.graph
./Kruskal habitat.graph habitat.mst

in which "query.out" is the search results from MSE;

"sample.graph" is the search-based microbiome network, of which every line shows the start and end node of an edge with its length (similarity of the pair of samples);

"sample.mst" is the first level MST on "sample resolution";

"meta.txt" is the meta-data of samples;

"habitat.graph" is the habitat-based network generated from "sample.mst", in which each node represents one habitat and the distance between two habitats is the average distance of all edges that link the two habitats in the MST;

"habitat.mst" is the second MST on "habitat resolution".
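The step from "sample.mst" to "habitat.graph" can be sketched as below: MST edges whose two samples belong to different habitats are grouped by habitat pair, and their distances are averaged. This is an illustration only and not the code in mst-habitat.py. The file formats are not reproduced, and the names `habitat_graph`, `mst_edges` and `habitat_of` are hypothetical.

```python
from collections import defaultdict


def habitat_graph(mst_edges, habitat_of):
    """Average the distance of MST edges linking each pair of habitats.

    mst_edges:  iterable of (sample_u, sample_v, distance) tuples.
    habitat_of: dict mapping a sample id to its habitat label.
    """
    totals = defaultdict(lambda: [0.0, 0])  # habitat pair -> [sum, count]
    for u, v, distance in mst_edges:
        hu, hv = habitat_of[u], habitat_of[v]
        if hu == hv:
            continue  # an edge inside one habitat links no habitat pair
        key = tuple(sorted((hu, hv)))
        totals[key][0] += distance
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


mst_edges = [
    ("S1", "S2", 0.1),
    ("S2", "S3", 0.25),
    ("S3", "S4", 0.5),
    ("S4", "S5", 0.125),
]
habitat_of = {"S1": "Gut", "S2": "Gut", "S3": "Oral", "S4": "Gut", "S5": "Soil"}
graph = habitat_graph(mst_edges, habitat_of)
for (ha, hb), avg in graph.items():
    print(ha, hb, avg)
```

The resulting habitat-level edges could then be passed to a second MST computation, which is the role "habitat.mst" plays in the pipeline above.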

About the Figure folder

This folder includes all the data necessary for generating the Figures.
