gmcNet: gene module clustering Network in WGCNA

Model summary

To identify gene modules in WGCNA, we propose gmcNet, a graph neural network (GNN)-based clustering algorithm. gmcNet clusters genes according to both their co-expression topology (genes in the same module should be strongly connected) and their single-level expression (genes in the same module should have similar expression patterns). The key innovation of gmcNet is combining each gene's own expression (single-expression) with the co-expression of its neighboring genes.

Model Input

gmcNet requires four inputs for unsupervised clustering. Let $n$ be the number of genes and $m$ the number of expression samples.

$\textbf{X}\in\mathbb{R}^{n \times m}$ : Single-expression features of $n$ genes.
$\textbf{T}\in\mathbb{R}^{n \times n}$ : Topological overlap matrix, which is created using the topological overlap measure between $n$ genes.
$\textbf{T}_\textbf{p}\in\mathbb{R}^{n \times n}$ : Topological overlap matrix built only from gene pairs with a positive correlation coefficient.
$\textbf{T}_\textbf{n}\in\mathbb{R}^{n \times n}$ : Topological overlap matrix built only from gene pairs with a negative correlation coefficient.
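The three TOMs are normally computed with WGCNA in R. To show roughly what they contain, the Python sketch below uses the standard unsigned topological overlap formula $\mathrm{TOM}_{ij} = \frac{l_{ij} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}$, where $a_{ij}$ is the correlation magnitude raised to a soft-thresholding power $\beta$ (presumably what 'betas' controls in the configuration), $l_{ij} = \sum_u a_{iu} a_{uj}$ and $k_i = \sum_u a_{iu}$. Several choices here are assumptions and may differ from gmcNet's code and from WGCNA's TOMsimilarity: splitting the correlations by sign to obtain $\textbf{T}_\textbf{p}$ and $\textbf{T}_\textbf{n}$, and the helper names adjacency and tom.

```python
import numpy as np

def adjacency(expr, beta, sign="whole"):
    """Soft-thresholded adjacency from an (n genes x m samples) matrix.

    Hypothetical helper: 'positive' keeps only positively correlated pairs,
    'negative' keeps only negatively correlated pairs (as magnitudes),
    'whole' uses |cor| for every pair.
    """
    cor = np.corrcoef(expr)                 # n x n Pearson correlation between genes (rows)
    if sign == "positive":
        cor = np.where(cor > 0, cor, 0.0)
    elif sign == "negative":
        cor = np.where(cor < 0, -cor, 0.0)
    else:
        cor = np.abs(cor)
    adj = cor ** beta
    np.fill_diagonal(adj, 0.0)              # no self-connections
    return adj

def tom(adj):
    """Unsigned topological overlap matrix from an adjacency with zero diagonal."""
    shared = adj @ adj                      # l_ij: sum over u of a_iu * a_uj
    k = adj.sum(axis=1)                     # connectivity k_i
    min_k = np.minimum.outer(k, k)          # min(k_i, k_j) for every pair
    t = (shared + adj) / (min_k + 1.0 - adj)
    np.fill_diagonal(t, 1.0)
    return t
```

For example, with an expression matrix X of shape (n, m), tom(adjacency(X, 6, "positive")) gives an $n \times n$ matrix in the role of $\textbf{T}_\textbf{p}$ (the power 6 is only an illustrative value).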

Network structure

gmcNet consists of two components: a co-expression pattern recognizer (CEPR) and a module classifier.

CEPR : Using message-passing operations, CEPR generates the embedding feature $\bar{\textbf{X}}\in\mathbb{R}^{n \times m'}$, an $m'$-dimensional representation that accounts for each gene's single-expression and two different co-expressions.
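The exact CEPR update rule is not given here. Purely as a hypothetical illustration of message passing, the NumPy sketch below implements one layer that concatenates a gene's own transformed expression with neighbor aggregates weighted by row-normalized $\textbf{T}_\textbf{p}$ and $\textbf{T}_\textbf{n}$, then applies a ReLU. Treating $\textbf{T}_\textbf{p}$ and $\textbf{T}_\textbf{n}$ as the "two different co-expressions", the row normalization, the concatenation, and the names cepr_layer and row_normalize are all assumptions; gmcNet's actual layer is written in TensorFlow and may work differently.

```python
import numpy as np

def row_normalize(t):
    """Drop self-loops and scale each row to sum to 1 (hypothetical choice)."""
    t = t.copy()
    np.fill_diagonal(t, 0.0)
    deg = t.sum(axis=1, keepdims=True)
    return t / np.where(deg > 0, deg, 1.0)  # avoid division by zero for isolated genes

def cepr_layer(h, t_pos, t_neg, w_self, w_pos, w_neg):
    """One illustrative message-passing layer.

    h:      (n, d_in) gene features, e.g. the expression matrix X
    t_pos:  (n, n) positive-correlation TOM
    t_neg:  (n, n) negative-correlation TOM
    w_*:    (d_in, d_out) weight matrices
    Returns (n, 3 * d_out) features.
    """
    self_part = h @ w_self                        # the gene's own expression
    pos_part = row_normalize(t_pos) @ h @ w_pos   # messages from positively co-expressed neighbors
    neg_part = row_normalize(t_neg) @ h @ w_neg   # messages from negatively co-expressed neighbors
    return np.maximum(np.concatenate([self_part, pos_part, neg_part], axis=1), 0.0)
```

Stacking several such layers would correspond loosely to the 'mp_layers' setting, but that mapping is also an assumption.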

Module classifier : Given the CEPR embedding $\bar{\textbf{X}}$, the module classifier computes the module-assignment probability matrix $\textbf{M}\in\mathbb{R}^{n \times k}$ using a multi-layer perceptron (MLP), where $k$ is the number of modules. The $i$th row of $\textbf{M}$ holds the module-assignment probabilities of gene $i$; gene $i$ is assigned to module $c$ if $\textbf{M}_{ic}$ is the largest entry in that row.
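The assignment rule above amounts to a row-wise argmax over $\textbf{M}$. The sketch below assumes a two-layer MLP with a ReLU hidden layer and a row-wise softmax to turn scores into probabilities; the layer count, activation, softmax and the helper names are illustrative assumptions rather than gmcNet's exact classifier.

```python
import numpy as np

def mlp_logits(x_bar, w1, b1, w2, b2):
    """Two-layer MLP (ReLU hidden layer) mapping (n, m') embeddings to (n, k) scores."""
    hidden = np.maximum(x_bar @ w1 + b1, 0.0)
    return hidden @ w2 + b2

def module_probabilities(logits):
    """Row-wise softmax: row i is gene i's module-assignment distribution (M)."""
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def assign_modules(m):
    """Gene i is assigned to the module c with the largest M[i, c]."""
    return m.argmax(axis=1)

# Two genes, three modules: gene 0 scores highest on module 0, gene 1 on module 1.
M = module_probabilities(np.array([[2.0, 0.1, 0.3], [0.0, 5.0, 1.0]]))
labels = assign_modules(M)  # → array([0, 1])
```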

Implementation

1. Preparing

Our model was implemented with TensorFlow 2.3 in Python 3.8.6.

1.1. Requirements

Requirements can be installed through the following command in your shell.

pip install -r [CODE PATH]/requirements.txt

1.2. Input Data

expr : gene expression data. A text file with a header line, followed by one line per gene with $m$+1 columns: the first column is the gene name and the remaining $m$ columns are its expression values. An example file is provided as data/sample.txt.
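To check that a file matches this layout before running gmcNet, it can be read into the $n \times m$ matrix $\textbf{X}$ (genes as rows, as in Model Input). This is a minimal sketch, not the repository's loader: the helper name load_expr is hypothetical, and splitting on whitespace is an assumption (it handles tabs or spaces but breaks if gene names contain spaces), so compare against data/sample.txt.

```python
import numpy as np

def load_expr(lines):
    """Parse the expr format: one header line, then one line per gene
    (gene name followed by m expression values). Accepts an open file
    or any iterable of text lines."""
    it = iter(lines)
    header = next(it).split()
    genes, rows = [], []
    for line in it:
        fields = line.split()
        if not fields:              # skip blank lines
            continue
        genes.append(fields[0])
        rows.append([float(v) for v in fields[1:]])
    if len({len(r) for r in rows}) > 1:
        raise ValueError("every gene must have the same number of expression values")
    return genes, header, np.asarray(rows, dtype=float)
```

Usage: with open("data/sample.txt") as fh: genes, header, X = load_expr(fh).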

TOM (optional) : If you have already computed the TOMs with the R package WGCNA, you can reuse them in gmcNet. The three TOMs required by gmcNet ($\textbf{T}$, $\textbf{T}_\textbf{p}$, $\textbf{T}_\textbf{n}$) must be placed in a single folder and named whole.txt, positive.txt and negative.txt, respectively. Each TOM file must contain $n$ rows and $n$ columns, where the entry in row $i$, column $j$ is the topological overlap measure between genes $i$ and $j$. Example files are provided in the out/TOMs folder.
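If you compute the TOMs outside gmcNet, a small helper can write them under the required file names and verify that all three are $n \times n$. This sketch assumes a header-free, tab-delimited numeric matrix; that is a guess, so compare the output with the example files in out/TOMs (which may include gene names or use another delimiter) before relying on it. The names save_toms, load_toms and TOM_FILES are hypothetical.

```python
from pathlib import Path

import numpy as np

# Required file names inside the TOM folder.
TOM_FILES = {"whole": "whole.txt", "positive": "positive.txt", "negative": "negative.txt"}

def save_toms(folder, whole, positive, negative):
    """Write T, T_p and T_n as tab-delimited numeric matrices (assumed format)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    for key, mat in (("whole", whole), ("positive", positive), ("negative", negative)):
        np.savetxt(folder / TOM_FILES[key], mat, delimiter="\t")

def load_toms(folder):
    """Read the three TOMs back and check that each one is n x n."""
    folder = Path(folder)
    toms = {key: np.loadtxt(folder / name, delimiter="\t") for key, name in TOM_FILES.items()}
    n = toms["whole"].shape[0]
    for key, mat in toms.items():
        if mat.shape != (n, n):
            raise ValueError(f"{key}: expected shape ({n}, {n}), got {mat.shape}")
    return toms
```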

1.3. Configuration

Before executing gmcNet, set the configuration in main.py.

'betas' : smoothing parameters for the (whole, positive, negative) networks
'save_TOM' : whether to save the TOMs to the output path
'save_embed' : whether to save the embedding features to the output path
'n_cluster' : number of clusters (k)
'epochs' : number of training epochs
'lr' : training learning rate
'mp_layers' : number of message-passing layers
'CEPR_features' : dimension of the CEPR embedding ($m'$)
'lambda' : balancing hyper-parameter
'Lo_thr' : orthogonality threshold
'tune_epoch' : number of initial tuning epochs, which prevent empty modules
'tune_lr' : learning rate for the initial tuning
'device' : GPU device to use; set to False to run without a GPU
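The settings above can be pictured as a Python dict with a quick sanity check. The key names come from the list above, but every value below is an illustrative placeholder, not a gmcNet default, and the variable name config, the helper check_config, and the dict layout itself are assumptions; the actual structure in main.py may differ.

```python
# Placeholder values for illustration only; they are NOT gmcNet's defaults.
config = {
    "betas": (6, 6, 6),      # one value each for the whole, positive, negative networks
    "save_TOM": True,
    "save_embed": True,
    "n_cluster": 10,         # k
    "epochs": 500,
    "lr": 1e-3,
    "mp_layers": 2,
    "CEPR_features": 64,     # m'
    "lambda": 1.0,
    "Lo_thr": 0.1,
    "tune_epoch": 50,
    "tune_lr": 1e-3,
    "device": False,         # False = run without a GPU
}

REQUIRED_KEYS = {
    "betas", "save_TOM", "save_embed", "n_cluster", "epochs", "lr", "mp_layers",
    "CEPR_features", "lambda", "Lo_thr", "tune_epoch", "tune_lr", "device",
}

def check_config(cfg):
    """Hypothetical sanity check for the keys listed in the README."""
    missing = REQUIRED_KEYS - set(cfg)
    if missing:
        raise KeyError(f"missing config keys: {sorted(missing)}")
    if len(cfg["betas"]) != 3:
        raise ValueError("'betas' needs one value per network (whole, positive, negative)")
    if not (isinstance(cfg["n_cluster"], int) and cfg["n_cluster"] > 0):
        raise ValueError("'n_cluster' must be a positive integer")
    return cfg
```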

2. Execution

2.1. Without TOM file

python main.py --expr [expr] --out [out]

[expr] : expr file path.
[out] : Path for saving the results.

2.2. With TOM file

python main.py --expr [expr] --TOM [TOM] --out [out]

[expr] : expr file path.
[TOM] : Path to the TOM folder containing the three TOM files (whole.txt, positive.txt, negative.txt).
[out] : Path for saving the results.
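When gmcNet is launched from another Python script, the two invocations above differ only in the optional --TOM argument. The helper below (gmcnet_command is a hypothetical name, not part of the repository) builds either argument list using only the flags documented here.

```python
import shlex

def gmcnet_command(expr, out, tom=None, python="python"):
    """Build the argument list for main.py; --TOM is added only when a TOM folder is given."""
    cmd = [python, "main.py", "--expr", expr]
    if tom is not None:
        cmd += ["--TOM", tom]
    cmd += ["--out", out]
    return cmd

# Shell form of the example with precomputed TOMs.
example = shlex.join(gmcnet_command("data/sample.txt", "out", tom="out/TOMs"))
```

From the repository root, subprocess.run(gmcnet_command("data/sample.txt", "out", python=sys.executable), check=True) would then run the example without TOM files.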

2.3. Example-Without TOM file

python main.py --expr data/sample.txt --out out

2.4. Example-With TOM file

python main.py --expr data/sample.txt --TOM out/TOMs --out out

gywns6287/gmcNet

Folders and files

Latest commit

History

Repository files navigation

gmcNet: gene module clustering Network in WGCNA

Model summary

Model Input

Network structure

Implementation

1. Preparing

1.1. Requirements

1.2. Input Data

1.3. Configuration

2. Execution

2.1. Without TOM file

2.2. With TOM file

2.3. Example-Without TOM file

2.4. Example-With TOM file

About

Resources

License

Stars

Watchers

Forks

Languages