DEG.csv can also be imported to supply node attributes.
You can also try importing correlation_matrix.csv.

Rendering details: http://manual.cytoscape.org/en/stable/Rendering_Engine.html

Demo:
* Map logFC to node color or node size
* Map the Pearson correlation value to edge color or edge width
* Run NetworkAnalyzer
* Install clusterMaker2:
    * Network clustering: use MCODE clustering
    * Attribute clustering: use Pearson correlation

In [2]:
import pandas as pd
import numpy as np

In [15]:
data = pd.read_csv("../rna-seq/GTEx_vs_TCGA_raw_counts.csv", index_col=0)

# Only use GBM samples
data = data[data.types == "GBM"]
# Drop the last column (assumed to be the `types` label) so only count columns remain
data = data.iloc[:, :-1]

# Only use genes with top 100 mean expression
# data = data.loc[:, data.mean().sort_values()[-100:].index]

# Select Top100 DEGs
deg = pd.read_csv("DEG.csv", index_col=0)
genes = deg.index.tolist()
data = data.loc[:, data.columns.isin(genes)]

# Scale counts to a per-million basis, then apply log2(x + 1).
# Note: data.sum() returns one total per column (gene), not per sample.
data = np.log2((data / data.sum()) * 1e6 + 1).dropna(axis=1)
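For comparison, CPM is usually defined per sample: each count is divided by that sample's library size (its row total when rows are samples). The cell above divides by `data.sum()`, which sums down each gene column instead. The sketch below shows the per-sample version on a toy table; the counts and the names `counts`, `library_size`, `cpm`, and `log_cpm` are made up for illustration, and whether per-sample scaling is what this notebook intends is an assumption.

```python
import numpy as np
import pandas as pd

# Toy raw counts (hypothetical values): rows are samples, columns are genes.
counts = pd.DataFrame(
    {"geneA": [10, 200], "geneB": [30, 300], "geneC": [60, 500]},
    index=["sample1", "sample2"],
)

# Library size = total counts per sample (row sums).
library_size = counts.sum(axis=1)

# Counts per million: divide each row by its own library size, then scale.
cpm = counts.div(library_size, axis=0) * 1e6

# log2(CPM + 1) keeps zero counts finite.
log_cpm = np.log2(cpm + 1)
```

In a real analysis the library size would be computed from all genes, before subsetting to the DEGs, so that it reflects the full sequencing depth of each sample.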

In [16]:
corr = data.corr("pearson")
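`DataFrame.corr` computes the correlation between every pair of columns, so here each entry is the Pearson correlation between two genes across the GBM samples. As a rough illustration, the sketch below checks one entry against the textbook formula on random toy data; the helper `pearson` and the frame `toy` are hypothetical names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Random toy data (hypothetical): 20 samples (rows) x 3 genes (columns).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(20, 3)), columns=["g1", "g2", "g3"])

def pearson(x, y):
    """Pearson r: covariance divided by the product of the standard deviations."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Pairwise column correlations, as in the cell above.
toy_corr = toy.corr(method="pearson")

# One entry recomputed by hand for comparison.
manual = pearson(toy["g1"].to_numpy(), toy["g2"].to_numpy())
```

The resulting matrix is symmetric with ones on the diagonal, which is why only the upper triangle is needed when building an edge list.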

In [17]:
corr.head()

| | CYP4Z1 | GPR89C | HIST2H4A | SUMO1P3 | ZBTB37 | TNNT2 | RBM34 | EDARADD | C2orf74 | MRPL30 | ... | CACNG8 | ZCCHC3 | P2RX6P | MIF | SSTR3 | MAGED4B | MAGED4 | RPL36A | FAM45B | PRRG3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CYP4Z1 | 1.0 | 0.078526 | -0.077046 | 0.066479 | 0.108911 | 0.483178 | 0.087035 | -0.016208 | 0.183141 | 0.068031 | ... | 0.170357 | 0.04964 | 0.036492 | 0.005079 | 0.334767 | 0.036846 | 0.053851 | 0.118667 | 0.091158 | 0.239558 |
| GPR89C | 0.078526 | 1.0 | 0.190479 | 0.453393 | 0.213195 | 0.060721 | 0.519603 | 0.350566 | 0.305362 | 0.508332 | ... | 0.093702 | 0.417729 | 0.128385 | 0.241792 | 0.026675 | 0.291736 | 0.291665 | 0.368789 | 0.332923 | 0.076308 |
| HIST2H4A | -0.077046 | 0.190479 | 1.0 | 0.410623 | 0.255993 | -0.08349 | 0.29599 | 0.188835 | 0.369818 | 0.329065 | ... | -0.176487 | 0.135914 | 0.084262 | 0.295933 | -0.065087 | 0.043636 | 0.125632 | 0.163556 | 0.256743 | 0.021686 |
| SUMO1P3 | 0.066479 | 0.453393 | 0.410623 | 1.0 | 0.399124 | 0.105301 | 0.829968 | 0.401466 | 0.423572 | 0.815887 | ... | 0.034511 | 0.503121 | 0.102779 | 0.472222 | 0.023395 | 0.233421 | 0.257943 | 0.433359 | 0.610715 | 0.102984 |
| ZBTB37 | 0.108911 | 0.213195 | 0.255993 | 0.399124 | 1.0 | 0.087558 | 0.465131 | 0.30181 | 0.162819 | 0.417913 | ... | 0.319454 | 0.428504 | -0.085268 | 0.134048 | 0.168158 | 0.383853 | 0.208058 | 0.201319 | 0.668006 | 0.208135 |


In [26]:
corr.to_csv("correlation_matrix.csv", index=True)

In [18]:
from itertools import combinations
colnames = list(combinations(corr.columns, 2))

In [19]:
len(colnames)

4753

In [20]:
corr_list = pd.DataFrame(colnames)

In [21]:
flatten = corr.values[np.triu_indices(corr.shape[0], k=1)]

In [22]:
flatten.shape

(4753,)

In [23]:
corr_list["pearson_correlation"] = flatten

In [24]:
corr_list.columns = ["Gene_i", "Gene_j", "pearson_correlation"]
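The steps above rely on `itertools.combinations` and `np.triu_indices(..., k=1)` enumerating the upper triangle in the same row-major order, so the gene pairs and the flattened values line up. With 98 genes this gives 98 × 97 / 2 = 4753 pairs, matching the lengths printed above. A minimal sketch that bundles the conversion into one function and checks the ordering on a toy matrix follows; `corr_to_edge_list` and `toy_corr` are hypothetical names introduced only for this example.

```python
from itertools import combinations

import numpy as np
import pandas as pd

def corr_to_edge_list(corr):
    """Flatten the upper triangle (diagonal excluded) of a square matrix."""
    rows, cols = np.triu_indices(corr.shape[0], k=1)
    return pd.DataFrame({
        "Gene_i": corr.index.to_numpy()[rows],
        "Gene_j": corr.columns.to_numpy()[cols],
        "pearson_correlation": corr.to_numpy()[rows, cols],
    })

# Toy symmetric matrix with made-up values.
toy_corr = pd.DataFrame(
    [[1.0, 0.2, -0.5],
     [0.2, 1.0, 0.7],
     [-0.5, 0.7, 1.0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)

edges = corr_to_edge_list(toy_corr)

# The pair order matches itertools.combinations over the column names.
pairs = list(zip(edges["Gene_i"], edges["Gene_j"]))
same_order = pairs == list(combinations(toy_corr.columns, 2))
```

Taking the labels and the values from the same index arrays avoids depending on two separate enumerations staying in sync.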

In [25]:
corr_list.to_csv("correlation_list.csv", index=False)
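The saved list contains every gene pair, so the imported network is fully connected. If that is too dense to read, one option is to keep only edges whose absolute correlation exceeds a cutoff before exporting. The sketch below uses a toy edge list; the cutoff of 0.5 and the output file name in the comment are arbitrary assumptions, not values taken from this notebook.

```python
import pandas as pd

# Toy edge list in the same layout as correlation_list.csv (made-up values).
edges = pd.DataFrame({
    "Gene_i": ["A", "A", "B"],
    "Gene_j": ["B", "C", "C"],
    "pearson_correlation": [0.2, -0.5, 0.7],
})

# Hypothetical cutoff: keep strong positive and strong negative correlations.
threshold = 0.5
strong = edges[edges["pearson_correlation"].abs() >= threshold]

# To export for Cytoscape (hypothetical file name):
# strong.to_csv("correlation_list_filtered.csv", index=False)
```

Filtering on the absolute value keeps negatively correlated pairs, which can still be shown with a distinct edge color.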

# References

* WGCNA: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559
* Using RNA-seq data in WGCNA: https://www.biostars.org/p/280650/