<div style="text-align: center;">
  <span style="font-family: Arial; color: red; font-weight: bold;font-size: 36px;">Bioinformatics-Based Analysis of Key Genes in Renal Fibrosis</span>
</div>

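Renal fibrosis is a highly dynamic process involving inflammatory cell infiltration, activation and proliferation of myofibroblasts, abnormal synthesis and deposition of extracellular matrix, injury to resident renal cells, tubular atrophy, and capillary rarefaction. Although research in recent years has substantially advanced our understanding of its pathogenesis, clinical management still relies mainly on controlling the risk factors for declining renal function; patient outcomes have improved only modestly, and effective interventions to slow fibrosis are still lacking. Meanwhile, single-cell multi-omics methods and high-resolution mass spectrometry are decoding the cellular and molecular mechanisms of fibrosis at unprecedented resolution and throughput, and are reshaping our understanding of disease pathogenesis. This "resolution revolution" makes it possible to explore cell states and cell types at the single-cell level in an unbiased way.
This notebook analyzes single-cell sequencing data from renal fibrosis using bioinformatics methods. Through differential gene screening, WGCNA, and related approaches, we identify a set of genes that may play a key role in the progression of renal fibrosis.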

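We first downloaded the expression matrix of a single-cell transcriptome sequencing dataset from a mouse model of renal fibrosis (GSE182256) from the GEO database. After preprocessing, we clustered the cells and compared gene expression and cell clusters between normal mice and UUO (unilateral ureteral obstruction) model mice.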

In [None]:
# environment initialization
rm(list= ls())
library(dplyr)
library(Seurat)
library(patchwork)
library(monocle3)

In [6]:
# load data as seurat object
count = readRDS('GSE182256_Export_counts.rds')
meta = read.csv('GSE182256_Export_Metadata.txt',sep = '\t',row.names = 1)
meta$condition = gsub('\\d','',meta$orig.ident)
obj <- CreateSeuratObject(counts = count,meta.data = meta, project = "renal_fibrosis", min.cells = 3, min.features = 200)

In [None]:
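# filter low-quality cells and normalize
# percent.mt must exist before it can be used as a filter; it is computed here in case
# the author metadata does not already provide it (mouse mitochondrial genes start with "mt-")
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^mt-")
obj <- subset(obj, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj, selection.method = "vst", nfeatures = 2000)
all.genes <- rownames(obj)
obj <- ScaleData(obj, features = all.genes)
# dimension reduction with PCA and UMAP, followed by graph-based clustering
obj <- RunPCA(obj, features = VariableFeatures(object = obj))
obj <- FindNeighbors(obj, dims = 1:15)
obj <- FindClusters(obj, resolution = 0.5)
head(Idents(obj), 5)
obj <- RunUMAP(obj, dims = 1:15)
# UMAP split by condition, then UMAP colored by condition
DimPlot(obj, reduction = "umap",label=TRUE,split.by = 'condition')
DimPlot(obj, group.by = 'condition',reduction = "umap")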


<div style="display:flex; justify-content:center;">
    <img src="Rplot.png" alt="Image 2" width="400px"/>
   
</div>
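The normalization step in the cell above can be illustrated on a toy matrix. This is a minimal base-R sketch of log-normalization (divide each cell by its total counts, multiply by a scale factor of 10,000, then apply `log1p`), which is the scheme `NormalizeData` uses by default. The helper name `log_normalize` and the count values are made up for illustration and are not part of the original analysis.

```r
# Toy count matrix: 3 genes (rows) x 2 cells (columns); values are made up.
counts <- matrix(c(10, 0, 90,
                   5, 5, 40),
                 nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                 c("cell1", "cell2")))

# Hypothetical helper: scale every cell to a common total, then log1p.
log_normalize <- function(counts, scale_factor = 1e4) {
  totals <- colSums(counts)                          # total counts per cell
  scaled <- sweep(counts, 2, totals, "/") * scale_factor
  log1p(scaled)                                      # natural log of (1 + x)
}

norm <- log_normalize(counts)
print(round(norm, 3))
```

Because each cell is rescaled to the same total before the log transform, cells sequenced at different depths become comparable.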

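Differential expression analysis between the UUO and Control groups showed that CD PC (collecting duct principal cells) had the largest number of differentially expressed genes, so the following analysis focuses on the CD PC population.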

In [None]:
# Cells are annotated with the metadata uploaded by the original authors ('Cluster' column).
# Run the differential analysis for all cell clusters, then count the genes per cluster
# to see which cluster has the most differential genes.
# Note: FindAllMarkers compares each cluster against all remaining cells.
obj.markers <- FindAllMarkers(obj, only.pos = FALSE, min.pct = 0.25)
table(obj.markers$cluster)
DimPlot(obj, reduction = "umap",label=TRUE,group.by = 'Cluster',  repel = TRUE,split.by = 'condition')+ NoLegend()
DimPlot(obj, reduction = "umap", label = TRUE,group.by = 'Cluster',  repel = TRUE,pt.size = 0.5) + NoLegend()
# keep the genes of cluster 16, which is treated as CD PC in the rest of the analysis
diff = obj.markers %>% filter(cluster == '16') 

<div style="display:flex; justify-content:center;">
    <img src="Rplot01.png" alt="Image 2" width="400px"/>
    <img src="Rplot02.png" alt="Image 2" width="600px"/>
</div>
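The per-cluster counting step can be sketched on a toy marker table. The column names below (`gene`, `cluster`, `avg_log2FC`, `p_val_adj`) match those returned by `FindAllMarkers`; the values and the significance thresholds are assumptions chosen only for illustration.

```r
# Toy marker table; all values are made up.
markers <- data.frame(
  gene       = c("g1", "g2", "g3", "g4", "g5", "g6"),
  cluster    = c("0", "0", "16", "16", "16", "3"),
  avg_log2FC = c(1.2, -0.8, 2.1, -1.5, 0.9, 0.3),
  p_val_adj  = c(0.001, 0.20, 1e-10, 1e-5, 0.01, 0.04)
)

# Keep genes that pass both an adjusted p-value and a fold-change threshold
# (illustrative cutoffs, not the ones used in the notebook).
sig <- subset(markers, p_val_adj < 0.05 & abs(avg_log2FC) > 0.5)

# Count the significant genes per cluster and pick the cluster with the most.
n_per_cluster <- sort(table(sig$cluster), decreasing = TRUE)
top_cluster <- names(n_per_cluster)[1]
print(n_per_cluster)
print(top_cluster)
```

Filtering before counting avoids ranking clusters by genes that would not survive a significance cutoff.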

Volcano plot of the differential genes:

In [None]:
# volcano plot
library(EnhancedVolcano)
EnhancedVolcano(diff,
                lab = diff$gene,   # gene symbols; row names may carry suffixes added to keep them unique
                x = 'avg_log2FC',
                y = 'p_val',
                title = 'CD PC (cluster 16) differential genes',
                pCutoff = 10e-32,
                FCcutoff = 0.5,
                pointSize = 3.0,
                labSize = 6.0)


<div style="display:flex; justify-content:center;">
    <img src="Rplot10.png" alt="Image 2" width="60%"/>
</div>
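The thresholds behind a volcano plot amount to a simple classification rule, sketched below in base R. The function name `classify_volcano` and the toy values are hypothetical; the cutoffs mirror the cell above, where `pCutoff = 10e-32` equals `1e-31` and `FCcutoff = 0.5`.

```r
# Hypothetical helper: label each gene as Up, Down, or NS (not significant).
classify_volcano <- function(log2fc, pval, fc_cutoff = 0.5, p_cutoff = 0.05) {
  status <- rep("NS", length(log2fc))
  status[pval < p_cutoff & log2fc >  fc_cutoff] <- "Up"
  status[pval < p_cutoff & log2fc < -fc_cutoff] <- "Down"
  status
}

# Toy input; values are made up.
toy <- data.frame(
  gene       = c("g1", "g2", "g3", "g4"),
  avg_log2FC = c(1.4, -2.0, 0.2, 3.0),
  p_val      = c(1e-40, 1e-35, 1e-50, 0.3)
)
toy$status <- classify_volcano(toy$avg_log2FC, toy$p_val,
                               fc_cutoff = 0.5, p_cutoff = 1e-31)
print(toy)
```

A gene must pass both cutoffs to be highlighted: `g3` is highly significant but has a small fold change, and `g4` has a large fold change but is not significant.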

GO enrichment analysis of the genes differentially expressed in CD PC:

In [None]:
# Perform GO enrichment analysis using the enrichGO function
library(clusterProfiler)
library(org.Mm.eg.db)
ego <- enrichGO(gene = diff$gene,
                OrgDb = org.Mm.eg.db,
                keyType = "SYMBOL",
                ont = "BP",
                pAdjustMethod = "BH",
                pvalueCutoff = 0.05,
                qvalueCutoff = 0.2)
dotplot(ego)

<div style="display:flex; justify-content:center;">
    <img src="renal_fibrosis/图片3.png" alt="Image 2" width="40%"/>
</div>
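Over-representation analysis of the kind performed by `enrichGO` rests on a hypergeometric test followed by multiple-testing correction. The sketch below reproduces that calculation in base R for three made-up terms; the helper `ora_pvalue`, the term names, and all counts are hypothetical.

```r
# Hypothetical helper: probability of seeing at least k annotated genes in a
# list of n genes, when K of the N background genes carry the annotation.
ora_pvalue <- function(k, n, K, N) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Toy terms; counts are made up.
terms <- data.frame(
  term = c("term_A", "term_B", "term_C"),
  k    = c(12, 3, 5),      # annotated genes in the gene list
  K    = c(100, 150, 40)   # annotated genes in the background
)
n_list <- 200              # size of the gene list
N_bg   <- 20000            # size of the background

terms$pvalue   <- mapply(ora_pvalue, terms$k, n_list, terms$K, N_bg)
terms$p.adjust <- p.adjust(terms$pvalue, method = "BH")  # Benjamini-Hochberg
print(terms)
```

The `BH` correction here corresponds to `pAdjustMethod = "BH"` in the cell above.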

Monocle3 was used to visualize the pseudotime trajectory of CD PC cells.

In [None]:
# Constructing single-cell trajectories
library(SeuratWrappers)
cds  = as.cell_data_set(obj)
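cds <- preprocess_cds(cds, num_dim = 50)
# reduce dimensions before clustering: cluster_cells() works on the UMAP embedding by default
cds <- reduce_dimension(cds)
cds <- cluster_cells(cds, resolution=1e-5)
plot_cells(cds, label_groups_by_cluster=FALSE,  color_cells_by = "Cluster")

# interactively select the cells of interest
cds_subset <- choose_cells(cds)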
cds_subset <- cluster_cells(cds_subset, resolution=1e-5)
cds_subset <- learn_graph(cds_subset)
plot_cells(cds_subset,
           color_cells_by = "Cluster",
           label_groups_by_cluster=FALSE,
           label_leaves=FALSE,
           cell_size = 0.55,
           group_label_size = 3,
           label_branch_points=FALSE)

<div style="text-align:center">
    <img src="Rplot05.png" width="40%">
</div>

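scWGCNA was used to analyze the correlations among the differential genes, which grouped them into 4 gene modules. Plotting the expression of the 4 modules across cells shows that module 3 genes overlap strongly with CD PC cells, indicating a degree of specificity.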

In [None]:
# perform scWGCNA
# keep cells that are from the UUO condition or annotated as CD PC, restricted to the differential genes
subobj = subset(x = obj, subset = (condition == "UUO"|Cluster == "CD PC"))
subobj <- subset(subobj, features = diff$gene)
library(scWGCNA)
# Calculate the pseudocells
subobj <- JackStraw(subobj, num.replicate = 100)
subobj <- ScoreJackStraw(subobj, dims = 1:20)
subobj <- RunUMAP(subobj, dims = 1:15)
subobj <- RunTSNE(subobj, dims = 1:15)
# note: the pseudocells are computed here, but the run below uses the single cells directly
pseudocell = calculate.pseudocells(s.cells = subobj, # Single cells in Seurat object
                                   seeds=0.2, # Fraction of cells to use as seeds to aggregate pseudocells
                                   nn = 10, # Number of neighbors to aggregate
                                   reduction = "pca", # Reduction to use (Seurat names the PCA reduction "pca")
                                   dims = 1:15) # The dimensions to use
scWGCNA = run.scWGCNA(p.cells = subobj, # Pseudocells (recommended), or Seurat single cells
                                 s.cells = subobj, # single cells in Seurat format
                                 is.pseudocell = F, # We are using single cells twice this time
                                 features = rownames(subobj)) # Recommended: variable genes
scW.p.dendro(scWGCNA.data = scWGCNA)

scW.p.expression(s.cells = subobj, # Single cells in Seurat format
                 scWGCNA.data = scWGCNA, # scWGCNA list dataset
                 modules = "all", # Which modules to plot?
                 reduction = "umap", # Which reduction to plot?
                 ncol=2) # How m


<div style="display:flex; justify-content:center;">
    <img src="Rplot09.png" alt="Image 1" width="300" style="margin-right: 20px;"/>
    <img src="Rplot13.png" alt="Image 2" width="300"/>
</div>
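The idea behind co-expression modules can be sketched in base R on simulated data. This is a deliberately simplified stand-in, not the scWGCNA procedure: it uses a soft-thresholded correlation matrix with average-linkage clustering and `cutree`, whereas WGCNA itself works with a topological overlap matrix and dynamic tree cutting. The simulated genes, the power `beta = 6`, and `k = 2` are assumptions for illustration.

```r
set.seed(1)
n_cells <- 200

# Two hidden programs; genes 1-5 follow the first, genes 6-10 the second.
factor_a <- rnorm(n_cells)
factor_b <- rnorm(n_cells)
expr <- cbind(
  sapply(1:5, function(i) factor_a + rnorm(n_cells, sd = 0.3)),
  sapply(1:5, function(i) factor_b + rnorm(n_cells, sd = 0.3))
)
colnames(expr) <- paste0("gene", 1:10)

# Soft-thresholded adjacency: raising |correlation| to a power suppresses weak links.
beta <- 6
adjacency <- abs(cor(expr))^beta

# Cluster genes on the dissimilarity 1 - adjacency and cut into modules.
tree <- hclust(as.dist(1 - adjacency), method = "average")
modules <- cutree(tree, k = 2)
print(split(names(modules), modules))
```

Genes driven by the same hidden program end up in the same module, which is the property the module-level plots above rely on.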

Enrichment analysis of the module 3 genes shows that the main enriched pathways are related to renal metabolism:

<div style="display:flex; justify-content:center;">
    <img src="renal_fibrosis/图片2.png" alt="Image 2" width="300"/>
</div>

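FDRdb is a manually curated database of fibrotic disease-associated RNAome data. Its initial release contains 1,947 associations between 912 RNAs and 92 fibrotic diseases across 8 species, along with information on 764 fibrotic disease datasets. We downloaded 232 genes related to renal fibrosis from FDRdb and intersected them with the 212 genes of module 4, which yielded the following 8 key genes: "Ackr3"   "Cav1"    "Col18a1" "Fos"     "Hif1a"   "Lgals3"  "Thbs1"   "Dsp".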

<div style="display:flex; justify-content:center;">
    <img src="/home/ma/image1.png" alt="Image 233" width="300"/>
</div>
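The intersection step can be expressed with base R's `intersect`. In the sketch below, only the eight key genes come from the analysis; the two input vectors are placeholders standing in for the real FDRdb list and the real module gene list, which are not reproduced here.

```r
key_genes <- c("Ackr3", "Cav1", "Col18a1", "Fos", "Hif1a", "Lgals3", "Thbs1", "Dsp")

# Placeholder inputs (hypothetical): the real lists contain 232 and 212 genes.
fdrdb_genes  <- c(key_genes, "PlaceholderA", "PlaceholderB")
module_genes <- c("PlaceholderC", rev(key_genes), "PlaceholderD")

# Genes present in both lists, returned in the order of the first argument.
overlap <- intersect(fdrdb_genes, module_genes)
print(overlap)
```

Gene symbols must use the same naming convention in both lists (for example, mouse-style capitalization), otherwise matching genes are silently missed.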

The gene expression heatmap shows that these 8 genes are expressed mainly in CD PC.

<div style="display:flex; justify-content:center;">
    <img src="renal_fibrosis/图片4.png" alt="Image 2" width="70%"/>
</div>

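GeneNet analysis: GeneNet first estimates the covariance matrix with a linear shrinkage estimator, then selects a Gaussian graphical model (GGM) based on the partial correlations derived from that estimate. GGM selection uses a multiple-testing procedure based on the local false discovery rate, which keeps the false discovery rate below a predetermined level.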

<div style="display:flex; justify-content:center;">
    <img src="renal_fibrosis/图片5.png" alt="Image 2" width="30%"/>
</div>
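Partial correlations, the quantity a Gaussian graphical model is built on, can be computed from the inverse covariance matrix. The base-R sketch below uses the plain sample covariance rather than GeneNet's shrinkage estimator, and it omits the local-FDR edge selection; the simulated chain `x -> y -> z` and the helper `partial_cor` are assumptions for illustration.

```r
set.seed(42)
n <- 2000

# Simulated chain: x influences y, and y influences z; x and z are linked only through y.
x <- rnorm(n)
y <- x + rnorm(n)
z <- y + rnorm(n)
dat <- cbind(x = x, y = y, z = z)

# Hypothetical helper: partial correlations from the precision (inverse covariance) matrix.
partial_cor <- function(data) {
  precision <- solve(cov(data))
  pc <- -cov2cor(precision)   # standardize and flip the sign of off-diagonal entries
  diag(pc) <- 1
  pc
}

pc <- partial_cor(dat)
print(round(cor(dat), 2))   # marginal correlations: x and z look related
print(round(pc, 2))         # partial correlations: the x-z link vanishes given y
```

This is why a GGM yields a sparser network than a plain correlation network: indirect associations are removed once the intermediate variable is accounted for.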

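We also used STRING protein-protein interaction network analysis to examine predicted interactions among the proteins encoded by these genes.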

<div style="display:flex; justify-content:center;">
    <img src="renal_fibrosis/图片6.png" alt="Image 2" width="50%"/>
</div>