-
Notifications
You must be signed in to change notification settings - Fork 1
/
rbioapi_do_enrich.Rmd
450 lines (332 loc) · 37.7 KB
/
rbioapi_do_enrich.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
---
title: "3.A: Do with rbioapi: Enrichment (Over-Representation) Analysis in R"
author: "Moosa Rezwani"
description: >
Enrichment (Over-Representation) Analysis in R. Perform enrichment analysis using Enrichr, miEAA, PANTHER, Reactome, and STRING with rbioapi package.
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
toc: true
anchor_sections: true
number_sections: true
self_contained: true
dev: png
encoding: 'UTF-8'
vignette: >
%\VignetteIndexEntry{3.A: Do with rbioapi: Enrichment (Over-Representation) Analysis in R}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE, setup, echo=FALSE, results="hide"}
knitr::opts_chunk$set(echo = TRUE,
eval = TRUE,
message = FALSE,
warning = FALSE,
collapse = TRUE,
tidy = FALSE,
cache = FALSE,
dev = "png",
comment = "#>")
```
Among the databases and web services currently supported by rbioapi [Enrichr](https://maayanlab.cloud/Enrichr/ "Enrichr"), [miEAA](https://ccb-compute2.cs.uni-saarland.de/mieaa2 "miRNA Enrichment Analysis and Annotation Tool (miEAA)"), [PANTHER](http://www.pantherdb.org "Protein Analysis THrough Evolutionary Relationships (PANTHER)"), [Reactome](https://reactome.org/), and [STRING](https://string-db.org/ "STRING: Protein-Protein Interaction Networks Functional Enrichment Analysis") offer Enrichment (Over-Representation) analysis API endpoints. In this article, after a brief introduction, we will demonstrate how to perform Enrichment (Over-Representation) analysis in R, consistently and easily, using the package rbioapi.
Before continuing, note that rbioapi is an interface to the databases and web services. Hence it is the user's responsibility to cite the services they have used. We provide suggested citations in each function's manual. However, it is the user's responsibility to fully and adequately cite the databases or web services used in their published research.
------------------------------------------------------------------------
# How to install? {#how-to-install .heading2}
You can install the stable release version of [rbioapi](rbioapi.html "rbioapi: User-Friendly R Interface to Biologic Web Services' API") from [CRAN](https://cran.r-project.org/package=rbioapi "rbioapi page on CRAN (The Comprehensive R Archive Network)") with:
```{r install_cran, eval=FALSE}
install.packages("rbioapi")
```
However, the CRAN version is released at most once every 1-2 months, You can install the most recent (development) version from [GitHub](https://github.com/moosa-r/rbioapi/ "rbioapi repository on GitHub") with:
```{r install_github, eval=FALSE}
install.packages("remotes")
remotes::install_github("moosa-r/rbioapi")
```
Now, we can load the package:
```{r load_rbioapi, echo=TRUE}
library(rbioapi)
```
```{r prevent_vignette_errors, message=FALSE, warning=FALSE, include=FALSE}
rba_options(timeout = 30, skip_error = TRUE)
```
------------------------------------------------------------------------
# What is Over-representation analysis? {#what-is-over-representation-analysis .heading2}
Enrichment (Over-Representation) analysis is commonly performed when one's analysis yields a list of genes, proteins, or other entities. We will not delve into details here and refer you to [D. W. Huang et al.: Nucleic Acids Res. 37, 1 (2009)](https://doi.org/10.1093/nar/gkn923 "Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists") for a thorough review of the subject.
In short, to perform such analysis, in addition to some algorithms, two data are required:
1) A gene list which is derived from your analysis.
2) Gene sets, where genes are grouped based on a shared property or concept (such as pathway, role in diseases, physical location, etc.).
With over-representation analysis, you can compare your gene list to find if the member of any gene set are over-represented, meaning that they are present in your list in a manner that can not be explained by chance alone.
# Enrichment (Over-Representation) analysis in R with rbioapi {#enrichment-analysis-in-R-with-rbioapi .heading2}
Many web services supported by rbioapi provide tools to perform over-representation analysis against the gene sets curated in their databases. Among the databases and web services currently supported by rbioapi, [Enrichr](https://maayanlab.cloud/Enrichr/ "Enrichr"), [miEAA](https://ccb-compute2.cs.uni-saarland.de/mieaa2 "miRNA Enrichment Analysis and Annotation Tool (miEAA)"), [PANTHER](https://www.pantherdb.org "Protein Analysis THrough Evolutionary Relationships (PANTHER)"), [Reactome](https://reactome.org/), and [STRING](https://string-db.org/ "STRING: Protein-Protein Interaction Networks Functional Enrichment Analysis") provide such services. Each service has a dedicated vignette article in rbioapi. Here we will focus on performing Enrichment (Over-Representation) analysis.
As a demonstration, we will use the results of [Galani et al.: Nat. Immunol. 22, 32 (2021)](https://doi.org/10.1038/s41590-020-00840-x "Untuned antiviral immunity in COVID-19 revealed by temporal type I/III interferon patterns and flu comparison") where they have compared the gene expression of critically ill and non-critically ill COVID patients to healthy individuals. We assign a variable with 1214 differentially expressed genes (DEGs) exclusively induced in critically ill COVID patients. Next, we will see what insights we can extract from this list.
```{r define var, echo=TRUE, message=TRUE}
covid_critical <- c("TXNDC5", "GABRR2", "MGAM2", "LOC200772", "LYPD2", "IFI27", "RPH3A", "OTOF", "NBPF8", "CLEC4F", "CNGB1", "HIST1H2BF", "HIF1A", "SDC1", "TUBB8", "FBXO39", "TPSB2", "CD177", "LRRN3", "EBLN2", "PCSK9", "ELK2AP", "UCHL1", "C22orf15", "LPO", "C3orf20", "CLRN1-AS1", "GPR75", "CA12", "RAB19", "CHRFAM7A", "CRYGN", "DLGAP5", "BTBD8", "LOC100272216", "PRG3", "CYP46A1", "LOC102723604", "PPAP2B", "C4BPA", "SPESP1", "LILRP2", "UBE2Q1-AS1", "MIR3945", "NOMO3", "MEG3", "LOC400927-CSNK1E", "MIR6732", "MIR590", "PPP1R3G", "PYCR1", "ARHGAP42", "MMP8", "HMMR", "P3H2", "ACER1", "NOG", "RAB39A", "ANTXRLP1", "LINC00266-3", "GPRC5D", "MCM10", "TSPY26P", "ANKRD36BP1", "GBP1P1", "PRL", "CYP1A1", "KIF4A", "LOC102724323", "SERPINB10", "GSTA4", "TRIM51", "MIXL1", "RNASE1", "CASC8", "MAOA", "XCL1", "ADAMTS2", "LOC101929125", "DCANP1", "BHLHA15", "ANOS1", "SLC18A1", "CCDC150", "CAV1", "SH3BP5", "LINC00398", "NCOA2", "SPATC1", "SHROOM2", "GPR27", "LRRC26", "RNF169", "USP3-AS1", "VWA7", "ROCK2", "FSTL4", "METTL7B", "CYP4F29P", "LEF1-AS1", "HORMAD2-AS1", "FBXO15", "PPTC7", "TYMS", "PPP4R2", "ZNF608", "FAM46B", "PCSK1N", "LINC00623", "CASC5", "ZNF224", "DENND2C", "WDR86", "PTGR1", "SPATA3-AS1", "LOC101927412", "KIF14", "MMP28", "PBK", "VMO1", "ADCY3", "HIST1H2BO", "FTO-IT1", "MIR342", "FANK1", "CRIP2", "TIAF1", "LOC344887", "OLFM4", "MKNK1-AS1", "ZDHHC19", "SEPT14", "EPSTI1", "FOXC1", "MIR616", "KRT72", "LINC01347", "LOC101928100", "KIAA0895", "BOK", "HIST1H2AI", "DBH-AS1", "ADORA2A-AS1", "MED12L", "SAMD15", "TARM1", "SMTNL1", "POU5F1P3", "LINC00968", "OAS3", "LOC103091866", "SH3RF3-AS1", "NBPF10", "APAF1", "SLC2A14", "SYT17", "ETV3", "SHROOM4", "AOAH-IT1", "NAIP", "ALAS2", "GLIS3", "ADAM17", "OLFM1", "PCAT29", "TNFRSF18", "DNASE1L3", "IGF2BP3", "LINC01271", "AP3B2", "TXNDC2", "CEP55", "SIGLEC1", "RAB3IL1", "PLD4", "KIFC1", "LINC00487", "ABHD12B", "ITGA7", "GJB6", "CARD14", "LRRN2", "MPO", "KLRC3", "LOC100507487", "SCGB3A1", "CD38", "LRRN1", "SYCP2L", "ANLN", "ASPM", "OAS1", "IFI44L", "CDCA3", "HLA-DQB2", "ANO9", "NUDT11", "HMP19", "DEPDC1", "GPR84", "PLEKHF1", "PI16", "RDH5", "TMED8", "LINC00824", "SEPT4", "CLIC3", "B4GALNT3", "OLAH", "ITGA1", "FOXI1", "LOC100506142", "CDC20", "HAGHL", "GTSCR1", "B3GALNT1", "SOCS3", "PCDH1", "TAS2R20", "CDC25A", "NPDC1", "LOC100505915", "XCL2", "TIAM2", "LOC100288069", "IL34", "IL18R1", "CYYR1", "ZNF888", "FAM20A", "MDS2", "ABCA13", "KCNJ1", "SLC4A9", "EXO1", "LAIR2", "IQSEC3", "SCT", "SLC26A8", "ESCO2", "ZDHHC1", "SNORA63", "FBLN5", "PRUNE2", "CPNE7", "CDHR2", "GNLY", "APOBEC3B", "MFSD9", "SYNGR3", "PTGDS", "COL17A1", "TBC1D14", "AIM2", "TMEM204", "FAM157B", "ZBED6", "EME1", "ATF3", "KBTBD4", "LUC7L2", "KIF20A", "LCNL1", "DOCK9-AS2", "KCNE1", "BTN2A3P", "KL", "IDI2-AS1", "EBF4", "SCARNA21", "CEACAM6", "KLRB1", "C5orf58", "ASXL2", "RPLP0P2", "CYTL1", "DIAPH3", "DOK7", "RFFL", "KIAA1107", "TAS2R40", "CCDC186", "COL6A2", "METRN", "SNHG25", "RRM2", "CYP1B1", "NEURL1", "MATK", "SLC28A3", "JCHAIN", "TNFRSF4", "FXYD1", "PLLP", "ARHGAP23", "SNORA4", "MYO18A", "ZNF429", "NRN1", "HJURP", "TCN1", "CDC6", "ATP5EP2", "SOAT2", "LOC101928034", "EIF2AK2", "ARG1", "SLC16A11", "TPPP3", "TMEM38A", "TOP2A", "LINC00999", "DGCR9", "RCAN3AS", "CACNG6", "LINC01550", "TDRD9", "CARD17", "GBP6", "LY6E", "RSAD2", "LOC100506258", "PRKAR2A", "LTF", "IGLL5", "FAM157A", "LOC101927018", "CCR7", "FLJ42351", "IFIT3", "GPRIN1", "ANXA9", "TMEM119", "PARGP1", "A3GALT2", "CYP4F22", "PLBD1-AS1", "SELM", "NCR3", "PNPLA1", "BMX", "LOC440461", "GAMT", "CEACAM8", "TP53I13", "C1orf226", "SKA3", "DHRS3", "VRK2", "BTBD19", "ETNK2", "LOC728323", "NRIR", "ADORA2A", "GPR162", "CMPK2", "USP18", "P4HA2", "TSSK4", "EP300", "CLEC4D", "LCN10", "GPR141", "SH2D2A", "GOLGA7B", "TMIGD2", "DLL4", "HP", "CXCR6", "MAL", "C12orf57", "CLIC5", "IL4", "IGFBP6", "ERG", "HERC3", "KEL", "MSX2P1", "INE1", "PRKCQ-AS1", "FHIT", "SLC1A7", "KIAA1958", "SARDH", "PFKFB3", "SNORD89", "IFI44", "DDX60", "TMEM238", "HAR1A", "EGR1", "VNN1", "TRIM9", "TAF13", "AP3S2", "TMEM56", "KLRC2", "CACNA1C-AS2", "ALX3", "FCGBP", "CD247", "ALDH1L2", "HIST1H2AC", "RTP5", "PPARG", "AMPH", "LINC00861", "CDKL5", "MYBL2", "LOC101927051", "TLR5", "TMEM121", "BMP8B", "AK5", "RBP5", "LINC01355", "PITRM1-AS1", "CMTM1", "BIRC5", "C10orf10", "TNFRSF25", "ZAP70", "LOC105373383", "DSP", "WDR86-AS1", "RPLP2", "KREMEN1", "LOC101927550", "PDZD4", "LOC100130451", "MKI67", "LOC728743", "CXCL10", "LINC01547", "LOC645513", "ATOH8", "S1PR5", "KISS1R", "MELK", "TCEA3", "SLC22A20", "RAMP1", "FOLR2", "GGT6", "CACNA1E", "FABP6", "RAP2C-AS1", "PRRG4", "FAM63B", "BLK", "P3H3", "WIPF2", "TROAP", "FAHD2CP", "CA6", "LINC00892", "LRCH3", "BUB1B", "LOC100996286", "NT5E", "PASK", "BATF2", "TTC39C-AS1", "ACTA2", "CHIT1", "LAMC1", "TNFAIP3", "ANKRD22", "C12orf42", "SPON2", "SLX4IP", "TNNC1", "ZNF771", "SPEG", "HIST1H4H", "HTR6", "TNFRSF17", "FSD1", "LINC00266-1", "CD6", "PRSS30P", "SFTPD", "COPZ2", "BPI", "CCIN", "CDK1", "ATP2C2", "GOLGA8H", "USP44", "SLC14A1", "ZDHHC14", "SNORD50A", "LTK", "KCNG2", "MAPK11", "ACOT4", "CTSW", "C1orf106", "CDKN3", "UPB1", "CAMK2N1", "RBM5-AS1", "DNLZ", "GZMM", "PLA2G7", "ZCCHC2", "PRSS41", "RPL13", "OLR1", "BCAS4", "EPB41", "TRABD2A", "PBX4", "GZMK", "PNPLA7", "B4GALT5", "ARHGEF17", "APOA1", "GPA33", "KLRG1", "GBP5", "LOC102723701", "CENPA", "LOC285696", "KCNK5", "SUSD4", "RPS28", "ESPL1", "ITGB4", "SPAG16", "CYP4F35P", "CD3D", "BMS1P20", "CBS", "ETV2", "SPATS2L", "DUSP13", "FAM19A1", "VSIG10L", "CD2", "NHSL2", "FAAH2", "FXYD7", "NTN5", "FICD", "GLDC", "LOC101927865", "LINC00944", "PTPRCAP", "GNB3", "TRIM22", "PRRT1", "NR1I3", "BTN3A1", "STX16-NPEPL1", "TMEM191A", "SERPINB2", "UST", "GALNT12", "BUB1", "GTF2I", "FOSL2", "COL13A1", "RGS9", "NCALD", "DNAH10", "CA11", "CKB", "HSPB1", "CDC45", "ATP7B", "WNT5B", "ZNF699", "PRRT4", "GALNT14", "ZNF319", "DNAH17", "LOC283710", "CPEB4", "LY6G6C", "PPAN", "WASH5P", "HIST1H3H", "IL32", "TEPP", "CNR1", "YJEFN3", "FAM159A", "FGFBP2", "FKBP5", "BIN1", "VPREB3", "E2F8", "HK3", "CDH2", "HFE", "BEAN1-AS1", "KIFC3", "HELB", "HLA-DPB1", "GZMH", "LMNB1", "CC2D2A", "IQCH-AS1", "CTSF", "TMEM132D", "LEF1", "REL", "CBR3", "WNT10B", "LOC100289473", "APBA2", "KIF23", "LOC646471", "PNMA6A", "EPC2", "HBEGF", "S100B", "C14orf132", "KCND1", "SIPA1L2", "NCAPG", "OAS2", "C3AR1", "CSF1", "CHCHD6", "AUTS2", "IL24", "CDC42EP2", "LTBP3", "PXT1", "ADAMTS10", "BLZF1", "TPST1", "ID3", "GYG1", "EFCAB2", "MYO10", "TLE2", "SLC30A1", "CCNI", "WNT7A", "RBL2", "ERICH6-AS1", "IGFBP3", "DTL", "SORBS3", "RPS19", "SMA4", "RPS27", "BEAN1", "MIRLET7BHG", "ABCA1", "ZKSCAN7", "GPR34", "GPR153", "TRIM52", "GRASP", "RNASE2", "CENPF", "ACTG1P20", "MMP9", "ZFP82", "RTN1", "RPS21", "PCBP1-AS1", "PVRL3", "RPS10", "FER1L4", "SEPT1", "DEPDC1B", "LINC00926", "SEC14L2", "LOC100420587", "PCSK4", "HPS3", "KIR3DL1", "SPRN", "HRH2", "FAHD2B", "P3H4", "TMEM160", "ENGASE", "RAD51AP1", "ZNF233", "RAP1GAP", "TTBK2", "PINK1", "SH2D1B", "BAIAP2L1", "BECN1", "RPL13A", "INHBB", "HPCAL4", "TPM2", "ACVR2B", "CEP126", "RPL18", "B3GNT7", "ROBO3", "MYOM2", "SLC51A", "CCL5", "PROSER3", "RPL18A", "SERINC4", "ATP11B", "SMARCD3", "STMN3", "FAM173A", "CDK5RAP2", "GGH", "EIF4G3", "C18orf32", "SHCBP1", "CCM2L", "LAT", "PRF1", "ARL15", "WHAMMP1", "NBN", "ARHGEF11", "ZWINT", "BTBD11", "MCOLN2", "EYS", "GRAP", "TAP1", "FAM157C", "ANXA3", "MCEMP1", "TCF7", "IGSF9B", "SDSL", "LNPEP", "NSUN5P1", "FAM110B", "SPNS3", "ACTL10", "GCSAM", "LOC101927482", "UGCG", "CD24", "SCO2", "GMCL1", "SAMD12", "NUSAP1", "WNT1", "NME3", "LOC728084", "RGS18", "PNMA3", "KLRF1", "ENO2", "SUPT3H", "ATP5D", "ZC3H13", "PSTPIP2", "SNORD81", "SESN2", "TK1", "LHFPL2", "SCARNA10", "SERPING1", "SYT2", "SNORA33", "JAK2", "CYB5D1", "SH3RF3", "ARL14EP", "UBASH3A", "ARL5B", "SPRED2", "NELL2", "PPP2R2B", "CDH12", "WDR54", "KLF4", "MTUS1", "ELF1", "CARNS1", "CLEC10A", "CD3E", "TTK", "KBTBD7", "TPX2", "ZNF69", "AICDA", "TAMM41", "LOC100288152", "AK1", "MMP24", "SAMD9L", "RAD51", "ACE", "COLGALT2", "NEDD4", "SPEF2", "CENPE", "CRISPLD2", "TSEN54", "RCVRN", "FRMD3", "SKAP1", "CYP4F3", "TMEM161B-AS1", "KANSL1L", "SLC29A2", "CEP97", "XAF1", "CDKN1C", "PRDM5", "HBP1", "CACNG8", "RP2", "OASL", "NPIPA1", "ZNF354A", "RPL32", "GLTSCR2", "CD52", "SBK1", "ZNF703", "GBP1", "C4orf29", "ERV3-1", "ABCC2", "EPB41L5", "FFAR3", "PLEKHB1", "LOC100507387", "RPL36", "PRC1", "C19orf60", "PRSS23", "CD160", "HOPX", "SAMD10", "RPSAP9", "CDCA2", "SWT1", "NPEPL1", "RORC", "MS4A3", "BCL2L15", "CXCR5", "EPHX2", "B3GNT9", "CDCA7L", "LMTK2", "UBAP1L", "CD79B", "OBSCN", "TMEM102", "ZNF540", "SPP1", "HABP4", "LOC100130872", "APBB1", "GPC2", "CD1C", "LONRF3", "NR2C2", "NSUN7", "SIGIRR", "SNORD38B", "FAM65C", "HELZ2", "ARL4D", "ENO3", "RPL9", "IFT81", "LSMEM2", "SCARNA2", "CD5", "AHNAK2", "RPL27A", "SOX8", "TMEM161A", "ZNF81", "PXK", "LOC441081", "RPS26", "CCNB1", "PLXDC1", "NFKBIA", "PKMYT1", "SLC27A5", "LRRC4", "TARP", "MAP2K6", "EIF1B-AS1", "SCARNA17", "NUDT8", "EAF1", "TBC1D8", "SESTD1", "CLEC12B", "ZNF776", "PDCD1", "FBXL16", "SLFN14", "ATP8B3", "ZNF396", "STOM", "CCDC30", "FAM86FP", "FAM129C", "TMEM42", "ZNF607", "IL11RA", "ECHDC2", "CCNE2", "PUS7L", "EFCAB12", "TGFA", "MAPK14", "NSG1", "C9orf142", "KLHL15", "LOC102723766", "VEGFB", "TJP3", "YAE1D1", "PROS1", "CSGALNACT2", "HLA-DOA", "TUBBP5", "RPL37A", "FAM195A", "LTB4R2", "NCAPH", "EVL", "CR1", "LTC4S", "ANKS3", "RPL35", "GLIS2", "TRAPPC6A", "LIPN", "RPS12", "RPL38", "LIME1", "PHGDH", "C4orf48", "SLED1", "BACH2", "EVA1C", "GATA3", "SLC25A38", "USP32", "RPL39", "EGR2", "USP35", "ZNF662", "PLEKHG1", "CRTAM", "AANAT", "OPLAH", "FBXO6", "LOXL1", "LOC101928786", "FRMD4B", "IL1R1", "ADM2", "DNASE1L1", "STAT4", "RPLP1", "TTC9", "BEND7", "F8", "FOSB", "SLAMF6", "STK17B", "PVT1", "ARVCF", "APOBEC3C", "CDK14", "RPS14", "POLQ", "CDC42BPG", "CCDC85B", "UBR5-AS1", "TNNT3", "TCL1A", "IL7R", "PTX3", "KLHL14", "MTRNR2L1", "ZBTB46", "RPL34", "VPS9D1-AS1", "KLF7", "RPL10A", "SLC2A11", "ACTN4", "RPSA", "SGK223", "PLSCR1", "SBF2", "LINC01420", "CDCA5", "F5", "ADGRB2", "RPL23A", "MAN1A2", "LLGL2", "GINS2", "NUDT14", "TG", "RSRC1", "TMEM256", "SYTL2", "TYSND1", "TPGS1", "ABHD14B", "LSM7", "ZNF281", "CEMP1", "M1AP", "CD82", "LOC101928150", "MRPL41", "IFIH1", "CCNB2", "IFITM10", "RPS3A", "TMEM8B", "CEP135", "SIRPB2", "IRS2", "SYCE1L", "SULT1B1", "ADAM9", "ZCCHC18", "DDIAS", "LINC01278", "MTHFD2", "TIGIT", "FUOM", "FOSL1", "RPS16", "BRAF", "CACNA1I", "EXT1", "GPR82", "LRRC45", "STAG1", "KIF11", "NCAM1", "RNF24", "KIAA2018", "SAMD9", "ABCA2", "PVRIG", "RASGRF2", "IRAK2", "ARHGEF4", "ZNF682", "RASSF7", "THAP7", "KIF1B", "GAS5", "KIF15", "RMRP", "GALM", "LYNX1", "C7orf50", "ATXN1L", "KCNN4", "LDLRAP1", "RETSAT", "RNF10", "NOXA1", "UBXN7", "UHRF1BP1L", "TESPA1", "MB21D1", "LINC00116", "FOXM1", "CCR6", "NMUR1", "RPL22", "LCK", "MEGF9", "CLSPN", "MIR4697HG", "PPP1R15A", "ADCK5", "SAMD3", "RPS27A", "TPPP", "PIWIL4", "ANKRD33B", "IL18RAP", "SNRPD2", "WASF1", "ALDH5A1", "FAM149B1", "RPS18", "SLCO4C1", "MGC16275", "ZNF837", "SAMHD1", "NSMF", "PDIA6", "SNX20", "ATF7IP2", "BPHL", "RPL35A", "NFKBIZ", "LMF1", "MGMT", "PPL", "FAM120C", "APRT", "RANGRF", "PHLDB3", "IDNK", "RPS15", "LGR6", "MOB1A", "CREB5", "THEM4", "ENDOG", "RPL7", "EDN1", "ZNF808", "SNHG15", "RPL3", "SMCR8", "FCMR", "ZNF449", "ZBTB37", "NHLRC4", "DHRS9", "C1QTNF3", "MS4A1", "AFF2", "FBXO32", "SLC6A6", "ZSCAN30", "CHPT1", "PARP14", "MZT2B", "FLT3LG", "TCTN2", "MAN1A1", "LOC728175", "ALOX5", "DUSP18", "ZC3H12D", "PECR", "LRRC37A3", "EPB41L4A-AS1", "SPAG5", "KDM7A", "DDX31", "NCOA6", "SLC6A12", "FCER1A", "FBXL13", "AKR7A2", "PRR5", "PRKCE", "PPP1R3D", "SYNM", "GLI1", "NTNG2", "EEF1G", "LOC643802", "MAGEE1", "PIK3AP1", "RPS8", "SNHG19", "C19orf70", "GAS7", "PHF12", "ATP11A", "SLC22A23", "ATP6V0E2", "UCKL1-AS1", "TIPARP", "BEX2", "FAM102A", "RPS20", "TTC16", "PLP2", "HAPLN3", "C19orf24", "ZNF397", "FOXO4", "TRAPPC2B", "QTRT1", "CDK10", "PDZD8", "GIPC1", "CD274", "C9orf139", "SPIB", "KLHDC7B", "ZNF32", "ELMO2", "AK4", "SAMD4B", "MFSD3", "EXD3", "SLC37A3", "RPL8", "TRIM56", "NMB", "PIPOX", "PXN-AS1", "SLC2A5")
```
***Please Note**: Over-representation services return large responses. To reduce the size of this vignette article, we will only display the first 10 rows of each Data Frame.*
## Enrichr {#enrichr .heading2}
[Enrichr](https://maayanlab.cloud/Enrichr/ "Enrichr") , developed in the Ma'ayan Lab, is a service to perform enrichment analysis against a considerable number of curated gene set libraries across various species.
We have thoroughly covered performing enrichment analysis with Enrichr in the vignette article [Enrichr & rbioap](rbioapi_enrichr.html "2.A: Enrichr & rbioapi")[i](https://rbioapi.moosa-r.com/articles/rbioapi_enrichr.html "2.A: Enrichr & rbioapi"). Therefore, we will only provide a brief demonstration here.
First, let us retrieve a list of available libraries in Enrichr using the `rba_enrichr_libs()` function. You can find the available Enrichr libraries in the `libraryName` column in the returned data frame.
```{r enrichr_libs, echo=TRUE, message=TRUE}
enrichr_libs <- rba_enrichr_libs()
```
```{r enrichr_libs_df, echo=FALSE}
if (is.data.frame(enrichr_libs)) {
DT::datatable(data = enrichr_libs,
autoHideNavigation = TRUE,
options = list(scrollX = TRUE,
paging = TRUE,
fixedHeader = TRUE,
keys = TRUE,
pageLength = 10))
} else {
print("Vignette building failed. It is probably because the web service was down during the building.")
}
```
While the only required argument of `rba_enrichr()` function is `gene_list`, here we use `gene_set_library` argument to request an enrichment analysis only against "KEGG_2021_Human" library.
```{r enrichr, echo=TRUE, message=TRUE}
enrichr_enrich <- rba_enrichr(gene_list = covid_critical,
gene_set_library = "KEGG_2021_Human")
```
```{r enrichr_df, echo=FALSE}
if (is.data.frame(enrichr_enrich) && hasName(enrichr_enrich, "Adjusted.P.value")) {
enrichr_enrich <- enrichr_enrich[order(enrichr_enrich$Adjusted.P.value),]
enrichr_enrich <- enrichr_enrich[1:min(10,
nrow(enrichr_enrich)),]
DT::datatable(data = enrichr_enrich,
autoHideNavigation = TRUE,
options = list(scrollX = TRUE,
paging = TRUE,
fixedHeader = TRUE,
keys = TRUE,
pageLength = 10
))
} else {
print("Vignette building failed. It is probably because the web service was down during the building.")
}
```
In the `gene_set_library` parameter, you can also provide multiple gene set libraries or leave it as the default parameter `NULL` to perform the enrichment analysis using every gene set library available in Enrichr. Also note that when `regex_library_name` parameter is TRUE (the default value), `gene_set_library` parameter will be considered as a regex pattern, so partial matchs will be also selected.
```{r enrichr_multi_libs, echo=TRUE, message=TRUE}
enrichr_enrich_kegg <- rba_enrichr(gene_list = covid_critical,
gene_set_library = "kegg",
regex_library_name = TRUE # default value
)
```
As you can see below, when than one library is selected, the results will be a list where each of its elements is the enrichment results of one library:
```{r enrichr_multi_libs_str, echo=TRUE, message=TRUE}
str(enrichr_enrich_kegg, max.level = 1)
```
## Reactome Analysis services {#reactome-analysis-services .heading2}
Reactome curates extensive and top-quality cellular pathways data across various species. Given the fact that proteins operate in organized units, which are the pathways, analyzing your list against Reactome pathways gene sets can provide valuable insights into the functional trajectories of your results.
You can refer to [Reactome & rbioapi](rbioapi_enrichr.html "2.E: Reactome & rbioapi") article for in-depth coverage of Reactome services. Here we will only demonstrate the over-representation functionality
The function `rba_reactome_analysis()` accepts gene lists or a table as input. If a table is supplied, the first column should consist of identifiers, and the next column should be numerical gene expression values. Refer to `rba_reactome_analysis()` function's manual for details. Here, we will use our gene list of critical COVID:
```{r reactome, echo=TRUE, message=TRUE}
reactome <- rba_reactome_analysis(input = covid_critical)
```
Let us examine the response's structure:
```{r reactome_str, echo=TRUE, message=TRUE}
str(reactome, max.level = 1)
```
The list's element names are self-explanatory. The results are returned as a data frame in the `pathways` element. Note that Reactome will map your provided gene identifiers and some identifiers may not be found. Hence, keep an eye on the `identifiersNotFound` element.
```{r reactome_df, echo=FALSE}
if (utils::hasName(reactome, "pathways") && is.data.frame(reactome$pathways)) {
reactome$pathways <- reactome$pathways[1:min(10,
nrow(reactome$pathways)),]
DT::datatable(data = reactome$pathways,
autoHideNavigation = TRUE,
options = list(scrollX = TRUE,
paging = TRUE,
fixedHeader = TRUE,
keys = TRUE,
pageLength = 10))
} else {
print("Vignette building failed. It is probably because the web service was down during the building.")
}
```
Naturally, you can alter the analysis parameters using the `rba_reactome_analysis()` function's arguments. The used parameters are returned in the "summary" element.
```{r reactome_summary, echo=TRUE, message=TRUE, eval=exists("reactome")&&hasName(reactome, "summary")}
str(reactome$summary)
```
In the above code chunk, note the "token" associated with your analysis. You can use this token to perform the following.
Retrieve the analysis results:
```{r reactome_token, echo=TRUE, message=TRUE, eval=FALSE}
reactome_2 <- rba_reactome_analysis_token(reactome$summary$token)
```
In addition to the main response, you can download other data associated with your request. see the argument `request` in the `rba_reactome_analysis_download()` function's manual for more information. Here, we will download a CSV file with the input identifiers which has not been found:
```{r reactome_download, echo=TRUE, message=TRUE, eval=FALSE}
rba_reactome_analysis_download(token = reactome$summary$token,
request = "not_found_ids",
save_to = "my_analysis.csv")
```
Reactome also generates a nice report of your analysis in pdf format, make sure to try it:
```{r reactome_pdf, echo=TRUE, message=TRUE, eval=FALSE}
rba_reactome_analysis_pdf(token = reactome$summary$token,
species = 9606, #Homo sapiens
save_to = "my_analysis.pdf")
```
## PANTHER {#panther .heading2}
PANTHER (protein analysis through evolutionary relationships) is a project that provides classification systems of genes and proteins. PANTHER also provides an enrichment service. In fact, the enrichment tool available on the [Gene Ontology (GO) website](https://geneontology.org/ "Gene Ontology Resource - Unifying Biology") is powered by [PANTHER](https://pantherdb.org/). Just like the previous sections, we will only review the enrichment functionality here; for an in-depth review of PANTHER refer to the vignette article "[PANTHER & rbioapi](rbioapi_panther.html "2.D: PANTHER & rbioapi")".
To perform the analysis, first, we need to choose the classifications (i.e., gene sets) that we want to compare our results against it. To retrieve a list of available gene sets, we call the following function:
```{r panther_sets, echo=TRUE, message=TRUE}
panther_sets <- rba_panther_info(what = "datasets")
```
```{r panther_sets_df, echo=FALSE}
if (is.data.frame(panther_sets)) {
panther_sets <- panther_sets[,!colnames(panther_sets) %in% c("description")]
DT::datatable(data = panther_sets,
autoHideNavigation = TRUE,
options = list(scrollX = TRUE,
paging = TRUE,
fixedHeader = TRUE,
keys = TRUE,
pageLength = 10))
} else {
print("Vignette building failed. It is probably because the web service was down during the building.")
}
```
The Gene Ontology (GO) project is one of the pinnacles of scientists' collective effort in bioinformatics. The GO Consortium provides a comprehensive model of biological systems. In short, GO curates a thoroughly designed directed acyclic graph (DAG) of ontologies. You may think of it as a tree of terms, where as it branches out, the terms become more specific). Each protein may be annotated with one or more terms. The terms are organized in three domains: "Molecular Function," "Biological Process," and "Cellular Component". GO slim datasets refer to subsets which are a cut-down version of GO terms. If you are not familiar with GO, I strongly encourage you to see this page and follow the links it provides: [About the GO resource](https://geneontology.org/docs/introduction-to-go-resource/ "About the GO resource").
Here, we demonstrate the enrichment analysis using the Biological Process domain. Note that you should enter the "id" of the datasets, not its label. For example, entering "biological_process" is incorrect, you should rather enter the following:.
```{r panther_enrich, echo=TRUE, message=TRUE}
panther_enrich <- rba_panther_enrich(genes = covid_critical,
organism = 9606, #Homo sapiens
annot_dataset = "GO:0008150" #Biological Process
)
```
In addition to the enrichment results, PANTHER returns other useful information about your analysis. The names are self-explanatory:
```{r panther_enrich_str, echo=TRUE, message=TRUE}
str(panther_enrich, 2)
```
The enrichment results are returned as a Data Frame with an element named result:
```{r panther_enrich_df, echo=FALSE}
if (utils::hasName(panther_enrich, "result") && is.data.frame(panther_enrich$result)) {
panther_enrich$result <- panther_enrich$result[1:min(10,
nrow(panther_enrich$result)),]
DT::datatable(data = panther_enrich$result,
autoHideNavigation = TRUE,
options = list(scrollX = TRUE,
paging = TRUE,
fixedHeader = TRUE,
keys = TRUE,
pageLength = 10))
} else {
print("Vignette building failed. It is probably because the web service was down during the building.")
}
```
## STRING {#string .heading2}
In addition to proteins interaction data, STRING also curates proteins/genes annotations and provides enrichment analysis services. You can perform the analysis against multiple gene sets. Some of the gene sets can be accessed using other services but two gene sets are exclusive to STRING. See "[D. Szklarczyk et al.: Nucleic Acids Res. 49, D1 (2021)](https://doi.org/10.1093/nar/gkaa1074 "The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets")" for further information. Again. We will only review the enrichment functionality here; for an in-depth review of STRING refer to the vignette article "[STRING & rbioapi](rbioapi_string.html "2.F: STRING & rbioapi")".
- Gene sets exclusively in STRING:
1. Reference Publications ([PubMed](https://pubmed.ncbi.nlm.nih.gov/))
2. [Hierarchical clustering of the STRING network](https://string-db.org/help/faq/#what-are-local-string-network-clusters "What are local STRING network clusters?")
- Gene sets also available in other services:
1. [Gene Ontology](https://geneontology.org/ "Gene Ontology Resource - Unifying Biology") domains: Molecular Function, Biological Process, and Cellular Component
2. [COMPARTMENTS](https://academic.oup.com/database/article/doi/10.1093/database/bau012/2633793 "Subcellular localization database")
3. [Human Disease Ontology](https://disease-ontology.org/ "Disease Ontology")
4. [InterPro](https://www.ebi.ac.uk/interpro/ "Classification of protein families")
5. [KEG1G pathways](https://www.genome.jp/kegg/pathway.html "Wiring diagrams of molecular interactions, reactions and relations")
6. [UniProt keyword](https://www.uniprot.org/keywords/ "UniProt keyword: a controlled vocabulary with a hierarchical structure")
7. [Pfam](https://www.ebi.ac.uk/interpro/ "InterPro: Classification of protein families")
8. [Reactome pathways](https://reactome.org/PathwayBrowser/ "free, open-source, curated and peer-reviewed pathway database")
9. [BRENDA Tissue and Enzyme Source Ontology](https://bioportal.bioontology.org/ontologies/BTO/)
10. [SMART](http://smart.embl-heidelberg.de/ "Genetically mobile domains and the analysis of domain architectures")
We can directly supply our `covid_critical` variable to STRING's enrichment function:
```{r string_enrich, echo=TRUE, message=TRUE}
string_enrich <- rba_string_enrichment(ids = covid_critical,
species = 9606 #Homo sapiens
)
```
As you can see, by default a list is returned where each element is a data Frame with enrichment analysis results against one of the gene sets listed above:
```{r string_str, echo=TRUE, message=TRUE}
str(string_enrich, max.level = 1)
```
In the "PMID" element, we see which PubMed papers are over-represented with our gene list. To keep the table tidy, we are not displaying the `inputGenes` and `preferredNames` columns.
```{r string_df, echo=FALSE}
if (utils::hasName(string_enrich, "PMID") && is.data.frame(string_enrich$PMID)) {
string_enrich$PMID <- string_enrich$PMID[, !colnames(string_enrich$PMID) %in% c("inputGenes", "preferredNames")]
string_enrich$PMID <- string_enrich$PMID[1:min(10,
nrow(string_enrich$PMID)),]
DT::datatable(data = string_enrich$PMID,
autoHideNavigation = TRUE,
options = list(scrollX = TRUE,
paging = TRUE,
fixedHeader = TRUE,
keys = TRUE,
pageLength = 10))
} else {
print("Vignette building failed. It is probably because the web service was down during the building.")
}
```
You can retrieve the annotations associated with your input protein(s). The difference here is that no statistical analysis will be performed and the full annotations of your input gene will be returned. Because of potentially unwieldy response, `allow_pubmed` is set fo FALSE by default. Here we set it to TRUE to also retrieve the PubMed paper abstracts that include our gene.
```{r string_annot, echo=TRUE, message=TRUE}
string_annot <- rba_string_annotations(ids = "CD177",
species = 9606, #Homo sapiens
allow_pubmed = TRUE
)
```
As you can see, Our input gene has a large number of annotations. Let us take a look at the PMID Data Frame, which contains the PubMed Abstract paper terms:
```{r string_annot_str, echo=TRUE, message=TRUE}
str(string_annot, max.level = 1)
```
```{r string_annot_df, echo=FALSE}
if (utils::hasName(string_annot, "PMID") && is.data.frame(string_annot$PMID)) {
string_annot$PMID <- string_annot$PMID[1:min(10,
nrow(string_annot$PMID)),]
DT::datatable(data = string_annot$PMID,
autoHideNavigation = TRUE,
options = list(scrollX = TRUE,
paging = TRUE,
fixedHeader = TRUE,
keys = TRUE,
pageLength = 10))
} else {
print("Vignette building failed. It is probably because the web service was down during the building.")
}
```
## miEAA {#mieaa .heading2}
The miRNA Enrichment Analysis and Annotation Tool ([miEAA](https://ccb-compute2.cs.uni-saarland.de/mieaa2/ "https://ccb-compute2.cs.uni-saarland.de/mieaa2")) is a service provided by the [Chair for Clinical Bioinformatics at Saarland University](https://www.ccb.uni-saarland.de/). What makes miEAA unique among other services presented here is that miEAA curates miRNA sets. Hence you can directly perform miRNA enrichment analysis across various species with the services miEAA provides.
We have in-depth covered performing enrichment analysis with miEAA in the vignette article "[miEEA & rbioapi](rbioapi_mieaa.html "2.C: miEEA & rbioapi")[i](https://rbioapi.moosa-r.com/articles/rbioapi_enrichr.html "2.A: Enrichr & rbioapi")". Therefore, we will only provide a brief demonstration here.
We can not use our `covid_critical` gene list here. We rather use a miRNA list from the paper "[A. Parray et al.: Vaccines 9, 1056 (2021)](https://doi.org/10.3390/vaccines9101056 "SnoRNAs and miRNAs Networks Underlying COVID-19 Disease Severity")". They used microarray to find differentially expressed miRNA in the severe versus mild COVID 19 patients' blood.
```{r input_mirs, echo=TRUE, message=TRUE}
covid_mirna <- c("hsa-miR-3609", "hsa-miR-199a-5p", "hsa-miR-139-5p", "hsa-miR-145-5p","hsa-miR-3651", "hsa-miR-1273h-3p", "hsa-miR-4632-5p","hsa-miR-6861-5p", "hsa-miR-6802-5p","hsa-miR-5196-5p","hsa-miR-92b-5p", "hsa-miR-6805-5p","hsa-miR-98-5p","hsa-miR-3185", "hsa-miR-572","hsa-miR-371b-5p","hsa-miR-3180", "hsa-miR-8073","hsa-miR-4750-5p","hsa-miR-6075", "hsa-let-7i-5p","hsa-miR-1231","hsa-miR-885-3p")
```
While not necessary, we can first see what enrichment categories are available for our mature human miRNA:
```{r mieaa_libs, echo=TRUE, message=TRUE}
rba_mieaa_cats(mirna_type = "mature",
species = 9606 #Homo sapiens
)
```
Next, we will use the wrapper function available to perform the necessary steps in one function call. Note that we have supplied the categories parameter. If left NULL, the enrichment will be performed against every available category.
```{r mieaa, echo=TRUE, message=TRUE}
mieaa_enrich <- rba_mieaa_enrich(test_set = covid_mirna,
mirna_type = "mature",
test_type = "ORA",
species = 9606,
categories = "miRPathDB_GO_Biological_process_mature")
```
```{r mieaa_df, echo=FALSE}
if (is.data.frame(mieaa_enrich)) {
mieaa_enrich <- mieaa_enrich[1:10,]
DT::datatable(data = mieaa_enrich,
autoHideNavigation = TRUE,
options = list(scrollX = TRUE,
paging = TRUE,
fixedHeader = TRUE,
keys = TRUE,
pageLength = 10
))
} else {
print("Vignette building failed. It is probably because the web service was down during the building.")
}
```
------------------------------------------------------------------------
# How to cite? {#how-to-cite .heading2}
rbioapi is an interface between you and other databases and services. Thus, if you have used rbioapi in published research, kindly in addition to citing rbioapi, make sure to fully and properly cite the databases/services you have used. Suggested citations have been added in the functions' manuals, under the "references" section; Nevertheless, it is the user's responsibility to check for proper citations and to properly cite the database/services that they have used.
Please see the [rbioapi's main vignette article](rbioapi.html#how-to-cite) for more details.
------------------------------------------------------------------------
# Links {#links .heading2}
- [This article in rbioapi documentation site](https://rbioapi.moosa-r.com/articles/rbioapi_do_enrich.html "Over-Representation Analysis with rbioapi")
- [Functions references in rbioapi documentation site](https://rbioapi.moosa-r.com/reference/index.html "rbioapi reference")
- [rbioapi vignette index](rbioapi.html "rbioapi: User-Friendly R Interface to Biologic Web Services' API")
------------------------------------------------------------------------
# Session info {#session-info .heading2}
```{r sessionInfo, echo=FALSE}
sessionInfo()
```