---
title: "OmniPath Bioconductor workshop"
author:
  - name: Denes Turei
    email: turei.denes@gmail.com
    correspondence: true
  - name: Alberto Valdeolivas
  - name: Attila Gabor
  - name: Julio Saez-Rodriguez
    affiliation: Institute for Computational Biomedicine, Heidelberg University
package: OmnipathR
output:
  BiocStyle::html_document:
    number_sections: yes
    toc: yes
    toc_depth: 4
    pandoc_args:
      - '--lua-filter=scholarly-metadata.lua'
      - '--lua-filter=author-info-blocks.lua'
  pdf_document:
    number_sections: yes
    toc: yes
    toc_depth: 4
    pandoc_args:
      - '--lua-filter=scholarly-metadata.lua'
      - '--lua-filter=author-info-blocks.lua'
abstract: |
  OmniPath is a database of molecular signaling knowledge, combining data
  from more than 100 resources. It contains protein-protein and gene
  regulatory interactions, enzyme-PTM relationships, protein complexes,
  annotations about protein function, structure, localization and
  intercellular communication. OmniPath focuses on networks with directed
  interactions and effect signs (activation or inhibition) which are suitable
  inputs for many modeling techniques. OmniPath also features a large
  collection of proteins’ intercellular communication roles and interactions.
  OmniPath is distributed by a web service at https://omnipathdb.org/. The
  Bioconductor package OmnipathR is an R client with full support for all
  features of the OmniPath web server. Apart from OmniPath, it provides
  direct access to more than 15 further signaling databases (such as BioPlex,
  InBioMap, EVEX, Harmonizome, etc.) and contains a number of convenience
  methods, such as igraph integration, and a close integration with the
  NicheNet pipeline for ligand activity prediction from transcriptomics data.
  In this demo we show the diverse data in OmniPath and the versatile and
  convenient ways to access this data by OmnipathR.
vignette: |
  %\VignetteIndexEntry{OmniPath Bioconductor workshop}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
fig_width: 9
fig_height: 7
---
```{r set-options, echo=FALSE, cache=FALSE}
options(width = 110)
```

# Introduction

Database knowledge is essential for omics data analysis and modeling. Despite
being an important factor contributing to the outcome of studies, it is often
subject to little attention. With [OmniPath](https://omnipathdb.org/) our aim
is to raise awareness of the diversity of available resources and facilitate
access to these resources in a uniform and transparent way. OmniPath has been
developed in close contact with mechanistic modeling applications and
functional omics analysis, hence it is especially suitable for these fields.
OmniPath has been used for the analysis of various omics data. In the
[Saez-Rodriguez group](https://saezlab.org) we often use it in a pipeline
with our footprint-based methods [DoRothEA](
https://saezlab.github.io/dorothea/) and [PROGENy](
https://saezlab.github.io/progeny/) and our causal reasoning method
[CARNIVAL](https://saezlab.github.io/CARNIVAL/) to infer signaling mechanisms
from transcriptomics data.

One recent novelty of OmniPath is a collection of intercellular communication
interactions. Apart from simply merging data from existing resources,
OmniPath defines a number of intercellular communication roles, such as
ligand, receptor, adhesion, enzyme, matrix, etc., and generalizes the terms
_ligand_ and _receptor_ by introducing the terms _transmitter_, _receiver_
and _mediator_. This unique knowledge base is especially well suited to the
emerging field of cell-cell communication analysis, typically from single-cell
transcriptomics, but also from other kinds of data.

# Overview

## Pre-requisites

No special pre-requisites apart from basic knowledge of R. OmniPath, the
database resource at the focus of this workshop, has been published in [1,2];
however, you don't need to know anything about OmniPath to benefit from the
workshop. In the workshop we will demonstrate the [R/Bioconductor package](
https://bioconductor.org/packages/release/bioc/html/OmnipathR.html)
[OmnipathR](https://github.com/saezlab/OmnipathR). If you would like to try
the examples yourself, we recommend installing the latest version of the
package before the workshop:

```{r installation, eval=FALSE}
library(devtools)
install_github('saezlab/OmnipathR')
```

## Participation

In the workshop we will present the design and some important features of the
OmniPath database, so you can be confident that you get the most out of it.
Then we will demonstrate further useful features of the OmnipathR package,
such as accessing other resources and building graphs. Participants are
encouraged to experiment with the examples and shape the contents of the
workshop by asking questions. We are also happy to receive questions and topic
suggestions **by email before the workshop**. These could help us to adjust
the contents to the interests of the participants.

## _R_ / _Bioconductor_ packages used

* OmnipathR
* igraph
* dplyr

## Time outline

Total: 45 minutes

| Activity                     | Time |
|------------------------------|------|
| OmniPath database overview   | 5m   |
| Network datasets             | 10m  |
| Other OmniPath databases     | 5m   |
| Intercellular communication  | 10m  |
| igraph integration           | 5m   |
| Further resources            | 10m  |

## Workshop goals and objectives

In this workshop you will get familiar with the design and features of the
OmniPath databases. For example, you will learn some important details about
the datasets and parameters which help you to query the database in the way
most suitable for your purposes. You will also learn about functionalities of
the _OmnipathR_ package which might make your work easier.

## Learning goals

* Learn about the OmniPath database, its contents and how it can be useful
* Get an overview of the OmnipathR package capabilities
* Learn about the datasets and parameters of various OmniPath query types

## Learning objectives

* Try examples of each OmniPath query type with various parameters
* Build igraph networks, search for paths
* Access some further interesting resources

# Workshop

```{r library}
library(OmnipathR)
```

## Data from OmniPath

OmniPath consists of five major databases, each combining many original
resources. The five databases are:

* Network (interactions)
* Enzyme-substrate relationships (enzsub)
* Protein complexes (complexes)
* Annotations (annotations)
* Intercellular communication roles (intercell)

The parameters for each database (query type) are available in the web
service, for example: https://omnipathdb.org/queries/interactions. The
R package supports all features of the web service, and the parameter
names and values usually correspond to the web service parameters which
you would use in an HTTP query string.

### Networks

The network database contains protein-protein, gene regulatory and miRNA-mRNA
interactions. Soon more interaction types will be added. Some of these
categories can be further divided into datasets which are defined by the
type of evidence. A full list of network datasets:

* Protein-protein interactions *(post_translational)*
    - **omnipath:** literature curated, directed interactions with effect
      signs; it corresponds to the first edition of OmniPath, and its
      confusing name is due to historical reasons
    - **pathwayextra:** directed and signed interactions without literature
      references (they might be literature curated, but references are not
      available)
    - **kinaseextra:** enzyme-PTM interactions without literature references
    - **ligrecextra:** ligand-receptor interactions without literature
      references
* Gene regulatory interactions *(transcriptional)*
    - **dorothea:** a comprehensive collection built out of 18 resources; it
      contains literature curated, ChIP-Seq, gene expression derived and TF
      binding site predicted data, with 5 confidence levels (A-E)
    - **tf_target:** additional literature curated interactions
* miRNA interactions *(post_transcriptional and mirna_transcriptional)*
    - **mirnatarget:** literature curated miRNA-mRNA interactions
    - **tf_mirna:** literature curated TF-miRNA interactions (transcriptional
      regulation of miRNAs)
* lncRNA interactions *(lncrna_post_transcriptional)*
    - **lncrna_mrna:** literature curated lncRNA-mRNA interactions
* Small molecule-protein interactions *(small_molecule_protein)*
    - **small_molecule:** metabolites, intrinsic ligands or drug compounds
      targeting human proteins

The functions accessing the above datasets are listed
[here](https://r.omnipathdb.org/reference/omnipath-interactions.html).
Resources, not individual interactions, are classified into the datasets
above, so the datasets can overlap. Each interaction type and dataset has
its dedicated function in `OmnipathR`; the link above leads to their help
pages. As an example, let's see the gene regulatory interactions:

```{r network}
gri <- transcriptional()
gri
```

The interaction data frame contains the UniProt IDs and Gene Symbols of
the interacting partners, the list of resources and references (PubMed IDs)
for each interaction, and whether the interaction is directed,
stimulatory or inhibitory.

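Because the result is an ordinary data frame, it can be filtered and summarized with base R. The sketch below uses a small made-up data frame instead of `gri`, so it runs offline. The column names `is_directed`, `is_stimulation`, `is_inhibition` and `sources` (resource names separated by semicolons) are our assumption about the current `transcriptional()` output; check `colnames(gri)` before adapting it.

```r
# Made-up rows imitating a few columns of an OmniPath interaction table;
# the column names are assumed, the values are illustrative only.
toy_net <- data.frame(
    source_genesymbol = c('STAT3', 'MYC', 'TP53'),
    target_genesymbol = c('MYC', 'TERT', 'MDM2'),
    is_directed = c(1L, 1L, 1L),
    is_stimulation = c(1L, 0L, 1L),
    is_inhibition = c(0L, 0L, 0L),
    sources = c('DoRothEA;TRRUST', 'DoRothEA', 'DoRothEA;TRRUST;SIGNOR'),
    stringsAsFactors = FALSE
)

# Keep only interactions with a known effect sign
toy_signed <- subset(toy_net, is_stimulation == 1L | is_inhibition == 1L)

# Count the resources behind each interaction by splitting on ';'
toy_signed$n_resources <- lengths(
    strsplit(toy_signed$sources, ';', fixed = TRUE)
)

toy_signed[, c('source_genesymbol', 'target_genesymbol', 'n_resources')]
```

Assuming these columns are present, the same two steps apply to `gri`; with dplyr, `dplyr::filter()` would be the idiomatic equivalent of `subset()`.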

#### igraph integration

The network data frames can be converted to igraph graph objects, so you
can make use of the graph and visualization methods of igraph:

```{r network-igraph}
gr_graph <- interaction_graph(gri)
gr_graph
```

On this network we can use the `find_all_paths` function of `OmnipathR`,
which is able to look up all paths up to a certain length between two sets
of nodes:

```{r paths}
paths <- find_all_paths(
    graph = gr_graph,
    start = c('EGFR', 'STAT3'),
    end = c('AKT1', 'ULK1'),
    attr = 'name'
)
```

*As this is a gene regulatory network, the paths consist of TFs regulating
the transcription of other TFs.*

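The idea behind a length-limited path search can be illustrated without `OmnipathR`. The base R sketch below is a plain depth-first enumeration of simple paths (no repeated nodes) from any node in `start` to any node in `end`, using at most `maxlen` edges, on a made-up adjacency list. The helper `toy_all_paths` is hypothetical and is not the implementation of `find_all_paths`; in particular, counting length in edges is our assumption.

```r
# Made-up directed graph as an adjacency list: node -> successors
toy_adj <- list(
    A = c('B', 'C'),
    B = 'D',
    C = c('D', 'E'),
    D = 'E',
    E = character(0)
)

# Depth-first enumeration of simple paths from any `start` node to any
# `end` node with at most `maxlen` edges
toy_all_paths <- function(adj, start, end, maxlen = 2) {
    found <- list()
    walk <- function(path) {
        last <- path[length(path)]
        if (length(path) > 1 && last %in% end) {
            found[[length(found) + 1]] <<- path
        }
        if (length(path) - 1 >= maxlen) return(invisible(NULL))
        for (nxt in adj[[last]]) {
            if (!nxt %in% path) walk(c(path, nxt))  # skip cycles
        }
    }
    for (s in start) walk(s)
    found
}

toy_paths <- toy_all_paths(toy_adj, start = c('A', 'B'), end = 'E', maxlen = 2)
vapply(toy_paths, paste, character(1), collapse = ' -> ')
```

Depth-first search only keeps the current path in memory, but the number of paths grows quickly with `maxlen` on dense networks, which is why a small length limit is useful.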

### Enzyme-substrate relationships

Enzyme-substrate interactions are also available in the interactions
query, but the enzyme-substrate query type provides additional information
about the PTM types and residues.

```{r enzsub}
enz_sub <- enzyme_substrate()
enz_sub
```

This data frame can also be converted to an igraph object:

```{r enzsub-igraph}
es_graph <- enzsub_graph(enz_sub)
es_graph
```

It is also possible to add effect signs (stimulatory or inhibitory) to
enzyme-PTM relationships:

```{r enzsub-signs}
es_signed <- signed_ptms(enz_sub)
```

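One way to attach signs to PTMs is to look up each enzyme-substrate pair in a signed interaction network. The sketch below does this with a base R left join on made-up data; whether `signed_ptms` works exactly this way is an assumption here, and the column names are simplified.

```r
# Made-up enzyme-substrate pairs (columns simplified for illustration)
toy_enz <- data.frame(
    enzyme = c('AKT1', 'MAPK1', 'CDK1'),
    substrate = c('GSK3B', 'ELK1', 'RB1'),
    modification = 'phosphorylation',
    stringsAsFactors = FALSE
)
# Made-up signed interactions
toy_signs <- data.frame(
    source = c('AKT1', 'MAPK1'),
    target = c('GSK3B', 'ELK1'),
    is_stimulation = c(0L, 1L),
    is_inhibition = c(1L, 0L),
    stringsAsFactors = FALSE
)

# Left join: all PTMs are kept; pairs absent from the network get NA signs
toy_ptm_signed <- merge(
    toy_enz, toy_signs,
    by.x = c('enzyme', 'substrate'),
    by.y = c('source', 'target'),
    all.x = TRUE
)
toy_ptm_signed
```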

### Protein complexes

```{r complexes}
cplx <- complexes()
cplx
```

The resulting data frame provides the constitution and stoichiometry of
protein complexes, with references.

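Each complex is one row of this data frame. Assuming that members are concatenated with `_` and stoichiometry values with `:` (our reading of the current output; verify with `head(cplx)`), a complex table can be expanded to one row per member with base R. The rows and the `toy_` names below are made up.

```r
# Made-up complexes: members joined by '_', stoichiometry by ':'
toy_cplx <- data.frame(
    name = c('toy_complex_1', 'toy_complex_2'),
    components_genesymbols = c('ITGA5_ITGB1', 'CDK4_CCND1_CDKN1B'),
    stoichiometry = c('1:1', '1:1:1'),
    stringsAsFactors = FALSE
)

toy_members <- strsplit(toy_cplx$components_genesymbols, '_', fixed = TRUE)
toy_counts <- strsplit(toy_cplx$stoichiometry, ':', fixed = TRUE)
# Sanity check: one stoichiometry value per member
stopifnot(identical(lengths(toy_members), lengths(toy_counts)))

# One row per (complex, member) pair
toy_cplx_long <- data.frame(
    complex = rep(toy_cplx$name, lengths(toy_members)),
    member = unlist(toy_members),
    n_copies = as.integer(unlist(toy_counts)),
    stringsAsFactors = FALSE
)
toy_cplx_long
```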

### Annotations

The annotations query type includes a diverse set of resources (about 60 of
them) about protein function, localization, structure and expression. For
most use cases it is better to convert the data into wide data frames, as
these correspond to the original format of the resources. If you load more
than one resource into wide data frames, the result will be a list of
data frames, otherwise a single data frame. See a few examples with
localization data from UniProt, tissue expression from Human Protein Atlas
and pathway information from SignaLink:

```{r uniprot-loc}
uniprot_loc <- annotations(
    resources = 'UniProt_location',
    wide = TRUE
)
uniprot_loc
```

The `entity_type` field can be protein, mirna or complex. Protein complexes
are mostly annotated based on the consensus of their members; be aware
that this is an *in silico* inference.

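To see what the conversion to wide data frames means, consider long-format records in which each row holds one label-value pair of an annotation record. The base R sketch below pivots made-up rows into one row per record with one column per label. The column names `record_id`, `label` and `value` are assumptions, and `annotations(wide = TRUE)` does the conversion for you; this only illustrates the reshaping.

```r
# Made-up long-format records: one row per (record, label) pair
toy_long <- data.frame(
    genesymbol = c('EGFR', 'EGFR', 'EGFR', 'EGFR'),
    record_id = c(1L, 1L, 2L, 2L),
    label = c('location', 'features', 'location', 'features'),
    value = c('Cell membrane', 'Single-pass', 'Nucleus', NA),
    stringsAsFactors = FALSE
)

# One row per record, one column per label
toy_wide <- reshape(
    toy_long,
    idvar = c('genesymbol', 'record_id'),
    timevar = 'label',
    direction = 'wide'
)
# reshape() names the new columns 'value.<label>'; drop the prefix
names(toy_wide) <- sub('^value\\.', '', names(toy_wide))
toy_wide
```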

In case of a spelling mistake in either parameter names or values,
`OmnipathR` either corrects the mistake or gives a warning or error:

```{r uniprot-loc-1}
uniprot_loc <- annotations(
    resources = 'Uniprot_location',
    wide = TRUE
)
```

Above, the name of the resource is misspelled (`Uniprot_location` instead of
`UniProt_location`). If the parameter name is wrong, an error is thrown:

```{r uniprot-loc-2, error = TRUE}
uniprot_loc <- annotations(
    resuorces = 'UniProt_location',
    wide = TRUE
)
```

Singular vs. plural forms and a few synonyms are automatically corrected:

```{r uniprot-loc-3}
uniprot_loc <- annotations(
    resource = 'UniProt_location',
    wide = TRUE
)
```

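We do not describe here how `OmnipathR` implements this correction. As a generic illustration only, a case-insensitive lookup against a list of known names is enough to map `Uniprot_location` to `UniProt_location`. The helper `toy_fix_resource_name` and the list of names are hypothetical.

```r
# Hypothetical subset of valid resource names
toy_valid <- c('UniProt_location', 'HPA_tissue', 'SignaLink_pathway')

# Return the canonical spelling by case-insensitive matching;
# warn and return NA when there is no match
toy_fix_resource_name <- function(x, valid) {
    hit <- match(tolower(x), tolower(valid))
    if (is.na(hit)) {
        warning(sprintf('Unknown resource: %s', x))
        return(NA_character_)
    }
    valid[hit]
}

toy_fix_resource_name('Uniprot_location', toy_valid)
```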

Another example with tissue expression from Human Protein Atlas:

```{r hpa-tissue}
hpa_tissue <- annotations(
    resources = 'HPA_tissue',
    wide = TRUE,
    # Limiting to a handful of proteins for a faster vignette build:
    proteins = c('DLL1', 'MEIS2', 'PHOX2A', 'BACH1', 'KLF11', 'FOXO3', 'MEFV')
)
hpa_tissue
```

And pathway annotations from SignaLink:

```{r slk-pathway}
slk_pathw <- annotations(
    resources = 'SignaLink_pathway',
    wide = TRUE
)
slk_pathw
```

#### Combining networks with annotations

Annotations can be easily added to network data frames; in this case both
the source and target nodes will have their annotation data. The
`annotated_network` function accepts either the name of an annotation
resource or an annotation data frame:

```{r annotate-network}
network <- omnipath()
network_slk_pw <- annotated_network(network, 'SignaLink_pathway')
network_slk_pw
```

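Conceptually, annotating a network means joining the annotation table twice, once on the source and once on the target node. The base R sketch below shows this on made-up data; the `_source`/`_target` column suffixes are our choice and may differ from what `annotated_network` produces.

```r
# Made-up network edges and one annotation column per protein
toy_edges <- data.frame(
    source_genesymbol = c('EGFR', 'GRB2'),
    target_genesymbol = c('GRB2', 'SOS1'),
    stringsAsFactors = FALSE
)
toy_annot <- data.frame(
    genesymbol = c('EGFR', 'GRB2', 'SOS1'),
    pathway = c('RTK', 'RTK', 'MAPK'),
    stringsAsFactors = FALSE
)

# Rename the annotation columns for each side of the edge
toy_src <- setNames(toy_annot, c('source_genesymbol', 'pathway_source'))
toy_tgt <- setNames(toy_annot, c('target_genesymbol', 'pathway_target'))

# Left joins keep every edge, even if a node has no annotation (NA)
toy_annotated <- merge(toy_edges, toy_src, all.x = TRUE)
toy_annotated <- merge(toy_annotated, toy_tgt, all.x = TRUE)
toy_annotated
```

If a protein has several annotation records (for example several pathways), the joins multiply the edge rows accordingly, so an annotated network can be much larger than the original one.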

### Intercellular communication roles

The `intercell` database assigns roles to proteins such as ligand, receptor,
adhesion, transporter, ECM, etc. The design of this database is far from
simple; for details, it is best to check the description in our recent
OmniPath paper [1].

```{r intercell}
ic <- intercell()
ic
```

This data frame is about individual proteins. To create a network of
intercellular communication, we provide the `intercell_network`
function:

```{r intercell-network}
icn <- intercell_network(high_confidence = TRUE)
icn
```

The result is similar to the output of `annotated_network`: each
interacting partner has its intercell annotations. In the `intercell`
database, OmniPath aims to ship all available information, which means it
might contain a considerable number of false positives. The
`high_confidence` option is a shortcut to stringent filter settings based on
the number and consensus of provenances. Using the `filter_intercell_network`
function instead, you have fine control over the quality filters. It has
many options which are described in the manual.

```{r intercell-filter}
icn <- intercell_network()
icn_hc <- filter_intercell_network(
    icn,
    ligand_receptor = TRUE,
    consensus_percentile = 30,
    loc_consensus_percentile = 50,
    simplify = TRUE
)
```

The `filter_intercell` function does a similar procedure on an intercell
annotation data frame.

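The core idea of an intercellular communication network, keeping only interactions that point from a protein with a transmitter role to a protein with a receiver role, can be sketched in base R. The data and the logical `transmitter`/`receiver` columns below are made up for illustration; the real functions combine more information (such as the quality filters described above), which this sketch ignores.

```r
# Made-up interactions
toy_ppi <- data.frame(
    source = c('DLL1', 'NOTCH1', 'EGF'),
    target = c('NOTCH1', 'HES1', 'EGFR'),
    stringsAsFactors = FALSE
)
# Made-up roles; these logical columns are a simplification
toy_roles <- data.frame(
    genesymbol = c('DLL1', 'EGF', 'NOTCH1', 'EGFR'),
    transmitter = c(TRUE, TRUE, FALSE, FALSE),
    receiver = c(FALSE, FALSE, TRUE, TRUE),
    stringsAsFactors = FALSE
)

toy_transmitters <- toy_roles$genesymbol[toy_roles$transmitter]
toy_receivers <- toy_roles$genesymbol[toy_roles$receiver]

# Keep edges pointing from a transmitter to a receiver
toy_icn <- toy_ppi[
    toy_ppi$source %in% toy_transmitters &
    toy_ppi$target %in% toy_receivers,
]
toy_icn
```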

### Metadata

The list of available resources for each query type can be retrieved
by the `..._resources` functions. For example, the annotation resources
are:

```{r annot-res}
annotation_resources()
```

Categories in the `intercell` query can also be listed:

```{r intercell-cat}
intercell_generic_categories()
# intercell_categories() # this would also show the specific categories
```

## Data from other resources

An increasing number of other resources (currently around 20) can be directly
accessed by `OmnipathR` (not from the omnipathdb.org domain, but from their
original providers). As an example,

## General purpose functionalities

### Identifier translation

`OmnipathR` uses UniProt data to translate identifiers. You may find a list
of the available identifiers in the manual page of the `translate_ids`
function. The evaluation of the parameters is tidyverse style, and both
UniProt's notation and a simple internal notation can be used. Furthermore,
it can handle vectors, data frames or lists of vectors.

```{r id-translate-vector}
d <- data.frame(uniprot_id = c('P00533', 'Q9ULV1', 'P43897', 'Q9Y2P5'))
d <- translate_ids(
    d,
    uniprot_id = uniprot, # the source ID type and column name
    genesymbol # the target ID type using OmniPath's notation
)
d
```

A single call accepts only one source ID type and column, but it can have
multiple target ID types and columns; hence, translating both the source and
the target columns of a network requires two calls. *Note: some of this
functionality currently fails due to recent changes in other packages; it
will be fixed in a few days.*

```{r id-translate-df, eval = FALSE}
network <- omnipath()
# First call: translate the source column
network <- translate_ids(
    network,
    source = uniprot_id,
    source_entrez = entrez
)
# Second call: translate the target column
network <- translate_ids(
    network,
    target = uniprot_id,
    target_entrez = entrez
)
```

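At its core, ID translation is a lookup in a mapping table. The base R sketch below uses a three-row UniProt to gene symbol table and `match()`; it is not how `translate_ids` is implemented, and unlike real mapping data it ignores one-to-many relationships, since `match()` returns only the first hit. As in the example above, the source and target columns are translated separately. The `toy_` names are hypothetical.

```r
# Tiny UniProt -> gene symbol mapping (illustrative subset)
toy_map <- data.frame(
    uniprot = c('P00533', 'P31749', 'P40763'),
    genesymbol = c('EGFR', 'AKT1', 'STAT3'),
    stringsAsFactors = FALSE
)

# Look up IDs; unmapped IDs become NA, and match() keeps only the first hit
toy_to_symbol <- function(ids, map) map$genesymbol[match(ids, map$uniprot)]

# Made-up network; 'XXXXXX' is deliberately absent from the mapping
toy_edges_up <- data.frame(
    source = c('P00533', 'P00533'),
    target = c('P40763', 'XXXXXX'),
    stringsAsFactors = FALSE
)
# Source and target columns are translated one after the other
toy_edges_up$source_genesymbol <- toy_to_symbol(toy_edges_up$source, toy_map)
toy_edges_up$target_genesymbol <- toy_to_symbol(toy_edges_up$target, toy_map)
toy_edges_up
```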

### Gene Ontology

`OmnipathR` is able to look up ancestors and descendants in ontology trees,
and also exposes the ontology tree in three different formats: as a
data frame, as a list of lists or as an igraph graph object. All these
can have two directions: child-to-parent (`c2p`) or parent-to-child (`p2c`).

```{r go}
go <- go_ontology_download()
go$rel_tbl_c2p
```

To convert the relations to list or graph format, use the
`relations_table_to_list` or `relations_table_to_graph` functions. To
swap between `c2p` and `p2c`, use the `swap_relations` function.

```{r go-graph}
go_graph <- relations_table_to_graph(go$rel_tbl_c2p)
go_graph
```

It can also translate term IDs to term names:

```{r go-name}
ontology_ensure_name('GO:0000022')
```

*The first call takes a few seconds as it loads the database; subsequent
calls are faster.*

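To make the two directions concrete, here is a base R sketch on a made-up four-term ontology stored as a child-to-parent list: a breadth-first walk collects all ancestors of a term, and inverting the list gives the parent-to-child form. The term IDs and helper names are hypothetical; in `OmnipathR` use the functions mentioned above instead.

```r
# Made-up ontology: each term maps to its direct parents (c2p)
toy_c2p <- list(
    T4 = c('T2', 'T3'),
    T3 = 'T1',
    T2 = 'T1',
    T1 = character(0)
)

# Breadth-first walk collecting all ancestors of a term
toy_ancestors <- function(term, c2p) {
    seen <- character(0)
    queue <- c2p[[term]]
    while (length(queue) > 0) {
        nxt <- queue[1]
        queue <- queue[-1]
        if (!nxt %in% seen) {
            seen <- c(seen, nxt)
            queue <- c(queue, c2p[[nxt]])
        }
    }
    seen
}

# Invert child-to-parent into parent-to-child (p2c)
toy_swap <- function(rel) {
    pairs <- stack(rel)  # 'values' = parent, 'ind' = child
    split(as.character(pairs$ind), pairs$values)
}

toy_ancestors('T4', toy_c2p)
toy_swap(toy_c2p)
```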

## Useful tips

`OmnipathR` features a logging facility, a YAML configuration file and
a cache directory. By default, the highest level messages are printed to
the console, and you can browse the full log from R by calling
`omnipath_log()`. The cache can be controlled by a number of functions;
for example, you can search for cache files by `omnipath_cache_search()`
and delete them by `omnipath_cache_remove()`:

```{r cache}
omnipath_cache_search('dorothea')
```

The configuration can be set by `options`; all options are prefixed with
`omnipath.` and can be saved by `omnipath_save_config()`. For example, to
exclude all OmniPath resources which don't allow for-profit use:

```{r license, eval = FALSE}
options(omnipath.license = 'commercial')
```

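Since these are ordinary R options, base R's `options()` returns the previous values, which makes it easy to change a setting temporarily and restore it afterwards. A minimal sketch; it only touches the option and does not contact OmniPath, and the `toy_old` name is arbitrary:

```r
# options() invisibly returns the old values as a named list
toy_old <- options(omnipath.license = 'commercial')
getOption('omnipath.license')
# ... queries that should respect the license setting would go here ...
options(toy_old)  # restore; an option that was unset becomes unset again
```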

The internal state is contained in the `omnipathr.env` environment.

## Further information

Find more examples in the other vignettes and the manual. For example, the
NicheNet vignette presents the integration between `OmnipathR` and
`nichenetr`, a method for prediction of ligand-target gene connections.
Another Bioconductor package, `wppi`, is able to add context-specific scores
to networks, based on genes of interest, functional annotations and network
proximity (random walks with restart). The new `paths` vignette presents
some approaches to construct pathways from networks. The design of the
OmniPath database is described in our recent paper [1], while an in-depth
analysis of the pathway resources is available in the first OmniPath
paper [2].

# Session info {.unnumbered}

```{r sessionInfo, echo=FALSE}
sessionInfo()
```

# References {.unnumbered}

[1] D Turei, A Valdeolivas, L Gul, N Palacio-Escat, M Klein, O Ivanova,
M Olbei, A Gabor, F Theis, D Modos, T Korcsmaros and J Saez-Rodriguez (2021)
Integrated intra- and intercellular signaling knowledge for multicellular
omics analysis. _Molecular Systems Biology_ 17:e9923

[2] D Turei, T Korcsmaros and J Saez-Rodriguez (2016) OmniPath: guidelines and
gateway for literature-curated signaling pathway resources. _Nature Methods_
13(12)