/
box.rmd
224 lines (148 loc) · 10.9 KB
/
box.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
---
title: "Get started"
author: Konrad Rudolph
date: "`r Sys.Date()`"
output:
rmarkdown::html_vignette:
toc: true
md_document:
variant: gfm
vignette: >
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Get started}
%\VignetteEncoding{UTF-8}
---
```{r include = FALSE}
devtools::load_all(export_all = FALSE, helpers = FALSE, attach_testthat = FALSE)
detach('package:box') # No need to keep the package attached.
detach('devtools_shims')
# Hide (unused) module alias from subsequent `ls()` call output.
box::use(.sf = ./source_file[...])
```
## Using modules
For the purpose of this tutorial, we are going to use the example module `bio/seq`. The module implements some very basic mechanisms for dealing with DNA sequences (= character strings consisting of the letters `A`, `C`, `G` and `T`).
First, we load the module:
```{r}
box::use(./bio/seq)
```
The function `box::use` accepts a list of *unquoted*, *qualified* module names. Each of these module names will load a single module and make it available to the caller in some form. In the code above, we’ve loaded a single module, `bio/seq`. `bio` serves as a *parent module* that may group several submodules. Since the module name inside `box::use` starts with `./`, the module location is resolved *locally*, i.e. relative to the path of the currently running code.
In the above, `seq` is the module’s *proper name*. `bio/seq` is its *fully qualified name*. And `./bio/seq` is its *`use` declaration*.
To see the effect of this `use` declaration, let’s inspect our workspace:
```{r}
ls()
seq
```
We have used the module’s fully qualified name to load it. But, as shown by `ls`, loading the module this way only introduces a single new name into the current scope, the module itself, identified by its proper (non-qualified) name.
To see which names a module exports, we use `ls` again, this time on the module itself:
```{r}
ls(seq)
```
It appears that `seq` exports `r length(ls(seq))` different names. To access exported names, we use the `$` operator: <code>`r paste0('seq$', lsf.str(seq)[[1L]])`</code> allows us to use the first function in the list of exported names. We can also display the interactive help for individual names using the `box::help` function, e.g.:
```{r eval = FALSE}
box::help(seq$revcomp)
```
Now let’s actually *use* the module. The `seq` function inside the `bio/seq` module constructs a set of (optionally named) biological sequences:
```{r}
s = seq$seq(
gene1 = 'GATTACAGATCAGCTCAGCACCTAGCACTATCAGCAAC',
gene2 = 'CATAGCAACTGACATCACAGCG'
)
seq$is_valid(s)
s
```
Note how we automatically get pretty-printed ([FASTA][]) output because the `print` method (which gets called implicitly here) is specialised for the `'bio/seq'` S3 class in the `bio/seq` module (prefixing S3 classes inside modules with the full module name is a convention to avoid name clashes of S3 classes):
```{r}
getS3method('print', 'bio/seq')
```
The source code for `` `print.bio/seq` `` contains an interesting `use` declaration: it showcases an alternative way of invoking `box::use`, which we’ll explore now.
## Attaching modules
Let’s have a look at alternative ways of using modules.
To start, let’s unload the `bio/seq` module …
```{r}
box::unload(seq)
```
… and load it again, via a different route:
```{r}
options(box.path = getwd())
box::use(bio/seq[revcomp, is_valid])
```
After unloading the already loaded module, `options(box.path = …)` sets the module search path: this is where `box::use` searches for modules. If more than one path is given, `box::use` searches them all until a module of matching name is found. This works analogously to how `.libPaths` operates on R packages.
The `box::use` directive can now use `bio/seq` instead of `./bio/seq` as the module name: rather than a relative name we specify a *global* name. In this example we set the search path to the current working directory but in normal usage it would be a global library location, e.g. (following the [XDG base directory specification][XDG]) `~/.local/share/R/modules` on Linux.
[FASTA]: https://en.wikipedia.org/wiki/FASTA_format
[XDG]: https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html
**Note** that non-local module names *must* be fully qualified, nested modules: `box::use(foo/bar)` works, `box::use(bar)` does not (instead, it is assumed that `bar` refers to a *package*)!
In the declaration above we use `[revcomp, is_valid]` to specify that the names `revcomp` and `is_valid` from the `bio/seq` module should be attached in the calling environment. The `[…]` part is an *attach specification*: a comma-separated list of names inside the parentheses specifies which names to attach. The special symbol `...` can be used to specify that *all exported names* should be attached. This has an effect similar to conventional package loading via `library` (or `attach`ing an environment): all the attached names are now available for direct use without necessitating the `seq$` qualifier:
```{r}
is_valid(s)
revcomp(s)
```
However, unlike the `attach` function, module attachment happens in the *current, local scope* only.
Since the above code was executed in the global environment, there’s no distinction between local and global scope:
```{r}
search()
```
Note the second item, which reads “<code>`r search()[2L]`</code>”. But let’s now undo that, to attach (and use) the module locally instead:
```{r}
detach()
seq_table = function (s) {
box::use(./bio/seq[...])
table(s)
}
seq_table(s)
```
Unlike above, we are now attaching *all* exported names instead of specifying individual names. The subsequent line of code uses the `seq$table` function rather than `base::table` (which would have a different output). And note that the `seq` module’s `table` function is *not* attached outside the local scope:
```{r}
search()
table(s)
```
This is very powerful, as it isolates separate scopes more effectively than the `attach` function. What is more, modules which are used and attached inside another module *remain* inside that module and are not visible outside the module by default.
Nevertheless, the normal, recommended usage of a module is without an attach specification, as this makes it clearer which names are being referring to.
## Writing modules
The module `bio/seq`, which we have used in the previous section, is implemented in the file [`bio/seq.r`](bio/seq.r). The file `seq.r` is, by and large, a regular R source file, which happens to live in a directory named `bio`.
In fact, there are only three things worth mentioning:
1. Documentation: functions in the module file can be documented using ‘[roxygen2](https://cran.r-project.org/package=roxygen2)’ syntax. It works the same as for packages. The ‘box’ package parses the documentation and makes it available via `box::help`. *Displaying module help requires that ‘roxygen2’ is installed.*
2. Export declarations: similar to packages, modules explicitly need to declare which names they export; they do this using the annotation comment `#' @export` in front of the name assignment. Again, this works similarly to ‘roxygen2’ (but does *not* require having that package installed).
3. [S3 functions](https://adv-r.hadley.nz/s3.html): ‘box’ registers and exports such functions automatically as necessary, but this only works for *user generics* that are defined inside the same module. When overriding “known generics” (such as `print`), we need to register these manually via `register_S3_method` (this is necessary since these functions are inherently ambiguous and there is no automatic way of finding them).
## Nesting modules
Modules can also form nested hierarchies. In fact, here is the implementation of `bio` (in [`bio/__init__.r`](bio/__init__.r): since `bio` is a directory rather than a file, the module implementation resides in the nested file `__init__.r`):
```{r box_file = 'bio/__init__.r'}
```
The submodule is specified as `./seq` rather than `seq`: the explicitly provided relative path prevents lookup in the import search path (that we set via `options(box.path = …)`); instead, only the current directory (that is, the directory containing the `bio` module) is considered.
When applied to a `box::use` declaration, `@export` causes all names which are imported by that declaration to also be exported: any module name created by the declaration (here, `seq`) is exported as-is. Furthermore, any attached name is likewise exported. Refer to the `box::use` documentation and examples for more details on which names are exported.
Coming back to our example module, we can now use the `bio` module:
```{r}
options(box.path = NULL) # Reset search path
box::use(./bio)
ls(bio)
ls(bio$seq)
bio$seq$revcomp('CAT')
```
We could also have implemented `bio` as follows:
```{r eval = FALSE}
#' @export
box::use(./seq[...])
```
This would have made all of `seq`’s definitions immediately available in `bio`, without having to always write `seq$…`. This is sometimes useful, but should be employed with care: being explicit about namespaces generally increases code robustness and readability.
## Code execution on loading
Modules define functions and values. To execute code when a module is loaded, put it inside a function with the name `.on_load`. This function is similar to the hook for the `.onLoad` *package* namespace event.
This function is executed the first time the module is loaded in an R session. Subsequent calls to `box::use` for that module, regardless of whether they occur in a different scope, will refer to the already loaded, cached module, and will *not* reload the module.
We can illustrate this by loading a module which has side-effects, `info`.
```{r box_file = 'info.r'}
```
Let’s use it:
```{r}
box::use(./info)
```
We have imported the module, and get the diagnostic messages. Let’s re-use the module:
```{r}
box::use(./info)
```
… no messages are displayed. However, we *can* explicitly reload a module. This clears the cache, and loads the module again. This can be useful during development and debugging:
```{r}
box::reload(info)
```
And this displays the messages again. The `reload` function is a shortcut for `unload` followed by `import` (using the exact same arguments as used on the original `import` call).
### Module helper functions
This `info` module also show-cases two important helper functions:
1. `box::name` returns the name of the module with which it was loaded. This is especially handy because, when called outside of a module, `box::name` is `NULL`. This allows testing whether a piece of code was loaded as a module, or invoked directly (e.g. via `Rscript` on the command line).
2. `box::file` is similar to `system.file`: it returns the full path to any file within the directory where a module is stored. This is useful when distributing data files with modules, which are loaded from within the module. When invoked without arguments, `box::file` returns the full path to the directory containing the module source file.