-
Notifications
You must be signed in to change notification settings - Fork 14
/
import.Rmd
606 lines (472 loc) · 23 KB
/
import.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
---
title: "The import package"
output:
rmarkdown::html_vignette:
toc: true
vignette: >
%\VignetteIndexEntry{The import package}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
editor_options:
chunk_output_type: console
---
* This version: May, 2022
* Latest version at: https://import.rticulate.org/articles/import.html
* A briefer intro at: https://import.rticulate.org/
# Introduction
One of the most important aspects of the R ecosystem is the ease with which
extensions and new features can be developed and distributed in the form of
*packages.* The main distribution channel is the *Comprehensive R
Archive Network*, from which packages can be installed directly from
R. Another popular option is using GitHub repositories from which packages can
also be painlessly installed, e.g. using `install_github` from the
`devtools` package.
The `import` package provides an alternative approach to using external
functionality in R scripts; first however, it is useful to describe the standard
approach to clarify how `import` may serve as improvement.
The most common way to include the functionality provided by a package is to use
the `library` function:
```{r, eval = FALSE}
library(PackageA)
library(PackageB)
value1 <- function_a(...) # Supposedly this comes from PackageA,
value2 <- function_b(...) # and this from PackageB, but who knows?!
...
```
In some situations this is fine; however there are some subtle shortcomings:
1. Packages are *attached* and *all* of their exported objects are exposed,
2. When using more packages this way, the *order* in which they are attached
can be important,
3. It quickly becomes *unclear* to the reader of a script which package
certain functionality comes from, and
4. the terms "library" and "package" are often used incorrectly
(although a minor point, it seems to confuse somewhat).
The problem with (1) is that the search path is populated with more
objects than are needed and it is not immediately clear whether name clashes
will occur. Problem (2) refers to the case where packages export
different objects with the same names, say if `function_b` is exported in both
`PackageA` and `PackageB` above. In this case the name will
point to the object from the package attached *last*. The earlier exposed
objects are said to *masked*. Even if this is not a problem
when *writing* the script, an update of packages may cause this problem
later on when *executing* the script; and tracking down the resulting
errors may be tough and time consuming. Problem (3) may appear unimportant,
but it is not to be underestimated. Code snippets are very commonly shared
and spending time figuring out where functionality comes from is not
a very satisfying nor value-adding activity.
It is possible to unambiguously specify where a function comes from by
prefixing it with `::` every time it used, but this is often
overly verbose and does not provide an easily accessible overview of what
external functionality is used in a script. One may also import single
exported objects, one at a time, using the (double) "colon syntax",
```{r, eval = FALSE}
function_a <- PackageA::function_a
function_b <- PackageB::function_b
```
The downside of this approach is that the object is **placed
in the user's global work space**, rather than being encapsulated somewhere else
in the search path (when using `library` to load `pkg`, a namespace `package:pkg`
will be attached in the search path which will contain the exported functions
from `pkg`). Another minor point is that one can **only import one object at a
time** using this approach.
While packages form the backbone of code distribution, another option comes
in the form of *scripts*, but these are usually task specific and not
commonly used to "bundle" functionality for use in *other* scripts. In
particular, when `source` is used to include contents from one script in
another, once again *all* objects produced by the script will be "exposed"
and may "over populate" the working environment, masking other objects,
if not only producing some mental clutter. Scope management is therefore not too
comfortable when splitting functionality across files in a modular way.
The `import` package sets out to improve the way external functionality
is included in your code by alleviating some of the concerns raised above by
providing an expressive way of importing object from both packages and
scripts. The latter provides a bridge between the *package* approach to
distribution and simple stand-alone script files. This allows for the use of scripts
as *modules*, a collection of related object definitions, each of which
may be used at different places without exposing more than necessary.
The package is inspired in part by Python's
`from some_module import some_function` syntax, and solves
the two issues raised above. It is also similar to `roxygen2`s
`@importFrom package function1 function2` for packages. While `import` will
also work for package development, the intended use case is when using
external functions `R` scripts.
In addition to being able to import objects from packages, `import` also allows
you to import objects from other *scripts* (i.e. a kind of *module*). This allows
a simple way to distribute and use functionality without the need to write
a full package. One example is a Shiny app, where one can place definitions
in a script and import only the needed objects where they are used. This
avoids workspace clutter and name clashes. For more details see below.
# Basic Usage
## Importing from Packages
The most basic use case is to import a few functions from package
(here the `psych` package):
```{r, eval=FALSE}
import::from(psych, geometric.mean, harmonic.mean)
geometric.mean(trees$Volume)
```
The imported objects are placed in a separate entity in the search path which by
default is named "imports". It is therefore also easy to get rid of them again
with `detach("imports")`. The main point is that it is **clear which functions
will be used and where they come from**. It's noteworthy that there is nothing
special going on: the `import::from` function is only a convenient wrapper
around `getExportedValue` (as is `::` itself) and `assign`.
The `import` package itself should not to be attached
(don't include it via `library`, you will get a warning). Rather, it is designed
to be expressive when using the colon syntax. To import non-exported objects
one must use triple-colon syntax: `import:::from(pkg, obj)`.
If any of the `import` functions are called regularly, i.e. without preceding
`import::` or `import:::`, an error is raised. If `import` is attached, a
startup message will inform that `import` *should not* be attached.
If one of the function names conflicts with an existing function (such as `filter`
from the `dplyr` package) it is simple to rename it:
```{r, eval=FALSE}
import::from(dplyr, select, arrange, keep_when = filter)
keep_when(mtcars, hp>250)
```
This does pretty much what it says: three functions are imported from `dplyr`,
two of which will keep their original name, and one which is renamed, e.g. to
avoid name clash with `stats::filter`.
You can use `.all=TRUE` to import all functions from a package, but rename one of them:
```{r, eval=FALSE}
import::from(dplyr, keep_when = filter, .all=TRUE)
```
To omit a function from the import, use `.except` (which takes a character vector):
```{r, eval=FALSE}
import::from(dplyr, .except=c("filter", "lag"))
```
Note that `import` tries to be smart about this and assumes that if you are using the
`.except` parameter, you probably want to import everything you are _not_ explicitly omitting,
and sets the `.all` parameter to `TRUE`. You can override this in exceptional cases, but you
seldom need to.
Finally, a more complex example, combining a few different import statements:
```{r, eval = FALSE}
import::from(magrittr, "%>%")
import::from(dplyr, starwars, select, mutate, keep_when = filter)
import::from(tidyr, unnest)
import::from(broom, tidy)
ready_data <-
starwars %>%
keep_when(mass < 100) %>%
select(name, height, mass, films) %>%
unnest(films) %>%
mutate( log_mass = log(mass), films=factor(films))
linear_model <-
lm(log_mass ~ height + films, data = ready_data) %>%
tidy
```
In the above, it is clear *which* package provides *which* functions
(one could e.g. otherwise be tempted to think that `tidy` belonged to
`tidyr`). Note that ordering is irrelevant, even if `tidyr` at some point
exposes a function `tidy` after an update, as `import` is *explicit* about
importing.
It also shows that one can import multiple objects in a single
statement, and even rename objects if desired; for example, in the above
one can imagine that `filter` from `stats` is needed later on, and so
`dplyr`'s `filter` is renamed to avoid confusion. Sometimes, it is
not at all clear what purpose a package has; e.g. the name `magrittr` does
not immediately reveal that it's main purpose is to provide the pipe
operator, `%>%`.
### Importing Functions from "Module" Scripts {#module}
The `import` package allows for importing objects defined in script files,
which we will here refer to as "modules".
The module will be fully evaluated by `import` when an import is requested,
after which objects such as functions or data can be imported.
Such modules should be side-effect free, but this is
not enforced.
Attachments are detached (e.g. packages attached by `library`)
but loaded namespaces remain loaded. This means that *values* created
by functions in an attached namespace will work with `import`, but
functions to be exported *should not* rely on such functions (use function
importing in the modules instead).
For example, the file
[sequence_module.R](https://raw.githubusercontent.com/rticulate/import/master/man/examples/sequence_module.R)
contains several functions calculating terms of mathematical sequences.
It is possible to import from such files, just as one imports from packages:
```{r, eval=FALSE}
import::from(sequence_module.R, fibonacci, square, triangular)
```
Renaming, the `.all`, and the `.except` parameters work in the same way as for packages:
```{r, eval=FALSE}
import::from(sequence_module.R, fib=fibonacci, .except="square")
```
If a module is modified, `import` will
realize this and reload the script if further imports are executed or
re-executed; otherwise additional imports will not cause the script to be
reloaded for efficiency. As the script is loaded in its own environment
(maintained by `import`) dependencies are kept (except those exposed through
attachment), as the following small example shows.
**Contents of "[some_module.R](https://raw.githubusercontent.com/rticulate/import/master/man/examples/some_module.R)":**
```{r, eval=FALSE}
## Do not use library() inside a module. This results in a warning,
## and functions relying on ggplot2 will not work.
#library(ggplot2)
## This is also not recommended, because it is not clear wether recursively
## imported functions should be available after the module is imported
#import::here(qplot, .from = ggplot2)
## This is the recommended way to recursively import functions on which
## module functions depend. The qplot function will be available to
## module functions, but will not itself be available after import
import::here(qplot, .from = ggplot2)
## Note this operator overload is not something you want to `source`!
`+` <- function(e1, e2)
paste(e1, e2)
## Some function relying on the above overload:
a <- function(s1, s2)
s1 + rep(s2, 3)
## Another value.
b <- head(iris, 10)
## A value created using a recursively imported function
p <- qplot(Sepal.Length, Sepal.Width, data = iris, color = Species)
## A function relying on a function exposed through attachment:
plot_it <- function()
qplot(Sepal.Length, Sepal.Width, data = iris, color = Species)
```
**Usage:**
```{r, eval=FALSE}
import::from(some_module.R, a, b, p, plot_it)
## Works:
a("cool", "import")
## The `+` is not affecting anything here, so this won't work:
# "cool" + "import"
# Works:
b
p
plot_it()
```
Suppose that you have some related functionality that you wish to bundle, and
that authoring a full package seems excessive or inappropriate
for the specific task, for example bundling related user interface components for a `shiny`
application. One option with `import` is to author a module (script), say as outlined
below:
```{r, eval=FALSE}
# File: foo.R
# Desc: Functionality related to foos.
# Imports from other_resources.R
# When recursively importing from another module or package for use by
# your module functions, you should always use import::here() rather
# than import::from() or library()
import::here(fun_a, fun_b, .from = "other_resources.R")
internal_fun <- function(...) ...
fun_c <- function(...)
{
...
a <- fun_a(...)
i <- internal_fun(...)
...
}
fun_d <- function(...) ...
```
Then in another file we need `fun_c`:
```{r, eval = FALSE}
# File: bar.R
# Desc: Functionality related to bars.
# Imports from foo.R
import::here(fun_c, .from = "foo.R")
...
```
In the above, *only* `fun_c` is visible inside `bar.R`. The
functions on which it depends exist, but are not exposed.
Also, note that imported scripts may themselves import.
Since the desired effect of `import::from` inside a module script
is ambiguous, this results in a warning (but the functions will
still be imported into the local environment of the script, just
as with `import::here` which only imports are only exposed to
the module itself.
When importing from a module, it is sourced into an environment
managed by `import`, and will not be sourced again upon subsequent
imports (unless the file has changed). For example, in a `shiny`
application, importing some
objects in `server.R` and others in `ui.R` from the same module will not
cause it to be sourced twice.
### Choosing where import looks for packages or modules
The `import` package will by default use the current set of library paths, i.e.
the result of `.libPaths()`. It is, however, possible to specify a different set
of library paths using the `.library` argument in any of the `import` functions,
for example to import packages installed in a custom location, or to remove any
ambiguity as to where imports come from.
Note that in versions up to and including `1.3.0` this defaulted to use only the
*first* entry in the library paths, i.e. `.library=.libPaths()[1L]`. We believe
the new default is applicable in a broader set of circumstances, but if this
change causes any issues, we would very much appreciate hearing about it.
When importing from a module (.R file), the directory where `import` looks for
the module script can be specified with the with `.directory` parameter.
The default is `.` (the current working directory).
### Choosing where the imported functions are placed
One can also specify which names to use in the search path and use several to
group imports. Names can be specified either as character literals or as
variables of type `character` (for example if the environment needs to be
determined dynamically).
```{r, eval = FALSE}
import::from(magrittr, "%>%", "%$%", .into = "operators")
import::from(dplyr, arrange, .into = "datatools")
import::from(psych, describe, .into=month.name[1]) # Uses env: "January"
```
The `import::into` and `import::from` accept the same parameters and achieve the
same result. The the choice between them a matter of preference). If using
custom search path entities actively, one might prefer the alternative syntax
(which does the same but reverses the argument order):
```{r, eval = FALSE}
import::into("operators", "%>%", "%$%", .from = magrittr)
import::into("datatools", arrange, .from = dplyr)
import::into(month.name[1], describe, .from=psych)
```
Be aware that beginning in version `1.3.0` hidden objects (those with names
prefixed by a period) are supported. Take care to avoid name clashes with
argument names.
If it is desired to import objects directly into the current environment,
this can be accomplished by `import::here`. This is particularly
useful when importing inside a function definition, or module scripts as
described [here](#module).
```{r, eval = FALSE}
import::here("%>%", "%$%", .from = magrittr)
import::here(arrange, .from = dplyr)
```
Instead of specifying a named environment on the search path, by passing a
`character` to the `.into` parameter, it is possible to directly specify an
environment. The function automatically determines which use case is involved,
based on the `mode()` of the `.into` parameter (either `character` or
`environment`).
Prior to version `1.3.0`, non-standard evaluation (NSE) was applied to the
`.into` parameter, and it was necessary to surround it with `{}`, in order for
it to be treated as an `environment`. This is no longer needed, although it is
still allowed (curly brackets are simply ignored).
Examples include:
```{r, eval = FALSE}
# Import into the local environment
import::into(environment(), "%>%", .from = magrittr)
# Import into the global environment, curlies are optional
import::into({.GlobalEnv}, "%>%", "%$%", .from = magrittr)
# Import into a new environment, mainly useful for python-style imports
# (see below)
x = import::into(new.env(), "%<>%", .from = magrittr)
```
# Advanced usage
## Advanced usage and the .character_only parameter
The `import` package uses non-standard evaluation (NSE) on the `.from` and `...`
parameters, allowing the names of packages and functions to be listed without
quoting them. This makes some common use-cases very straightforward, but can get
in the way of more programmatic usages.
This is where the `.character_only` parameter comes in handy. By setting
`.character_only=TRUE`, the non-standard evaluation of the `.from` and the `...`
parameters is disabled. Instead, the parameters are processed as character
vectors containing the relevant values.
(Previously, NSE was also applied to the `.into` parameter, but as of version
`1.3.0` this is no longer the case, and all parameters except `.from` and `...`
are always evaluated in a standard way.)
It is useful to examine some examples of how specifying `.character_only=TRUE`
can be helpful.
## Programmatic selection of objects to import
It is not always know in advance which objects to import from a given
package. For example, assume we have a list of objects from the
`broom` package that we need to import, we can do it as follows:
```{r, eval = FALSE}
objects <- c("tidy", "glance", "augment")
import::from("broom", objects, .character_only=TRUE)
```
This will import the three functions specified in the `objects` vector.
It is worth noting that because `.character_only` disables non-standard
evaluation on *all* parameters, the name of the package must now be quoted.
One common use case is when one wants to import all objects except one
or a few, because of conflicts with other packages. Should one, for
example, want to use the `stats` versions of the `filter()` and `lag()`
functions, but import all the other functions in the `dplyr` package,
one could do it like this:
```{r, eval = FALSE}
objects <- setdiff(getNamespaceExports("dplyr"), c("filter","lag"))
import::from("dplyr", objects, .character_only=TRUE)
```
## Programmatic selection of module location
The same approach can be used when the directory of the source
file for a module is not known in advance. This can be useful
when the original source file is not always run with the original
working directory, but one still does not want to specify a
hard-coded absolute path, but to determine it at run time:
```{r, eval = FALSE}
mymodule <- file.path(mypath, "module.R")
import::from(mymodule, "myfunction", .character_only=TRUE)
```
Again, note that now the name of the function must be quoted because
non-standard evaluation is disabled on all parameters.
The `here` package is useful in many circumstances like this; it
allows the setting of a "root" directory for a project and by using
the `here::here()` function to figure out the correct directory,
regardless of the working directory.
```{r, eval = FALSE}
import::from(here::here("src/utils/module.R")), "myfunction", .character_only=TRUE)
```
Alternatively, if the file name is always the same and it is only the directory
that differs, you could use the `.directory` parameter, which always expects standard
evaluation arguments.
```{r, eval = FALSE}
import::from(module.R, "myfunction", here::here("src/utils"))
```
Note that `here::here()` has no relation to `import::here()` despite
the similarity in names.
## Importing from a URL
Another case where `.character_only` comes in handy is when one
wants to import some functions from a URL. While `import` does
not allow direct importing from a URL (because of difficult
questions about when a URL target has changes, whether to
download a file and other things), it easy to achieve the desired
result by using the `pins` package (whose main purpose is to
resolve such difficult questions). A simple example follows, which
directly imports the `myfunc()` function, which is defined in the
[`plusone_module.R`](https://raw.githubusercontent.com/rticulate/import/master/man/examples/plusone_module.R):
```{r, eval = FALSE}
url <- "https://raw.githubusercontent.com/rticulate/import/master/man/examples/plusone_module.R"
import::from(pins::pin(url), "myfunc", .character_only=TRUE)
myfunc(3)
#> [1] 4
```
## Python-like imports
A frequent pattern in python imports packages under an alias; all subsequent use of the imported objects then explicitly includes the alias:
```python
import pandas as pd
import numpy as np
import math as m
print(m.pi)
print(m.e)
```
In order to achieve this functionality with the import package, use `.into={new.env()}` which assign to a new environment without attaching it. `import::from()` returns this environment, so it can be assigned to a variable:
```{r, eval = FALSE}
# Import into a new namespace, use $ to access
td <- import::from(tidyr, spread, pivot_wider, .into={new.env()})
dp <- import::from(dplyr, .all=TRUE, .into={new.env()})
dp$select(head(cars),dist)
#> dist
#> 1 2
#> 2 10
#> 3 4
#> 4 22
#> 5 16
#> 6 10
# Note that functions are not visible without dp$ prefix
select(head(cars),dist)
#> Error in select(head(cars), dist): could not find function "select"
```
## Importing S3 methods
![](lifecycle-experimental.svg)
S3 methods work well in local context, but when the method is called from a different environment, it must be registered (in packages, this is done in the `NAMESPACE` file). `import` can now register methods of form `generic.class` or `generic.class.name` automatically using the new `.S3` argument. By specifying `.S3=TRUE`, `import` will automatically detect methods for existing or new generics. No need to export and/or register them manually!
Consider following script `foo.r` with a generic and two methods:
```{r, eval = FALSE}
# foo.r
# functions with great foonctionality
foo = function(x){
UseMethod("foo", x)
}
foo.numeric <- function(x){
x + 1
}
foo.character <- function(x){
paste0("_", x, "_")
}
```
Now, all we need is to import the `foo` generic:
```{r, eval = FALSE}
import::from("foo.r", foo, .S3=TRUE)
foo(0) # 1
foo("bar") # _bar_
```
*This is an experimental feature. We think it should work well and you are
encouraged to use it and report back – but the syntax and semantics may change
in the future to improve the feature.*