-
Notifications
You must be signed in to change notification settings - Fork 6
/
local-irods.Rmd
215 lines (157 loc) · 8.25 KB
/
local-irods.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
---
title: "Accessing data locally and in iRODS"
author: "Mariana Montes"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Accessing data locally and in iRODS}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
library(rirods)
library(httptest2)
start_vignette("local-irods")
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
eval=FALSE
)
root_dir <- tempdir()
knitr::opts_knit$set(
root.dir = root_dir
)
```
```{r setup2, include=FALSE}
# set user config directory to temporary location
withr::local_envvar(
R_USER_CONFIG_DIR = root_dir
)
eval(substitute(create_irods(x, overwrite = TRUE), list(x = rirods:::.irods_host)))
iauth("rods", "rods")
if (length(ils())> 0) {
for (file in ils()$logical_path) {
irm(file, recursive = TRUE, force=TRUE)
}
}
dir.create(file.path(root_dir, "data"))
imkdir("data")
```
If you are not familiar with iRODS, understanding how to access and manipulate data with it may be less than intuitive. In this vignette, we'll go through the main functions for setting and changing the working directory and for creating, saving, reading and removing data, comparing R functions for manipulation of local files and the `{rirods}` counterparts.
The main point to understand is that the iRODS server is not simply another location that you can access by editing a path. While you can use `file.remove()` to remove any file in your computer, there is no path you can provide that will remove a data object in iRODS. Instead, you need to use `irm()`, which connects to the iRODS server to apply the same action. This is the sort of comparison we will see in this vignette.
A second point to keep in mind is that, normally, you need to stage and unstage your data in order to manipulate it, rather than modifying your iRODS data directly. This is always the case with other clients, such as iCommands: if you want to read a dataframe you have in iRODS, you first need to copy it to your local computer and then open _that_ file; if you want to save a modified version of that file you have to copy the local (modified) version back to iRODS. `{rirods}` offers one exception to this by allowing to save R objects in RDS format (only) directly into iRODS and read them back, with `isaveRDS()` and `ireadRDS()` respectively.
Finally, most of the functions in `{rirods}` are inspired by iCommands, which are themselves modelled after Unix commands and prefixed by an `i`. So, for example, the Unix command to **c**hange a **d**irectory is `cd`, its iCommands counterpart is `icd`, and then the `{rirods}` equivalent is `icd()`.
## Set and change working directory
In R we can check the working directory with `getwd()` and change it with `setwd(dir)`, where `dir` is the path we want to set as the new working directory. Both functions return the current working directory; before the change and invisibly in the case of `setwd()`.
The `{rirods}` counterparts are `ipwd()` ("print working directory") and `icd(dir)` ("change directory") respectively.
For the purposes of this vignette, we'll use a temporary directory locally. This is the current output of `getwd()` and `ipwd()` respectively:
```{r}
getwd()
ipwd()
```
We can see their contents with `dir(path)` or `list.files(path)` and `ils(path)` respectively. If `path` is not provided, the current working directory is used as default:
```{r}
dir()
ils()
```
We can focus on the "data" local directory with `setwd("data")`^[Which is [not recommended](https://github.com/jennybc/here_here) in any case.] and on the "data" iRODS collection with `icd("data")`. Then the output of `getwd()` and `ipwd()`, respectively, are updated, and `dir()` and `ils()` will show the contents of "data" by default.
```{r, include=FALSE}
change_state()
```
```{r}
old_local <- setwd("data")
dir()
old_irods <- icd("data")
ils()
```
We can reset our working directories by providing the old path to `setwd()` and `icd()` respectively. Note that moving upwards in the file system is also possible by providing "../" for each level up you want to go: `icd("../")` changes the iRODS working directory to its parent collection.
```{r, include=FALSE}
change_state()
```
```{r}
setwd(old_local)
getwd()
icd(old_irods)
ipwd()
```
## Create directories
Directories can be created in R with `dir.create(path)`; collections can be created in iRODS with `imkdir(path)` ("**m**a**k**e **dir**ectory"), providing a path relative to the working directory. For example, the code below creates an "analysis" directory under our working directory, first locally and then in iRODS.
```{r, include=FALSE}
change_state()
```
```{r}
dir.create("analysis")
dir()
imkdir("analysis")
ils()
```
## Save data
R and several R packages (such as `{readr}`) provide a number of functions to save data locally. For example, `writeLines(some_vector, path)` can be used to write a vector into a text file with one item per line; `write.csv(dataframe, path)` can be used to write a dataframe as a comma-separated file; `saveRDS(R_object, path)` can be used to write any R object into an RDS file. This path can be relative to the working directory or absolute paths.
For example, let's simulate some data and store it in our "data" directory with `write.csv()`.
```{r}
set.seed(1234)
fake_data <- data.frame(x = rnorm(20, mean = 1))
fake_data$y <- fake_data$x * 2 + 3 - rnorm(20, sd = 0.6)
write.csv(fake_data, file.path("data", "data.csv"), row.names = FALSE)
dir("data")
```
When saving data in iRODS, we don't have these kinds of options. Instead, we can either transfer a file of any type from our local system to iRODS with `iput(local_path, irods_path)` or save an R object as an RDS file with `isaveRDS(some_object, irods_path)`. In the case of our simulated data, we use the first option:
```{r, include=FALSE}
change_state()
```
```{r}
iput("data/data.csv", "data/data_from_local.csv")
ils("data")
```
Note that the file name need not stay the same in the local and iRODS systems.
Now, let's say that we have processed our data with some linear regression modelling.
```{r}
m <- lm(y ~ x, data = fake_data)
m
```
We could certainly store the output locally, but we could also decide to only store it in iRODS if we save it in RDS format. So let's save it in the "analysis" collection.
```{r, include=FALSE}
change_state()
```
```{r}
isaveRDS(m, "analysis/linear_model.rds")
ils("analysis")
dir("analysis") # nothing was saved locally
```
## Read data
Just like we have many different R functions to save files to different formats, there are specific functions to read files in different formats. And just like with `{rirods}` we either save in RDS format or transfer files from a local system to iRODS, we either read RDS files or transfer files back from iRODS to the local system. If we want to read "data_from_local.csv", we first need to retrieve it with `iget(irods_path, local_path)` and then open it with an appropriate R function.
```{r}
iget("data/data_from_local.csv", "data/data_from_irods.csv")
dir("data")
read.csv("data/data_from_irods.csv") # same as fake_data
```
For the RDS files, we could also use `iget()` if we wanted to store them locally, or simply `ireadRDS(irods_path)` to read the file directly.
```{r}
# copy locally first
iget("analysis/linear_model.rds", "analysis/linear_model_in_local.rds")
dir("analysis")
readRDS("analysis/linear_model_in_local.rds")
# or read directly from iRODS
ireadRDS("analysis/linear_model.rds")
```
## Remove data
Finally, local data can be removed with `unlink(path)` or `file.remove()`, whereas iRODS data can be **r**e**m**oved with `irm(path)`. Both `unlink()` and `irm()` take an optional argument `recursive` that should be `TRUE` if we want to remove a directory/collection and all its contents. In the case of `irm()`, the `force` argument also defines whether the item should be deleted permanently or, if `FALSE`, sent to the "trash" collection.
```{r, include=FALSE}
change_state()
```
```{r}
unlink("analysis", recursive = TRUE)
dir()
irm("data", recursive = TRUE, force = TRUE)
ils()
```
```{r cleanup, include=FALSE}
change_state()
for (file in ils()$logical_path) {
irm(file, recursive = TRUE, force=TRUE)
}
unlink(file.path(root_dir, "data"), recursive=TRUE)
unlink(file.path(root_dir, "analysis"), recursive=TRUE)
end_vignette()
unlink(rirods:::path_to_irods_conf())
```