---
title: "Dual Inlet Examples"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
  html_document:
    code_folding: show
    df_print: paged
    number_sections: yes
    toc: yes
    toc_depth: 3
    toc_float: yes
editor_options:
  chunk_output_type: console
vignette: >
  %\VignetteIndexEntry{Dual Inlet Examples}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
# Introduction
Isoreader supports several dual inlet IRMS data formats. This vignette shows some of the functionality for dual inlet data files. For additional information on operations more generally (caching, combining read files, data export, etc.), please consult the [operations vignette](http://isoreader.isoverse.org/articles/operations.html). For details on downstream data processing and visualization, see the [isoprocessor package](https://isoprocessor.isoverse.org).
```{r, message=FALSE}
# load isoreader package
library(isoreader)
```
# Reading files
Reading dual inlet files is as simple as passing one or multiple file or folder paths to the `iso_read_dual_inlet()` function. If folders are provided, any files within those folders that have a recognized dual inlet file extension will be processed (e.g. all `.did` and `.caf`). Here we read several files that are bundled with the package as examples (and whose paths can be retrieved using the `iso_get_reader_example()` function).
```{r, message=FALSE}
# all available examples
iso_get_reader_examples() %>% rmarkdown::paged_table()
```
```{r}
# read dual inlet examples
di_files <-
  iso_read_dual_inlet(
    iso_get_reader_example("dual_inlet_example.did"),
    iso_get_reader_example("dual_inlet_example2.did"),
    iso_get_reader_example("dual_inlet_example.caf"),
    iso_get_reader_example("dual_inlet_nu_example.txt"),
    nu_masses = 49:44
  )
```
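As a rough illustration of how a folder path can be expanded into individual files by extension, the following base R sketch collects all files ending in `.did` or `.caf`. The helper `find_dual_inlet_files()` and the demo folder are hypothetical and not part of isoreader; the package's own file discovery may work differently and recognizes additional formats.

```{r}
# hypothetical helper (not part of isoreader): list files in a folder whose
# extension matches one of the given dual inlet extensions
find_dual_inlet_files <- function(folder, extensions = c("did", "caf")) {
  pattern <- paste0("\\.(", paste(extensions, collapse = "|"), ")$")
  list.files(folder, pattern = pattern, full.names = TRUE,
             recursive = TRUE, ignore.case = TRUE)
}

# try it out on a temporary folder with empty placeholder files
demo_folder <- file.path(tempdir(), "di_demo")
dir.create(demo_folder, showWarnings = FALSE)
invisible(file.create(file.path(demo_folder, c("a.did", "b.caf", "notes.txt"))))
basename(find_dual_inlet_files(demo_folder))
```

Only the two placeholder files with a dual inlet extension are returned; `notes.txt` is ignored.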
# File summary
The `di_files` variable now contains a set of isoreader objects, one for each file. Take a look at what information was retrieved from the files using the `iso_get_data_summary()` function.
```{r}
di_files %>% iso_get_data_summary() %>% rmarkdown::paged_table()
```
## Problems
In case there was any trouble with reading any of the files, the following functions provide an overview summary as well as details of all errors and warnings, respectively. The examples here contain no errors but if you run into any unexpected file read problems, please file a bug report in the [isoreader issue tracker](https://github.com/isoverse/isoreader/issues).
```{r}
di_files %>% iso_get_problems_summary() %>% rmarkdown::paged_table()
di_files %>% iso_get_problems() %>% rmarkdown::paged_table()
```
# File Information
Detailed file information can be aggregated for all isofiles using the `iso_get_file_info()` function, which supports the full [select syntax](https://dplyr.tidyverse.org/reference/select.html) of the [dplyr](https://dplyr.tidyverse.org/) package to specify which columns are of interest (by default, all file information is retrieved). Additionally, file information from different file formats can be renamed to the same column name for ease of downstream processing. The following provides a few examples of how this can be used (the names of the interesting info columns may vary between different file formats):
```{r}
# all file information
di_files %>% iso_get_file_info(select = c(-file_root)) %>% rmarkdown::paged_table()
# select file information
di_files %>%
  iso_get_file_info(
    select = c(
      # rename sample id columns from the different file types to a new ID column
      ID = `Identifier 1`, ID = `Sample Name`,
      # select columns without renaming
      Analysis, Method, `Peak Center`,
      # select the time stamp and rename it to `Date & Time`
      `Date & Time` = file_datetime,
      # rename weight columns from the different file types
      `Sample Weight`, `Sample Weight` = `Weight [mg]`
    ),
    # explicitly allow for file specific rename (for the new ID column)
    file_specific = TRUE
  ) %>% rmarkdown::paged_table()
```
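The idea behind a file-specific rename can be sketched in base R: each file's information is renamed using only the columns that file actually has, and the results are combined afterwards. The toy data frames and the helper `rename_if_present()` below are hypothetical and only illustrate the concept; this is not how isoreader implements `file_specific = TRUE`.

```{r}
# toy file info from two hypothetical files with differently named ID columns
info_did <- data.frame(file_id = "a.did", `Identifier 1` = "CIT 1",
                       check.names = FALSE, stringsAsFactors = FALSE)
info_caf <- data.frame(file_id = "b.caf", `Sample Name` = "MS 2",
                       check.names = FALSE, stringsAsFactors = FALSE)

# hypothetical helper: rename whichever of the `old` columns exists to `new`
rename_if_present <- function(df, new, old) {
  hit <- names(df) %in% old
  names(df)[hit] <- new
  df
}

# apply the rename to each file separately, then stack the results
renamed <- lapply(
  list(info_did, info_caf),
  rename_if_present, new = "ID", old = c("Identifier 1", "Sample Name")
)
combined <- do.call(rbind, renamed)
combined
```

Because the rename happens per file, it does not matter that neither file has both source columns.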
## Select/Rename
Rather than retrieving specific file info columns using the above example of `iso_get_file_info(select = ...)`, this information can also be modified across an entire collection of isofiles using the `iso_select_file_info()` and `iso_rename_file_info()` functions. For example, the above selection could be achieved similarly with the following use of `iso_select_file_info()`:
```{r}
# select + rename specific file info columns
di_files2 <- di_files %>%
  iso_select_file_info(
    ID = `Identifier 1`, ID = `Sample Name`, Analysis, Method,
    `Peak Center`, `Date & Time` = file_datetime,
    `Sample Weight`, `Sample Weight` = `Weight [mg]`,
    file_specific = TRUE
  )
# fetch all file info
di_files2 %>% iso_get_file_info() %>% rmarkdown::paged_table()
```
## Filter
Any collection of isofiles can also be filtered based on the available file information using the function `iso_filter_files`. This function can operate on any column available in the file information and supports full [dplyr](https://dplyr.tidyverse.org/reference/filter.html) syntax.
```{r}
# find files that have 'CIT' in the new ID field
di_files2 %>%
  iso_filter_files(grepl("CIT", ID)) %>%
  iso_get_file_info() %>%
  rmarkdown::paged_table()
# find files that were run in 2017
di_files2 %>%
  iso_filter_files(`Date & Time` > "2017-01-01" & `Date & Time` < "2018-01-01") %>%
  iso_get_file_info() %>%
  rmarkdown::paged_table()
```
## Mutate
The file information in any collection of isofiles can also be mutated using the function `iso_mutate_file_info`. This function can introduce new columns and operate on any existing columns available in the file information (even if it does not exist in all files) and supports full [dplyr](https://dplyr.tidyverse.org/reference/mutate.html) syntax.
```{r}
di_files3 <- di_files2 %>%
  iso_mutate_file_info(
    # update existing column
    ID = paste("ID:", ID),
    # introduce new column
    `Run in 2017?` = `Date & Time` > "2017-01-01" & `Date & Time` < "2018-01-01"
  )
di_files3 %>%
  iso_get_file_info() %>%
  rmarkdown::paged_table()
```
## Add
Additionally, a wide range of new file information can be added in the form of a data frame with any number of columns (usually read from a comma-separated-value/csv file or an Excel/xlsx file) using the function `iso_add_file_info` and specifying which existing file information should be used to merge in the new information. It is similar to [dplyr's left_join](https://dplyr.tidyverse.org/reference/join.html) but with additional safety checks and the possibility to join the new information sequentially as illustrated below.
```{r}
# this kind of information data frame is frequently read in from a csv or xlsx file
new_info <-
  dplyr::bind_rows(
    # new information based on new vs. old samples
    dplyr::tribble(
      ~Analysis, ~`Run in 2017?`, ~process, ~info,
      NA, TRUE, "yes", "2017 runs",
      NA, FALSE, "yes", "other runs"
    ),
    # new information for a single specific file
    dplyr::tribble(
      ~Analysis, ~process, ~note,
      "16068", "no", "did not inject properly"
    )
  )
new_info %>% rmarkdown::paged_table()
# adding it to the isofiles
di_files3 %>%
  iso_add_file_info(new_info, by1 = "Run in 2017?", by2 = "Analysis") %>%
  iso_get_file_info(select = names(new_info)) %>%
  rmarkdown::paged_table()
```
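The sequential join can be sketched in base R as follows. This is a simplified illustration under the assumption that joins are applied in order and that a later join overwrites values from an earlier one for files that match both, which is what the example above relies on for analysis `16068`. The helper `add_info_sequentially()`, the column name `run_2017`, and the toy values are hypothetical and not part of isoreader.

```{r}
# toy file info (values are made up for illustration)
file_info <- data.frame(
  Analysis = c("16068", "16069", "16070"),
  run_2017 = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# toy new information: rows 1-2 match on run_2017, row 3 matches on Analysis
toy_new_info <- data.frame(
  Analysis = c(NA, NA, "16068"),
  run_2017 = c(TRUE, FALSE, NA),
  process = c("yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# hypothetical helper: join on each `by` column in turn, using only the rows
# of `new` where that column is defined; later matches overwrite earlier ones
add_info_sequentially <- function(info, new, bys, value_col) {
  info[[value_col]] <- NA_character_
  for (by in bys) {
    candidates <- new[!is.na(new[[by]]), , drop = FALSE]
    idx <- match(info[[by]], candidates[[by]])
    matched <- !is.na(idx)
    info[[value_col]][matched] <- candidates[[value_col]][idx[matched]]
  }
  info
}

joined <- add_info_sequentially(
  file_info, toy_new_info,
  bys = c("run_2017", "Analysis"), value_col = "process"
)
joined
```

In this sketch the general rule (based on `run_2017`) is applied to all files first, and the file-specific entry for analysis `16068` then replaces it.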
## Parse
Most file information is initially read as text to avoid cumbersome specifications during the read process and compatibility issues between different IRMS file formats. However, many file info columns are not easily processed as text. The isoreader package therefore provides several parsing and data extraction functions to facilitate processing the text-based data (some via functionality implemented by the [readr](http://readr.tidyverse.org) package). See code block below for examples. For a complete overview, see the `?extract_data` and `?iso_parse_file_info` documentation.
```{r}
# use parsing and extraction in iso_mutate_file_info
di_files2 %>%
  iso_mutate_file_info(
    # change type of Peak Center to logical
    `Peak Center` = parse_logical(`Peak Center`),
    # retrieve first word of Method column
    Method_1st = extract_word(Method),
    # retrieve second word of Method column
    Method_2nd = extract_word(Method, 2),
    # retrieve file extension from the file_id using regular expression
    extension = extract_substring(file_id, "\\.(\\w+)$", capture_bracket = 1)
  ) %>%
  iso_get_file_info(select = c(extension, `Peak Center`, matches("Method"))) %>%
  rmarkdown::paged_table()
# use parsing in iso_filter_files
di_files2 %>%
  iso_filter_files(parse_integer(Analysis) > 1500) %>%
  iso_get_file_info() %>%
  rmarkdown::paged_table()
# use iso_parse_file_info for simplified parsing of column data types
di_files2 %>%
  iso_parse_file_info(
    integer = Analysis,
    number = `Sample Weight`,
    logical = `Peak Center`
  ) %>%
  iso_get_file_info() %>%
  rmarkdown::paged_table()
```
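For readers who want to see what this kind of text processing amounts to, here is a base R sketch of word extraction, regular expression capture, and type conversion. The helper `nth_word()` and the toy values are hypothetical; the isoreader and readr functions used above offer more options and may treat edge cases differently.

```{r}
# toy text values as they might appear in file info (made up for illustration)
method <- c("CO2 multiply.met", "N2 standard.met")
file_id <- c("dual_inlet_example.did", "dual_inlet_example.caf")

# hypothetical helper: return the n-th run of letters/digits in each string
nth_word <- function(x, n = 1) {
  words <- regmatches(x, gregexpr("[A-Za-z0-9]+", x))
  vapply(words, function(w) if (length(w) >= n) w[n] else NA_character_,
         character(1))
}

nth_word(method)     # first word of each entry
nth_word(method, 2)  # second word of each entry

# file extension via a regular expression capture group
extension <- sub("^.*\\.(\\w+)$", "\\1", file_id)
extension

# text to integer
as.integer(c("16068", "1500"))

# text to logical: base R's as.logical("1") returns NA,
# so go through integer first
as.logical(as.integer(c("1", "0")))
```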
# Resistors
Additionally, some IRMS data files contain resistor information that is useful for downstream calculations (see e.g. section on signal conversion later in this vignette):
```{r}
di_files %>% iso_get_resistors() %>% rmarkdown::paged_table()
```
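Resistor values matter because they relate the recorded voltages to the underlying ion currents via Ohm's law. The following base R sketch converts toy signals from mV to nA; the signal and resistor values are made up for illustration, and the variable names are hypothetical rather than isoreader column names.

```{r}
# toy values (made up for illustration): signals in mV, resistors in Ohm
signal_mV <- c(v44 = 3000, v45 = 3600)
resistor_ohm <- c(v44 = 3e8, v45 = 3e10)

# Ohm's law: I = V / R; 1 mV across 1 Ohm corresponds to 1 mA = 1e6 nA
current_nA <- signal_mV / resistor_ohm * 1e6
current_nA
```

The two cups record similar voltages here, but the larger resistor on the second cup means its ion current is about two orders of magnitude smaller.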
# Reference values
Some IRMS data files also contain isotopic reference values for the different gases:
```{r}
# reference delta values without ratio values
di_files %>% iso_get_standards(file_id:reference) %>% rmarkdown::paged_table()
# reference values with ratios
di_files %>% iso_get_standards() %>% rmarkdown::paged_table()
```
# Raw Data
The raw data read from the IRMS files can be retrieved similarly using the `iso_get_raw_data()` function. Most data aggregation functions also allow for inclusion of file information using the `include_file_info` parameter, which functions identically to the `select` parameter of the `iso_get_file_info` function discussed earlier.
```{r}
# get raw data with default selections (all raw data, no additional file info)
di_files %>% iso_get_raw_data() %>% head(n = 10) %>% rmarkdown::paged_table()
# get specific raw data and add some file information
di_files %>%
  iso_get_raw_data(
    # select just the measurement type, cycle and the two ion signals
    select = c(type, cycle, v28.mV, v29.mV),
    # include the Analysis number from the file info and rename it to 'run'
    include_file_info = c(run = Analysis)
  ) %>%
  # look at first few records only
  head(n = 10) %>% rmarkdown::paged_table()
```
# Data Processing
The isoreader package is intended to make raw stable isotope data easily accessible. However, as with most analytical data, significant downstream processing is required to turn these raw signal intensities into properly referenced isotopic measurements. This and similar functionality, as well as data visualization, is part of the [isoprocessor package](https://isoprocessor.isoverse.org), which takes isotopic data through the various corrections in a transparent, efficient and reproducible manner.
That said, most vendor software also performs some of these calculations and it can be useful to compare new data reduction procedures against those implemented in the vendor software. For this purpose, isoreader retrieves vendor computed data tables whenever possible, as illustrated below.
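To give a sense of the kind of calculation involved, the following base R sketch computes delta values relative to the reference gas from toy dual inlet ratios, bracketing each sample cycle with the standard cycles measured before and after it. The numbers are made up for illustration, and this is a simplified textbook-style calculation, not the procedure used by isoprocessor or by any vendor software.

```{r}
# toy dual inlet data (made up): ratios of two ion signals for
# standard cycles 0-3 and sample cycles 1-3
standard_ratio <- c(0.0118, 0.0118, 0.0118, 0.0118)  # cycles 0, 1, 2, 3
sample_ratio <- c(0.01239, 0.01239, 0.01239)         # cycles 1, 2, 3

# bracketing: compare each sample cycle with the mean of the standard
# measured just before and just after it
n <- length(sample_ratio)
bracket <- (standard_ratio[1:n] + standard_ratio[2:(n + 1)]) / 2

# delta value relative to the reference gas, in permil
delta_permil <- (sample_ratio / bracket - 1) * 1000
delta_permil
```

Further corrections and the conversion to an international reference scale are outside the scope of this sketch.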
## Vendor Data Table
As with most data retrieval functions, the `iso_get_vendor_data_table()` function also allows specific column selection (by default, all columns are selected) and easy addition of file information via the `include_file_info` parameter (by default, none is included).
```{r}
# entire vendor data table
di_files %>% iso_get_vendor_data_table() %>% rmarkdown::paged_table()
# get specific parts and add some file information
di_files %>%
  iso_get_vendor_data_table(
    # select cycle and all carbon columns
    select = c(cycle, matches("C")),
    # include the Identifier 1 from the file info and rename it to 'id'
    include_file_info = c(id = `Identifier 1`)
  ) %>% rmarkdown::paged_table()
```
# For expert users: retrieving all data
For users familiar with the nested data frames from the [tidyverse](https://www.tidyverse.org/) (particularly [tidyr](https://tidyr.tidyverse.org/)'s `nest` and `unnest`), there is an easy way to retrieve all data from the iso file objects in a single nested data frame:
```{r}
all_data <- di_files %>% iso_get_all_data()
# not printed out because this data frame is very big
```
# Saving collections
Saving entire collections of isofiles for retrieval at a later point is easily done using the `iso_save` function, which stores collections or individual isoreader file objects in the efficient R data storage format `.rds` (if not specified, the extension `.di.rds` will be automatically appended). These saved collections can be conveniently read back using the same `iso_read_dual_inlet` command used for raw data files.
```{r}
# export to R data archive
di_files %>% iso_save("di_files_export.di.rds")
# read back the exported R data storage
iso_read_dual_inlet("di_files_export.di.rds")
```
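For context, `.rds` is base R's format for storing a single R object. The sketch below shows a plain round trip with `saveRDS()` and `readRDS()` on a toy list standing in for a collection of files; it only illustrates the storage format and is not a substitute for `iso_save`, which may do more than a plain `saveRDS()`.

```{r}
# toy object standing in for a collection of files (made up for illustration)
toy_collection <- list(
  a = data.frame(cycle = 1:3),
  b = data.frame(cycle = 1:2)
)

# write to a temporary .rds file and read it back
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy_collection, rds_path)
restored <- readRDS(rds_path)

# check that the restored object matches the original
identical(toy_collection, restored)
```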
# Data Export
At the moment, isoreader supports export of all data to Excel and the [Feather file format](https://blog.rstudio.com/2016/03/29/feather/) (a Python/R cross-over format). Note that both export methods have similar syntax and append the appropriate file extension for each type of export file (`.di.xlsx` and `.di.feather`, respectively).
```{r}
# export to excel
di_files %>% iso_export_to_excel("di_files_export")
# data sheets available in the exported data file:
readxl::excel_sheets("di_files_export.di.xlsx")
```
```{r}
# export to feather
di_files %>% iso_export_to_feather("di_files_export")
# exported feather files
list.files(pattern = "\\.di\\.feather$")
```