Targets for Notebooks #469

nviets started this conversation in Ideas
May 7, 2021 · 4 comments · 47 replies

Are there any plans for recommended strategies to integrate targets with Rmd Notebooks? Python has dataflow, and Julia has Pluto. Targets seems like a natural starting point for similar functionality in Rmd.

Replies

Through tarchetypes, targets seamlessly integrates with R Markdown, including parameterized reports. This setup allows you to run reports as part of a pipeline, knit them interactively, and easily go back and forth between these two options. In this scenario, a good R Markdown report is 99% prose and 1% R code, taking advantage of targets you already computed upstream in the pipeline. Details:
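As a concrete sketch of that setup (tar_render() is the real tarchetypes function; the file, target, and function names here are hypothetical placeholders):

```r
# _targets.R: the report is just another target at the end of the pipeline.
# get_data(), run_analysis(), and report.Rmd are illustrative placeholders.
library(targets)
library(tarchetypes)
list(
  tar_target(data, get_data()),
  tar_target(analysis, run_analysis(data)),
  # tar_render() tracks report.Rmd and re-renders only when the report
  # source or the targets it reads change.
  tar_render(report, "report.Rmd")
)
```

Inside report.Rmd, chunks call targets::tar_read(analysis), so the report itself stays mostly prose.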

20 replies
@nviets

Thanks for the reply! We love targets' integration with Markdown in the sense of generating parameterized reports from the targets cache. Above, I was thinking of targets more as a backend to power an interactive document experience in the sense of Pluto and dataflow. A lot of users program interactively in Rmd notebooks, and there isn't really a solution in Rmd today similar to Pluto and Dataflow. If we could turn Rmd chunks into targets and have chunks activate a tar_make command, we'd get all the benefits of targets paired with the interactive experience of notebooks.

@wlandau

If we could turn Rmd chunks into targets...

I usually push back against R Markdown-driven development because literate programming is not designed for computationally intense work. That said, at the interface level, it is possible to represent a pipeline as an R Markdown file.


---
title: "Example report"
output: tarchetypes::tar_render_pipeline
---

```{r, memory = "transient"}
library(tidyverse)
get_data <- function() {
  # ...
}
run_analysis <- function(data) {
  # ...
}
```

Prose for the first target:

```{r analysis_target, format = "qs"}
data <- get_data()
run_analysis(data)
```

Prose for the second target:

```{r summary_target, format = "parquet"}
analysis_target %>%
  group_by(model, method) %>%
  summarize(accuracy = hits / total) %>%
  ungroup()
```

A preprocessor, maybe a new function in tarchetypes, could convert this R Markdown file into a _targets.R file.

# _targets.R
targets::tar_option_set(memory = "transient") # from the options of the unnamed chunks

# Unnamed chunks:

library(targets)
library(tidyverse)

get_data <- function() {
  # ...
}
run_analysis <- function(data) {
  # ...
}

# Named chunks are targets:
list(
  tar_target(
    analysis_target, {
      data <- get_data()
      run_analysis(data)
    },
    format = "qs" # from the chunk options
  ),
  tar_target(
    summary_target, {
      analysis_target %>%
        group_by(model, method) %>%
        summarize(accuracy = hits / total) %>%
        ungroup()
    },
    format = "parquet" # from the chunk options
  )
)

Then all the functions in targets would be available, from tar_make() to tar_visnetwork(). And it might be possible to configure the knitr engine to run the preprocessor and then tar_make().
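A rough sketch of what that engine hook could look like (knitr::knit_engines$set() and knitr::engine_output() are real knitr APIs; record_chunk() and write_targets_file() are hypothetical helpers):

```r
# Hedged sketch: a custom knitr language engine that accumulates chunk
# code and, when asked, writes _targets.R and runs the pipeline.
knitr::knit_engines$set(targets = function(options) {
  record_chunk(options)            # hypothetical: store this chunk's code
  if (isTRUE(options$tar_make)) {
    write_targets_file()           # hypothetical: emit the _targets.R file
    targets::tar_make()
  }
  knitr::engine_output(options, code = options$code, out = "")
})
```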

Is that pretty much what you had in mind?

@wlandau

Edit: IIRC, the output field of the YAML front matter lets us take control of how knitr runs the report. I added a placeholder above for output: tarchetypes::tar_render_pipeline.

A few more thoughts on this R Markdown interface idea:

Rendered reports

What should the actual rendered report do? Since most code chunks will be individual targets, one obvious choice is to print each target just below its respective code chunk. This would flow nicely and naturally from a literate programming perspective, but we should limit the targets printed in case the data is too large. We could either disable reading and printing by default, or we could only print small non-dynamic targets. I like the latter.

How to implement the rendered report

I need to read up on custom language engines and custom chunk engines.

Naively, we could create a temporary copy of the report that runs tar_read_raw() in each target chunk instead of the command.
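For instance (tar_read_raw() is the real targets function; the chunk/target name is illustrative), each target chunk in the temporary copy could become:

```r
# In the temporary report, replace the command of the chunk named
# analysis_target with a read of its stored value from the data store:
analysis_target <- targets::tar_read_raw("analysis_target")
analysis_target
```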

Code chunk behavior

When run interactively in notebook mode, a chunk should run with no side effects (maybe inside local()), assign the return value to an object with the same name as the chunk, and print the object to the screen. These guardrails would enforce the purity and immutability requirements of targets and counteract the dangerous R-Markdown-style looseness that folks might not otherwise be wary of.
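A minimal sketch of that guardrail in base R (run_chunk() is a hypothetical helper, not part of targets or knitr):

```r
# Evaluate a chunk's code in a child environment so stray assignments
# stay contained, then bind only the return value to the chunk's name.
run_chunk <- function(name, code, envir = globalenv()) {
  tmp <- new.env(parent = envir)            # side effects land here, not in envir
  value <- eval(parse(text = code), envir = tmp)
  assign(name, value, envir = envir)        # chunk name becomes the object name
  print(value)
  invisible(value)
}
```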

Other files

Besides the output HTML, we know we need to generate a _targets.R file. Beyond that, there is an opportunity to enforce some degree of modularity. But I hesitate to do so because there is no one-size-fits-all way to organize files, the report does not provide enough structure anyway, and the unpredictable file names might disrupt projects. At the risk of creating a monolithic dumping ground for messy code (a vice which literate programming enables with impunity) I think it would be okay to go with a single _targets.R file, at least to start.

3 replies
@wlandau

Guardrails

  1. The generated _targets.R file should be treated as read-only in the RStudio IDE, e.g. begin it with the comment "# Generated by targets::tar_renv(): do not edit by hand".
  2. We should have some way to nudge users away from creating multiple pipeline-generating notebooks in the same working directory.
@nviets

Thanks @wlandau ! For generating _targets.R, it might be worth looking into knitr's purl, which allows code from chunks to be selectively written out to an external file.
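For reference, a minimal purl() call (assuming a report.Rmd in the working directory):

```r
# knitr::purl() extracts the code chunks of an R Markdown file into a
# script; documentation = 0 drops the prose and keeps only the code.
knitr::purl("report.Rmd", output = "_targets.R", documentation = 0)
```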

@wlandau

Yeah, on reflection this feature may end up looking something like a glorified purl, depending on what turns out to be possible through output and language engines.

Some chunk options (e.g. pattern) should be taken as language objects and not evaluated as R code. I can make that happen in non-interactive mode using the eval.after knit option and knitr::engine_output(). But if I run chunks in the RStudio IDE, options like pattern are executed like ordinary R code. Any ideas on how to suppress this? Filed an issue at rstudio/rstudio#9407.

[Screenshot: the RStudio IDE evaluating the pattern chunk option as ordinary R code]

21 replies
@cderv

How is pattern processed?

I believe chunk options in knitr are supposed to be evaluated. Using eval.after is something of a trick: it is meant only to delay evaluation (e.g. a figure caption that uses an object defined in the chunk), not to cancel evaluation by suppressing the option during code processing. Chunk options are just R code, so an option that should evaluate to a language object should be a language object in R too, shouldn't it?

I am trying to understand better how this should work at a low level.

# error like in chunk
option1 <- map(data)
#> Error in map(data): could not find function "map"
# object
option2 <- expression(map(data))
option2
#> expression(map(data))
is.language(option2)
#> [1] TRUE
# rlang ? 
option3 <- rlang::expr(map(data))
option3
#> map(data)
is.language(option3)
#> [1] TRUE

Created on 2021-05-25 by the reprex package (v2.0.0)

What you're looking for feels like an NSE chunk option? Something like that?

An idea you may have tried already, and maybe not helpful: could an options hook help you process the option the way you want?

I'll look closer at the engine you wrote to see how this is processed.

By the way, what you encounter with the IDE is an IDE thing: eval.after does not work there the way you would expect from knitr. I believe chunk options are evaluated as code by the interactive chunk mechanism, as you saw in the code.

@wlandau

The pattern argument drives dynamic branching: https://books.ropensci.org/targets/dynamic.html. It is a DSL that allows users to define new targets while the pipeline is running. We do not know in advance which targets or how many targets will be created because that all depends on upstream targets. In the following example, y is a dynamic target that depends on target x.

# _targets.R file
library(targets)
list(
  tar_target(x, seq_len(sample.int(10, 1))),
  tar_target(y, 100 * x, pattern = map(x))
)

We cannot predict precisely how many y branches we will have, but we do know that we will end up with one branch of y for every vector element of x.

# R console
tar_make()
#> • start target x
#> • built target x
#> • start branch y_29239c8a
#> • built branch y_29239c8a
#> • start branch y_7cc32924
#> • built branch y_7cc32924
#> • start branch y_bd602d50
#> • built branch y_bd602d50
#> • built pattern y
#> • end pipeline

When a target is defined, pattern = map(x) is quoted and turned into an expression object.

target_object <- tar_target(y, 100 * x, pattern = map(x))
target_object$settings$pattern
#> expression(map(x))

The expression map(x) does not actually run until the pipeline is running and it is time to process target y (only after processing target x). And when it does run, it runs only in a special environment that defines special versions of map(), cross(), etc. that act on special representations of upstream dependency targets (each dependency target is represented as a one-column data frame of branch/bud names).

So the pattern argument is a DSL that needs special delayed processing.
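A simplified illustration of that delayed processing in base R (this is not the actual targets internals): capture the pattern unevaluated, then evaluate it later in an environment that supplies its own map().

```r
# Capture the pattern expression without evaluating it.
capture_pattern <- function(pattern) substitute(pattern)
p <- capture_pattern(map(x))
is.language(p)
#> [1] TRUE

# Much later, evaluate it against a special environment whose map()
# stands in for the real branching logic.
dsl <- new.env()
dsl$map <- function(x) paste0("branch for each of ", length(x), " elements")
eval(p, envir = list(x = 1:3), enclos = dsl)
#> [1] "branch for each of 3 elements"
```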

For Target Markdown code chunks, I thought I had a workaround for interactive mode:

  1. Set pattern in eval.after.
  2. Take control of how options$code is evaluated, so evaluation behaves like a target.

envir_knitr <- knitr::knit_global()
envir <- new.env(parent = envir_knitr)
expr <- parse(text = options$code)
tidy_eval <- options$tidy_eval %|||% TRUE
expr <- tar_tidy_eval(expr = expr, envir = envir, tidy_eval = tidy_eval)
value <- eval(expr, envir = envir)
assign(x = name, value = value, envir = envir_knitr)

  3. Set eval = FALSE for further processing so the eval.after chunk options like pattern never get evaluated.

options$eval <- FALSE

  4. Use knitr::engine_output() to wrap up:

knitr::engine_output(options = options, code = options$code, out = out)

But as you know from rstudio/rstudio#9407, IDEs may ignore eval.after and incorrectly evaluate pattern.

Maybe this snag is minor: not all targets use dynamic branching, and non-interactive mode is totally unaffected. Still, other target factories may have quoted arguments, and this same issue could present other problems. With pattern and potential arguments like it, I have to make a decision:

  1. Go forward as planned, with the understanding that pattern will break, with no ability to throw an informative error message (because the engine breaks). Or,
  2. Encourage users to put language arguments in quotes, e.g. {tar_target y, pattern = "map(x)"}. Or,
  3. Avoid chunk options altogether and choose a different interface for supplying target arguments.

I don't like (3) because chunk options are idiomatic and mostly convenient, despite the bugs and limitations. And I don't like (2) because the pattern argument of the real tar_target() function does not accept character strings, and this little inconsistency could confuse users. At least with (1), users will see an error in the console in interactive mode, and there may still be some sort of workaround. If we could at least get ahead of the chunk option processing just to prevent errors, then we could put the engines in control and be home free.

@wlandau

Idea you may have tried and maybe not helpful: Does options hook could help you process the option the way you want?

I tried, but it does not seem to help. I am having trouble getting option hooks to run in interactive mode. In the following report, chunk b does not call browser() as I would have expected based on chunk a.

---
title: "test"
output: html_document
---

```{r a}
knitr::opts_hooks$set(
  opt = function(options) {
    browser()
  }
)
```

```{r b, opt = "value"}
1
```

I love how this feature turned out! Resources:

My last question is about syntax highlighting. Is there a way to use the same syntax highlighting for the {targets} engine as the {r} engine?

3 replies
@cderv

In the RStudio IDE, this is a matter of IDE support. I am not sure how it detects which language to highlight in the chunk, but I suspect it uses the chunk engine.

Regarding the HTML output, the markup for the source code is derived from the engine name, so this is a matter of what your engine outputs. I would need to look at your code to know the specifics in your case, but basically you may be able to modify the relevant option after your engine has done its work.

Or we need to teach knitr about your engine name 😉

@cderv

So I believe your engine is this one

targets/R/tar_engine.R

Lines 68 to 71 in 82ead80

tar_engine_output <- function(options, out) {
  code <- paste(options$code, collapse = "\n")
  knitr::engine_output(options = options, code = code, out = out)
}

So you could do

tar_engine_output <- function(options, out) {
  code <- paste(options$code, collapse = "\n")
+ # to get the correct markup later in knitr's source hook
+ options$engine <- "r"
  knitr::engine_output(options = options, code = code, out = out)
}

This happens after your engine has done its work, so I don't think changing the engine will affect how the code is evaluated. It will just be used for further processing, and deciding which class to apply to the fenced code block's attributes in the Markdown output is one of those steps.

I am a bit lazy as it is Friday night here in France, so I did not try it :) Shall I let you try?

Another solution would be for you to set the class.source option to r, either through your engine or using an option hook (though it seems the targets chunk option is not meant to be provided each time):
https://bookdown.org/yihui/rmarkdown-cookbook/chunk-styling.html#chunk-styling

Adding the class is what triggers the Pandoc highlighting (https://pandoc.org/MANUAL.html#fenced-code-blocks)

If this does not work as I thought, we'll surely be able to do something (maybe adding a language option or similar).

@wlandau

Thanks so much, Christophe! options$engine <- "r" is perfect! Works out of the box.
