Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What should make_with_*() return? #2

Closed
kinto-b opened this issue Apr 23, 2021 · 12 comments
Closed

What should make_with_*() return? #2

kinto-b opened this issue Apr 23, 2021 · 12 comments
Labels
question Further information is requested
Milestone

Comments

@kinto-b
Copy link
Owner

kinto-b commented Apr 23, 2021

The options are:

  • NULL invisibly
  • TRUE/FALSE invisibly depending on whether or not the targets were up-to-date
  • NULL or the result of executing the recipe. Note that this only applies to make_with_recipe, since whether the result(s) of sourcing the source are attached to the global environment depends on the arguments passed through to source()
@kinto-b kinto-b added the question Further information is requested label Apr 23, 2021
@kinto-b
Copy link
Owner Author

kinto-b commented May 20, 2021

I just had a thought: introduce an on_exit argument which stipulates what is returned or an if_uptodate argument which stipulates what is returned if the targets are up to date. Example:

make_with_recipe(
  dependencies = RAW_DATA, 
  targets = FINAL_DATA,
  recipe = {
    raw_dat <- readRDS(RAW_DATA)
    out <- do_stuff(raw_dat)
    saveRDS(out, FINAL_DATA)
    out
  },
  if_uptodate = {
    readRDS(FINAL_DATA)
  }
)

@kinto-b kinto-b added this to the CRAN/0.0.3 milestone Sep 14, 2021
@kinto-b
Copy link
Owner Author

kinto-b commented Sep 14, 2021

@gorcha I can't make my mind up about this. Do you have any advice? Currently I've got:

  • make_with_recipe() invisibly returns the result of evaluating the recipe if the recipe is executed and NULL otherwise
  • make_with_source() invisibly returns NULL always, though by default whatever names are bound in the sourced script will be attached to the global environment.

@gorcha
Copy link

gorcha commented Sep 14, 2021

Yo! It seems to me like this package is mostly about flow control, so I think it makes more sense to be TRUE or FALSE (TRUE if the data is out of date and the step runs).

A couple of associated things:

  • Instead of using the usual source() default you should add an envir argument to make_with_source(), so the signature is the same as make_as_recipe() and they both have the same default evaluation behaviour.
  • Would it be better to run in a fresh environment by default so there are no side effects from a make call and it's not affected by other goings on in the calling context?

@kinto-b
Copy link
Owner Author

kinto-b commented Sep 15, 2021

@gorcha Thanks for your input! Great point about the envir argument. It doesn't make sense for make_with_source() to evaluate in the global by default while make_with_recipe() evaluates in the parent.frame().

I've just had a thought, what if make_*() invisibly returns the evaluation environment? That way, if any evaluation goes on, the names bound will all be available to the user should they need them.

The one nice thing about the current behavior is that you can get a comparison object only when it's needed:

cmp <- make_with_recipe(
  targets = blah,
  dependencies = bleh,
  recipe = {
    dat <- do_stuff()
    srcutils::write_df_rds(dat, blah, 'key')
  }
)

@gorcha
Copy link

gorcha commented Sep 15, 2021

That would be cool, but it potentially keeps a bunch of stuff in memory unnecessarily.
For e.g. the example below, you suddenly have all objects produced from the 4 script runs in memory...

res1 <- make_with_source(...)
res2 <- make_with_source(...)
res3 <- make_with_source(...)
res4 <- make_with_source(...)

Maybe it would be good to have a "return evaluation environment" TRUE/FALSE kind of argument for debugging?

You could possibly just return the final value like a function (would that work with source()?) instead, but if you want to be able to pass specific things back to the calling environment I think it'd be better thinking about a more formal mechanism (maybe having a function that can be called from inside the script?).

This begs the question - would it be better to have a more defined return type, like a make result object that stores result metadata? It'd be good for storing more detailed info about the outcome of the run, like:

  • Was it run (outdated targets or not)?
  • Did it run successfully? (e.g. no errors)
  • The aforementioned additional objects somehow returned from the executing expression/script

You could also attach this to the pipeline somehow so you can show status kind of like targets 😉

@kinto-b
Copy link
Owner Author

kinto-b commented Sep 16, 2021

Well, you've always got the option of not binding the returned object. So, if the default behavior is to run the sourced script in a fresh environment whose parent is parent.frame(), then

make_with_source(...)
make_with_source(...)
make_with_source(...)
make_with_source(...)

would be cleaned up by the garbage collector. Or is there some memory allocation detail I'm forgetting?

Good idea about attaching execution information to the pipeline! I'm going to open up a separate issue for that as I'll do it no matter what.

Implementing a returner is intriguing. By default source() appears to return whatever is evaluated on the final line of the sourced file, which is not ideal.

I suppose the make returner could work like this:

make_return <- function(x) {
    signalCondition(rlang::cnd("makepipe_return_cnd", res = x))
    invisible(x)
}

make_with_source <- function(...) {
  ...
  out <- tryCatch(
    source(...),
    makepipe_return_cnd = function(x) {
        x$res
    }
  )
 ... 
 
  out
}

This would replicate the early exit behaviour of return()

@kinto-b
Copy link
Owner Author

kinto-b commented Sep 16, 2021

Yeah, I reckon make_return() is the best go. It unifies both _recipe() and _source() variants in a neat way

@kinto-b
Copy link
Owner Author

kinto-b commented Sep 16, 2021

@gorcha Thanks for your help!

@gorcha
Copy link

gorcha commented Sep 16, 2021

Hey @kinto-b,

What I meant with the environments keeping stuff in memory is that a pretty natural pattern is to store the results of the make in an object to check whether it ran, and if you return the evaluation environment it's easy to unwittingly hold onto it.

Re: passing things back, I was thinking about passing back multiple objects rather than a single return value, like a make_register() or make_keep() style thing.

E.g. if this is my source script "blah.R"

x <- readRDS("blah.rds")
x %<>% do_things()

x %>% check_me() %>% make_register("x_check")
x %>% check_me_more() %>% make_register("x_check2")

x %>% do_more_things %>% writeRDS("blah_out.rds")

Then I run it, and have access to the objects as part of the result:

res <- make_with_source("blah.R", ...)

res$objects$x_check
res$objects$x_check2

This could be handled via an environment, so make_with_* attaches an environment to the executing environment, which is used by make_register().

make_with_source <- function(...) {
  ...
  out_envir <- new.env(parent = emptyenv())
  assign("__make_register__", out_envir, envir)
  source(...)
  ... 

  out$objects <- out_envir
  out
}

make_register <- function(value, name) {
  # check if __make_register__ exists first and throw a warning if not for non-make contexts
  assign(name, value, `__make_register__`)
  invisible(value)
}

@kinto-b
Copy link
Owner Author

kinto-b commented Sep 16, 2021

@gorcha Yo! I did consider that approach, but it seems to me that a return()-like function would be preferable since:

a) It means the object returned by make_*() is a bit easier to work with in the case when you just want to return one value:

dat <- make_with_source(...)
dat %>% mutate(...) %>% etc()

b) It still allows you to return multiple objects in the usual way:

# This is my source script, blah.R
...
make_return(list(x = x, y = y))

c) It cleanly separates metadata (held in the Pipeline) from data

But what are your thoughts? One advantage I can think of for make_register() is that it makes it more obvious that what is registered isn't necessarily related to what the targets are

@gorcha
Copy link

gorcha commented Sep 16, 2021

I like the register kind of approach because it's more explicit and localised (i.e. you can signal the registration when the object is created rather than having to do it all at the bottom of the script).

And it makes more sense to me to return a result object from the make call that prints nicely so you can look at the other metadata as well (I guess basically the stuff that would be attached to the pipeline). Returning the final line of the file seems eminently reasonable in make_with_recipe() but a bit weird in make_with_source() since there's so much separation between the make call and the actual script.

@kinto-b
Copy link
Owner Author

kinto-b commented Sep 17, 2021

Yeah, that's a good point. I might have a play around with a few options and see what feels most natural. Going to reopen this issue for now.

@kinto-b kinto-b reopened this Sep 17, 2021
kinto-b added a commit that referenced this issue Oct 17, 2021
##General cleanup

* Expanded CI
* Expand tests (closes #12) 
* Rename package (closes #11)

## Execution and return behaviour

* Implement `makepipe_result` class and `make_register()` (closes #2, closes #14)
* Add `envir` argument to `make_with_source()` (closes #16)
* Force `make_*()` evaluation to, by default, occur in a fresh environment which is a child of the execution environment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants