Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for python objects #37

Merged
merged 5 commits into from
Jun 29, 2022
Merged

Add support for python objects #37

merged 5 commits into from
Jun 29, 2022

Conversation

t-kalinowski
Copy link
Contributor

This PR adds an as_iterator() method for a base python object. This enables usage like this with TensorFlow Datasets:

ds <- ...

loop(for(batch in ds) {
  ...
})

Without this, users need to instead explicitly call reticulate::as_iterator(ds), before passing to coro::loop(), which can be confusing because of the name clash between coro::as_iterator(), and also not necessary for most other iterables that are automatically converted to an iterator.

R/iterator.R Outdated
#' @exportS3Method
as_iterator.python.builtin.iterator <- function(x) {
force(x)
function() reticulate::iter_next(x, completed = quote(.__exhausted__.))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be created using the exhaustion constructor. At the minimum it should be sym(".__exhausted__."). The .__exhausted__. symbol should never appear in code, in case the objects of a namespace are iterated over.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a good reason I hadn't considered. I generally try to avoid micro-optimizations like this, but introducing to as.symbol() overhead into tight for loops seemed like an easy win:

plot(print(bench::mark(
  quote(.__exhausted__.), 
  as.symbol(".__exhausted__."))))
#> # A tibble: 2 × 13
#>   expression                       min median `itr/sec` mem_alloc `gc/sec` n_itr
#>   <bch:expr>                   <bch:t> <bch:>     <dbl> <bch:byt>    <dbl> <int>
#> 1 quote(.__exhausted__.)             0    1ns    1.29e9        0B        0 10000
#> 2 as.symbol(".__exhausted__.")    82ns  164ns    5.68e6        0B        0 10000
#> # … with 6 more variables: n_gc <dbl>, total_time <bch:tm>, result <list>,
#> #   memory <list>, time <list>, gc <list>
#> Loading required namespace: tidyr
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 2678 rows containing missing values (geom_point).

Created on 2022-06-29 by the reprex package (v2.0.1)

Would you like me to change back to as.symbol(".__exhausted__.")? Or maybe create the symbol once in the closure then use that?

as_iterator.python.builtin.iterator <- function(x) {
  force(x)
  exhausted <- as.symbol(".__exhausted__.")
  function() reticulate::iter_next(x, completed = exhausted)
}

Unfortunately, reticulate::iter_next() forces the completed argument on each call.

Copy link
Member

@lionel- lionel- Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe create the symbol once in the closure then use that?

This should also be avoided for the same reason. I'd probably just use coro::exhausted(). In R, generators and iterators are not expected to be used for tight loops.

R/iterator.R Outdated
}))


#' @exportS3Method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not #' @export?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong reason, happy to change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup I'd prefer @export for coding style consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@@ -136,12 +136,16 @@ as_iterator.default <- function(x) {
}
}

on_load(on_package_load("reticulate", {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it reticulate doesn't plan to add these methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think {coro} is a better home for these methods. reticulate already provides an as_iterator() function which is not a generic and does not adhere to the coro iterator protocol. Exporting a method for a same-named generic in reticulate invites confusion, and also the complexity of an awkward delayed registration dance that can be completely avoided if these methods live here.

@t-kalinowski
Copy link
Contributor Author

I think we don't even need a separate method for python.builtin.iterator, the method for python.builtin.object covers all cases. I'm going to remove it.

@lionel-
Copy link
Member

lionel- commented Jun 29, 2022

Now this is just missing a NEWS bullet :)

@t-kalinowski
Copy link
Contributor Author

Done!

@lionel- lionel- merged commit 8f4eec2 into r-lib:main Jun 29, 2022
@lionel-
Copy link
Member

lionel- commented Jun 29, 2022

Thanks! Do you need that on CRAN? If yes, I'll probably release right away, otherwise this might sit on github for months. Do you have any other planned changes for coro?

@t-kalinowski
Copy link
Contributor Author

t-kalinowski commented Jun 29, 2022

This came out of working on updates to the Tensorflow website, which is still ongoing work. I don’t have any other coro enhancements planned presently, but other things might still shake out in the next two weeks while we’re updating the TF examples. @dfalbel what do you think?

@t-kalinowski
Copy link
Contributor Author

@lionel- If it's ok with you, let's wait ~ 2 weeks. I'll ping you when we need this on CRAN.

@lionel-
Copy link
Member

lionel- commented Jun 29, 2022

That sounds good.

@t-kalinowski
Copy link
Contributor Author

Hi @lionel-, the dev version is working great for us and I don't think we'll have any additional PRs to coro in the near term. When you get a chance, can you please publish to CRAN?

@lionel-
Copy link
Member

lionel- commented Jul 19, 2022

Great, I'll do that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants