Right way to use 'fit_generator' in R? #41

Closed
haven-jeon opened this Issue Jun 14, 2017 · 6 comments

haven-jeon commented Jun 14, 2017

R doesn't have yield or built-in iterators. How can I write a generator function for fit_generator?

Member

jjallaire commented Jun 14, 2017

I think we could do something to automatically create a Python iterator/generator out of an itertools iterator (https://cran.r-project.org/web/packages/itertools/itertools.pdf). @hadley Do you think that's a reasonable approach? Are there other iterator/generator packages for R we should consider?

Member

hadley commented Jun 14, 2017

Another option I've considered: it might be possible to make our own generators using a special function that replaces `function` and sets up the environment correctly, so that yield could work like a Python generator. But that requires quite a bit of thinking and experimenting, and might ultimately fail, so I'd say using itertools is a reasonable place to start.

Member

jjallaire commented Jun 15, 2017

Looking a bit closer at this, I think that for now fit_generator is restricted to the Keras built-in generators (e.g. flow_images_from_directory). This is because Keras currently runs generators either in background threads (the default) or using multiprocessing with pickle serialization. Neither of these will be compatible with R-based generators.

haven-jeon commented Jun 16, 2017

I also tried building an R iterator with itertools and other packages, but none of those attempts succeeded. One approach to consider is a Python generator wrapper that can include R code, though it may not be an elegant solution.

@jjallaire jjallaire closed this Jun 19, 2017

dselivanov commented Jul 5, 2017

A little off-topic: @jjallaire, my experience with itertools and iterators has not been great (see, for example, this SO question). I ended up reimplementing them with R6, as @hadley suggested (thanks a lot!). I believe R deserves a redesigned iterators2 package.

Member

jjallaire commented Jul 6, 2017

I just figured out how to surmount the threading issues associated with how Keras calls Python generators. In short, Keras calls generators on a background thread which won't work with R because all R code must run in the same thread. The workaround is to marshal the calls from the background thread to the foreground thread (this has been implemented in the reticulate package).
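To make the marshaling idea concrete, here is a minimal Python sketch of the general pattern. It is not the reticulate implementation; the names `make_marshaled`, `serve`, and `next_batch` are hypothetical and exist only for illustration. A background thread asks for data through a queue, and the thread that runs `serve` (standing in for the thread that owns the R interpreter) executes the call and sends the result back. Error handling is omitted.

```python
import queue
import threading

def make_marshaled(fn, requests):
    """Wrap fn so that calling the wrapper runs fn on the serving thread."""
    def wrapper():
        reply = queue.Queue(maxsize=1)
        requests.put((fn, reply))  # hand the call over to the serving thread
        return reply.get()         # block until the result comes back
    return wrapper

def serve(requests, stop):
    """Execute queued calls on the current thread until stop is set."""
    while not stop.is_set():
        try:
            fn, reply = requests.get(timeout=0.05)
        except queue.Empty:
            continue
        # Return the value together with the id of the thread that ran fn.
        reply.put((fn(), threading.get_ident()))

counter = 0

def next_batch():
    """Stand-in for a generator function that must run on one thread only."""
    global counter
    counter += 1
    return counter

requests = queue.Queue()
stop = threading.Event()
marshaled = make_marshaled(next_batch, requests)
results = []

def consumer():
    # Plays the role of the background thread that pulls batches.
    for _ in range(3):
        results.append(marshaled())
    stop.set()

serving_thread = threading.get_ident()
worker = threading.Thread(target=consumer)
worker.start()
serve(requests, stop)  # the foreground thread services the calls
worker.join()
```

Every entry in `results` carries the id of the serving thread, which shows that `next_batch` never ran on the background thread even though that thread requested each value.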

Updated documentation on using R functions as generators is here: https://rstudio.github.io/keras/articles/faq.html#how-can-i-use-keras-with-datasets-that-dont-fit-in-memory

Note that generators are just R functions that return a value: https://rstudio.github.io/reticulate/articles/introduction.html#generators. The scheme is similar to that described here: https://cartesianfaith.com/2013/01/05/infinite-generators-in-r/ (using <<- to update state within the closure where the generator is defined).

We use a NULL return value (or any sentinel value you specify) to indicate that iteration is complete. As it so happens this doesn't actually come into play for Keras since it expects generators to yield data infinitely.
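The same scheme (a plain function that keeps state in a closure and returns a sentinel when it is done) can be sketched in Python, where `nonlocal` plays the role of `<<-` and the built-in `iter(callable, sentinel)` turns such a function into an iterator. This is only an analogy to illustrate the idea, not how reticulate converts R functions; `make_batch_function` is a hypothetical helper.

```python
def make_batch_function(data, batch_size):
    """Return a function that yields successive batches, then None."""
    position = 0  # state lives in the enclosing scope, like <<- in R

    def next_batch():
        nonlocal position
        if position >= len(data):
            return None  # sentinel: iteration is complete
        batch = data[position:position + batch_size]
        position += batch_size
        return batch

    return next_batch

# iter(callable, sentinel) calls the function until it returns the sentinel.
batches = list(iter(make_batch_function([1, 2, 3, 4, 5], 2), None))
```

Here `batches` ends up as `[[1, 2], [3, 4], [5]]`. A generator meant for Keras would instead wrap around to the start of the data rather than return the sentinel, since Keras expects an endless stream.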

@hadley If we do decide to implement a yield-based generator as you described, we can also easily make that compatible with reticulate/keras.
