Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

py_to_r() fails when when importing a python module with an alias #319

Closed
hitfuture opened this issue Jul 23, 2018 · 12 comments
Closed

py_to_r() fails when when importing a python module with an alias #319

hitfuture opened this issue Jul 23, 2018 · 12 comments

Comments

@hitfuture
Copy link

hitfuture commented Jul 23, 2018

I am using the reticulate package to integrate Python into an R package I'm building (hitfuture/Neo4jDriveR). One of the capabilities I need is to return R data.frames from a method in the R6 based object model. I utilize Python Pandas package to create a DataFrame in the reticulate Python environment. My objective is to return an R data.frame. The issue I'm seeing is that when I used reticulate::py_to_r(df) it does not convert to R and instead it returns a python DataFrame object. The problem is based on the way reticulate::import() is utilized.

I had forked reticulate into my github repository so I am using the latest version (1.9.0.9001). I see that there are well defined S3 methods to handle pandas DataFrame conversion in the reticulate py_to_r() S3 class (e.g. py_to_r.pandas.core.frame.DataFrame). The following test executes correctly in a new R session.

library(reticulate)
library(testthat)
 
pd <- import("pandas")
py_config()
## python:         /usr/local/bin/python3
## libpython:      /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/config-3.6m-darwin/libpython3.6.dylib
## pythonhome:     /Library/Frameworks/Python.framework/Versions/3.6:/Library/Frameworks/Python.framework/Versions/3.6
## version:        3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31)  [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
## numpy:          /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/numpy
## numpy_version:  1.14.5
## pandas:         /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas
## 
## NOTE: Python version was forced by RETICULATE_PYTHON
before <- iris
expect_is(before,class = "data.frame")
convert <- r_to_py(before)
expect_is(convert,class = "pandas.core.frame.DataFrame")
print(class(convert))
## [1] "pandas.core.frame.DataFrame"       
## [2] "pandas.core.generic.NDFrame"       
## [3] "pandas.core.base.PandasObject"     
## [4] "pandas.core.base.StringMixin"      
## [5] "pandas.core.accessor.DirNamesMixin"
## [6] "pandas.core.base.SelectionMixin"   
## [7] "python.builtin.object"
after  <- py_to_r(convert)

expect_is(after,class = "data.frame")

The failure occurs when I utilize the function 'reticulate::import("pandas", as="pd")' with the as parameter. You can see below that the pandas.DataFrame is not converted into an R data.frame. The problem is due to the S3 method for the pandas DataFrame not matching based on the name of the python module.

pd <- import("pandas",as = "pd")
py_config()
## python:         /usr/local/bin/python3
## libpython:      /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/config-3.6m-darwin/libpython3.6.dylib
## pythonhome:     /Library/Frameworks/Python.framework/Versions/3.6:/Library/Frameworks/Python.framework/Versions/3.6
## version:        3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 05:52:31)  [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
## numpy:          /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/numpy
## numpy_version:  1.14.5
## pandas:         /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas
## 
## NOTE: Python version was forced by RETICULATE_PYTHON
before <- iris 
expect_is(before,class = "data.frame")

convert <- r_to_py(before)
expect_is(convert,class = "pd.core.frame.DataFrame")

after  <- py_to_r(convert)
print(class(after))
## [1] "pd.core.frame.DataFrame"        "pd.core.generic.NDFrame"       
## [3] "pd.core.base.PandasObject"      "pd.core.base.StringMixin"      
## [5] "pd.core.accessor.DirNamesMixin" "pd.core.base.SelectionMixin"   
## [7] "python.builtin.object"
#expect_is(after,class = "data.frame")

I have tested this on two different Docker containers, and on my MacBook Pro where the same error occurs. The problem is based on the fact that if you assign an Alias in the import() function, the S3 methods will not be correctly selected. This will apply to all py_to_r conversions if the alias is not equivalent to the default name. There is not an easy solution to solve this with the S3 methodology. It would be good if the S3 methods are exported in the reticulate package so this can be handled manually. This should also be documented.

Thanks,
Brett

@jjallaire
Copy link
Member

jjallaire commented Jul 23, 2018

The as parameter is there for normalizing Python module names if they could be implemented by multiple sources (specifically, we use this to have a shared set of R S3 methods for the base keras implementation and the implementation built into tensorflow). I would say don't use as at all if it's causing you problems, and if you do need to use it and there are further issues, just define multiple S3 methods that handle all the cases you need.

If I'm misunderstanding the issue and you can see your way to another resolution do let me know as I'm not sure I follow the issue you are seeing fully!

@hitfuture
Copy link
Author

hitfuture commented Jul 23, 2018

I fixed the problem by removing the as parameter within the package I'm building once I discovered this issue. The main problem I see is that most Python users import these libraries with an alias and they need to understand what you just stated. The py_to_r function will not apply if you use an alias. This applies to any S3 based methods in your reticulate. I think adding this information into your vignettes or other documentation would be useful. This package is incredibly useful and I am very happy with it overall. You can see my opinion here Python in R Thanks for your quick response.

@jjallaire
Copy link
Member

jjallaire commented Jul 24, 2018

The docs on the as parameter is as follows: "Alias for module name (affects names of R classes)"

This certainly does allude to the fact (albeit somewhat obliquely) that S3 method dispatch will be affected.

Nevertheless, I added some additional docs cautioning users against using this parameter outside of package development.

@kevinushey
Copy link
Collaborator

kevinushey commented Jul 24, 2018

I suspect naming the parameter as might give Python users the impression that it operates the same as one might write in Python with e.g.

import pandas as pd

whereas that construct is more accurately reflected on the R side with a plain old

pd <- import("panadas")

It might be more appropriate to call that parameter something like alias just to make it clear it's not intended to be some R analogue to existing Python behavior (of course it might be too late for that now)

@jjallaire
Copy link
Member

jjallaire commented Jul 24, 2018

Yeah, if I rename it alias then it will break existing clients :-(

@hitfuture
Copy link
Author

hitfuture commented Jul 24, 2018

FYI - made a change to reticulate::import() function which renames the S3 methods associated with the module that is being renamed to the alias. This allowed all related methods to function. There is much more testing and I'm sure a better way of doing this that would have to occur yet I think the idea is correct. Take a look if you want at the reticulate::import method and the additional function add_functions_by_alias() I added in.

https://raw.githubusercontent.com/hitfuture/reticulate/fix/alias_functions/R/python.R

@jjallaire
Copy link
Member

jjallaire commented Jul 25, 2018

I don't think I want to be dynamically renaming methods though.

The real question is what does the R user that specifies as hope to accomplish by doing so? It doesn't have the equivalent semantics as Python does (as that is handled by the assignment as @kevinushey points out). We explicitly say in the docs that you are changing S3 dispatch when you use the parameter and that it's an advanced parameter that should only be used in packages. I think this should suffice!

@hitfuture
Copy link
Author

hitfuture commented Jul 25, 2018

Your answer makes sense. I'm not having any problems with reticulate in the package I'm building since I realized what was happening. It is interesting having been trained in Python over the last 3 months where I was required to use pd for pandas and np for numpy which I also used in reticulate because of all of the online education from DataCamp. Thanks again for the feedback and documentation is all that matters here. Reticulate is a huge win for the language R!

@jjallaire
Copy link
Member

jjallaire commented Jul 25, 2018

Okay, great. Glad it's working so well for you!

@alyamahmoud
Copy link

alyamahmoud commented Dec 6, 2019

I run into the same error even though I import the module without the use of 'as' or 'alias'

reticulate::import("pandas")
dat <- py_load_object('dat.p') # pickle object
dat2List <- reshape2::melt(dat)

Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class ‘c("python.builtin.set", "python.builtin.object")’ to a data.frame

any suggestions ?
reticulate is absolutely awesome !

@kevinushey
Copy link
Collaborator

kevinushey commented Dec 6, 2019

What kind of Python object is returned by:

dat <- py_load_object('dat.p') # pickle object

? I suspect you'll need to use py_to_r() and then possibly transform it to ensure it is indeed an R data.frame.

@skeydan
Copy link
Collaborator

skeydan commented Dec 6, 2019

(((Was just about to answer as well :-)))

I recently used py_load_object "in the wild" and the returned object was a list (it couldn't even have been a data.frame because the object contained columns of different length...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants