cannot coerce pandas dataframe to R dataframe #101

Open
VyshaliEnukonda opened this Issue Sep 21, 2017 · 7 comments

VyshaliEnukonda commented Sep 21, 2017


```
> use_virtualenv("C:\\Users\\venukond\\AppData\\Local\\Continuum\\Anaconda3\\envs\\py35\\")
> py_config()
python:         C:\Users\venukond\AppData\Local\CONTIN~1\ANACON~1\python.exe
libpython:      C:/Users/venukond/AppData/Local/CONTIN~1/ANACON~1/python35.dll
pythonhome:     C:\Users\venukond\AppData\Local\CONTIN~1\ANACON~1
version:        3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
Architecture:   64bit
numpy:          C:\Users\venukond\AppData\Local\CONTIN~1\ANACON~1\lib\site-packages\numpy
numpy_version:  1.11.1
pandas:         C:\Users\venukond\AppData\Local\CONTIN~1\ANACON~1\lib\site-packages\pandas

python versions found: 
 C:\Users\venukond\AppData\Local\CONTIN~1\ANACON~1\python.exe
 C:\Users\venukond\AppData\Roaming\BLACKR~1\python\3420B8~1.3\python.exe
 C:\Users\venukond\AppData\Local\Continuum\Anaconda3\envs\py34\python.exe
 C:\Users\venukond\AppData\Local\Continuum\Anaconda3\envs\py34_test\python.exe
 C:\Users\venukond\AppData\Local\Continuum\Anaconda3\envs\py35\python.exe
> pd <- import("pandas")
> library(reticulate)
> pd <- import("pandas")
> np <- import("numpy")
> 
> 
> df <- pd$DataFrame(
+     list(
+         'A' = 1.,
+         'B' = pd$Timestamp('20130102'),
+         'C' = pd$Series(1, index = seq(4), dtype = 'float32'),
+         'D' = np$array(rep(3L, 4), dtype='int32'),
+         'E' = pd$Categorical(c("test","train","test","train")),
+         'F' = 'foo'
+     )
+ )
> class(df)
[1] "pandas.core.frame.DataFrame"   "pandas.core.generic.NDFrame"   "pandas.core.base.PandasObject" "pandas.core.base.StringMixin" 
[5] "python.builtin.object"        
> as.data.frame(df)
Error in as.data.frame.default(df) : 
  c("cannot coerce class \"c(\"pandas.core.frame.DataFrame\", \"pandas.core.generic.NDFrame\", \" to a data.frame", "cannot coerce class \"\"pandas.core.base.PandasObject\", \"pandas.core.base.StringMixin\", \" to a data.frame", "cannot coerce class \"\"python.builtin.object\")\" to a data.frame")
```

```
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] reticulate_1.1

loaded via a namespace (and not attached):
[1] tools_3.3.0  Rcpp_0.12.9  jsonlite_1.2
>
```

jjallaire commented Sep 22, 2017

That's correct, we don't currently have support for marshaling data frames. We do plan on working on this in the future; in the meantime, you can convert the slice(s) of the data frame you want to access in R into NumPy arrays: https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array-preserving-index
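
A minimal sketch of what that workaround might look like, assuming a Python installation with pandas that reticulate can find. The column names and values are made up for illustration, and `convert = FALSE` is used so the objects stay on the Python side until `py_to_r()` is called explicitly:

```r
library(reticulate)

# convert = FALSE keeps results as Python objects until py_to_r() is called
pd <- import("pandas", convert = FALSE)

# Hypothetical example data; any pandas DataFrame would do
py_df <- pd$DataFrame(dict(height = c(1.5, 2.5, 3.5), weight = c(10, 20, 30)))

# One column: take its underlying NumPy array and convert that to R
height <- as.numeric(py_to_r(py_df$height$values))

# The whole (all-numeric) frame: a 2-D NumPy array becomes an R matrix
m <- py_to_r(py_df$values)
dim(m)
```

Column names and the index are not carried over this way, so they would need to be fetched separately if required.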

kenahoo commented Oct 5, 2017

Hi @jjallaire, I don't quite see what your suggestion means - is it the `df.reset_index().values.ravel().view(dtype=[('index', int), ('A', float), ('B', float), ('C', float)])` thing in meteore's answer? Or the `df_to_sarray()` in Phil's answer maybe?

mkoohafkan commented Nov 25, 2017

Another workaround is to use JSON:

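```
library(reticulate)
pandas = import('pandas')
numpy = import('numpy')
numpy.random = import('numpy.random')
# size uses integer literals (5L): plain R numerics reach Python as floats
p.df = pandas$DataFrame(numpy.random$randint(low = 0, high = 10, size = c(5L, 5L)),
  columns = c('a', 'b', 'c', 'd', 'e'))

library(rjson)
# parse the JSON records, turn each one into a one-row data frame, then stack them
do.call(rbind,lapply(fromJSON(p.df$to_json(orient='records')), as.data.frame))
```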

A bit clunky though.
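
A possibly less clunky variant, assuming the jsonlite package is installed (it is not the package used above): `jsonlite::fromJSON()` simplifies a JSON array of records into a data frame in a single call. The JSON string here is a hand-written stand-in for what `p.df$to_json(orient = 'records')` would return, so the sketch runs without Python:

```r
library(jsonlite)

# Hand-written stand-in for the output of p.df$to_json(orient = 'records')
records_json <- '[{"a":1,"b":2},{"a":3,"b":4},{"a":5,"b":6}]'

# An array of records is simplified to a data frame, one row per record
r.df <- fromJSON(records_json)
str(r.df)
```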

harryprince commented Feb 4, 2018

Wrong Code:

```
numpy.random$randint(low = 0, high = 10, size = c(5, 5))
```

Error:

```
Error in py_call_impl(callable, dots$args, dots$keywords) : TypeError: 'float' object cannot be interpreted as an index
```

Right Code:

```
numpy.random$randint(low = 0, high = 10, size = c(5L, 5L))
```
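
The difference seems to come down to R's numeric types: an unsuffixed number such as `5` is a double, which reaches Python as a float, while the `L` suffix produces an integer. A small base R sketch of the distinction (no Python needed; the variable names are illustrative):

```r
size_double  <- c(5, 5)    # plain numerics are doubles
size_integer <- c(5L, 5L)  # the L suffix makes integers

typeof(size_double)   # "double"
typeof(size_integer)  # "integer"

# as.integer() coerces an existing numeric vector
size_fixed <- as.integer(size_double)
identical(size_fixed, size_integer)  # TRUE
```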

jjallaire commented Mar 6, 2018

We've just merged support for converting to and from Pandas data frames onto master. Would love it if anyone subscribed to this thread could test it out. You can install with:

```
devtools::install_github("rstudio/reticulate")
```
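
For anyone testing, a round trip along these lines might be a reasonable first check. This is only a sketch, assuming a reticulate build that includes the new data frame support and a Python installation with pandas; the example data is made up:

```r
library(reticulate)

# Hypothetical example data
r_df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)

# R data frame -> pandas DataFrame
py_df <- r_to_py(r_df)
class(py_df)

# pandas DataFrame -> R data frame
back <- py_to_r(py_df)
str(back)
```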

jjallaire commented Mar 6, 2018


kevinushey commented Mar 6, 2018

Looks like the example in this issue doesn't quite work yet -- I'll take a look.
