Series Changes breaks rpy2? #5698

jankatins opened this issue Dec 13, 2013 · 63 comments

Labels: API Design, Regression (functionality that used to work in a prior pandas version)



It seems that the changes to Series break the data conversion to R: running this Notebook no longer works with a dev version from last week:

The resulting error is this:

ValueError                                Traceback (most recent call last)
<ipython-input-38-74fcaa767ca0> in <module>()
----> 1 get_ipython().run_cell_magic(u'R', u'-i xydata,xycols # list object to be transferred to python here', u'install.packages("ggplot2") # Had to add this for some reason, shouldn\'t be necessary\nlibrary(ggplot2)\ndf = data.frame(xydata)\nnames(df) <- c(xycols)\nprint(head(df))\nplot = ggplot(df, aes(x = X, y = Y)) + \ngeom_point(alpha = .8, color = \'dodgerblue\',size = 5) +\ngeom_point(data=subset(df, Y >= 6.7 | X >= 4), color = \'red\',size = 6) +\ntheme(axis.text.x = element_text(size= rel(1.5),angle=90, hjust=1)) +\nggtitle(\'Distance Pairs with outliers highlighted in red\')\nprint(plot)')

C:\portabel\Python27\lib\site-packages\IPython\core\interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
   2141             magic_arg_s = self.var_expand(line, stack_depth)
   2142             with self.builtin_trap:
-> 2143                 result = fn(magic_arg_s, cell)
   2144             return result

C:\portabel\Python27\lib\site-packages\IPython\extensions\ in R(self, line, cell, local_ns)

C:\portabel\Python27\lib\site-packages\IPython\core\magic.pyc in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    195         if callable(arg):

C:\portabel\Python27\lib\site-packages\IPython\extensions\ in R(self, line, cell, local_ns)
    585                     except KeyError:
    586                         raise NameError("name '%s' is not defined" % input)
--> 587                 self.r.assign(input, self.pyconverter(val))
    589         if getattr(args, 'units') is not None:

C:\portabel\Python27\lib\site-packages\rpy2\robjects\functions.pyc in __call__(self, *args, **kwargs)
     84                 v = kwargs.pop(k)
     85                 kwargs[r_k] = v
---> 86         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)

C:\portabel\Python27\lib\site-packages\rpy2\robjects\functions.pyc in __call__(self, *args, **kwargs)
     30     def __call__(self, *args, **kwargs):
---> 31         new_args = [conversion.py2ri(a) for a in args]
     32         new_kwargs = {}
     33         for k, v in kwargs.iteritems():

C:\portabel\Python27\lib\site-packages\rpy2\robjects\pandas2ri.pyc in pandas2ri(obj)
     26                 od[name] = StrVector(values)
     27             else:
---> 28                 od[name] = ro.conversion.py2ri(values)
     29         return DataFrame(od)
     30     elif isinstance(obj, PandasIndex):

C:\portabel\Python27\lib\site-packages\rpy2\robjects\pandas2ri.pyc in pandas2ri(obj)
     49         else:
     50             # converted as a numpy array
---> 51             res = original_conversion(obj)
     52         # "index" is equivalent to "names" in R
     53         if obj.ndim == 1:

C:\portabel\Python27\lib\site-packages\rpy2\robjects\numpy2ri.pyc in numpy2ri(o)
     56             raise(ValueError("Unknown numpy array type."))
     57     else:
---> 58         res = ro.default_py2ri(o)
     59     return res

C:\portabel\Python27\lib\site-packages\rpy2\robjects\__init__.pyc in default_py2ri(o)
    146         res = rinterface.SexpVector([o, ], rinterface.CPLXSXP)
    147     else:
--> 148         raise(ValueError("Nothing can be done for the type %s at the moment." %(type(o))))
    149     return res

ValueError: Nothing can be done for the type <class 'pandas.core.series.Series'> at the moment.

I'm not sure if this is something pandas cares about, but even if not, it would be nice to mention it in the release notes.


jreback commented Dec 13, 2013

I think they might need to have a slightly different conversion function, can you post to their dev site?


jreback commented Dec 16, 2013

ok...we'll make this an API issue here to 'track' it.


ghost commented Dec 19, 2013

@jreback, this could be a big deal once 0.13.0 final is released. No word from rpy2?

Moved to 0.13 to make final decision right before release is due. Would be good to preempt
a backlash upon release in one way or another.

This also breaks the docs.


jreback commented Dec 19, 2013

@JanSchulz can u ping rpy2?
see what their schedule is?


ghost commented Dec 19, 2013

cc @lgautier, we're anxious to avoid releasing a point release with no mitigation for this.

Copy link

could be as simple as checking for __array__() method if necessary.


ghost commented Dec 19, 2013

Hope it is, but it needs to happen on rpy2's side, no?

Copy link

@jreback why doesn't Series support the __array_interface__? Was that an explicit choice?

I think we might need to do that to be able to actually work with rpy2.


jreback commented Dec 20, 2013

not necessary as __array__ is called

but if u need it explicitly then go for it
didn't seem that hard

I don't have rpy so really can't even try it


jreback commented Dec 20, 2013

here is an example


Copy link

@JanSchulz could you post a really simple example? I can't seem to get your notebook to work. I think the fix for rpy2 is actually really simple.

Copy link

@jreback could we just do:

def __array_interface__(self):
    return self.__array__().__array_interface__()



jreback commented Dec 20, 2013

it's a property (don't need function call)
but might work

Contributor Author

@jtratner That notebook was also my first time working with rpy2. :-) The notebook is from @kevindavenport
@kevindavenport: do you have some more experience with rpy?

Here is a small notebook with three cells:

--- setup ---
%load_ext rmagic
import pandas as pd
df = pd.DataFrame({"x":[1,2,3,4,5], "y":[1,2,3,2,1]})
vals = df.values
cols = df.columns
--- using df.values: works ---
%%R -i vals,cols # list object to be transferred to python here
install.packages("ggplot2") # Had to add this for some reason, shouldn't be necessary
df = data.frame(vals)
names(df) <- c(cols)
plot = ggplot(df, aes(x = x, y = y)) + 
geom_point(alpha = .8, color = 'dodgerblue',size = 5)
--- df directly: does not work ---
%%R -i df # list object to be transferred to python here
install.packages("ggplot2") # Had to add this for some reason, shouldn't be necessary
df = data.frame(df)
plot = ggplot(df, aes(x = x, y = y)) + 
geom_point(alpha = .8, color = 'dodgerblue',size = 5)

Contributor Author

BTW: I don't think that adding a property is enough: currently the error happens because numpy2ri checks if the input is an instance of numpy.ndarray:

def numpy2ri(o):
    """ Augmented conversion function, converting numpy arrays into
    rpy2.rinterface-level R structures. """
    if isinstance(o, numpy.ndarray):
        # numpy handling
        res = ro.default_py2ri(o)

As the pandas part (pandas2ri) basically just iterates over each column, treats a datetime column differently, and delegates everything else to the numpy code, I think nothing can be done on the pandas side :-( and rpy2 needs to be adjusted :-(

I did a small change:

def numpy2ri(o):
    """ Augmented conversion function, converting numpy arrays into
    rpy2.rinterface-level R structures. """
    if isinstance(o, (numpy.ndarray, pd.Series)):
        # numpy handling
        res = ro.default_py2ri(o)

and I got the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-3-3c6069b19c27> in <module>()
----> 1 get_ipython().run_cell_magic(u'R', u'-i df # list object to be transferred to python here', u'install.packages("ggplot2") # Had to add this for some reason, shouldn\'t be necessary\nlibrary(ggplot2)\ndf = data.frame(df)\nplot = ggplot(df, aes(x = x, y = y)) + \ngeom_point(alpha = .8, color = \'dodgerblue\',size = 5)\nprint(plot)')

C:\portabel\Python27\lib\site-packages\IPython\core\interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
   2141             magic_arg_s = self.var_expand(line, stack_depth)
   2142             with self.builtin_trap:
-> 2143                 result = fn(magic_arg_s, cell)
   2144             return result

C:\portabel\Python27\lib\site-packages\IPython\extensions\rmagic.pyc in R(self, line, cell, local_ns)

C:\portabel\Python27\lib\site-packages\IPython\core\magic.pyc in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    195         if callable(arg):

C:\portabel\Python27\lib\site-packages\IPython\extensions\rmagic.pyc in R(self, line, cell, local_ns)
    585                     except KeyError:
    586                         raise NameError("name '%s' is not defined" % input)
--> 587                 self.r.assign(input, self.pyconverter(val))
    589         if getattr(args, 'units') is not None:

C:\portabel\Python27\lib\site-packages\rpy2\robjects\functions.pyc in __call__(self, *args, **kwargs)
     84                 v = kwargs.pop(k)
     85                 kwargs[r_k] = v
---> 86         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)

C:\portabel\Python27\lib\site-packages\rpy2\robjects\functions.pyc in __call__(self, *args, **kwargs)
     30     def __call__(self, *args, **kwargs):
---> 31         new_args = [conversion.py2ri(a) for a in args]
     32         new_kwargs = {}
     33         for k, v in kwargs.iteritems():

C:\portabel\Python27\lib\site-packages\rpy2\robjects\pandas2ri.pyc in pandas2ri(obj)
     26                 od[name] = StrVector(values)
     27             else:
---> 28                 od[name] = ro.conversion.py2ri(values)
     29         return DataFrame(od)
     30     elif isinstance(obj, PandasIndex):

C:\portabel\Python27\lib\site-packages\rpy2\robjects\pandas2ri.pyc in pandas2ri(obj)
     49         else:
     50             # converted as a numpy array
---> 51             res = original_conversion(obj)
     52         # "index" is equivalent to "names" in R
     53         if obj.ndim == 1:

C:\portabel\Python27\lib\site-packages\rpy2\robjects\ in numpy2ri(o)
     35         if o.dtype.kind in _kinds:
     36             # "F" means "use column-major order"
---> 37             vec = SexpVector(o.ravel("F"), _kinds[o.dtype.kind])
     38             dim = SexpVector(o.shape, INTSXP)
     39             res = ro.r.array(vec, dim=dim)

TypeError: ravel() takes exactly 1 argument (2 given)

As o is a pandas.Series in this case, it seems that its ravel() implementation differs from the numpy.ndarray one: it does not accept the order argument :-(

Contributor Author

Changing pandas.Series.ravel() to this:

    def ravel(self, order=None):
        return self.values.ravel(order)

will, together with the fix for isinstance above, produce a plot.

So if pandas added the order argument to pandas.Series.ravel(), this would be "bearable" ("Just add this small fix in ...") until rpy2 fixes the code and produces a new release.
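
The signature mismatch and the effect of passing the order through can be sketched without pandas or NumPy. Everything below is a stand-in: `ravel`, `OldStyleSeries` and `FixedSeries` are hypothetical names, the data is a small 2-D list only so that the two orders differ visibly, and the default of "C" is a choice made for this sketch.

```python
def ravel(rows, order="C"):
    """Flatten a non-empty list of equal-length rows.

    "C" walks row by row; "F" walks column by column (column-major).
    """
    if order == "C":
        return [value for row in rows for value in row]
    if order == "F":
        return [row[j] for j in range(len(rows[0])) for row in rows]
    raise ValueError("order must be 'C' or 'F'")


class OldStyleSeries:
    """Stand-in for a container whose ravel() takes no order argument."""

    def __init__(self, rows):
        self.values = rows

    def ravel(self):
        return ravel(self.values)


class FixedSeries(OldStyleSeries):
    """Same container, but ravel() forwards the order to the underlying values."""

    def ravel(self, order="C"):
        return ravel(self.values, order)


rows = [[1, 2, 3], [4, 5, 6]]

try:
    OldStyleSeries(rows).ravel("F")
except TypeError as exc:
    print("old signature rejects the order argument:", type(exc).__name__)

print(FixedSeries(rows).ravel("F"))  # column-major: [1, 4, 2, 5, 3, 6]
```

Callers that never pass an order keep getting the row-major result, so forwarding the argument should not change existing behaviour.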


jreback commented Dec 20, 2013

@JanSchulz easy enough....on ravel; you can do a run-time path for numpy2ri?

Contributor Author

@jtratner What is a run-time path?


jreback commented Dec 20, 2013

run-time patch.....e.g. I presume this is defined somewhere in rpy2, you can overwrite the function on pandas import (not really nice to do, but should work)


jreback commented Dec 20, 2013

@JanSchulz ok...ravel fix is in....give another try

Copy link

I have experience with rpy, how can I be of service?


ghost commented Dec 20, 2013

@jreback , you're not suggesting fixing by monkey patching rpy2 during numpy import are you?
This isn't ruby you know. :)


jreback commented Dec 20, 2013

@y-p where exactly is numpy2ri defined? In rpy2, right? (that IS what I am suggesting)


jreback commented Dec 20, 2013


in 0.13 (releasing imminently), Series is no longer a sub-class of ndarray, but rather of NDFrame (the same base as DataFrame, for example).

so apparently there is some isinstance detection going on with rpy which now fails.
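
A minimal sketch of why such a check stops matching once the base class changes. The classes and functions here are hypothetical stand-ins, not pandas, NumPy or rpy2 code. Only the error message is copied from the traceback at the top of this issue.

```python
class NdArray:
    """Stand-in for numpy.ndarray."""


class OldSeries(NdArray):
    """Stand-in for Series while it still subclassed ndarray."""


class NewSeries:
    """Stand-in for Series once it no longer subclasses ndarray."""


def default_convert(o):
    # Last resort in the chain: nothing recognised the object.
    raise ValueError("Nothing can be done for the type %s at the moment." % type(o))


def numpy_convert(o):
    # isinstance matches subclasses, so OldSeries passes and NewSeries does not.
    if isinstance(o, NdArray):
        return "R vector"
    return default_convert(o)


print(numpy_convert(OldSeries()))

try:
    numpy_convert(NewSeries())
except ValueError as exc:
    print("failed:", exc)
```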


ghost commented Dec 20, 2013

We need either a way to modify pandas to keep it compatible or a reasonable PR submitted
against rpy2 which we can point users at, if we can't get something merged by launch.

As much as I'm in a position to urge against it: monkey-patching another library is not an
acceptable solution. Even shipping broken is preferable.

Copy link

I'm pretty sure I have a fix, I just didn't have an example to work with. Should be a pretty trivial fix too.

Contributor Author

The patch is easy:

--- rpy2\robjects\   Fri Dec 20 17:35:43 2013
+++ rpy2\robjects\   Fri Dec 20 17:42:42 2013
@@ -3,6 +3,7 @@
 import rpy2.rinterface as rinterface
 from rpy2.rinterface import SexpVector, INTSXP
 import numpy
+import pandas

 from rpy2.robjects.vectors import DataFrame, Vector, ListVector

@@ -26,7 +27,7 @@
 def numpy2ri(o):
     """ Augmented conversion function, converting numpy arrays into
     rpy2.rinterface-level R structures. """
-    if isinstance(o, numpy.ndarray):
+    if isinstance(o, (numpy.ndarray, pandas.Series)):
         if not o.dtype.isnative:
             raise(ValueError("Cannot pass numpy arrays with non-native byte orders at the moment."))

Source is here:

I don't have hg, so I'm not currently able to do a PR.

Copy link

I'd change it to:

obj = getattr(obj, 'array', obj)

That way there's no pandas dep and works with anything that supports the
numpy array interface.
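
A sketch of the duck-typing idea, under the assumption that the attribute meant above is the `__array__` protocol method. `as_array_like` and `SeriesLike` are hypothetical names, and a plain list stands in for the array so that the example runs without NumPy or pandas.

```python
def as_array_like(obj):
    """Return obj.__array__() when the object offers it, else obj unchanged."""
    to_array = getattr(obj, "__array__", None)
    return to_array() if callable(to_array) else obj


class SeriesLike:
    """Stand-in for any container that implements __array__."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self):
        return self._values


print(as_array_like(SeriesLike([1, 2, 3])))  # unwrapped via __array__
print(as_array_like(5))                      # no __array__: passed through
```

Because the check looks for a method rather than a type, the converter never has to import the library that defines the container.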

Copy link

@lgautier if @JanSchulz keeps having trouble I can submit a pull request on bitbucket, because I have everything installed (just moving right now so I don't have a lot of free time).

@y-p @jreback and other pandas peeps - Maybe we should consider making up a script that tests packages that we know depend on pandas so we can either make changes to pandas or notify those devs so they can work on a solution pre-release.
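
Such a script could start out as small as the sketch below. The package names and commands are placeholders (each one just runs a trivial Python snippet), since which downstream projects to include and how to invoke their test suites is not settled in this thread.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each command and report whether it exited with status 0."""
    results = {}
    for name, argv in commands.items():
        proc = subprocess.run(argv, capture_output=True, text=True)
        results[name] = proc.returncode == 0
    return results


# Hypothetical stand-ins for real downstream test commands.
checks = {
    "passing_package": [sys.executable, "-c", "pass"],
    "failing_package": [sys.executable, "-c", "raise SystemExit(1)"],
}

report = run_checks(checks)
for name, ok in sorted(report.items()):
    print(name, "ok" if ok else "FAILED")
```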


ghost commented Dec 27, 2013

@lgautier, I agree on most points (especially your time being your own) but we would like
to ensure rpy2 is green prior to the release and we'll gladly help take care of that.

@jtratner, would you mind handling a backport of the patch + PR for the 2.3.x branch of rpy2 + testing
rpy2 suite against pandas master? If you're too busy, I'll volunteer myself.

Copy link

@y-p okay, I'll try to get that done tonight.


ghost commented Dec 27, 2013

Thanks! let me know if you need to hand it off after all.

Copy link

Update: fix is relatively simple now that we pushed the ravel PR. I'm just checking to make sure that it ends up with the correct results and writing up some additional test cases. I'm not totally clear on the correct behavior, so I'm switching back and forth to make sure nothing broke.

Copy link

(and I'm on the same page with @JanSchulz 's one liner fix, with a small modification to make it work without pandas) - so is the goal to have a monkey patch in a gist that we can point to for 0.13? An actual function within pandas?

Just not clear how we handle incompatibilities like this and make it less painful for people with legacy setups.


ghost commented Dec 28, 2013

2.4.x is already patched. @lgautier asked for a PR so that a new minor release of 2.3.x can go out and users
can upgrade with little disruption.

The release notes already point to this issue, and users can find the fix and discussion here
if they look up rpy2 there. I think that's enough, all taken together.


ghost commented Dec 28, 2013

... I'm assuming that PR will come from you? :)


ghost commented Jan 24, 2014

0.13.0 is out, the dev version of rpy2 has been fixed. I hope @jtratner submitted that PR.


ghost closed this as completed Jan 24, 2014
Copy link

He did. I just merged it (rpy2 branch version_2.3.x, and grafted onto version_2.4.x).


ghost commented Jan 25, 2014

excellent. Thank you both.


ghost commented Jan 25, 2014

Is it on pypi yet?

Copy link

Not yet. Probably some time over the week-end.
(Drone is currently looking at the candidate rpy2-2.3.9: ).

Copy link

Looking fine with Python 2.7, but causing a segfault with Python 3.3 and numpy 1.7.1. I cannot reproduce the segfault locally, though.

Any chance someone else could try out ?


ghost commented Jan 25, 2014

Rerun the build, if it consistently segfaults, I'll take a look.

Copy link

Thanks for putting that fix in @lgautier!

Copy link

@y-p The build on is consistently ending with a segfault on Python 3.3, while the same code does not locally. I'll hold the release of rpy2-2.3.9 until at least one of the 2 things happens: others report to have it working fine, or the problem on is identified (and fixed).


ghost commented Jan 25, 2014

That makes good sense. I'll try to repro on my box.


ghost commented Jan 25, 2014

I can reproduce the segfault, and I can reproduce it prior to @jtratner's commit,
on 3.3 with numpy 1.7.1. namely with rpy2-419ca01.

~/src/rpy2/ λ nosetests                        
ENotImplementedError: Device activation not implemented.
Closing device.
NotImplementedError: Device closing not implemented.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.NotImplementedError: Device activation not implemented.
Closing device.
NotImplementedError: Device closing not implemented.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.NotImplementedError: Device activation not implemented.
Closing device.
NotImplementedError: Device closing not implemented.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.NotImplementedError: Device activation not implemented.
Closing device.
NotImplementedError: Device closing not implemented.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.F.NotImplementedError: Device mode not implemented.
NotImplementedError: Device mode not implemented.
.E[1]    29630 segmentation fault (core dumped)  nosetests

You should make sure the CI output gives you more information.

Copy link

It might well be the case: there was no build on drone for the branch version_2.3.x prior to the merge of the pull request.

Now this is quite odd:

  • version 2.3.8 has been around for quite some time, and the only change is the pull request.
  • the message you report corresponds to (broken) code that should not be in version_2.3.x. Did you try switching to the right branch?
hg clone -b version_2.3.x;
cd rpy2


hg clone;
cd rpy2;
hg update version_2.3.x
  • what version of R are you building and trying this with ?


ghost commented Jan 25, 2014

I probably used the wrong version of nose, I compiled for 3.3 but invoked the py2 nose.
That did produce a segfault, but honestly I can't say what did it.
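
One way to rule out this kind of mismatch is to launch the test runner through the interpreter it is supposed to test. The sketch below only checks which version a given interpreter reports and builds (but does not execute) a runner command. `interpreter_version` is a hypothetical helper, and the command assumes nose is installed for that interpreter.

```python
import subprocess
import sys


def interpreter_version(python=sys.executable):
    """Ask the given interpreter for its (major, minor) version."""
    out = subprocess.run(
        [python, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    major, minor = out.split(".")
    return int(major), int(minor)


# Launching the runner as "<python> -m <module>" pins it to that interpreter.
# The command is only built here, not run.
runner_cmd = [sys.executable, "-m", "nose"]

print(interpreter_version(), runner_cmd[1:])
```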

Trying again:

~/src/rpy2/ λ python3 -m rpy2.tests
rpy2 version: 2.3.9   
built against R version: 3-0.2--63987
ERROR: testPandas2ri (rpy2.robjects.tests.testPandasConversions.PandasConversionsTestCase)
Traceback (most recent call last):
  File "/usr/lib64/python3.3/site-packages/rpy2/robjects/tests/", line 74, in testPandas2ri
    pandas_df = robjects.conversion.ri2py(rdataf)
  File "/usr/lib64/python3.3/site-packages/rpy2/robjects/", line 63, in ri2pandas
    raise NotImplementedError("Conversion from rpy2 DataFrame to pandas' DataFrame")
NotImplementedError: Conversion from rpy2 DataFrame to pandas' DataFrame

Ran 349 tests in 4.889s

FAILED (errors=1)

... but no segfault. This is with db6c132, the current tip of the version_2.3.x branch,
on 64-bit Fedora 20.

That's all I have.

Copy link

For what it's worth, I get a bunch of segfaults on the released version
with pandas 0.12 installed.

Copy link

This must be version-specific somewhere, making it OK on some machine (my computer, @y-p 's box) but not others (drone's VM, your machine).
I cannot seem to get a segfault with pandas 0.12.0 here.

C compiler ? R version ? something else ?

(The issue tracker for rpy2 on bitbucket might be a better place to follow up on this)


ghost commented Jan 25, 2014

I've seen issues raised by differences between debian and ubuntu libc. I run fedora, what do you and
drone use?

Copy link

Ubuntu (me 13.10, not sure about the version used by drone)

This issue was closed.