Series Changes breaks rpy2? #5698

Closed
jankatins opened this issue Dec 13, 2013 · 63 comments
Labels
API Design Regression Functionality that used to work in a prior pandas version

@jankatins
Contributor

It seems that the changes to Series break the data conversion to R: running this notebook no longer works with a dev version from last week:
https://gist.github.com/kevindavenport/7771325/raw/87ab5603f406729c6a3866f95af9a1ebfedcf619/Mahalanobis_Outliers.ipynb

The resulting error is this:

#xydata=pandas.DataFrame(...)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-74fcaa767ca0> in <module>()
----> 1 get_ipython().run_cell_magic(u'R', u'-i xydata,xycols # list object to be transferred to python here', u'install.packages("ggplot2") # Had to add this for some reason, shouldn\'t be necessary\nlibrary(ggplot2)\ndf = data.frame(xydata)\nnames(df) <- c(xycols)\nprint(head(df))\nplot = ggplot(df, aes(x = X, y = Y)) + \ngeom_point(alpha = .8, color = \'dodgerblue\',size = 5) +\ngeom_point(data=subset(df, Y >= 6.7 | X >= 4), color = \'red\',size = 6) +\ntheme(axis.text.x = element_text(size= rel(1.5),angle=90, hjust=1)) +\nggtitle(\'Distance Pairs with outliers highlighted in red\')\nprint(plot)')

C:\portabel\Python27\lib\site-packages\IPython\core\interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
   2141             magic_arg_s = self.var_expand(line, stack_depth)
   2142             with self.builtin_trap:
-> 2143                 result = fn(magic_arg_s, cell)
   2144             return result
   2145 

C:\portabel\Python27\lib\site-packages\IPython\extensions\rmagic.py in R(self, line, cell, local_ns)

C:\portabel\Python27\lib\site-packages\IPython\core\magic.pyc in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

C:\portabel\Python27\lib\site-packages\IPython\extensions\rmagic.py in R(self, line, cell, local_ns)
    585                     except KeyError:
    586                         raise NameError("name '%s' is not defined" % input)
--> 587                 self.r.assign(input, self.pyconverter(val))
    588 
    589         if getattr(args, 'units') is not None:

C:\portabel\Python27\lib\site-packages\rpy2\robjects\functions.pyc in __call__(self, *args, **kwargs)
     84                 v = kwargs.pop(k)
     85                 kwargs[r_k] = v
---> 86         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)

C:\portabel\Python27\lib\site-packages\rpy2\robjects\functions.pyc in __call__(self, *args, **kwargs)
     29 
     30     def __call__(self, *args, **kwargs):
---> 31         new_args = [conversion.py2ri(a) for a in args]
     32         new_kwargs = {}
     33         for k, v in kwargs.iteritems():

C:\portabel\Python27\lib\site-packages\rpy2\robjects\pandas2ri.pyc in pandas2ri(obj)
     26                 od[name] = StrVector(values)
     27             else:
---> 28                 od[name] = ro.conversion.py2ri(values)
     29         return DataFrame(od)
     30     elif isinstance(obj, PandasIndex):

C:\portabel\Python27\lib\site-packages\rpy2\robjects\pandas2ri.pyc in pandas2ri(obj)
     49         else:
     50             # converted as a numpy array
---> 51             res = original_conversion(obj)
     52         # "index" is equivalent to "names" in R
     53         if obj.ndim == 1:

C:\portabel\Python27\lib\site-packages\rpy2\robjects\numpy2ri.pyc in numpy2ri(o)
     56             raise(ValueError("Unknown numpy array type."))
     57     else:
---> 58         res = ro.default_py2ri(o)
     59     return res
     60 

C:\portabel\Python27\lib\site-packages\rpy2\robjects\__init__.pyc in default_py2ri(o)
    146         res = rinterface.SexpVector([o, ], rinterface.CPLXSXP)
    147     else:
--> 148         raise(ValueError("Nothing can be done for the type %s at the moment." %(type(o))))
    149     return res
    150 

ValueError: Nothing can be done for the type <class 'pandas.core.series.Series'> at the moment.

I'm not sure whether this is something pandas cares about, but even if not, it would be nice to mention it in the release notes.

@jreback
Contributor

jreback commented Dec 13, 2013

I think they might need to have a slightly different conversion function, can you post to their dev site?

@jankatins
Contributor Author

@jreback
Contributor

jreback commented Dec 16, 2013

ok...well make this an API issue here to 'track' it.

@ghost

ghost commented Dec 19, 2013

@jreback, this could be a big deal once 0.13.0 final is released. No word from rpy2?

Moved to 0.13 to make final decision right before release is due. Would be good to preempt
a backlash upon release in one way or another.

This also breaks the docs.

@jreback
Contributor

jreback commented Dec 19, 2013

@JanSchulz can u ping rpy2?
see what their schedule is?

@ghost

ghost commented Dec 19, 2013

cc @lgautier, we're anxious to avoid releasing a point release with no mitigation for this.

@jtratner
Contributor

could be as simple as checking for __array__() method if necessary.

@ghost

ghost commented Dec 19, 2013

Hope it is, but it needs to happen on rpy2's side, no?

@jtratner
Contributor

@jreback why doesn't Series support the __array_interface__? Was that an explicit choice?

I think we might need to do that to be able to actually work with rpy2.

@jreback
Contributor

jreback commented Dec 20, 2013

not necessary as __array__ is called

but if u need it explicitly then go for it
didn't seem that hard

I don't have rpy so really can't even try it

@jreback
Contributor

jreback commented Dec 20, 2013

here is an example

emcconville/wand#65

@jtratner
Contributor

@JanSchulz could you post a really simple example? I can't seem to get your notebook to work. I think the fix for rpy2 is actually really simple.

@jtratner
Contributor

@jreback could we just do:

def __array_interface__(self):
    return self.__array__().__array_interface__()

?

@jreback
Contributor

jreback commented Dec 20, 2013

it's a property (don't need function call)
but might work

@jankatins
Contributor Author

@jtratner That notebook was also my first time working with rpy2. :-) The notebook is from @kevindavenport
@kevindavenport: do you have some more experience with rpy?

Here is a small notebook with three cells:

--- setup ---
%load_ext rmagic
import pandas as pd
df = pd.DataFrame({"x":[1,2,3,4,5], "y":[1,2,3,2,1]})
vals = df.values
cols = df.columns
---
--- using df.values: works ---
%%R -i vals,cols # list object to be transferred to python here
install.packages("ggplot2") # Had to add this for some reason, shouldn't be necessary
library(ggplot2)
df = data.frame(vals)
names(df) <- c(cols)
plot = ggplot(df, aes(x = x, y = y)) + 
geom_point(alpha = .8, color = 'dodgerblue',size = 5)
print(plot)
---
--- df directly: does not work ---
%%R -i df # list object to be transferred to python here
install.packages("ggplot2") # Had to add this for some reason, shouldn't be necessary
library(ggplot2)
df = data.frame(df)
plot = ggplot(df, aes(x = x, y = y)) + 
geom_point(alpha = .8, color = 'dodgerblue',size = 5)
print(plot)
---

@jankatins
Contributor Author

BTW: I don't think that adding a property is enough: currently the error happens because numpy2ri checks if the input is an instance of numpy.ndarray:

def numpy2ri(o):
    """ Augmented conversion function, converting numpy arrays into
    rpy2.rinterface-level R structures. """
    if isinstance(o, numpy.ndarray):
        ...  # numpy handling (elided)
    else:
        res = ro.default_py2ri(o)

As the pandas part (pandas2ri) basically just iterates over each column, treats a datetime column differently, and delegates everything else to the numpy code, I think nothing can be done on the pandas side :-( and rpy2 needs to be adjusted :-(

I did a small change:

def numpy2ri(o):
    """ Augmented conversion function, converting numpy arrays into
    rpy2.rinterface-level R structures. """
    if isinstance(o, (numpy.ndarray, pd.Series)):
        ...  # numpy handling (elided)
    else:
        res = ro.default_py2ri(o)

and I got the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-3c6069b19c27> in <module>()
----> 1 get_ipython().run_cell_magic(u'R', u'-i df # list object to be transferred to python here', u'install.packages("ggplot2") # Had to add this for some reason, shouldn\'t be necessary\nlibrary(ggplot2)\ndf = data.frame(df)\nplot = ggplot(df, aes(x = x, y = y)) + \ngeom_point(alpha = .8, color = \'dodgerblue\',size = 5)\nprint(plot)')

C:\portabel\Python27\lib\site-packages\IPython\core\interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
   2141             magic_arg_s = self.var_expand(line, stack_depth)
   2142             with self.builtin_trap:
-> 2143                 result = fn(magic_arg_s, cell)
   2144             return result
   2145 

C:\portabel\Python27\lib\site-packages\IPython\extensions\rmagic.pyc in R(self, line, cell, local_ns)

C:\portabel\Python27\lib\site-packages\IPython\core\magic.pyc in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

C:\portabel\Python27\lib\site-packages\IPython\extensions\rmagic.pyc in R(self, line, cell, local_ns)
    585                     except KeyError:
    586                         raise NameError("name '%s' is not defined" % input)
--> 587                 self.r.assign(input, self.pyconverter(val))
    588 
    589         if getattr(args, 'units') is not None:

C:\portabel\Python27\lib\site-packages\rpy2\robjects\functions.pyc in __call__(self, *args, **kwargs)
     84                 v = kwargs.pop(k)
     85                 kwargs[r_k] = v
---> 86         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)

C:\portabel\Python27\lib\site-packages\rpy2\robjects\functions.pyc in __call__(self, *args, **kwargs)
     29 
     30     def __call__(self, *args, **kwargs):
---> 31         new_args = [conversion.py2ri(a) for a in args]
     32         new_kwargs = {}
     33         for k, v in kwargs.iteritems():

C:\portabel\Python27\lib\site-packages\rpy2\robjects\pandas2ri.pyc in pandas2ri(obj)
     26                 od[name] = StrVector(values)
     27             else:
---> 28                 od[name] = ro.conversion.py2ri(values)
     29         return DataFrame(od)
     30     elif isinstance(obj, PandasIndex):

C:\portabel\Python27\lib\site-packages\rpy2\robjects\pandas2ri.pyc in pandas2ri(obj)
     49         else:
     50             # converted as a numpy array
---> 51             res = original_conversion(obj)
     52         # "index" is equivalent to "names" in R
     53         if obj.ndim == 1:

C:\portabel\Python27\lib\site-packages\rpy2\robjects\numpy2ri.py in numpy2ri(o)
     35         if o.dtype.kind in _kinds:
     36             # "F" means "use column-major order"
---> 37             vec = SexpVector(o.ravel("F"), _kinds[o.dtype.kind])
     38             dim = SexpVector(o.shape, INTSXP)
     39             res = ro.r.array(vec, dim=dim)

TypeError: ravel() takes exactly 1 argument (2 given)

Since o is a pandas.Series in this case, it seems that its ravel() does not accept the order argument that numpy.ndarray.ravel() takes :-(
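
The failure can be reproduced without rpy2 or pandas. In the sketch below, `NoOrderRavel` is a hypothetical stand-in for an object whose `ravel()` takes no `order` argument; the call mirrors the `o.ravel("F")` line from the traceback.

```python
import numpy as np


class NoOrderRavel:
    """Hypothetical stand-in whose ravel() takes no order argument."""

    def __init__(self, data):
        self.values = np.asarray(data)

    def ravel(self):
        return self.values.ravel()


obj = NoOrderRavel([1, 2, 3])
try:
    obj.ravel("F")  # same call shape as o.ravel("F") in numpy2ri
    failed = False
except TypeError:
    # The extra positional argument is rejected by the method signature.
    failed = True
```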

@jankatins
Contributor Author

Changing pandas.Series.ravel() to this:

    def ravel(self, order=None):
        return self.values.ravel(order)

will, together with the fix for isinstance above, produce a plot.

So if pandas added the order argument to pandas.Series.ravel(), the situation would be "bearable" ("Just add this small fix in ...") until rpy2 fixes its code and ships a new release.
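
As a rough sketch of why the argument matters: `"F"` asks NumPy for column-major order, which is the layout R uses, and a wrapper only needs to forward it. `ForwardingRavel` is a hypothetical stand-in, and it defaults to `order="C"` (NumPy's own default) rather than the `None` used above; that default is an assumption of this sketch.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
row_major = a.ravel()     # default order "C": rows first
col_major = a.ravel("F")  # order "F": columns first


class ForwardingRavel:
    """Hypothetical stand-in that forwards `order` to the wrapped ndarray."""

    def __init__(self, data):
        self.values = np.asarray(data)

    def ravel(self, order="C"):
        return self.values.ravel(order)


flat = ForwardingRavel([1, 2, 3]).ravel("F")  # no TypeError this time
```

For one-dimensional data the two orders give the same result, so forwarding the argument changes nothing for a single column; it just keeps the call signature compatible.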

@jreback
Contributor

jreback commented Dec 20, 2013

@JanSchulz easy enough....on ravel; you can do a run-time path for numpy2ri?

@jankatins
Contributor Author

@jtratner What is a run-time path?

@jreback
Contributor

jreback commented Dec 20, 2013

run-time patch.....e.g. I presume this is defined somewhere in rpy2, you can overwrite the function on pandas import (not really nice to do, but should work)
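
To make the term concrete, here is a minimal sketch of a run-time patch. Everything in it is hypothetical: `thirdparty` is a stand-in module created on the fly (not rpy2), and `convert`, `ListLike`, and `install_patch` are illustrative names only.

```python
import types

# Hypothetical stand-in for a third-party module; this is not rpy2.
thirdparty = types.ModuleType("thirdparty")


def _convert(obj):
    if isinstance(obj, list):
        return tuple(obj)
    raise ValueError("Nothing can be done for the type %s" % type(obj))


thirdparty.convert = _convert


class ListLike:
    """Hypothetical object that is not a list but can produce one."""

    def __init__(self, items):
        self._items = list(items)

    def tolist(self):
        return list(self._items)


def install_patch(module):
    """Replace module.convert with a wrapper and return the original."""
    original = module.convert

    def patched(obj):
        # Coerce list-likes first, then defer to the original function.
        if not isinstance(obj, list) and hasattr(obj, "tolist"):
            obj = obj.tolist()
        return original(obj)

    module.convert = patched  # later lookups of module.convert see the patch
    return original


original_convert = install_patch(thirdparty)
result = thirdparty.convert(ListLike([1, 2]))
```

The patch affects every user of the module in the process, including code that never asked for it, which is the usual argument against shipping one inside a library.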

@jreback
Contributor

jreback commented Dec 20, 2013

@JanSchulz ok...ravel fix is in....give another try

@kevindavenport

I have experience with rpy, how can I be of service?

@ghost

ghost commented Dec 20, 2013

@jreback , you're not suggesting fixing by monkey patching rpy2 during numpy import are you?
This isn't ruby you know. :)

@jreback
Contributor

jreback commented Dec 20, 2013

@y-p numpy2ri is defined somewhere in rpy2, right? (and yes, that IS what I am suggesting)

@jreback
Contributor

jreback commented Dec 20, 2013

@kevindavenport

in 0.13 (releasing imminently), Series is no longer a sub-class of ndarray, but rather of NDFrame (the same base as DataFrame, for example).

so apparently there is some isinstance detection going on in rpy2, which now fails.
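
A small sketch of that failure mode, using a hypothetical `FakeSeries` class rather than pandas itself: the object wraps an ndarray and exposes `__array__`, so NumPy can convert it, but a plain `isinstance` check no longer recognises it.

```python
import numpy as np


class FakeSeries:
    """Hypothetical stand-in for an object that wraps an ndarray
    instead of subclassing it."""

    def __init__(self, data):
        self._values = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when asked to convert the object.
        if dtype is None:
            return self._values
        return self._values.astype(dtype)


s = FakeSeries([1, 2, 3])
is_ndarray = isinstance(s, np.ndarray)  # the kind of check that now fails
converted = np.asarray(s)               # conversion still works via __array__
```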

@ghost

ghost commented Dec 20, 2013

We need either a way to modify pandas to keep it compatible or a reasonable PR submitted
against rpy2 which we can point users at, if we can't get something merged by launch.

As much as I'm in a position to urge against it: monkey-patching another library is not an
acceptable solution. Even shipping broken is preferable.

@jtratner
Contributor

I'm pretty sure I have a fix, I just didn't have an example to work with. Should be a pretty trivial fix too.

@jankatins
Contributor Author

The patch is easy:

--- rpy2\robjects\numpy2ri.py   Fri Dec 20 17:35:43 2013
+++ rpy2\robjects\numpy2ri.py.new   Fri Dec 20 17:42:42 2013
@@ -3,6 +3,7 @@
 import rpy2.rinterface as rinterface
 from rpy2.rinterface import SexpVector, INTSXP
 import numpy
+import pandas

 from rpy2.robjects.vectors import DataFrame, Vector, ListVector

@@ -26,7 +27,7 @@
 def numpy2ri(o):
     """ Augmented conversion function, converting numpy arrays into
     rpy2.rinterface-level R structures. """
-    if isinstance(o, numpy.ndarray):
+    if isinstance(o, (numpy.ndarray, pandas.Series)):
         if not o.dtype.isnative:
             raise(ValueError("Cannot pass numpy arrays with non-native byte orders at the moment."))

Source is here: https://bitbucket.org/lgautier/rpy2/src/e2d25d5bd6254c5e381d87c46c90cac30f18b5b2/rpy/robjects/numpy2ri.py?at=version_2.4.x

I don't have hg, so I'm not currently able to submit a PR.

@jtratner
Contributor

I'd change it to:

obj = getattr(obj, 'array', obj)

That way there's no pandas dep and works with anything that supports the
numpy array interface.
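
One way to read that suggestion is sketched below. It is not the code that went into rpy2; `coerce_to_ndarray` and `ArrayLike` are hypothetical names, and the sketch assumes the `__array__` hook is the attribute being checked for.

```python
import numpy as np


class ArrayLike:
    """Hypothetical object exposing the __array__ hook."""

    def __init__(self, data):
        self._values = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        return self._values if dtype is None else self._values.astype(dtype)


def coerce_to_ndarray(obj):
    """Return an ndarray for array-likes exposing __array__, else obj unchanged."""
    if not isinstance(obj, np.ndarray) and hasattr(obj, "__array__"):
        return np.asarray(obj)
    return obj


out = coerce_to_ndarray(ArrayLike([1, 2, 3]))
untouched = coerce_to_ndarray("abc")  # str has no __array__, passes through
```

A converter could call such a helper first and then keep its existing `isinstance(o, numpy.ndarray)` branch unchanged, with no import of pandas required.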

@jtratner
Contributor

@lgautier if @JanSchulz keeps having trouble I can submit a pull request on bitbucket, because I have everything installed (just moving right now so I don't have a lot of free time).

@y-p @jreback and other pandas peeps - Maybe we should consider making up a script that tests packages that we know depend on pandas so we can either make changes to pandas or notify those devs so they can work on a solution pre-release.
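
A very rough sketch of what the first step of such a script might look like, assuming nothing more than a list of importable module names. `smoke_check` is a hypothetical helper, and the names passed to it here are placeholders chosen so the example runs anywhere.

```python
import importlib


def smoke_check(module_names):
    """Try to import each module; return {name: error message or None}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # any import-time failure counts
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results


report = smoke_check(["json", "no_such_downstream_package_xyz"])
```

An import check alone would not have caught this particular regression, since the error only appears during conversion; a real script would more likely run each downstream package's own test suite against pandas master.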

@ghost

ghost commented Dec 27, 2013

@lgautier, I agree on most points (especially your time being your own) but we would like
to ensure rpy2 is green prior to the release and we'll gladly help take care of that.

@jtratner, would you mind handling a backport of the patch + PR for the 2.3.x branch of rpy2 + testing
rpy2 suite against pandas master? If you're too busy, I'll volunteer myself.

@jtratner
Contributor

@y-p okay, I'll try to get that done tonight.

@ghost

ghost commented Dec 27, 2013

Thanks! let me know if you need to hand it off after all.

@jtratner
Contributor

Update: fix is relatively simple now that we pushed the ravel PR. I'm just checking to make sure that it ends up with the correct results and writing up some additional test cases. I'm not totally clear on the correct behavior, so I'm switching back and forth to make sure nothing broke.

@jtratner
Contributor

(and I'm on the same page with @JanSchulz 's one liner fix, with a small modification to make it work without pandas) - so is the goal to have a monkey patch in a gist that we can point to for 0.13? An actual function within pandas?

I'm just not clear on how we handle incompatibilities like this and make them less painful for people with legacy setups.

@ghost

ghost commented Dec 28, 2013

2.4.x is already patched. @lgautier asked for a PR so that a new minor release of 2.3.x can go out and users
can upgrade with little disruption.

The release notes already point to this issue, and users can find the fix and discussion here
if they look up rpy2 there. I think that's enough, taken all together.

@jtratner
Contributor

Great

@ghost

ghost commented Dec 28, 2013

... I'm assuming that PR will come from you? :)

@jtratner
Contributor

Yes

@ghost

ghost commented Jan 24, 2014

0.13.0 is out, the dev version of rpy2 has been fixed. I hope @jtratner submitted that PR.

closing.

@ghost ghost closed this as completed Jan 24, 2014
@lgautier
Contributor

He did. I just merged it (rpy2 branch version_2.3.x, and grafted onto version_2.4.x).

@ghost

ghost commented Jan 25, 2014

excellent. Thank you both.

@ghost

ghost commented Jan 25, 2014

Is it on pypi yet?

@lgautier
Contributor

Not yet. Probably some time over the week-end.
(Drone is currently looking at the candidate rpy2-2.3.9: https://drone.io/bitbucket.org/lgautier/rpy2/19 ).

@lgautier
Contributor

Looking fine with Python 2.7, but causing a segfault with Python 3.3 and numpy 1.7.1 (https://drone.io/bitbucket.org/lgautier/rpy2/20). I cannot reproduce the segfault locally though.

Any chance someone else could try it out?

@ghost

ghost commented Jan 25, 2014

Rerun the build, if it consistently segfaults, I'll take a look.

@jtratner
Contributor

Thanks for putting that fix in @lgautier!

@lgautier
Contributor

@y-p The build on drone.io consistently ends with a segfault on Python 3.3, while the same code does not segfault locally. I'll hold the release of rpy2-2.3.9 until at least one of two things happens: others report that it works fine, or the problem on drone.io is identified (and fixed).

@ghost

ghost commented Jan 25, 2014

That makes good sense. I'll try to repro on my box.

@ghost

ghost commented Jan 25, 2014

I can reproduce the segfault, and I can reproduce it prior to @jtratner's commit,
on 3.3 with numpy 1.7.1. namely with rpy2-419ca01.

~/src/rpy2/ λ nosetests                        
ENotImplementedError: Device activation not implemented.
Closing device.
NotImplementedError: Device closing not implemented.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.NotImplementedError: Device activation not implemented.
Closing device.
NotImplementedError: Device closing not implemented.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.NotImplementedError: Device activation not implemented.
Closing device.
NotImplementedError: Device closing not implemented.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.NotImplementedError: Device activation not implemented.
Closing device.
NotImplementedError: Device closing not implemented.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.Closing device.
--> skipping PyMem_Free(((PyGrDevObject *)self)->grdev) 
.F.NotImplementedError: Device mode not implemented.
NotImplementedError: Device mode not implemented.
.E[1]    29630 segmentation fault (core dumped)  nosetests

You should make sure the CI output gives you more information.

@lgautier
Contributor

It might well be the case: there was no build on drone for the branch version_2.3.x prior to the merge of the pull request.

Now this is quite odd:

  • version 2.3.8 has been around for quite some time, and the only change since then is the pull request.
  • the messages you report correspond to (broken) code that should not be in version_2.3.x. Did you try switching to the right branch?
hg clone -b version_2.3.x https://bitbucket.org/lgautier/rpy2;
cd rpy2

or

hg clone https://bitbucket.org/lgautier/rpy2;
cd rpy2;
hg update version_2.3.x
  • what version of R are you building against and testing with?

@ghost

ghost commented Jan 25, 2014

I probably used the wrong version of nose, I compiled for 3.3 but invoked the py2 nose.
That did produce a segfault, but honestly I can't say what did it.

Trying again:

~/src/rpy2/ λ python3 -m rpy2.tests
rpy2 version: 2.3.9   
built against R version: 3-0.2--63987
.................................................................................................................................................................................................................................................................................................E...........................................................
======================================================================
ERROR: testPandas2ri (rpy2.robjects.tests.testPandasConversions.PandasConversionsTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python3.3/site-packages/rpy2/robjects/tests/testPandasConversions.py", line 74, in testPandas2ri
    pandas_df = robjects.conversion.ri2py(rdataf)
  File "/usr/lib64/python3.3/site-packages/rpy2/robjects/pandas2ri.py", line 63, in ri2pandas
    raise NotImplementedError("Conversion from rpy2 DataFrame to pandas' DataFrame")
NotImplementedError: Conversion from rpy2 DataFrame to pandas' DataFrame

----------------------------------------------------------------------
Ran 349 tests in 4.889s

FAILED (errors=1)

... but no segfault. This is with db6c132, the current tip of the version_2.3.x branch,
on 64-bit Fedora 20.

That's all I have.

@jtratner
Contributor

For what it's worth, I get a bunch of segfaults on the released version
with pandas 0.12 installed.

@lgautier
Contributor

This must be version-specific somewhere, making it OK on some machines (my computer, @y-p's box) but not others (drone's VM, your machine).
I cannot seem to get a segfault with pandas 0.12.0 here.

C compiler ? R version ? something else ?

(The issue tracker for rpy2 on bitbucket might be a better place to follow up on this)

@ghost

ghost commented Jan 25, 2014

I've seen issues raised by differences between debian and ubuntu libc. I run fedora, what do you and
drone use?

@lgautier
Contributor

Ubuntu (me 13.10, not sure about the version used by drone)

This issue was closed.