Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Series Changes breaks rpy2? #5698
Comments
|
I think they might need to have a slightly different conversion function, can you post to their dev site? |
|
ok...well make this an API issue here to 'track' it. |
|
@jreback, this could be a big deal once 0.13.0 final is released. No word from rpy2? Moved to 0.13 to make final decision right before release is due. Would be good to preempt This also breaks the docs. |
|
@janschulz can u ping rpy2? |
|
cc @lgautier, we're anxious to avoid releasing a point release with no mitigation for this. |
|
could be as simple as checking for |
|
Hope it is, but it needs to happen on rpy2's side, no? |
|
@jreback why doesn't I think we might need to do that to be able to actually work with rpy2. |
|
not necessary as array is called but if u need it explicitly then go for it I don't have rpy so really can't even try it |
|
here is an example |
|
@janschulz could you post a really simple example? I can't seem to get your notebook to work. I think the fix for rpy2 is actually really simple. |
|
@jreback could we just do: def __array_interface__(self):
return self.__array__().__array_interface__()? |
|
it's a property (don't need function call) |
|
@jtratner I also only worked with rpy2 the first time with that notebook. :-) The notebook is from @kevindavenport Here is a small notebook with three cells: --- setup ---
%load_ext rmagic
import pandas as pd
df = pd.DataFrame({"x":[1,2,3,4,5], "y":[1,2,3,2,1]})
vals = df.values
cols = df.columns
---
--- using df.values: works ---
%%R -i vals,cols # list object to be transferred to python here
install.packages("ggplot2") # Had to add this for some reason, shouldn't be necessary
library(ggplot2)
df = data.frame(vals)
names(df) <- c(cols)
plot = ggplot(df, aes(x = x, y = y)) +
geom_point(alpha = .8, color = 'dodgerblue',size = 5)
print(plot)
---
--- df directly: does not work ---
%%R -i df # list object to be transferred to python here
install.packages("ggplot2") # Had to add this for some reason, shouldn't be necessary
library(ggplot2)
df = data.frame(df)
plot = ggplot(df, aes(x = x, y = y)) +
geom_point(alpha = .8, color = 'dodgerblue',size = 5)
print(plot)
--- |
|
BTW: I don't think that adding a property is enough: currently the error happens because def numpy2ri(o):
""" Augmented conversion function, converting numpy arrays into
rpy2.rinterface-level R structures. """
if isinstance(o, numpy.ndarray):
# numpy handling
else:
res = ro.default_py2ri(o)As the pandas part ( I did a small change: def numpy2ri(o):
""" Augmented conversion function, converting numpy arrays into
rpy2.rinterface-level R structures. """
if isinstance(o, (numpy.ndarray, pd.Series)):
# numpy handling
else:
res = ro.default_py2ri(o)and I got the following error: ---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-3c6069b19c27> in <module>()
----> 1 get_ipython().run_cell_magic(u'R', u'-i df # list object to be transferred to python here', u'install.packages("ggplot2") # Had to add this for some reason, shouldn\'t be necessary\nlibrary(ggplot2)\ndf = data.frame(df)\nplot = ggplot(df, aes(x = x, y = y)) + \ngeom_point(alpha = .8, color = \'dodgerblue\',size = 5)\nprint(plot)')
C:\portabel\Python27\lib\site-packages\IPython\core\interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
2141 magic_arg_s = self.var_expand(line, stack_depth)
2142 with self.builtin_trap:
-> 2143 result = fn(magic_arg_s, cell)
2144 return result
2145
C:\portabel\Python27\lib\site-packages\IPython\extensions\rmagic.pyc in R(self, line, cell, local_ns)
C:\portabel\Python27\lib\site-packages\IPython\core\magic.pyc in <lambda>(f, *a, **k)
191 # but it's overkill for just that one bit of state.
192 def magic_deco(arg):
--> 193 call = lambda f, *a, **k: f(*a, **k)
194
195 if callable(arg):
C:\portabel\Python27\lib\site-packages\IPython\extensions\rmagic.pyc in R(self, line, cell, local_ns)
585 except KeyError:
586 raise NameError("name '%s' is not defined" % input)
--> 587 self.r.assign(input, self.pyconverter(val))
588
589 if getattr(args, 'units') is not None:
C:\portabel\Python27\lib\site-packages\rpy2\robjects\functions.pyc in __call__(self, *args, **kwargs)
84 v = kwargs.pop(k)
85 kwargs[r_k] = v
---> 86 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
C:\portabel\Python27\lib\site-packages\rpy2\robjects\functions.pyc in __call__(self, *args, **kwargs)
29
30 def __call__(self, *args, **kwargs):
---> 31 new_args = [conversion.py2ri(a) for a in args]
32 new_kwargs = {}
33 for k, v in kwargs.iteritems():
C:\portabel\Python27\lib\site-packages\rpy2\robjects\pandas2ri.pyc in pandas2ri(obj)
26 od[name] = StrVector(values)
27 else:
---> 28 od[name] = ro.conversion.py2ri(values)
29 return DataFrame(od)
30 elif isinstance(obj, PandasIndex):
C:\portabel\Python27\lib\site-packages\rpy2\robjects\pandas2ri.pyc in pandas2ri(obj)
49 else:
50 # converted as a numpy array
---> 51 res = original_conversion(obj)
52 # "index" is equivalent to "names" in R
53 if obj.ndim == 1:
C:\portabel\Python27\lib\site-packages\rpy2\robjects\numpy2ri.py in numpy2ri(o)
35 if o.dtype.kind in _kinds:
36 # "F" means "use column-major order"
---> 37 vec = SexpVector(o.ravel("F"), _kinds[o.dtype.kind])
38 dim = SexpVector(o.shape, INTSXP)
39 res = ro.r.array(vec, dim=dim)
TypeError: ravel() takes exactly 1 argument (2 given)as o is in this case a |
|
Changing def ravel(self, order=None):
return self.values.ravel(order)will, together with the fix for isinstance above, produce a plot. So if pandas would add the order to |
|
@janschulz easy enough....on ravel; you can do a run-time path for |
|
@jtratner What is a run-time path? |
|
run-time patch.....e.g. I presume this is defined somewhere in |
|
@janschulz ok...ravel fix is in....give another try |
kevindavenport
commented
Dec 20, 2013
|
I have experience with rpy, how can I be of service? |
|
@jreback , you're not suggesting fixing by monkey patching rpy2 during numpy import are you? |
|
@y-p where exactly is |
|
in 0.13 (releasing imminently), Series is no longer a sub-class of so apparently their is some isinstance detection going on with |
|
We need either a way to modify pandas to keep it compatible or a reasonable PR submitted As much as I'm in a position to urge against it: monkey-patching another library is not an |
|
I'm pretty sure I have a fix, I just didn't have an example to work with. Should be a pretty trivial fix too. |
|
The patch is easy: --- rpy2\robjects\numpy2ri.py Fri Dec 20 17:35:43 2013
+++ rpy2\robjects\numpy2ri.py.new Fri Dec 20 17:42:42 2013
@@ -3,6 +3,7 @@
import rpy2.rinterface as rinterface
from rpy2.rinterface import SexpVector, INTSXP
import numpy
+import pandas
from rpy2.robjects.vectors import DataFrame, Vector, ListVector
@@ -26,7 +27,7 @@
def numpy2ri(o):
""" Augmented conversion function, converting numpy arrays into
rpy2.rinterface-level R structures. """
- if isinstance(o, numpy.ndarray):
+ if isinstance(o, (numpy.ndarray, pandas.Series)):
if not o.dtype.isnative:
raise(ValueError("Cannot pass numpy arrays with non-native byte orders at the moment."))Source is here: https://bitbucket.org/lgautier/rpy2/src/e2d25d5bd6254c5e381d87c46c90cac30f18b5b2/rpy/robjects/numpy2ri.py?at=version_2.4.x I don't have hg, so I', not currently able to do a PR. |
|
I'd change it to: obj = getattr(obj, 'array', obj) That way there's no pandas dep and works with anything that supports the |
|
Edit: |
|
The below patch "works", but rpy2 is a beast I don't really want to touch anymore (no git, fails to compile...) I would suggest that pandas adds a note to the release file with the suggested patch: --- rpy2\robjects\numpy2ri.py Fri Dec 20 17:35:43 2013
+++ rpy2\robjects\numpy2ri.py.new Fri Dec 20 17:42:42 2013
def numpy2ri(o):
""" Augmented conversion function, converting numpy arrays into
rpy2.rinterface-level R structures. """
- if isinstance(o, numpy.ndarray):
+ if hasattr(o, "__array__"):
if not o.dtype.isnative:
raise(ValueError("Cannot pass numpy arrays with non-native byte orders at the moment."))
|
|
Updated the release notes to reference this issue. Bumping the issue to 0.14 for tracking. |
|
@janschulz : rpy2 has a build bot for continuous integration, and according to it, it is both building and passing all unit tests. https://drone.io/bitbucket.org/lgautier/rpy2/15 . Regarding the usage of a specific version control system, be very wary of using Python. I heard that the primary repository is not Git. ;-) |
|
@y-p I share your general reluctance to monkey-patch. There might justifiable usage of it, if documented. Ideally there is no backward incompatible changes with a release series for rpy2 (current being 2.3.x). This is a problem for pandas, but not unexpected when pandas is itself making changes (one cannot expect third-party software to be forward-compatible with then-unannounced changes). I am fine with having the next series of rpy2 (2.4.x) compatible with the latest release of pandas, but if you really wish to have pandas work with rpy2-2.3.x monkey patching is going to be the only way. You could test the rpy2 version, and if older than 2.4.0, monkey patch with something like:
|
|
@lgautier glad to see the patch merged. When is the next release out? Note your CI service seems to be testing against pypi pandas, rather then git master. Documenting the monkey patch as a workaround for users to apply knowingly is totally fine. |
|
@y-p I hoped to have a beta around now, but I have been tremendously busy with other things. The current rpy2-2.4.x does not yet have everything I hoped to have, but is rather robust. Just missing stuff. Testing against released pandas is to keep the parameters under control. The development of pandas and rpy2 is not specifically coordinated, and I am busy enough with keeping rpy2's own codebase passing the tests without the need for a fast moving pandas target. As soon as you release pandas on pypi, the rpy2 CI will pick it up (and force the rpy2 devs to look at any problem arising from that). In the meantime, we might have to rely on communications such as the one we are having (an alternative would to create a pandas-dev automated build, but drone does not seem to allow multiple build bots per repository). Patching older rpy2 releases happens as time permits, but pull requests (preferably with additional unit tests where relevant) are welcome. |
|
@lgautier if @janschulz keeps having trouble I can submit a pull request on bitbucket, because I have everything installed (just moving right now so I don't have a lot of free time). @y-p @jreback and other pandas peeps - Maybe we should consider making up a script that tests packages that we know depend on pandas so we can either make changes to pandas or notify those devs so they can work on a solution pre-release. |
|
@lgautier, I agree on most points (especially your time being your own) but we would like @jtratner, would you mind handling a backport of the patch + PR for the 2.3.x branch of rpy2 + testing |
|
@y-p okay, I'll try to get that done tonight. |
|
Thanks! let me know if you need to hand it off after all. |
|
Update: fix is relatively simple now that we pushed the ravel PR. I'm just checking to make sure that it ends up with the correct results and writing up some additional test cases. I'm not totally clear on the correct behavior, so I'm switching back and forth to make sure nothing broke. |
|
(and I'm on the same page with @janschulz 's one liner fix, with a small modification to make it work without pandas) - so is the goal to have a monkey patch in a gist that we can point to for 0.13? An actual function within pandas? Just not clear how we handle incompatibilities like this and making it less painful for people with legacy setups. |
|
2.4.x is already patched. @lgautier asked for a PR to release a new minor release of 2.3.x. so users The release notes already point to this issue and can and users can find the fix and discussion here |
|
Great |
|
... I'm assuming that PR will come from you? :) |
|
Yes |
|
0.13.0 is out, the dev version of rpy2 has been fixed. I hope @jtratner submitted that PR. closing. |
y-p
closed this
Jan 24, 2014
|
He did. I just merged it (rpy2 branch version_2.3.x, and grafted onto version_2.4.x). |
|
excellent. Thank you both. |
|
Is it on pypi yet? |
|
Not yet. Probably some time over the week-end. |
|
Looking fine with Python 2.7, but causing segfault with Python 3.3 and numpy 1.7.1 (https://drone.io/bitbucket.org/lgautier/rpy2/20). I cannot reproduce the segfault locally though. Any chance someone else could try out ? |
|
Rerun the build, if it consistently segfaults, I'll take a look. |
|
Thanks for putting that fix in @lgautier! |
|
@y-p The build on drone.io is consistently ending with a segfault on Python 3.3, while the same code does not locally. I'll hold the release of rpy2-2.3.9 until at least one of the 2 things happens: others report to have it working fine, or the problem on drone.io is identified (and fixed). |
|
That makes good sense. I'll try to repro on my box. |
|
I can reproduce the segfault, and I can reproduce it prior to @jtratner's commit,
You should make sure the CI output gives your more information. |
|
It might well be the case: there was no build on drone for the branch version_2.3.x prior to the merge of the pull request. Now this is quite odd:
or
|
|
I probably used the wrong version of nose, I compiled for 3.3 but invoked the py2 nose. Trying again:
... but no segfault. This is with db6c132, the current tip of the That's all I have. |
|
For what it's worth, I get a bunch of segfaults on the released version |
|
This must be version-specific somewhere, making it OK on some machine (my computer, @y-p 's box) but not others (drone's VM, your machine). C compiler ? R version ? something else ? (The issue tracker for rpy2 on bitbucket might be a better place to follow up on this) |
|
I've seen issues raised by differences between debian and ubuntu libc. I run fedora, what do you and |
|
Ubuntu (me 13.10, not sure about the version used by drone) |
janschulz commentedDec 13, 2013
It seems that the changes to to Series break the data conversion to R: running this Notebook doesn't work anymore with some dev version from last week:
https://gist.github.com/kevindavenport/7771325/raw/87ab5603f406729c6a3866f95af9a1ebfedcf619/Mahalanobis_Outliers.ipynb
The resulting error is this:
I'm not sure if this is something pandas cares, but even if not it would be nice to mention it in the release notes.