R data frame #2443

Closed
wants to merge 4 commits into from

@arokem

arokem commented Sep 27, 2012

This PR implements a custom Python => R converter. The behavior is exactly as before (a call to np.asarray), except when the Python input is a pandas DataFrame object. In that case, we preprocess the dtype of the resulting struct-array to account for the fact that pandas casts strings to the 'object' dtype. Here we assume that if you got something with 'object' as its dtype, you want it to be a string.

See example here:

http://nbviewer.ipython.org/urls/raw.github.com/arokem/ipython/r_data_frame/docs/examples/notebooks/rmagic_extension.ipynb

This still seems somewhat limited in scope (what if you actually want the 'object' dtype?), so I would be happy to get feedback on this.
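For illustration, the dtype preprocessing described in this comment could be sketched roughly as follows. This is not the code from the PR: the helper name `object_fields_to_str` is hypothetical, it operates on a plain NumPy structured array (such as the one `DataFrame.to_records()` returns), and it uses the unicode 'U' dtype, which is an assumption suited to Python 3.

```python
import numpy as np


def object_fields_to_str(rec):
    """Return a copy of structured array `rec` with object fields cast to str.

    Hypothetical helper, for illustration only.
    """
    new_descr = []
    for name in rec.dtype.names:
        field_dtype = rec.dtype[name]
        if field_dtype.kind == 'O':
            # Size the string field to the longest str() of the column
            # (at least one character, so empty columns still work).
            width = max([len(str(v)) for v in rec[name].ravel()] + [1])
            new_descr.append((name, 'U%d' % width))
        else:
            # Non-object fields keep their original dtype.
            new_descr.append((name, field_dtype))

    out = np.empty(rec.shape, dtype=new_descr)
    for name in rec.dtype.names:
        # NumPy casts object values to the string field via str().
        out[name] = rec[name]
    return out


rec = np.array([('a', 1.0), ('bcd', 2.5)],
               dtype=[('name', object), ('score', 'f8')])
converted = object_fields_to_str(rec)
```

Note that this sketch applies `str()` to whatever sits in an object field, so non-string objects would be stringified silently rather than rejected.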

@fperez
Member

fperez commented Sep 28, 2012

Great! Can't look at it right now, but pinging @jonathantaylor and @wesm who may actually care quite a bit about this functionality and might have some feedback.

arokem and others added 4 commits October 8, 2012 21:17
Convert the dtype of the resulting array so that 'object' dtypes are converted
to strings. This counters the fact that strings are converted to object
when the DataFrame is constructed.
@bfroehle
Contributor

@arokem: If I'm reading this patch correctly, it attempts to improve the R <-> Python conversion when pandas is installed. Is there a reason this belongs in IPython instead of upstream in rpy2?

@arokem
Author

arokem commented Oct 12, 2012

Yes - another way is to change the way that pandas itself deals with strings. This fix is needed because strings get the 'object' data type in the DF representation. I tried asking on the pydata mailing list why strings get cast into 'objects' in DataFrame class instances, but haven't gotten an answer yet. I should try asking again.

@arokem
Author

arokem commented Nov 1, 2012

OK - here's the latest from the pydata mailing list:

https://groups.google.com/forum/?fromgroups=#!topic/pydata/WU8Pq_e881k

It seems that changing pandas' behavior is a non-starter, but it also sounds like this hack will not be necessary once things get sorted out (presumably at the rpy2 level? About 95% of what Wes said there went over my head).

For the time being, I am using this PR for my own use-cases, but I am not sure this should be merged into ipython itself.

@bfroehle
Contributor

Hmm, I'm not really sure how to handle this issue. On the one hand this seems like an odd bit of code to add to IPython, but on the other hand it seems necessary for some work with strings. I'm also concerned that non-string objects might also be in the pandas DataFrame, causing the NumPy cast to go awry.

The good news is that it's also relatively simple to replace the pyconverter in the RMagics class at runtime. A simple custom extension could, for example, also provide this functionality:

def converter(x):
    ...
    # as before
    ...

def load_extension(ip):
    """Load rmagic extension and inject a custom Python -> R converter."""
    ip.extension_manager.load_extension('rmagic')
    ip.magics_manager.registry['RMagics'].pyconverter = converter
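One way to fill in the `converter` placeholder while keeping the previous behavior as the default is to dispatch on the input type and fall back otherwise. The sketch below is plain Python with hypothetical names (`make_converter`, `handlers`, `fallback`); it does not touch pandas, rpy2, or IPython, and is only meant to show the shape such a converter might take.

```python
def make_converter(fallback, handlers):
    """Build a converter that tries type-specific handlers before `fallback`.

    fallback: callable used when no handler matches (e.g. np.asarray).
    handlers: list of (type, callable) pairs, checked in order.
    All names here are hypothetical, for illustration only.
    """
    def converter(x):
        for typ, handle in handlers:
            if isinstance(x, typ):
                return handle(x)
        # No special case matched: behave as before.
        return fallback(x)
    return converter


# Stand-in types so the example runs without pandas or NumPy:
# dicts get special treatment, everything else goes through list().
converter = make_converter(list, [(dict, lambda d: sorted(d.items()))])
special = converter({'b': 2, 'a': 1})
default = converter((1, 2))
```

In the setting of this PR, the fallback would presumably be `np.asarray` and the single handler would be registered for `pandas.DataFrame`, but that wiring is an assumption and is not shown here.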

@ellisonbg
Member

This PR has been inactive for > 2 months. Can we close it and open an issue to track the broader work on this feature?

https://github.com/ipython/ipython/wiki/Policy:-Closing-pull-requests

@arokem
Author

arokem commented Jan 14, 2013

I think that's OK. I am still using this branch for my own work, but I haven't had the bandwidth to resolve the remaining issues so that it can be integrated into IPython.

@bfroehle
Contributor

@arokem Please feel free to open an issue to track any thoughts you have on this matter.

Thanks!

@ellisonbg
Member

We are closing this as further discussion/design is needed before moving forward. Here is an issue to track it: #2787
