Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpy2-arrow and forthcoming release of Arrow #2

Closed
paleolimbot opened this issue Dec 22, 2021 · 4 comments
Closed

rpy2-arrow and forthcoming release of Arrow #2

paleolimbot opened this issue Dec 22, 2021 · 4 comments

Comments

@paleolimbot
Copy link
Collaborator

First, this is awesome!

In the upcoming release of Arrow scheduled a few weeks from now, we've updated the recommended way that pointers to Arrow C data structures are specified in both R and Python. As part of this change, we also updated a few internal functions to the R package (notably, allocate_arrow_schema|array|array_stream()) that it appears are getting used in rpy2-arrow.

I'm happy to PR a fix that will work with both the old and new versions of arrow (for R) and pyarrow but wanted feedback on our change and the way we envisioned this kind of thing being done since you're one of a few main users of the C API bridge code! The PRs in which this changed were apache/arrow#11919 and apache/arrow#12011.

Would something like this make sense as a pattern for converting R Arrow arrays to Python arrow arrays? There may be a better way than str(int(schema_ptr_value)) if rpy2 can convert cffi structures to ExternalPointers or create integer64 objects from the bit64 package.

import pyarrow
from pyarrow.cffi import ffi

def rarrow_to_py_datatype(obj):
    schema_ptr = ffi.new("struct ArrowSchema*")
    schema_ptr_value = ffi.cast("uintptr_t", schema_ptr)
    obj["export_to_c"](str(int(schema_ptr_value)))
    return pyarrow.lib.DataType._import_from_c(schema_ptr_value)

from rpy2.robjects import packages
rarrow = packages.importr("arrow")
r_datatype = rarrow.int32()
r_datatype
#> <rpy2.robjects.environments.Environment object at 0x15bf58100> [RTYPES.ENVSXP]
#> R classes: ('Int32', 'FixedWidthType', 'DataType', 'ArrowObject', 'R6')
#> n items: 16

py_datatype = rarrow_to_py_datatype(r_datatype)
py_datatype
#> 
DataType(int32)
@lgautier
Copy link
Member

Thanks for the heads-up. The easiest on my end is if there a PR I can review.

The snippet of code you propose would come to replace the function of the same name here, correct?
https://github.com/rpy2/rpy2-arrow/blob/main/rpy2_arrow/pyarrow_rarrow.py#L133

I looks like the main change on the rpy2-arrow side would be going from

rarrow.ExportType(obj, schema_ptr)

to

schema_ptr_value = ffi.cast("uintptr_t", schema_ptr)
obj["export_to_c"](str(int(schema_ptr_value)))

there may be a better way than str(int(schema_ptr_value)) if rpy2 can convert cffi structures to ExternalPointers or create integer64 objects from the bit64 package.

What is an ExternalPointer here? An R EXTPTRSEXP ? There is class SexpExtPtr in rpy2.rinterface.

Otherwise, why cast to uintptr_t, then to a string, and to finally pass to the export_to_c in R. Couldn't a conversion of schema_ptr to bytes just work and spare the parsing of a string in export_to_c?

obj['export_to_c'] = (schema_ptr_value.to_bytes(8, 'little'))

@paleolimbot
Copy link
Collaborator Author

Thanks for your detailed response!

I agree that it's a bit of a goose chase to go from ffi.new("struct ArrowSchema*") to uintptr_t to str. The intention behind the export functions is that the caller allocates and manages the memory...the R package makes it possible (after the latest PR) to represent pointers as strings, EXTPTRSXP, or bit64::integer64(). I'm not familiar with the set of possible conversions in rpy2 to know what the most readable way to do this is...I would expect that it would be a natural conversion straight from cffi.new() to EXTPTRSXP but I'm a poor candidate to make that happen. In Arrow tests for the Python API we use ffi.cast("uintptr_t", schema_ptr), hence its use here.

I'll prepare a PR for review when I'm back at work on Monday!

@lgautier
Copy link
Member

Thanks for the PR. It will be in rpy2-arrow v0.0.5 and normally released on pypi some time today.

@paleolimbot
Copy link
Collaborator Author

Great! Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants