-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(r): Add arrow package integration #85
Conversation
@anthonynorth if you have the bandwidth I'd be grateful for any of the thoughts you have time to write here! |
|
||
as_arrow_array.geoarrow_vctr <- function(x, ..., type = NULL) { | ||
chunked <- as_chunked_array.geoarrow_vctr(x, ..., type = type) | ||
if (chunked$num_chunks == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does arrow::as_array_array()
copy if we have a single chunk and we're avoiding that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure that it does (which should be fixed in arrow, but I haven't gotten there yet!). I only found out recently that concat_arrays()
is the only way to force a copy of an Arrow thing.
} | ||
} | ||
|
||
as_chunked_array.geoarrow_vctr <- function(x, ..., type = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be using indices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😬 definitely! I fixed this and added a test.
r/geoarrow/R/arrow-compat.R
Outdated
if (is.null(type)) { | ||
type <- arrow::as_data_type(attr(x, "schema", exact = TRUE)) | ||
chunks <- attr(x, "chunks", exact = TRUE) | ||
} else { | ||
stream <- as_nanoarrow_array_stream(x, schema = as_nanoarrow_schema(type)) | ||
chunks <- nanoarrow::collect_array_stream(stream, validate = FALSE) | ||
type <- arrow::as_data_type(type) | ||
} | ||
|
||
schema <- as_nanoarrow_schema(type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using this conversion pattern in as_geoarrow_vctr() also. Should we extract it into a utility, or just convert x with as_geoarrow_vctr()? E.g.
if (is.null(type)) { | |
type <- arrow::as_data_type(attr(x, "schema", exact = TRUE)) | |
chunks <- attr(x, "chunks", exact = TRUE) | |
} else { | |
stream <- as_nanoarrow_array_stream(x, schema = as_nanoarrow_schema(type)) | |
chunks <- nanoarrow::collect_array_stream(stream, validate = FALSE) | |
type <- arrow::as_data_type(type) | |
} | |
schema <- as_nanoarrow_schema(type) | |
if (!is.null(type)) { | |
array <- arrow::as_chunked_array(as_geoarrow_vctr(x, schema = as_nanoarrow_schema(type))) | |
return(array) | |
} | |
schema <- attr(x, "schema", exact = TRUE) | |
type <- arrow::as_data_type(schema) | |
chunks <- attr(x, "chunks", exact = TRUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that slices are involved changed this a bit! I think there is more room for optimization (in particular, Arrow C++ doesn't import/export chunked arrays as array streams, which would help a lot!).
r/geoarrow/R/arrow-compat.R
Outdated
arrays <- vector("list", length(chunks)) | ||
for (i in seq_along(arrays)) { | ||
tmp_schema <- nanoarrow::nanoarrow_allocate_schema() | ||
nanoarrow::nanoarrow_pointer_export(schema, tmp_schema) | ||
tmp_array <- nanoarrow::nanoarrow_allocate_array() | ||
nanoarrow::nanoarrow_pointer_export(chunks[[i]], tmp_array) | ||
arrays[[i]] <- arrow::Array$import_from_c(tmp_array, tmp_schema) | ||
} | ||
|
||
arrow::ChunkedArray$create(!!!arrays, type = type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the following isn't equivalent? My lack of knowledge of arrow and nanoarrow internals is probably obvious with this question.
arrow::chunked_array(!!!chunks, type = type)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We added chunked_array()
pretty recently but it's definitely the suggested constructor!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what these lines are for? Can we just create the array from chunks
?
geoarrow-c/r/geoarrow/R/arrow-compat.R
Lines 26 to 33 in a9293f3
arrays <- vector("list", length(chunks)) | |
for (i in seq_along(arrays)) { | |
tmp_schema <- nanoarrow::nanoarrow_allocate_schema() | |
nanoarrow::nanoarrow_pointer_export(schema, tmp_schema) | |
tmp_array <- nanoarrow::nanoarrow_allocate_array() | |
nanoarrow::nanoarrow_pointer_export(chunks[[i]], tmp_array) | |
arrays[[i]] <- arrow::Array$import_from_c(tmp_array, tmp_schema) | |
} |
r/geoarrow/R/sf-compat.R
Outdated
} | ||
|
||
st_as_sf.ArrowTabular <- function(x, ..., promote_multi = FALSE) { | ||
df <- tibble::as_tibble(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a tibble?
df <- tibble::as_tibble(x) | |
df <- as.data.frame(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I forgot that Arrow doesn't "depend" on tibble (although it returns them quite a lot of the time)
Looks good. A couple of very minor things that might be improvements (see comments) |
This adds an
arrow::ExtensionType
implementation and a number of S3 methods to enable hopefully seamless interactions between the arrow and sf. There are almost certainly other methods that need to be added (maybe some forwk_wkt
and friends in the wk package) but this should be a start.Created on 2023-12-06 with reprex v2.0.2