[r] Enhance support for (partial) table extraction #389

Merged: eddelbuettel merged 4 commits into main from de/sc-22234/enhance on Oct 11, 2022

eddelbuettel (Contributor)

This PR adds support for using the SOMAReader class to extract selected columns in a single pass. Support for query conditions and ranges will be added next.

The retrieved data structure comes from an Arrow support package (arch) that works only with the lighter-weight Arrow C interface, without linking against the Arrow library. Once the data has been accessed, it can easily be transferred to other Arrow data structures:

```r
> library(tiledbsoma)
> uri <- "test/soco/pbmc3k_processed/obs"  # local data set
> columns <- c("n_counts", "n_genes", "louvain")
> z <- export_recordbatch(uri, columns)
[2022-10-07 18:00:12.785] [tiledbsoma] [Process: 66049] [Thread: 66049] [info] Reading from test/soco/pbmc3k_processed/obs
[2022-10-07 18:00:12.795] [tiledbsoma] [Process: 66049] [Thread: 66049] [info] Read complete with 2638 obs and 3 cols
[2022-10-07 18:00:12.795] [tiledbsoma] [Process: 66049] [Thread: 66049] [info] Accessing n_counts at 0
[2022-10-07 18:00:12.795] [tiledbsoma] [Process: 66049] [Thread: 66049] [info] Accessing n_genes at 1
[2022-10-07 18:00:12.795] [tiledbsoma] [Process: 66049] [Thread: 66049] [info] Accessing louvain at 2
> 
> rb <- arch::from_arch_array(z, arrow::RecordBatch)   ## Arrow RecordBatch
> rb
RecordBatch
2638 rows x 3 columns
$ <float not null>
$ <int64 not null>
$ <large_binary not null>
> 
> tb <- arrow::as_arrow_table(arch::from_arch_array(z, arrow::RecordBatch))  ## Arrow Table
> tb
Table
2638 rows x 3 columns
$ <float not null>
$ <int64 not null>
$ <large_binary not null>
> 
```
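As a hedged sketch, the same chain can be continued one step further, from an Arrow Table to a plain R data.frame. It uses only the arrow package; the toy values are hypothetical stand-ins for the three selected obs columns (in the transcript the data instead comes from export_recordbatch() via arch), and louvain is built as a string column rather than large_binary for simplicity.

```r
library(arrow)

# Hypothetical toy data mirroring the selected column names and the
# float / int64 types shown above; not taken from pbmc3k
rb <- record_batch(
  n_counts = Array$create(c(1.5, 2.5, 3.5), type = float32()),
  n_genes  = Array$create(c(10, 20, 30), type = int64()),
  louvain  = c("cluster_a", "cluster_b", "cluster_a")
)

# RecordBatch -> Table, as in the transcript
tb <- as_arrow_table(rb)

# Table -> R data.frame for downstream analysis in base R
df <- as.data.frame(tb)
```

The conversion to a data.frame materializes the columns in R memory, so it is best done after column selection has already reduced the data.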

@shortcut-integration

This pull request has been linked to Shortcut Story #22234: Improved tiledb-soma R ops.

@eddelbuettel
Contributor Author

(rebased and force-pushed)

@eddelbuettel changed the title from "Enhance support for (partial) table extraction" to "[r] Enhance support for (partial) table extraction" on Oct 7, 2022

@johnkerl (Member) left a comment


LGTM

Review comment on apis/r/src/rinterface.cpp (resolved)
Review comment on apis/r/src/rinterface.cpp (outdated, resolved)
@eddelbuettel merged commit 64ae175 into main on Oct 11, 2022
@eddelbuettel deleted the de/sc-22234/enhance branch on October 11, 2022 13:12