Support for Apache Arrow #51

Open
MedAnd opened this issue Feb 9, 2019 · 3 comments

Comments

MedAnd commented Feb 9, 2019

Please add support for Apache Arrow, a cross-language development platform for in-memory data. It specifies a standardized, language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming, messaging, and interprocess communication.
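To make the "columnar memory format" concrete, here is a minimal sketch of how a nullable integer column can be laid out as a validity bitmap plus a contiguous values buffer, which is the general shape Arrow uses for primitive arrays. The function names `to_columnar` and `is_valid` are hypothetical, and the sketch leaves out the buffer padding and alignment that the real Arrow format calls for.

```python
import array


def to_columnar(values):
    """Split a list of optional ints into (validity bitmap, values buffer, null count).

    Hypothetical helper: mirrors the idea of an Arrow primitive array,
    not the exact on-the-wire format (no padding or alignment here).
    """
    bitmap = bytearray((len(values) + 7) // 8)   # one bit per slot, rounded up to bytes
    data = array.array("q", [0]) * len(values)   # contiguous 64-bit signed integers
    null_count = 0
    for i, v in enumerate(values):
        if v is None:
            null_count += 1                      # slot keeps its placeholder 0
        else:
            bitmap[i // 8] |= 1 << (i % 8)       # least-significant-bit numbering
            data[i] = v
    return bytes(bitmap), data, null_count


def is_valid(bitmap, i):
    """Return True if slot i holds a value rather than a null."""
    return bool(bitmap[i // 8] & (1 << (i % 8)))
```

Because the values sit in one contiguous buffer, a consumer in another language or process can scan them without per-row decoding, which is where the interop benefit comes from.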

Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm, making it the de facto standard for columnar in-memory analytics.

Related issues:

apscomp commented Aug 26, 2019

I would like to second this, as Apache Arrow is slowly becoming mainstream.

@AlgorithmsAreCool
Contributor

So Arrow is definitely gaining steam, but how would an Arrow integration look for Trill?

Let's say they convert the internal columnar format to Arrow. Trill is still going to be mutating those structures constantly, since it is an incremental platform. How would an integration work with those internal structures safely, and what would it do with them?

@AlgorithmsAreCool
Contributor

A few months on, I can answer my own questions here.

The most obvious benefit is opening the door to very high-performance interop with other applications or runtimes. For example, someone could write a database plugin that uses Trill operations to compute data.

Furthermore, bulk Arrow structures could be stored directly on disk and accessed via memory mapping. Systems could also use standardized readers to import CSV or Parquet files into Arrow structures.
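As a rough illustration of the memory-mapping idea, the sketch below writes a column of 64-bit integers to disk as one contiguous buffer and then scans it through a memory map without first copying it into a Python list. It uses only the Python standard library; the file layout and the names `write_int64_column` and `sum_column_mmap` are assumptions made for this example. This is not the Arrow IPC file format, which adds schema and metadata on top of buffers like this one.

```python
import array
import mmap


def write_int64_column(path, values):
    """Write values as one contiguous int64 buffer in native byte order.

    Hypothetical layout for illustration only; a real Arrow file also
    carries a schema, metadata, and alignment padding.
    """
    with open(path, "wb") as f:
        array.array("q", values).tofile(f)


def sum_column_mmap(path):
    """Memory-map the file and sum the column directly from the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Reinterpret the mapped bytes as int64 without copying them.
            # Both views are released before the mapping is closed.
            with memoryview(mm) as raw, raw.cast("q") as column:
                return sum(column)
```

A call such as `write_int64_column("col.bin", [1, 2, 3])` followed by `sum_column_mmap("col.bin")` reads the values straight from the operating system's page cache, which is the property that makes on-disk columnar buffers attractive for large data sets.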

Overall, using a standardized memory layout allows Trill to be integrated more easily and more efficiently with a wider array of projects. It is a very enticing benefit.
