This repository has been archived by the owner on Jul 27, 2022. It is now read-only.

Propose donating object_store_rs to Apache Arrow project #41

Closed
alamb opened this issue Jun 28, 2022 · 5 comments

Comments

@alamb
Contributor

alamb commented Jun 28, 2022

TL;DR: I would like to propose donating this project to the Apache Arrow project: https://arrow.apache.org/

Rationale

  1. A common, high-quality object store abstraction for communicating with various remote object stores is useful for a range of projects and use cases.
  2. A library with a common API to access remote object stores is directly aligned with the Arrow mission of providing building blocks for modern high-performance analytics systems.
  3. The clear governance of Apache Arrow offers the best chance to build a unified and strong community around this crate, hopefully both increasing its adoption and attracting community contributions for its long-term evolution and maintenance.

Background

Object stores are increasingly important for analytic systems as more data is located in such systems. @yjshen donated an object store abstraction to Arrow DataFusion to allow DataFusion to read from local files, S3, HDFS, and others. In apache/datafusion#2489 the DataFusion community is proposing migrating from this original object store abstraction, which is part of the DataFusion project (itself part of Apache Arrow), to the code in this crate.

Provenance

The code in this crate was originally developed by InfluxData, largely by @carols10cents, for InfluxDB IOx. @tustvold has since extracted the code and released it as its own crate. Upon consideration, and for the reasons described above, we believe that moving it to be an official part of Arrow would benefit the long-term health of both this code and the arrow-rs and arrow-datafusion projects, and we would like to donate it to the community.

There is additional background here: apache/datafusion#2677 (comment)

This ticket can hopefully serve as a place to discuss the form this donation can take. Some options:

  1. Move code into the arrow-datafusion repository
  2. Move code into the arrow-rs repository
  3. Move code to an apache/arrow-object-store-rs repository
  4. Move code to datafusion-contrib

@tustvold
Contributor

My preference would be for 2 or 3, so that we can potentially provide batteries-included APIs within arrow-rs for interacting with object stores (behind a feature gate).

@alamb
Contributor Author

alamb commented Jun 28, 2022

Cross reference: Arrow dev mailing list post https://lists.apache.org/thread/l2103pl85xkyq10c96z73d5t68f6tthd

@rdettai

rdettai commented Jul 1, 2022

I would be more inclined toward 1 or 2. I think that 3 creates an unnecessary increase in the number of interconnected repos.

Thanks a lot for the donation!

@alamb
Contributor Author

alamb commented Jul 6, 2022

The mailing list discussion and this thread seem positive so far -- I will propose a formal vote and figure out the IP clearance process in the next few days.

@alamb
Contributor Author

alamb commented Jul 8, 2022

This proposal seems to have community support -- I have filed apache/arrow-rs#2030 to track the work of donating the code (including official acceptance and IP clearance). I am closing this issue for now and will track future work in apache/arrow-rs#2030.

Thank you all for your comments
