Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Add deprecation notice (#1429) #1464

Closed
wants to merge 2 commits into from

Conversation

tustvold
Copy link
Contributor

Closes #1429

Following on from #1446 I think all that remains is to provide some form of communication to users regarding the relationship of the projects going forward. I believe what I've written covers the consensus from the various issues and discussions on the topic, but my intent here is purely to kickstart the conversation, not to dictate the message 😅

@codecov
Copy link

codecov bot commented Apr 12, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.02 ⚠️

Comparison is base (33f6ba1) 83.93% compared to head (bf21ee6) 83.92%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1464      +/-   ##
==========================================
- Coverage   83.93%   83.92%   -0.02%     
==========================================
  Files         387      387              
  Lines       41596    41596              
==========================================
- Hits        34913    34908       -5     
- Misses       6683     6688       +5     

see 5 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@ritchie46
Copy link
Collaborator

I am not entirely sure about this. I totally agree on working together towards a similar memory/crate interface.

But as it stands this deprecates the whole arrow2 API, which is used by polars and is deeply entrenched in the project.
I am fine with pointing users tot he official arrow-rs project, but I want to keep maintaining the arrow2 API.

@sundy-li
Copy link
Collaborator

sundy-li commented Apr 12, 2023

We tried to migrate to arrow-rs last week. But we found the untyped ArrayData and NullBuffer of arrow-rs are harder to use than arrow2's Buffer<T> and Bitmap.

@tustvold
Copy link
Contributor Author

are harder to use than arrow2's Buffer and Bitmap.

I'm actively working on porting the remainder of this across in apache/arrow-rs#3879 perhaps you might be able to lend a hand identifying any missing functionality?

Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some wording suggestions to soften the tone and yet still communicate clearly to potential users that this crate isn't likely to maintained going forward

It might also be good to remove / update the https://github.com/jorgecarleitao/arrow2#faq section

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@alamb
Copy link
Collaborator

alamb commented Apr 12, 2023

But as it stands this deprecates the whole arrow2 API, which is used by polars and is deeply entrenched in the project.
I am fine with pointing users tot he official arrow-rs project, but I want to keep maintaining the arrow2 API.

@ritchie46 can you help with the alternate wording? I didn't realize you still planned to keep maintaining the arrow2 crate. Maybe we have different ideas of what "maintaining" means 🤔

I think the best thing for users of arrow2 would be to be as explicit as possible about the state of the crate. For example:

  1. Will PRs be reviewed and accepted? (if so, by who?)
  2. Will new releases be made? (if so, by who? On what schedule?)

If you can provide some guidance on these issues I can take a shot at writing a section to summarize

@ritchie46
Copy link
Collaborator

@alamb polars is heavily depended on arrow2's typed arrays/offsets/buffer interfaces.

If we are able have this API on top of arrow-rs memory, I can progressively refactor to arrow-rs, but this should be a slow progress as this is not top priority for polars. My ideal situation is a way to improve the memory interop first (this is what @tustvold has been adding, so that is a great start).

As I am aware, there is currently not away to go forward whilst using arrow2's array API. Before this should be deprecated, there should be such an interface as a zero-cost abstraction on top of arrow-rs's buffers IMO.

If that is not something that is wanted by arrow-rs, that might also be fine. Maybe polars should fork arrow2 and continue maintaining it less publicly.

I am just scared of the core dependency being pulled from underneath of polars. Releases are still needed as I want bugs to be fixed and be able to publish on crates.io.

@tustvold
Copy link
Contributor Author

should be such an interface as a zero-cost abstraction on top of arrow-rs's buffers IMO

How zero-cost does this need to be? I guess my thinking was that #1446 was sufficiently low overhead as it doesn't copy the underlying buffers, and so would be sufficient to bridge between migrated and unmigrated parts of a codebase? What does a couple of additional reference counts matter when you have a kernel that is processing an array of thousands of rows? Perhaps I am missing something here?

The largest overhead at the moment is likely schema mapping, switching arrow2 to using the same schema types is definitely tractable should this prove a bottleneck.

there is currently not away to go forward whilst using arrow2's array API

My intention with this PR was not to say people should stop using these APIs today, I'm not suggesting we archive this repo 😅, but rather to communicate a direction of travel to the community. I'm not overly concerned about timescales, if it takes 6 months it takes 6 months, I just would like to be able to set both projects on a path towards unification and clearly communicate this, along with any implications.

@alamb
Copy link
Collaborator

alamb commented Apr 12, 2023

I certainly don't mean in any way to stop or prevent maintenance of arrow2 for people who want to do so.

As he says, @tustvold and I and the rest of the Arrow community plan to keep pushing arrow-rs forward and are trying to assist users of arrow2 who want to join us.

I am just scared of the core dependency being pulled from underneath of polars. Releases are still needed as I want bugs to be fixed and be able to publish on crates.io.

The purpose of this particular PR to change the documentation is as @tustvold says, to communicate that there seems to be no maintainers or plan for maintenance of this crate. I certainly don't plan to "pull" anything from arrow2 or polars.

If that is not something that is wanted by arrow-rs, that might also be fine. Maybe polars should fork arrow2 and continue maintaining it less publicly.

I would personally love to get the zero copy approach in arrow-rs.

In terms of maintaining a polars specific fork of arrow2 that certainly is a reasonable approach (or since you are a maintainer of this repo, you could maybe just use this one). Also perhaps you or other interested parties could work out an alternate maintenance plan for this crate with its other users.

I think it is up to you where you want to invest your time and effort.

@@ -15,6 +15,27 @@ for a general introduction on how to use this crate, and
[API docs](https://jorgecarleitao.github.io/arrow2/main/docs/arrow2)
for a detailed documentation of each of its APIs.

# 2023-04-12: Deprecation Notice
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# 2023-04-12: Deprecation Notice
# 2023-04-12: Maintenance Notice

Maybe if we change the title so it didn't say "deprecated" this would sound less scary 🤔

@alamb
Copy link
Collaborator

alamb commented May 3, 2023

Here is a PR that clarifies that the maintenance bandwidth on this crate is low: #1476

It side steps the issues of what "deprecated" might mean / not mean for the future and instead attempts to describe the current state of affairs.

I also hope it might help any current users of arrow2 to offer to help maintain the crate if that is valuable to them.

@ritchie46
Copy link
Collaborator

See #1476

@ritchie46 ritchie46 closed this May 7, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] Combination of arrow-rs and arrow2, deprecation of arrow2 repository
4 participants