Import apache-arrow from spark overlay #1192
Closed
These commits pull dev-libs/apache-arrow, dev-python/pyarrow, and their dependencies from the spark overlay. They were originally introduced into the spark overlay because they are runtime dependencies of pyspark, where they handle the Parquet data format.
We have since found that the Parquet data format is suitable for generic scientific computing and that pyspark is not its only use case, so I picked up these ebuilds, bumped their versions, made some QA enhancements, and created this PR at ::science.
apache-arrow provides many optional features, and I do not have time to turn all of them into USE flags, so I added only the ones useful to me (mainly Parquet IO). Apache Arrow also supports many programming languages; these two ebuilds cover C++ and Python. Further contributions are always welcome.