Import apache-arrow from spark overlay #1192

Closed
wants to merge 3 commits into from


littlewu2508
Contributor

These commits pull dev-libs/apache-arrow, dev-python/pyarrow, and their dependencies from the spark overlay.

They were initially introduced into the spark overlay because they are runtime dependencies of pyspark, responsible for the Parquet data format.

We have since found that the Parquet data format is suitable for generic scientific computing and that pyspark is not the only use case, so I picked up these ebuilds, bumped their versions, made some QA improvements, and opened this PR against ::science.

apache-arrow provides many optional features, and I do not have time to turn all of the options into USE flags, so I only added the ones that are useful to me (mainly for Parquet I/O). Arrow also supports many programming languages; these two ebuilds provide support for C++ and Python. Further contributions are always welcome.
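
For anyone who wants to expose more options, here is a rough sketch of how USE flags could translate into Arrow CMake switches. It is only an illustration: the USE flag names on the left are hypothetical and not taken from the ebuilds in this PR, and the helper `cmake_args` is made up for the example. The `ARROW_*` names are upstream Arrow CMake options.

```python
# Hypothetical mapping from USE flags to Arrow CMake options.
# The flag names (keys) are illustrative assumptions, not the PR's actual IUSE.
USE_TO_CMAKE = {
    "parquet": "ARROW_PARQUET",
    "dataset": "ARROW_DATASET",
    "compute": "ARROW_COMPUTE",
    "zstd": "ARROW_WITH_ZSTD",
    "snappy": "ARROW_WITH_SNAPPY",
}


def cmake_args(enabled_flags):
    """Return a -D<OPTION>=ON/OFF argument for every known USE flag."""
    unknown = set(enabled_flags) - set(USE_TO_CMAKE)
    if unknown:
        raise ValueError(f"unknown USE flags: {sorted(unknown)}")
    return [
        f"-D{option}={'ON' if flag in enabled_flags else 'OFF'}"
        for flag, option in USE_TO_CMAKE.items()
    ]


if __name__ == "__main__":
    # Example: only Parquet support and zstd compression enabled.
    print(" ".join(cmake_args({"parquet", "zstd"})))
```

In an ebuild, the same idea is usually written with `$(usex flag ON OFF)` entries in `mycmakeargs`; each new option would need its own flag, dependency, and test coverage.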

Tests passed on amd64

Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
@littlewu2508
Contributor Author

/cc @heroxbd

@littlewu2508
Contributor Author

@Berrysoft I currently do not include the patch you introduced, https://github.com/6-6-6/spark-overlay/blob/e575584fc60afe93316316f5e3b8d4cd7ecab0e0/dev-libs/apache-arrow/files/apache-arrow-9.0.0-thrift-limit.patch. Could you explain what this patch does, and whether it should be included for generic usage?

With all USE flags turned on, all tests passed:

100% tests passed, 0 tests failed out of 59

Label Time Summary:
arrow-tests      =  18.62 sec*proc (31 tests)
arrow_compute    =  15.69 sec*proc (12 tests)
arrow_dataset    =   5.38 sec*proc (9 tests)
filesystem       =   1.02 sec*proc (1 test)
parquet-tests    =   3.46 sec*proc (7 tests)
unittest         =  43.15 sec*proc (59 tests)

Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
@Nowa-Ammerlaan
Member

Thanks 👍
