forked from apache/arrow-rs
Add parquet-derive to repository README #1
Open: konjac wants to merge 31 commits into master from add-parquet-derive-to-readme
…ffer (apache#5741) * Compute data buffer length by offset buffer start and end values * Update code comment * Add unit test * Add round_trip check * Fix clippy
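The commit above computes the data buffer length from the start and end values of the offset buffer. A minimal std-only sketch of why both ends matter, assuming a plain `&[i32]` offsets slice; `value_data_len` is a hypothetical helper, not an arrow-rs API:

```rust
/// Bytes of the values buffer referenced by a (possibly sliced) offsets buffer.
/// For `n` elements there are `n + 1` offsets; element `i` spans
/// `values[offsets[i]..offsets[i + 1]]`.
fn value_data_len(offsets: &[i32]) -> usize {
    match (offsets.first(), offsets.last()) {
        // Using `end - start` rather than just `end` stays correct when the
        // array has been sliced and the first offset is non-zero.
        (Some(&start), Some(&end)) => (end - start) as usize,
        _ => 0,
    }
}

fn main() {
    // Offsets for ["a", "bc", "def"].
    let offsets = [0, 1, 3, 6];
    // A slice covering only ["bc", "def"]: its first offset is 1, not 0.
    let sliced = &offsets[1..];
    println!("full = {}, sliced = {}", value_data_len(&offsets), value_data_len(sliced));
}
```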
This patch adds reader support for a comment character when reading CSV files. Like almost everything else about the CSV format, comments are not truly standardized, but a common convention supported by many CSV readers[^1][^2] is to ignore full lines that start with a comment character (often `#`); inline or end-of-line comments are not supported. Example:

    # This is a comment in a CSV file without header.
    1,2
    # Comment inside the data block.
    11,22

The implementation for Arrow is straightforward: all we need to do is expose the existing `comment` option of `csv_core`, which is used to read CSV files. Closes apache#5758.

[^1]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
[^2]: https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
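The full-line comment rule described above can be sketched without any CSV crate. This is a hypothetical, deliberately naive helper (no quoting, escaping, or header handling), not the arrow-csv reader or the `csv_core` option it exposes:

```rust
/// Hypothetical helper: split CSV text into rows, skipping every line whose
/// first byte equals `comment`. Naive: splits on ',' with no quoting support.
fn parse_rows(input: &str, comment: u8) -> Vec<Vec<String>> {
    input
        .lines()
        // Only full-line comments are dropped; a '#' later in a line is data.
        .filter(|line| line.as_bytes().first() != Some(&comment))
        .map(|line| line.split(',').map(str::to_string).collect())
        .collect()
}

fn main() {
    let input = "# This is a comment in a CSV file without header.\n\
                 1,2\n\
                 # Comment inside the data block.\n\
                 11,22\n";
    for row in parse_rows(input, b'#') {
        println!("{:?}", row);
    }
}
```

Because only the first byte is checked, end-of-line comments are treated as ordinary field content, matching the "inline comments are not supported" behaviour stated in the commit message.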
apache#5761) * Downgrade to Rust 1.77 in integration pipeline (apache#5719) * Checkout nanoarrow
… 0 (apache#5740) * fix: parse string to decimal when scale is 0 * fix fmt
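For context on the decimal fix above, here is a minimal sketch of parsing a string into an unscaled integer for a given scale, where scale 0 is handled like any other scale. `parse_decimal` here is a hypothetical std-only helper, not arrow-rs's parser, and truncating excess fractional digits (rather than rounding or erroring) is an assumption:

```rust
/// Hypothetical helper: parse "123.45" into an unscaled i128 with `scale`
/// fractional digits. Assumes extra fractional digits are truncated.
fn parse_decimal(s: &str, scale: u32) -> Option<i128> {
    let (neg, digits) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s),
    };
    let (int_part, frac_part) = digits.split_once('.').unwrap_or((digits, ""));
    if int_part.is_empty() && frac_part.is_empty() {
        return None;
    }
    if !int_part.bytes().chain(frac_part.bytes()).all(|b| b.is_ascii_digit()) {
        return None;
    }
    let mut value: i128 = 0;
    for b in int_part.bytes() {
        value = value.checked_mul(10)?.checked_add((b - b'0') as i128)?;
    }
    // Consume exactly `scale` fractional digits, padding with zeros.
    // With scale 0 the loop does nothing and the fraction is ignored.
    let mut frac = frac_part.bytes();
    for _ in 0..scale {
        let d = frac.next().map_or(0, |b| (b - b'0') as i128);
        value = value.checked_mul(10)?.checked_add(d)?;
    }
    Some(if neg { -value } else { value })
}

fn main() {
    println!("{:?} {:?}", parse_decimal("123.45", 0), parse_decimal("1.5", 2));
}
```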
* Expose boolean builder contents * Suggest using arrow-provided utility for boolean unpacking
… tests (apache#5764) * maybe run the nanoarrow tests * try to pass the location of nanoarrow to archery * fix name
Signed-off-by: Xuanwo <github@xuanwo.io>
* Remove deprecated comparison kernels (apache#4733) * Fix docs * Fix doctest * Update arrow/src/lib.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
) CPUs have efficient instructions for querying, setting, and clearing bits, and modern compilers know how to turn simple bit-indexing code into such instructions. The table-lookup optimizations may have been useful in older versions of rustc, but as of rustc 1.78 they are a net loss. See the PR description for more details.
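The "simple bit indexing code" mentioned above can look like the sketch below, which computes each mask with a shift instead of reading it from a precomputed mask table. The function names are illustrative and not necessarily the exact arrow-rs `bit_util` code; the least-significant-bit-first ordering follows Arrow's validity bitmap layout:

```rust
/// Read bit `i` of an LSB-first packed bitmap.
fn get_bit(data: &[u8], i: usize) -> bool {
    // Mask computed with a shift; no lookup table needed.
    data[i / 8] & (1u8 << (i % 8)) != 0
}

/// Set bit `i` to 1.
fn set_bit(data: &mut [u8], i: usize) {
    data[i / 8] |= 1u8 << (i % 8);
}

/// Clear bit `i` to 0.
fn unset_bit(data: &mut [u8], i: usize) {
    data[i / 8] &= !(1u8 << (i % 8));
}

fn main() {
    let mut bitmap = [0u8; 2];
    set_bit(&mut bitmap, 3);
    set_bit(&mut bitmap, 9);
    unset_bit(&mut bitmap, 3);
    println!("{:08b} {:08b} bit9={}", bitmap[0], bitmap[1], get_bit(&bitmap, 9));
}
```

With the mask expressed as `1 << (i % 8)`, the compiler is free to lower these to the CPU's bit-test/set/clear instructions, which is the effect the commit message relies on.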
…pache#5730) * improved the error message * added a test to test the overflow * fixed the format arrow * removed assert
… `decode_footer` (apache#5781)
…pache#5780) Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version. - [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md) - [Commits](rust-itertools/itertools@v0.12.0...v0.13.0) --- updated-dependencies: - dependency-name: itertools dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix uuid derive * fix byte array length handling * test lengths * fmt
* Support casting a `FixedSizeList<T>[1]` to `T` * Add FixedSizeList[1] => FixedSizeList[1] tests
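Conceptually, a fixed-size list of size 1 stores exactly one child value per list slot, so casting it to `T` amounts to taking the child values directly. The sketch below models this with plain vectors; `unwrap_size_one_lists` is hypothetical, and the rule that a null list slot yields a null value is an assumption rather than a description of the arrow-rs cast kernel:

```rust
/// Hypothetical model: `child` holds one value per list slot (size 1) and
/// `list_valid` marks which list slots are non-null.
fn unwrap_size_one_lists<T: Clone>(child: &[Option<T>], list_valid: &[bool]) -> Vec<Option<T>> {
    assert_eq!(child.len(), list_valid.len(), "size-1 lists have one child per slot");
    child
        .iter()
        .zip(list_valid)
        // Assumed rule: a null list becomes a null value; otherwise keep the child.
        .map(|(value, &valid)| if valid { value.clone() } else { None })
        .collect()
}

fn main() {
    let child = [Some(10), None, Some(30)];
    let valid = [true, true, false];
    println!("{:?}", unwrap_size_one_lists(&child, &valid));
}
```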
Allows writing binary (Binary, LargeBinary, and FixedSizeBinary) values to CSV. Note: FixedSizeBinary was already supported in this way. Values are encoded as hex using the default Arrow formatter. A test was added that covers null values when encoding all three binary types to CSV.
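A std-only sketch of hex-encoding binary cells into a CSV row, for illustration only. It assumes lowercase two-digit hex per byte and an empty field for nulls; both are assumptions here rather than confirmed arrow-csv output, and `to_hex`/`csv_row` are hypothetical helpers:

```rust
use std::fmt::Write;

/// Encode bytes as two lowercase hex digits each (assumed format).
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        write!(out, "{:02x}", b).expect("writing to a String cannot fail");
    }
    out
}

/// Join one row of nullable binary cells; nulls become empty fields (assumed).
fn csv_row(values: &[Option<&[u8]>]) -> String {
    values
        .iter()
        .map(|v| v.map(to_hex).unwrap_or_default())
        .collect::<Vec<_>>()
        .join(",")
}

fn main() {
    let row = [Some(&[0x01u8, 0xab][..]), None, Some(&b"hi"[..])];
    println!("{}", csv_row(&row));
}
```

Under this representation a null and an empty byte string both produce an empty field, so a reader cannot tell them apart without a distinct null marker.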
…ime` (apache#3125) (apache#5654) (apache#5769) * Structured interval type (apache#3125) (apache#5654) * Update integration-test * Fix 32-bit build * Review feedback
* Refine parquet documentation on types and metadata * Update regen.sh and thrift.rs * Clarify page index encompasses offset index and column index * revert unexpected diff
Updates the requirements on [prost-build](https://github.com/tokio-rs/prost) to permit the latest version. - [Release notes](https://github.com/tokio-rs/prost/releases) - [Commits](tokio-rs/prost@v0.12.4...v0.12.6) --- updated-dependencies: - dependency-name: prost-build dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Updates the requirements on [proc-macro2](https://github.com/dtolnay/proc-macro2) to permit the latest version. - [Release notes](https://github.com/dtolnay/proc-macro2/releases) - [Commits](dtolnay/proc-macro2@1.0.82...1.0.83) --- updated-dependencies: - dependency-name: proc-macro2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… writing JSON (apache#5785) * feat: encode Binary and LargeBinary types in JSON as hex Added the ability for the JSON writer to encode Binary and LargeBinary types as hex. This follows the behaviour for FixedSizeBinary. A test was added to check the functionality for both Binary and LargeBinary. * refactor: use ArrayAccessor instead of custom trait * refactor: use generic in test instead of macro * refactor: use const DATA_TYPE from GenericBinaryType
* Refine ParquetRecordBatchReaderBuilder docs * fix link * Suggest using new(), add example
Did you perhaps mean to file this against the upstream repository?
As in, perhaps it should be a PR against https://github.com/apache/arrow-rs (not this repo, https://github.com/konjac/arrow-rs).
@tustvold @alamb Sorry for the stupid error. I did not notice that the UI had populated the wrong target branch for me. A new PR has been raised on the apache repo: apache#5795. Thank you!
Which issue does this PR close?
Document refinement: add parquet-derive to the repository README.
Closes apache#5751.
Rationale for this change
See apache#5751.
What changes are included in this PR?
Add parquet-derive to the repository README, plus some minor refinements.
Are there any user-facing changes?
No. Only README changes.