[FeatureRequest] Supporting Feather and/or Parquet file formats #417

Open
mantielero opened this issue Feb 26, 2020 · 1 comment

Comments

mantielero commented Feb 26, 2020

Just for your consideration.

The Feather file format seems to offer excellent performance, while Parquet seems to be more oriented toward long-term storage, as explained here.

It looks like Feather development is now maintained under Apache Arrow.

Some benchmark results comparing CSV, pickle, MessagePack, HDF5, Feather and Parquet.

It looks like Feather requires the use of Flatbuffers. There seems to be a pure Nim library: skflatbuffers.

Another serialization format to explore: fst. Feather, Parquet and FST are explained here.

mratsim added the feature label Mar 1, 2020
mratsim (Owner) commented Mar 1, 2020

I'd prefer supporting Apache Arrow due to its use in Nvidia RAPIDS, where it is the base format of their GPU-accelerated DataFrame library cuDF: https://github.com/rapidsai/cudf.

I have no time to tackle that for the foreseeable future though.
