PyArrowOnRay: implement read_parquet??? #5523
Labels
External
Pull requests and issues from people who do not regularly contribute to modin
new feature/request 💬
Requests and pull requests for new features
P1
Important tasks that we should complete soon
pandas.io
Is your feature request related to a problem? Please describe.
Right now it appears the only way to create a dataframe with the PyArrowOnRay engine is to read it in from CSV. This is very unfortunate, and makes the module essentially unusable.
In my case, I'm using lists of integers for certain columns, which PyArrow handles beautifully as a segmented array, but which pandas encodes as objects, thereby exploding processing time and memory usage. PyArrow's format is highly desirable for this reason, as is the ability to read from Feather/Parquet (both of which natively support this format, unlike CSV).
I understand that PyArrowOnRay is an experimental feature, but would it be possible to implement a parquet reader? This would enable testing on a key use case as PyArrowOnRay is developed further.
Thanks!