[Feature Request] Using parquet files instead/alongside torch splits #377

Closed
Dsantra92 opened this issue Sep 9, 2022 · 4 comments

Comments

@Dsantra92

Hello devs.
I am working on support for OGB datasets in MLDatasets.jl. One of the bottlenecks we are facing is loading the .pt files. Our current implementation, a hack built on Pickle.jl, uses substantially more memory than loading the same files in Python. Given the new TorchArrow support, could you provide Parquet files for loading the splits?

@weihua916
Contributor

Hi! Are the split files really that large? They just store the split indices, no?

@Dsantra92
Author

I was asking whether it is possible, or planned, to store the computed splits in a language-independent format.

@weihua916
Contributor

I see. That'd require all zipped files to be re-created. I do not think we will support this in the immediate future. You can probably consider some workaround on your side.
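
One possible shape for such a workaround, as a rough sketch rather than anything OGB ships: convert the splits once on a machine with Python, writing each split's indices to a gzip-compressed CSV that any language can read. The names `export_splits`, `read_split`, and `example_splits` are hypothetical, and the sketch assumes each split is a flat sequence of integer indices. The PyTorch loading step is only described in a comment, since the exact layout of the .pt files is not given here.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def export_splits(splits, out_dir):
    """Write each split's indices to <out_dir>/<name>.csv.gz, one index per row."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, indices in splits.items():
        path = out_dir / f"{name}.csv.gz"
        with gzip.open(path, "wt", newline="") as f:
            writer = csv.writer(f)
            for idx in indices:
                writer.writerow([int(idx)])
        paths[name] = path
    return paths


def read_split(path):
    """Read one exported split back into a list of ints."""
    with gzip.open(path, "rt", newline="") as f:
        return [int(row[0]) for row in csv.reader(f)]


# Hypothetical input. In practice this dict would come from a one-time
# torch.load(...) of the .pt split file, with each tensor converted via
# .tolist(); that step is an assumption and is not shown here.
example_splits = {"train": [0, 1, 2, 5], "valid": [3], "test": [4, 6]}

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written = export_splits(example_splits, tmp)
        for name, path in written.items():
            print(name, read_split(path))
```

The exported files no longer depend on pickle, so the consuming side only needs gzip and CSV support.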

@Dsantra92
Author

Makes sense!🙁

@Dsantra92 closed this as not planned on Feb 18, 2023