502 Server Errors when streaming large dataset #6577
Hi! We should be able to avoid this error by retrying the read when it happens. I'll open a PR.
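The retry approach described above could be sketched roughly as follows. This is a minimal illustration, not the actual PR: the set of retried status codes, the backoff parameters, and the `status_code` attribute on the raised exception are all assumptions made here for the example.

```python
import time

# Status codes treated as transient and retried with exponential backoff.
# (Which codes should be retried -- e.g. whether 500 belongs here -- is
# exactly the question raised later in this thread.)
TRANSIENT_STATUS_CODES = {500, 502, 503, 504}

def read_with_retries(read_fn, max_retries=5, base_delay=1.0):
    """Call read_fn(), retrying when it raises a transient server error.

    read_fn is expected to raise an exception carrying a ``status_code``
    attribute (a hypothetical shape, chosen for this sketch).
    """
    for attempt in range(max_retries + 1):
        try:
            return read_fn()
        except Exception as err:
            status = getattr(err, "status_code", None)
            if status not in TRANSIENT_STATUS_CODES or attempt == max_retries:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * (2 ** attempt))
```

A wrapper like this would let a streaming reader survive the occasional 502 without restarting the whole iteration.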
Thanks for the fix @mariosasko! Just wondering whether "500 error" should also be excluded? I got these errors overnight:
Gently pinging @mariosasko and @Wauplin - when trying to stream this large dataset from the HF Hub, I'm running into:
@sanchit-gandhi thanks for the feedback. I've opened huggingface/huggingface_hub#2026 to make the download process more robust. I believe that you witnessed this problem on Saturday due to the Hub outage. Hope the PR will make your life easier though :)
Awesome, thanks @Wauplin! Makes sense re. the Hub outage.
Describe the bug
When streaming a large ASR dataset from the Hub (~3TB), I often encounter 502 Server Errors, seemingly at random, during streaming:
This is despite the parquet file definitely existing on the Hub: https://huggingface.co/datasets/sanchit-gandhi/concatenated-train-set/blob/main/train/train-00228-of-07135.parquet
And having the correct commit id: 7d2acc5c59de848e456e951a76e805304d6fb350
I'm wondering whether this is coming from datasets, or from the Hub side?
Steps to reproduce the bug
Reproducer:
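The original reproducer script did not survive extraction; a minimal streaming loop along these lines would exercise the same code path. The dataset name is taken from the link in this issue; everything else (the deferred import, the progress printing) is an assumption for illustration.

```python
def stream_all(dataset_name="sanchit-gandhi/concatenated-train-set"):
    """Hypothetical reconstruction: iterate over the full dataset in
    streaming mode, which is where the 502 errors were observed."""
    # Deferred import so the sketch can be inspected without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset(dataset_name, split="train", streaming=True)
    for i, sample in enumerate(ds):
        if i % 1000 == 0:
            print(f"streamed {i} samples")

if __name__ == "__main__":
    stream_all()
```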
Running the above script tends to fail within about 2 hours with a traceback like the following:
Traceback:
Expected behavior
Should be able to stream the dataset without any 502 error.
Environment info
- datasets version: 2.16.2.dev0
- huggingface_hub version: 0.20.1
- fsspec version: 2023.10.0