imdb source error #4550

Closed
Muhtasham opened this issue Jun 23, 2022 · 1 comment
Labels
bug Something isn't working

Comments

@Muhtasham

Describe the bug

The imdb dataset fails to load: the download from the original source times out.

Steps to reproduce the bug

from datasets import load_dataset
dataset = load_dataset("imdb")

Expected results

Actual results

06/23/2022 14:45:18 - INFO - datasets.builder - Dataset not on Hf google storage. Downloading and preparing it from source
06/23/2022 14:46:34 - INFO - datasets.utils.file_utils - HEAD request to http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz timed out, retrying... [1.0]
.....
ConnectionError: Couldn't reach http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz (ConnectTimeout(MaxRetryError("HTTPConnectionPool(host='ai.stanford.edu', port=80): Max retries exceeded with url: /~amaas/data/sentiment/aclImdb_v1.tar.gz (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f2d750cf690>, 'Connection to ai.stanford.edu timed out. (connect timeout=100)'))")))

Environment info

  • datasets version: 2.3.2
  • Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.13
  • PyArrow version: 6.0.1
  • Pandas version: 1.3.5
Muhtasham added the bug label Jun 23, 2022
@albertvillanova
Member

albertvillanova commented Jun 23, 2022

Thanks for reporting, @Muhtasham.

Indeed, the IMDB dataset has not been accessible since yesterday. The data is hosted on the data owners' servers at Stanford (http://ai.stanford.edu/), and these are down due to a power outage caused by a fire: https://twitter.com/StanfordAILab/status/1539472302399623170?s=20&t=1HU1hrtaXprtn14U61P55w

As a temporary workaround, you can load the IMDB dataset with this tweak:

from datasets import load_dataset
ds = load_dataset("imdb", revision="tmp-fix-imdb")
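
If you would rather keep using the default source once it is back, the workaround can be wrapped in a small fallback. This is only a sketch: the helper name `load_imdb_with_fallback` and its `loader` parameter are hypothetical, and it assumes the failure surfaces as Python's built-in `ConnectionError`, as the traceback in the report suggests.

```python
def load_imdb_with_fallback(loader=None, fallback_revision="tmp-fix-imdb"):
    """Try the default IMDB source first; on a connection failure, retry with a pinned revision."""
    if loader is None:
        # Imported lazily so the helper can also be exercised with a stub loader.
        from datasets import load_dataset
        loader = load_dataset
    try:
        return loader("imdb")
    except ConnectionError:
        # Assumption: the download failure is raised as the built-in ConnectionError,
        # matching the error shown in the report.
        return loader("imdb", revision=fallback_revision)


if __name__ == "__main__":
    ds = load_imdb_with_fallback()
    print(ds)
```

Note that the first attempt may take a while to fail, since the request is retried before the error is raised. The `tmp-fix-imdb` revision is described as temporary, so it may not stay available.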
