Problems of running BERT example: BookCorpus cannot be downloaded #37

Closed
chenmoneygithub opened this issue Mar 5, 2022 · 1 comment · Fixed by #55
Labels
type:Bug Something isn't working

Comments

@chenmoneygithub
Contributor

Describe the bug
Downloading BookCorpus via the [repo mentioned in the BERT instructions](https://github.com/soskek/bookcorpus/blob/master/README.md) fails with an error:

HTTPError: HTTP Error 503: Service Temporarily Unavailable
Failed to open https://www.smashwords.com/books/download/459173/6/latest/0/0/imperfect-chemistry.txt

This might be transient, since a 503 status signals a temporary server-side condition, but we need to check further.
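One way to check whether the 503 is transient is to retry the request with exponential backoff. The following is only a minimal sketch using the standard library; `fetch_with_retry` and its parameters are hypothetical names for illustration and are not part of the soskek/bookcorpus scripts.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, retries=3, base_delay=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body, retrying on HTTP 503 with exponential backoff.

    `opener` and `sleep` are injectable so the retry logic can be exercised
    without network access (hypothetical helper, not from the bookcorpus repo).
    """
    for attempt in range(retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Only 503 is treated as transient; any other status, or a 503 on
            # the final attempt, is re-raised to the caller.
            if err.code != 503 or attempt == retries:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

If the request still fails with 503 after several spaced-out attempts, the failure is probably not transient and the host may be blocking or rate-limiting these downloads.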

To Reproduce

git clone https://github.com/soskek/bookcorpus.git
cd bookcorpus
python download_files.py --list url_list.jsonl --out out_txts --trash-bad-count

Expected behavior
The BookCorpus dataset should download successfully.

@chenmoneygithub chenmoneygithub added the type:Bug Something isn't working label Mar 5, 2022
@mattdangerw
Member

I would suggest using the "file by Shawn Presser" at the top of the README. That skips running the code to recreate the dataset.

Another option would be Hugging Face Datasets, though some work would be needed to get the data out of that format.
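As a rough sketch of what that conversion could look like, the helper below writes one text field per line to a plain-text file. Everything here is an assumption for illustration: `write_text_column` and `export_bookcorpus` are hypothetical names, the dataset is assumed to be available on the Hub as `"bookcorpus"` with a `"text"` column, and the one-record-per-line output is only a guess at what the BERT example would want.

```python
def write_text_column(records, out_path, field="text"):
    """Write `field` of each record to out_path, one per line; return the count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for record in records:
            out.write(record[field].strip() + "\n")
            count += 1
    return count


def export_bookcorpus(out_path="bookcorpus.txt"):
    # Assumption: the `datasets` package is installed and the Hub serves a
    # dataset named "bookcorpus" with a "text" column. Newer versions of
    # `datasets` may require extra arguments or refuse script-based datasets.
    from datasets import load_dataset

    dataset = load_dataset("bookcorpus", split="train")
    return write_text_column(dataset, out_path)
```

`write_text_column` accepts any iterable of dict-like records, so it can be tried on a few hand-written rows before pointing it at the full dataset.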

I'll update the README to include a few different sources. We probably shouldn't list just one, as there's no official source anymore.


2 participants