-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTPError: 404 Client Error: Not Found for url #5086
Labels
bug
Something isn't working
Comments
FYI @lewtun |
Hi @km5ar, thanks for reporting. This should be fixed in the notebook:
Anyway, depending on your version of from datasets import load_dataset
issues_dataset = load_dataset("lewtun/github-issues")
issues_dataset instead of: from huggingface_hub import hf_hub_url
data_files = hf_hub_url(
repo_id="lewtun/github-issues",
filename="datasets-issues-with-hf-doc-builder.jsonl",
repo_type="dataset",
)
from datasets import load_dataset
issues_dataset = load_dataset("json", data_files=data_files, split="train")
issues_dataset Output: In [25]: ds = load_dataset("lewtun/github-issues")
Downloading: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10.5k/10.5k [00:00<00:00, 5.75MB/s]
Using custom data configuration lewtun--github-issues-cff5093ecc410ea2
Downloading and preparing dataset json/lewtun--github-issues to .../.cache/huggingface/datasets/lewtun___json/lewtun--github-issues-cff5093ecc410ea2/0.0.0/e6070c77f18f01a5ad4551a8b7edfba20b8438b7cad4d94e6ad9378022ce4aab...
Downloading data: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12.2M/12.2M [00:00<00:00, 26.5MB/s]
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.70s/it]
Extracting data files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1589.96it/s]
Dataset json downloaded and prepared to .../.cache/huggingface/datasets/lewtun___json/lewtun--github-issues-cff5093ecc410ea2/0.0.0/e6070c77f18f01a5ad4551a8b7edfba20b8438b7cad4d94e6ad9378022ce4aab. Subsequent calls will reuse this data.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 133.95it/s]
In [26]: ds
Out[26]:
DatasetDict({
train: Dataset({
features: ['url', 'repository_url', 'labels_url', 'comments_url', 'events_url', 'html_url', 'id', 'node_id', 'number', 'title', 'user', 'labels', 'state', 'locked', 'assignee', 'assignees', 'milestone', 'comments', 'created_at', 'updated_at', 'closed_at', 'author_association', 'active_lock_reason', 'pull_request', 'body', 'timeline_url', 'performed_via_github_app', 'is_pull_request'],
num_rows: 3019
})
}) |
Thanks for reporting @km5ar and thank you @albertvillanova for the quick solution! I'll post a fix on the source too |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the bug
I was following chap 5 from huggingface course: https://huggingface.co/course/chapter5/6?fw=tf
However, I'm not able to download the datasets, with a 404 erros
Steps to reproduce the bug
Environment info
datasets
version: 2.5.2The text was updated successfully, but these errors were encountered: