(Load dataset failure) ConnectionError: Couldn’t reach https://raw.githubusercontent.com/huggingface/datasets/1.1.2/datasets/cnn_dailymail/cnn_dailymail.py #759
Comments
Are you running the script on a machine with an internet connection? |
Yes, I can browse the URL through Google Chrome. |
Does this HEAD request return 200 on your machine?
import requests
requests.head("https://raw.githubusercontent.com/huggingface/datasets/1.1.2/datasets/cnn_dailymail/cnn_dailymail.py")
If it returns 200, could you try again to load the dataset? |
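The check above can be turned into a small diagnostic that separates "the server answered" from "the request never got through". The following is a minimal sketch that uses only the standard library instead of `requests`; the function name `head_status` and its `opener` parameter are hypothetical names introduced here so that the function can be exercised without a network.

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10.0, opener=urllib.request.urlopen):
    """Send a HEAD request; return (status_code, None) or (None, error_text)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status, None
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx or 5xx).
        return exc.code, None
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, TLS/certificate error, timeout, ...
        return None, repr(exc)
```

Calling `head_status` once with the script URL and once with the Google Drive URL from this thread would show which host fails and with what underlying error, which is more informative than the generic ConnectionError raised by `load_dataset`.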
Thank you very much for your response.
It returns 200. I tried again to load the dataset and got the following errors again.
Traceback (most recent call last):
A connection error happened, but the URL was different. I added the following code.
This didn't return 200.
Traceback (most recent call last):
During handling of the above exception, another exception occurred:
Traceback (most recent call last): |
Is Google Drive blocked on your network? requests.head("https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfTHk4NFg2SndKcjQ") returns 200 |
I can browse Google Drive through Google Chrome. It's weird. I can download the dataset from Google Drive manually. |
Could you try to update |
My |
Is it possible to download the dataset manually from Google Drive and use it for further testing? How can I do this? I want to reproduce the model at this link: https://huggingface.co/patrickvonplaten/bert2bert-cnn_dailymail-fp16. But I can't download the dataset through the load_dataset method. I have tried many times, and the connection error always happens. |
The HEAD request should definitely work; I'm not sure what's going on on your side. If you don't manage to fix it you can use
from datasets import load_from_disk
dataset = load_from_disk("path/to/local/dataset") |
Hi |
Hi @smile0925! Do you have an internet connection? Are you using some kind of proxy that may block access to this file? Otherwise you can try to update
Let me know if that helps. |
Hi @lhoestq |
I have the same problem, have you solved it? Many thanks |
Hi @ZhengxiangShi |
Here is some feedback from Ubuntu 20.04. Google Drive is OK, but raw.githubusercontent.com has a big problem: it seems that the certificate served for raw.githubusercontent.com fails the hostname check in urllib3.
1. Google Drive
2. raw.githubusercontent.com
........
raise CertificateError(
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
........
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
.......
raise SSLError(e, request=request)
3. XSUM
ConnectionError: Couldn't reach https://raw.githubusercontent.com/EdinburghNLP/XSum/master/XSum-Dataset/XSum-TRAINING-DEV-TEST-SPLIT-90-5-5.json (SSLError(MaxRetryError('HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /EdinburghNLP/XSum/master/XSum-Dataset/XSum-TRAINING-DEV-TEST-SPLIT-90-5-5.json (Caused by SSLError(CertificateError("hostname 'raw.githubusercontent.com' doesn't match either of 'default.ssl.fastly.net', 'fastly.com', '.a.ssl.fastly.net', '.hosts.fastly.net', '.global.ssl.fastly.net', '.fastly.com', 'a.ssl.fastly.net', 'purge.fastly.net', 'mirrors.fastly.net', 'control.fastly.net', 'tools.fastly.net'")))')))
The following snippet did not solve the SSL error.
|
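The CertificateError above says that the certificate the client received lists only Fastly names, none of which covers raw.githubusercontent.com. A simplified sketch of that hostname check follows. `name_matches` and `cert_covers` are hypothetical helpers, the wildcard rule is a reduced version of what TLS libraries implement, and the leading `*` on two of the names is an assumption: the error text shows them starting with a bare dot, which may be wildcard entries whose asterisk was lost in formatting.

```python
from typing import Iterable


def name_matches(hostname: str, pattern: str) -> bool:
    """Match a hostname against one certificate name.

    A '*' is honoured only as the entire leftmost label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    pattern_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard stands for exactly one label; the rest must be equal.
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels


def cert_covers(hostname: str, cert_names: Iterable[str]) -> bool:
    """True if any name listed in the certificate matches the hostname."""
    return any(name_matches(hostname, name) for name in cert_names)


# A few names from the error message, with '*' restored by assumption.
fastly_names = [
    "default.ssl.fastly.net",
    "fastly.com",
    "*.a.ssl.fastly.net",
    "*.fastly.com",
]

print(cert_covers("raw.githubusercontent.com", fastly_names))  # False
```

A mismatch like this usually means the client reached a server other than the one it intended, which fits the observation later in the thread that changing how the name resolves made the error go away.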
Only the oldest versions of datasets use raw.githubusercontent.com. Can you try updating datasets? |
Thanks, lhoestq, for the quick response. I solved the issue as follows.
1. Open hosts (Ubuntu 20.04)
2. Add the entry line to hosts
3. Save hosts
After that, the Jupyter notebook can access the datasets module and fetch the XSUM dataset from raw.githubusercontent.com. So it is not the users' fault, but most of the suggestions on the web are wrong. Anyway, I finally solved the problem. By the way, users need to add the other GitHub commands as well, such as the following.
Cheers!!! |
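After editing hosts as described above, it can help to confirm that the file really maps the hostname. Below is a minimal sketch that parses hosts-format text. `addresses_for` is a hypothetical helper, and 203.0.113.10 is a documentation-range placeholder, not the address used in this thread (that entry is not shown above).

```python
def addresses_for(hostname: str, hosts_text: str) -> list:
    """Return the IP addresses that hosts-format text maps to hostname."""
    wanted = hostname.lower()
    found = []
    for line in hosts_text.splitlines():
        # Everything after '#' is a comment in the hosts format.
        fields = line.split("#", 1)[0].split()
        # First field is the address, the remaining fields are names for it.
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            found.append(fields[0])
    return found


sample = """\
127.0.0.1    localhost
# placeholder address, for illustration only
203.0.113.10 raw.githubusercontent.com
"""
print(addresses_for("raw.githubusercontent.com", sample))  # ['203.0.113.10']

# On a real Linux machine the text would come from the system file, e.g.
# addresses_for("raw.githubusercontent.com", open("/etc/hosts").read())
```

An empty result for a hostname means the file has no entry for it, so lookups for that name still go through normal DNS.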
I am using datasets 2.14.4, which was published on Aug 8, 2023. On Sep 13, 2023, at 06:38, Quentin Lhoest wrote:
Only the oldest versions of datasets use raw.githubusercontent.com. Can you try updating datasets ?
|
Hey, I want to load the cnn-dailymail dataset for fine-tuning.
I wrote the code like this:
from datasets import load_dataset
test_dataset = load_dataset("cnn_dailymail", "3.0.0", split="train")
And I got the following errors.
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    test_dataset = load_dataset("cnn_dailymail", "3.0.0", split="test")
  File "C:\Users\666666\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\load.py", line 589, in load_dataset
    module_path, hash = prepare_module(
  File "C:\Users\666666\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\load.py", line 268, in prepare_module
    local_path = cached_path(file_path, download_config=download_config)
  File "C:\Users\666666\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\utils\file_utils.py", line 300, in cached_path
    output_path = get_from_cache(
  File "C:\Users\666666\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\utils\file_utils.py", line 475, in get_from_cache
    raise ConnectionError("Couldn't reach {}".format(url))
ConnectionError: Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/1.1.2/datasets/cnn_dailymail/cnn_dailymail.py
How can I fix this?