problems when downloading dataset #1555

Closed
lixinzju opened this issue Jun 17, 2023 · 6 comments
Labels
question Further information is requested

Comments

@lixinzju

❓ Questions and Help

When I run:

python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --interval 1d --region cn

the following error occurs:

File "C:\ProgramData\anaconda3\lib\site-packages\requests\models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)

requests.exceptions.HTTPError: 409 Client Error: Public access is not permitted on this storage account. for url: https://qlibpublic.blob.core.windows.net/data/default/stock_data/v2/qlib_data_cn_1d_0.9.1.zip

Thank you very much!

We sincerely suggest you to carefully read the documentation of our library as well as the official paper. After that, if you still feel puzzled, please describe the question clearly under this issue.

@lixinzju lixinzju added the question Further information is requested label Jun 17, 2023
@zcyoop

zcyoop commented Jun 20, 2023

#1547

@SunsetWolf
Collaborator

This issue has been fixed in PR 1558. Installing qlib from source picks up the fix and resolves the problem.

@yiyione

yiyione commented Jun 25, 2023

This issue has been fixed in PR 1558, using the source installation, which solves this problem.

Hi @SunsetWolf, after this PR I get a different error, "TypeError: token must be bytes". It looks like something is wrong with the TOKEN.

Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/qlib/qlib/run/get_data.py", line 9, in <module>
    fire.Fire(GetData)
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/qlib/qlib/tests/data.py", line 191, in qlib_data
    if not self.check_dataset(file_name):
  File "/qlib/qlib/tests/data.py", line 96, in check_dataset
    url = self.merge_remote_url(file_name)
  File "/qlib/qlib/tests/data.py", line 39, in merge_remote_url
    token = fernet.decrypt(self.TOKEN).decode()
  File "/usr/lib/python3/dist-packages/cryptography/fernet.py", line 75, in decrypt
    timestamp, data = Fernet._get_unverified_token_data(token)
  File "/usr/lib/python3/dist-packages/cryptography/fernet.py", line 100, in _get_unverified_token_data
    utils._check_bytes("token", token)
  File "/usr/lib/python3/dist-packages/cryptography/utils.py", line 30, in _check_bytes
    raise TypeError("{} must be bytes".format(name))
TypeError: token must be bytes

@SunsetWolf
Collaborator

Hi @yiyione , a contributor has submitted PR1577, which fixes this issue.
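
For readers who hit the same traceback before picking up that fix: the failing call is fernet.decrypt(self.TOKEN), and the cryptography version shown in the traceback rejects any token that is not bytes. The sketch below shows one way to guard such a call. The helper name as_bytes is hypothetical, and this is not necessarily how PR 1577 addresses the problem.

```python
def as_bytes(token, encoding="utf-8"):
    """Return the token as bytes, encoding it first if it is a str."""
    if isinstance(token, bytes):
        return token
    if isinstance(token, str):
        return token.encode(encoding)
    raise TypeError(f"token must be bytes or str, got {type(token).__name__}")


# Hypothetical use at the call site from the traceback:
#     token = fernet.decrypt(as_bytes(self.TOKEN)).decode()
```

Normalizing at the call site keeps the stored value untouched, so it works whether TOKEN is defined as a str or as bytes.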

@Lifan1121

This issue has been fixed in PR 1558, using the source installation, which solves this problem.

Hello, I ran get_data.py today, but it doesn't work. I cloned PR 1558, but it still fails to download the data, with this error:
"""
timeout Traceback (most recent call last)
File c:\Users\lf\miniconda3\envs\py38\lib\site-packages\urllib3\response.py:444, in HTTPResponse._error_catcher(self)
443 try:
--> 444 yield
446 except SocketTimeout:
447 # FIXME: Ideally we'd like to include the url in the ReadTimeoutError but
448 # there is yet no clean way to get at it from this context.

File c:\Users\lf\miniconda3\envs\py38\lib\site-packages\urllib3\response.py:567, in HTTPResponse.read(self, amt, decode_content, cache_content)
566 with self._error_catcher():
--> 567 data = self._fp_read(amt) if not fp_closed else b""
568 if amt is None:

File c:\Users\lf\miniconda3\envs\py38\lib\site-packages\urllib3\response.py:533, in HTTPResponse._fp_read(self, amt)
531 else:
532 # StringIO doesn't like amt=None
--> 533 return self._fp.read(amt) if amt is not None else self._fp.read()

File c:\Users\lf\miniconda3\envs\py38\lib\http\client.py:459, in HTTPResponse.read(self, amt)
458 b = bytearray(amt)
--> 459 n = self.readinto(b)
460 return memoryview(b)[:n].tobytes()

File c:\Users\lf\miniconda3\envs\py38\lib\http\client.py:503, in HTTPResponse.readinto(self, b)
...
--> 822 raise ConnectionError(e)
823 except SSLError as e:
824 raise RequestsSSLError(e)

ConnectionError: HTTPSConnectionPool(host='qlibpublic.blob.core.windows.net', port=443): Read timed out.
"""
I have tried many times. The download just stalls at around 20% or 50%.

@Lifan1121

This problem was caused by my VPN. Sorry about that.
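
For anyone else whose download stalls partway with a read timeout on a slow or proxied connection, a retry that resumes from the partial file may help. This is a minimal standard-library sketch, not part of qlib: range_header and download_with_resume are hypothetical names, and it assumes the server honors HTTP Range requests (if it does not, the download simply restarts from the beginning).

```python
import http.client
import os
import time
import urllib.request


def range_header(dest_path):
    """Return a Range header that resumes after the bytes already on disk."""
    done = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    return {"Range": f"bytes={done}-"} if done else {}


def download_with_resume(url, dest_path, attempts=5, timeout=30, chunk=1 << 20):
    """Retry a download, continuing from the partial file after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            req = urllib.request.Request(url, headers=range_header(dest_path))
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                # 206 means the server honored the Range request; append.
                # Any other success status means a full body; start over.
                mode = "ab" if resp.status == 206 else "wb"
                with open(dest_path, mode) as fh:
                    while True:
                        block = resp.read(chunk)
                        if not block:
                            return dest_path
                        fh.write(block)
        except (OSError, http.client.HTTPException):
            # Timeouts and connection errors land here; give up on the last try.
            if attempt == attempts:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff
```

Delete any leftover partial file before starting a fresh download of a different version. If the file on disk is already complete, the server is likely to reject the Range request, so the sketch is only meant for interrupted downloads.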


5 participants