Crash normalizing data #1206

Closed
rmallof opened this issue Jul 15, 2022 · 3 comments
Labels: bug (Something isn't working)

Comments

rmallof commented Jul 15, 2022

🐛 Bug Description

Normalizing US stock data crashes with KeyError: None.

To Reproduce

Steps to reproduce the behavior:

  1. Download price data using the Yahoo collector.py script
  2. Normalize the data:
    python collector.py normalize_data --region US --max_workers=4 --interval 1d --source_dir /home/jovyan/quants/qlib-data/us_source --normalize_dir /home/jovyan/quants/qlib-data/us_data_normal
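
The traceback further down ends in KeyError: None on a lookup that uses first_valid_index() of the symbol column, which suggests that at least one downloaded file has no usable symbol value. A minimal sketch for finding such files before normalizing, assuming the source directory holds one CSV per ticker and that the column is named symbol (both are assumptions, not confirmed in this thread; files_without_symbol is a hypothetical helper, not part of qlib):

```python
import csv
from pathlib import Path


def files_without_symbol(source_dir, symbol_field_name="symbol"):
    """Return CSV files in source_dir that have no non-empty symbol value.

    Hypothetical helper. Only empty or whitespace-only cells count as
    missing here; pandas also treats strings such as "NaN" as null, so
    this check is stricter than what the normalizer would see.
    """
    suspects = []
    for path in sorted(Path(source_dir).glob("*.csv")):
        with path.open(newline="") as fh:
            reader = csv.DictReader(fh)
            # True as soon as one row carries a non-blank symbol.
            has_symbol = any(
                (row.get(symbol_field_name) or "").strip() for row in reader
            )
        if not has_symbol:
            suspects.append(path)
    return suspects
```

Calling files_without_symbol() with the --source_dir path from step 2 returns the files worth inspecting or moving aside before the run.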

Expected Behavior

The data is normalized without crashing.

Screenshot

2022-07-15 11:09:48.999 | WARNING | collector:normalize_yahoo:425 - MCH change is abnormal for 18 consecutive days, please check the specific data file carefully
61%|█████████████████████████████████████████████████▌ | 6841/11175 [04:14<02:41, 26.89it/s]
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/opt/conda/envs/qlib/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/opt/conda/envs/qlib/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
return [fn(*args) for args in chunk]
File "/opt/conda/envs/qlib/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
return [fn(*args) for args in chunk]
File "/home/jovyan/quants/qlib/scripts/data_collector/base.py", line 293, in _executor
df = self._normalize_obj.normalize(df)
File "/home/jovyan/quants/qlib/scripts/data_collector/yahoo/collector.py", line 475, in normalize
df = super(YahooNormalize1d, self).normalize(df)
File "/home/jovyan/quants/qlib/scripts/data_collector/yahoo/collector.py", line 440, in normalize
df = self.normalize_yahoo(df, self._calendar_list, self._date_field_name, self._symbol_field_name)
File "/home/jovyan/quants/qlib/scripts/data_collector/yahoo/collector.py", line 391, in normalize_yahoo
symbol = df.loc[df[symbol_field_name].first_valid_index(), symbol_field_name]
File "/opt/conda/envs/qlib/lib/python3.8/site-packages/pandas/core/indexing.py", line 925, in __getitem__
return self._getitem_tuple(key)
File "/opt/conda/envs/qlib/lib/python3.8/site-packages/pandas/core/indexing.py", line 1100, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "/opt/conda/envs/qlib/lib/python3.8/site-packages/pandas/core/indexing.py", line 838, in _getitem_lowerdim
section = self._getitem_axis(key, axis=i)
File "/opt/conda/envs/qlib/lib/python3.8/site-packages/pandas/core/indexing.py", line 1164, in _getitem_axis
return self._get_label(key, axis=axis)
File "/opt/conda/envs/qlib/lib/python3.8/site-packages/pandas/core/indexing.py", line 1113, in _get_label
return self.obj.xs(label, axis=axis)
File "/opt/conda/envs/qlib/lib/python3.8/site-packages/pandas/core/generic.py", line 3776, in xs
loc = index.get_loc(key)
File "/opt/conda/envs/qlib/lib/python3.8/site-packages/pandas/core/indexes/range.py", line 388, in get_loc
raise KeyError(key)
KeyError: None
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "collector.py", line 1203, in <module>
fire.Fire(Run)
File "/opt/conda/envs/qlib/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/envs/qlib/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/envs/qlib/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "collector.py", line 1022, in normalize_data
super(Run, self).normalize_data(
File "/home/jovyan/quants/qlib/scripts/data_collector/base.py", line 427, in normalize_data
yc.normalize()
File "/home/jovyan/quants/qlib/scripts/data_collector/base.py", line 306, in normalize
for _ in worker.map(self._executor, file_list):
File "/opt/conda/envs/qlib/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
for element in iterable:
File "/opt/conda/envs/qlib/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
yield fs.pop().result()
File "/opt/conda/envs/qlib/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/opt/conda/envs/qlib/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
KeyError: None
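
The failing expression in the traceback is df.loc[df[symbol_field_name].first_valid_index(), symbol_field_name]. In pandas, first_valid_index() returns None when a column is empty or contains only null values, and passing None to .loc then raises KeyError: None. The sketch below reproduces that mechanism on a toy frame and shows one possible guard. The frames and the first_symbol helper are made up for illustration; this is not the fix applied in qlib.

```python
import pandas as pd


def first_symbol(df: pd.DataFrame, symbol_field_name: str = "symbol"):
    """Return the first non-null symbol in df, or None if there is none."""
    idx = df[symbol_field_name].first_valid_index()
    if idx is None:
        # Empty frame or all-null symbol column: nothing to look up.
        return None
    return df.loc[idx, symbol_field_name]


# Toy frame whose symbol column holds no valid values.
bad = pd.DataFrame({"symbol": [None, None], "close": [1.0, 2.0]})

try:
    # Same shape of expression as the failing line in the traceback.
    bad.loc[bad["symbol"].first_valid_index(), "symbol"]
except KeyError as exc:
    print("KeyError:", exc)

# The guarded helper returns None instead of raising.
print(first_symbol(bad))

# With at least one valid value, the helper returns that value.
good = pd.DataFrame({"symbol": [None, "AAPL"], "close": [1.0, 2.0]})
print(first_symbol(good))
```

A caller could skip the file and log a warning when the helper returns None, so that one bad file does not abort the whole worker pool.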

Environment

Note: Users can run cd scripts && python collect_info.py all in the project directory to get system information and paste it here directly.

Linux
x86_64
Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.10
#1 SMP Wed Mar 2 00:30:59 UTC 2022

Python version: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) [GCC 10.3.0]

Qlib version: 0.8.6.99
numpy==1.23.1
pandas==1.4.3
scipy==1.8.1
requests==2.28.1
sacred==0.8.2
python-socketio==5.6.0
redis==4.3.4
python-redis-lock==3.7.0
schedule==1.1.0
cvxpy==1.2.1
hyperopt==0.1.2
fire==0.4.0
statsmodels==0.13.2
xlrd==2.0.1
plotly==5.5.0
matplotlib==3.5.1
tables==3.7.0
pyyaml==6.0
mlflow==1.27.0
tqdm==4.64.0
loguru==0.6.0
lightgbm==3.3.2
tornado==6.1
joblib==1.1.0
fire==0.4.0
ruamel.yaml==0.17.21

Additional Notes

rmallof added the bug label Jul 15, 2022

rmallof (Author) commented Jul 15, 2022

This bug, together with #1196, completely breaks the workflow described in your notebooks.

you-n-g (Collaborator) commented Jul 18, 2022

Please refer to the answer in #1196

rmallof (Author) commented Jul 18, 2022

Noted: US stock data is not supported at the moment.

Thank you again

rmallof closed this as completed Jul 18, 2022