🐛 Bug Description
I asked the authors about an inconsistency between the data I obtained by following the documentation and the data shown in the documentation's example. They told me that the newly released data is normalized by the price on the stock's first listed day, and adjusted so that it reflects the real trend of the stock in the market. However, the initial date is not consistent with the real market data because of events such as acquisitions. Take SH600018 (上港集团) as an example: it was not listed on the Shanghai Stock Exchange until 2006-10-16, and its data is recorded only from 2006-10-26. Before that, from 2005-04-01, the code SH600018 represented a different company, 上港集箱. The developers therefore treat the two stocks as the same one, which may cause trouble for prediction:
- We may fit a model on the performance of 上港集箱, which no longer exists.
- 上港集箱 is not a constituent of csi100 or csi300, which conflicts with the market indices.
- The initial price of 上港集团 is 0.26, not 1, and the two companies give different results after adjusting for dividends and ex-rights.
To Reproduce
Steps to reproduce the behavior:
import qlib
qlib.init(auto_mount=False, mount_path='/data/csdesign/qlib')
from qlib.data import D
D.features(['SH600018'], ['$close'], start_time='20000101', end_time='20090101')
- This returns the following results:
$close
instrument datetime
SH600018 2005-01-04 1.000000 (the initial listing date of 上港集箱)
2005-01-05 1.007989
2005-01-06 1.011984
2005-01-07 1.019308
2005-01-10 1.023302
2005-01-11 1.045939
2005-01-12 1.039947
...
2006-09-13 1.141976
2006-09-14 1.141976
2006-09-15 1.146159
2006-09-18 1.148250
2006-09-19 1.145461
2006-09-20 1.140581
2006-09-21 1.141976
2006-09-22 1.148947
2006-09-25 1.141278
2006-09-26 NaN
2006-09-27 NaN
2006-09-28 NaN
2006-09-29 NaN
2006-10-09 NaN
2006-10-10 NaN
2006-10-11 NaN
2006-10-12 NaN
2006-10-13 NaN
2006-10-16 NaN
2006-10-17 NaN
2006-10-18 NaN
2006-10-19 NaN
2006-10-20 NaN
2006-10-23 NaN
2006-10-24 NaN
2006-10-25 NaN
2006-10-26 0.264230 (the first recorded date of 上港集团)
2006-10-27 0.249589
2006-10-30 0.252378
2006-10-31 0.262138
2006-11-01 0.288631
2006-11-02 0.306061
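The discontinuity in the output above (a run of NaN values followed by a drop from about 1.14 to about 0.26) can be located programmatically. The following is only a minimal sketch: it uses pandas directly on a few rows copied from the output rather than calling qlib, the helper name `find_level_breaks` is hypothetical, and the threshold `max_ratio=2.0` is an arbitrary assumption, not something qlib defines.

```python
import pandas as pd

# A few rows copied from the output above; the NaN rows stand in for the gap.
close = pd.Series(
    [1.148947, 1.141278, float("nan"), float("nan"), 0.264230, 0.249589],
    index=pd.to_datetime(
        ["2006-09-22", "2006-09-25", "2006-09-26",
         "2006-10-25", "2006-10-26", "2006-10-27"]
    ),
    name="$close",
)


def find_level_breaks(series, max_ratio=2.0):
    """Return (last_date_before, first_date_after, ratio) for each suspicious jump.

    A jump is flagged when two consecutive non-NaN prices differ by more than
    a factor of `max_ratio` in either direction (an assumed threshold).
    """
    valid = series.dropna()
    ratio = valid / valid.shift(1)
    flagged = ratio[(ratio > max_ratio) | (ratio < 1.0 / max_ratio)]
    breaks = []
    for date, r in flagged.items():
        prev_date = valid.index[valid.index.get_loc(date) - 1]
        breaks.append((prev_date, date, float(r)))
    return breaks


breaks = find_level_breaks(close)
for before, after, r in breaks:
    print(f"{before.date()} -> {after.date()}: ratio {r:.4f}")
```

On this sample the only flagged jump is the one between 2006-09-25 and 2006-10-26, which is where the code SH600018 switches from one company to the other.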
Expected Behavior
- Keep only the data from the period in which the stock is actually listed in the market indices, and correct the initial date so that it matches the real market.
- State the source of the open dataset for the Chinese market.
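As a user-side workaround until the dataset is corrected, the first expectation could be approximated after loading the data. This is a rough sketch under several assumptions: the frame has the same (instrument, datetime) index shown in the output above, the mapping `FIRST_VALID_DATE` and the helper `trim_and_rebase` are hypothetical names that are not part of qlib, and rebasing the first remaining close to 1.0 is my reading of the normalization the authors described.

```python
import pandas as pd

# Hypothetical mapping from instrument to the first date that belongs to the
# current company; 2006-10-26 is the first recorded date reported above.
FIRST_VALID_DATE = {"SH600018": pd.Timestamp("2006-10-26")}


def trim_and_rebase(df, first_valid, field="$close"):
    """Drop rows dated before each instrument's first valid date, then rebase
    `field` so that its first remaining non-NaN value equals 1.0."""
    parts = []
    for inst, group in df.groupby(level="instrument", sort=False):
        cutoff = first_valid.get(inst)
        if cutoff is not None:
            dates = group.index.get_level_values("datetime")
            group = group[dates >= cutoff]
        group = group.copy()
        valid = group[field].dropna()
        if not valid.empty:
            group[field] = group[field] / valid.iloc[0]
        parts.append(group)
    return pd.concat(parts)


# Small sample built from rows in the output above.
index = pd.MultiIndex.from_tuples(
    [("SH600018", pd.Timestamp(d))
     for d in ["2006-09-25", "2006-09-26", "2006-10-26", "2006-10-27"]],
    names=["instrument", "datetime"],
)
sample = pd.DataFrame(
    {"$close": [1.141278, float("nan"), 0.264230, 0.249589]}, index=index
)

cleaned = trim_and_rebase(sample, FIRST_VALID_DATE)
print(cleaned)
```

Instruments without an entry in the mapping keep all of their rows and are only rebased. The cut-off dates themselves would still have to come from a reliable listing-history source, which is what the second expectation asks for.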
Screenshot
Environment
Note: Users can run cd scripts && python collect_info.py all under the project directory to get system information and paste it here directly.
- Qlib version: the latest one
- Python version: python3.8
- OS (Windows, Linux, MacOS): Windows
- Commit number (optional, please provide it if you are using the dev version):
Additional Notes