How to query the data of the historical reporting period when using PIT data? #988

Closed
Chaoyingz opened this issue Mar 17, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@Chaoyingz
Contributor

Chaoyingz commented Mar 17, 2022

🌟 Feature Description

How to query the data of the historical reporting period when using PIT data?
Is there a way to achieve this effect?

>>> import qlib
>>> from qlib.data import D
>>> qlib.init()
>>> instruments = ["sh600519"]
>>> fields = ["P($$roewa_q)", "P($$yoyni_q)"]
>>> D.features(instruments, fields, start_time="2019-01-01", end_time="2020-01-01", freq="quarter")
                       P($$roewa_q)  P($$yoyni_q)
instrument datetime
sh600519   2019-03-31      1.000000      1.305041
           2019-06-30      2.000000      2.305041
           2019-09-30      3.000000      3.305041
           2019-12-31      4.175322      4.252650

If possible, use a period instead of datetime for indexing, like this:

                       P($$roewa_q)  P($$yoyni_q)
instrument period
sh600519   201901      1.000000      1.305041
           201902      2.000000      2.305041
           201903      3.000000      3.305041
           201904      4.175322      4.252650
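
The mapping between the two index styles above can be sketched in plain Python. This is only an illustration: the helper names `period_to_quarter_end` and `quarter_end_to_period` are hypothetical and not part of qlib, and the YYYYQQ encoding is inferred from the example tables (201901 corresponds to 2019-03-31).

```python
import calendar
from datetime import date


def period_to_quarter_end(period: int) -> date:
    """Convert a YYYYQQ period such as 201901 to the last day of that quarter."""
    year, quarter = divmod(period, 100)
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"invalid quarter in period {period}")
    month = quarter * 3  # last month of the quarter: 3, 6, 9 or 12
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


def quarter_end_to_period(day: date) -> int:
    """Convert a date to the YYYYQQ period of the quarter it falls in."""
    quarter = (day.month - 1) // 3 + 1
    return day.year * 100 + quarter
```

With such helpers, a result indexed by quarter-end datetime could be relabeled by period after the query, without changing how the data is stored.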

I am going to write a PITPeriodProvider to query the stored PIT data by referring to ArcticFeatureProvider. Is there a better way?


@Chaoyingz Chaoyingz added the enhancement New feature or request label Mar 17, 2022
@Chaoyingz Chaoyingz changed the title from "How to query the reporting period of pit data?" to "How to query the data of the historical reporting period when using pit data?" Mar 18, 2022
@Chaoyingz Chaoyingz changed the title from "How to query the data of the historical reporting period when using pit data?" to "How to query the data of the historical reporting period when using PIT data?" Mar 18, 2022
@you-n-g
Collaborator

you-n-g commented Mar 18, 2022

@Chaoyingz
The value of PIT data depends on the observation timestamp, which is what the datetime index means in the current version. So I think we should consider four dimensions when processing PIT data: <instrument, datetime (observation time), period, factor>. Ignoring any of them makes the query ambiguous.

If I want to refer to historical data, I use operators like "P(Ref($$roewa_q, 1))" and "P(Ref($$roewa_q, 2))", whose values change with the observation timestamp.

Can you give us more details about the scenario you encountered?
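
To make the four dimensions concrete, here is a toy, in-memory sketch. It is not how qlib stores or evaluates PIT data: the `TOY_PIT` records, the numbers, and the functions `visible` and `latest_with_lag` are all made up for illustration. The `lag` argument is meant only as a rough analogue of looking back one or two periods from whatever is newest at a given observation time.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Each record: (publication date, period as YYYYQQ, value). Toy numbers only.
Record = Tuple[date, int, float]

TOY_PIT: Dict[Tuple[str, str], List[Record]] = {
    ("sh600519", "roewa_q"): [
        (date(2019, 4, 25), 201901, 1.0),
        (date(2019, 7, 18), 201902, 2.0),
        (date(2019, 8, 30), 201901, 1.1),  # restatement of 2019Q1
    ],
}


def visible(instrument: str, factor: str, observe: date) -> Dict[int, float]:
    """Latest known value per period, using only records published on or before `observe`."""
    latest: Dict[int, float] = {}
    for pub, period, value in sorted(TOY_PIT[(instrument, factor)]):
        if pub <= observe:
            latest[period] = value  # later publications overwrite earlier ones
    return latest


def latest_with_lag(instrument: str, factor: str, observe: date, lag: int = 0) -> Optional[float]:
    """Value of the period `lag` steps before the newest period visible at `observe`."""
    known = visible(instrument, factor, observe)
    periods = sorted(known)
    if len(periods) <= lag:
        return None  # not enough history published yet
    return known[periods[-1 - lag]]
```

In this toy data the same question ("previous period's value") gives a different answer depending on the observation date, because 2019Q1 is restated after 2019Q2 is published. That is why dropping the observation time from a query leaves it underspecified.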

@Chaoyingz
Contributor Author

Chaoyingz commented Mar 18, 2022

@you-n-g
Thank you for your reply. My application scenario is to filter the stock pool based on each stock's historical financial data, so I need to use the current time as the observation point when querying the data.

For example, I want to use the current time as the observation point and query the data for reporting period 201901. What should I do?

@you-n-g
Collaborator

you-n-g commented Mar 20, 2022

> @you-n-g
> Thank you for your reply. My application scenario is to filter the stock pool based on each stock's historical financial data, so I need to use the current time as the observation point when querying the data.
>
> For example, I want to use the current time as the observation point and query the data for reporting period 201901. What should I do?

So your stock pool will only be available after 201901?

I think we can implement a new operator to achieve this goal,
for example "PRef(xxx, 201901)".

@Chaoyingz
Contributor Author

Chaoyingz commented Mar 21, 2022

The observation point in this scenario is always the current time. I only care about which stocks have financial data that meets certain criteria as of the current time. If the conditions are met, the stock enters the stock pool.
Is this how the PRef operator would behave?

>>> import qlib
>>> from qlib.data import D
>>> qlib.init()
>>> instruments = ["sh600519", "sz000858"]
>>> fields = ["PRef($$roewa_q, 201901)", "PRef($$yoyni_q, 201902)"]
>>> D.features(instruments, fields, start_time="2022-03-15", end_time="2022-03-15", freq="day")
                       PRef($$roewa_q, 201901)  PRef($$yoyni_q, 201902)
instrument datetime
sh600519   2022-03-15                 1.000000                 1.305041
sz000858   2022-03-15                 2.000000                 2.305041
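
The behaviour being asked about (fix the reporting period, then take whatever value is known at the observation time) can be sketched on toy data as follows. This is an assumption about the intended semantics, not qlib's implementation of PRef: the `TOY_ROE` records, their values, and the functions `value_for_period` and `stock_pool` are hypothetical.

```python
from datetime import date

# Toy records: (instrument, publication date, period as YYYYQQ, value).
TOY_ROE = [
    ("sh600519", date(2019, 4, 25), 201901, 1.0),
    ("sh600519", date(2019, 8, 30), 201901, 1.1),  # later revision of 2019Q1
    ("sz000858", date(2019, 4, 27), 201901, 0.4),
    ("sz000858", date(2019, 8, 29), 201902, 0.9),
]


def value_for_period(records, instrument, period, observe):
    """Most recently published value of a fixed `period`, as known at `observe`."""
    candidates = [
        (pub, value)
        for inst, pub, per, value in records
        if inst == instrument and per == period and pub <= observe
    ]
    if not candidates:
        return None  # nothing published for this period yet
    return max(candidates)[1]  # the latest publication date wins


def stock_pool(records, instruments, period, observe, threshold):
    """Instruments whose value for `period`, as seen at `observe`, exceeds `threshold`."""
    pool = []
    for inst in instruments:
        value = value_for_period(records, inst, period, observe)
        if value is not None and value > threshold:
            pool.append(inst)
    return pool
```

Calling `stock_pool(TOY_ROE, ["sh600519", "sz000858"], 201901, date(2022, 3, 15), 0.5)` on this toy data keeps only `sh600519`. The period stays fixed while the observation date decides which revision of that period is used.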

Chaoyingz added a commit to Chaoyingz/qlib that referenced this issue Mar 22, 2022
@Chaoyingz
Contributor Author

I submitted PR #1000 to resolve this issue.

you-n-g pushed a commit that referenced this issue Mar 24, 2022
* Add PRef operator (#988)

* Fix type annotations

* Add test_pref_operator test case field

* Add note to PITProvider

* Add period parameter comment
qianyun210603 pushed a commit to qianyun210603/qlib that referenced this issue Mar 23, 2023
* Add PRef operator (microsoft#988)

* Fix type annotations

* Add test_pref_operator test case field

* Add note to PITProvider

* Add period parameter comment