Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix for Rank and WMA operators #1228

Merged
merged 13 commits into from Nov 13, 2022
Merged

Bug fix for Rank and WMA operators #1228

merged 13 commits into from Nov 13, 2022

Conversation

qianyun210603
Copy link
Contributor

Description

  1. change the scaling in the inner function of Rank operator from len(x1) to 100
  2. add 1 on the non-normalized weights of WMA operators

Motivation and Context

For 1): I would guess the goal is to scale the ranking result between 0 and 1. According to the scipy document, percentileofscore

Compute the percentile rank of a score relative to a list of scores.
A percentileofscore of, for example, 80% means that 80% of the scores in a are below the given score.

it doesn't make sense to divide by length to array size, instead, should divide by 100.

For 2): The common convention of linear weighted MA with window size d should be weighted by d, d-1, ..., 1, instead of d-1, d-2, ..., 0, see e.g. https://www.fidelity.com/learning-center/trading-investing/technical-analysis/technical-indicator-guide/wma and the decay_linear function in famous 101 Formulaic Alphas by Zura Kakushadze from WorldQuant.
Though convention problem is arguable I think it's better to be more intuitive.

How Has This Been Tested?

  • Pass the test by running: pytest qlib/tests/test_all_pipeline.py under upper directory of qlib.
  • If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Pipeline test:
    image

  2. Your own tests:

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

… to 0-1, not length of array; 2) for (linear) weighted MA(n), weight should be n, n-1, ..., 1 instead of n-1, ..., 0
@qianyun210603
Copy link
Contributor Author

@you-n-g
Would you take a look please? Let me know if anything needs improvement. Thanks!

@you-n-g
Copy link
Collaborator

you-n-g commented Aug 12, 2022

Nice shot!
Thanks for fixing the bug!
I have left one comment.
LGTM after it is fixed.

@ghost
Copy link

ghost commented Aug 13, 2022

CLA assistant check
All CLA requirements met.

@qianyun210603
Copy link
Contributor Author

Thanks for reply. Updated as you suggested. Also corrected some wrong param list comments.

@qianyun210603
Copy link
Contributor Author

@you-n-g any feedback please?

@you-n-g
Copy link
Collaborator

you-n-g commented Aug 22, 2022

@qianyun210603
It looks that the CI still fails.
Please check the error. Thanks

@qianyun210603
Copy link
Contributor Author

qianyun210603 commented Aug 23, 2022

@you-n-g It's because the base 'microsoft:main' on Aug 13, which was the base of my pull request, had failed CI .
Fixed after merging the latest 'microsoft:main'.

@qianyun210603
Copy link
Contributor Author

@you-n-g
took a further look on the failure
it complains

self = Rolling [window=10,min_periods=1,center=False,axis=0,method=single]
attr = 'rank'

    def __getattr__(self, attr: str):
        if attr in self._internal_names_set:
            return object.__getattribute__(self, attr)
        if attr in self.obj:
            return self[attr]
    
        raise AttributeError(
>           f"'{type(self).__name__}' object has no attribute '{attr}'"
        )
E       AttributeError: 'Rolling' object has no attribute 'rank'

I would guess it is caused by too low pandas version which rank has not been implemented in Rolling.
Could you let me know where to check the pandas version in the Action run? Thanks!

@qianyun210603
Copy link
Contributor Author

qianyun210603 commented Aug 23, 2022

@you-n-g

Dig in some further, it looks the py3.7 tests fails because Rolling.rank was added from pandas 1.4.0+ but py3.7 can only run pandas up to 1.3.5. So the failures.

Therefore we need to fall back to the original scipy solution for rank for py3.7.
I modified the pull request to use scipy version for pandas 1.3.5 and below and native pandas Rolling.rank for pandas 1.4.0 and above. The code is a bit ugly though.

Alternative would be drop the py37 support and use native pandas Rolling.rank, or
fall back to the scipy solution for all versions (I would recommend against as scipy version is 20 times slower than pandas native from a simple test I did)

Let me know your thought. thx.

@qianyun210603
Copy link
Contributor Author

@you-n-g any thoughts please?

qlib/data/ops.py Outdated
@@ -1154,18 +1148,32 @@ class Rank(Rolling):

def __init__(self, feature, N):
super(Rank, self).__init__(feature, N, "rank")
major_version, minor_version, *_ = pd.__version__.split(".")
self._load_internal = (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend implementing it as a method to override the parent method instead of setting the attribute for the following reasons.

  • fewer attributes will make it simpler.
  • It will not work in some special cases (for example, someone dump an instance of the operator in low version of pandas and load it in high version of pandas)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@you-n-g

think of a way to use hasattr instead of version check. See latest commit. let me know if you have better idea. thx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any update/feedback please?

@you-n-g you-n-g self-assigned this Sep 2, 2022
@qianyun210603
Copy link
Contributor Author

@you-n-g any progress please?

@qianyun210603
Copy link
Contributor Author

@you-n-g any comments on this?

@you-n-g
Copy link
Collaborator

you-n-g commented Nov 13, 2022

Sorry for the late response.
Thanks for your great efforts!

@you-n-g you-n-g merged commit 4001a5d into microsoft:main Nov 13, 2022
@you-n-g you-n-g added the bug Something isn't working label Dec 9, 2022
qianyun210603 added a commit to qianyun210603/qlib that referenced this pull request Mar 23, 2023
* bug fix: 1) 100 should be used to scale down percentileofscore return to 0-1, not length of array; 2) for (linear) weighted MA(n), weight should be n, n-1, ..., 1 instead of n-1, ..., 0

* use native pandas fucntion for rank

* remove useless import

* require pandas 1.4+

* rank for py37+pandas 1.3.5 compatibility

* lint improvement

* lint black fix

* use hasattr instead of version to check whether rolling.rank is implemented
qianyun210603 added a commit to qianyun210603/qlib that referenced this pull request Mar 23, 2023
* bug fix: 1) 100 should be used to scale down percentileofscore return to 0-1, not length of array; 2) for (linear) weighted MA(n), weight should be n, n-1, ..., 1 instead of n-1, ..., 0

* use native pandas fucntion for rank

* remove useless import

* require pandas 1.4+

* rank for py37+pandas 1.3.5 compatibility

* lint improvement

* lint black fix

* use hasattr instead of version to check whether rolling.rank is implemented
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants