Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix for Rank and WMA operators #1228

Merged
merged 13 commits into from Nov 13, 2022
30 changes: 19 additions & 11 deletions qlib/data/ops.py
Expand Up @@ -34,8 +34,6 @@


#################### Element-Wise Operator ####################


class ElemOperator(ExpressionOps):
"""Element-wise Operator

Expand Down Expand Up @@ -216,9 +214,7 @@ class Not(NpElemOperator):

Parameters
----------
feature_left : Expression
feature instance
feature_right : Expression
feature : Expression
feature instance

Returns
Expand All @@ -241,8 +237,6 @@ class PairOperator(ExpressionOps):
feature instance or numeric value
feature_right : Expression
feature instance or numeric value
func : str
operator function

Returns
----------
Expand Down Expand Up @@ -1154,18 +1148,32 @@ class Rank(Rolling):

def __init__(self, feature, N):
super(Rank, self).__init__(feature, N, "rank")
major_version, minor_version, *_ = pd.__version__.split(".")
self._load_internal = (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend implementing it as a method to override the parent method instead of setting the attribute for the following reasons.

  • fewer attributes will make it simpler.
  • It will not work in some special cases (for example, someone dump an instance of the operator in low version of pandas and load it in high version of pandas)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@you-n-g

think of a way to use hasattr instead of version check. See latest commit. let me know if you have better idea. thx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any update/feedback please?

self._load_internal_pd14
if int(major_version) > 1 or int(major_version) == 1 and int(minor_version) > 3
else self._load_internal_pd_below_13
)

def _load_internal(self, instrument, start_index, end_index, *args):
def _load_internal_pd14(self, instrument, start_index, end_index, *args):
series = self.feature.load(instrument, start_index, end_index, *args)
if self.N == 0:
series = series.expanding(min_periods=1).rank(pct=True)
else:
series = series.rolling(self.N, min_periods=1).rank(pct=True)
return series

# for compatiblity of python 3.7, which doesn't support pandas 1.4.0+ which implements Rolling.rank
def _load_internal_pd_below_13(self, instrument, start_index, end_index, *args):
series = self.feature.load(instrument, start_index, end_index, *args)
# TODO: implement in Cython

def rank(x):
if np.isnan(x[-1]):
return np.nan
x1 = x[~np.isnan(x)]
if x1.shape[0] == 0:
return np.nan
return percentileofscore(x1, x1[-1]) / len(x1)
return percentileofscore(x1, x1[-1]) / 100
qianyun210603 marked this conversation as resolved.
Show resolved Hide resolved

if self.N == 0:
series = series.expanding(min_periods=1).apply(rank, raw=True)
Expand Down Expand Up @@ -1341,7 +1349,7 @@ def _load_internal(self, instrument, start_index, end_index, *args):
# TODO: implement in Cython

def weighted_mean(x):
w = np.arange(len(x))
w = np.arange(len(x)) + 1
w = w / w.sum()
return np.nanmean(w * x)

Expand Down