Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EHH: New filter NoMissingValues with tests #1969

Merged
merged 1 commit into from Oct 10, 2017
Merged

Conversation

@analicia
Copy link
Contributor

@analicia analicia commented Oct 2, 2017

New filter returns true if there are no missing values for the past window_length days.

@coveralls
Copy link

@coveralls coveralls commented Oct 2, 2017

Coverage Status

Coverage decreased (-0.06%) to 87.139% when pulling 88f40c5 on no-missing-values-filter into fbdfaf8 on master.

Copy link
Contributor

@ssanderson ssanderson left a comment

@ahgnaw finished a pass here.

mask=NotSpecified):

if isinstance(inputs[0], Filter):
raise ValueError(

This comment has been minimized.

@ssanderson

ssanderson Oct 2, 2017
Contributor

This should probably be a TypeError, since the error is that we received a value of the wrong type.

This comment has been minimized.

@ssanderson

ssanderson Oct 2, 2017
Contributor

What do you think about putting this check in _validate instead of __new__? __new__ is harder for a reader not familiar with pipeline to follow, and implementing __new__ creates more maintenance work for us if we add or remove parameters in Term or ComputableTerm.

This comment has been minimized.

@analicia

analicia Oct 2, 2017
Author Contributor

It makes more sense to check in _validate, will change it. Also changing the error to TypeError.

@@ -533,3 +535,34 @@ class StaticAssets(StaticSids):
def __new__(cls, assets):
sids = frozenset(asset.sid for asset in assets)
return super(StaticAssets, cls).__new__(cls, sids)


class NoMissingValues(CustomFilter, SingleInputMixin):

This comment has been minimized.

@ssanderson

ssanderson Oct 2, 2017
Contributor

We can probably also add StandardOutputs and PositiveWindowLengthMixin to this term.

This comment has been minimized.

@ssanderson

ssanderson Oct 2, 2017
Contributor

I tend to try to name filters after a positive property rather than the negation of a property, e.g., AllPresent instead of NoMissingValues. The main reason for that preference that I think the positive assertion is easier to parse if it gets negated: ~AllPresent() ("not all present") is easier for me to think about than ~NoMissingValues ("not no missing values"). Thoughts?

This comment has been minimized.

@analicia

analicia Oct 2, 2017
Author Contributor

CustomFilter already has PositiveWindowLengthMixin, I'll add StandardOutputs.

I agree, a positive assertion is is more easily read if negated.

)

def compute(self, today, assets, out, value):
if isinstance(value, LabelArray):

This comment has been minimized.

@ssanderson

ssanderson Oct 2, 2017
Contributor

My inner micro-optimizer is slightly twitchy that we're doing this dispatch every day even though we know the dtypes will never change. In practice though, I think the cost is probably negligible

In [28]: %%timeit
    ...: data.any(axis=0)
    ...:
    ...:
100000 loops, best of 3: 3.61 µs per loop

In [30]: %%timeit
    ...: if isinstance(x, int):
    ...:     data.any(axis=0)
    ...: else:
    ...:     data.any(axis=0)
    ...:
100000 loops, best of 3: 3.77 µs per loop

0.16 microseconds per factor-dayends up being basically nothing. For example, for a 10-year pipeline, that adds up to 0.0004 seconds.


def compute(self, today, assets, out, value):
if isinstance(value, LabelArray):
out[:] = ~np_any(value.is_missing(), axis=0)

This comment has been minimized.

@ssanderson

ssanderson Oct 2, 2017
Contributor

One thing that probably is worth doing here, though, is passing out=out to np.any instead of assigning the result to out[:].

This comment has been minimized.

@analicia

analicia Oct 2, 2017
Author Contributor

That would make a difference, however the output should be the negation of the np.any results.

This comment has been minimized.

@richafrank

richafrank Oct 2, 2017
Member

Would moving the negation inside work? np_all(~value.is_missing(), axis=0, out=out)

This comment has been minimized.

@analicia

analicia Oct 2, 2017
Author Contributor

That doesn't work in the case where some of the asset's values are missing, but np.all would.

This comment has been minimized.

@analicia

analicia Oct 2, 2017
Author Contributor

Ah, nvm, didn't see that @richafrank suggested np.all already.

input_factor = SomeWindowSafeStringClassifier()

shape = (10, 6)
data = choice(

This comment has been minimized.

@ssanderson

ssanderson Oct 2, 2017
Contributor

I think this should be a LabelArray if we're trying to simulate what the term would actually get passed?

This comment has been minimized.

@analicia

analicia Oct 2, 2017
Author Contributor

I think classifier inputs gets converted to a LabelArray when it is an input to compute()

This comment has been minimized.

@ssanderson

ssanderson Oct 4, 2017
Contributor

The place that data from loaders is normally converted into a LabelArray is in zipline.lib.adjusted_array._normalize_array, but that's not happening here because this test suite doesn't use any pipeline loaders, it just pre-populates the graph with the values provided by the test suite.

This comment has been minimized.

@ssanderson

ssanderson Oct 4, 2017
Contributor

Turns out this is happening, in PipelineEngine._inputs_for_term.

@coveralls
Copy link

@coveralls coveralls commented Oct 2, 2017

Coverage Status

Coverage increased (+0.008%) to 87.211% when pulling d3667b6 on no-missing-values-filter into fbdfaf8 on master.

@coveralls
Copy link

@coveralls coveralls commented Oct 2, 2017

Coverage Status

Coverage increased (+0.01%) to 87.216% when pulling 6e4363f on no-missing-values-filter into fbdfaf8 on master.

Copy link
Contributor

@ssanderson ssanderson left a comment

@ahgnaw added two very minor notes. Feel free to address those however you see fit and then merge.

@@ -251,6 +251,30 @@ Built-in Factors
.. autoclass:: zipline.pipeline.factors.WeightedAverageValue
:members:

Built-in Filters

This comment has been minimized.

@ssanderson

ssanderson Oct 4, 2017
Contributor

👍


def compute(self, today, assets, out, value):
if isinstance(value, LabelArray):
np_all(~value.is_missing(), axis=0, out=out)

This comment has been minimized.

@ssanderson

ssanderson Oct 4, 2017
Contributor

Small perf note: I think we can avoid making an extra copy of the full window here by rewriting this expression as ~np.any(value.is_missing, ...). Lifting the negation to the outer scope means that we only do the negation on the final reduced row, rather than on the whole window.

This comment has been minimized.

@analicia

analicia Oct 10, 2017
Author Contributor

👍

"""

def _validate(self):

This comment has been minimized.

@ssanderson

ssanderson Oct 4, 2017
Contributor

minor style note: I'd probably remove the blank line here.

Ana Ruelas
DOCS: Include built-in filters section in user documentation
@analicia analicia force-pushed the no-missing-values-filter branch from 6e4363f to 5dec3b7 Oct 10, 2017
@coveralls
Copy link

@coveralls coveralls commented Oct 10, 2017

Coverage Status

Coverage increased (+0.008%) to 87.204% when pulling 5dec3b7 on no-missing-values-filter into c9262a8 on master.

@analicia analicia merged commit 5c0e19b into master Oct 10, 2017
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@analicia analicia deleted the no-missing-values-filter branch Oct 10, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can’t perform that action at this time.