
Classifiers #1046

Merged

ssanderson merged 12 commits into master from classifiers on Mar 20, 2016

4 participants
@ssanderson
Member

ssanderson commented Mar 10, 2016

No description provided.

@ssanderson ssanderson force-pushed the classifiers branch from 83afb9c to 35b7148 Mar 10, 2016

@@ -5,20 +5,22 @@
 from operator import attrgetter
 from numbers import Number
-from numpy import inf
+from numpy import inf, where, nanmean, nanstd

@richafrank

richafrank Mar 11, 2016

Member

Should we use nanmean from our math_utils instead? It uses bottleneck if it's available.

@ssanderson

ssanderson Mar 16, 2016

Member

yeah, will do
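The fallback pattern richafrank alludes to can be sketched as follows. This is illustrative only, assuming a module-level try/except; zipline's actual `math_utils` may be structured differently:

```python
import numpy as np

# Illustrative sketch of the pattern described above: prefer bottleneck's
# faster nan-aware reductions when it is installed, and fall back to
# numpy's implementations otherwise.
try:
    import bottleneck as bn
    nanmean, nanstd = bn.nanmean, bn.nanstd
except ImportError:
    nanmean, nanstd = np.nanmean, np.nanstd
```

Callers then import `nanmean`/`nanstd` from this module instead of from numpy directly, getting the speedup transparently when bottleneck is present.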

@ssanderson ssanderson force-pushed the classifiers branch 2 times, most recently from 9d0a70e to 7bd56ea Mar 11, 2016

not_bool=self.inputs[0].dtype,
)
)
super(Latest, self)._validate()

@richafrank

richafrank Mar 17, 2016

Member

I noticed that Classifier._validate makes sure to return super()._validate(). Do we need to here too?

@ssanderson

ssanderson Mar 17, 2016

Member

Nothing uses these values right now, and all the existing classifiers just return None, but we should probably be consistent.
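The convention being agreed on here can be sketched minimally. `Term` and `Latest` below are stand-ins for zipline's real classes, which are far richer:

```python
class Term(object):
    def _validate(self):
        # Base validation hook; all existing implementations return None
        # today, but subclasses still propagate the value for consistency.
        return None


class Latest(Term):
    def _validate(self):
        # Subclass-specific checks (e.g. rejecting bool-dtyped inputs)
        # would run here before delegating upward.
        return super(Latest, self)._validate()
```

Returning the `super()` result costs nothing now and avoids surprises if a base class ever starts returning something meaningful.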

[ 0.33333333, 1. , 0.66666667],
[ 1. , 1. , 1. ]])
"""
out = np.empty_like(data)

@dmichalowicz

dmichalowicz Mar 17, 2016

Contributor

Should this be

if out is None:
    out = np.empty_like(data)
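The guard dmichalowicz suggests is the standard numpy `out`-parameter convention. A self-contained sketch, with `doubled` as a hypothetical example function:

```python
import numpy as np


def doubled(data, out=None):
    # Standard numpy-style ``out`` convention: only allocate a result
    # buffer when the caller has not supplied one, so callers can reuse
    # preallocated memory across repeated calls.
    if out is None:
        out = np.empty_like(data)
    np.multiply(data, 2, out=out)
    return out
```

Unconditionally calling `np.empty_like` would silently ignore a caller-provided buffer, which is the bug being pointed out.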

@ssanderson
dtype = int64_dtype
missing_value = -1
inputs = ()
window_length = 0

@dmichalowicz

dmichalowicz Mar 17, 2016

Contributor

What's the purpose of having this copy?

@ssanderson

ssanderson Mar 17, 2016

Member

It's used in test_normalizations below for a case where I needed two different classifiers.

@ssanderson ssanderson force-pushed the classifiers branch 3 times, most recently from 94141d1 to 3ce16f2 Mar 17, 2016

@dmichalowicz

Contributor

dmichalowicz commented Mar 17, 2016

@ssanderson Looks good to me. I worked through testing demean and zscore with different masks and group-bys and they worked well.

@ssanderson

Member

ssanderson commented Mar 17, 2016

@dmichalowicz thanks! I'll merge this once I finish making tweaks to the docstrings.

ssanderson added some commits Mar 7, 2016

ENH: Add support for Classifiers.
Classifiers are computations that represent grouping keys. They can be
used in conjunction with normalization functions like ``zscore`` or
``demean`` to perform normalizations over subsets of a dataset.

Notable changes:

- Added ``demean()`` and ``zscore()`` methods to ``Factor``.

- Added classifier versions of ``Latest`` and ``CustomTermMixin``.
  The ``.latest`` attribute of int64 dataset columns now produces a
  classifier by default.

- Added ``Everything``, a classifier that maps all data to the same
  value.

- Added ``zipline.lib.normalize``, which implements a naive, pure-Python
  grouped normalize function.  This will likely be moved to Cython in a
  subsequent PR.
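A naive grouped normalization of the kind the commit message describes can be sketched as below. This is an assumption-laden illustration, not the actual `zipline.lib.normalize` code:

```python
import numpy as np


def grouped_demean(data, groups):
    # Naive grouped demean: subtract each group's mean from that
    # group's values. NaNs are ignored when computing the mean.
    out = np.empty_like(data, dtype=float)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = data[mask] - np.nanmean(data[mask])
    return out


def grouped_zscore(data, groups):
    # zscore additionally divides by each group's standard deviation.
    out = np.empty_like(data, dtype=float)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = (data[mask] - np.nanmean(data[mask])) / np.nanstd(data[mask])
    return out
```

The per-group Python loop is what makes this "naive" and why the commit message anticipates a Cython rewrite.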
BUG: Allow Filter comparisons with AssetExists.
Allow comparisons like SomeFilter() & AssetExists().

Previously such comparisons would fail because & and | on Filters
explicitly checked that the other side of the operator was also a
Filter.

We now only enforce that the other side of the expression is a Term
with a dtype of bool_.
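The relaxed check can be sketched as follows, with `Term`, `Filter`, and `AssetExists` as minimal stand-ins for zipline's real classes:

```python
import numpy as np

bool_dtype = np.dtype(bool)


class Term(object):
    # Stand-in for zipline's Term base class.
    dtype = None


class Filter(Term):
    dtype = bool_dtype

    def __and__(self, other):
        # Relaxed check from the commit message: require any bool-dtyped
        # Term rather than specifically a Filter instance, so expressions
        # like SomeFilter() & AssetExists() are accepted.
        if not (isinstance(other, Term) and other.dtype == bool_dtype):
            raise TypeError("& requires a bool-dtyped term")
        return Filter()


class AssetExists(Term):
    dtype = bool_dtype
```

Checking the dtype rather than the concrete class keeps the operator open to any boolean term while still rejecting mismatched operands.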
BUG: .latest, not latest.
The latter happens to work on py2 :(.

@ssanderson ssanderson force-pushed the classifiers branch from 8bd5742 to d67d339 Mar 19, 2016

ssanderson added some commits Mar 19, 2016

MAINT: Clean up mixin usage.
- Use RestrictedDTypeMixin for dtype validation in
  Filter/Factor/Classifier.
- Use new LatestMixin for Latest{Filter,Factor,Classifier} instead of
  duplicating logic across all three.
- Always ignore return values in _validate.
- Consistently call super() first in validation mixins.

@ssanderson ssanderson force-pushed the classifiers branch from d67d339 to 396d2f4 Mar 19, 2016

@coveralls

coveralls commented Mar 19, 2016

Coverage increased (+0.1%) to 87.963% when pulling 396d2f4 on classifiers into bc09318 on master.

ssanderson added a commit that referenced this pull request Mar 20, 2016

@ssanderson ssanderson merged commit 14c1bb0 into master Mar 20, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

@ssanderson ssanderson deleted the classifiers branch Mar 20, 2016
