Skip to content
This repository has been archived by the owner on Sep 1, 2023. It is now read-only.

Anomaly modular #956

Merged
merged 74 commits into from Sep 11, 2014
Merged

Anomaly modular #956

merged 74 commits into from Sep 11, 2014

Conversation

breznak
Copy link
Member

@breznak breznak commented May 23, 2014

1/ turn anomaly.py into a class for better reuse, move related logic from clamodel.py here
2/ nice speedup for python (use numpy.sum())
3/ new features/"implementation" to anomaly computation - parametrized, default off, so no change from current
3.1/ new cumulative anomaly (sliding window)
3.2/ @subutai Please have a look if you like this approach, it's prepared for integration with your likelihood anomaly code
4/ added tests

breznak added 12 commits May 23, 2014 04:39
and convert tests / usecases accordingly;
Just create an instance of nupic.algorithms.anomaly.Anomaly()
and call the computeAnomalyScore() method
yields cca 300x speedup! (only of array is numpy, not python array) which is this case
instead of python's sum() on numpy array (much slower)

found by: grep -R 'numpy.*.sum()' $NUPIC
this class was extracted from clamodel previously, so it's just moving around
fixed bug: self._buf must be float array!
Conflicts:
	py/nupic/frameworks/opf/clamodel.py
@breznak
Copy link
Member Author

breznak commented May 23, 2014

TODO1: integration tests will fail on serialization/ pickle - probably as anomaly turned into a class, not just a function. Can someone help me fix that please?
TODO2: more like new-feature: add slidingWindowSize and useLikelihood to anomaly parameters so it can be used from OPF, any pointers where these can be modified? (would be a good article for wiki). Thanks a lot!

@breznak breznak added this to the Sprint 23 milestone May 23, 2014
@breznak breznak self-assigned this May 23, 2014
@rhyolight
Copy link
Member

Subutai is out until next week.

Sent from my MegaPhone

On May 23, 2014, at 5:29 AM, breznak notifications@github.com wrote:

TODO1: integration tests will fail on serialization/ pickle - probably as anomaly turned into a class, not just a function. Can someone help me fix that please?
TODO2: more like new-feature: add slidingWindowSize and useLikelihood to anomaly parameters so it can be used from OPF, any pointers where these can be modified? (would be a good article for wiki). Thanks a lot!


Reply to this email directly or view it on GitHub.

@rhyolight
Copy link
Member

TODO1: integration tests will fail on serialization/ pickle - probably as anomaly turned into a class, not just a function. Can someone help me fix that please?

This could be a hard thing to fix.

@scottpurdy
Copy link
Contributor

Thanks Marek, very good idea! In fact, we have a version of this that we use internally that Matt put up in #946

As far as the specifics, I don't think we want to require clients to create an instance of the class to have access to the static functionality of the base computeAnomalyScore. We do want a more sophisticated method for de-noising the score though and in the case where you have a running state it may make sense to encapsulate that in an object. A lot of thought has gone into how to do this likelihood calculation here at Grok and Matt has a PR to merge our internal likelihood algorithm (#946 - originally created by @subutai). Can you try that out and see if it solves your needs?

Feel free to put feedback on that PR if you think the API could be improved. If that change doesn't solve your use case then please follow up with an explanation of what the data looks like and how our likelihood doesn't solve the issue / what you would like to see done.

@scottpurdy scottpurdy closed this May 23, 2014
@breznak
Copy link
Member Author

breznak commented May 24, 2014

Hi Scott, I did check #946 quickly, I'm not sure if the moving average referenced there is meant for anomaly, or for keeping track of records (to update the distribution model)? Anyways, there are still things in this PR that could be merged (numpy speedup) and usability from one anomaly class. I will comment to likelihood for the rest..

@breznak breznak reopened this May 24, 2014
@breznak
Copy link
Member Author

breznak commented May 24, 2014

(@rhyolight btw, Matt, a closed PR gets reopened when one comments on it)
I just reviewed #946 , looks great, I just had some questions about unclear things.. As I understand it, the likelihood code is not intended to replace anomaly.py it just computes a probability that what has been detected as anomaly (input value, its anomaly score) is really an anomaly.

That's why I've created this PR so that people can turn features on/off in anomaly detection - the likelihood core is great example, final anomaly score would be anomaly_score * probability_its_really_anomaly, moving average would be other example.

@scottpurdy
Copy link
Contributor

@breznak - I think it would be better to separate the numpy speed ups from the anomaly score changes.

And it looks like you use numpy.sum(numpyArray) in a lot of places rather than numpyArray.sum() but I believe that is the same thing (and the original is the more canonical way to use methods in Python). Note that using the sum method of a numpy array is different than using the builtin Python sum function.

@breznak
Copy link
Member Author

breznak commented May 30, 2014

@scottpurdy ok, I'll separate the speedups to a new PR.
about sum() I was referring to https://stackoverflow.com/questions/10922231/pythons-sum-vs-numpys-numpy-sum
The thing is to use numpy.sum on numpy array; I didn't check if np_array.sum() and py_array.sum() use numpy, resp. python's implementation (I'd think so). Should I redo those cases?

@breznak
Copy link
Member Author

breznak commented May 30, 2014

TODO1: integration tests will fail on serialization/ pickle - probably as anomaly turned into a class, >>not just a function. Can someone help me fix that please?
This could be a hard thing to fix.

@rhyolight can you elaborate please? To implement serialization for a new class should not be a big problem (if I knew how); troubles can be with checkpoints from past models ... And I think we should think this through, as backwards compatibility for loading modes might be important for Grok or corporate customers, but I believe at this stage it's a minor issue for community usecases, so we could be more flexible here

@breznak
Copy link
Member Author

breznak commented May 30, 2014

@subutai I plan to use code from #946 here, so that people can easily combine all sorts of modifications (sliding window, likelihood,..) to anomaly.

I wanted to ask, is there a reason why in grok you are using just likelihood of anomalies and not something like "probability weighted anomaly"? (anomaly_score * anomaly_probability) This would make more sense to me: eg huge anomaly but very unlikely is unimportant; but somewhat likely but catastrophical anomaly is important; I plan to implement this here. Thanks

@breznak
Copy link
Member Author

breznak commented May 30, 2014

Here I also intend to fix #967

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) when pulling 42ea7a6 on breznak:anomaly_modular into 2664c19 on numenta:master.

@rhyolight
Copy link
Member

@breznak What is the status of this PR at this point?

@breznak
Copy link
Member Author

breznak commented Sep 5, 2014

it's ready ;)

@rhyolight
Copy link
Member

@scottpurdy / @chetan51 / @subutai Please review. It will be nice to finally get this merged!

@rhyolight rhyolight modified the milestones: Sprint 30, Sprint 29 Sep 5, 2014
@breznak
Copy link
Member Author

breznak commented Sep 6, 2014

it's ready ;)

@rhyolight
Copy link
Member

@scottpurdy / @chetan51 / @subutai Any of you have time to review this today? This PR has been around since May, and it would be great to put it to bed.

@rhyolight
Copy link
Member

@breznak I've gotten word that this PR is good to merge after some style issues are fixed. You should run pylint on the files you changed. There are a lot of little things to fix, for example:

› pylint nupic/algorithms/anomaly.py
************* Module nupic.algorithms.anomaly
W0311,  30: 0 - Bad indentation. Found 4 spaces, expected 2 (bad-indentation)
C0301,  34: 0 - Line too long (88/80) (line-too-long)
C0301,  36: 0 - Line too long (84/80) (line-too-long)
W0311,  39: 0 - Bad indentation. Found 4 spaces, expected 2 (bad-indentation)
W0311,  40: 0 - Bad indentation. Found 4 spaces, expected 2 (bad-indentation)
W0311,  44: 0 - Bad indentation. Found 6 spaces, expected 4 (bad-indentation)
W0311,  47: 0 - Bad indentation. Found 6 spaces, expected 4 (bad-indentation)
W0311,  48: 0 - Bad indentation. Found 4 spaces, expected 2 (bad-indentation)
W0311,  50: 0 - Bad indentation. Found 6 spaces, expected 4 (bad-indentation)
W0311,  51: 0 - Bad indentation. Found 4 spaces, expected 2 (bad-indentation)
W0311,  53: 0 - Bad indentation. Found 6 spaces, expected 4 (bad-indentation)
W0311,  54: 0 - Bad indentation. Found 4 spaces, expected 2 (bad-indentation)
C0303,  61: 0 - Trailing whitespace (trailing-whitespace)
C0301,  61: 0 - Line too long (93/80) (line-too-long)
C0303,  62: 0 - Trailing whitespace (trailing-whitespace)
C0301,  73: 0 - Line too long (92/80) (line-too-long)
C0303,  75: 0 - Trailing whitespace (trailing-whitespace)
C0301,  76: 0 - Line too long (89/80) (line-too-long)
C0303,  77: 0 - Trailing whitespace (trailing-whitespace)
C0301,  77: 0 - Line too long (82/80) (line-too-long)
C0303,  78: 0 - Trailing whitespace (trailing-whitespace)
C0303,  79: 0 - Trailing whitespace (trailing-whitespace)
C0301,  79: 0 - Line too long (82/80) (line-too-long)
C0303,  81: 0 - Trailing whitespace (trailing-whitespace)
C0303,  82: 0 - Trailing whitespace (trailing-whitespace)
C0301,  82: 0 - Line too long (100/80) (line-too-long)
C0303,  84: 0 - Trailing whitespace (trailing-whitespace)
C0301,  84: 0 - Line too long (106/80) (line-too-long)
C0303,  85: 0 - Trailing whitespace (trailing-whitespace)
C0301,  85: 0 - Line too long (107/80) (line-too-long)
C0301,  86: 0 - Line too long (113/80) (line-too-long)
C0301,  92: 0 - Line too long (95/80) (line-too-long)
C0301,  95: 0 - Line too long (137/80) (line-too-long)
C0325, 101: 0 - Unnecessary parens after 'not' keyword (superfluous-parens)
C0301, 102: 0 - Line too long (93/80) (line-too-long)
C0301, 103: 0 - Line too long (98/80) (line-too-long)
C0301, 109: 0 - Line too long (93/80) (line-too-long)
C0303, 111: 0 - Trailing whitespace (trailing-whitespace)
C0301, 113: 0 - Line too long (107/80) (line-too-long)
C0301, 114: 0 - Line too long (106/80) (line-too-long)
C0301, 115: 0 - Line too long (99/80) (line-too-long)
C0301, 132: 0 - Line too long (87/80) (line-too-long)
C0301, 135: 0 - Line too long (87/80) (line-too-long)
W0611,  27: 0 - Unused import nupic (unused-import)

@breznak
Copy link
Member Author

breznak commented Sep 9, 2014

@rhyolight that is a darn long list :) i'll try to do something about it. but we should get a coding-style template, so I could just ctrl+shift+f in netbeans and have it formated. 🐫

@rhyolight
Copy link
Member

@breznak There is a pylintrc in the file tree, which provides that info. Not sure about IDE support, however. It all adheres to https://github.com/numenta/nupic/wiki/Python-Style-Guide.

@rhyolight
Copy link
Member

@oxtopus Aside from Python code style issues -- which should be addressed by @breznak after adhering to pylint advice -- can you do one last functional review of this PR? @subutai and @scottpurdy have already reviewed most of it, but more commits were added.

@subutai
Copy link
Member

subutai commented Sep 10, 2014

@rhyolight I don't have any more substantive changes beyond what's already been discussed. Would be great to get this merged in soon. @breznak I've had to make those pylint changes too - even though the list looks long, they can actually be done very quickly.

@breznak
Copy link
Member Author

breznak commented Sep 11, 2014

@rhyolight fixed most of the formatting, although I'd like to raise a discussion on usefulness of some of those -- esp. 80char/line limit and trailing whitespace (in comments) make the code look ugly and less readible, imho.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.0%) when pulling 914b4cb on breznak:anomaly_modular into 3d45da2 on numenta:master.

@rhyolight
Copy link
Member

@breznak said:

esp. 80char/line limit and trailing whitespace (in comments) make the code look ugly and less readible, imho.

I'm pretty positive we're not going to deviate from PEP-8 on that issue.

rhyolight added a commit that referenced this pull request Sep 11, 2014
@rhyolight rhyolight merged commit 98d1452 into numenta:master Sep 11, 2014
@oxtopus
Copy link
Contributor

oxtopus commented Sep 11, 2014

I think it's worth discussion. I find the strict 80-char limit anachronistic and even PEP-8, which is limited in scope to setting the standards for contributions to the Python stdlib, provides provisions for 100-char width in http://legacy.python.org/dev/peps/pep-0008/#maximum-line-length for code maintained exclusively or primarily by a team that can reach agreement on this issue. We deviate somewhat from PEP-8 already and as stated in http://legacy.python.org/dev/peps/pep-0020/, readability counts.

@rhyolight
Copy link
Member

Well, perhaps a discussion on nupic-hackers would be worthwhile.


Matt Taylor
OS Community Flag-Bearer
Numenta

On Thu, Sep 11, 2014 at 8:45 AM, Austin Marshall notifications@github.com
wrote:

I think it's worth discussion. I find the strict 80-char limit
anachronistic and even PEP-8, which is limited in scope to setting the
standards for contributions to the Python stdlib, provides provisions for
100-char width in
http://legacy.python.org/dev/peps/pep-0008/#maximum-line-length for code
maintained exclusively or primarily by a team that can reach agreement on
this issue
. We deviate somewhat from PEP-8 already and as stated in
http://legacy.python.org/dev/peps/pep-0020/, readability counts.


Reply to this email directly or view it on GitHub
#956 (comment).

@scottpurdy
Copy link
Contributor

Hey all, I think this needs a bit more work. I am going to send a follow up PR but want to get feedback asap so let me know if you disagree with any of these changes:

  • The versioning should bump a version number rather than just check if it is missing the anomaly instance
  • The current method for specifying the type of anomaly you want doesn't let a 3rd party create their own. Can we change this to a base class defining the interface and then a set of subclasses? Separating into separate classes also makes each method easier to understand and test since you don't have the sliding window logic, etc all in the same class.
  • The anomaly class supports taking either current predicted columns or previously predicted columns. I don't see any reason to have the class support both. Rather than duplicating the previously predicted state I think the new class should take the previously predicted only.
  • I don't think we use the KNNAnomalyClassifierRegion but the change there added state to the class with no benefit. It should just use computeRawAnomalyScore.

@scottpurdy scottpurdy mentioned this pull request Sep 11, 2014
2 tasks
@breznak breznak deleted the anomaly_modular branch January 8, 2016 12:01
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants