and convert tests / usecases accordingly; Just create an instance of nupic.algorithms.anomaly.Anomaly() and call the computeAnomalyScore() method
yields approx. 300x speedup (only if the array is a numpy array, not a Python list), which is the case here
instead of Python's sum() on a numpy array (much slower); found by: grep -R 'numpy.*.sum()' $NUPIC
this class was extracted from clamodel previously, so it's just moving around
available anomaly implementations
fixed bug: self._buf must be float array!
Conflicts: py/nupic/frameworks/opf/clamodel.py
This reverts commit be71384.
TODO1: integration tests will fail on serialization/ |
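The speedup claimed in the commit messages above comes from replacing Python's built-in `sum()` with `numpy.sum()` on numpy arrays. A minimal sketch of the difference, assuming a made-up array of 100,000 random floats (the array size and variable names are illustrative and not taken from this PR; the actual ratio depends on array size and hardware):

```python
import timeit

import numpy

# Illustrative data only: 100,000 random floats in a numpy array.
values = numpy.random.rand(100000)

# Built-in sum() iterates over the array element by element in the interpreter.
builtin_total = sum(values)
# numpy.sum() performs the reduction in compiled code.
numpy_total = numpy.sum(values)

# Both give the same result up to floating-point rounding.
assert numpy.isclose(builtin_total, numpy_total)

builtin_time = timeit.timeit(lambda: sum(values), number=10)
numpy_time = timeit.timeit(lambda: numpy.sum(values), number=10)
print("built-in sum(): %.4fs, numpy.sum(): %.4fs" % (builtin_time, numpy_time))
```

The gain only applies when the input is already a numpy array, which matches the caveat in the commit message.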
Subutai is out until next week.
|
This could be a hard thing to fix. |
Thanks Marek, very good idea! In fact, we have a version of this that we use internally that Matt put up in #946 As far as the specifics, I don't think we want to require clients to create an instance of the class to have access to the static functionality of the base Feel free to put feedback on that PR if you think the API could be improved. If that change doesn't solve your use case then please follow up with an explanation of what the data looks like and how our likelihood doesn't solve the issue / what you would like to see done. |
Hi Scott, I did check #946 quickly, I'm not sure if the moving average referenced there is meant for anomaly, or for keeping track of records (to update the distribution model)? Anyways, there are still things in this PR that could be merged (numpy speedup) and usability from one anomaly class. I will comment to |
(@rhyolight btw, Matt, a closed PR gets reopened when one comments on it) That's why I've created this PR so that people can turn features on/off in anomaly detection - the likelihood core is great example, final anomaly score would be |
This reverts commit d7e635f.
@breznak - I think it would be better to separate the numpy speed ups from the anomaly score changes. And it looks like you use |
@scottpurdy ok, I'll separate the speedups to a new PR. |
@rhyolight can you elaborate please? Implementing serialization for a new class should not be a big problem (if I knew how); the trouble may be with checkpoints from past models ... And I think we should think this through, as backwards compatibility for loading models might be important for Grok or corporate customers, but I believe at this stage it's a minor issue for community use cases, so we could be more flexible here |
@subutai I plan to use code from #946 here, so that people can easily combine all sorts of modifications (sliding window, likelihood,..) to anomaly. I wanted to ask, is there a reason why in grok you are using just likelihood of anomalies and not something like "probability weighted anomaly"? ( |
Here I also intend to fix #967 |
@breznak What is the status of this PR at this point? |
it's ready ;) |
@scottpurdy / @chetan51 / @subutai Please review. It will be nice to finally get this merged! |
it's ready ;) |
@scottpurdy / @chetan51 / @subutai Any of you have time to review this today? This PR has been around since May, and it would be great to put it to bed. |
@breznak I've gotten word that this PR is good to merge after some style issues are fixed. You should run
|
@rhyolight that is a darn long list :) I'll try to do something about it. But we should get a coding-style template, so I could just ctrl+shift+f in netbeans and have it formatted. 🐫 |
@breznak There is a |
@oxtopus Aside from Python code style issues -- which should be addressed by @breznak after adhering to |
@rhyolight I don't have any more substantive changes beyond what's already been discussed. Would be great to get this merged in soon. @breznak I've had to make those pylint changes too - even though the list looks long, they can actually be done very quickly. |
@rhyolight fixed most of the formatting, although I'd like to raise a discussion on the usefulness of some of those rules -- esp. the 80 char/line limit and trailing whitespace (in comments) make the code look ugly and less readable, imho. |
I think it's worth discussion. I find the strict 80-char limit anachronistic and even PEP-8, which is limited in scope to setting the standards for contributions to the Python stdlib, provides provisions for 100-char width in http://legacy.python.org/dev/peps/pep-0008/#maximum-line-length for code maintained exclusively or primarily by a team that can reach agreement on this issue. We deviate somewhat from PEP-8 already and as stated in http://legacy.python.org/dev/peps/pep-0020/, readability counts. |
Well, perhaps a discussion on nupic-hackers would be worthwhile.
|
Hey all, I think this needs a bit more work. I am going to send a follow up PR but want to get feedback asap so let me know if you disagree with any of these changes:
|
1/ turn `anomaly.py` into a class for better reuse, move related logic from `clamodel.py` here
2/ nice speedup for python (use `numpy.sum()`)
3/ new features/"implementation" to anomaly computation - parametrized, default off, so no change from current
3.1/ new `cumulative anomaly` (sliding window)
3.2/ @subutai Please have a look if you like this approach, it's prepared for integration with your `likelihood` anomaly code
4/ added tests
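
The sliding-window idea in item 3.1 can be sketched roughly as follows. This is not the code from this PR: `raw_anomaly_score` and `SlidingWindowAnomaly` are hypothetical names, the raw score is assumed to be the fraction of active columns that were not predicted, and returning 0.0 for an empty input is an assumption of this sketch. The buffer is deliberately a float array, in line with the `self._buf` fix mentioned in the commits.

```python
import numpy


def raw_anomaly_score(active_columns, predicted_columns):
    """Fraction of active columns that were not predicted (0.0 to 1.0).

    Assumes column indices are unique within each input.
    """
    active = numpy.asarray(active_columns)
    if active.size == 0:
        return 0.0  # assumption of this sketch: no activity, no anomaly
    predicted_active = numpy.intersect1d(active, predicted_columns).size
    return (active.size - predicted_active) / float(active.size)


class SlidingWindowAnomaly(object):
    """Hypothetical helper: averages the most recent raw anomaly scores."""

    def __init__(self, window_size):
        # Float buffer: an integer array would truncate fractional scores.
        self._buf = numpy.zeros(window_size, dtype=float)
        self._count = 0

    def compute(self, active_columns, predicted_columns):
        score = raw_anomaly_score(active_columns, predicted_columns)
        # Overwrite the oldest slot (ring buffer).
        self._buf[self._count % self._buf.size] = score
        self._count += 1
        filled = min(self._count, self._buf.size)
        # Unfilled slots are still zero, so summing the whole buffer is safe.
        return numpy.sum(self._buf) / filled


detector = SlidingWindowAnomaly(window_size=3)
print(detector.compute([1, 2, 3, 4], [1, 2, 3, 4]))  # everything predicted
print(detector.compute([1, 2, 3, 4], [1, 2]))        # half predicted
print(detector.compute([1, 2, 3, 4], []))            # nothing predicted
```

Averaging over a window smooths out single-step spikes in the raw score; a window size of 1 reduces to the raw score, which would be consistent with the "default off, so no change from current" behaviour described above.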