This repository has been archived by the owner on Sep 1, 2023. It is now read-only.

low level anomaly detection example #890

Closed
wants to merge 41 commits into from


breznak
Member

@breznak breznak commented Mar 30, 2016

Calls the Encoder, SpatialPooler, TemporalPooler and Anomaly classes to show
a full-chain C++ demo.
Adds some helper utils for working with CSV files and vectors.
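The CSV helper itself is not shown in this thread. As a rough illustration only, a minimal reader that pulls one numeric column out of a simple CSV stream could look like the sketch below. `readCsvColumn` is a hypothetical name, not an existing nupic.core API. The sketch does not handle quoted fields or embedded commas. The `headerLines` parameter allows skipping header rows: NuPIC-style data files such as hotgym carry three (field names, field types, flags).

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of nupic.core): read one numeric column
// from a simple CSV stream. Quoted fields and embedded commas are NOT handled.
std::vector<double> readCsvColumn(std::istream& in, std::size_t column,
                                  std::size_t headerLines = 0) {
  std::vector<double> values;
  std::string line;
  std::size_t lineNo = 0;
  while (std::getline(in, line)) {
    // Skip the first `headerLines` rows and any blank lines.
    if (lineNo++ < headerLines || line.empty()) {
      continue;
    }
    std::istringstream fields(line);
    std::string field;
    std::size_t idx = 0;
    bool found = false;
    while (std::getline(fields, field, ',')) {
      if (idx++ == column) {
        // std::stod throws std::invalid_argument on non-numeric text.
        values.push_back(std::stod(field));
        found = true;
        break;
      }
    }
    if (!found) {
      throw std::out_of_range("CSV line " + std::to_string(lineNo) +
                              " has no column " + std::to_string(column));
    }
  }
  return values;
}
```

Returning a plain `std::vector<double>` keeps the helper independent of the encoder, so the same values can be fed to any encoder one at a time.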

Fixes: #889
Blocked by: #869 #904 #902

TODO:

  • wait for Anomaly to be merged (blocked)
  • switch to TM instead of TP
  • consider creating "utils" from some of the helper methods
    • VectorHelpers (?)
    • CSVHelpers
    • tests
  • use as a C++ example
    • bring Timer from HelloSP_TP
    • remove Hello_SP_TP
    • update Readme
  • change to class with initialize() and compute() methods
  • fix build on Win
  • optimize speed (params?) of TM (currently slower than TP)
  • fix coding style
  • allow testing of all TP/TM/extendedTM (ETM) implementations

@numenta-ci

By analyzing the blame information on this pull request, we identified @scottpurdy, @oxtopus and @rcrowder to be potential reviewers


#include "nupic/encoders/ScalarEncoder.hpp"
#include "nupic/algorithms/SpatialPooler.hpp"
#include "nupic/algorithms/Cells4.hpp"
Member Author


Let's work with TemporalMemory instead, as the API is cleaner and closer to the reference Python implementation.
(I'd still be curious how to get the proper data from the TP implementation.)

@breznak breznak mentioned this pull request Mar 30, 2016
@breznak
Member Author

breznak commented Mar 31, 2016

ping @numenta/nupic-committers ;) I'd like some review before the work continues.

@breznak breznak changed the title WIP: low level anomaly detection example low level anomaly detection example Apr 5, 2016
@breznak
Member Author

breznak commented Apr 5, 2016

@scottpurdy can I get a review on this please?

@breznak breznak force-pushed the example_anomaly branch 2 times, most recently from 519356e to c1daf53 on April 6, 2016 16:33
static std::vector<UInt> cellsToColumns(const std::vector<UInt>& cellsBinary, const UInt cellsPerColumn);
};

#include "nupic/utils/VectorHelpers.cpp"
Member Author


@scottpurdy @oxtopus @chetan51 guys, I need help with the templates; they're a damn headache in C++ :/
It would be nice and useful, but templates don't "allow" splitting the implementation into .cpp/.hpp files, and our project seems to build only if I include the .cpp file. With the .hpp I'm getting redeclaration errors because VectorHelpers.hpp is included in several places (e.g. in libnupic_core_solo.a), even though I'm using #ifndef include guards.

It's a static class; can I make it header-only and somehow make it work with the build system?

Contributor


Templates are evaluated at compile time, so any templated types/functions have to be defined in the headers. Do not include cpp files anywhere.
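Here is a minimal sketch of what a header-only version could look like. Member functions defined inside the class body are implicitly `inline`, so the header can be included from many translation units without multiple-definition errors. Templates need their full definition visible at the point of use in any case. Only the `cellsToColumns` signature comes from the diff above. The `UInt` typedef is a stand-in for nupic's, `join` is a hypothetical helper, and the body of `cellsToColumns` assumes "a column is active if any of its cells is active". The PR's actual implementation may differ.

```cpp
// VectorHelpers.hpp: hypothetical header-only sketch, with no matching .cpp file.
#ifndef NUPIC_UTILS_VECTOR_HELPERS_HPP
#define NUPIC_UTILS_VECTOR_HELPERS_HPP

#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

namespace nupic {

typedef unsigned int UInt;  // stand-in for nupic's UInt so the sketch compiles alone

class VectorHelpers {
public:
  // Template: the definition must be visible to every caller, so it lives
  // in the header. Defined in-class, it is also implicitly inline.
  template <typename T>
  static std::string join(const std::vector<T>& v, const std::string& sep = ",") {
    std::ostringstream out;
    for (std::size_t i = 0; i < v.size(); ++i) {
      if (i > 0) out << sep;
      out << v[i];
    }
    return out.str();
  }

  // Non-template, but still safe in a header because it is defined inside
  // the class body (implicitly inline, so no ODR violation across objects).
  // Assumption: input holds one 0/1 entry per cell, grouped by column;
  // output holds one 0/1 entry per column, set if any of its cells is active.
  static std::vector<UInt> cellsToColumns(const std::vector<UInt>& cellsBinary,
                                          const UInt cellsPerColumn) {
    assert(cellsPerColumn > 0 && cellsBinary.size() % cellsPerColumn == 0);
    std::vector<UInt> columns(cellsBinary.size() / cellsPerColumn, 0);
    for (std::size_t cell = 0; cell < cellsBinary.size(); ++cell) {
      if (cellsBinary[cell] != 0) {
        columns[cell / cellsPerColumn] = 1;
      }
    }
    return columns;
  }
};

}  // namespace nupic

#endif  // NUPIC_UTILS_VECTOR_HELPERS_HPP
```

Include guards only prevent a header from being pasted twice into the same translation unit. They do not stop a non-inline definition from being compiled into several object files, which is the likely source of the errors seen when linking libnupic_core_solo.a. Free functions defined in a header need an explicit `inline` for the same reason.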

@scottpurdy
Contributor

Please break out the components into separate PRs.

@breznak
Member Author

breznak commented Apr 6, 2016

Please break out the components into separate PRs.

Is that necessary? The commits are standalone, but all the functionality exists just for this example.

@scottpurdy
Contributor

Is that necessary? The commits are standalone, but all the functionality exists just for this example.

Yes, I think there is a lot of work to do here before these things can be merged (there are issues with code conventions, API design, and basic C++ organization) and I don't want to have conversations for all the different components in the same PR since it would be unmanageable. Better to start with one small component like the CSV reader/writer, get that merged, then move to the next piece.

@breznak
Member Author

breznak commented Apr 7, 2016

@scottpurdy separated to multiple PRs as requested, please review all of them when you can.

@cogmission

@breznak @scottpurdy Would this be a possible tool to use to learn C++?

@breznak
Copy link
Member Author

breznak commented Apr 15, 2016 via email

@breznak
Copy link
Member Author

breznak commented Aug 9, 2016

@rhyolight I've revisited this long-dead PR: I fixed the problems, hopefully resolved the coding-style issues (if there's a cpplint profile, please tell me how to use it; otherwise please point out the remaining issues), optimized the code a bit, and extended the example to use all 3 temporal memory implementations (TP/TM/ETM).
I think it's ready.

@rhyolight
Member

Time for another code review round @scottpurdy / @mrcslws

@scottpurdy
Copy link
Contributor

@breznak - Great to see this come back up. I'd like to see a slightly different implementation, similar to how we've changed things in nupic. The class that you are adding here has three separate TM implementations even though only one of them is used. Here's my proposal for how to get this same functionality and allow external experimentation with different anomaly detection algorithms and different TM implementations:

  • We mirror the nupic changes here in nupic.core. Specifically, have a raw score function, AnomalyLikelihood class, and AnomalyLikelihoodRegion class. Clients can use the AnomalyLikelihood class directly or through the region in a network.
  • We can add an example showing anomaly detection on a data set. We'd probably use the taxi dataset instead of hotgym since hotgym isn't the best example of AD.
  • External users can then experiment with the NuPIC algorithms, their own, or combinations. And they can do so directly or through the network api simply by wrapping any of their own algorithms in a Region.
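For reference, the raw anomaly score in nupic (`computeRawAnomalyScore` in the Python `anomaly.py`) is the fraction of currently active columns that were not predicted on the previous step, and it is 0 when no columns are active. The following is a minimal C++ sketch of such a raw score function, assuming sparse column indices as input. The name mirrors the Python helper, but this C++ signature is hypothetical and not an existing nupic.core API.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical C++ signature. Score = fraction of currently active columns
// that were NOT among the columns predicted on the previous time step.
// Returns 0.0 when no columns are active.
double computeRawAnomalyScore(const std::vector<unsigned int>& activeColumns,
                              const std::vector<unsigned int>& prevPredictedColumns) {
  if (activeColumns.empty()) {
    return 0.0;
  }
  const std::unordered_set<unsigned int> predicted(prevPredictedColumns.begin(),
                                                   prevPredictedColumns.end());
  std::size_t predictedAndActive = 0;
  for (unsigned int col : activeColumns) {
    if (predicted.count(col) != 0) {
      ++predictedAndActive;
    }
  }
  return static_cast<double>(activeColumns.size() - predictedAndActive) /
         static_cast<double>(activeColumns.size());
}
```

An AnomalyLikelihood class would then consume a stream of these raw scores, which keeps the score function itself stateless.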

One thing I see you adding here that you've done elsewhere as well is trying to make a class handle multiple implementations. The problem with this is that the implementations have different interfaces (different constructor params, different methods, etc). A better way to do this is to define the common interface. Then you have separate wrapper classes that implement the interface and translate from the common interface to each specific implementation's constructor and methods. Once you've done this, you can create a factory function that takes the common interface parameters and generically creates one of the wrapper instances. The interface may be specific to your application though, in which case we wouldn't put it in NuPIC.
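The interface/wrapper/factory pattern described above can be sketched as follows. Two stand-in "implementations" (`DenseTpLikeImpl`, `SparseTmLikeImpl`) have deliberately mismatched APIs, and each gets a thin adapter to one common `SequenceMemory` interface. Every name here is hypothetical. The stand-ins only echo their input and do not reproduce the real TP (`Cells4`) or `TemporalMemory` constructors and methods.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-ins for two real implementations with incompatible APIs (hypothetical).
struct DenseTpLikeImpl {  // takes a dense 0/1 vector
  explicit DenseTpLikeImpl(unsigned int n) : numColumns(n) {}
  std::vector<unsigned int> infer(const std::vector<unsigned int>& dense) { return dense; }
  unsigned int numColumns;
};
struct SparseTmLikeImpl {  // takes sparse active column indices
  void compute(const std::vector<unsigned int>& active, bool learn) {
    last = active;
    learned = learn;
  }
  std::vector<unsigned int> last;
  bool learned = false;
};

// The common interface the application actually needs.
class SequenceMemory {
public:
  virtual ~SequenceMemory() = default;
  // Feed sparse active column indices; get sparse column indices back.
  virtual std::vector<unsigned int> step(const std::vector<unsigned int>& activeColumns,
                                         bool learn) = 0;
};

// One thin adapter per implementation translates to its native API.
class TpLikeAdapter : public SequenceMemory {
public:
  explicit TpLikeAdapter(unsigned int numColumns) : impl_(numColumns) {}
  std::vector<unsigned int> step(const std::vector<unsigned int>& activeColumns,
                                 bool /*learn*/) override {
    std::vector<unsigned int> dense(impl_.numColumns, 0);  // sparse -> dense
    for (unsigned int c : activeColumns) dense.at(c) = 1;
    const std::vector<unsigned int> out = impl_.infer(dense);
    std::vector<unsigned int> sparse;  // dense -> sparse
    for (unsigned int i = 0; i < out.size(); ++i) {
      if (out[i] != 0) sparse.push_back(i);
    }
    return sparse;
  }
private:
  DenseTpLikeImpl impl_;
};

class TmLikeAdapter : public SequenceMemory {
public:
  std::vector<unsigned int> step(const std::vector<unsigned int>& activeColumns,
                                 bool learn) override {
    impl_.compute(activeColumns, learn);
    return impl_.last;
  }
private:
  SparseTmLikeImpl impl_;
};

// Factory: callers only ever see SequenceMemory.
std::unique_ptr<SequenceMemory> makeSequenceMemory(const std::string& kind,
                                                   unsigned int numColumns) {
  if (kind == "tp") return std::unique_ptr<SequenceMemory>(new TpLikeAdapter(numColumns));
  if (kind == "tm") return std::unique_ptr<SequenceMemory>(new TmLikeAdapter());
  throw std::invalid_argument("unknown implementation: " + kind);
}
```

With this layout, the example's main loop would call only `step()`. Adding another implementation would mean writing a new adapter and one more branch in the factory, instead of adding if-branches throughout one monolithic class.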

@breznak
Member Author

breznak commented Aug 9, 2016

Hi @scottpurdy , thank you for the quick response!

  • Regarding the dataset for AD: can you point me to the "taxi dataset", please? Also, I noticed src/nupic/datafiles/extra/hotgym/hotgym-anomaly.csv (but it looks like hotgym with anomaly scores added?).
  • Regarding the 3 TM implementations:
    • for various reasons I do not wish to use the Region/NetworkAPI here

      The problem with this is that the implementations have different interfaces

    • actually, the TM and ExtendedTM (ETM) have pretty good (1:1) compatibility 👏

      • I haven't looked much deeper, but the ETM should extend (sic) TM or implement a common interface with it; then I could abstract them into a single code path.
    • for TP, I'm waiting to learn if/when TP will be officially obsoleted (as the default, in NAB, ...).

      • I'm also using the code to highlight the incompatibilities, which can optionally be ironed out in later PRs.

A better way to do this is to define the common interface. Then you have separate wrapper classes ... The interface may be specific to your application though, in which case we wouldn't put it in NuPIC.

  • That's the point. I intended the example to be a "newbie" example of the low-level C++ AD chain (hoping to obsolete the hello_SP_TP example). I could create 3 classes, but I decided if-branches would be more readable and would highlight the API (in)compatibilities. Also, it's just an example, so I thought I'd keep it in a single class.
    Which approach do you think I should take?

@scottpurdy
Contributor

Yeah, no problem if you don't want to use the network api, but all of the logic still applies.

Is your goal here to provide an anomaly detection example in nupic.core or to expose some functionality for work that you want to do?

If it is the former then we should just get a standard AD setup. No need for supporting multiple TM implementations. If it is the latter, then you should already be able to use these algorithms (or any of your own) however you'd like.

The interface setup is just a suggestion since I think it creates cleaner code than "monolithic" classes that contain multiple algorithms.

@breznak
Member Author

breznak commented Aug 9, 2016

My motivation is both: an AD example, and I use/need the code with different TM implementations for evaluation experiments.
Please let me know if/how you are interested in the PR. Then we can:

  • simplify to just AD example using TM
  • leave as-is
  • refactor to using 3 TM classes

I'd prefer the variant where the option to switch TM implementations is kept, as it allows for nice and easy experimentation, but I don't care too much.

@scottpurdy
Contributor

It would be great to get an AD example into nupic.core (with and without the network api). There are some things I'd like to do first, like numenta/nupic-legacy#3228. And we need to get the taxi dataset into core (I don't think it has been added to nupic either). But then we should be able to create a simple AD example with and without the network api.

@breznak
Member Author

breznak commented Aug 10, 2016

But then we should be able to create a simple AD example with and without the network api.

@scottpurdy are those necessarily related to the PR? Are there any obstacles to merging as-is (the code is functional) and doing any modifications as follow-ups?

@rhyolight
Member

rhyolight commented Aug 10, 2016

@breznak Marek, you've obviously put a lot of work into this, and I know this is not what you want to hear, but after discussing this internally I'm going to close this PR. The reasons are:

  1. we have a small review team and only a few C++ experts (this PR has taken a lot of engineering time to code review over the past few months, and there is still more to do that we don't have time to do right now)
  2. this is a low-priority example of a low-level API that we don't normally encourage people to use
  3. we don't have an automated C++ linting tool in place, and I don't have time to work on that right now

That being said, I hate to see this example going to waste. I would like to see you put it in https://github.com/htm-community/nupic-example-code repository. You have push rights to this repo, and we can start collecting NuPIC use-cases here created by the community and without a strict code review process.

Successfully merging this pull request may close these issues.

Low-level Anomaly example
8 participants