Real-life benchmark: Hotgym example using C++ algorithms #30

Open
breznak opened this issue Jan 24, 2018 · 8 comments

3 participants
@breznak
Member

commented Jan 24, 2018

Implement a pipeline that runs a full real-world HTM task.

Currently implemented using the raw HTM classes (TM, SP, ...),
not NetworkAPI (which needs TM/SPRegion) and not as Python code using the C++ bindings (which would also be possible).

Pipeline:

  • compile as standalone executable (for profiling)
  • load CSV from file
    • use our classical "hotgym" dataset
    • parse command-line for optional filename and num runs
  • encode CSV data
    • MultiEncoder for more fields than 1
  • run SpatialPooler to get SDR
    • global
    • local inhibition
  • run TP to get temporal predictions
    • use more modern TM as alternative
    • TP (old) obsoleted
    • BacktrackingTM (TP based) obsoleted
    • show SDR output computed by these TM flavours
      • also checks deterministic algorithms' outputs #194
  • compute Anomaly score
    • test AnomalyLikelihood
  • add SDR Classifier
    • needs encoder topDownCompute (SDR -> Real); decided WONTFIX
  • measure execution time
    • more fine-grained separate timers for each part of pipeline (SP, Encoder, TM,..)
      • fine grained timer checks for each part
  • implement as a class to make more reusable
  • use SDR for all layers
    • SDR Metrics
    • enforce common Compute/Serializable/... interface
  • implement using core algorithms (SP, TM)
    • encoder
    • SP
    • TM
    • AN
    • classifier
      • predictor
    • CP (later, when implemented) #285
    • implement using NetworkAPI
  • test parallelization #255
  • test interfaces
    • serialization
  • optimize parameters #433
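
The fine-grained timing item in the checklist above could be approached with a small scoped timer. This is a minimal sketch, not the project's implementation: `StageTimers`, `Scope`, `busyWork` and `runTimedLoop` are hypothetical names, and the stage bodies are stand-ins for the real encoder, SP, TM and anomaly calls.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper (not part of the library): accumulates wall-clock
// time and call counts per named pipeline stage.
class StageTimers {
public:
  using Clock = std::chrono::steady_clock;

  // RAII guard: on destruction, adds the elapsed time to the stage total.
  class Scope {
  public:
    Scope(StageTimers &owner, std::string stage)
        : owner_(owner), stage_(std::move(stage)), start_(Clock::now()) {}
    ~Scope() {
      owner_.totals_[stage_] += Clock::now() - start_;
      ++owner_.calls_[stage_];
    }
    Scope(const Scope &) = delete;
    Scope &operator=(const Scope &) = delete;

  private:
    StageTimers &owner_;
    std::string stage_;
    Clock::time_point start_;
  };

  // Total seconds spent in a stage; 0 for a stage that never ran.
  double seconds(const std::string &stage) const {
    const auto it = totals_.find(stage);
    if (it == totals_.end()) return 0.0;
    return std::chrono::duration<double>(it->second).count();
  }

  // Number of times a stage was measured.
  std::size_t calls(const std::string &stage) const {
    const auto it = calls_.find(stage);
    return it == calls_.end() ? 0 : it->second;
  }

  // Prints one line per stage, in alphabetical order of stage name.
  void report(std::ostream &out) const {
    for (const auto &entry : totals_) {
      out << entry.first << ": " << seconds(entry.first) << " s over "
          << calls(entry.first) << " calls\n";
    }
  }

private:
  std::map<std::string, Clock::duration> totals_;
  std::map<std::string, std::size_t> calls_;
};

// Stand-in for real work; a benchmark would call the actual algorithm here.
inline std::uint64_t busyWork(std::uint64_t seed) {
  for (int i = 0; i < 1000; ++i) {
    seed = seed * 6364136223846793005ULL + 1442695040888963407ULL;
  }
  return seed;
}

// Runs `records` iterations of a four-stage loop, timing each stage separately.
inline std::uint64_t runTimedLoop(StageTimers &timers, int records) {
  assert(records >= 0);
  std::uint64_t state = 1;
  for (int i = 0; i < records; ++i) {
    { StageTimers::Scope t(timers, "encoder"); state = busyWork(state); }
    { StageTimers::Scope t(timers, "SP");      state = busyWork(state); }
    { StageTimers::Scope t(timers, "TM");      state = busyWork(state); }
    { StageTimers::Scope t(timers, "anomaly"); state = busyWork(state); }
  }
  return state;
}
```

From `main`, one would create a `StageTimers`, call `runTimedLoop(timers, n)`, and then `timers.report(std::cout)`. The sketch uses `steady_clock` rather than `system_clock` so that wall-clock adjustments cannot distort the totals.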

We are looking for a real-life benchmark to use as a baseline for our performance optimizations (#3).
In Python there is a "Hotgym anomaly example" that stresses the encoder, SP, TM, and Anomaly. Implement a similar example in C++ and add it to the integration tests with timing.
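
Of the stages named here, the anomaly step is the easiest to state precisely. As commonly defined for HTM, the raw anomaly score is the fraction of currently active columns that were not predicted at the previous step. Below is a minimal sketch on plain index vectors, assuming sorted, duplicate-free inputs. `rawAnomalyScore` is a hypothetical free function, not the library's Anomaly class, and returning 0 for an empty active set is a convention assumed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Raw anomaly score in [0, 1]:
//   1 - |active ∩ predicted| / |active|
// Both inputs hold column indices, sorted ascending and without duplicates.
inline double rawAnomalyScore(const std::vector<unsigned> &active,
                              const std::vector<unsigned> &predicted) {
  assert(std::is_sorted(active.begin(), active.end()));
  assert(std::is_sorted(predicted.begin(), predicted.end()));

  // Assumed convention: nothing active means nothing was unexpected.
  if (active.empty()) return 0.0;

  std::vector<unsigned> overlap;
  std::set_intersection(active.begin(), active.end(),
                        predicted.begin(), predicted.end(),
                        std::back_inserter(overlap));

  return 1.0 - static_cast<double>(overlap.size()) /
                   static_cast<double>(active.size());
}
```

A fully predicted input scores 0, a completely unpredicted one scores 1, and partial overlap falls in between.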

  • suggested with NAPI

I have ported SPRegion and TMRegion and they run under windows but I was waiting until after PyBind was implemented to merge them in with a later PR.

Waiting for #54 SPRegion & TMRegion in C++

@breznak breznak referenced this issue Jan 24, 2018

Open

Optimization for performance #3

1 of 6 tasks complete

@breznak breznak self-assigned this Aug 26, 2018

@breznak breznak added the newbie label Aug 26, 2018

@breznak breznak removed their assignment Aug 26, 2018

@breznak breznak added this to the optimization milestone Aug 30, 2018

@breznak breznak self-assigned this Aug 30, 2018

@breznak breznak referenced this issue Nov 25, 2018

Merged

Eigen speed #42

2 of 2 tasks complete
@breznak

Member Author

commented Nov 25, 2018

@ctrl-z-9000-times @dkeeney Please hold off merging PRs before we implement this (should be soon, by Mon)

Are any of you good with NetworkAPI? I need a "hello world" example where we create a Network with all the basic parts: encoder, SP, TM, Anomaly, then run it through the hotgym dataset and measure the time.
It should be relatively simple; is there any example that constructs a network?

Then we can proceed with merging the SDR, Random, and Eigen PRs.
Thanks! 👍
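
Independent of which API the network ends up using, the input side of such a benchmark (optional file name and run count on the command line, one numeric column read from the CSV) might look like the sketch below. Everything here is an assumption for illustration: `BenchmarkArgs`, `parseArgs` and `loadCsvColumn` are hypothetical names, `hotgym.csv` is a placeholder default, and the column index and number of header lines depend on the actual file.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct BenchmarkArgs {
  std::string csvPath = "hotgym.csv"; // placeholder default, not a real path
  int runs = 1;
};

// Parses `prog [csvPath] [runs]`; both arguments are optional.
inline BenchmarkArgs parseArgs(int argc, const char *const argv[]) {
  assert(argv != nullptr);
  BenchmarkArgs args;
  if (argc > 1) args.csvPath = argv[1];
  if (argc > 2) args.runs = std::stoi(argv[2]); // throws on non-numeric input
  if (args.runs < 1) throw std::invalid_argument("runs must be >= 1");
  return args;
}

// Reads one numeric column (0-based) from comma-separated text, skipping the
// first `headerLines` lines and any blank lines. No quoting support.
inline std::vector<double> loadCsvColumn(std::istream &in, std::size_t column,
                                         std::size_t headerLines) {
  std::vector<double> values;
  std::string line;
  std::size_t lineNo = 0;
  while (std::getline(in, line)) {
    ++lineNo;
    if (!line.empty() && line.back() == '\r') line.pop_back(); // CRLF files
    if (lineNo <= headerLines || line.empty()) continue;

    std::istringstream fields(line);
    std::string field;
    std::size_t index = 0;
    bool found = false;
    while (std::getline(fields, field, ',')) {
      if (index++ == column) {
        values.push_back(std::stod(field));
        found = true;
        break;
      }
    }
    if (!found) {
      throw std::runtime_error("line " + std::to_string(lineNo) +
                               ": missing column");
    }
  }
  return values;
}
```

In `main`, the path from `parseArgs(argc, argv)` would be opened with `std::ifstream` and passed to `loadCsvColumn`; the benchmark loop would then repeat over the loaded values `runs` times.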

@dkeeney

commented Nov 25, 2018

CppRegionTests.cpp has some of the parts, but SP and TM cannot be used from C++ alone until we complete the SPRegion and TMRegion classes in C++. Currently, Python code fills that gap. I have ported SPRegion and TMRegion and they run under Windows, but I was waiting until PyBind was implemented to merge them in a later PR.

I think a hotgym example/tests would be a good addition to unit_tests.

@ctrl-z-9000-times

commented Nov 25, 2018

Sorry, I can't help with the network API. I don't use the network API code; it doesn't compile, so I comment it out of CMakeLists.txt (boost is playing hide and seek w/ cmake).

To benchmark the SpatialPooler and SDR-Classifier, MNIST is a good hello-world type dataset. It should take 10-20 minutes to run through the whole dataset, assuming you compiled for release mode. Debug mode takes approx 10 times longer. The repo "Numenta/nupic.vision" has an example of MNIST which works, but it will need to be updated to work with this fork. I wrote my own solution to the MNIST dataset but I think there are bugs in it still.

@dkeeney

commented Nov 25, 2018

Our whole objective is to get the network API to build (with boost until we can use C++17) so that it is usable. It is the framework in which all of the algorithms can be coordinated.

  • Did you install cmake and boost?
  • Have you tried the latest version of the nupic_cpp repository?

The Hotgym example is Python code. To port it to a C++ example/test that we can use as a performance test, we will need to port not only the Hotgym example itself but also complete the C++ code set by porting the Python SPRegion and TMRegion modules to C++, so that the SP and TM can be executed as C++ plugins rather than as Python plugins.

@breznak

Member Author

commented Nov 26, 2018

have ported SPRegion and TMRegion and they run under windows but I was waiting until after PyBind was implemented to merge them in with a later PR.

Thanks David, so I have the options:

  • implement regions in python (and test bindings, c++ core)
  • implement LATER with your c++ *Region classes
  • implement using raw classes (SP, TM), similar to what is done in HelloSPTMTest

To benchmark the SpatialPooler and SDR-Classifier, MNIST is a good hello-world type dataset. It should take 10-20 minutes to run through the whole dataset

That would be a good example, thanks! We can even add TM (it would be useless there, but we want to stress it). I will use MNIST in addition to hotgym.

(please resolve the boost build issue, but in a new thread)

need to port not only the hot Gym example but also complete the C++ code set by porting the Python code SPRegion and TMRegion modules to C++ so that the SP and TM can be executed as C++ plugins

Still, I can already write the example in Python using the C++ implementation, so it would exercise our C++ code, right?

@breznak breznak referenced this issue Nov 26, 2018

Merged

SDR class #113

@ctrl-z-9000-times

commented Nov 26, 2018

build (with boost until we can use C++17)

Thanks David! I updated to gcc-8 and c++17 and now it builds much cleaner.

@breznak

Member Author

commented Jan 30, 2019

@dkeeney @ctrl-z-9000-times should we keep both TP and TM in this example/benchmark, or go only with TM, as most code does?

@dkeeney

commented Jan 30, 2019

Let's keep both, because this tests the C++-to-Python and Python-to-C++ interfaces for both algorithms.

@breznak breznak added high example and removed newbie labels Feb 9, 2019

@breznak breznak referenced this issue Feb 9, 2019

Merged

Hotgym 4 #256

@breznak breznak changed the title Real-life benchmark: Hotgym example using NetworkAPI (NAPI) Real-life benchmark: Hotgym example using C++ algorithms May 8, 2019
