Optimization for performance #3

Open
breznak opened this issue Jan 19, 2018 · 3 comments

@breznak
Member

commented Jan 19, 2018

Steps:

  • set baseline benchmarking tests, the more, the better
    • micro benchmarks
    • IDE profiling
    • real-life benchmark (hotgym) #30
  • refactor code to use shared, encapsulated class for passing around data, "SDR type"
    • for now it could be typedef UInt*,
    • later wrap vector, add some methods,
    • even later wrap opt-Matrix type,...
  • identify bottlenecks
  • vectorize
    • almost all the optimization libraries work on vectors
    • replace use cases where setPermanence(newValue) is called in a loop with a vectorized version (a scalar can be treated as a vector with 1 item)
  • compare math library toolkits
    • each library has its own data type (Eigen Matrix, etc.)
    • converting to/from it would kill the (gained) performance -> hence the "SDR type"
  • iterative optimizations
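
A minimal sketch of the "vectorize" step above, not code from this repository. The `Real` alias and the `adjustPermanences` helper are hypothetical names chosen for illustration. The idea is that one call operates on a whole range of permanences instead of calling setPermanence(newValue) per synapse; the scalar case is simply a vector with one item.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical alias for illustration only.
using Real = float;

// Add deltas[i] to permanences[i] for every i and clamp the result to [0, 1].
// The loop body has no cross-iteration dependencies, so a compiler (or a math
// library working on the same buffers) is free to process it in SIMD batches.
inline void adjustPermanences(std::vector<Real>& permanences,
                              const std::vector<Real>& deltas) {
  if (permanences.size() != deltas.size()) {
    throw std::invalid_argument("permanences and deltas must have equal length");
  }
  for (std::size_t i = 0; i < permanences.size(); ++i) {
    const Real updated = permanences[i] + deltas[i];
    permanences[i] = std::min(Real(1), std::max(Real(0), updated));
  }
}
```

A single synapse update would pass two one-element vectors, so callers do not need a separate scalar code path.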

Requirements:

  • What do we want from the library?
  • speed
  • multi-platform
  • sparse (memory efficient)
  • big user-base, popular
  • low code "intrusiveness"
  • CPU backend (SSE, openMP)
  • nVidia GPU backend (CUDA)
  • AMD GPU backend (openCL)
  • open source
  • clean & lean API (ease of use)
  • bindings/support for other languages (python,...)
  • I don't need no optimizations

Considered toolkits:

Links:

@breznak breznak referenced this issue Jan 19, 2018

Open

Wishlist #12

0 of 5 tasks complete

@breznak breznak referenced this issue Jul 31, 2018

Merged

Eigen speed #42

2 of 2 tasks complete
@breznak

Member Author

commented Jul 31, 2018

Related #40 ASM removed

dkeeney added a commit that referenced this issue Oct 10, 2018

@ctrl-z-9000-times


commented Nov 9, 2018

Performance improvements for Spatial Pooler Topology / Local Inhibition. I mentioned this on the numenta.org forum, and here I'm hoping to flesh out the idea more and communicate it with you all.

The spatial pooler with global inhibition works great as is; local inhibition, however, does not scale well because of the algorithms used. Local and global inhibition differ only at a large scale; within a small (topological) area they do the same thing. "Poor man's topology" approximates local inhibition by creating one spatial pooler with global inhibition for each area of local inhibition. In other words: macro columns can use global inhibition and still have a large-scale topology if we simulate one macro column for each topological area.

Pros:

  • Speed: this should run as fast as the underlying spatial poolers with global inhibition
  • API: should be similar to the underlying spatial pooler's API

Cons:

  • Spatial resolution. The current local inhibition spreads the mini-columns across the input space, whereas this proposal clusters many mini-columns at a point and spreads the clusters across the input space. This can be mitigated by using many clusters of mini-columns, which gives an evenly spread blanket of mini-columns.

Implementation:
A new C++ class which will create and maintain a list of SpatialPooler instances, one for each macro column. Macro columns are arranged in a uniform grid over the input space. Each macro column's input is a rectangular slice of the input space.
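
A rough sketch of how the uniform grid of rectangular slices could be computed. This is an assumption about one possible layout, not the proposed implementation: the `Slice` struct and `macroColumnSlices` function are hypothetical names, the slices do not overlap, and potentialRadius and wrapAround are ignored.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using UInt = unsigned int;

// Half-open range [begin, end) along each input dimension (hypothetical type).
struct Slice {
  std::vector<UInt> begin;
  std::vector<UInt> end;
};

// Split the input space into a uniform grid of non-overlapping rectangular
// slices, one per macro column, listed in row-major order.
inline std::vector<Slice> macroColumnSlices(
    const std::vector<UInt>& inputDimensions,
    const std::vector<UInt>& macroColumnDimensions) {
  if (inputDimensions.size() != macroColumnDimensions.size()) {
    throw std::invalid_argument("dimension count mismatch");
  }
  const std::size_t nDims = inputDimensions.size();
  std::size_t total = 1;
  for (const UInt m : macroColumnDimensions) {
    if (m == 0) throw std::invalid_argument("empty macro column dimension");
    total *= m;
  }

  std::vector<Slice> slices;
  slices.reserve(total);
  std::vector<UInt> coord(nDims, 0);  // grid coordinate of the current macro column
  for (std::size_t n = 0; n < total; ++n) {
    Slice s;
    for (std::size_t d = 0; d < nDims; ++d) {
      s.begin.push_back(coord[d] * inputDimensions[d] / macroColumnDimensions[d]);
      s.end.push_back((coord[d] + 1) * inputDimensions[d] / macroColumnDimensions[d]);
    }
    slices.push_back(s);
    // Advance the grid coordinate, last dimension fastest.
    for (std::size_t d = nDims; d-- > 0;) {
      if (++coord[d] < macroColumnDimensions[d]) break;
      coord[d] = 0;
    }
  }
  return slices;
}
```

For a 28x28 input (the MNIST image size) and a 2x2 grid of macro columns, this yields four 14x14 slices; each slice would be fed to its own SpatialPooler with global inhibition.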

Example:
The MNIST dataset would be a good example. It's fast, easy to solve, widely recognized, and it's visual data, which is pretty.

API: Similar to SpatialPooler class ...

  • I thought that I'd replace references to "columns" with either "macroColumns" or "miniColumns" throughout this class.
  • initialize() - has the same parameters as the SP class except:
    • remove param columnDimensions
    • add param macroColumnDimensions of type vector<UInt>. This must have the same length as the inputDimensions.
    • add param miniColumnsPerMacro of type UInt
    • change type of potentialRadius from UInt to vector<Real>
    • change type of wrapAround from bool to vector<bool>
  • compute() - no change to public facing API. This method will deal with dividing up the inputs, running SP.compute(), and concatenating the results.
  • Add method getMacroColumns() -> vector<SpatialPooler*>. Use this method to access the underlying SP instances.
  • Replace method getColumnDimensions() with:
    • getMacroColumnDimensions() -> vector<UInt>
    • getMiniColumns() -> UInt

...
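
The API changes listed above could look roughly like the following declaration. This is only a sketch under assumptions: the class name `MacroColumnPooler` is hypothetical, `SpatialPooler` is forward-declared as a stand-in for the existing class, the remaining SP parameters and compute() are left out, and getMiniColumns() is assumed to return the per-macro-column count.

```cpp
#include <stdexcept>
#include <vector>

using UInt = unsigned int;
using Real = float;

class SpatialPooler;  // stand-in for the existing SP class

// Hypothetical class name; only the parameters that differ from SP are shown.
class MacroColumnPooler {
public:
  void initialize(const std::vector<UInt>& inputDimensions,
                  const std::vector<UInt>& macroColumnDimensions,
                  UInt miniColumnsPerMacro,
                  const std::vector<Real>& potentialRadius,
                  const std::vector<bool>& wrapAround) {
    // macroColumnDimensions must have the same length as inputDimensions.
    if (macroColumnDimensions.size() != inputDimensions.size()) {
      throw std::invalid_argument(
          "macroColumnDimensions must match inputDimensions in length");
    }
    inputDimensions_ = inputDimensions;
    macroColumnDimensions_ = macroColumnDimensions;
    miniColumnsPerMacro_ = miniColumnsPerMacro;
    potentialRadius_ = potentialRadius;
    wrapAround_ = wrapAround;
    // A real implementation would construct one SpatialPooler per macro
    // column here; this sketch leaves the list empty.
  }

  // Access to the underlying SP instances (not owned by the caller).
  const std::vector<SpatialPooler*>& getMacroColumns() const { return macroColumns_; }

  const std::vector<UInt>& getMacroColumnDimensions() const { return macroColumnDimensions_; }

  // Assumption: returns the number of mini-columns per macro column.
  UInt getMiniColumns() const { return miniColumnsPerMacro_; }

private:
  std::vector<UInt> inputDimensions_;
  std::vector<UInt> macroColumnDimensions_;
  UInt miniColumnsPerMacro_ = 0;
  std::vector<Real> potentialRadius_;
  std::vector<bool> wrapAround_;
  std::vector<SpatialPooler*> macroColumns_;
};
```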

@breznak breznak added this to the optimization milestone Nov 12, 2018

breznak pushed a commit that referenced this issue Dec 10, 2018

Merge pull request #3 from htm-community/master
update from htm-community
@breznak

Member Author

commented Dec 17, 2018

#153 is a huge performance win for SP (and all HTM, SP being the bottleneck)

Speeds as of now:

[ RUN      ] HelloSPTPTest.performance
starting test. DIM_INPUT=10000, DIM=2048, CELLS=10
EPOCHS = 5000
starting:  5000 iterations.Epoch = 4999
Anomaly = 0.488755
SP= 1571,1576,1581,1582,1583,1585,1586,1588,1589,1591
TP= 1571,1576,1581,1582,1583,1585,1586,1588,1589,1591
==============TIMERS============
Init:   0.071764
Random: 0.000457971
Encode: 0.0203253
SP:     0.680287
TP:     2.02909
AN:     1.33426
Total elapsed time = 4 seconds
[       OK ] HelloSPTPTest.performance (5430 ms)

@breznak breznak referenced this issue Feb 9, 2019

Open

Parallel execution, Multithreading #255

0 of 4 tasks complete