
Publish performance analysis and optimization options & methods #158

Open
breznak opened this issue Dec 12, 2018 · 4 comments

@breznak (Member) commented Dec 12, 2018

Relevant: #3 #50

  • discuss ways to measure performance:
    • Hotgym benchmark #30
    • micro benchmarks (a minimal timing sketch follows this list)
    • time per SP/TM/...
    • random vs "logical" data
    • profiling
  • factors of performance
    • a function of #columns × #cells × #inputField × local/global inhibition, ...
  • compare PY, bindings, numenta c++, our c++ versions
  • graphs of time/speed for each
  • discuss (un)implemented ways to optimize:
    • smaller data types (uint16, byte, float16 ?) -> smaller cache footprint
    • vectorized - CUDA #50 , Eigen
    • parallel regions (SP, TM Regions run in parallel) #253
    • parallel loops (C++17 parallelism TS) #214
    • parallel core algorithms (SP, TM, Connections) #254
    • removed ASM
    • removed SparseBinaryMatrix ? #169
    • implement Batch mode for SP
    • compiler optimizations (LTO, PGO, Ofast, ...)
    • profiling (what parts take the most?)
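
A minimal micro-benchmark sketch of the kind mentioned above, using only `std::chrono`; the `doWork()` placeholder is hypothetical and would be replaced by a real SP/TM compute call:

```cpp
#include <chrono>
#include <iostream>

// Hypothetical placeholder for one algorithm step (e.g. a single SP or TM
// compute call); swap in the real call when benchmarking.
static void doWork() {
    volatile double x = 0.0;
    for (int i = 0; i < 100000; ++i) x += i * 0.5;
}

int main() {
    using clock = std::chrono::steady_clock;
    const int iterations = 1000;
    double totalMs = 0.0;

    for (int i = 0; i < iterations; ++i) {
        const auto t0 = clock::now();
        doWork();
        const auto t1 = clock::now();
        totalMs += std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
    std::cout << "mean time per call: " << totalMs / iterations << " ms\n";
    return 0;
}
```

The same loop can be run once with random input and once with "logical" (structured) data to compare the two cases from the list above.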

@breznak added this to the optimization milestone Dec 12, 2018

@breznak (Member, Author) commented Dec 12, 2018

May be relevant for #93 if SPonConn is slower than current SP on SparseMatrix

@ctrl-z-9000-times commented Dec 13, 2018

In theory, permanence could be scaled from [0, 1] into an integer's valid range. Integer arithmetic is faster than floating-point math. Integers also come in 8-bit and 16-bit variants, which saves memory / cache space, although this saving can be hard to realize because of the way C++ lays out and pads its data structures.
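
A minimal sketch of the quantization idea, assuming a hypothetical `Permanence16` type (these names are illustrative, not the existing Connections API):

```cpp
#include <cstdint>
#include <cmath>
#include <vector>

// Hypothetical quantized-permanence helpers; Permanence16 and PERM_MAX are
// illustrative names, not part of the existing codebase.
using Permanence16 = std::uint16_t;
constexpr Permanence16 PERM_MAX = 65535;

inline Permanence16 quantize(float p) {        // p in [0, 1]
    return static_cast<Permanence16>(std::lround(p * PERM_MAX));
}

inline float dequantize(Permanence16 q) {      // back to [0, 1]
    return static_cast<float>(q) / PERM_MAX;
}

int main() {
    std::vector<Permanence16> perms{quantize(0.21f), quantize(0.50f)};
    // Permanence increments become integer adds with a saturation check.
    const Permanence16 step = 655;             // ~0.01 in quantized units
    for (auto& q : perms)
        q = (q > PERM_MAX - step) ? PERM_MAX : static_cast<Permanence16>(q + step);
    return 0;
}
```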

My favourite method for profiling is to install a signal handler which prints the stack trace, then randomly keyboard-interrupt the program several times and see where (on average) it spends its time. This method is simple and good for finding egregious errors.
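
A rough sketch of such a handler using glibc's `backtrace()` from `<execinfo.h>` (platform-specific; link with `-rdynamic` to get readable symbol names):

```cpp
#include <csignal>
#include <unistd.h>
#include <execinfo.h>   // glibc-specific; not available on all platforms

// Print the current stack trace whenever SIGINT (Ctrl-C) arrives, then keep
// running.  Interrupt the program a few times and see which frames recur.
static void printTrace(int) {
    void* frames[64];
    const int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);  // avoids malloc in the handler
}

int main() {
    std::signal(SIGINT, printTrace);
    // ... run the workload here (e.g. the hotgym benchmark loop) ...
    for (volatile unsigned long i = 0; ; ++i) { }    // placeholder busy loop
}
```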

@breznak (Member, Author) commented Dec 13, 2018

> My favourite method for profiling is to install a signal handler which prints the stack trace, then randomly keyboard-interrupt the program several times and see where (on average) it spends its time. This method is simple and good for finding egregious errors.

C++ profiling in NetBeans works rather nicely! That's why I needed the hotgym benchmark: to run a "normal" program and get the time spent in each method.

@ctrl-z-9000-times commented Jan 16, 2019

How to profile on Linux with valgrind:
http://valgrind.org/docs/manual/manual.html
See the section on cachegrind.
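
For example (the binary name `benchmark_hotgym` is just a placeholder here):

```
valgrind --tool=cachegrind ./benchmark_hotgym
cg_annotate cachegrind.out.<pid>    # per-function cache and instruction counts
```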
