This repository has been archived by the owner on Sep 1, 2023. It is now read-only.

Use optimized linear algebra math libraries #28

Open
1 of 13 tasks
subutai opened this issue Feb 20, 2014 · 15 comments

Comments

@subutai
Member

subutai commented Feb 20, 2014

This super issue plans the workflow for speed optimizations using a specialized linear algebra library.

Benefits:

  • SPEED!
  • less hand-optimized (i.e., hacked) code of our own; cleanup
  • bugs/improvements delegated to lib's upstream
  • better portability
  • use of parallel cores (OpenMP), special CPU instructions (SSE, ...), GPGPU backends

Requirements:

  • usability
    • suitable licence
    • platform support (Linux/Mac/Win; x86_64)
    • convenient installation/bundling with nupic
  • functionality
    • SSE instructions
    • GPGPU backend support (CUDA, OpenCL)
    • parallelism support (OpenMP)
    • sparse matrices
  • programming
    • clean & lean API
    • active development
    • (opt) bindings to other languages we use (Python)

Workflow:

  1. decide on library implementation to use
  2. create profiling/benchmark tools
  3. hello-world use case using the chosen library
  4. focus on the Temporal Pooler - the current bottleneck
  5. Optimize Connections for Temporal memory
  6. Optimize SparseMatrix classes (cleanup, mem reduction)
  7. Optimize other (less significant parts)
    • Optimize Spatial pooler
  8. Misc
@breznak
Member

breznak commented Aug 19, 2014

Is this still an issue, given the optimizations were not that big? Earlier I suggested the multiplatform Eigen library, but I'm not sure we should bother at this time.

@breznak breznak modified the milestones: Bug Reports, Optimization Sep 18, 2014
@breznak breznak changed the title from "Figure out how to add vecLib back in" to "Use optimized linear algebra math libraries" Sep 18, 2014
@rhyolight rhyolight modified the milestone: Optimization Oct 15, 2014
@breznak
Member

breznak commented Feb 26, 2015

relevant: #193 #151

@breznak
Member

breznak commented Feb 26, 2015

@subutai would you mind if I reword the issue a bit?
former description:

subutai commented on Feb 20, 2014
See issue #27. We'd like to possibly add it back in later so tracking it here. Some related web pages:

https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man7/vecLib.7.html

Before adding it back in we should verify this really gives a performance improvement in real cases. This is doubtful.

@breznak breznak added this to the 0.6.0 milestone Feb 26, 2015
@breznak breznak added the super label Feb 26, 2015
@rhyolight rhyolight modified the milestones: 0.6.0: features, 1.1.0: Future Development Mar 30, 2015
rcrowder added a commit that referenced this issue Dec 16, 2015
@rhyolight
Member

Please review this issue

This issue needs to be reviewed by the original author or another contributor for applicability to the current codebase. The issue might be obsolete or need updating to match current standards and practices. If the issue is out of date, please close. Otherwise please leave a comment to justify its continuing existence. It may be closed in the future if no further activity is noted.

@breznak
Member

breznak commented May 18, 2016

This is still valid, although no one is currently working on porting to linear algebra libraries. I think it should stay open to monitor optimization progress and results.
E.g. the PRs from @mrcslws speeding up the TM could be referenced here for the record.

@rhyolight
Member

Ok, so the issue is still valid, but it is also defined very broadly. It's labeled type:optimization so I'll track it that way, but I think the ticket description needs to be simplified: it's too long and complicated, with too many subjects and TODO items. We need to try to keep our issues simpler and smaller. This could turn into a super issue, but honestly I would rather break it up even further. Something to think about, @subutai.

@subutai
Member Author

subutai commented May 23, 2016

@rhyolight Agreed. The issue is indeed pretty big right now. I think a good first step is to replace the use of sparse matrices in the python spatial pooler, python KNN classifier, and/or optimize the existing C++ SpatialPooler (which is currently not well optimized).

@breznak
Member

breznak commented May 23, 2016

I think a good first step is to replace the use of sparse matrices in the python spatial pooler, python KNN classifier, and/or optimize the existing C++ SpatialPooler (which is currently not well optimized).

@subutai shouldn't the effort focus on the big-impact first? Aka the biggest bottlenecks, which is still TM/TP?

You all will have to please forgive me for my novice understanding of the code (I'm still learning it... slowly), but I wanted to understand what kinds of calculations are being made within nupic that could require a library like Eigen or Armadillo or MKL or OpenBLAS or whatever. Is there massive matrix multiplication going on? Vector multiplication? Even if someone could just point me to proper class/function/file so I could get a better handle on it, I think I could offer up some help with this.

@jshahbazi Sorry, I missed your comment; if you are still interested, we would certainly welcome the help! The logic and operations are in algorithms/Connections.hpp (for TemporalMemory) and in math/{Sparse,Dense}Matrix (for SpatialPooler).

The operations (someone please correct me): vector AND, searching for the N highest entries, indexing and updating weights, ... @scottpurdy @mrcslws @subutai ?

The code can be benchmarked (globally, for a typical use) using #890. Also please weigh in on #948

@subutai
Member Author

subutai commented May 23, 2016

shouldn't the effort focus on the big-impact first? Aka the biggest bottlenecks, which is still TM/TP?

The TM is actually not the biggest bottleneck right now. After changes by @mrcslws it is a pretty small part of the overall profile.

@breznak
Member

breznak commented May 24, 2016

The TM is actually not the biggest bottleneck right now. ...

@subutai not really, it still is (and its code complexity is higher than the SP's)

Please see numenta/nupic-legacy#3131 for my benchmarks:

  • fastest SP (c++ "2D" SP): 0.040 s/call
  • fastest TM/TP (cpp TP): 0.040 s/call
    • fastest TM: 0.158 s/call

The old SP problem I discovered with 1D vs. 2D inputs: #380
Problem with TM speed: #890 (comment)

@breznak
Member

breznak commented May 24, 2016

We need to try to keep our issues simpler and smaller. This could turn into a super issue, but honestly I would rather break it up even further

@rhyolight this IS a super issue with links to sub-issues where possible/active

@breznak
Member

breznak commented May 24, 2016

Added #967 as a proposal that would halve the computation time easily.

@subutai
Member Author

subutai commented May 24, 2016

not really, it still is (even the code complexity compared to SP is higher)

@breznak I will let @mrcslws comment on this. According to Marcus, when you run hotgym, the new TM is a small percentage of the overall profile. Marcus - am I misremembering?

I took a quick look at #3131 and sp_profile. I don't remember seeing this script before, but it looks like the SP parameters in sp_profile are quite off. Why is potentialRadius only 3? It should be much larger to form good SDRs. Same with numActiveColumnsPerInhArea, etc. I think the parameters should be set to realistic values and the profile re-run with those.

@mrcslws
Contributor

mrcslws commented May 24, 2016

I commented on numenta/nupic-legacy#3131 (comment).
