Encoders in c++ #258

breznak · 2019-02-09T20:35:18Z

ctrl-z-9000-times · 2019-02-12T14:51:37Z

I'd like to make some SDR-based tools for working with encoders. The fact that we have no existing C++ encoders means that we can make them in whatever way we'd like to.

MultiEncoder via SDR-Concatenator

The MultiEncoder creates a group of encoders and concatenates the results together. I'd like to make an SDR-Concatenator class to do this. The users would create the encoders and use this class to join the results into a single SDR to give to the algorithms. Example:

SDR A  <-  from constituent encoder
SDR B  <-  from constituent encoder
SDR_Concatenator C( A, B, axis=0 )
A.setDense( data )
B.setDense( data )
C.getDense() -> A & B concatenated

SDR-Intersection

This would be useful for working with multidimensional data. The user encodes each dimension separately and then takes the intersection of the resulting SDR's. The result is an SDR where each bit responds to an area of the input space.
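A minimal sketch of the intersection idea, assuming dense SDRs are held as plain byte vectors rather than the library's SDR class. The names `Dense` and `intersect` are hypothetical and only serve the illustration:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for a dense SDR: one byte per bit, non-zero means active.
// (Assumption: a plain std::vector instead of the library's SDR class.)
using Dense = std::vector<unsigned char>;

// Hypothetical helper: element-wise AND of two equally sized dense SDRs.
Dense intersect(const Dense &a, const Dense &b) {
  if (a.size() != b.size()) {
    throw std::invalid_argument("SDRs must have the same size");
  }
  Dense out(a.size(), 0);
  for (std::size_t i = 0; i < a.size(); ++i) {
    // A bit survives only if both per-dimension encodings activated it.
    out[i] = (a[i] != 0 && b[i] != 0) ? 1 : 0;
  }
  return out;
}
```

With one encoder per input dimension, bit i of the result is active only when every dimension's value falls inside that bit's receptive field, which is what makes each bit respond to an area of the input space.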

Category Encoder

Would be nice to have.

breznak · 2019-02-12T16:08:24Z

There was some interest in encoders at the forums, hope it'll make our repo more exciting and accessible.

The MultiEncoder creates a group of encoders ... make an SDR-Concatenator class to do this.

make it a method of SDR, e.g. SDR::append(vector<SDR> concatenate) ?
call it a MultiEncoder, and let it do what you've described
no need as it's rather easy to do with SDRs now, just show "best practices" as

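auto concat = sdr1.getDense(); // copy of sdr1's dense data
// insert(), not assign(): append sdr2's dense data after sdr1's
concat.insert(concat.end(), sdr2.getDense().begin(), sdr2.getDense().end());
// assumes the SDR constructor takes a list of dimensions
SDR concatenated({ static_cast<UInt>(concat.size()) });
concatenated.setDense(concat);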

dkeeney · 2019-02-12T17:39:40Z

We do have a rudimentary encoder: ScalarEncoder.cpp
But there is a lot more we could do there. Be sure that we also include a Region implementation that can handle the new encoders. ScalarSensor.cpp is the one for ScalarEncoder. Perhaps a general purpose region that can handle any type of encoder would be cool.

ctrl-z-9000-times · 2019-02-12T19:10:34Z

no need as it's rather easy to do with SDRs now, just show "best practices" as

This won't work for encoders which have dimensions. Imagine a large image with 3 color channels (RGB), and you want to encode each color separately and then combine them into a large SDR with topology. In this situation you need to splice together each pixel's encoded colors.
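The per-pixel splicing could look roughly like the sketch below. It assumes each channel's encoding is a flattened, row-major byte vector with the same number of pixels, and it joins them along the last axis. `Dense`, `concatenateLastAxis` and the `cells` parameter are hypothetical names, not part of the library:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in for a dense SDR: one byte per bit. (Assumption, not the SDR class.)
using Dense = std::vector<unsigned char>;

// Hypothetical helper. Every input is a flattened, row-major array holding
// `cells` entries (e.g. height * width pixels), each with
// inputs[i].size() / cells bits. The output keeps the cell layout and joins
// the per-cell bits of all inputs, in order.
Dense concatenateLastAxis(const std::vector<Dense> &inputs, std::size_t cells) {
  if (cells == 0) {
    throw std::invalid_argument("cells must be positive");
  }
  std::size_t total = 0;
  for (const Dense &in : inputs) {
    if (in.size() % cells != 0) {
      throw std::invalid_argument("input size is not a multiple of cells");
    }
    total += in.size();
  }
  Dense out;
  out.reserve(total);
  for (std::size_t c = 0; c < cells; ++c) {
    for (const Dense &in : inputs) {
      const std::size_t width = in.size() / cells;  // bits per cell in this input
      out.insert(out.end(), in.begin() + c * width, in.begin() + (c + 1) * width);
    }
  }
  return out;
}
```

Passing `cells = 1` reduces this to the flat end-to-end concatenation, so the flat case is a special case of the axis-aware one.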

ctrl-z-9000-times · 2019-02-16T03:08:47Z

I started a wiki page listing all of the encoders in both C++ & Python repositories, annotated. This wiki page also contains a tentative plan of action for providing a cohesive set of features.

https://github.com/htm-community/nupic.cpp/wiki/Encoder-Roundup

ctrl-z-9000-times · 2019-02-16T03:34:10Z

Can Python Encoders use SDRs?

I'd like for the python encoders to use SDRs, and this brings up an interesting topic: we agreed to merge the pure python code into this repo, see issue #216. We also agreed that the python should remain separate from the C++ code. Does it need to be absolutely 100% separate? Or can python make use of the C++ SDR & Connections classes? To answer this, I ask why users might prefer python:

Python is easy to setup & install. The C++ is getting a lot better at this, many thanks to David Keeney for his work on CMake and reducing external dependencies.
Python is easy to inspect & interrogate. The SDR & Connections have bindings which make this easy to do.
Python is easy to experiment with. Python can not subclass & override C++ bindings, but this limitation can be mitigated by allowing python to register callbacks for events which the SDR & Connections C++ classes already have.
Python is easy to use. The C++ SDR is easy to use as well, so I think that integrating the SDR into the python code will further this goal.

The downside of integrating SDR into the Encoders is that it adds a new API to the encoder algorithms, which then needs to be supported. This issue won't affect the NetworkAPI.

dkeeney · 2019-02-16T14:08:36Z

My vote would be to encode all encoders in 100% C++.

The SDR class is available.
The incoming raw data to the encoders can be passed to C++ easy enough.
Experimenters that are building apps in 100% C++ can take advantage of these encoders.
The encoders become language independent by calling the C++ routines via its bindings. Python, C#, or whatever.

ctrl-z-9000-times · 2019-02-16T16:30:06Z

My vote would be to encode all encoders in 100% C++.

For the most part I agree, but here are a few counter arguments:

All of the python encoders are already written & have unit tests.
ScalarEncoder - I think we should provide this in every language because it's the simplest example. It's like a "hello world" level of difficulty.
SDR-Category - Implementation must use python hash() & dict(), can be written in C++ w/ bindings?
delta.py & logarithm.py - Conveniences, not necessary for C++ but since python already has them why not use them.
date.py - Python's datetime library is too good to give up. datetime.datetime.today() -> (year, month, day, hour, minute, second, day-of-week, day-of-year, daylight-savings, time-zone, GMT-offset)

dkeeney · 2019-02-16T18:25:36Z

I don't need ALL of the encoders in C++ but one of my personal objectives is to eventually provide a set of bindings for C#. A C# app using our library is not going to have access to any Python modules. It would be nice to be able to just call into C++ for encoders. Otherwise I would have to duplicate the logic in C#.

ctrl-z-9000-times · 2019-02-22T15:56:52Z

RDSE Algorithm Memo

I hope to change the implementation of the Random Distributed Scalar Encoder (RDSE). Inside of this encoder: the RDSE transforms a real valued input into an integer valued index, and then it associates the index with a set of active bits.

Currently, the association between indices and active bits is randomly generated as needed, and then stored for the lifetime of the encoder. This allows the encoder to find & guarantee a good set of random activations which don't overlap with any existing mapping. It also allows the encoder to decode an SDR into the input value which likely created it.
Instead, the association between indices and active bits will be calculated from the hash of the index. This uses a smaller amount of memory because it does not need to explicitly keep the association for the lifetime of the encoder. It is also faster because it will not check that all encodings are distinct; instead it will rely on the random & distributed nature of SDRs to prevent conflicts between different encodings. This method does not allow for decoding SDRs into the inputs which likely created them.

Pros:

Faster Construction: the new method is O(1). The current method is at least O(n) where n is the number of distinct inputs.
Smaller memory footprint: O(n) -> O(1) where n is the number of distinct inputs to the encoder.

Cons:

No Decode method. Instead make an SDR Classifier. We could even implement the decode method using an SDR classifier, but we would want it to be optional since the SDR Classifier has significant overhead.
No strong guarantee that semantically unrelated inputs have a low overlap. This should only be an issue if the encoder is too small or its sparsity is too large. Mitigation: we can quantify these failure conditions and test them with unit-tests.
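The hash-based association described in this memo might be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: `mix64` is a splitmix64-style mixer used as a stand-in hash, and `encodeHashed` with its parameters is a hypothetical name.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-in for a dense SDR: one byte per bit. (Assumption, not the SDR class.)
using Dense = std::vector<unsigned char>;

// splitmix64-style mixer, used here as a stand-in integer hash.
// (Assumption: any well-mixed hash would serve the same purpose.)
std::uint64_t mix64(std::uint64_t x) {
  x += 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

// Hypothetical stateless encoder: no index-to-bits table is stored, so
// memory use does not grow with the number of distinct inputs.
Dense encodeHashed(double value, double resolution, std::size_t size,
                   std::size_t activeBits, std::uint64_t seed) {
  if (size == 0 || !(resolution > 0.0)) {
    throw std::invalid_argument("size and resolution must be positive");
  }
  Dense out(size, 0);
  // Step 1: real-valued input -> integer index.
  const auto index = static_cast<std::int64_t>(std::floor(value / resolution));
  // Step 2: index -> active bits, recomputed from a hash on every call.
  // Neighbouring indices share all but one hash input, so they overlap.
  for (std::size_t offset = 0; offset < activeBits; ++offset) {
    const std::uint64_t key = static_cast<std::uint64_t>(index) + offset;
    out[mix64(key ^ seed) % size] = 1;  // hash collisions may merge two bits
  }
  return out;
}
```

Because nothing checks for collisions, an encoding can end up with slightly fewer than `activeBits` active bits, and unrelated inputs can share bits by chance. That is the second "con" above, and it becomes more likely as `size` shrinks or the sparsity grows.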

dkeeney · 2019-02-22T17:45:42Z

* No Decode

I would think that you could still perform a Decode. You are not storing the previously used patterns but you can re-calculate the patterns used provided you have the starting seed. Just cycle through the used real values until you find one that results in a pattern that matches the one you are trying to decode. Slow but it would work. Or am I misunderstanding what you are proposing?

But then again... decoding is not biological. The only way we know a color is RED is that we match it with another pattern that someone in our experience has told us is RED. The sound of the spoken word "RED" and the word RED all match with the pattern of RED in our experience. Is that decoding?

One could argue that encoders are sort of biological depending on the data being encoded.

ctrl-z-9000-times · 2019-02-22T18:05:13Z

Just cycle through the used real values until you find one that results in a pattern that matches the one you are trying to decode.

That could be very time consuming. The range of values for an RDSE is infinite.

An alternative to decode method is to make an SDR classifier. We could even integrate the classifier into the encoder to provide a decode method? Would want it to be optional since SDR classifier has significant overhead.

ctrl-z-9000-times · 2019-04-01T23:43:51Z

Category Encoders

Category encoders should be implemented as Scalar encoders, which encode an Enumeration of the categories using a radius of less than 1.
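As a rough illustration of this idea, the sketch below enumerates the categories and encodes the enumeration index as a scalar whose representation occupies its own block of bits, so that distinct categories never overlap. `Dense` and `encodeCategory` are hypothetical names, and the block layout is an assumption standing in for a scalar encoder configured with a small enough radius:

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for a dense SDR: one byte per bit. (Assumption, not the SDR class.)
using Dense = std::vector<unsigned char>;

// Hypothetical helper: the position of `name` in `categories` is its
// enumeration value. Consecutive values start `activeBits` apart, so the
// representations of any two categories share no bits.
Dense encodeCategory(const std::vector<std::string> &categories,
                     const std::string &name, std::size_t activeBits) {
  const auto it = std::find(categories.begin(), categories.end(), name);
  if (it == categories.end()) {
    throw std::invalid_argument("unknown category: " + name);
  }
  const auto index = static_cast<std::size_t>(it - categories.begin());
  Dense out(categories.size() * activeBits, 0);
  // Activate the contiguous block of bits that belongs to this index.
  std::fill_n(out.begin() + index * activeBits, activeBits, 1);
  return out;
}
```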

I think we should not implement category encoders, but rather describe to the user how to make them. We would document this in the following places:

Python module nupic.bindings.encoders
C++ Header src/nupic/encoders/BaseEncoder.hpp

Both places already contain a general description of what an encoder is. I think we should add our notes about encoders to these locations.

Also, we should add a few unit tests to prove this works.

breznak · 2019-04-02T08:27:46Z

Category encoders should be implemented as Scalar encoders, which encode an Enumeration of the categories using a radius of less than 1.

yes, that is sufficient. Category encoder used to work as a demonstration example, and I guess de-coding was easier to implement, but we don't support that anymore.

I think we should add our notes about encoders to these locations.

this, or I can imagine an encoders/README.md with most of the text collected from this PR, issue, etc.

breznak · 2019-05-05T23:50:50Z

I think we should not implement category encoders, but rather describe to the user how to make them.

we will now have the best of both worlds: category encoding implemented "via" a flag to RDSE/Scalar. See #448

documentation in wiki

I really like the "blog" posts in this issue. Just a note about the wiki, I think it would be even better to make an encoders/README.md with its content.

The advantages would be: same markup, available both online (view from web) and offline (in git). Info in wikis/issues is a pain when migrating to a new service, while git is rock solid in that matter.

ctrl-z-9000-times · 2019-05-28T02:50:46Z

I'd like to close this issue, as well as PR #291. All of the tasks here have either been completed or have been moved to another open issue, except for:

in-depth explanations, and notes for practical usage

The encoders are documented, tested, and have a few examples, so I'd say this is done. Giving an in depth explanation is beyond the scope of this project. There is an HTM-School video about how encoders work, as well as a whitepaper. We could put a link to the HTM-School youtube channel in the README.

Great work all around on this issue!

ctrl-z-9000-times · 2019-06-12T15:54:01Z

Closing this issue, please reopen if there is more to discuss.

breznak added the encoder label Feb 9, 2019

ctrl-z-9000-times mentioned this issue Feb 25, 2019

Encoders: Random Distributed Scalar Encoder - WIP #278

Merged

14 tasks

ctrl-z-9000-times added this to the Release version 1.0 milestone Mar 1, 2019

breznak mentioned this issue Mar 1, 2019

c++ encoders dump #291

Closed

29 tasks

ctrl-z-9000-times mentioned this issue Mar 5, 2019

Coordinate Encoder - WIP #304

Draft

ctrl-z-9000-times mentioned this issue Mar 13, 2019

New class BaseEncoder #314

Merged

12 tasks

ctrl-z-9000-times mentioned this issue Mar 26, 2019

New script rf_view_ScalarEncoder.py #342

Merged

This was referenced May 1, 2019

Category Encoder #435

Merged

ScalarEncoder & RDSE display scripts allow Categories. #448

Merged

breznak mentioned this issue May 8, 2019

Add DateTime encoder from python #458

Closed

2 tasks

ctrl-z-9000-times mentioned this issue May 13, 2019

Move MurmurHash3 from utils/ to external/ #468

Closed

breznak mentioned this issue May 31, 2019

SimHash Distributed Encoders (Scalar, Document) for Old NuPIC. numenta/nupic-legacy#3872

Closed

ctrl-z-9000-times closed this as completed Jun 12, 2019

Outstanding Tasks for ScalarEncoder:

Outstanding Tasks for RDSE:

Outstanding Tasks for CategoryEncoder:
