Python column segment walk #3197

andrewmalta13 · 2016-07-06T16:24:02Z

Implements the column segment walk described here.

The implementations are very similar but not identical, because differences between connections.py and connections.cpp affect the implementation. (Perhaps we can discuss whether we eventually want these two to agree, to ensure complete alignment between the implementations.)

Some of the differences that exist due to this discrepancy are:

The C++ version deletes a segment when its last synapse is destroyed; the Python version does not.
When adding segments or synapses, if the number created exceeds the parameters that set the maximum number of synapses on a segment or the maximum number of segments on a cell:
i) The C++ version deletes the least recently used segments and the minimum-permanence synapses to make room for the new segment/synapse.
ii) The Python version ignores the parameters entirely (this does not seem right, but it was the behavior of the old implementation) and creates them anyway without removing anything.
Less critical, but the data structures used to represent segments and cells are different. The C++ version has a SegmentOverlap struct and uses a struct to represent a segment. The Python version just uses an int to index into some dicts to represent a segment. The C++ version uses a dumb Cell struct wrapper that just contains an int, whereas Python just represents a cell by an int. (This is effectively the same, but it is a bit annoying on the C++ side to always have to do something like int cellindex = cell.idx;)
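
For readers unfamiliar with the eviction behavior attributed to the C++ version above, here is a minimal Python sketch of the policy. It is not the code from either implementation: the function names, the tuple and dict layouts, and the "last_used" counter are all hypothetical and chosen only to illustrate "drop the weakest synapse / least recently used segment before adding a new one".

```python
def add_synapse_with_limit(synapses, new_synapse, max_synapses):
    """Add a synapse, evicting the lowest-permanence one if the segment is full.

    `synapses` is a list of (presynaptic_cell, permanence) tuples.
    All names here are hypothetical and only illustrate the policy.
    """
    if len(synapses) >= max_synapses:
        # Index of the synapse with the smallest permanence.
        weakest = min(range(len(synapses)), key=lambda i: synapses[i][1])
        del synapses[weakest]
    synapses.append(new_synapse)
    return synapses


def add_segment_with_limit(segments, new_segment, max_segments):
    """Add a segment, evicting the least recently used one if the cell is full.

    `segments` is a list of dicts carrying a hypothetical "last_used" counter.
    """
    if len(segments) >= max_segments:
        # Index of the segment that was used longest ago.
        oldest = min(range(len(segments)),
                     key=lambda i: segments[i]["last_used"])
        del segments[oldest]
    segments.append(new_segment)
    return segments


synapses = [(3, 0.21), (7, 0.05), (9, 0.40)]
add_synapse_with_limit(synapses, (12, 0.21), max_synapses=3)
print(synapses)

segments = [{"id": 0, "last_used": 5}, {"id": 1, "last_used": 2}]
add_segment_with_limit(segments, {"id": 2, "last_used": 9}, max_segments=2)
print([s["id"] for s in segments])
```

In this sketch, ignoring the limits (the Python behavior described in item ii) would correspond to skipping the two `if` blocks and always appending.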

Furthermore, this implementation does not seem to work with the old serialization methods, as it does not preserve the numbering of segments. If this is confirmed to be a result of the new algorithm, I will submit a GitHub issue to address it, but right now I am not 100% certain it isn't an implementation bug.

Please Review:
@mrcslws

numenta-ci · 2016-07-06T16:24:12Z

By analyzing the blame information on this pull request, we identified @chetan51, @david-ragazzi and @ywcui1990 to be potential reviewers

mrcslws · 2016-07-07T00:10:43Z

src/nupic/research/temporal_memory.py

-                           prevMatchingSegments)
+    """
+    prevActiveCells = sorted(self.activeCells)
+    prevWinnerCells = sorted(self.winnerCells)


Sorting these shouldn't be necessary. The cells / columns are walked in order, so these lists are always sorted.
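
A small sketch of why the extra sort is redundant, assuming (as an illustration only) that a cell's index is `column * cells_per_column + offset` and that columns are visited in ascending order. The function name and the "activate every cell in the column" simplification are hypothetical.

```python
def walk_columns(active_columns, cells_per_column):
    """Collect cells by visiting columns, and cells within them, in order."""
    active_cells = []
    for column in sorted(active_columns):
        first_cell = column * cells_per_column
        # Cells inside a column are appended in ascending order as well.
        for cell in range(first_cell, first_cell + cells_per_column):
            active_cells.append(cell)
    return active_cells


cells = walk_columns({4, 1, 2}, cells_per_column=3)
print(cells == sorted(cells))
```

Because every append adds an index larger than the previous one, the resulting list is already sorted and a later `sorted(...)` call does no useful work.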

andrewmalta13 · 2016-07-07T16:35:40Z

@mrcslws, this incorporates the changes you suggested, except for the more functional one. We can talk about it when you get in. I think I agree with the change, but we should probably discuss whether we still want to support extending TemporalMemory.

mrcslws · 2016-07-07T21:21:39Z

src/nupic/research/temporal_memory.py

-
+    self.activeCells = []
+    self.activeSegments = []
+    self.winnerCells = []


This should also reset the matchingSegments.

(I realize the code had this issue before. It might not have caused problems because the old code doesn't make full use of the matchingSegments. I think it only used them for punishment, not for computing the best matching segment, so it essentially computed the matching segments twice.)
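
A minimal sketch of the reset being asked for. The class below is a hypothetical stand-in that holds only the per-timestep state named in the diff and in this comment; it is not the real TemporalMemory class.

```python
class TemporalMemorySketch(object):
    """Hypothetical stand-in holding only the per-timestep state."""

    def __init__(self):
        self.activeCells = []
        self.winnerCells = []
        self.activeSegments = []
        self.matchingSegments = []

    def reset(self):
        """Clear all state carried over from the previous timestep."""
        self.activeCells = []
        self.activeSegments = []
        self.winnerCells = []
        # Clearing this too keeps stale matches from leaking into the
        # first timestep of the next sequence.
        self.matchingSegments = []


tm = TemporalMemorySketch()
tm.matchingSegments = [4, 7]
tm.reset()
print(tm.matchingSegments)
```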

mrcslws · 2016-07-09T02:24:14Z

I left a couple more documentation comments, but I'm otherwise done reviewing. It looks good.

The one open issue for me: I still don't love the static methods. They make the invocations really long, e.g. TemporalMemory.activatePredictedColumn(). Since these aren't intended to be part of the class's public interface, I think they would make sense as regular functions outside the class, similar to the C++. I figure we'll bring in @scottpurdy to discuss. :)
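
To make the two options concrete, here is a hedged sketch that contrasts a static method with a module-level function. The helper body, its argument layout, and the class name are invented for illustration; only the name activatePredictedColumn comes from the comment above.

```python
def _activate_predicted_column(active_segments_on_column):
    """Module-level helper: return the cells that own the active segments.

    Each segment is a (cell, segment_index) tuple in this sketch.
    """
    return sorted({cell for cell, _ in active_segments_on_column})


class TemporalMemorySketch(object):
    """Hypothetical class used only to show the static-method call style."""

    @staticmethod
    def activatePredictedColumn(active_segments_on_column):
        """Static-method form; callers spell out the class name."""
        return _activate_predicted_column(active_segments_on_column)


segments = [(12, 0), (12, 1), (14, 0)]
via_static = TemporalMemorySketch.activatePredictedColumn(segments)
via_function = _activate_predicted_column(segments)
print(via_static, via_function)
```

One trade-off relevant to the earlier question about extending TemporalMemory: a method defined on the class can be overridden by a subclass, while a module-level function cannot be swapped out that way.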

scottpurdy · 2016-07-09T04:24:47Z

src/nupic/research/connections.py

+                      activeSynapseThreshold, matchingPermananceThreshold,
+                      matchingSynapseThreshold):
+    """
+    Computes active and matching segments given the current active input.


You can start docstrings on the first line:
https://www.python.org/dev/peps/pep-0257/
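
A short sketch of the layout being suggested, with the summary on the same line as the opening quotes. The function name and its trivial body are placeholders, not the real computeActivity.

```python
def compute_activity_sketch(active_input):
    """Compute active and matching segments given the current active input.

    The summary sits on the same line as the opening quotes, followed by a
    blank line and then any longer description.
    """
    # Placeholder body: the real computation is out of scope for this sketch.
    return [], []


print(compute_activity_sketch.__doc__.splitlines()[0])
```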

andrewmalta13 · 2016-07-11T23:33:59Z

@scottpurdy addressed your initial comments.

scottpurdy · 2016-07-12T23:33:10Z

@mrcslws let me know if/when you're happy with this and I'll do final review

mrcslws · 2016-07-13T00:12:29Z

Looks good! 👍 @scottpurdy I'll be curious to see what you think about the static methods and how they're invoked. I can go either way.

scottpurdy · 2016-07-13T00:29:35Z

src/nupic/research/connections.py

+      if numMatchingSynapsesForSegment[i] >= matchingSynapseThreshold:
+        matchingSegments.append(i)
+
+    return (sorted(activeSegments, cmp=self.segmentCmp),


Using a lambda function is probably easier here, since it's such a simple comparison and is used only in these two places:

segmentCmp = lambda a, b: self._segments[a] - self._segments[b]
return (sorted(activeSegments, cmp=segmentCmp),
        sorted(matchingSegments, cmp=segmentCmp))
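
Note that the `cmp=` argument to `sorted` exists only in Python 2. For anyone reading this under Python 3, the following sketch shows two equivalent ways to express the same ordering. It assumes, for illustration, that the segment table maps a segment index to the index of its owning cell; the dictionary contents are made up.

```python
from functools import cmp_to_key

# Hypothetical mapping from segment index to the cell that owns it.
segment_to_cell = {0: 7, 1: 2, 2: 5, 3: 2}


def segment_cmp(a, b):
    """Old-style comparison: negative, zero, or positive."""
    return segment_to_cell[a] - segment_to_cell[b]


active_segments = [0, 1, 2, 3]

# Option 1: wrap the comparison function so it can be used as a key.
by_cmp = sorted(active_segments, key=cmp_to_key(segment_cmp))

# Option 2: sort directly on the owning cell, which is usually simpler.
by_key = sorted(active_segments, key=segment_to_cell.get)

print(by_cmp, by_key)
```

Both forms are stable, so segments on the same cell keep their original relative order.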

scottpurdy · 2016-07-13T00:38:57Z

Skimmed through and left a few comments.

scottpurdy · 2016-07-14T18:19:22Z

@andrewmalta13 this looks good to merge, except that we can't break the serialization. It seems from your description that the new serialization works but the old serialization breaks. @mrcslws, can you advise on this? (I recall we discussed how to handle it in the C++.)

mrcslws · 2016-07-14T19:02:30Z

Andrew and I looked into the serialization issue a little bit. Independently of this change, the Connections class's segment numbers get rearranged when serialized and deserialized. So I think the activeSegments and matchingSegments have always been screwy for one timestep after deserialization. This new implementation relies even more on them being correct, whereas the previous implementation stored predictive cells and matching cells rather than just using the segments.

Maybe we should figure out the fix for that issue, then decide what to do? I think it will give us some insight.
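
The renumbering described in this comment can be reproduced with a toy model. The sketch below is not the real Connections serialization: it assumes a flat segment table numbered by creation order and a made-up format that stores segments grouped by cell without their indices.

```python
def serialize(segment_to_cell):
    """Group segments by cell, dropping the original segment indices."""
    by_cell = {}
    for segment in sorted(segment_to_cell):
        by_cell.setdefault(segment_to_cell[segment], []).append(segment)
    # Only the per-cell segment count survives in this toy format.
    return {cell: len(segs) for cell, segs in by_cell.items()}


def deserialize(cell_to_count):
    """Rebuild a flat segment table, numbering segments in cell order."""
    segment_to_cell = {}
    next_index = 0
    for cell in sorted(cell_to_count):
        for _ in range(cell_to_count[cell]):
            segment_to_cell[next_index] = cell
            next_index += 1
    return segment_to_cell


# Segments were created in this order: cell 5, cell 1, cell 5.
before = {0: 5, 1: 1, 2: 5}
active_segments = [1]  # stored as a bare index; refers to cell 1's segment

after = deserialize(serialize(before))
print(after)
print(before[active_segments[0]], after[active_segments[0]])
```

After the round trip, index 1 refers to a segment on a different cell, so a stored list of active or matching segment indices is stale for the next timestep.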

andrewmalta13 · 2016-07-14T19:09:38Z

Yeah, Marcus and I were a little thrown that it was working before the changes. I thought at first that it was breaking because of something specific to the column segment walk algorithm, but it seems to be doing something equally weird in both implementations. I think it really comes down to the fact that we store the arrays of matching and active segments in the serialization, both of which are just arrays of integers corresponding to the segments, and then we renumber the segments during deserialization. This leaves us with incorrect matching and active segments for the next time step, which is certain to cause trouble with the new implementation.

It seems as though the action item would then be:

Ensure that the numbering of segments stays consistent throughout the serialization. (This would require a new serialization format for the Connections object.)

I will update the issue I filed for this:
#3199

scottpurdy · 2016-07-14T19:53:41Z

Can you guys propose a resolution to the serialization issue? Then we can decide how to proceed. If this is a problem already, then it may be fine to merge this without the fix, but we should address that ASAP and make sure there are tests to enforce proper behavior.

andrewmalta13 · 2016-07-18T21:18:22Z

@scottpurdy, I discussed this with @mrcslws, and we both agree that the best approach is not to change how we serialize, but rather to change how we keep track of segments in connections.py. Right now a segment is just an integer determined by the order in which it was created. This differs from the C++ implementation, which uses a struct that contains a similar id for a segment. If we make this change, serialization would no longer mess up the numbering of these segments. The questions I have are:

Is this an acceptable resolution in your eyes?
If so, should this be included in this PR or a separate one?
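
As a rough illustration of the proposal (not the change that was eventually made), the sketch below gives each segment a record that carries its own id, so the id travels with the data instead of being implied by creation order. The `Segment` record, its field names, and the toy format are all hypothetical.

```python
from collections import namedtuple

# Hypothetical segment record: a persistent id plus the owning cell.
Segment = namedtuple("Segment", ["idx", "cell"])


def serialize(segments):
    """Write segments grouped by cell, keeping each segment's own id."""
    by_cell = {}
    for seg in segments:
        by_cell.setdefault(seg.cell, []).append(seg.idx)
    return by_cell


def deserialize(by_cell):
    """Rebuild segment records; ids come from the data, not creation order."""
    return {idx: Segment(idx, cell)
            for cell in sorted(by_cell)
            for idx in by_cell[cell]}


segments = [Segment(0, 5), Segment(1, 1), Segment(2, 5)]
active_segment_ids = [1]

restored = deserialize(serialize(segments))
print(restored[active_segment_ids[0]])
```

With the id stored explicitly, a saved list of active or matching segment ids still points at the same segments after a round trip, even though the format groups segments by cell.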

mrcslws · 2016-07-18T21:48:51Z

I think I vote "let's fix that separately". It's not surprising that using a flat list of segments in the Python Connections, combined with a Cell -> Segments -> Synapses serialization format, leads to the segment indices getting scrambled on serialize + deserialize.

scottpurdy · 2016-07-18T21:56:11Z

Yes, that's a good solution.
Given that this isn't a new issue and existed before this PR, yes, you can do that as a follow-up PR. Does that mean this PR is ready to go?

andrewmalta13 · 2016-07-18T22:00:30Z

@scottpurdy unless there is anything else you want to be included, it is ready to merge.

scottpurdy · 2016-07-18T22:40:33Z

👍

andrewmalta13 added 18 commits June 27, 2016 16:31

initial implementation

3908315

skip serialization test

2e99d31

change to column generator

af063d3

passes unit tests, tutorial_temporal_memory, and extensive tests

3b8bda5

removes debugging prints and skips serialization test

c2fc03d

removes debugging comments

ca9d304

removes debug comments from connections

44b6906

cleans up sloppy computeActivity

384b870

Merge branch 'master' of https://github.com/numenta/nupic into python…

2efc991

…-columnSegmentWalk

addresses some implementation bugs

a9dff1b

incorporates binary search into the algorithm where applicable

f3faf02

misc changes to connections

d7f631a

removes some unneccesary validation checks

88d26e8

updates documentation and style

29fcbb4

removes pylint warnings I introduced

25d4e73

fixes pylint warnings in temporal memory unit tests

283180b

updates connections tests

87fa098

updated docstrings

7e9faf1

removes testing file

d02c6fb

mrcslws reviewed Jul 7, 2016

addresses some comments

2696941

andrewmalta13 mentioned this pull request Jul 7, 2016

Copies phases temporal memory implementation to nupic.research numenta/htmresearch#568

Closed

mrcslws reviewed Jul 7, 2016

scottpurdy reviewed Jul 9, 2016

addresses comments and updates style to pass pylint

9aa84d5

scottpurdy reviewed Jul 13, 2016

andrewmalta13 added 2 commits July 13, 2016 09:32

replaces segmentCMP with lambda and updates docstrings

b5cd93f

Merge branch 'master' of https://github.com/numenta/nupic into python…

6ebafaa

…-columnSegmentWalk

andrewmalta13 mentioned this pull request Jul 13, 2016

Copies phases temporal memory implementation to nupic.research numenta/htmresearch#569

Merged

fixes some pylint warnings

010cd7a

andrewmalta13 added 2 commits July 18, 2016 14:08

adds pseudocode to core functions

92cbaaa

Merge branch 'master' of https://github.com/numenta/nupic into python…

f3a1a54

…-columnSegmentWalk

scottpurdy merged commit 004ada6 into numenta:master Jul 18, 2016

cogmission mentioned this pull request Jul 25, 2016

From nupic.core [#3197] Replace Phases w/columnSegmentWalk numenta/htm.java#421

Closed

andrewmalta13 deleted the python-columnSegmentWalk branch August 15, 2016 18:17
