Python column segment walk #3197
Python column segment walk #3197
Conversation
…-columnSegmentWalk
By analyzing the blame information on this pull request, we identified @chetan51, @david-ragazzi and @ywcui1990 to be potential reviewers |
prevMatchingSegments) | ||
""" | ||
prevActiveCells = sorted(self.activeCells) | ||
prevWinnerCells = sorted(self.winnerCells) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorting these shouldn't be necessary. The cells / columns are walked in order, so these lists are always sorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@mrcslws , incorporates the changes you suggested except the more functional change you suggested. We can talk about it when you get in. I think I agree with the change, but we probably should discuss if we still want to be able to support extending TemporalMemory. |
|
||
self.activeCells = [] | ||
self.activeSegments = [] | ||
self.winnerCells = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also reset the matchingSegments
.
(I realize the code had this issue before. It might not have caused problems because the old code doesn't make full use of the matchingSegments, I think it only used them for punishment but not for computing the best matching segment -- it essentially computed the matching segments twice.)
I left a couple more documentation comments, but I'm otherwise done reviewing. It looks good. The one open issue for me: I still don't love the static methods. It makes the invocations really long, e.g. |
activeSynapseThreshold, matchingPermananceThreshold, | ||
matchingSynapseThreshold): | ||
""" | ||
Computes active and matching segments given the current active input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can start doc strings on the first line:
https://www.python.org/dev/peps/pep-0257/
@scottpurdy addressed your initial comments. |
@mrcslws let me know if/when you're happy with this and I'll do final review |
Looks good! 👍 @scottpurdy I'll be curious to see what you think about the static methods and how they're invoked. I can go either way. |
if numMatchingSynapsesForSegment[i] >= matchingSynapseThreshold: | ||
matchingSegments.append(i) | ||
|
||
return (sorted(activeSegments, cmp=self.segmentCmp), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using lambda functions is probably easier here since it's such a simple comparison and used only in these two places:
segmentCmp = lambda a, b: self._segments[a] - self._segments[b]
return (sorted(activeSegments, cmp=segmentCmp),
sorted(activeSegments, cmp=segmentCmp))
Skimmed through and put a few comments. |
@andrewmalta13 this looks good to merge except that we can't break the serialization. It seems from your description that the new serialization works but the old serialization breaks. @mrcslws can you advise on this (since I recall we discussed how to handle it in the C++). |
Andrew and I looked into the serialization issue a little bit. Independently of this change, the Connections class's segment numbers get rearranged when serialized and deserialized. This is true independent of this change. So I think the Maybe we should figure out the fix for that issue, then decide what to do? I think it will give us some insight. |
Yeah Marcus and I were a little thrown that is was working before the changes. I thought at first that it was breaking because of something specific to the column segment walk algorithm, but it seems that it is doing something equally weird in both implementations. I think it really comes down to the fact that we store the arrays of matching and active segments in the serialization which both are just arrays of integers corresponding to the segments and then we renumber segments in deserialization. This leads us with incorrect matching and active segments for the next time step which is certain to cause trouble with the new implementation. It seems as though the action item would then be:
I will update the issue I filed for this: |
Can you guys propose a resolution to the serialization issue? Then we can decide how to proceed. If this is a problem already then it may be fine to merge this without the fix but we should address that ASAP and make sure there are tests to enforce proper behavior. |
@scottpurdy discussed with @mrcslws and we both agree that the best way to approach the issue is not to change how we are serializing, but rather change how we keep track of segments in connections.py. Right now a segment is just an integer determined by the order it was created. This differs from the C++ implementation which uses a struct that contains a similar id for a segment. If we make this change, the serialization would not mess up the numbering of these segments as before. The questions I have are:
|
I think I vote "let's fix that separately". It's not surprising that using a flat list of segments in the Python |
|
@scottpurdy unless there is anything else you want to be included, it is ready to merge. |
👍 |
fixes #3116
Implements the column segment walk described here .
The implementations are very similar but not exact as there are differences in connections.py and connections.cpp which affect the implementation. (perhaps we can discuss if we eventually want these to agree to ensure complete alignment between the implementations).
Some of the differences that exist due to this discrepancy are:
i) Cpp version deletes the least recently used segments and the minimum permanence synapses to make room for a new segment/synapse.
ii) Python version ignores the parameters entirely (this does not seem right, but was in the old implementation) and creates them anyway without removing anything.
int cellindex = cell.idx;
Furthermore, this implementation does not seem to work with the old serialization methods as it does not preserve the numbering of segments. If this is confirmed to be a result of the new algorithm, I will submit an github issue to address this, but as of right now I am not 100% certain it isn't an implementation bug.
Please Review:
@mrcslws