This repository has been archived by the owner on Sep 1, 2023. It is now read-only.

Python column segment walk #3197

Merged
merged 32 commits into from Jul 18, 2016

Conversation

andrewmalta13
Contributor

fixes #3116

Implements the column segment walk described here.

The implementations are very similar but not identical, because differences between connections.py and connections.cpp affect the implementation. (Perhaps we can discuss whether we eventually want these to agree, to ensure complete alignment between the implementations.)

Some of the differences that exist due to this discrepancy are:

  1. The C++ version deletes a segment when its last synapse is destroyed; the Python version does not.
  2. When adding a segment or synapse would exceed the parameter limiting the number of segments on a cell or the number of synapses on a segment:
    i) The C++ version deletes the least recently used segments and the minimum-permanence synapses to make room for the new segment/synapse.
    ii) The Python version ignores the parameters entirely (this does not seem right, but it was in the old implementation) and creates them anyway without removing anything.
  3. Less critical, but the data structures used to represent segments and cells are different. The C++ version has a SegmentOverlap struct and uses a struct to represent a segment, while the Python version just uses an int that indexes into some dicts. The C++ version wraps a cell in a simple Cell struct that just contains an int, whereas the Python version represents a cell by a plain int. (Effectively the same, but a bit annoying on the C++ side to always have to do something like int cellindex = cell.idx;)
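
To make difference 2 concrete, here is a minimal sketch of the "make room" behavior described for the C++ version. It is not taken from connections.cpp or connections.py: the class, function, and parameter names are hypothetical, and it assumes each limit is at least 1.

```python
# Hypothetical toy model; none of these names come from the real Connections code.
class CellSegments(object):
    """Tracks the segments on a single cell and when each was last used."""

    def __init__(self, maxSegmentsPerCell):
        self.maxSegmentsPerCell = maxSegmentsPerCell  # assumed >= 1
        self.lastUsed = {}  # segment id -> iteration at which it was last used
        self.nextId = 0

    def createSegment(self, iteration):
        # Evict least recently used segments until there is room for one more.
        while len(self.lastUsed) >= self.maxSegmentsPerCell:
            leastRecentlyUsed = min(self.lastUsed, key=self.lastUsed.get)
            del self.lastUsed[leastRecentlyUsed]
        segment = self.nextId
        self.nextId += 1
        self.lastUsed[segment] = iteration
        return segment


def addSynapse(synapses, presynapticCell, permanence, maxSynapsesPerSegment):
    """Add a synapse to one segment, evicting the weakest synapses if full.

    synapses maps presynaptic cell -> permanence for a single segment.
    """
    while len(synapses) >= maxSynapsesPerSegment:
        weakest = min(synapses, key=synapses.get)
        del synapses[weakest]
    synapses[presynapticCell] = permanence
```

The behavior described for the Python version corresponds to dropping both `while` loops, so the containers simply grow past the limits.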

Furthermore, this implementation does not seem to work with the old serialization methods, as it does not preserve the numbering of segments. If this is confirmed to be a result of the new algorithm, I will submit a GitHub issue to address it, but as of right now I am not 100% certain it isn't an implementation bug.

Please Review:
@mrcslws

@numenta-ci
Contributor

By analyzing the blame information on this pull request, we identified @chetan51, @david-ragazzi and @ywcui1990 as potential reviewers.

prevMatchingSegments)
"""
prevActiveCells = sorted(self.activeCells)
prevWinnerCells = sorted(self.winnerCells)
Contributor

Sorting these shouldn't be necessary. The cells / columns are walked in order, so these lists are always sorted.

Contributor Author

👍

@andrewmalta13
Contributor Author

andrewmalta13 commented Jul 7, 2016

@mrcslws, this incorporates the changes you suggested, except the more functional one. We can talk about it when you get in. I think I agree with the change, but we should probably discuss whether we still want to be able to support extending TemporalMemory.


self.activeCells = []
self.activeSegments = []
self.winnerCells = []
Contributor

This should also reset the matchingSegments.

(I realize the code had this issue before. It might not have caused problems because the old code doesn't make full use of the matchingSegments. I think it only used them for punishment, not for computing the best matching segment, so it essentially computed the matching segments twice.)
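
A minimal sketch of the suggested reset, assuming the four per-timestep lists are the only state that needs clearing. The class name is a hypothetical stand-in, not the real TemporalMemory:

```python
class TemporalMemorySketch(object):
    """Hypothetical stand-in holding only the per-timestep state."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Clear everything carried over from the previous timestep,
        # including matchingSegments, which the snippet above leaves out.
        self.activeCells = []
        self.winnerCells = []
        self.activeSegments = []
        self.matchingSegments = []
```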

@mrcslws
Contributor

mrcslws commented Jul 9, 2016

I left a couple more documentation comments, but I'm otherwise done reviewing. It looks good.

The one open issue for me: I still don't love the static methods. They make the invocations really long, e.g. TemporalMemory.activatePredictedColumn(). Since these aren't intended to be part of the class's public interface, I think they would make sense as regular functions outside the class, similar to the C++. I figure we'll bring in @scottpurdy to discuss. :)
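
For comparison, a rough sketch of the two calling conventions. The helper body is a hypothetical placeholder (the real activatePredictedColumn does something else); only the way it is invoked is the point.

```python
# Hypothetical helper bodies; only the calling convention matters here.
class WithStaticMethod(object):

    @staticmethod
    def activatePredictedColumn(cellsForActiveSegments):
        return sorted(set(cellsForActiveSegments))

    def compute(self, cellsForActiveSegments):
        # The call has to name the class explicitly.
        return WithStaticMethod.activatePredictedColumn(cellsForActiveSegments)


def _activatePredictedColumn(cellsForActiveSegments):
    # Module-level function; the leading underscore marks it as non-public.
    return sorted(set(cellsForActiveSegments))


class WithModuleFunction(object):

    def compute(self, cellsForActiveSegments):
        return _activatePredictedColumn(cellsForActiveSegments)
```

One trade-off to weigh: a subclass can replace a static method only if the call goes through `self` or the subclass rather than a hard-coded class name, whereas a module-level function is not overridable by subclassing at all.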

activeSynapseThreshold, matchingPermananceThreshold,
matchingSynapseThreshold):
"""
Computes active and matching segments given the current active input.
Contributor

You can start docstrings on the first line:
https://www.python.org/dev/peps/pep-0257/
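
For illustration, the PEP 257 layout puts the summary on the same line as the opening quotes. The function below is a hypothetical example, not code from this PR:

```python
def countMatchingSynapses(activeInput, presynapticCells):
    """Count how many of a segment's presynaptic cells are currently active.

    Both arguments are iterables of cell indices.
    """
    return len(set(activeInput) & set(presynapticCells))
```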

@andrewmalta13
Contributor Author

@scottpurdy addressed your initial comments.

@scottpurdy
Contributor

@mrcslws let me know if/when you're happy with this and I'll do the final review.

@mrcslws
Contributor

mrcslws commented Jul 13, 2016

Looks good! 👍 @scottpurdy I'll be curious to see what you think about the static methods and how they're invoked. I can go either way.

if numMatchingSynapsesForSegment[i] >= matchingSynapseThreshold:
matchingSegments.append(i)

return (sorted(activeSegments, cmp=self.segmentCmp),
Contributor

Using a lambda is probably easier here, since it's such a simple comparison and is used only in these two places:

segmentCmp = lambda a, b: self._segments[a] - self._segments[b]
return (sorted(activeSegments, cmp=segmentCmp),
        sorted(matchingSegments, cmp=segmentCmp))
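
As a side note, the `cmp=` argument to `sorted` exists only in Python 2. A sketch of the same ordering using `key=`, which works in both Python 2.7 and 3, is shown below. The `segmentToCell` dict is a hypothetical stand-in for whatever `self._segments` maps a segment to, assumed here to be an integer:

```python
import functools

# Hypothetical stand-in for self._segments: segment index -> integer sort value.
segmentToCell = {0: 7, 1: 2, 2: 5, 3: 2}
activeSegments = [0, 1, 2, 3]


def segmentCmp(a, b):
    # Same comparison as the lambda above, written against the toy dict.
    return segmentToCell[a] - segmentToCell[b]


# Option 1: keep the comparison function and adapt it with cmp_to_key.
byCmp = sorted(activeSegments, key=functools.cmp_to_key(segmentCmp))

# Option 2: sort directly on the looked-up value.
byKey = sorted(activeSegments, key=lambda segment: segmentToCell[segment])
```

Both calls give the same order because `sorted` is stable, so segments with equal values keep their original relative order.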

@scottpurdy
Contributor

Skimmed through and left a few comments.

@scottpurdy
Contributor

@andrewmalta13 this looks good to merge, except that we can't break the serialization. It seems from your description that the new serialization works but the old serialization breaks. @mrcslws can you advise on this (since I recall we discussed how to handle it in the C++)?

@mrcslws
Contributor

mrcslws commented Jul 14, 2016

Andrew and I looked into the serialization issue a little bit. Independently of this change, the Connections class's segment numbers get rearranged when serialized and deserialized. So I think the activeSegments and matchingSegments have always been screwy for one timestep after deserialization, and this new implementation relies even more on them being correct, whereas the previous implementation stored predictive cells and matching cells rather than just using the segments.

Maybe we should figure out the fix for that issue, then decide what to do? I think it will give us some insight.

@andrewmalta13
Contributor Author

andrewmalta13 commented Jul 14, 2016

Yeah, Marcus and I were a little thrown that it was working before the changes. I thought at first that it was breaking because of something specific to the column segment walk algorithm, but it seems that it is doing something equally weird in both implementations. I think it really comes down to the fact that the serialization stores the arrays of matching and active segments, which are just arrays of integers corresponding to the segments, and then we renumber segments in deserialization. This leaves us with incorrect matching and active segments for the next time step, which is certain to cause trouble with the new implementation.

It seems as though the action item would then be:

  • Ensure that the numbering of segments stays consistent throughout the serialization. (This would require a new serialization format for the connections object.)

I will update the issue I filed for this:
#3199
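
The renumbering problem described above can be reproduced with a toy model. This is a sketch under stated assumptions, not the real Connections serialization: segments are plain integers numbered in creation order, and the serialized form groups them by cell. All names are hypothetical.

```python
# Hypothetical toy model; not the real Connections class or its file format.
def serialize(segmentToCell, numCells):
    """Group segments by owning cell (a Cell -> Segments layout)."""
    return [[segment for segment in sorted(segmentToCell)
             if segmentToCell[segment] == cell]
            for cell in range(numCells)]


def deserialize(cells):
    """Rebuild the flat numbering: segments are numbered in read order."""
    segmentToCell = {}
    for cell, segments in enumerate(cells):
        for _ in segments:
            segmentToCell[len(segmentToCell)] = cell
    return segmentToCell


# Segments 0..3 were created on cells 2, 0, 1, 0, in that order.
original = {0: 2, 1: 0, 2: 1, 3: 0}
activeSegments = [0]  # stored as bare integers alongside the connections

restored = deserialize(serialize(original, numCells=3))

cellBefore = original[activeSegments[0]]  # segment 0 belonged to cell 2
cellAfter = restored[activeSegments[0]]   # "segment 0" now belongs to cell 0
```

The saved `activeSegments` list is unchanged, but the integer in it now names a different segment, which matches the one-timestep inconsistency described in this thread.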

@scottpurdy
Contributor

Can you guys propose a resolution to the serialization issue? Then we can decide how to proceed. If this is already a problem, then it may be fine to merge this without the fix, but we should address that ASAP and make sure there are tests to enforce proper behavior.

@andrewmalta13
Contributor Author

@scottpurdy I discussed this with @mrcslws, and we both agree that the best approach is not to change how we serialize, but rather to change how we keep track of segments in connections.py. Right now a segment is just an integer determined by the order in which it was created. This differs from the C++ implementation, which uses a struct that contains a similar id for a segment. If we make this change, the serialization would no longer mess up the numbering of these segments. The questions I have are:

  1. Is this an acceptable resolution in your eyes?
  2. If so, should this be included in this PR or a separate one?
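
One way to read this proposal, sketched with hypothetical names (the record and its fields are illustrative and not taken from connections.cpp): if each segment carries its own identifier, a cell-grouped serialization can write that identifier out and restore it, so saved lists of segment ids stay valid.

```python
import collections

# Hypothetical segment record; the field names are illustrative assumptions.
Segment = collections.namedtuple("Segment", ["idx", "cell"])


def serialize(segments, numCells):
    """Group segment ids by owning cell, keeping each segment's own id."""
    return [[segment.idx for segment in segments if segment.cell == cell]
            for cell in range(numCells)]


def deserialize(cells):
    """Rebuild the segments from the stored ids instead of renumbering."""
    return sorted(Segment(idx, cell)
                  for cell, idxs in enumerate(cells)
                  for idx in idxs)


segments = [Segment(0, 2), Segment(1, 0), Segment(2, 1), Segment(3, 0)]
roundTripped = deserialize(serialize(segments, numCells=3))
```

Here `roundTripped` equals `segments`, so an id saved before serialization still refers to the same segment afterwards.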

@mrcslws
Contributor

mrcslws commented Jul 18, 2016

I think I vote "let's fix that separately". It's not surprising that using a flat list of segments in the Python Connections, combined with a Cell -> Segments -> Synapses serialization format, leads to the segment indices getting scrambled on serialize + deserialize.

@scottpurdy
Contributor

  1. Yes, that's a good solution.
  2. Given that this isn't a new issue and existed before this PR, yes, you can do that as a follow-up PR. Does that mean this PR is ready to go?

@andrewmalta13
Contributor Author

@scottpurdy unless there is anything else you want to be included, it is ready to merge.

@scottpurdy
Contributor

👍

@scottpurdy scottpurdy merged commit 004ada6 into numenta:master Jul 18, 2016
@andrewmalta13 andrewmalta13 deleted the python-columnSegmentWalk branch August 15, 2016 18:17
Development

Successfully merging this pull request may close these issues.

From nupic.core [#917] Replace Phases w/columnSegmentWalk
6 participants