select cells consistently, esp. when beginning sequences #34

Closed

floybix opened this issue Oct 25, 2015 · 54 comments

Comments

@floybix
Member

floybix commented Oct 25, 2015

When beginning a sequence (or after a sequence reset/break), there is no distal input, so no basis for choosing a winner/learning cell in each column. Cells are then chosen at random.

That random selection is a problem because when the same sequence is presented several times (in isolation) they will begin on different cells; and will consequently not reinforce previous learning, but will have partial learning spread across several cells. This can be seen in repeated sequence demos, where the whole sequence is learned but it keeps bursting.

Proposal - I think it would be better to start on the same cell consistently. The first cell.

Perhaps more generally the choice of winner/learning cell (when there are no predictive cells in a column) should not be completely random but should be a deterministic function of the set of previously-active cells. And it should be a robust function, so that similar activity consistently selects the same cells.

Proposal - Select cell number as (mod depth) of each distal input bit, and take the mode of that. Offset by the current column number (mod depth again), otherwise all cells would be synchronised and we lose combinatorial capacity (see #31).
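A minimal sketch of that selection rule, as I read it (nothing from Comportex; names and indexing are illustrative):

from collections import Counter

def consistent_winner_cell(distal_input_bits, column_id, depth):
    # No distal input at all (e.g. right after a reset): fall back to the first cell.
    if not distal_input_bits:
        return 0
    # Take each distal input bit mod depth, find the mode, then offset by the
    # column number (mod depth again) so columns are not all synchronised.
    mode_cell = Counter(bit % depth for bit in distal_input_bits).most_common(1)[0][0]
    return (mode_cell + column_id) % depth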

Needs testing.

@robjfr

robjfr commented Oct 25, 2015

When beginning a sequence (or after a sequence reset/break), there is no distal input, so no basis for choosing a winner/learning cell in each column. Cells are then chosen at random.

That random selection is a problem because when the same sequence is presented several times (in isolation) they will begin on different cells; and will consequently not reinforce previous learning, but will have partial learning spread across several cells. This can be seen in repeated sequence demos, where the whole sequence is learned but it keeps bursting at progressively later steps through the cycle.

Proposal - I think it would be better to start on the same cell consistently. The first cell.

You may be right. I think this is what I was referring to as the need for "consolidation" across contexts: cells would need to be consolidated to the columns they predict before we judge whether they indicate similar contexts.

Setting to a consistent cell would solve this.

But it did occur to me that there might be an advantage in having different cells for the same context at different locations in the global sequence. In a way coding with a different cell each time gives us more information. It complicates things a little, but we don't lose anything. And, perhaps importantly, it gives us a means to code the strength of a transition when cells are merged. We could judge the strength of a transition by counting how many different cells linked the respective columns.
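A rough sketch of that counting idea (data structures are illustrative only, not anything from Comportex):

from collections import defaultdict

def transition_strengths(distal_synapses):
    # distal_synapses: iterable of ((src_column, src_cell), (tgt_column, tgt_cell)) pairs.
    linking_cells = defaultdict(set)
    for (src_col, src_cell), (tgt_col, _tgt_cell) in distal_synapses:
        linking_cells[(src_col, tgt_col)].add(src_cell)
    # Strength of a column-to-column transition = how many distinct cells link them.
    return {cols: len(cells) for cols, cells in linking_cells.items()}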

@cogmission
Member

Felix!

I'm going to quote your thoughts and share them with NuPIC Community - this is true and a very insightful realization. Do you mind? This is the kind of stuff that I was talking about previously that shouldn't get lost and will inevitably get lost if it's not shared...

Cheers,
David


@cogmission
Member

I'll wait for your response before sharing, just in case you have another plan in mind. But I am scared this kind of stuff will be lost or worse, others in the community will waste time duplicating your insights?

Cheers,
David


@cogmission
Member

Here's an idea. Maybe some insight can be gained from the fact that a "reset" is an unnatural occurrence? The neocortex doesn't have "meta-knowledge" of the sequences it's seeing, so there must be some distinguishing function at play to indicate a new sequence as opposed to one that is merely being continued? And maybe understanding that will yield some insight on the process of training cells for re-occurring sequences as opposed to new ones not previously seen?

Just a simple observation...


@floybix
Member Author

floybix commented Oct 25, 2015

@cogmission I don't have a problem with you sharing this, although at this stage my statement of the problem and my proposals aren't really backed up with evidence.

Not sure I agree with you that resets are unnatural. I think that's what happens when you shift attention.

@floybix
Member Author

floybix commented Oct 25, 2015

@cogmission ah yes I see what you mean. Even within a period of unbroken attention, one can distinguish parts, even out of context. That seems to be a higher-level property like temporal pooling. Whether appropriate cell selection is best thought of as a prerequisite, or a consequence, of temporal pooling: I'm not sure.

@cogmission
Member

Ok. But I don't think you have to wait to present your "suspicions" or ideas to the community. There is obviously something to think about here. Even if you think this through as thoroughly as you want, there may still be something someone else can contribute, that maybe your understanding prevents you from seeing? I agree that it may not even be a disadvantage or represent a "problem" necessarily -or- maybe some of the conclusions you initially came to regarding the impact of choosing different cells for learning previous sequences may not be the "whole picture", but I guess I think it's good to share it anyway to get some others thinking about it?

I'm not trying to tell you what to do or anything - I just think the work you're doing is very important (you too Rob), and hesitation with sharing it due to concerns of correctness may not be to anyone's advantage?

Anyway, sorry for the distraction...

Cheers,
David


@mrcslws
Collaborator

mrcslws commented Oct 25, 2015

Idea: do both. e.g. maybe select two cells.

One of them represents the possibility that this is the beginning of a new sequence. The other represents the possibility that it's a continuation of the current sequence.

It's similar to the Escher-like perspective switches that Rob talks about. Let both be true for a while, until one of them fizzles out.

A useful example ("use full exam pull") is disambiguation of sequences of syllables. At all points we need to try both possibilities: (1) the syllable is the start of a new sequence, or (2) the syllable is a continuation of the previous sequence.

@fergalbyrne

We just asked Subutai on the Challenge Hangout. In NuPIC all cells in the initial active columns A become active, so prediction is passed to all second time step cells. Whichever feedforward input comes in next will use all A cells to learn from.

@cogmission
Member

Fergal, I think it will select the best matching cell of those to learn from.

Here's the pertinent line in temporal_memory.burstColumns():

https://github.com/numenta/nupic/blob/f06bed0931aa3879fcd18da91e2526df8d314476/src/nupic/research/temporal_memory.py#L288


@mrcslws
Collaborator

mrcslws commented Oct 26, 2015

David, I think you're referring to the "learning" cells. Fergal, I think you're referring to the "learnable" cells. Both NuPIC and Comportex select the best match for the "learning" cell. But Comportex goes on to use that as the "learnable" cell in the next time step, whereas NuPIC treats all previously active cells as "learnable".

(I'm still forming my thoughts on this, just wanted to untangle that.)

@mrcslws
Collaborator

mrcslws commented Oct 26, 2015

In other words, what Fergal said makes sense.

@cogmission
Member

Please explain the difference between "learn-(ing)" and "learn-(able)" as used in Comportex, because I don't think NuPIC has an analogue for that?

The learn(ing) cell in NuPIC is the cell chosen either:

  1. from cells called "best matching", which have a "threshold" or greater number of connecting segments. In Felix's inquiry this would be the cell which portends the selection of the sequence being repeated (although we have reset).
  2. if no cell is above the aforementioned threshold, a random cell is then selected to learn on. But only then is a random cell selected.

The above mechanism avoids the situation that Felix is inquiring into.

Now, what a "learn(able)" cell is, I have no idea, but how is that used in Comportex?

Cheers,
David


@cogmission
Member

Sorry, please substitute the word "Synapse" for "Segment" above where it says, "...greater number of connecting segments"


@fergalbyrne

Ok, that's a good distinction. In step 2, NuPIC treats all step 1 active bursting cells as potential precursors (learnable), and so builds the best predictive connection back to step 1 using all of them. This is the correct approach, since at time 1 we have no context - that's what reset() means. This will build a distribution of predictability from reset()-step 1-step 2 which depends on what has been seen.

So eg, a letter-reading HTM reading Wikipedia and seeing reset()-Q will predict U 95%, T 2% (the graphics toolkit), I 1% (Stephen Fry's show), and so on. By these percentages I mean that 95% of the predictive cells will be in U columns, 2% in T columns, etc. Upon getting U next, you burst all the predictive U cells (which beat their column cells) and carry on. The average next predicted letter is another distribution - I, O, A, E at the top, almost no U prediction, almost no consonant prediction.

I'm using reset()-Q as a pathological example to force home this point. Arguably, you could use reset()-T and get H, O, I, A, V... as your distribution. But you need to burst as NuPIC does to get this working - the next step acts as if all cells in the first column were active after the reset.

Does this make sense?

@cogmission
Member

Hi Fergal,

From what I reason, your first paragraph rings true for me, and is aligned with my previous explanation. I'm not sure about the second paragraph where you are talking about percentages of predicted capital letters (I was looking for the setup for using these letters in some example; are they on the github issue and not among the emails being mailed out)?

You guys might also want more detail, because the process I spelled out is a simplification of the actual process: there is some specificity in what cells get selected as the set from which learning cells become candidates.

If you are not interested in the hard-core details and only want to understand this at a higher conceptual level, then don't read below this line: :-P

If I were going to spell this out specifically, I would say that there is a set of cells which are from the feed-forward activated columns but weren't among the predicted cells in "t - 1". We'll call these the unpredictedActiveCells.

Then there is a set of cells which are the "activeCells (A1)" in "t - 1".

For each cell (C1) in unpredictedActiveCells, and for each of its segments (S1), a counter is incremented for every synapse to a pre-synaptic cell (a cell which connects to C1) that is also in the set of activeCells (A1) and whose permanence is above 0. If S1 has the most active synapses it is chosen as the best matching segment, and that unpredictedActiveCell becomes the "learning cell".

Best matching segments then have their synapses adjusted up or down as part of the further learning, if those synapses are part of the active learning synapse set.

That's the best pseudo-code explanation I can muster, in case that's helpful to Felix, Rob, Marcus, Fergal or anyone else who would want a summary of implementation details in NuPIC...

Cheers,
David


@fergalbyrne

Hi David,
We're only talking about what happens after a reset(). There are no predictions following a reset(), that's what it's for.

The example I'm using is for letter-by-letter sequence learning. The letter Q is almost always followed by U, but can be followed by a small number of other letters due to acronyms etc.

@cogmission
Member

Right. Then the "activeCells" (predictive in "t - 1") are empty and that part of Burst does nothing...


@floybix
Member Author

floybix commented Oct 27, 2015

Morning...

In NuPIC all cells in the initial active columns A become active, so prediction is passed to all second time step cells. Whichever feedforward input comes in next will use all A cells to learn from.

After reading the NuPIC code, I think that (second sentence) is not entirely correct.

Both NuPIC and Comportex select the best match for the "learning" cell. But Comportex goes on to use that as the "learnable" cell in the next time step, whereas NuPIC treats all previously active cells as "learnable".

And this was a rephrasing of the same thing, so also not correct. (I contend that NuPIC and Comportex have the same basic treatment of learnable cells).

There are two parts to learning: growing new synapses, and reinforcing existing synapses. Reinforcement does apply to synapses from all bursting cells, assuming they already exist on an active -- sufficiently connected -- segment. However, growing new synapses in the first place only considers the winner cells; that is the part I was talking about.

NuPIC details /src/nupic/research/temporal_memory.py

  • From each bursting column a single winner cell and a single learning segment are chosen (by best match or otherwise randomly). code
  • On the learning segments, new synapses are grown only to the previous winner cells code
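In rough pseudo-Python, the reinforce-vs-grow distinction is something like this (illustrative names and numbers, not the actual temporal_memory.py code):

import random

class Synapse(object):
    def __init__(self, source, permanence):
        self.source = source          # presynaptic (column, cell)
        self.permanence = permanence

def learn_on_segment(synapses, prev_active_cells, prev_winner_cells,
                     perm_inc=0.1, perm_dec=0.1, init_perm=0.21, n_grow=3):
    # Reinforcement touches every existing synapse on the segment and only asks
    # whether its source cell was *active* at t-1 (bursting cells count here).
    for syn in synapses:
        if syn.source in prev_active_cells:
            syn.permanence = min(1.0, syn.permanence + perm_inc)
        else:
            syn.permanence = max(0.0, syn.permanence - perm_dec)
    # Growth samples brand-new sources only from the t-1 *winner* cells.
    existing = {syn.source for syn in synapses}
    candidates = [c for c in prev_winner_cells if c not in existing]
    for source in random.sample(candidates, min(n_grow, len(candidates))):
        synapses.append(Synapse(source, init_perm))
    return synapses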

@cogmission
Member

Hi Felix,

I guess the question is whether you are introducing a scenario that hasn't been accounted for? And if it is unhandled in one or both codebases? Subutai seems to think (if I may paraphrase his point, and I agree) that a burst following a reset will not result in a random cell being selected to learn a sequence on unless there are no other cells which pass the learnable candidate requirements. This is because the NuPIC TemporalMemory always looks for a "bestMatchingCell" first, before falling back to the random selection of a "winner" cell.

Cheers,
David


@floybix
Member Author

floybix commented Oct 27, 2015

@cogmission just the scenario I laid out in the original issue description. There can be no "bestMatchingCell" directly after a reset because there is no distal excitation. So the winner cells chosen in each column after a reset will be random (and in fact, biased away from any existing segments). And on the next step, new synapses grow only to those (random) winner cells.

@cogmission
Member

Hi Felix,

Yep you're right...

The code is really hard to keep track of, but after combing through it a bunch of times, I finally see that you are right about what results in a "best matching cell" in NuPIC also, since the best match in NuPIC depends on the intersection between cells from active feed-forward columns from the SP, and previouslyActiveCells from previouslyPredictedColumns. After reset, there are no previouslyPredictedColumns (since they are emptied or their list is remade). Without the appearance in both the ff-actives and the prevPredicted, a "leastUsedCell" (with the fewest segments) is chosen. If there is more than one cell with the same minimum number of segments, a cell is chosen from those cells randomly.
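Roughly, that post-reset fall-through reduces to something like this (a paraphrased sketch, not the actual NuPIC code):

import random

def winner_cell_after_reset(column_cells, segments_per_cell, rng=random):
    # After reset() there are no previously active cells, so the best-matching-cell
    # search can never succeed; fall straight through to the least-used cell
    # (fewest segments), breaking ties at random.
    fewest = min(segments_per_cell[c] for c in column_cells)
    return rng.choice([c for c in column_cells if segments_per_cell[c] == fewest])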


@floybix
Member Author

floybix commented Oct 27, 2015

I'm reminded of @mrcslws's "flippant suggestion" in his essay:

(If HTM supported distal connections from cells to columns, it might handle this better. You could imagine flipping it around, starting out connected to the entire column and then pruning. Flippant suggestion.)

i.e. instead of picking a single random winner as the representative to learn from, grow synapses to all active cells. Effectively treating all cells in bursting columns as "winners" (therefore learnable) in the case where there are no partial matches. If there are partial matches -- with segment activation above the learning threshold -- those would be the winners in a bursting column.

For bursting within a sequence, the eventual representative cell for some context would emerge on subsequent occurrences as the best matching cell, according to whichever ended up with the most active synapses (synapses are grown by random selection from candidate source cells). But it would keep more options open, I think.

In a novel sequence we'd be doing a lot of work: growing segments on all bursting cells. So it would be slower.
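A bare-bones sketch of that rule (illustrative only, not a worked implementation):

def winners_in_bursting_column(column_cells, match_counts, learning_threshold):
    # match_counts: active-synapse count on each cell's best partial-matching segment.
    matched = [c for c in column_cells
               if match_counts.get(c, 0) >= learning_threshold]
    # Partial matches above the learning threshold win; otherwise every cell in the
    # bursting column is treated as a winner (and therefore learnable).
    return matched if matched else list(column_cells)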

It's all quite confusing. Needs testing.

@mrcslws I also liked your idea about treating every step as possibly the beginning of a sequence. But there are probably several ways to go about that kind of recognition / re-evaluation / backtracking and I'm not convinced real-time cell choice is the right way.

@rcrowder
Member

I wonder if Alex @BoltzmannBrain is currently looking at this, in regard to Marcus's comment?

@mrcslws
Collaborator

mrcslws commented Oct 27, 2015

From the essay, the idea of "connections from cells to columns" would apply to bursting columns, as Felix mentions, but it would also apply to non-bursting columns. In the essay I held this up as a way of learning first-order sequences, but actually it's a way of learning sequences that start with the previous input. The sequences might be first-order, or they might be longer.

My comment above suggests something related but different:

Idea: do both. e.g. maybe select two cells.

One of them represents the possibility that this is the beginning of a new sequence. The other represents the possibility that it's a continuation of the current sequence.

This might only apply to bursting columns, or it might happen all the time. I liked my example:

A useful example ("use full exam pull") is disambiguation of sequences of syllables. At all points we need to try both possibilities: (1) the syllable is the start of a new sequence, or (2) the syllable is a continuation of the previous sequence.

These two ideas could converge into one. E.g. use the first cell in the column to represent the input being the beginning of a sequence. Or other ways I'm sure we could imagine. We'd need a coherent story for distal learning. Maybe we apply the above idea for growing new synapses, but the synapse reinforcement follows slightly different rules, favoring cells that contain context whenever possible.
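For example, a toy sketch of that convergence (the interface is made up, not Comportex or NuPIC):

def cells_to_activate(column_id, best_matching_cell=None):
    # Cell 0 of the column stands for "this input begins a new sequence"; the best
    # distal match, if any, stands for "this continues the current sequence".
    cells = {(column_id, 0)}
    if best_matching_cell is not None:
        cells.add((column_id, best_matching_cell))
    return cells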

I still consider these suggestions flippant. I'd like to distance myself from these untested ideas. :)

But I am a little excited by the idea that "re-evaluation / backtracking" is an illusion, that the alternate possibilities are actually being maintained in realtime and we just don't notice them until the others are nullified.

@BoltzmannBrain

@rcrowder I'm just now catching up on this discussion; echoing David's suggestion, the nupic discuss listserve would be a much better forum. Are you asking if a possible mechanism would be to grow synapses to all cells in a column, and then later prune such that one remains as the "learning" cell? This does not seem like a biologically accurate approach to me.

@mrcslws
Collaborator

mrcslws commented Oct 28, 2015

For bursting columns that wouldn't be too radical. But yes, doing it for bursting and non-bursting columns was roughly the "flippant suggestion" from my essay. I'd never bring this up on the mailing list without testing it. Untested solutions are a dime a dozen.

Also, that idea was intentionally ugly, leaving room for it to resolve into a biologically-plausible equivalent. It was more about shining a light on the downside of high-order sequence memory.

Also, more context: on this thread I'm just a sideshow. :)

@floybix
Member Author

floybix commented Oct 29, 2015

Essentially, the point is that it makes no sense to assign a random selection of cells after a reset because it can never be reproduced. It is a meaningless signal to learn. An example (I just ran) is the sequence of letters "hello hello hello hello hello" with resets between each word. If you do this with random cell selection (and with initial distal permanence below the connected threshold) then it never predicts the transition from "h" to "e".

I just looked at the old TP.py in NuPIC and that does have "start cells". In fact here it says:

# Start over on start cells if any of the following occur:
#  1.) A reset was just called
#  2.) We have been too long out of sequence (the pamCounter has expired)
#  3.) We have reached maximum allowed sequence length.

@floybix
Member Author

floybix commented Oct 29, 2015

@subutai you might want to consider a similar change to NuPIC's temporal_memory.py.

@cogmission
Member

@subutai This is crucial. I agree wholeheartedly. @rhyolight is you be listening? :-P

We must account for this:
https://github.com/numenta/nupic/blob/master/src/nupic/research/temporal_memory.py#L530

The point is that following a reset, and given the absence of prevPredictedCells, we go down through getBestMatchingCell > getBestMatchingSegment > getLeastUsedCell... and choose a random cell to represent the upcoming sequence. The problem is that it may just be a repeat of the one we just saw! So we will "teach" a new cell to have affinity to the "old" sequence. In fact (as Felix points out), the point of getLeastUsedCell is to bias us away from the tried and true, already learned sequence - by using a cell with the fewest segments.

Felix, we can't be sure @subutai is now listening to this, can we?


@subutai

subutai commented Oct 29, 2015

@cogmission Is this the same issue you and Fergal discussed at the hangout on Monday? It sounds like it. If so, I don't believe it is an issue in temporal_memory.py. @floybix We could discuss this at the community meet up on the 13th? Please don't look at TP.py or TP10x.py - that implements a very different (hacky) algorithm (see Chetan's presentation from a couple of hackathons ago).

@cogmission
Member

Hi @subutai,

Yep, same one - but after combing the code, it in fact looks like an issue. We'd have to trace the code together for me to point it out; it's not the easiest to see - took me a while...

Cheers,
David


@subutai

subutai commented Oct 29, 2015

@cogmission It is best to do this on a whiteboard - very difficult to do in github comment fields. One other criterion is that we should be able to come up with simple test cases that demonstrate the failure using temporal_memory.py in NuPIC. Do such tests exist for this change? That would help me understand the need for core algorithm changes.

@cogmission
Member

@subutai I agree, it's a pain in the butt to type out minutiae painstakingly. :-P
Also, @floybix can point this out too - since he's the one who uncovered this - though I don't know how comfortable he is in the Python world either?

I think a test for this can be easily written. We train a sequence and save the learning cell. Then we call reset, and then train the same sequence - we will see that the same learning cell is not selected by comparing both cells. Does that sound like a good plan/test @floybix ?

Now, what the ramifications are of not training the same learning cell is another point that @floybix is investigating, which you @subutai would probably have the best handle on...?

I would write the test but I couldn't submit it because I'm scared to death to update my version of NuPIC (it's been 6 months), because all my work hinges on me having a running Python version that I can reference and getting Java and Python running side by side is no easy task (at least for me because I'm not as comfortable in the Python universe).

@subutai

subutai commented Oct 29, 2015

I think a test for this can be easily written. We train a sequence and save the learning cell. Then we call reset, and then train the same sequence - we will see that the same learning cell is not selected by comparing both cells.

That's not a sufficient test. We have to show that it is actually not performing well by some measure. Is it missing some predictions that it should otherwise make? Is it taking longer than it should? Is it making some other error?

@cogmission
Member

I think @floybix was the one making qualitative assessments. I simply verified that the condition existed for myself, and attempted to communicate that. Beyond that, I couldn't say whether it results in attenuated behavior or not - that's up to you? I just thought that there being a "theoretical" problem (which I would assume the training of multiple learning cells for the same sequence would be) would mean you might want to be on the alert to get around that algorithmically. I do think @floybix wants to avoid this? @floybix?

@floybix
Member Author

floybix commented Oct 30, 2015

@subutai A test is the sequence of letters "hello hello hello hello hello" with resets between each word. If you do this with random cell selection (and with initial distal permanence below the connected threshold) then it never predicts the transition from "h" to "e".

But yeah, we can talk about it at the meetup if you want.

@subutai

subutai commented Oct 30, 2015

@floybix I don't believe this is true. The routine bestMatchingSegment will pick the same winner cell each time in the "e" columns. This is true even if initial permanences are below the connected threshold. It won't ever get to the condition where it selects a random winner cell.

@floybix
Member Author

floybix commented Oct 30, 2015

@subutai but the "h" cells are different every time (random, because of the reset), so this is a completely new SDR, it won't match any segments on "e" columns.


@floybix
Member Author

floybix commented Oct 30, 2015

@subutai sorry that was unclear. Yes, you are right that it will pick the same winner cell each time in the "e" columns, because all the "h" columns are bursting. But because the "h" winners are different each time, the synapses won't be reinforced, so it never gets above the connected threshold; thus, "e" is never predicted.


@robjfr

robjfr commented Oct 30, 2015

@floybix If I understand the discussion, it sounds like bursting should reset predicted cells, but not predicting cells.

@robjfr

robjfr commented Oct 30, 2015

If predictions grow connections to the columns they predict, those connections should remain, even if the column bursts, should they not?

@floybix
Member Author

floybix commented Oct 30, 2015

@robjfr No, you're not understanding; bursting doesn't cause a reset, rather a reset causes bursting.

This is all about when we impose a "reset" - a break to completely separate some new input from what came before. An edge case, really.

@cogmission
Member

@subutai @floybix

The segment returned from bestMatchingSegment for the "h" columns is always null because there are no prevActiveCells following a reset. This results in bestMatchingCell calling getLeastUsedCell, which randomly picks from the ff-active cells the cell with the fewest segments (thus biasing away from any highly trained "h" cell).

Because of this:

if bestSegment is not None:
        learningSegments.add(bestSegment)

The compute cycle for the "e" columns has no learning segments, because a "bestSegment" was not returned from bestMatchingSegment in "t - 1", during the "h" cycle.

Still working through the code from this point... will comment tomorrow...

@robjfr

robjfr commented Oct 30, 2015

@floybix Ah, thanks. I was confusing "reset" with "bursting".

Perhaps the underlying problem is that a "reset" is not very biologically plausible in the first place (is it?)

If it is just a fix to simplify in the short term, I guess that gives wide scope to implement it however convenient, in the knowledge that eventually it will not matter because resets will not occur.

But I'm probably off target with the issues on this one. It may be posited as a mechanism for attention or other (with which I'd probably disagree.)

@rhyolight

@robjfr said:

If it is just a fix to simplify in the short term, I guess that gives wide scope to implement it however convenient, in the knowledge that eventually it will not matter because resets will not occur.

Yes, this.

To everyone else, we're going to be together in a few weeks. This is certainly better placed on a whiteboard and we'll have lots of time there. I know Subutai has at least two talks to prepare for on top of his regular duties (one of those talks is for the HTM Challenge 😉). Can we hold off on this conversation?

Also, @robjfr can you make it to the community meetup?

@robjfr

robjfr commented Oct 30, 2015

Unlikely @rhyolight But thanks for the welcome.

@subutai

subutai commented Oct 30, 2015

But because the "h" winners are different each time, the
synapses won't be reinforced, so it never gets above the connected
threshold, thus, "e" is never predicted.

Ah I see. The logic is that permanence updates only use the previously "active" state. When you add new synapses you use the winner cells. So synapses will get reinforced and it will get above threshold. The downside is that it may also add a few extra synapses from random winner cells but it shouldn't do any harm.

@floybix
Member Author

floybix commented Oct 31, 2015

Thanks. As Matt says, let's whiteboard it at the meetup.

@cogmission
Member

Now that I know I'm going to the meetup, I'm holding off on my comments too.

@floybix
Member Author

floybix commented Oct 31, 2015

(just to clear this up)...
groan. This is embarrassing. You are right @subutai and I find it disturbing that I had confused myself like that. After investigating why I saw that result from my test, I found a bug in Comportex where it was reinforcing only against learning cells, instead of all active cells (fixed in 3e6096b).

Anyway, it is still worthwhile for me to start on consistent cells after a reset, if only to display those states consistently on my Cell SDRs diagram.

@cogmission
Member

@floybix I still don't understand how the previously trained "e" cell (the presynaptic cell for the previously trained "h" cell) gets found and has its segments reinforced? What am I missing, because it appears to me like a new presynaptic cell-to-segment relationship is going to be formed every time the same sequence is entered?

@floybix
Member Author

floybix commented Oct 31, 2015

@cogmission Let's whiteboard it at the meetup.

@subutai

subutai commented Oct 31, 2015

After investigating why I saw that result from my test, I found a bug in Comportex where it was reinforcing only against learning cells, instead of all active cells

@floybix No worries. This stuff is extremely tricky and very easy to miss. Even with bugs like this the overall system often still generally works ok, which makes it pretty hard to debug. Believe me we've had our share of bugs like this too. See you at the meetup!
