select cells consistently, esp. when beginning sequences #34
Comments
You may be right. I think this is what I was referring to as the need for "consolidation" across contexts: cells would need to be consolidated to the columns they predict before we judge whether they indicate similar contexts. Setting to a consistent cell would solve this. But it did occur to me that there might be an advantage in having different cells for the same context at different locations in the global sequence. In a way coding with a different cell each time gives us more information. It complicates things a little, but we don't lose anything. And, perhaps importantly, it gives us a means to code the strength of a transition when cells are merged. We could judge the strength of a transition by counting how many different cells linked the respective columns.
Felix! I'm going to quote your thoughts and share them with the NuPIC Community - this is true and a very insightful realization. Do you mind? This is the kind of stuff that I was talking about previously that shouldn't get lost and will inevitably get lost if it's not shared... Cheers,
I'll wait for your response before sharing, just in case you have another thought. Cheers,
Here's an idea. Maybe some insight can be gained from the fact that a […]. Just a simple observation...
@cogmission I don't have a problem with you sharing this, although at this stage my statement of the problem and my proposals aren't really backed up with evidence. Not sure I agree with you that resets are unnatural. I think that's what happens when you shift attention.
@cogmission ah yes I see what you mean. Even within a period of unbroken attention, one can distinguish parts, even out of context. That seems to be a higher-level property like temporal pooling. Whether appropriate cell selection is best thought of as a prerequisite or a consequence of temporal pooling, I'm not sure.
Ok. But I don't think you have to wait to present your "suspicions" or […]. I'm not trying to tell you what to do or anything - I just think the work […]. Anyway, sorry for the distraction... Cheers,
Idea: do both. e.g. maybe select two cells. One of them represents the possibility that this is the beginning of a new sequence. The other represents the possibility that it's a continuation of the current sequence. It's similar to the Escher-like perspective switches that Rob talks about. Let both be true for a while, until one of them fizzles out. A useful example ("use full exam pull") is disambiguation of sequences of syllables. At all points we need to try both possibilities: (1) the syllable is the start of a new sequence, or (2) the syllable is a continuation of the previous sequence.
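A tiny sketch of that "do both" idea, with hypothetical names (not from Comportex or NuPIC): `column_cells` is the column's cells, `best_continuation_cell` the best match against prior context, if any.

```python
# Keep two hypotheses alive in a bursting column: one cell codes "this input
# begins a new sequence", the other codes "this input continues the current
# sequence". Hypothetical helper, purely illustrative.
def two_winner_cells(column_cells, best_continuation_cell=None):
    start_cell = column_cells[0]            # fixed cell for "new sequence starts here"
    winners = {start_cell}
    if best_continuation_cell is not None:  # best match against the prior context
        winners.add(best_continuation_cell)
    return winners                          # let both stand until one fizzles out
```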
We just asked Subutai on the Challenge Hangout. In NuPIC all cells in the initial active columns A become active, so prediction is passed to all second time step cells. Whichever feedforward input comes in next will use all A cells to learn from.
Fergal, I think it will select the best matching cell of those - to learn from. Here's the poignant line in temporal_memory.burstColumns(): […]
David, I think you're referring to the "learning" cells. Fergal, I think you're referring to the "learnable" cells. Both NuPIC and Comportex select the best match for the "learning" cell. But Comportex goes on to use that as the "learnable" cell in the next time step, whereas NuPIC treats all previously active cells as "learnable". (I'm still forming my thoughts on this, just wanted to untangle that.)
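For concreteness, a rough sketch of the winner/learning-cell choice being discussed; the names and data structures are illustrative assumptions, not a quote of NuPIC's temporal_memory.py or of Comportex.

```python
# Sketch of winner ("learning") cell selection in a bursting column.
import random

def select_winner_cell(column_cells, prev_active_cells, segments,
                       match_threshold, rng=random):
    """Return (winner_cell, best_segment) for a bursting column.

    prev_active_cells -- set of cells active at t-1
    segments          -- dict: cell -> list of distal segments,
                         each segment a set of presynaptic cells
    """
    best_cell, best_segment, best_overlap = None, None, -1
    for cell in column_cells:
        for segment in segments.get(cell, []):
            overlap = len(segment & prev_active_cells)
            if overlap >= match_threshold and overlap > best_overlap:
                best_cell, best_segment, best_overlap = cell, segment, overlap
    if best_cell is not None:
        return best_cell, best_segment   # partial match: reuse that cell
    # No distal match (e.g. directly after a reset): fall back to an
    # arbitrary cell -- the random choice this issue is about.
    return rng.choice(list(column_cells)), None
```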
In other words, what Fergal said makes sense.
Please explain the difference between "learn-(ing)" and "learn-(able)" as […]. The learn(ing) cell in NuPIC is the Cell chosen either: […]
The above mechanism avoids the situation that Felix is inquiring into. Now, what a "learn(able)" cell is, I have no idea - but how is that used in […]? Cheers,
Sorry, please substitute the word "Synapse" for "Segment" above where it appears.
Ok, that's a good distinction. In step 2, NuPIC treats all step 1 active bursting cells as potential precursors (learnable), and so builds the best predictive connection back to step 1 using all of them. This is the correct approach, since at time 1 we have no context - that's what reset() means.

This will build a distribution of predictability from reset()-step 1-step 2 which depends on what has been seen. So e.g. a letter-reading HTM reading Wikipedia and seeing reset()-Q will predict U 95%, T 2% (the graphics toolkit), I 1% (Stephen Fry's show), and so on. By these percentages I mean that 95% of the predictive cells will be in U columns, 2% in T columns, etc. Upon getting U next, you burst all the predictive U cells (which beat their column cells) and carry on. The average next predicted letter is another distribution - I, O, A, E at the top, almost no U prediction, almost no consonant prediction.

I'm using reset()-Q as a pathological example to force home this point. Arguably, you could use reset()-T and get H, O, I, A, V... as your distribution. But you need to burst as NuPIC does to get this working - the next step acts as if all cells in the first column were active after the reset. Does this make sense?
Hi Fergal,

From what I reason, your first paragraph rings true for me, and is aligned […]. You guys might also want more detail, because the process I spelled out is […]. If you are not interested in the hard-core details and only want to […].

If I were going to spell this out specifically, I would say that there is […]. Then there is a set of cells which are the "activeCells (A1)" in "t - 1". For each Cell (C1) in unpredictedActiveCells, a segment (S1) leading to a […]. Best matching Segments then have their synapses adjusted up or down as part […].

That's the best pseudo code explanation I can muster, in case that's […]. Cheers,
Hi David, The example I'm using is for letter-by-letter sequence learning. The letter Q is almost always followed by U, but can be followed by a small number of other letters due to acronyms etc.
Right. Then the "activeCells" (predictive in "t - 1") are empty and that […]
Morning...
After reading the NuPIC code, I think that (second sentence) is not entirely correct.
And this was a rephrasing of the same thing, so also not correct. (I contend that NuPIC and Comportex have the same basic treatment of learnable cells). There are two parts to learning: growing new synapses, and reinforcing existing synapses. Reinforcement does apply to synapses from all bursting cells, assuming they already exist on an active -- sufficiently connected -- segment. However, growing new synapses in the first place only considers the winner cells; that is the part I was talking about. NuPIC details: […]
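A minimal sketch of that two-part learning step, under assumed data structures (a segment as a dict of presynaptic cell to permanence); this is not NuPIC's or Comportex's actual code, and the parameter values are only examples.

```python
import random

def learn_on_segment(segment, prev_active_cells, prev_winner_cells,
                     perm_inc=0.10, perm_dec=0.10, max_new=5, rng=random):
    """Sketch of distal learning on one segment.

    segment           -- dict: presynaptic cell -> permanence
    prev_active_cells -- cells active at t-1 (includes all bursting cells)
    prev_winner_cells -- winner/learning cells chosen at t-1
    """
    # 1. Reinforce existing synapses: this uses the previously *active*
    #    cells, so synapses from any bursting cell can be strengthened.
    for cell in list(segment):
        if cell in prev_active_cells:
            segment[cell] = min(1.0, segment[cell] + perm_inc)
        else:
            segment[cell] = max(0.0, segment[cell] - perm_dec)

    # 2. Grow new synapses: candidates come only from the previous *winner*
    #    cells. After a reset those winners were chosen at random, so the
    #    new synapses point at an arbitrary, unrepeatable set of cells.
    candidates = [c for c in prev_winner_cells if c not in segment]
    for cell in rng.sample(candidates, min(max_new, len(candidates))):
        segment[cell] = 0.21   # initial permanence (example value)
    return segment
```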
Hi Felix, I guess the question is whether you are introducing a scenario that hasn't […]. Cheers,
@cogmission just the scenario I laid out in the original issue description. There can be no "bestMatchingCell" directly after a reset because there is no distal excitation. So the winner cells chosen in each column after a reset will be random (and in fact, biased away from any existing segments). And on the next step, new synapses grow only to those (random) winner cells.
Hi Felix, Yep you're right... The code is really hard to keep track of, but after combing through it a […].
I'm reminded of @mrcslws's "flippant suggestion" in his essay: […]
i.e. instead of picking a single random winner as the representative to learn from, grow synapses to all active cells. Effectively treating all cells in bursting columns as "winners" (therefore learnable) in the case where there are no partial matches. If there are partial matches -- with segment activation above the learning threshold -- those would be the winners in a bursting column.

For bursting within a sequence, the eventual representative cell for some context would emerge on subsequent occurrences as the best matching cell, according to whichever ended up with the most active synapses (synapses are grown by random selection from candidate source cells). But it would keep more options open, I think. In a novel sequence we'd be doing a lot of work: growing segments on all bursting cells. So it would be slower. It's all quite confusing. Needs testing.

@mrcslws I also liked your idea about treating every step as possibly the beginning of a sequence. But there are probably several ways to go about that kind of recognition / re-evaluation / backtracking and I'm not convinced real-time cell choice is the right way.
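A minimal sketch of that variant, reusing the assumptions from the earlier sketches (hypothetical helper name; `matching_cells` means cells in the column whose best segment is above the learning threshold):

```python
def learnable_cells_for_bursting_column(column_cells, matching_cells):
    """Which cells the *next* step may grow synapses toward."""
    if matching_cells:
        return set(matching_cells)   # partial matches win the column
    return set(column_cells)         # no match at all: keep every option open
```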
I wonder if Alex @BoltzmannBrain is currently looking at this, in regard to Marcus's comment?
From the essay, the idea of "connections from cells to columns" would apply to bursting columns, as Felix mentions, but it would also apply to non-bursting columns. In the essay I held this up as a way of learning first-order sequences, but actually it's a way of learning sequences that start with the previous input. The sequences might be first-order, or they might be longer. My comment above suggests something related but different: […]

This might only apply to bursting columns, or it might happen all the time. I liked my example: […]

These two ideas could converge into one. E.g. use the first cell in the column to represent the input being the beginning of a sequence. Or other ways I'm sure we could imagine. We'd need a coherent story for distal learning. Maybe we apply the above idea for growing new synapses, but the synapse reinforcement follows slightly different rules, favoring cells that contain context whenever possible.

I still consider these suggestions flippant. I'd like to distance myself from these untested ideas. :) But I am a little excited by the idea that "re-evaluation / backtracking" is an illusion, that the alternate possibilities are actually being maintained in realtime and we just don't notice them until the others are nullified.
@rcrowder I'm just now catching up on this discussion; echoing David's suggestion, the nupic discuss listserve would be a much better forum. Are you asking if a possible mechanism would be to grow synapses to all cells in a column, and then later prune such that one remains as the "learning" cell? This does not seem like a biologically accurate approach to me.
For bursting columns that wouldn't be too radical. But yes, doing it for bursting and non-bursting columns was roughly the "flippant suggestion" from my essay. I'd never bring this up on the mailing list without testing it. Untested solutions are a dime a dozen. Also, that idea was intentionally ugly, leaving room for it to resolve into a biologically-plausible equivalent. It was more about shining a light on the downside of high-order sequence memory. Also, more context: on this thread I'm just a sideshow. :)
Essentially, the point is that it makes no sense to assign a random selection of cells after a reset because it can never be reproduced. It is a meaningless signal to learn. An example (I just ran) is the sequence of letters "hello hello hello hello hello" with resets between each word. If you do this with random cell selection (and with initial distal permanence below the connected threshold) then it never predicts the transition from "h" to "e". I just looked at the old TP.py in NuPIC and that does have "start cells". In fact here it says: […]
@subutai you might want to consider a similar change to NuPIC's […]
@subutai This is crucial. I agree wholeheartedly. @rhyolight […] We must account for this: the point that following a reset, and given the absence of […]. Felix, we can't be sure @subutai is now listening to this, can we?
@cogmission Is this the same issue you and Fergal discussed at the hangout on Monday? It sounds like it. If so, I don't believe it is an issue in temporal_memory.py. @floybix We could discuss this at the community meet up on the 13th? Please don't look at TP.py or TP10x.py - that implements a very different (hacky) algorithm (see Chetan's presentation from a couple of hackathons ago).
Hi @subutai, Yep, same one - but after combing the code, it in fact looks like an issue. Cheers,
@cogmission It is best to do this on a whiteboard - very difficult to do in github comment fields. One other criterion is that we should be able to come up with simple test cases that demonstrate the failure using temporal_memory.py in NuPIC. Do such tests exist for this change? That would help me understand the need for core algorithm changes.
@subutai I agree, it's a pain in the butt to type out minutiae painstakingly. :-P I think a test for this can be easily written. We train a sequence and save the learning cell. Then we call reset, and then train the same sequence - we will see that the same learning cell is not selected, by comparing both cells. Does that sound like a good plan/test @floybix? Now, what the ramifications are of not training the same learning cell is another point that @floybix is investigating, which you @subutai would probably have the best handle on...? I would write the test but I couldn't submit it, because I'm scared to death to update my version of NuPIC (it's been 6 months): all my work hinges on me having a running Python version that I can reference, and getting Java and Python running side by side is no easy task (at least for me, because I'm not as comfortable in the Python universe).
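A sketch of that test against an assumed minimal TemporalMemory-style interface (compute, reset, and a winner-cell accessor); the names would need adapting to the real NuPIC API.

```python
def winner_cells_per_step(tm, sequence):
    """Feed a sequence of active-column sets and record the winner cells."""
    winners = []
    for active_columns in sequence:
        tm.compute(active_columns, learn=True)
        winners.append(set(tm.get_winner_cells()))
    return winners

def check_consistent_winners_across_reset(tm, sequence):
    first = winner_cells_per_step(tm, sequence)
    tm.reset()
    second = winner_cells_per_step(tm, sequence)
    # With random selection after a reset, the first-step winners (and hence
    # the cells the second step learns against) will usually differ.
    return first[0] == second[0]
```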
That's not a sufficient test. We have to show that it is actually not performing well by some measure. Is it missing some predictions that it should otherwise make? Is it taking longer than it should? Is it making some other error?
I think @floybix was the one making qualitative assessments. I simply verified that the condition existed for myself, and attempted to communicate that. Beyond that, I couldn't say whether it results in attenuated behavior or not - that's up to you? I just thought that there being a "theoretical" problem, (which I would assume the training of multiple learning cells for the same sequence would be) - would mean you might want to be on alert to get around that algorithmically. I do think @floybix wants to avoid this? @floybix?
@subutai A test is the sequence of letters "hello hello hello hello hello" with resets between each word. If you do this with random cell selection (and with initial distal permanence below the connected threshold) then it never predicts the transition from "h" to "e". But yeah, we can talk about it at the meetup if you want.
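To make the failure mode concrete, here is a toy, self-contained model of just the "h" to "e" transition (one synapse per presynaptic cell and a single-synapse prediction threshold, for simplicity); the parameter values are illustrative, and this is not Comportex or NuPIC code.

```python
import random

def run(repetitions=5, depth=32, init_perm=0.2, connected=0.5, inc=0.1,
        consistent=False, seed=0):
    """Count how often "e" is predicted after "h" across repetitions of
    "hello" with a reset before each word."""
    rng = random.Random(seed)
    perms = {}       # presynaptic "h" cell -> permanence on an "e" segment
    predicted = 0
    for _ in range(repetitions):
        # After the reset there is no distal input, so the "h" winner cell
        # is either random or (with the proposed change) a fixed cell.
        h_winner = 0 if consistent else rng.randrange(depth)
        # Was "e" predicted via a connected synapse from that cell?
        if perms.get(h_winner, 0.0) >= connected:
            predicted += 1
        # Learn: reinforce an existing synapse, or grow a new one.
        if h_winner in perms:
            perms[h_winner] = min(1.0, perms[h_winner] + inc)
        else:
            perms[h_winner] = init_perm
    return predicted

print("random winners:    ", run(consistent=False))  # stays at 0
print("consistent winners:", run(consistent=True))   # predicts once permanence connects
```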
@floybix I don't believe this is true. The routine […]
@subutai but the "h" cells are different every time (random, because of the reset).
@subutai sorry that was unclear. Yes you are right that it will pick the […]
@floybix If I understand the discussion, it sounds like bursting should reset predicted cells, but not predicting cells.
If predictions grow connections to the columns they predict, those connections should remain, even if the column bursts, should they not?
@robjfr No, you're not understanding; bursting doesn't cause a reset, rather a reset causes bursting. This is all about when we impose a "reset" - a break to completely separate some new input from what came before. An edge case, really.
The segment from […]. Because of this: […]
The compute cycle for the "e" columns has no learning segments, because a "bestSegment" was not returned from […]. Still working through the code from this point... will comment tomorrow...
@floybix Ah, thanks. I was confusing "reset" with "bursting". Perhaps the underlying problem is that a "reset" is not very biologically plausible in the first place (is it?) If it is just a fix to simplify in the short term, I guess that gives wide scope to implement it however convenient, in the knowledge that eventually it will not matter because resets will not occur. But I'm probably off target with the issues on this one. It may be posited as a mechanism for attention or other (with which I'd probably disagree.)
@robjfr said: […]
Yes, this. To everyone else, we're going to be together in a few weeks. This is certainly better placed on a whiteboard and we'll have lots of time there. I know Subutai has at least two talks to prepare for on top of his regular duties (one of those talks is for the HTM Challenge 😉). Can we hold off on this conversation? Also, @robjfr can you make it to the community meetup?
Unlikely, @rhyolight. But thanks for the welcome.
Ah I see. The logic is that permanence updates only use the previously "active" state. When you add new synapses you use the winner cells. So synapses will get reinforced and it will get above threshold. The downside is that it may also add a few extra synapses from random winner cells but it shouldn't do any harm.
Thanks. As Matt says, let's whiteboard it at the meetup.
Now that I know I'm going to the meetup, I'm holding off on my comments too.
(just to clear this up)... Anyway, it is still worthwhile for me to start on consistent cells after a reset, if only to display those states consistently on my Cell SDRs diagram.
@floybix I still don't understand how the previously trained "e" cell (the presynaptic cell for the previously trained "h" cell) gets found and has its segments reinforced. What am I missing? It appears to me that a new presynaptic cell-to-segment relationship is going to be formed every time the same sequence is entered.
@cogmission Let's whiteboard it at the meetup.
@floybix No worries. This stuff is extremely tricky and very easy to miss. Even with bugs like this the overall system often still generally works ok, which makes it pretty hard to debug. Believe me we've had our share of bugs like this too. See you at the meetup!
When beginning a sequence (or after a sequence reset/break), there is no distal input, so no basis for choosing a winner/learning cell in each column. Cells are then chosen at random.
That random selection is a problem because when the same sequence is presented several times (in isolation) it will begin on different cells each time, and will consequently not reinforce previous learning, but will have partial learning spread across several cells. This can be seen in repeated sequence demos, where the whole sequence is learned but it keeps bursting.
Proposal - I think it would be better to start on the same cell consistently. The first cell.
Perhaps more generally the choice of winner/learning cell (when there are no predictive cells in a column) should not be completely random but should be a deterministic function of the set of previously-active cells. And it should be a robust function, so that similar activity consistently selects the same cells.
Proposal - Select cell number as the cell id (mod depth) of each distal input bit, and take the mode of that. Offset by the current column number (mod depth again), otherwise all cells would be synchronised and we lose combinatorial capacity (see #31). Needs testing.
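A sketch of that proposal as I read it (untested; the real Comportex code is Clojure, and the names here are illustrative):

```python
from collections import Counter

def choose_winner_cell(column, distal_input_cells, depth):
    """Deterministic, robust choice of a winner cell index within `column`.

    distal_input_cells -- global ids of cells active at t-1 (may be empty,
                          e.g. directly after a reset)
    depth              -- number of cells per column
    """
    if not distal_input_cells:
        return 0   # no context: begin sequences on a consistent cell
    # Mode of (cell id mod depth) over the distal input bits, so that
    # similar prior activity selects the same cell...
    counts = Counter(cell_id % depth for cell_id in distal_input_cells)
    base = counts.most_common(1)[0][0]
    # ...offset by the column number so columns are not all synchronised
    # (preserving combinatorial capacity, cf. #31).
    return (base + column) % depth
```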