Spa subvocab #699

xchoo · 2015-04-16T16:43:58Z

Redid implementation of vocabulary subsets. Subsets now reference the parent vocabulary, and handles transform_to correctly (doesn't add new terms as it did before). Also added a read_only flag for vocabularies to prevent unwanted addition of items to vocabularies.

xchoo · 2015-04-16T16:44:27Z

@tcstewar Please review! =D

tbekolay · 2015-04-16T16:48:10Z

nengo/spa/vocab.py

+        self.read_only = False
+
+        # The parent of a non-subset vocabulary is itself
+        self.parent = self


Hmm... this seems like a red flag to me that this implementation is overly complex. Do you really need parent references? It makes managing memory and ensuring that these object get garbage collected very complicated...

Oh. Hah. This is so that I can shortcut the comparison in the transform_to function. XD

Doesn't Python garbage collection handle cyclical references?

It's optional and not guaranteed. It's not so much the garbage collection that I'm worried about, it just seems like an antipattern to me so my eyebrow instinctively raises. If there's absolutely no other way to do this, then so be it, but I suspect there's another way that ends up being simpler and easier.

tcstewar · 2015-04-19T14:51:40Z

I think I'm closer to understanding what this Subset stuff is being used for, but I there are still a few questions in my mind:

What is the use case for adding terms to a VocabularySubset? All the examples I've heard of the subset use (helpers for plotting, and creating AssociativeMemorys) don't require adding terms. It seems to me like a VocabularySubset should be read_only. That would eliminate the behaviour where adding terms to a subset automatically adds those terms to the parent (which seems a bit surprising to me, but that could just be because I'm not seeing the use case).
Should VocabularySubset be a subclass of Vocabulary, or should we get rid of it entirely and just have create_subset return a suitably configured Vocabulary instance?
What are the use cases for extend_subset? If this functionality is needed, shouldn't it be in the base class?

xchoo · 2015-04-19T15:20:05Z

What is the use case for adding terms to a VocabularySubset?

I was thinking of use cases outside of the networks we build in nengo. It is conceivable that people can use the subsets to do purely mathematical operations on vocabularies. And I can see that in some cases (sequential programming, or reusing the same subset for different purposes), subsets can be extended. I very nearly put in the ability to remove items from the vocabulary, but I didn't (yet.. haha)

Should VocabularySubset be a subclass of Vocabulary, or should we get rid of it entirely and just
have create_subset return a suitably configured Vocabulary instance?

Even with the read_only functionality of the Vocabularies, this will create problems when connecting two modules with different subsets. Suppose I had this:

num = spa.Buffer(vocab=num_subset)  # num_subset = {'ONE', 'TWO', 'THR'} of vocab
pos = spa.Buffer(vocab=pos_subset)  # pos_subset = {'P1', 'P2', 'P3'} of vocab
mem = spa.Buffer(vocab=vocab)  # vocab = {'ONE', 'TWO', ..., 'P1', 'P2', ...} + PAIRS

actions = spa.Actions("mem = num * pos")
cortical = spa.Cortical(actions)

In this case, because everything is (essentially) using the one vocabulary, there should be no transforms between either of the connections. If the subset returned a read-only vocabulary, the conv_effect will add a transform from (e.g. the num buffer) {'ONE', 'TWO', 'THR'} of num_subset to {'ONE', 'TWO', 'THR'} of vocab (which is not an identity matrix), which then adds noise to the calculations.

What are the use cases for extend_subset? If this functionality is needed, shouldn't it be in the base class?

See point 1 for use cases. Should it be in the base class? Maybe? I just added it as a shortcut for vocab.parse('+'.join(keys)).

tcstewar · 2015-04-19T15:52:50Z

I was thinking of use cases outside of the networks we build in nengo. It is conceivable that people can use the subsets to do purely mathematical operations on vocabularies. And I can see that in some cases (sequential programming, or reusing the same subset for different purposes), subsets can be extended. I very nearly put in the ability to remove items from the vocabulary, but I didn't (yet.. haha)

Hmm... I have a fairly strong resistance to adding functionality based on possible conceivable future uses, rather than on actual use-cases that have come up. I'd prefer to add functionality as we need it, rather than trying to guess what functionality we might want in the future.

tcstewar · 2015-04-19T16:06:40Z

In this case, because everything is (essentially) using the one vocabulary, there should be no transforms between either of the connections.

Ah, I think this example helps clarify things for me. In the style of programming I was thinking of, I would definitely have each of those using the same vocab, since, as you point out, they are all using the same vocabulary, and so I think the code should actually indicate that. However, the way you're using vocabs, you're wanting to indicate something about the eventual visual display: you only want {'ONE', 'TWO', 'THR'} to be shown for num and {'P1', 'P2', 'P3'} to be shown for pos. And so you're defining subsets to do that. Is that a fair summary of what you're doing?

If so, I've always been working from the assumption that if you want identity transforms then you should be using the same vocabulary on your components. And if you want to control aspects of the visual display, you should do that outside of the core model code, since it's just about the visual display.

So I'm a bit worried about this change. I think we should be working to make the spa module clearer and with less automatic magic (or at least a small set of particular magic that we are explicit about), rather than adding new magic. This change feels like new magic to me, but it could just be because I'm not thinking about it in the right way.

tcstewar · 2015-04-19T16:12:50Z

Should VocabularySubset be a subclass of Vocabulary, or should we get rid of it entirely and just have create_subset return a suitably configured Vocabulary instance?
Even with the read_only functionality of the Vocabularies, this will create problems when connecting two modules with different subsets.

On re-reading your comment, I meant to refer to a different issue with this question. I was meaning that VocabularySubset shouldn't exist. There'd still be the parent reference in Vocabulary, though, so transform_to would still be perfectly capable of handling this case (indeed, that's exactly what it does right now).

tcstewar · 2015-04-19T16:15:16Z

Should it be in the base class? Maybe? I just added it as a shortcut for vocab.parse('+'.join(keys)).

I think that's a useful functionality to have. I was just wondering why it was only being put in the VocabularySubset -- that looked to me like it was an indication that it was something you only wanted to do in a Subset. But I'm also still unconvinced that Subset should exist at all, so I'm a bit weird. ;) But we certainly do that vocab.parse('+'.join(keys)) thing a fair bit, and this would be much clearer, so I think it should be added to Vocabulary.

xchoo · 2015-04-19T17:20:04Z

Looking at my code, I actually don't use subsets within the model itself. So I've merged the subset logic (parent + the read only code) back into the Vocabulary class. I'm also of the opinion that within the model, all of the vocabularies (for modules using the same vocabulary) should not be using subsets, but it is a concept that should be better documented with the spa modules. I can see new users using subsets for modules because "this module should only be used for XXX subset" (when really it doesn't make a difference).

tcstewar · 2015-04-19T17:45:37Z

nengo/spa/vocab.py

+            List of semantic pointer names to be added to the vocabulary.
+
+        """
+        self.parse('+'.join(keys))


for clarity, it might be nicer to implement this as

for k in keys: self[k]

or perhaps even

for k in keys: if k not in self.keys: self.add(k, self.create_pointer())

or even

def extend(self, keys, unitary=False): for k in keys: if k not in self.keys: self.add(k, self.create_pointer(unitary=unitary))

tcstewar · 2015-04-19T17:51:11Z

Looking at my code, I actually don't use subsets within the model itself. So I've merged the subset logic (parent + the read only code) back into the Vocabulary class. I'm also of the opinion that within the model, all of the vocabularies (for modules using the same vocabulary) should not be using subsets, but it is a concept that should be better documented with the spa modules. I can see new users using subsets for modules because "this module should only be used for XXX subset" (when really it doesn't make a difference).

I like the look of it merged back in like that. :) And I think you're completely right that this subset concept is something that needs serous documentation and clarity. That said, I think this current PR is a clear improvement (the read_only parameter seems very useful). I think in a future PR we might look into moving away from subsets and towards explicit specification of sets of keys (but keeping the same underlying Vocabulary, so we'd do things like specifying what terms to show in the display, or specifying what terms to include in the AssociativeMemory). But that's for future work. :)

xchoo · 2015-04-19T17:56:51Z

The associative memory has already been re-coded to use a vocabulary and list of keys (see #702).

tcstewar · 2015-04-19T17:57:43Z

The associative memory has already been re-coded to use a vocabulary and list of keys (see #702).

Sweet! :)

tbekolay · 2015-04-20T19:37:28Z

nengo/spa/vocab.py

+            List of semantic pointer names to be added to the vocabulary.
+
+        """
+        if is_iterable(unitary):


What does this if/elif block do exactly? To me it reads like unitary can be a boolean or an iterable. Is that true? If so, why?

That is exactly how it reads. It reflects the behaviour of the unitary parameter in the constructor where you can either pass it a boolean (in which case, all vectors created will be / not be unitary), or a list of strings (in which case only semantic pointers that are within said list will be unitary).

I think, that should be documented because it is not obvious that unitary excepts a list of strings.

hunse · 2015-09-14T23:37:51Z

@xchoo, @tcstewar: What's the status of this? @tcstewar, are you happy with it?

hunse · 2016-01-28T20:40:18Z

Can this be rebased and @xchoo and @tcstewar sign off so we can get it done?

xchoo · 2016-01-28T20:41:38Z

Busy at the moment. Earliest I can look at this is next monday.

tbekolay · 2016-02-18T20:12:46Z

I've rebased this and squashed the commits together. Definitely the commit message should be looked at, but other than that, would be good for a re-review from @tcstewar and/or @xchoo!

tcstewar · 2016-03-01T18:49:55Z

nengo/spa/vocab.py

+                # If this vocabulary is a subset vocabulary, apply operation on
+                # parent vocab.
+                self.parent.add(key, p)
+                return


Why is this fall-back here for the read_only case? This allows this sort of thing to happen, which seems confusing to me:

import nengo.spa v = nengo.spa.Vocabulary(16) v.extend(['A', 'B', 'C']) v2 = v.create_subset(['A', 'B']) v2.add('Z', v2.create_pointer()) print 'v.keys', v.keys print 'v2.keys', v2.keys

v.keys ['A', 'B', 'C', 'Z'] v2.keys ['A', 'B']

It seems to me that it should raise a ValueError, rather than trying to create it in the parent.

This is to handle the cases where something is added to a vocabulary that is a subset of another vocabulary. Basically because of the magic that happens in the transform_to function (it add keys from one vocabulary to another) of the Vocabulary class.

If we made it a ValueError this would fail:

vocab1 = Vocabulary(32) vocab1.parse('A+B+C') vocab2 = vocab1.create_subset(['A', 'B']) with spa.SPA() as model: model.state_in = spa.State(vocab=vocab1) model.am = spa.AssociativeMemory(vocab2) model.state_out = spa.State(vocab=vocab1) cortical_actions = spa.Actions('am=state_in', 'state_out=am') model.cortical = spa.Cortial(cortical_actions)

The transform_to system will try to add 'C' to vocab2 and fail with the ValueError.

I'll add a note that this might not actually happen with the conditional code below. But there may be cases where this will happen (i.e. the transform_to magic adds keys from one vocab to another)

I think we should get rid of the transform_to magic and be explicit about transforms. But that should be the scope of another PR.

As per the discussion in the meeting, I think the solution for now is to adjust this PR such that if it's read only then it always raises the ValueError, but then also modify transform_to so that it handles that gracefully (basically if a vocab is read only and it doesn't have a key, then skip that key rather than trying to generate it).

And yes, in the long run, we want to get rid of the magic transform_to and be explicit about it when making the SPA Actions, but that's a separate PR.

tcstewar · 2016-03-29T19:20:09Z

This looks good to me now! Seems to work well, and is a nice improvement over the current situation. In the long run I still need to get rid of the auto-transform, but that's for the future. :)

tbekolay · 2016-03-29T19:48:44Z

Cool cool... is there anything in here worth mentioning in the changelog?

tcstewar · 2016-03-29T21:11:31Z

is there anything in here worth mentioning in the changelog?

I think just mentioning that spa.Vocabulary objects now have a read_only flag that is set for subvocabs.

tbekolay · 2016-03-30T14:38:08Z

OK, I added a few fixups here, to change the ValueError to ReadonlyError, rename read_only to readonly to be consistent with the rest of Nengo and add a changelog entry. @xchoo and @tcstewar I'll merge if these look good to both of you!

xchoo · 2016-03-30T14:39:17Z

👍

xchoo · 2016-03-30T14:40:39Z

You can remove
"- Adding items to subset vocabularies (i.e. read-only vocabularies
with a parent vocabulary) now applies this action to the parent
vocabulary instead of throwing an exception."
from the commit message. That change was reverted.

tbekolay · 2016-03-30T14:45:35Z

Thanks, will do :)

tcstewar · 2016-03-30T15:59:01Z

LGTM!

Added subset vocabulary and readonly attribute for standard vocabs. Other changes: - Subset vocabularies now reference parent vocabulary. - Subset vocabularies now handle transform_to correctly. - transform_to also modified to handle readonly vocabularies. - Removed ability to add items to vocabulary subsets with readonly. - Added extend method for adding multiple keys. Extend can optionally create unitary vectors with unitary flag.

tbekolay reviewed Apr 16, 2015
View reviewed changes

xchoo added the needs review label Apr 16, 2015

tcstewar reviewed Apr 19, 2015
View reviewed changes

xchoo removed the needs review label Apr 19, 2015

tbekolay reviewed Apr 20, 2015
View reviewed changes

xchoo mentioned this pull request Jun 10, 2015

Cortical rule modifies vocab #731

Closed

tbekolay added the needs review label Jun 13, 2015

tbekolay added this to the 2.1.0 release milestone Jun 13, 2015

jgosmann mentioned this pull request Jul 10, 2015

Associative memory breaks with vector vocab #783

Closed

hunse force-pushed the spa_subvocab branch from 5cbd582 to 6ebe119 Compare July 10, 2015 22:45

This was referenced Jul 11, 2015

Working with read-only vocabularies have unintended side effects #784

Closed

Proposal for clarifying vocabs #789

Closed

Seanny123 added the discussion label Sep 27, 2015

tbekolay force-pushed the spa_subvocab branch from 4474420 to cf7311b Compare February 18, 2016 20:11

tbekolay removed the discussion label Feb 20, 2016

Seanny123 assigned tcstewar Mar 1, 2016

tcstewar reviewed Mar 1, 2016
View reviewed changes

Seanny123 added needs changes and removed needs review labels Mar 1, 2016

xchoo added needs review and removed needs changes labels Mar 24, 2016

Change read_only -> readonly for consistency

7c15913

tbekolay force-pushed the spa_subvocab branch from f81fa57 to 3d0a9ca Compare March 30, 2016 14:36

tbekolay force-pushed the spa_subvocab branch from 3d0a9ca to f88ff75 Compare March 30, 2016 16:14

tbekolay merged commit f88ff75 into master Mar 30, 2016

tbekolay deleted the spa_subvocab branch March 30, 2016 17:02

tbekolay added reviewed and removed needs review labels Aug 8, 2016

tbekolay unassigned tcstewar Aug 8, 2016

Spa subvocab #699

Spa subvocab #699

Conversation

xchoo commented Apr 16, 2015

xchoo commented Apr 16, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tcstewar commented Apr 19, 2015

xchoo commented Apr 19, 2015

tcstewar commented Apr 19, 2015

tcstewar commented Apr 19, 2015

tcstewar commented Apr 19, 2015

tcstewar commented Apr 19, 2015

xchoo commented Apr 19, 2015

Choose a reason for hiding this comment

tcstewar commented Apr 19, 2015

xchoo commented Apr 19, 2015

tcstewar commented Apr 19, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hunse commented Sep 14, 2015

hunse commented Jan 28, 2016

xchoo commented Jan 28, 2016

tbekolay commented Feb 18, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tcstewar commented Mar 29, 2016

tbekolay commented Mar 29, 2016

tcstewar commented Mar 29, 2016

tbekolay commented Mar 30, 2016

xchoo commented Mar 30, 2016

xchoo commented Mar 30, 2016

tbekolay commented Mar 30, 2016

tcstewar commented Mar 30, 2016