Spa subvocab #699
@tcstewar Please review! =D
self.read_only = False

# The parent of a non-subset vocabulary is itself
self.parent = self
Hmm... this seems like a red flag to me that this implementation is overly complex. Do you really need parent references? They make managing memory and ensuring that these objects get garbage collected very complicated...
Oh. Hah. This is so that I can shortcut the comparison in the transform_to function. XD
Doesn't Python garbage collection handle cyclical references?
It's optional and not guaranteed. It's not so much the garbage collection that I'm worried about, it just seems like an antipattern to me so my eyebrow instinctively raises. If there's absolutely no other way to do this, then so be it, but I suspect there's another way that ends up being simpler and easier.
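For reference on the garbage-collection point: in CPython, an object that refers to itself (as a vocabulary with `self.parent = self` does) is never freed by reference counting alone, but the cyclic collector in the standard `gc` module does reclaim it, unless automatic collection has been turned off with `gc.disable()`. A small check, assuming CPython; the `Node` class is a hypothetical stand-in:

```python
import gc
import weakref


class Node:
    """Hypothetical object with a self-reference, like parent = self."""

    def __init__(self):
        self.parent = self


def collected_after_del():
    """Return (alive right after del, freed after gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collection from running mid-check
    try:
        node = Node()
        ref = weakref.ref(node)
        del node
        # Reference counting alone cannot free the cycle...
        alive_before = ref() is not None
        # ...but an explicit cyclic pass can, even while gc is disabled.
        gc.collect()
        return alive_before, ref() is None
    finally:
        if was_enabled:
            gc.enable()


print(collected_after_del())
```

So cycles are collectable, but only when the cyclic collector actually runs, which matches the "optional and not guaranteed" caveat.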
I think I'm closer to understanding what this Subset stuff is being used for, but there are still a few questions in my mind:
I was thinking of use cases outside of the networks we build in nengo. It is conceivable that people could use the subsets to do purely mathematical operations on vocabularies. And I can see that in some cases (sequential programming, or reusing the same subset for different purposes), subsets can be extended. I very nearly put in the ability to remove items from the vocabulary, but I didn't (yet... haha)
Even with the read_only functionality of the Vocabularies, this will create problems when connecting two modules with different subsets. Suppose I had this:
num = spa.Buffer(vocab=num_subset)  # num_subset = {'ONE', 'TWO', 'THR'} of vocab
pos = spa.Buffer(vocab=pos_subset)  # pos_subset = {'P1', 'P2', 'P3'} of vocab
mem = spa.Buffer(vocab=vocab)  # vocab = {'ONE', 'TWO', ..., 'P1', 'P2', ...} + PAIRS
actions = spa.Actions("mem = num * pos")
cortical = spa.Cortical(actions)
In this case, because everything is (essentially) using the one vocabulary, there should be no transforms between either of the connections. If the subset returned a read-only vocabulary, the conv_effect will add a transform from (e.g. the num buffer)
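The "no transforms" argument is also why the parent reference exists: it lets transform_to shortcut the comparison. A minimal sketch of that shortcut, assuming a simplified stand-in class (`MiniVocab` and its `pointers` dict are hypothetical, not the nengo.spa API): when two vocabularies share a parent they share vectors, so the mapping between them is the identity.

```python
import numpy as np


class MiniVocab:
    """Hypothetical, simplified stand-in for spa.Vocabulary (illustration only)."""

    def __init__(self, dimensions, parent=None):
        self.dimensions = dimensions
        self.pointers = {}  # key -> vector
        # A non-subset vocabulary is its own parent.
        self.parent = self if parent is None else parent

    def create_subset(self, keys):
        # The subset reuses the parent's vectors rather than copying new ones.
        subset = MiniVocab(self.dimensions, parent=self.parent)
        for key in keys:
            subset.pointers[key] = self.pointers[key]
        return subset

    def transform_to(self, other):
        # Shortcut: same parent means same vectors, so no change of basis.
        if self.parent is other.parent:
            return np.eye(self.dimensions)
        # Otherwise map each shared key's vector onto the other's vector.
        t = np.zeros((other.dimensions, self.dimensions))
        for key, vec in self.pointers.items():
            if key in other.pointers:
                t += np.outer(other.pointers[key], vec)
        return t


rng = np.random.default_rng(0)
full = MiniVocab(16)
for key in ['A', 'B', 'C']:
    v = rng.standard_normal(16)
    full.pointers[key] = v / np.linalg.norm(v)

sub = full.create_subset(['A', 'B'])
identity = sub.transform_to(full)  # identity matrix via the parent shortcut
```

Without the shortcut, a subset-to-parent transform would be a sum of outer products over the shared keys, which is close to but not exactly the identity.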
See point 1 for use cases. Should it be in the base class? Maybe? I just added it as a shortcut for
Hmm... I have a fairly strong resistance to adding functionality based on possible conceivable future uses, rather than on actual use-cases that have come up. I'd prefer to add functionality as we need it, rather than trying to guess what functionality we might want in the future.
Ah, I think this example helps clarify things for me. In the style of programming I was thinking of, I would definitely have each of those using the same vocab, since, as you point out, they are all using the same vocabulary, and so I think the code should actually indicate that. However, the way you're using vocabs, you're wanting to indicate something about the eventual visual display: you only want If so, I've always been working from the assumption that if you want identity transforms then you should be using the same vocabulary on your components. And if you want to control aspects of the visual display, you should do that outside of the core model code, since it's just about the visual display. So I'm a bit worried about this change. I think we should be working to make the
On re-reading your comment, I meant to refer to a different issue with this question. I was meaning that VocabularySubset shouldn't exist. There'd still be the
I think that's a useful functionality to have. I was just wondering why it was only being put in the VocabularySubset -- that looked to me like it was an indication that it was something you only wanted to do in a Subset. But I'm also still unconvinced that Subset should exist at all, so I'm a bit weird. ;) But we certainly do that
Looking at my code, I actually don't use subsets within the model itself. So I've merged the subset logic (parent + the read only code) back into the Vocabulary class. I'm also of the opinion that within the model, all of the vocabularies (for modules using the same vocabulary) should not be using subsets, but it is a concept that should be better documented with the spa modules. I can see new users using subsets for modules because "this module should only be used for XXX subset" (when really it doesn't make a difference).
List of semantic pointer names to be added to the vocabulary.
"""
self.parse('+'.join(keys))
For clarity, it might be nicer to implement this as
for k in keys:
    self[k]
or perhaps even
for k in keys:
    if k not in self.keys:
        self.add(k, self.create_pointer())
or even
def extend(self, keys, unitary=False):
    for k in keys:
        if k not in self.keys:
            self.add(k, self.create_pointer(unitary=unitary))
I like the look of it merged back in like that. :) And I think you're completely right that this subset concept is something that needs serious documentation and clarity. That said, I think this current PR is a clear improvement (the read_only parameter seems very useful). I think in a future PR we might look into moving away from subsets and towards explicit specification of sets of keys (but keeping the same underlying Vocabulary, so we'd do things like specifying what terms to show in the display, or specifying what terms to include in the AssociativeMemory). But that's for future work. :)
The associative memory has already been re-coded to use a vocabulary and list of keys (see #702).
Sweet! :)
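The "one vocabulary plus an explicit list of keys" direction can be sketched without any subset object. This assumes a plain dict of key -> vector stands in for the vocabulary; the helpers `vectors_for` and `similarities` are hypothetical, not the API from #702:

```python
import numpy as np


def vectors_for(pointers, keys):
    """Return a (len(keys), d) matrix of the vectors for `keys`.

    `pointers` is a plain dict standing in for a single shared vocabulary,
    so every component sees the same vectors and no transform is needed.
    """
    missing = [k for k in keys if k not in pointers]
    if missing:
        raise KeyError('keys not in vocabulary: %s' % missing)
    return np.vstack([pointers[k] for k in keys])


def similarities(pointers, keys, x):
    """Dot product of `x` with each selected key, e.g. for a display."""
    return dict(zip(keys, vectors_for(pointers, keys).dot(x)))


eye = np.eye(3)
vocab = {'A': eye[0], 'B': eye[1], 'C': eye[2]}
print(similarities(vocab, ['A', 'B'], eye[0]))
```

The restriction lives in the key list passed to each consumer, while the vectors stay in one place.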
List of semantic pointer names to be added to the vocabulary.
"""
if is_iterable(unitary):
What does this if/elif block do exactly? To me it reads like unitary can be a boolean or an iterable. Is that true? If so, why?
That is exactly how it reads. It reflects the behaviour of the unitary parameter in the constructor, where you can pass it either a boolean (in which case all vectors created will or will not be unitary) or a list of strings (in which case only the semantic pointers in that list will be unitary).
I think that should be documented, because it is not obvious that unitary accepts a list of strings.
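A minimal sketch of the two accepted forms of `unitary`, assuming NumPy and the usual FFT-based construction of a unitary vector (every Fourier coefficient scaled to magnitude 1). The helpers `wants_unitary`, `make_unitary`, and `create_vector` are hypothetical, not nengo's implementation:

```python
import numpy as np


def wants_unitary(key, unitary):
    """True if `key` should be unitary.

    `unitary` is either a bool (applies to every key) or an iterable of
    key names (only those keys are made unitary).
    """
    if isinstance(unitary, bool):
        return unitary
    return key in set(unitary)


def make_unitary(v):
    """Scale every Fourier coefficient of v to magnitude 1."""
    fft = np.fft.fft(v)
    return np.fft.ifft(fft / np.abs(fft)).real


def create_vector(key, dimensions, unitary=False, rng=None):
    """Random unit vector, made unitary only if `unitary` selects `key`."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.standard_normal(dimensions)
    v /= np.linalg.norm(v)
    return make_unitary(v) if wants_unitary(key, unitary) else v


rng = np.random.default_rng(1)
u = create_vector('A', 32, unitary=['A'], rng=rng)  # 'A' is listed: unitary
w = create_vector('B', 32, unitary=['A'], rng=rng)  # 'B' is not: plain
```

A unitary vector keeps its length under circular convolution, which is why it is useful to request it per key.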
Busy at the moment. The earliest I can look at this is next Monday.
Branch updated from 4474420 to cf7311b.
# If this vocabulary is a subset vocabulary, apply operation on
# parent vocab.
self.parent.add(key, p)
return
Why is this fall-back here for the read_only case? This allows this sort of thing to happen, which seems confusing to me:
import nengo.spa
v = nengo.spa.Vocabulary(16)
v.extend(['A', 'B', 'C'])
v2 = v.create_subset(['A', 'B'])
v2.add('Z', v2.create_pointer())
print 'v.keys', v.keys
print 'v2.keys', v2.keys
v.keys ['A', 'B', 'C', 'Z']
v2.keys ['A', 'B']
It seems to me that it should raise a ValueError, rather than trying to create it in the parent.
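A sketch of the behaviour proposed here, assuming a simplified vocabulary class (`SketchVocab` is hypothetical, not nengo.spa's Vocabulary): `add` on a read-only vocabulary raises `ValueError` instead of silently writing to the parent, so `v.keys` and `v2.keys` can no longer drift apart as in the output above.

```python
import numpy as np


class SketchVocab:
    """Hypothetical, simplified vocabulary with a read_only flag."""

    def __init__(self, dimensions, read_only=False, parent=None):
        self.dimensions = dimensions
        self.read_only = read_only
        self.parent = self if parent is None else parent
        self.keys = []
        self.vectors = {}

    def add(self, key, vector):
        # Refuse the write outright; no fall-back to the parent.
        if self.read_only:
            raise ValueError(
                "Cannot add '%s': vocabulary is read-only." % key)
        self.keys.append(key)
        self.vectors[key] = np.asarray(vector)

    def create_subset(self, keys):
        # Subsets share the parent's vectors and are always read-only.
        subset = SketchVocab(self.dimensions, read_only=True,
                             parent=self.parent)
        subset.keys = list(keys)
        subset.vectors = {k: self.vectors[k] for k in keys}
        return subset


v = SketchVocab(16)
for k in ['A', 'B', 'C']:
    v.add(k, np.ones(16) / 4.0)
v2 = v.create_subset(['A', 'B'])
```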
This is to handle the cases where something is added to a vocabulary that is a subset of another vocabulary. Basically it is because of the magic that happens in the transform_to function of the Vocabulary class (it adds keys from one vocabulary to another).
If we made it a ValueError, this would fail:
from nengo import spa
from nengo.spa import Vocabulary
vocab1 = Vocabulary(32)
vocab1.parse('A+B+C')
vocab2 = vocab1.create_subset(['A', 'B'])
with spa.SPA() as model:
    model.state_in = spa.State(vocab=vocab1)
    model.am = spa.AssociativeMemory(vocab2)
    model.state_out = spa.State(vocab=vocab1)
    cortical_actions = spa.Actions('am=state_in', 'state_out=am')
    model.cortical = spa.Cortical(cortical_actions)
The transform_to system will try to add 'C' to vocab2 and fail with the ValueError.
I'll add a note that this might not actually happen with the conditional code below. But there may be cases where this will happen (i.e. the transform_to magic adds keys from one vocab to another).
I think we should get rid of the transform_to magic and be explicit about transforms. But that should be the scope of another PR.
As per the discussion in the meeting, I think the solution for now is to adjust this PR such that if it's read only then it always raises the ValueError, but then also modify transform_to so that it handles that gracefully (basically if a vocab is read only and it doesn't have a key, then skip that key rather than trying to generate it).
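The agreed change to transform_to can be sketched as follows, assuming vocabularies are represented as dicts of key -> unit vector with a separate read-only flag (the function and argument names are hypothetical, not the nengo.spa signature): keys missing from a read-only target are skipped, while a writable target still gets a new vector generated for them.

```python
import numpy as np


def transform_to(src, dst, dst_read_only, rng=None):
    """Build a matrix mapping src vectors onto the matching dst vectors.

    src, dst: non-empty dicts of key -> unit vector, same dimensionality.
    Keys missing from a read-only dst are skipped instead of added.
    A writable dst is modified in place when new keys are generated.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = len(next(iter(src.values())))
    t = np.zeros((d, d))
    for key, vec in src.items():
        if key not in dst:
            if dst_read_only:
                continue  # skip the key rather than raise ValueError
            new = rng.standard_normal(d)
            dst[key] = new / np.linalg.norm(new)
        t += np.outer(dst[key], vec)
    return t


eye = np.eye(4)
vocab1 = {'A': eye[0], 'B': eye[1], 'C': eye[2]}
vocab2 = {'A': eye[0], 'B': eye[1]}  # read-only "subset"
t = transform_to(vocab1, vocab2, dst_read_only=True)
```

With the read-only target, 'C' simply maps to zero instead of triggering the ValueError from the example above.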
And yes, in the long run, we want to get rid of the magic transform_to and be explicit about it when making the SPA Actions, but that's a separate PR.
This looks good to me now! Seems to work well, and is a nice improvement over the current situation. In the long run I still need to get rid of the auto-transform, but that's for the future. :)
Cool cool... is there anything in here worth mentioning in the changelog?
I think just mentioning that spa.Vocabulary objects now have a read_only flag that is set for subvocabs.
👍
You can remove
Thanks, will do :)
LGTM!
Added subset vocabulary and readonly attribute for standard vocabs.
Other changes:
- Subset vocabularies now reference parent vocabulary.
- Subset vocabularies now handle transform_to correctly.
- transform_to also modified to handle readonly vocabularies.
- Removed ability to add items to vocabulary subsets with readonly.
- Added extend method for adding multiple keys. Extend can optionally create unitary vectors with unitary flag.
Redid implementation of vocabulary subsets. Subsets now reference the parent vocabulary and handle transform_to correctly (they don't add new terms as they did before). Also added a read_only flag for vocabularies to prevent unwanted addition of items to vocabularies.