Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spa subvocab #699

Merged
merged 2 commits into from
Mar 30, 2016
Merged

Spa subvocab #699

merged 2 commits into from
Mar 30, 2016

Conversation

xchoo
Copy link
Member

@xchoo xchoo commented Apr 16, 2015

Redid implementation of vocabulary subsets. Subsets now reference the parent vocabulary, and handles transform_to correctly (doesn't add new terms as it did before). Also added a read_only flag for vocabularies to prevent unwanted addition of items to vocabularies.

@xchoo
Copy link
Member Author

xchoo commented Apr 16, 2015

@tcstewar Please review! =D

self.read_only = False

# The parent of a non-subset vocabulary is itself
self.parent = self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... this seems like a red flag to me that this implementation is overly complex. Do you really need parent references? It makes managing memory and ensuring that these object get garbage collected very complicated...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Hah. This is so that I can shortcut the comparison in the transform_to function. XD

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't Python garbage collection handle cyclical references?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's optional and not guaranteed. It's not so much the garbage collection that I'm worried about, it just seems like an antipattern to me so my eyebrow instinctively raises. If there's absolutely no other way to do this, then so be it, but I suspect there's another way that ends up being simpler and easier.

@tcstewar
Copy link
Contributor

I think I'm closer to understanding what this Subset stuff is being used for, but I there are still a few questions in my mind:

  • What is the use case for adding terms to a VocabularySubset? All the examples I've heard of the subset use (helpers for plotting, and creating AssociativeMemorys) don't require adding terms. It seems to me like a VocabularySubset should be read_only. That would eliminate the behaviour where adding terms to a subset automatically adds those terms to the parent (which seems a bit surprising to me, but that could just be because I'm not seeing the use case).
  • Should VocabularySubset be a subclass of Vocabulary, or should we get rid of it entirely and just have create_subset return a suitably configured Vocabulary instance?
  • What are the use cases for extend_subset? If this functionality is needed, shouldn't it be in the base class?

@xchoo
Copy link
Member Author

xchoo commented Apr 19, 2015

  • What is the use case for adding terms to a VocabularySubset?

I was thinking of use cases outside of the networks we build in nengo. It is conceivable that people can use the subsets to do purely mathematical operations on vocabularies. And I can see that in some cases (sequential programming, or reusing the same subset for different purposes), subsets can be extended. I very nearly put in the ability to remove items from the vocabulary, but I didn't (yet.. haha)

  • Should VocabularySubset be a subclass of Vocabulary, or should we get rid of it entirely and just
    have create_subset return a suitably configured Vocabulary instance?

Even with the read_only functionality of the Vocabularies, this will create problems when connecting two modules with different subsets. Suppose I had this:

num = spa.Buffer(vocab=num_subset)  # num_subset = {'ONE', 'TWO', 'THR'} of vocab
pos = spa.Buffer(vocab=pos_subset)  # pos_subset = {'P1', 'P2', 'P3'} of vocab
mem = spa.Buffer(vocab=vocab)  # vocab = {'ONE', 'TWO', ..., 'P1', 'P2', ...} + PAIRS

actions = spa.Actions("mem = num * pos")
cortical = spa.Cortical(actions)

In this case, because everything is (essentially) using the one vocabulary, there should be no transforms between either of the connections. If the subset returned a read-only vocabulary, the conv_effect will add a transform from (e.g. the num buffer) {'ONE', 'TWO', 'THR'} of num_subset to {'ONE', 'TWO', 'THR'} of vocab (which is not an identity matrix), which then adds noise to the calculations.

  • What are the use cases for extend_subset? If this functionality is needed, shouldn't it be in the base class?

See point 1 for use cases. Should it be in the base class? Maybe? I just added it as a shortcut for vocab.parse('+'.join(keys)).

@tcstewar
Copy link
Contributor

I was thinking of use cases outside of the networks we build in nengo. It is conceivable that people can use the subsets to do purely mathematical operations on vocabularies. And I can see that in some cases (sequential programming, or reusing the same subset for different purposes), subsets can be extended. I very nearly put in the ability to remove items from the vocabulary, but I didn't (yet.. haha)

Hmm... I have a fairly strong resistance to adding functionality based on possible conceivable future uses, rather than on actual use-cases that have come up. I'd prefer to add functionality as we need it, rather than trying to guess what functionality we might want in the future.

@tcstewar
Copy link
Contributor

In this case, because everything is (essentially) using the one vocabulary, there should be no transforms between either of the connections.

Ah, I think this example helps clarify things for me. In the style of programming I was thinking of, I would definitely have each of those using the same vocab, since, as you point out, they are all using the same vocabulary, and so I think the code should actually indicate that. However, the way you're using vocabs, you're wanting to indicate something about the eventual visual display: you only want {'ONE', 'TWO', 'THR'} to be shown for num and {'P1', 'P2', 'P3'} to be shown for pos. And so you're defining subsets to do that. Is that a fair summary of what you're doing?

If so, I've always been working from the assumption that if you want identity transforms then you should be using the same vocabulary on your components. And if you want to control aspects of the visual display, you should do that outside of the core model code, since it's just about the visual display.

So I'm a bit worried about this change. I think we should be working to make the spa module clearer and with less automatic magic (or at least a small set of particular magic that we are explicit about), rather than adding new magic. This change feels like new magic to me, but it could just be because I'm not thinking about it in the right way.

@tcstewar
Copy link
Contributor

Should VocabularySubset be a subclass of Vocabulary, or should we get rid of it entirely and just have create_subset return a suitably configured Vocabulary instance?
Even with the read_only functionality of the Vocabularies, this will create problems when connecting two modules with different subsets.

On re-reading your comment, I meant to refer to a different issue with this question. I was meaning that VocabularySubset shouldn't exist. There'd still be the parent reference in Vocabulary, though, so transform_to would still be perfectly capable of handling this case (indeed, that's exactly what it does right now).

@tcstewar
Copy link
Contributor

Should it be in the base class? Maybe? I just added it as a shortcut for vocab.parse('+'.join(keys)).

I think that's a useful functionality to have. I was just wondering why it was only being put in the VocabularySubset -- that looked to me like it was an indication that it was something you only wanted to do in a Subset. But I'm also still unconvinced that Subset should exist at all, so I'm a bit weird. ;) But we certainly do that vocab.parse('+'.join(keys)) thing a fair bit, and this would be much clearer, so I think it should be added to Vocabulary.

@xchoo
Copy link
Member Author

xchoo commented Apr 19, 2015

Looking at my code, I actually don't use subsets within the model itself. So I've merged the subset logic (parent + the read only code) back into the Vocabulary class. I'm also of the opinion that within the model, all of the vocabularies (for modules using the same vocabulary) should not be using subsets, but it is a concept that should be better documented with the spa modules. I can see new users using subsets for modules because "this module should only be used for XXX subset" (when really it doesn't make a difference).

List of semantic pointer names to be added to the vocabulary.

"""
self.parse('+'.join(keys))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for clarity, it might be nicer to implement this as

for k in keys:
    self[k]

or perhaps even

for k in keys:
    if k not in self.keys:
        self.add(k, self.create_pointer())

or even

def extend(self, keys, unitary=False):
    for k in keys:
        if k not in self.keys:
            self.add(k, self.create_pointer(unitary=unitary))

@tcstewar
Copy link
Contributor

Looking at my code, I actually don't use subsets within the model itself. So I've merged the subset logic (parent + the read only code) back into the Vocabulary class. I'm also of the opinion that within the model, all of the vocabularies (for modules using the same vocabulary) should not be using subsets, but it is a concept that should be better documented with the spa modules. I can see new users using subsets for modules because "this module should only be used for XXX subset" (when really it doesn't make a difference).

I like the look of it merged back in like that. :) And I think you're completely right that this subset concept is something that needs serous documentation and clarity. That said, I think this current PR is a clear improvement (the read_only parameter seems very useful). I think in a future PR we might look into moving away from subsets and towards explicit specification of sets of keys (but keeping the same underlying Vocabulary, so we'd do things like specifying what terms to show in the display, or specifying what terms to include in the AssociativeMemory). But that's for future work. :)

@xchoo
Copy link
Member Author

xchoo commented Apr 19, 2015

The associative memory has already been re-coded to use a vocabulary and list of keys (see #702).

@tcstewar
Copy link
Contributor

The associative memory has already been re-coded to use a vocabulary and list of keys (see #702).

Sweet! :)

List of semantic pointer names to be added to the vocabulary.

"""
if is_iterable(unitary):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this if/elif block do exactly? To me it reads like unitary can be a boolean or an iterable. Is that true? If so, why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is exactly how it reads. It reflects the behaviour of the unitary parameter in the constructor where you can either pass it a boolean (in which case, all vectors created will be / not be unitary), or a list of strings (in which case only semantic pointers that are within said list will be unitary).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, that should be documented because it is not obvious that unitary excepts a list of strings.

@hunse
Copy link
Collaborator

hunse commented Sep 14, 2015

@xchoo, @tcstewar: What's the status of this? @tcstewar, are you happy with it?

@hunse
Copy link
Collaborator

hunse commented Jan 28, 2016

Can this be rebased and @xchoo and @tcstewar sign off so we can get it done?

@xchoo
Copy link
Member Author

xchoo commented Jan 28, 2016

Busy at the moment. Earliest I can look at this is next monday.

@tbekolay
Copy link
Member

I've rebased this and squashed the commits together. Definitely the commit message should be looked at, but other than that, would be good for a re-review from @tcstewar and/or @xchoo!

# If this vocabulary is a subset vocabulary, apply operation on
# parent vocab.
self.parent.add(key, p)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this fall-back here for the read_only case? This allows this sort of thing to happen, which seems confusing to me:

import nengo.spa

v = nengo.spa.Vocabulary(16)
v.extend(['A', 'B', 'C'])
v2 = v.create_subset(['A', 'B'])
v2.add('Z', v2.create_pointer())

print 'v.keys', v.keys
print 'v2.keys', v2.keys
v.keys ['A', 'B', 'C', 'Z']
v2.keys ['A', 'B']

It seems to me that it should raise a ValueError, rather than trying to create it in the parent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to handle the cases where something is added to a vocabulary that is a subset of another vocabulary. Basically because of the magic that happens in the transform_to function (it add keys from one vocabulary to another) of the Vocabulary class.

If we made it a ValueError this would fail:

vocab1 = Vocabulary(32)
vocab1.parse('A+B+C')
vocab2 = vocab1.create_subset(['A', 'B'])

with spa.SPA() as model:
    model.state_in = spa.State(vocab=vocab1)
    model.am = spa.AssociativeMemory(vocab2)
    model.state_out = spa.State(vocab=vocab1)

    cortical_actions = spa.Actions('am=state_in', 'state_out=am')

    model.cortical = spa.Cortial(cortical_actions)

The transform_to system will try to add 'C' to vocab2 and fail with the ValueError.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a note that this might not actually happen with the conditional code below. But there may be cases where this will happen (i.e. the transform_to magic adds keys from one vocab to another)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should get rid of the transform_to magic and be explicit about transforms. But that should be the scope of another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the discussion in the meeting, I think the solution for now is to adjust this PR such that if it's read only then it always raises the ValueError, but then also modify transform_to so that it handles that gracefully (basically if a vocab is read only and it doesn't have a key, then skip that key rather than trying to generate it).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, in the long run, we want to get rid of the magic transform_to and be explicit about it when making the SPA Actions, but that's a separate PR.

@tcstewar
Copy link
Contributor

This looks good to me now! Seems to work well, and is a nice improvement over the current situation. In the long run I still need to get rid of the auto-transform, but that's for the future. :)

@tbekolay
Copy link
Member

Cool cool... is there anything in here worth mentioning in the changelog?

@tcstewar
Copy link
Contributor

is there anything in here worth mentioning in the changelog?

I think just mentioning that spa.Vocabulary objects now have a read_only flag that is set for subvocabs.

@tbekolay
Copy link
Member

OK, I added a few fixups here, to change the ValueError to ReadonlyError, rename read_only to readonly to be consistent with the rest of Nengo and add a changelog entry. @xchoo and @tcstewar I'll merge if these look good to both of you!

@xchoo
Copy link
Member Author

xchoo commented Mar 30, 2016

👍

@xchoo
Copy link
Member Author

xchoo commented Mar 30, 2016

You can remove
"- Adding items to subset vocabularies (i.e. read-only vocabularies
with a parent vocabulary) now applies this action to the parent
vocabulary instead of throwing an exception."
from the commit message. That change was reverted.

@tbekolay
Copy link
Member

Thanks, will do :)

@tcstewar
Copy link
Contributor

LGTM!

Added subset vocabulary and readonly attribute for standard vocabs.

Other changes:

- Subset vocabularies now reference parent vocabulary.
- Subset vocabularies now handle transform_to correctly.
- transform_to also modified to handle readonly vocabularies.
- Removed ability to add items to vocabulary subsets with readonly.
- Added extend method for adding multiple keys.
  Extend can optionally create unitary vectors with unitary flag.
@tbekolay tbekolay merged commit f88ff75 into master Mar 30, 2016
@tbekolay tbekolay deleted the spa_subvocab branch March 30, 2016 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants