Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add strictbidict that throws exception on any duplicate value assignment #21

Closed
lordmauve opened this issue Nov 24, 2015 · 9 comments
Closed

Comments

@lordmauve
Copy link
Collaborator

I see considerable value preventing people accidentally creating mappings that are not invertible. In our codebase, there are several hundred instances of code like

PRODUCT_REMAP = {... hundreds of lines ...}
REVERSE_PRODUCT_REMAP = dict((v, k) for k, v in PRODUCT_REMAP.iteritems())

A concern with these is that literals that were accidentally written to include duplicate values would create an incorrect inverse mapping.

Using bidict does little to improve matters. In code like that below, one of those keys will "win":

d = bidict({
     'foo': 1,
     ... hundreds of lines ...
     'bar': 1
})

This has the advantage that the inverse mapping will genuinely reflect the forward mapping, but it carries the disadvantage the mapping for one of the keys is lost. In many cases this is worse - for example, when the forward mapping is fundamental and the inverse mapping is just informational. The only reasonable solution in cases like these is to throw an exception for the programmer to deal with.

I suggest creating a "strictbidict" in which _put() throws a CollapseException if any value is duplicated, rather than dropping the previous key that maps to that value.

@jab
Copy link
Owner

jab commented Nov 25, 2015

I think there's some confusion in terminology here. Reviewing the docs here and here, I use the term "collapse" to describe only one very precise situation: a bidict already contains mappings (a, b) and (c, d), and the user attempts to insert the additional mapping (a, d).

By this definition, the scenario you're describing is not a collapse, but rather the overwrite of a key mapping to an existing value.

If Python itself found it worthwhile to offer some kind of strictdict, which raised an exception for something like dict([(1, 1), (1, 2)]) rather than turning it into {1: 2}, bidict probably would have had a strictbidict from the start. But since dict doesn't do this, it's not clear whether this is out of scope for bidict, and should be handled in the user's program.

I'll keep thinking about this though and will post here with any updates. And if you'd like to take a crack at a PR with a well-tested implementation of your idea to make a more convincing case, I'd be happy to take a look.

@jab jab closed this as completed Nov 25, 2015
@lordmauve
Copy link
Collaborator Author

I understood the collapsing terminology (after some analysis and experimentation). I guess the error class could be named something different.

The overwriting case is different than with a dict, because dict allows one type of overwriting (replacing the value for a key). Bidict allows replacing the key for a value using a key assignment, which might be surprising.

The value-overwrite operation might be useful in cases where bidicts are mutated after creation, but as with dict, most of the use cases I can see would be around simply creating a mapping that is expected to be invertible, and using it to look up things. In these cases protecting me from unexpected value collisions is the desired behaviour.

@jab
Copy link
Owner

jab commented Nov 25, 2015

In case you missed it, it sounds like frozenbidict could help you with some of what you're after.

Since that doesn't get you all the way there, let's discuss this further. The proposal is to add a strictbidict class that raises an exception (say, OneToOneException) in the following cases:

>>> strictbidict({1: 1, 2: 1})
Traceback (most recent call last):
    ...
OneToOneException: ((1, 1), (2, 1))
>>> s = strictbidict()
>>> s.update({1: 1, 2: 1})
Traceback (most recent call last):
    ...
OneToOneException: ((1, 1), (2, 1))
>>> strictbidict([(1, 1), (1, 2)])
Traceback (most recent call last):
    ...
OneToOneException: ((1, 1), (1, 2))

On the other hand, the following cases should succeed:

>>> # python drops one of these before bidict ever sees it, nothing we can do about it:
>>> strictbidict({1: 1, 1: 2})
strictbidict({1: 2})
>>> # overwriting is still supported:
>>> s = strictbidict({1: 1})
>>> s[1] = 2
>>> s.inv[2] = 3

There could also be a strictfrozenbidict to combine the behaviors of both.

Is this along the lines of what you're thinking? Please look through the entire bidict API (including MutableMapping), and give as many additional examples as possible to specify the behavior you're looking for.

@jab jab reopened this Nov 25, 2015
@jab
Copy link
Owner

jab commented Nov 25, 2015

Any rationale for adding this change could also argue for baking this directly into bidict.bidict rather than a separate class, and moving the current, looser behavior—that is, allowing non-1-to-1 inits and updates to silently proceed—into collapsingbidict. What do you think?

@lordmauve
Copy link
Collaborator Author

I think you've nailed exactly what I was thinking, except I imagined this case not raising an exception:

>>> strictbidict([(1, 1), (1, 2)])
strictbidict({1: 2})

This preserves this aspect from the dict documentation:

|  dict(iterable) -> new dictionary initialized as if via:
|      d = {}
|      for k, v in iterable:
|          d[k] = v

As you say, it makes sense to me for the "default" behaviour to be strict, and for particular ways of resolving key/value collisions to be subclasses.

@lordmauve
Copy link
Collaborator Author

I think what I'm getting at is a set of semantics where keys collisions overwrite (just like dict) but value collisions raise an exception. And then, naturally, the inverse is true for d.inv.

@jab
Copy link
Owner

jab commented Nov 27, 2015

Let me try on how I might document each of these two options.

Option 1: "If you try to instantiate or call update on a bidict with some non-bijective mappings (either some key maps to multiple values, or vice versa), bidict will raise a OneToOneConflict, to help you catch this early and give you a chance to decide how you want to resolve the conflict. If you want bidict to resolve the conflict automatically by having the last one win (like how dict([(1, 1), (1, 2)]) behaves), use bidict.loosebidict." (Would be renamed from collapsingbidict.)

Option 2: As above, except take out "or vice versa", and add: "Note that, in keeping with dict's behavior, bidict does not raise OneToOneConflict when multiple keys map to the same value, but rather silently lets the last one win. OneToOneConflict is only raised when the same key maps to multiple values."

Is it totally obvious which of these two options is better?

Let's back up. What we're talking about here are different resolution strategies, both for different keys mapping to the same value ("DKSV"), and for the same key mapping to different values ("SKDV")—e.g. last one wins, always raise, etc.

As for bidict.bidict's default behavior, you're proposing a different resolution strategy for "DKSV" than for "SKDV", in the same data structure. If it weren't for dict's behavior, that would seem unprincipled, but I can understand if the argument is that users are used to losing mappings for "SKDV" with dict, so bidict should follow that by default; but for "DKSV", they're not used to losing mappings, so bidict should raise. But I wonder if it's more principled to have a symmetric strategy for "DKSV" and "SKDV" by default, and results in less cognitive overhead to boot. What do you think?

@jab
Copy link
Owner

jab commented Nov 28, 2015

Guava's BiMap may provide some useful prior art. BiMap's put method allows overwriting the value associated with an existing key, but if you try to overwrite the key associated with an existing value, it throws an exception. You can use forcePut to overwrite always. This sounds more like Option 2 above.

Now that #25 is merged, I'm going to start a branch to experiment with this.

@jab
Copy link
Owner

jab commented Nov 28, 2015

Branch is at https://github.com/jab/bidict/tree/strict, PR at #26.

Correspondingly updated docs (temporarily) at: https://bidict.readthedocs.org/en/strict/

I ended up implementing Option 2 and being more like Guava's BiMap: attempting to overwrite the key of an existing mapping now causes ValueExistsException (which replaces CollapseException as it's more general), unless using forceput or a loosebidict (which replaces collapsingbidict).

@lordmauve, please let me know if this satisfies your request, or you have any other feedback.

jab added a commit that referenced this issue Dec 20, 2015
See the changelog for full details.

Closes #21.
jab added a commit that referenced this issue Dec 20, 2015
See the changelog for full details.

Closes #21.
jab added a commit that referenced this issue Dec 20, 2015
See the changelog for full details.

Closes #21.
jab added a commit that referenced this issue Dec 20, 2015
See the changelog for full details.

Closes #21.
jab added a commit that referenced this issue Dec 20, 2015
See the changelog for full details.

Closes #21.
jab added a commit that referenced this issue Dec 21, 2015
See the changelog for full details.

Closes #21.
jab added a commit that referenced this issue Dec 21, 2015
See the changelog for full details.

Closes #21.
@jab jab closed this as completed in 2074f4b Dec 21, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants