Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove duplicate checks to increase performance and add functionality #37

Closed
ofek opened this issue Jul 26, 2016 · 4 comments
Closed

remove duplicate checks to increase performance and add functionality #37

ofek opened this issue Jul 26, 2016 · 4 comments

Comments

@ofek
Copy link

ofek commented Jul 26, 2016

Would a pull request be considered that would make the inverse dict a defaultdict(list) therefore allowing that same functionality as a regular map? Removing the duplicate checks would also speed up bidict.

@jab
Copy link
Owner

jab commented Jul 28, 2016

Would a pull request be considered that would make the inverse dict a defaultdict(list) therefore allowing that same functionality as a regular map?

I'm not sure what you mean by "therefore allowing that same functionality as a regular map", given that the inverse of a bidict already implements the Mapping interface:

>>> from collections import Mapping
>>> b = bidict(H='hydrogen')
>>> isinstance(b.inv, Mapping)
True

Maybe it would help if you gave an example of the code you'd like to be able to write, and an explanation of how the current implementation is hindering you.

In any case, be aware that under the hood, a bidict and its inverse are both just wrappers around the same two backing dicts, with just the pointers to the forward and the inverse dicts swapped. Also note that b.inv.inv is b is a fundamental desired property that we've had for ages, and which changing b.inv as you're proposing would break. Any advantages you think this would give us would have to outweigh the disadvantages of making this change.

Removing the duplicate checks would also speed up bidict.

If you skip the duplication checks, you end up with an incorrect implementation that suffers from bugs like this:

>>> b = bidict({1: 2})
>>> b.inv
bidict({2: 1})
>>> b[3] = 2
>>> b.inv
bidicit({2: 3})
>>> b  # this should be bidict({3: 2}), but with no dup check, you miss your chance to make sure the bidict stays consistent, and produce incorrect results:
bidict({1: 2, 3: 2})

Did you read any of the pertinent documentation? https://bidict.readthedocs.io/basic-usage.html#uniqueness-of-values and https://bidict.readthedocs.io/addendum.html#performance include relevant rationale about safety and performance.

Thanks for your interest in bidict and for taking the time to propose improvements you're interested in. At this point I don't think there's anything here but you're more than welcome to clarify.

@ofek
Copy link
Author

ofek commented Jul 28, 2016

Thanks for your kind reply!

What I meant by "therefore allowing that same functionality as a regular map" is that it is advantageous, even expected, to allow different keys to have the same value.

I'm not proposing simply removing the checks. I'm thinking you could store the inverse dict's values in a list or set, something like:

from collections import defaultdict

class BidirectionalMapping(Mapping):
    def __init__(self, *args, **kw):
        ...
        self._inv = defaultdict(list)  # Could be set instead, but list can maintain
                                                 # insert order which may be handy
        ...


class bidict(BidirectionalMapping, MutableMapping):
    def __delitem__(self, key):
        del self._fwd[key]
        inv = self._inv  # save dot-lookup time in more common no duplicate case

        inv[val].remove(key)
        if not inv[val]:
            del inv[val]

    def __setitem__(self, key, val):
        self._fwd[key] = val
        self._inv[val].append(key)  # Maintains insert order

@jab
Copy link
Owner

jab commented Jul 29, 2016

it is advantageous, even expected, to allow different keys to have the same value.

Citation needed :)

Here are some counterexamples:

  • Google Guava's [BiMap](https://google.github.io/guava/releases/19.0/api/docs/com/google/common/collect/BiMap.html#put%28K, V%29) - put() "throws IllegalArgumentException if the given value is already bound to a different key in this bimap"

  • Apache Commons Collections' BidiMap - "This map enforces the restriction that there is a 1:1 relation between keys and values, meaning that multiple keys cannot map to the same value."

  • Boost's BiMap:

    bm.insert( bm_type::value_type( 1, "one" ) );
    bm.insert( bm_type::value_type( 1, "1"   ) ); // No effect!
    bm.insert( bm_type::value_type( 2, "one" ) ); // No effect!
    
  • Literally the definition of "bidirectional map" on Wikipedia - "In computer science, a bidirectional map, or hash bag, is an associative data structure in which the (key,value) pairs form a one-to-one correspondence. Thus the binary relation is functional in each direction: value can also act as a key to key. A pair (a,b) thus provides a unique coupling between a and b so that b can be found when a is used as a key and a can be found when b is used as a key."

(((
Also in your code snippet inv[val].remove(key) would cause AttributeError if inv were a defaultdict(list), since defaultdict's don't implement a remove() method. If it were a set, which does implement remove(), then the following line, if not inv[val]:, would raise "TypeError: 'set' object does not support indexing".
)))

I'm going to close this as it still seems pretty off the mark, but I'm grateful for your interest in bidict and hope you find it useful!

@jab jab closed this as completed Jul 29, 2016
@jab
Copy link
Owner

jab commented Aug 2, 2016

@Ofekmeister I just came across @mahmoud's boltons library for the first time and noticed it has something that looks a lot more like what you're looking for:

https://boltons.readthedocs.io/en/latest/dictutils.html#boltons.dictutils.OrderedMultiDict.inverted

Wish I'd realized this sooner so I could have referred you to it right off the bat. For future reference, what you're talking about (where the same key can be associated with multiple values) is called a "multidict", and is commonly seen in http-related or -inspired libraries, e.g. [1,2].

bidict is intended to address a different set of use cases from the ones that multidicts address. But it looks like you can use boltons to get what you're looking for.

Hope this helps!

[1] https://github.com/aio-libs/multidict/
[2] http://werkzeug.pocoo.org/docs/0.11/datastructures/#werkzeug.datastructures.MultiDict

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants