# Multiset

### _aka_ bag, mset

### _pl._ multizbiór, wielozbiór

**Plan**

1. Definition of multiset

2. Implementation of multiset

3. Use cases of multiset

# Definition

It's "a modification of the concept of a set that, unlike a set, **allows for multiple instances for each of its elements**". [[source](https://en.wikipedia.org/wiki/Multiset)]

But what does _instance of an element_ mean? "Instance" is not a programming term here. It means, more or less, an occurrence, which is not very helpful either. So let's say that two instances (occurrences) of an element are two values that are **equal to each other**.

In [1]:
# two occurrences of `1` are... `1` and `1`, because:
assert 1 == 1

# ;D

To put it simply: **a multiset is a set that can keep multiple equal elements**.

So this is a multiset: `⟨1, 1, 2⟩`.

Also, the multisets `⟨1, 1, 2⟩`, `⟨1, 2, 1⟩` and `⟨2, 1, 1⟩` are identical, because the order of elements does not matter.

# Implementation

What about the implementation?

In terms of Python's collections, a multiset is similar to:

- `set` (as keeping insertion order is not required)

- `list`/`tuple` (as both keep multiple "equal" elements)

## Multiset as `list` / `tuple`

First option: `list` (or `tuple`). It keeps insertion order, which a multiset does not require but does not forbid either.

So `[1, 1, 2, 2, 3]` and `[3, 1, 1, 2, 2]` would both represent the same multiset.

**Problems**

1. There is no out-of-the-box way to check for equality (`[1, 2, 1] == [1, 1, 2]` returns `False`).

2. We don't have any (optimized) set operations out-of-the-box.

3. We cannot benefit from set's great feature: constant time (`O(1)` on average) membership checking.

So generally implementing multiset using `list` or `tuple` sux.
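
The problems above can be seen in a short sketch. Sorting both lists is one possible workaround for the comparison, assuming the elements can be ordered against each other; the variable names are only illustrative.

```python
a = [1, 2, 1]
b = [1, 1, 2]

# Lists compare element by element, so order matters.
lists_equal = a == b

# Workaround: sort both sides first. This costs O(n log n) and
# only works when the elements can be ordered against each other.
equal_as_multisets = sorted(a) == sorted(b)

# Membership checking and counting both scan the list: O(n).
has_two = 2 in a
ones = a.count(1)

print(lists_equal, equal_as_multisets, has_two, ones)
```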

## Multiset as `dict`

Let's try a different approach: a dict whose keys are the multiset's elements and whose values are lists of all elements equal to the key.

So the multiset of `[42, 42, 51, 51, 60]` would be:

In [2]:
 {
     42: [42, 42],
     51: [51, 51],
     60: [60],
 }

{42: [42, 42], 51: [51, 51], 60: [60]}

But why bother building a list of repeated identical elements if we can keep only a count of them?

In this implementation the multiset of `[41, 41, 52, 52, 60]` would be:

In [3]:
{
    41: 2,
    52: 2,
    60: 1,
}

{41: 2, 52: 2, 60: 1}

We would increment the count when adding an element to the multiset and decrement it when removing one.
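
A minimal sketch of that idea on a plain `dict`. The helper names `add` and `remove` are hypothetical, and dropping a key once its count reaches zero is an assumption about the desired behaviour.

```python
def add(counts, item):
    """Increment the count of item, starting from 0 if it is new."""
    counts[item] = counts.get(item, 0) + 1


def remove(counts, item):
    """Decrement the count of item and drop the key when it reaches 0."""
    if item not in counts:
        raise KeyError(item)
    counts[item] -= 1
    if counts[item] == 0:
        del counts[item]


counts = {}
for value in [41, 41, 52, 52, 60]:
    add(counts, value)
snapshot = dict(counts)  # state after all additions

remove(counts, 60)  # last occurrence, so the key disappears
remove(counts, 41)  # one occurrence of 41 is left
print(snapshot, counts)
```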

## Multiset as `Counter`

It turns out that we already have this kind of datatype in Python: `Counter`.

In [4]:
from collections import Counter

In [5]:
my_fruit = ['apple', 'banana', 'pear', 'apple', 'apple']

my_fruit_counter = Counter(my_fruit)

my_fruit_counter

Counter({'apple': 3, 'banana': 1, 'pear': 1})

In [6]:
my_fruit_counter['banana'] += 4
my_fruit_counter

Counter({'apple': 3, 'banana': 5, 'pear': 1})

In [7]:
'pear' in my_fruit_counter

True

### Constant time (`O(1)`) membership checking

In [38]:
large_counter = Counter(range(10**6 + 1))
number = 10**6
%timeit number in large_counter

106 ns ± 2.18 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In fact `Counter` inherits from `dict`, so it's not surprising ;)

In [39]:
# compared to list
large_list = list(range(10**6 + 1))
number = 10**6
%timeit number in large_list

11 ms ± 26.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
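
Because `Counter` is a `dict` subclass, it shares the hash-table lookup, with one difference worth knowing: a missing key counts as zero. A small sketch (the fruit values are illustrative):

```python
from collections import Counter

counter = Counter(['apple', 'apple', 'pear'])

is_dict_subclass = issubclass(Counter, dict)

# Looking up a missing key returns 0 instead of raising KeyError...
missing_count = counter['kiwi']

# ...and the lookup does not add the key.
still_absent = 'kiwi' not in counter

print(is_dict_subclass, missing_count, still_absent)
```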


### Counter operations

In [10]:
apple_apple_pear_banana = Counter('apple apple pear banana'.split())
apple_apple_pear_banana

Counter({'apple': 2, 'pear': 1, 'banana': 1})

In [11]:
pear_pear_orange = Counter('pear pear orange'.split())
pear_pear_orange

Counter({'pear': 2, 'orange': 1})

#### Equality

In [12]:
apple_apple_pear_banana == pear_pear_orange

False

In [13]:
apple_apple_pear_banana == Counter('pear banana apple apple'.split())

True

#### Add

Add counts from two counters.

In [14]:
apple_apple_pear_banana + pear_pear_orange

Counter({'apple': 2, 'pear': 3, 'banana': 1, 'orange': 1})

#### Subtract

Subtract counts, but keep only results with positive counts.

In [15]:
apple_apple_pear_banana - pear_pear_orange

Counter({'apple': 2, 'banana': 1})

In [16]:
pear_pear_orange - apple_apple_pear_banana

Counter({'pear': 1, 'orange': 1})
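
For contrast, `Counter` also has a `subtract()` method, which works in place and keeps zero and negative counts. A short sketch using the same fruit (variable names are illustrative):

```python
from collections import Counter

left = Counter('apple apple pear banana'.split())
right = Counter('pear pear orange'.split())

# The operator builds a new Counter and drops counts <= 0.
difference = left - right

# The method changes `left_copy` in place and keeps every key.
left_copy = left.copy()
left_copy.subtract(right)

# Unary plus strips the non-positive counts again.
positive_only = +left_copy

print(difference, left_copy, positive_only)
```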

#### Union

Union is the maximum of the corresponding counts in the two input counters.

In [17]:
apple_apple_pear_banana | pear_pear_orange

Counter({'apple': 2, 'pear': 2, 'banana': 1, 'orange': 1})

#### Intersection

Intersection is the minimum of corresponding counts.

In [18]:
apple_apple_pear_banana & pear_pear_orange

Counter({'pear': 1})

#### Update

Like `dict.update()` but add counts instead of replacing them.


In [19]:
c = Counter('Monty')
c

Counter({'M': 1, 'o': 1, 'n': 1, 't': 1, 'y': 1})

In [20]:
c.update('Python')
c

Counter({'M': 1, 'o': 2, 'n': 2, 't': 2, 'y': 2, 'P': 1, 'h': 1})

#### Enumerating all elements

In [21]:
list(apple_apple_pear_banana.elements())

['apple', 'apple', 'pear', 'banana']

#### Most common elements

In [22]:
c = Counter('Do you use static type hints in your code?')

In [23]:
c.most_common()

[(' ', 8),
 ('o', 4),
 ('t', 4),
 ('y', 3),
 ('u', 3),
 ('s', 3),
 ('e', 3),
 ('i', 3),
 ('c', 2),
 ('n', 2),
 ('D', 1),
 ('a', 1),
 ('p', 1),
 ('h', 1),
 ('r', 1),
 ('d', 1),
 ('?', 1)]

In [24]:
c.most_common(3)

[(' ', 8), ('o', 4), ('t', 4)]

### Counter pros and cons

#### Pros

- blazingly fast membership checking (vs list/tuple)

- equality checking (vs list/tuple)

- additional operations: `+`, `-`, `&`, `|` (vs dict)

#### Cons

- Counter is a dict, so we cannot store there elements that are not hashable (vs list/tuple).

- It's useless when we want to store all occurrences of equal elements (vs dict).
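
The first point can be checked directly. The sketch below assumes the unhashable elements are lists that can be converted to tuples; the data is made up for illustration.

```python
from collections import Counter

rows = [[1, 2], [1, 2], [3, 4]]

try:
    Counter(rows)
    error = None
except TypeError as exc:
    error = exc  # lists are mutable, so they are not hashable

# Workaround: convert each row to an immutable tuple first.
row_counts = Counter(tuple(row) for row in rows)

print(type(error).__name__, row_counts)
```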

So what if we want to store all occurrences of equal elements?

In [25]:
class Person:    
    def __init__(self, id_, name, nationality):
        self.id_ = id_
        self.name = name
        self.nationality = nationality
    
    def __hash__(self):
        return hash((self.id_, self.name))
    
    def __eq__(self, other):
        if isinstance(other, Person):
            return self.id_ == other.id_ and self.name == other.name
        return False
    
    def __repr__(self):
        return f'Person({self.id_}, {self.name}, {self.nationality})'

In [26]:
p1 = Person(id_=1, name='Bob', nationality='US')
p2 = Person(id_=1, name='Bob', nationality='UK')
p3 = Person(id_=2, name='Kasia', nationality='PL')
p4 = Person(id_=3, name='Taras', nationality='UA')

In [27]:
Counter([p1, p2, p3, p4])

Counter({Person(1, Bob, US): 2,
         Person(2, Kasia, PL): 1,
         Person(3, Taras, UA): 1})
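
The output above hides a detail: the counter stores only the first of the two equal objects as its key, so the second Bob (the one from the UK) is lost. A condensed sketch, with `Person` repeated only so that the block runs on its own:

```python
from collections import Counter


class Person:
    """Condensed copy of the class above: equality ignores nationality."""

    def __init__(self, id_, name, nationality):
        self.id_ = id_
        self.name = name
        self.nationality = nationality

    def __hash__(self):
        return hash((self.id_, self.name))

    def __eq__(self, other):
        if isinstance(other, Person):
            return self.id_ == other.id_ and self.name == other.name
        return False


p1 = Person(id_=1, name='Bob', nationality='US')
p2 = Person(id_=1, name='Bob', nationality='UK')

counter = Counter([p1, p2])

# elements() repeats the stored key, so both results are the p1 object.
kept = list(counter.elements())
nationalities = [person.nationality for person in kept]
print(nationalities)
```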

## Multiset with many occurrences of equal elements

A wrapper around `defaultdict` that groups equal elements in lists keyed by their hash.

In [47]:
from collections import defaultdict

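class Multiset:
    """Multiset that keeps every occurrence, grouped in lists keyed by hash."""

    def __init__(self, items=None):
        self.__data = defaultdict(list)
        self.update(items or [])

    def __repr__(self):
        return f'Multiset({dict(self.__data)})'

    def __contains__(self, item):
        # Look inside the bucket: a matching hash alone does not prove
        # membership (hash collisions, buckets emptied by remove()).
        # .get() avoids creating an empty bucket as a side effect.
        return item in self.__data.get(hash(item), [])

    def __eq__(self, other):
        if not isinstance(other, Multiset):
            return NotImplemented
        return self.__data == other.__data

    def update(self, items):
        for item in items:
            self.__data[hash(item)].append(item)

    def remove(self, item):
        # Removes the first occurrence equal to item; the bucket is kept
        # even when it becomes empty. Raises ValueError if item is absent.
        self.__data[hash(item)].remove(item)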

In [48]:
Multiset('abracadabra')

Multiset({-1782423740351546067: ['a', 'a', 'a', 'a', 'a'], -1471118788425205866: ['b', 'b'], 8133629476857026148: ['r', 'r'], -8287738640347290112: ['c'], 3806206302144811855: ['d']})

In [49]:
10 in Multiset([5, 7, 10])

True

In [50]:
print(p1)
print(p2)
print(p3)
print(p4)

Person(1, Bob, US)
Person(1, Bob, UK)
Person(2, Kasia, PL)
Person(3, Taras, UA)


In [51]:
p1 == p2

True

In [52]:
m1 = Multiset([10, 21, 21, 32])
m2 = Multiset([10, 21, 21, 32])
m1 == m2

True

In [53]:
people = Multiset()
people

Multiset({})

In [33]:
people.update([p1])
people

Multiset({-251910950974417478: [Person(1, Bob, US)]})

In [34]:
people.update([p2])
people

Multiset({-251910950974417478: [Person(1, Bob, US), Person(1, Bob, UK)]})

In [35]:
people.update([p3])
people

Multiset({-251910950974417478: [Person(1, Bob, US), Person(1, Bob, UK)], 124825052286908048: [Person(2, Kasia, PL)]})

In [36]:
people.remove(p1)
people

Multiset({-251910950974417478: [Person(1, Bob, UK)], 124825052286908048: [Person(2, Kasia, PL)]})

In [37]:
people.remove(p2)
people

Multiset({-251910950974417478: [], 124825052286908048: [Person(2, Kasia, PL)]})

### Pros and cons

#### Pros

- Fast membership checking.

- We can store multiple occurrences of equal items.

#### Cons

- No set operations included -- we need to add them ourselves.

- It's a dict too -- we cannot store there elements that are not hashable (vs list/tuple).
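
As an illustration of the first point, here is one way a "sum" operation could look. This is only a sketch on bare `defaultdict(list)` buckets keyed by hash, the same layout the `Multiset` class uses; the helper names `make_buckets` and `multiset_sum` are hypothetical.

```python
from collections import defaultdict


def make_buckets(items):
    """Group items into lists keyed by their hash."""
    buckets = defaultdict(list)
    for item in items:
        buckets[hash(item)].append(item)
    return buckets


def multiset_sum(left, right):
    """Return new buckets holding every occurrence from both inputs."""
    result = defaultdict(list)
    for buckets in (left, right):
        for key, occurrences in buckets.items():
            result[key].extend(occurrences)
    return result


a = make_buckets([10, 21, 21])
b = make_buckets([21, 32])
total = multiset_sum(a, b)

# Flatten the buckets to inspect the combined contents.
all_items = sorted(item for bucket in total.values() for item in bucket)
print(all_items)
```

The inputs are left untouched because `extend` copies the occurrences into fresh lists.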



## Multiset use cases

### Counter

### Multiset with copies