# Multiset

### _aka_ bag, mset

### _pl._ multizbiór, wielozbiór

**Plan**

1. Definition of multiset

2. Implementation of multiset

3. Use cases of multiset

## Definition

It's "a modification of the concept of a set that, unlike a set, **allows for multiple instances for each of its elements**". [[source](https://en.wikipedia.org/wiki/Multiset)]

But what does _instance of an element_ mean? "Instance" is not a programming term here. It means, more or less, occurrence, which... is not very helpful either. So let's say that two instances (occurrences) of an element are simply two elements that are **equal to each other**.

```python
# two occurrences of `1` are... `1` and `1`, because:
assert 1 == 1

# ;D
```

To put it simply: **a multiset is a set that can keep multiple equal elements**.

So this is a multiset: `⟨1, 1, 2⟩`.

Also, order doesn't matter: `⟨1, 1, 2⟩`, `⟨1, 2, 1⟩` and `⟨2, 1, 1⟩` are three ways of writing the same multiset.
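Python's built-in `set` shows exactly what a multiset adds on top of a set: a set collapses equal elements, so the information that `1` occurred twice is lost. A quick check:

```python
# A set keeps only one instance of each element: the second 1 is silently dropped.
as_set = {1, 1, 2}
assert as_set == {1, 2}
assert len(as_set) == 2

# The multiset ⟨1, 1, 2⟩ has to remember how many times each element occurs:
# 1 occurs twice and 2 once, so it has three elements in total.
occurrences = {1: 2, 2: 1}
assert sum(occurrences.values()) == 3
```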

## Implementation

What about the implementation?

In terms of Python's collections, a multiset is similar to:

- `set` (as keeping insertion order is not required)

- `list`/`tuple` (as both keep multiple "equal" elements)

### Multiset as `list` / `tuple`

First option: `list` (or `tuple`). It keeps insertion order, which a multiset does not require (but does not forbid either).

So `[1, 1, 2, 2, 3]` and `[3, 1, 1, 2, 2]` would both represent the same multiset.

**Problems**

1. There is no out-of-the-box way to compare lists as multisets (`[1, 2, 1] == [1, 1, 2]` returns `False`).

2. We don't get any (optimized) set operations, such as union or intersection, out of the box.

3. We cannot benefit from one of set's great features: average-case constant-time (`O(1)`) membership checking. `x in some_list` scans the list, which is `O(n)`.

So, generally, implementing a multiset with `list` or `tuple` sucks.
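If you do stick with lists, multiset equality can be emulated by sorting both sides before comparing. A minimal sketch (the helper name `same_multiset` is hypothetical), which assumes the elements are mutually orderable and costs `O(n log n)` per comparison:

```python
def same_multiset(a, b):
    """Compare two lists as multisets: same elements, same counts, any order.

    Hypothetical helper; assumes all elements can be compared with `<`,
    otherwise sorted() raises TypeError.
    """
    return sorted(a) == sorted(b)


assert same_multiset([1, 2, 1], [1, 1, 2])
assert not same_multiset([1, 2], [1, 1, 2])  # different number of 1s

# Membership is still a linear scan over the whole list: O(n).
assert 2 in [1, 1, 2]
```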

### Multiset as `dict`

Let's try a different approach: a dict whose keys are the distinct elements of the multiset and whose values are lists of all the elements equal to that key.

So the multiset `[42, 42, 51, 51, 60]` would be:

```python
{
    42: [42, 42],
    51: [51, 51],
    60: [60],
}
```

```
{42: [42, 42], 51: [51, 51], 60: [60]}
```

But why bother building a list of repeated, equal elements if we can just keep their count?

In this implementation, the multiset `[41, 41, 52, 52, 60]` would be:

```python
{
    41: 2,
    52: 2,
    60: 1,
}
```

```
{41: 2, 52: 2, 60: 1}
```

We would increment the count when adding an element to the multiset, and decrement it when removing one.
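A minimal sketch of that idea, using a hypothetical `DictMultiset` class built on a plain `dict`. It implements only adding, removing, counting, size, membership and equality, and it drops a key as soon as its count reaches zero, so that `in` keeps meaning "occurs at least once":

```python
class DictMultiset:
    """Hypothetical multiset backed by a plain dict: element -> count."""

    def __init__(self, iterable=()):
        self._counts = {}
        for item in iterable:
            self.add(item)

    def add(self, item):
        # Increment the count, starting from 0 for an element seen for the first time.
        self._counts[item] = self._counts.get(item, 0) + 1

    def remove(self, item):
        # Decrement the count; raises KeyError if the element is not present.
        count = self._counts[item]
        if count == 1:
            del self._counts[item]  # forget the element once its count reaches 0
        else:
            self._counts[item] = count - 1

    def count(self, item):
        return self._counts.get(item, 0)

    def __contains__(self, item):
        return item in self._counts  # a dict lookup: O(1) on average

    def __len__(self):
        return sum(self._counts.values())  # total number of instances

    def __eq__(self, other):
        if not isinstance(other, DictMultiset):
            return NotImplemented
        return self._counts == other._counts  # dict equality ignores order


ms = DictMultiset([41, 41, 52, 52, 60])
ms.remove(41)
ms.remove(41)
assert 41 not in ms
assert ms.count(52) == 2
assert len(ms) == 3
assert ms == DictMultiset([60, 52, 52])
```

Deleting zero-count keys is a design choice of this sketch; without it, `__contains__` would have to check the count instead of just the key.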

### Multiset as `Counter`

It turns out that we already have this kind of datatype in Python: `Counter`.

```python
from collections import Counter
```

```python
my_fruit = ['apple', 'banana', 'pear', 'apple', 'apple']

my_fruit_counter = Counter(my_fruit)

my_fruit_counter
```

```
Counter({'apple': 3, 'banana': 1, 'pear': 1})
```

```python
my_fruit_counter['banana'] += 4
my_fruit_counter
```

```
Counter({'apple': 3, 'banana': 5, 'pear': 1})
```
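The `+=` above would work even for an element that is not in the counter yet, because `Counter` returns 0 for a missing key instead of raising `KeyError` like a plain `dict` does. A short check, reusing the fruit list from above:

```python
from collections import Counter

fruit_counter = Counter(['apple', 'banana', 'pear', 'apple', 'apple'])

# Reading a missing element gives 0, and the read does not insert it.
assert fruit_counter['kiwi'] == 0
assert 'kiwi' not in fruit_counter

# So incrementing a brand-new element needs no special case.
fruit_counter['kiwi'] += 1
assert fruit_counter['kiwi'] == 1

# A plain dict raises KeyError for the same operation.
plain = {'apple': 3}
try:
    plain['kiwi'] += 1
    raised = False
except KeyError:
    raised = True
assert raised
```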

```python
'pear' in my_fruit_counter
```

```
True
```
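One caveat when treating `Counter` as a multiset: `in` checks whether the *key* exists, not whether its count is positive. A count can drop to zero (or below) while the key stays in the counter; unary `+` returns a new counter with only the positive counts:

```python
from collections import Counter

basket = Counter(['apple', 'pear'])
basket['pear'] -= 1           # 'pear' stays as a key, now with count 0

assert 'pear' in basket       # membership looks at keys, not counts
assert basket['pear'] == 0

# Unary plus keeps only positive counts, i.e. the "real" multiset.
clean = +basket
assert 'pear' not in clean
assert clean == Counter(['apple'])
```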

#### Constant time (`O(1)`) membership checking

```python
large_counter = Counter(range(10**6 + 1))
number = 10**6
%timeit number in large_counter
```

```
109 ns ± 2.45 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```


In fact, `Counter` is a subclass of `dict`, so `in` is an ordinary hash-table lookup. Not surprising ;)

```python
# compared to list
large_list = list(range(10**6 + 1))
number = 10**6
%timeit number in large_list
```

```
10.8 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
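`%timeit` is an IPython magic, so it only works in IPython/Jupyter. In a plain Python script, a comparable measurement can be sketched with the standard-library `timeit` module. Absolute numbers depend on the machine (and the `lambda` call adds a little overhead to both); only the gap between the two matters:

```python
import timeit
from collections import Counter

N = 10**6
large_counter = Counter(range(N + 1))
large_list = list(range(N + 1))
target = N  # last element: the worst case for a linear scan of the list

# Both containers give the same answer...
assert target in large_counter and target in large_list

# ...but the Counter does a hash lookup, while the list compares element by element.
runs = 20
counter_time = timeit.timeit(lambda: target in large_counter, number=runs)
list_time = timeit.timeit(lambda: target in large_list, number=runs)
print(f"Counter: {counter_time / runs * 1e9:.0f} ns per lookup")
print(f"list:    {list_time / runs * 1e6:.0f} µs per lookup")
```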


#### Counter operations

```python
apple_apple_pear_banana = Counter('apple apple pear banana'.split())
apple_apple_pear_banana
```

```
Counter({'apple': 2, 'pear': 1, 'banana': 1})
```

```python
pear_pear_orange = Counter('pear pear orange'.split())
pear_pear_orange
```

```
Counter({'pear': 2, 'orange': 1})
```

##### Equality

```python
apple_apple_pear_banana == pear_pear_orange
```

```
False
```

```python
apple_apple_pear_banana == Counter('pear banana apple apple'.split())
```

```
True
```
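Because `Counter` equality only needs the elements to be hashable, not orderable, it also works where sorting two lists and comparing them would fail, for example with a mix of types that Python 3 refuses to order:

```python
from collections import Counter

mixed_a = [1, 'a', 1]
mixed_b = ['a', 1, 1]

# Order-insensitive comparison of mixed-type collections.
assert Counter(mixed_a) == Counter(mixed_b)
assert Counter(mixed_a) != Counter(['a', 1])  # different number of 1s

# Sorting a mix of ints and strings raises TypeError in Python 3.
try:
    sorted(mixed_a)
    sorted_ok = True
except TypeError:
    sorted_ok = False
assert not sorted_ok
```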

##### Add

Add counts from two counters.

```python
apple_apple_pear_banana + pear_pear_orange
```

```
Counter({'apple': 2, 'pear': 3, 'banana': 1, 'orange': 1})
```
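In multiset terms, `+` is the *sum*: the result contains every instance from both operands, which is the same as counting the two underlying lists concatenated together:

```python
from collections import Counter

first = 'apple apple pear banana'.split()
second = 'pear pear orange'.split()

# Adding counters is the same as counting the concatenation of their elements.
assert Counter(first) + Counter(second) == Counter(first + second)
assert (Counter(first) + Counter(second))['pear'] == 3
```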

##### Subtract

Subtract counts, but keep only elements whose resulting count is positive.

```python
apple_apple_pear_banana - pear_pear_orange
```

```
Counter({'apple': 2, 'banana': 1})
```

```python
pear_pear_orange - apple_apple_pear_banana
```

```
Counter({'pear': 1, 'orange': 1})
```
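Dropping non-positive counts is what makes `-` behave as a multiset difference. `Counter` also has a `subtract()` method, which modifies the counter in place and keeps zero and negative counts, so it is better thought of as plain per-key arithmetic:

```python
from collections import Counter

a = Counter('apple apple pear banana'.split())
b = Counter('pear pear orange'.split())

# The operator returns a new Counter with only positive counts.
assert a - b == Counter({'apple': 2, 'banana': 1})

# subtract() works in place and keeps zero/negative counts.
a.subtract(b)
assert a['pear'] == -1    # 1 - 2
assert a['orange'] == -1  # 0 - 1
assert a['apple'] == 2
```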

##### Union

Union takes, for each element, the maximum of its counts in the two input counters.

```python
apple_apple_pear_banana | pear_pear_orange
```

```
Counter({'apple': 2, 'pear': 2, 'banana': 1, 'orange': 1})
```
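Spelled out by hand, the union takes every element that appears in either counter and keeps the larger of its two counts. A sketch with a hypothetical `union_by_hand` helper that agrees with `|` for counters with positive counts:

```python
from collections import Counter


def union_by_hand(a, b):
    """Hypothetical helper: element-wise maximum of counts (missing elements count as 0)."""
    return Counter({key: max(a[key], b[key]) for key in a.keys() | b.keys()})


a = Counter('apple apple pear banana'.split())
b = Counter('pear pear orange'.split())
assert union_by_hand(a, b) == a | b
```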

##### Intersection

Intersection is the minimum of corresponding counts.

```python
apple_apple_pear_banana & pear_pear_orange
```

```
Counter({'pear': 1})
```
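When every count is 1, `&` and `|` give the same elements as the corresponding `set` operations, which is a handy way to see that multisets generalize sets. With repeated elements, the counts start to matter:

```python
from collections import Counter

left, right = 'abc', 'bcd'  # no repeated letters, so every count is 1

assert set((Counter(left) & Counter(right)).elements()) == set(left) & set(right)
assert set((Counter(left) | Counter(right)).elements()) == set(left) | set(right)

# With repeats: 'a' appears min(3, 2) == 2 times in the intersection.
assert Counter('aaab') & Counter('aab') == Counter('aab')
```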

##### Update

Like `dict.update()`, but adds counts instead of replacing them.

```python
c = Counter('Monty')
c
```

```
Counter({'M': 1, 'o': 1, 'n': 1, 't': 1, 'y': 1})
```

```python
c.update('Python')
c
```

```
Counter({'M': 1, 'o': 2, 'n': 2, 't': 2, 'y': 2, 'P': 1, 'h': 1})
```
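The difference from `dict.update()` is easiest to see with a mapping as the argument: a plain `dict` overwrites the value, while `Counter` adds to it:

```python
from collections import Counter

plain = {'o': 1}
plain.update({'o': 10})
assert plain['o'] == 10   # dict: value replaced

counts = Counter({'o': 1})
counts.update({'o': 10})
assert counts['o'] == 11  # Counter: count added
```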

##### Enumerating all elements

```python
list(apple_apple_pear_banana.elements())
```

```
['apple', 'apple', 'pear', 'banana']
```
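`elements()` yields each element as many times as its count, and silently skips elements whose count is zero or negative. Counting the expanded elements again therefore gives back only the positive part of the counter (the same thing unary `+` returns):

```python
from collections import Counter

c = Counter(a=2, b=0, c=-1)
assert list(c.elements()) == ['a', 'a']  # 'b' and 'c' are skipped

# Round trip: counting the expanded elements keeps only positive counts.
assert Counter(c.elements()) == +c
```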

##### Most common elements

```python
c = Counter('Do you use static type hints in your code?')
```

```python
c.most_common()
```

```
[(' ', 8),
 ('o', 4),
 ('t', 4),
 ('y', 3),
 ('u', 3),
 ('s', 3),
 ('e', 3),
 ('i', 3),
 ('c', 2),
 ('n', 2),
 ('D', 1),
 ('a', 1),
 ('p', 1),
 ('h', 1),
 ('r', 1),
 ('d', 1),
 ('?', 1)]
```

```python
c.most_common(3)
```

```
[(' ', 8), ('o', 4), ('t', 4)]
```
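`most_common()` returns pairs in descending order of count (ties keep the order in which elements were first seen), so the least common elements can be taken from the end of the full list. The `collections` documentation suggests the slice `c.most_common()[:-n-1:-1]` for the `n` least common ones:

```python
from collections import Counter

c = Counter('abracadabra')  # a: 5, b: 2, r: 2, c: 1, d: 1

assert c.most_common(1) == [('a', 5)]

n = 2
least_common = c.most_common()[:-n - 1:-1]  # last n pairs, in reverse order
assert least_common == [('d', 1), ('c', 1)]
```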

#### Pros

- blazingly fast membership checking (vs list)

- additional operations: `+`, `-`, `&`, `|` (vs dict)

- order-insensitive equality checking (vs list)

#### Cons

- `Counter` is a `dict`, so it can only store hashable elements (no lists, dicts or sets as elements).
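For example, a list of lists cannot be counted directly. One workaround, assuming the inner lists can be treated as immutable values, is to convert each of them to a `tuple` first:

```python
from collections import Counter

points = [[0, 1], [2, 3], [0, 1]]

try:
    Counter(points)
    counted_directly = True
except TypeError:  # unhashable type: 'list'
    counted_directly = False
assert not counted_directly

# Tuples are hashable, so they can be used as Counter keys.
point_counts = Counter(tuple(p) for p in points)
assert point_counts[(0, 1)] == 2
```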

## Counter use cases

???