# Multiset

### _aka_ bag, mset

### _pl._ multizbiór, wielozbiór

**Plan**

1. Definition of multiset

2. Implementation of multiset

3. Use cases of multiset

# Definition

It's "a modification of the concept of a set that, unlike a set, **allows for multiple instances for each of its elements**". [[source](https://en.wikipedia.org/wiki/Multiset)]

But what does _instance of an element_ mean? "Instance" is not a programming term here. It means, more or less, an occurrence, which... is not very helpful either. So let's say that two instances (occurrences) of an element are two values that are **equal to each other**.

In [1]:
# two occurrences of `1` are... `1` and `1`, because:
assert 1 == 1

# ;D

To put it simply: **multiset is a set that can keep multiple equal elements**. 

So this is a multiset: `⟨1, 1, 2⟩`.

Also, the multisets `⟨1, 1, 2⟩`, `⟨1, 2, 1⟩` and `⟨2, 1, 1⟩` are identical: the order of elements does not matter.
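To see why a plain `set` is not enough, here is a small sketch that uses lists as stand-ins for multisets. A `set` collapses duplicates, while comparing sorted copies ignores order but keeps multiplicities. Sorting is only an illustrative trick here and assumes the elements can be compared with each other.

```python
a = [1, 1, 2]   # ⟨1, 1, 2⟩
b = [2, 1, 1]   # ⟨2, 1, 1⟩, the same multiset written in another order
c = [1, 2]      # ⟨1, 2⟩, a different multiset

# A set forgets how many times each element occurs...
assert set(a) == set(c)

# ...while sorted copies keep multiplicities and ignore order.
assert sorted(a) == sorted(b)
assert sorted(a) != sorted(c)
```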

# Implementation

What about the implementation?

In terms of Python's collections, multiset is similar to: 

- `set` (as keeping insertion order is not required)

- `list`/`tuple` (as both keep multiple "equal" elements)

## Multiset as `list` / `tuple`

First option: `list` (or `tuple`). It keeps insertion order, which a multiset does not need (but which does no harm either).

So `[1, 1, 2, 2, 3]` and `[3, 1, 1, 2, 2]` would both represent the same multiset.

**Problems**

1. There is no out-of-the-box way to check two multisets for equality (`[1, 2, 1] == [1, 1, 2]` returns `False`, because list comparison is order-sensitive).

2. We don't get any (optimized) set operations out of the box.

3. We cannot benefit from one of `set`'s great features: average constant-time (`O(1)`) membership checking. `x in some_list` scans the whole list, which is `O(n)`.

So generally implementing multiset using `list` or `tuple` sux.
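The usual workaround for problem 1 is to compare sorted copies. A minimal sketch follows; `same_multiset` is a hypothetical helper, not a standard function. It costs `O(n log n)` and only works when the elements can be ordered against each other.

```python
def same_multiset(a, b):
    """Hypothetical helper: compare two lists as multisets by sorting copies."""
    return sorted(a) == sorted(b)


assert same_multiset([1, 2, 1], [1, 1, 2])
assert not same_multiset([1, 2, 1], [1, 2, 2])

# Python 3 refuses to order mixed types such as int and str,
# so the sorting trick breaks down for heterogeneous multisets.
try:
    same_multiset([1, 'a'], ['a', 1])
    mixed_types_ok = True
except TypeError:
    mixed_types_ok = False

assert not mixed_types_ok
```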

## Multiset as `dict`

Let's try a different approach: a `dict` whose keys are the distinct elements of the multiset and whose values are lists of all the elements equal to that key.

So the multiset `[42, 42, 51, 51, 60]` would be:

In [None]:
{
    42: [42, 42],
    51: [51, 51],
    60: [60],
}
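Building this structure from a plain list takes a single pass. A sketch using `dict.setdefault` (one of several reasonable choices; `defaultdict(list)` would work just as well):

```python
items = [42, 42, 51, 51, 60]

# Group equal elements under a shared key; setdefault creates the list
# the first time a key is seen.
groups = {}
for item in items:
    groups.setdefault(item, []).append(item)

print(groups)  # {42: [42, 42], 51: [51, 51], 60: [60]}
```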

But why bother building a list of repeated equal elements when we can just keep a count of them?

In this implementation, the same multiset `[42, 42, 51, 51, 60]` would be:

In [None]:
{
    42: 2,
    51: 2,
    60: 1,
}

We would increment the count when adding an element to the multiset and decrement it when removing one.
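A minimal sketch of those two operations on a plain `dict` of counts. The `add` and `remove` functions are hypothetical names chosen for illustration; this version of `remove` also deletes a key once its count drops to zero, so the dict never holds zero counts.

```python
def add(counts, item):
    # Hypothetical helper: start at 0 for unseen items, then increment.
    counts[item] = counts.get(item, 0) + 1


def remove(counts, item):
    # Hypothetical helper: raises KeyError if `item` is absent, like set.remove.
    counts[item] -= 1
    if counts[item] == 0:
        del counts[item]


counts = {}
for x in [42, 42, 51, 51, 60]:
    add(counts, x)

remove(counts, 60)
remove(counts, 42)
print(counts)  # {42: 1, 51: 2}
```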

## Multiset as `Counter`

It turns out that we already have this kind of datatype in Python: `Counter`.

In [None]:
from collections import Counter

In [None]:
my_fruit = ['apple', 'banana', 'pear', 'apple', 'apple']

my_fruit_counter = Counter(my_fruit)

my_fruit_counter

In [None]:
my_fruit_counter['banana'] += 4
my_fruit_counter

In [None]:
'pear' in my_fruit_counter
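Two `Counter` behaviours are worth knowing when treating it as a multiset (both are described in the `collections` documentation). Looking up a missing element returns `0` instead of raising `KeyError`, and decrementing a count to zero does not remove the key, so `in` still reports it. Unary `+` strips such zero (and negative) entries. A short sketch on a fresh counter:

```python
from collections import Counter

fruit_counter = Counter(['apple', 'banana', 'pear', 'apple', 'apple'])

# Missing elements have an implicit count of 0 instead of raising KeyError.
assert fruit_counter['kiwi'] == 0
assert 'kiwi' not in fruit_counter  # reading a missing key does not insert it

# Decrementing to zero keeps the key, so `in` still reports it.
fruit_counter['pear'] -= 1
assert fruit_counter['pear'] == 0
assert 'pear' in fruit_counter

# Unary plus returns a new Counter without zero and negative counts.
fruit_counter = +fruit_counter
assert 'pear' not in fruit_counter
```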

### Constant time (`O(1)`) membership checking

In [None]:
large_counter = Counter(range(10**6 + 1))
number = 10**6
%timeit number in large_counter

In fact `Counter` inherits from `dict`, so it's not surprising ;)

In [None]:
# compared to list
large_list = list(range(10**6 + 1))
number = 10**6
%timeit number in large_list

### Counter operations

In [None]:
apple_apple_pear_banana = Counter('apple apple pear banana'.split())
apple_apple_pear_banana

In [None]:
pear_pear_orange = Counter('pear pear orange'.split())
pear_pear_orange

#### Equality

In [None]:
apple_apple_pear_banana == pear_pear_orange

In [None]:
apple_apple_pear_banana == Counter('pear banana apple apple'.split())

#### Add

Add counts from two counters.

In [None]:
apple_apple_pear_banana + pear_pear_orange

#### Subtract

Subtract counts, keeping only results with positive counts.

In [None]:
apple_apple_pear_banana - pear_pear_orange

In [None]:
pear_pear_orange - apple_apple_pear_banana
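Note that the `-` operator differs from the `Counter.subtract()` method: the operator returns a new counter without zero and negative results, while `subtract()` works in place and keeps them.

```python
from collections import Counter

a = Counter('apple apple pear banana'.split())
b = Counter('pear pear orange'.split())

# The `-` operator drops non-positive results...
assert a - b == Counter({'apple': 2, 'banana': 1})

# ...while the in-place subtract() method keeps them, including negatives.
a.subtract(b)
assert a['pear'] == -1
assert a['orange'] == -1
```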

#### Union

Union is the maximum of the corresponding counts in the input counters.

In [None]:
apple_apple_pear_banana | pear_pear_orange

#### Intersection

Intersection is the minimum of corresponding counts.

In [None]:
apple_apple_pear_banana & pear_pear_orange
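Both definitions can be checked by hand: taking the per-element maximum and minimum over all keys gives the same results as `|` and `&`. This is only an illustrative re-derivation, not how `Counter` implements the operators internally.

```python
from collections import Counter

a = Counter('apple apple pear banana'.split())
b = Counter('pear pear orange'.split())

keys = a.keys() | b.keys()  # every element present in either counter

# Union: per-element maximum. Intersection: per-element minimum, zeros dropped.
union = Counter({k: max(a[k], b[k]) for k in keys})
intersection = Counter({k: min(a[k], b[k]) for k in keys if min(a[k], b[k]) > 0})

assert union == a | b
assert intersection == a & b
```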

#### Update

Like `dict.update()`, but adds counts instead of replacing them.


In [None]:
c = Counter('Monty')
c

In [None]:
c.update('Python')
c
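A small side-by-side sketch of the difference, passing a mapping as the argument (which `Counter.update()` also accepts):

```python
from collections import Counter

c = Counter('Monty')
d = dict(c)  # a plain-dict snapshot of the same counts

# Counter.update adds the new count to the existing one...
c.update({'o': 2})
assert c['o'] == 3

# ...whereas dict.update overwrites the value.
d.update({'o': 2})
assert d['o'] == 2
```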

#### Enumerating all elements

In [None]:
list(apple_apple_pear_banana.elements())
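`elements()` repeats each element as many times as its count and skips elements whose count is below one. For positive counts, feeding the result back into `Counter` gives an equal counter:

```python
from collections import Counter

c = Counter({'apple': 2, 'pear': 1, 'kiwi': 0, 'plum': -1})

# Elements with a count below one are skipped.
assert sorted(c.elements()) == ['apple', 'apple', 'pear']

# For positive counts, elements() round-trips back to an equal Counter.
positive = +c
assert Counter(positive.elements()) == positive
```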

#### Most common elements

In [None]:
c = Counter('Do you use static type hints in your code?')

In [None]:
c.most_common()

In [None]:
c.most_common(3)
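When counts are tied, elements keep the order in which they were first encountered. The `collections` documentation also suggests slicing `most_common()` backwards to get the least common elements. A short sketch on `'abracadabra'`:

```python
from collections import Counter

c = Counter('abracadabra')

assert c.most_common(1) == [('a', 5)]

# 'b' and 'r' are tied at 2; 'b' was encountered first, so it comes first.
assert c.most_common(3) == [('a', 5), ('b', 2), ('r', 2)]

# The two least common elements, via c.most_common()[:-n-1:-1] with n = 2.
assert c.most_common()[:-3:-1] == [('d', 1), ('c', 1)]
```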

### Counter pros and cons

#### Pros

- blazingly fast membership checking (vs list/tuple)

- equality checking (vs list/tuple)

- additional operations: `+`, `-`, `&`, `|` (vs dict)

#### Cons

- `Counter` is a `dict`, so it cannot store elements that are not hashable (vs list/tuple).

- It keeps only one object per group of equal elements, plus a count, so it's useless when we want to store all occurrences of equal elements (vs dict of lists).
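The first con in practice: lists are unhashable, so `Counter` rejects them. One common workaround, sketched below, is converting each item to a hashable equivalent (here a `tuple`) before counting; whether such a conversion makes sense depends on your data.

```python
from collections import Counter

try:
    Counter([[1, 2], [1, 2]])
    lists_ok = True
except TypeError:  # lists are unhashable
    lists_ok = False

assert not lists_ok

# Convert mutable lists to hashable tuples before counting.
counts = Counter(tuple(x) for x in [[1, 2], [1, 2], [3]])
print(counts)
```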

## Multidict with many occurrences of equal elements

What if we do want to store all the elements, after all?

In [None]:
class Person:    
    def __init__(self, id_, name, nationality):
        self.id_ = id_
        self.name = name
        self.nationality = nationality
    
    def __hash__(self):
        return hash((self.id_, self.name))
    
    def __eq__(self, other):
        if isinstance(other, Person):
            return self.id_ == other.id_ and self.name == other.name
        return False
    
    def __repr__(self):
        return f'Person({self.id_}, {self.name}, {self.nationality})'

In [None]:
p1 = Person(id_=1, name='Bob', nationality='US')
p2 = Person(id_=1, name='Bob', nationality='UK')
p3 = Person(id_=2, name='Kasia', nationality='PL')
p4 = Person(id_=3, name='Taras', nationality='UA')

In [None]:
Counter([p1, p2, p3, p4])
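`p1` and `p2` are equal (same `id_` and `name`), so `Counter` counts them under a single key. A dict keeps the first key object it sees, so only `p1` survives and Bob's `'UK'` nationality is lost. A self-contained check, repeating the `Person` definition from above:

```python
from collections import Counter


class Person:
    # Same rules as above: equality and hash ignore `nationality`.
    def __init__(self, id_, name, nationality):
        self.id_ = id_
        self.name = name
        self.nationality = nationality

    def __hash__(self):
        return hash((self.id_, self.name))

    def __eq__(self, other):
        if isinstance(other, Person):
            return self.id_ == other.id_ and self.name == other.name
        return False


p1 = Person(id_=1, name='Bob', nationality='US')
p2 = Person(id_=1, name='Bob', nationality='UK')

counter = Counter([p1, p2])
stored = next(iter(counter))  # the only key in the counter

assert counter[p2] == 2        # p2 is found, because it equals p1...
assert stored is p1            # ...but only the first object is kept
assert stored.nationality == 'US'
```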

### Wrapper on `defaultdict`

In [None]:
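from collections import defaultdict

class Multiset:
    """Keeps every occurrence of equal elements, grouped in buckets by hash.

    Equal objects always have equal hashes, so all occurrences of an element
    land in the same bucket (a rare hash collision may also put unequal
    objects there; `list.remove` still compares with `==`).
    """
    def __init__(self):
        self.__data = defaultdict(list)
        
    def __repr__(self):
        return f'Multiset({dict(self.__data)})'
        
    def update(self, items):
        for item in items:
            self.__data[hash(item)].append(item)
            
    def remove(self, item):
        key = hash(item)
        # `.get()` avoids defaultdict silently creating an empty bucket
        # when `item` is not in the multiset.
        bucket = self.__data.get(key)
        if not bucket:
            raise ValueError(f'{item!r} not in multiset')
        # Removes the first stored element equal to `item`.
        bucket.remove(item)
        if not bucket:
            del self.__data[key]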

In [None]:
people = Multiset()
people

In [None]:
people.update([p3])
people

In [None]:
people.update([p1, p2])
people

In [None]:
people.remove(p1)
people

In [None]:
people.remove(p2)
people
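The wrapper above can add and remove elements, but it cannot yet answer "is this element here?" or "how many are there?". Below is a minimal sketch of how that could look with the same hash-bucket layout; `HashBucketMultiset`, `count` and `occurrences` are hypothetical names, not part of the original class. A lookup first finds the bucket by hash (average `O(1)`), then scans that usually tiny bucket with `==`.

```python
from collections import defaultdict


class HashBucketMultiset:
    """Hypothetical variant of `Multiset` with lookup helpers."""

    def __init__(self, items=()):
        self._data = defaultdict(list)
        for item in items:
            self._data[hash(item)].append(item)

    def occurrences(self, item):
        # All stored objects equal to `item`; .get() avoids creating empty buckets.
        return [x for x in self._data.get(hash(item), ()) if x == item]

    def count(self, item):
        return len(self.occurrences(item))

    def __contains__(self, item):
        return self.count(item) > 0

    def __len__(self):
        return sum(len(bucket) for bucket in self._data.values())


# 1 and 1.0 are equal (and hash the same) but are distinct objects,
# so both are kept, just like p1 and p2 above.
m = HashBucketMultiset([1, 1.0, 2])
assert 1 in m and 3 not in m
assert m.count(1) == 2
assert [type(x) for x in m.occurrences(1)] == [int, float]
assert len(m) == 3
```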

### Pros and cons

#### Pros

- ...

#### Cons

- ....



## Multiset use cases

### Counter

### Multidict with copies