$\newcommand{\is}{\mathrel{\mathop:=}}$
$\newcommand{\range}{\mathop{ran}}$
$\newcommand{\setof}[1]{\left \{ #1 \right \}}$
$\newcommand{\card}[1]{\left | #1 \right |}$
$\newcommand{\tuple}[1]{\left \langle #1 \right \rangle}$
$\newcommand{\emptytuple}{\left \langle \right \rangle}$
$\newcommand{\tuplecat}{\cdot}$
$\newcommand{\stringcat}{\cdot}$
$\newcommand{\emptystring}{\varepsilon}$
$\newcommand{\String}[1]{\mathit{#1}}$
$\newcommand{\LeftEdgeSymbol}{\rtimes}$
$\newcommand{\RightEdgeSymbol}{\ltimes}$
$\newcommand{\LeftEdge}{\LeftEdgeSymbol}$
$\newcommand{\RightEdge}{\RightEdgeSymbol}$
$\newcommand{\mult}{\times}$
$\newcommand{\multisum}{\uplus}$
$\newcommand{\multimult}{\otimes}$
$\newcommand{\freqsymbol}{\mathrm{freq}}$
$\newcommand{\freq}[1]{\freqsymbol(#1)}$
$\newcommand{\prob}{P}$
$\newcommand{\counts}[2]{\card{#2}_{#1}}$
$\newcommand{\inv}[1]{#1^{-1}}$
$\newcommand{\Lex}{\mathit{Lex}}$
$\newcommand{\length}[1]{\left | #1 \right |}$
$\newcommand{\suc}{S}$
$\newcommand{\sprec}{<}$
$\newcommand{\Rcomp}[2]{#1 \circ #2}$
$\newcommand{\domsymbol}{\triangleleft}$
$\newcommand{\idom}{\domsymbol}$
$\newcommand{\pdom}{\domsymbol^+}$
$\newcommand{\rdom}{\domsymbol^*}$
$\newcommand{\indegree}[1]{\mathrm{in(#1)}}$
$\newcommand{\outdegree}[1]{\mathrm{out(#1)}}$
$\newcommand{\cupdot}{\cup\mkern-11.5mu\cdot\mkern5mu}$
$\newcommand{\mymatrix}[1]{\left ( \matrix{#1} \right )}$
$\newcommand{\id}{\mathrm{id}}$

**Prerequisites**

- sets (notation, operations, comparisons)

# Sets with counters: Multisets

Sets are abstract mathematical objects that we may think of as collections in a loose sense.
They are sometimes likened to bags, but this can be misleading.
While bags convey the intuition that sets are unordered, they are unlike sets in that idempotency does not hold: a bag containing two apples is not the same thing as a bag containing one apple.
Bags thus are closer to what mathematicians call **multisets**.
Like sets, multisets are unordered, but idempotency does not hold; a multiset can contain multiple instances of the same object.

<div class=definition>
    A **multiset** is a set where each element has a numerosity.
    We indicate that a set $S$ is a multiset by subscripting it with $M$, as in $S_M$.
    For all $s \in S_M$, $S_M(s)$ indicates the numerosity or **count** of $s$ in $S_M$.
    We may write $S_M(s) = 0$ instead of $s \notin S_M$.
</div>
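
The count function $S_M(s)$ from the definition can be mimicked with Python's `collections.Counter`. The following is only a minimal sketch: the variable names `S`, `count_a`, `count_b`, and `count_c` are chosen here for illustration, and a `Counter` is one convenient stand-in for a multiset rather than the only possible one.

```python
from collections import Counter

# A small multiset, written by listing each element as often as it occurs.
S = Counter(["a", "a", "b"])

count_a = S["a"]   # an element that occurs twice
count_b = S["b"]   # an element that occurs once
count_c = S["c"]   # an absent element: Counter reports 0 instead of raising KeyError

print(count_a, count_b, count_c)
print("c" in S)    # looking up "c" did not add it to the multiset
```

Looking up an element that does not occur returns a count of $0$, which mirrors the convention of writing $S_M(s) = 0$ instead of $s \notin S_M$.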

<div class=example>
    The multiset $\setof{a,a,b}_M$ contains two occurrences of $a$ and one occurrence of $b$.
    Hence $\setof{a,a,b}_M \neq \setof{a,b}_M$ even though $\setof{a,a,b} = \setof{a,b}$.
    However, it still holds that order does not matter: $\setof{a,a,b}_M = \setof{a,b,a}_M = \setof{b, a, a}_M$, just like $\setof{a,b} = \setof{b,a}$.
</div>

<div class=exercise>
Fill in the gaps with $=$ and $\neq$ as appropriate.

<ol>
<li>
$\setof{5, 5, 7, 8}_M \_ \setof{7, 5, 8, 7}_M$
</li>
<li>
$\setof{5, 3, 4} \_ \setof{5,3,4,4,3,5,5,4,3}$
</li>
<li>
$\setof{\text{peanut butter}, \text{jelly}}_M \_ \setof{\text{peanut butter}, \text{jelly}}_M$
</li>
<li>
$\setof{\text{John}, \text{John}, \text{John}}_M \_ \setof{\text{John}}_M$
</li>
<li>
$\setof{a}_M \_ \setof{a,a}$
</li>
</ol>
</div>

The notation for multisets is much less standardized than that for sets, and not everybody follows the convention of subscripting multisets with $M$.
If an author uses multisets, pay close attention to how they define their notation.
Also, it is often convenient to explicitly list the count of each element rather than listing the element multiple times.

<div class=example>
    The multiset $A_M \is \setof{a,a,a,b,b,c}_M$ can be more conveniently written as $A_M \is \setof{a: 3, b: 2, c: 1}$.
    Elements with a count of $0$ are usually omitted, but may be included if this is relevant information.
</div>
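
As a rough illustration of the two notations, the sketch below builds the same multiset once by listing elements and once from explicit counts, again using `collections.Counter` as a stand-in. The names `listed`, `counted`, `with_zero`, and `without_zero`, and the extra element `"d"`, are assumptions made for this example only.

```python
from collections import Counter

# The same multiset written in both notations.
listed = Counter(["a", "a", "a", "b", "b", "c"])
counted = Counter({"a": 3, "b": 2, "c": 1})
same = listed == counted

# elements() expands explicit counts back into a listing.
expanded = sorted(counted.elements())

# An explicit zero count is stored as given; unary plus drops it.
with_zero = Counter({"a": 3, "b": 2, "c": 1, "d": 0})
without_zero = +with_zero

print(same)
print(expanded)
print(sorted(without_zero))
```

Whether to keep an entry such as `"d": 0` is the same choice as in the written notation: it can be kept when the zero count is relevant and dropped otherwise.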

In [1]:
from collections import Counter

def set_equals(A, B):
    print("{} same set as {}?".format(A,B), set(A) == set(B))

def multiset_equals(A, B):
    print("{} same multiset as {}?".format(A,B), Counter(A) == Counter(B))

multiset_equals(["a", "a", "b", "b", "c"], ["a", "b", "c", "c", "d"])
set_equals(["a", "a", "b", "b", "c"], ["a", "b", "c", "c", "d"])
multiset_equals(["a", "a", "b", "b", "c"], ["a", "b", "c", "a", "b"])

['a', 'a', 'b', 'b', 'c'] same multiset as ['a', 'b', 'c', 'c', 'd']? False
['a', 'a', 'b', 'b', 'c'] same set as ['a', 'b', 'c', 'c', 'd']? False
['a', 'a', 'b', 'b', 'c'] same multiset as ['a', 'b', 'c', 'a', 'b']? True


<div class=exercise>
Represent all the multisets in the exercise above with explicit counts.
</div>

With this kind of notation, it also becomes possible to define multisets with set-builder notation.
For example, $\setof{n: 2n \mid n \in \mathbb{N}}$ is the multiset that contains $2n$ occurrences of every natural number $n$: $\setof{0:0, 1:2, 2:4, 3:6, \ldots}$.

In [2]:
multiset = Counter({n: 2*n for n in range(10)})
print(multiset)

Counter({9: 18, 8: 16, 7: 14, 6: 12, 5: 10, 4: 8, 3: 6, 2: 4, 1: 2, 0: 0})


<div class=exercise>
Write down the multiset defined by each set-builder expression.
These are not entirely straightforward, and you'll have to make some educated guesses as to how to handle special cases.

<ol>
<li>
$\setof{n: 10 - n \mid 0 \leq n \leq 10}$
</li>
<li>
$\setof{a: b, b: a \mid a,b \geq 0, a + b = 10}$
</li>
</ol>
</div>