UintSet
is an alternative set type in Elixir,
designed to hold sets of small, non-negative integers.
UintSet
emulates the full MapSet
interface,
except for MapSet.size
which is replaced by UintSet.length
.
Many of the UintSet
doctests and unit tests were adapted from MapSet
.
UintSet
illustrates the construction of a functional data structure from scratch,
implementing protocols—Inspect
, Enumerable
, Collectable
—and a stream.
All the content of an UintSet
is represented by a single integer,
which in Elixir is limited only by available memory.
Each bit in that integer represents one element:
a bit set to 1
at position n
means the number n
is present in the set.
iex> s = UintSet.new([0, 2, 3])
#UintSet<[0, 2, 3]>
iex> s.bits
13
iex> s.bits |> Integer.to_string(2)
"1101"
Using an integer as a bit vector we can use fast bitwise operators
for set operations like intersection and difference.
See the source code of UintSet.intersection
and UintSet.difference
.
Documentation with examples: https://hexdocs.pm/uint_set/UintSet.html.
This package was inspired by the excellent intset
example from chapter 6 of
The Go Programming Language, by Alan. A. A. Donovan and Brian W. Kernighan.
Here is how Donovan & Kernighan introduce the example:
A set represented by a map is very flexible but, for certain problems, a specialized representation may outperform it. For example, in domains such as dataflow analysis where set elements are small non-negative integers, sets have many elements, and set operations like union and intersection are common, a bit vector is ideal.
Storing a set as a bit vector is efficient only for sets of small integers, or high-density sets where a large percentage of the possible elements are present. The memory usage is proportional to the largest element stored, not to the number of elements in the set. For example, if the largest element is 1_000_000, the bit vector size will be 125_000 bytes (⅛), regardless of the number of elements in the set.
Donovan & Kerninghan's example uses an uint64[]
slice (dynamic array) to store elements in blocks of 64 bits. They loop over slices performing bitwise operations in each 64-bit block, which
in Elixir we do in a single expression like bits1 &&& bits2
because the bit vector is stored in a single integer. Go also has a big.Int
type, but they chose not to use it.