# Purely Functional Data Structures

#### Terminology
 - **Persistent** data structure - *always* preserves the previous version of itself when it is modified (even concurrently)
 - **Ephemeral** data structure - may lose its complexity guarantees (especially amortized bounds) when it is accessed and modified concurrently, even if it is implemented as immutable
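
The distinction can be illustrated with a minimal sketch, assuming a two-list queue similar to the batched queue from the Queues section. The names `Q`, `snocQ`, `tailQ` and `toListQ` are hypothetical and chosen only for this example. Every version of the queue stays valid after an update, but reusing an old version repeats the expensive step, which is what breaks amortized bounds.

```haskell
-- Illustrative two-list queue: a front list plus a reversed rear list.
data Q a = Q [a] [a]

-- Append to the back; an empty front is refilled immediately.
snocQ :: Q a -> a -> Q a
snocQ (Q [] r) x = Q (reverse (x : r)) []
snocQ (Q f r) x = Q f (x : r)

-- Drop the first element; the rear is reversed when the front runs out.
tailQ :: Q a -> Q a
tailQ (Q [] _) = error "Queue is empty"
tailQ (Q [_] r) = Q (reverse r) []
tailQ (Q (_ : f) r) = Q f r

toListQ :: Q a -> [a]
toListQ (Q f r) = f ++ reverse r

-- q has a one-element front and a long rear.
q :: Q Int
q = foldl snocQ (Q [] []) [1 .. 1000]

-- Both results coexist with q itself: no version is destroyed by an update.
-- However, each 'tailQ q' performs its own O(n) 'reverse' of the rear,
-- so the cost of the reversal is not shared between the two versions.
q1, q2 :: Q Int
q1 = tailQ q
q2 = tailQ q
```

Here `q`, `q1` and `q2` can all be inspected with `toListQ` independently of each other.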

## Queues

|    instance   | persistence | empty | isEmpty |     snoc     | head |     tail     |
|:-------------:|:-----------:|:-----:|:-------:|:------------:|:----:|:------------:|
| Batched Queue |  ephemeral  |  O(1) |   O(1)  |     O(1)     | O(1) | O(n) / O(1)* |

*\* amortized time*

In [None]:
import Prelude hiding (head, tail)  -- the class below reuses these names

class Queue q where
    
    -- | Construct new (empty) queue
    empty :: q a
    
    -- | Check whether a queue is empty
    isEmpty :: q a -> Bool
    
    -- | Append new item to the back of a queue
    snoc :: q a -> a -> q a
    
    -- | Retrieve the head item of a non-empty queue
    head :: q a -> a
    
    -- | Remove the head of a non-empty queue and retrieve the rear
    tail :: q a -> q a

### Batched Queue

In [None]:
data BatchedQueue a = BQ [a] [a]

-- | Helper function that maintains the 'BatchedQueue' invariant:
-- | *A queue is empty iff the front part is empty*
-- | 
-- | This invariant is preserved by reversing the rear and replacing the front
-- | whenever the front is empty.
checkf :: ([a], [a]) -> ([a], [a])
checkf ([], r) = (reverse r, [])
checkf q = q

instance Queue BatchedQueue where
    
    -- | Constructs new queue in O(1)
    empty = BQ [] []

    -- | Checks whether the queue is empty in O(1) steps.
    -- | Note: This queue maintains the invariant that if the front part is empty, the queue is empty
    -- |       check the 'checkf' helper function.
    isEmpty (BQ f _) = null f
    
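    -- | Appends new element to the back of the queue.
    -- | Runs in O(1) worst-case time: by the invariant the front is empty only when the
    -- | rear is empty too, so 'checkf' reverses at most the single new element here.
    snoc (BQ f r) x = let (f', r') = checkf (f, x : r) in BQ f' r'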
    
    -- | Retrieves the head of the queue in O(1)
    head (BQ [] _) = error "Queue is empty"
    head (BQ (x:_) _) = x
    
    -- | Removes the head of the queue and returns the rest.
    -- | Runs in O(n) in the worst case (when the rear is reversed), O(1) amortized.
    tail (BQ [] _) = error "Queue is empty"
    tail (BQ (_:f) r) = let (f', r') = checkf (f, r) in BQ f' r'
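
As a usage sketch, the following standalone version of the batched queue uses plain functions instead of the `Queue` class so that it compiles on its own. The `drain` helper and the `example` value are hypothetical additions for illustration.

```haskell
import Prelude hiding (head, tail)

data BatchedQueue a = BQ [a] [a]

-- Restore the invariant: the front is empty only if the whole queue is empty.
checkf :: [a] -> [a] -> BatchedQueue a
checkf [] r = BQ (reverse r) []
checkf f r = BQ f r

empty :: BatchedQueue a
empty = BQ [] []

isEmpty :: BatchedQueue a -> Bool
isEmpty (BQ f _) = null f

snoc :: BatchedQueue a -> a -> BatchedQueue a
snoc (BQ f r) x = checkf f (x : r)

head :: BatchedQueue a -> a
head (BQ [] _) = error "Queue is empty"
head (BQ (x : _) _) = x

tail :: BatchedQueue a -> BatchedQueue a
tail (BQ [] _) = error "Queue is empty"
tail (BQ (_ : f) r) = checkf f r

-- Hypothetical helper: read out all items from front to back.
drain :: BatchedQueue a -> [a]
drain q
  | isEmpty q = []
  | otherwise = head q : drain (tail q)

-- Interleave insertions with a removal; FIFO order is preserved.
example :: [Int]
example = drain (snoc (snoc (tail (snoc (snoc (snoc empty 1) 2) 3)) 4) 5)
```

In this sketch `example` evaluates to `[2,3,4,5]`: the item `1` is removed by `tail` before `4` and `5` are appended.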

## Sets

|      instance      | persistence | empty |   member  |   insert  |
|:------------------:|:-----------:|:-----:|:---------:|:---------:|
| Binary Search Tree |  ephemeral  |  O(1) |    O(n)   |    O(n)   |
|   Red-Black Tree   |  ephemeral  |  O(1) | O(log(n)) | O(log(n)) |

In [None]:
class Set s where

    -- | Construct new (empty) set
    empty :: Ord a => s a
    
    -- | Check whether a set contains given item
    member :: Ord a => a -> s a -> Bool
    
    -- | Add new item to a set while maintaining the item uniqueness property
    insert :: Ord a => a -> s a -> s a

### Unbalanced Set
Implementation of an unbalanced set via a *Binary Search Tree (BST)*.

In [None]:
data Tree a = Empty | Node (Tree a) a (Tree a)

instance Set Tree where

    -- | Construct an empty set in O(1).
    empty = Empty
    
    -- | Check whether this set contains given item.
    -- | Since the underlying BST may not be balanced, this function may take O(n) steps in the worst case.
    member _ Empty = False
    member x (Node a y b) = case (compare x y) of
        EQ -> True
        LT -> member x a
        GT -> member x b
    
    -- | Add new item to this set if it's not present yet.
    -- | Similarly to 'member', for an unbalanced instance this may take up to O(n) steps.
    insert x Empty = Node Empty x Empty
    insert x s@(Node a y b) = case (compare x y) of
        EQ -> s
        LT -> Node (insert x a) y b
        GT -> Node a y (insert x b)
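
The O(n) worst case can be observed directly. The sketch below repeats `insert` as a plain function so that it is self-contained; `depth`, `fromList`, `sortedDepth` and `shuffledDepth` are hypothetical helpers introduced only for this illustration.

```haskell
data Tree a = Empty | Node (Tree a) a (Tree a)

insert :: Ord a => a -> Tree a -> Tree a
insert x Empty = Node Empty x Empty
insert x s@(Node a y b) = case compare x y of
  EQ -> s
  LT -> Node (insert x a) y b
  GT -> Node a y (insert x b)

-- Number of nodes on the longest path from the root to a leaf.
depth :: Tree a -> Int
depth Empty = 0
depth (Node a _ b) = 1 + max (depth a) (depth b)

-- Build a set by inserting the items from left to right.
fromList :: Ord a => [a] -> Tree a
fromList = foldl (flip insert) Empty

-- Sorted input degenerates into a right-leaning chain with one node per level.
sortedDepth :: Int
sortedDepth = depth (fromList [1 .. 100 :: Int])

-- Inserting the medians first keeps this particular tree perfectly balanced.
shuffledDepth :: Int
shuffledDepth = depth (fromList [4, 2, 6, 1, 3, 5, 7 :: Int])
```

Here `sortedDepth` is 100 (as deep as the number of items) while `shuffledDepth` is 3 for 7 items.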

### Balanced Set
Implementation of a balanced set via a [Red-Black Tree](https://en.wikipedia.org/wiki/Red%E2%80%93black_tree) without any fancy optimizations. Specifically, `ins` always calls the full `balance`, although after recursing into, e.g., the left child it doesn't have to check for all the red-red violations (in fact, it does not have to check the color of any node that is not on the search path).
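
For comparison, the optimization that this implementation leaves out could look roughly like the sketch below. The split into `lbalance` and `rbalance` and the `toList` helper are assumptions made for illustration and are not part of the code in this document.

```haskell
data Color = R | B

data Tree a = Empty | Node Color (Tree a) a (Tree a)

-- Repair a red-red violation that can only be in the left sub-tree.
lbalance :: Color -> Tree a -> a -> Tree a -> Tree a
lbalance B (Node R (Node R a x b) y c) z d = Node R (Node B a x b) y (Node B c z d)
lbalance B (Node R a x (Node R b y c)) z d = Node R (Node B a x b) y (Node B c z d)
lbalance color a x b = Node color a x b

-- Repair a red-red violation that can only be in the right sub-tree.
rbalance :: Color -> Tree a -> a -> Tree a -> Tree a
rbalance B a x (Node R (Node R b y c) z d) = Node R (Node B a x b) y (Node B c z d)
rbalance B a x (Node R b y (Node R c z d)) = Node R (Node B a x b) y (Node B c z d)
rbalance color a x b = Node color a x b

insert :: Ord a => a -> Tree a -> Tree a
insert x s = case ins s of
  Node _ a y b -> Node B a y b
  Empty -> Empty -- unreachable: 'ins' never returns an empty tree
  where
    ins Empty = Node R Empty x Empty
    ins t@(Node color a y b) = case compare x y of
      EQ -> t
      -- only the sub-tree that was just modified can contain a violation
      LT -> lbalance color (ins a) y b
      GT -> rbalance color a y (ins b)

-- In-order traversal, used here only to inspect the result.
toList :: Tree a -> [a]
toList Empty = []
toList (Node _ a x b) = toList a ++ [x] ++ toList b
```

Each step of `ins` then tests two patterns instead of four, while the resulting tree is the same as with the single `balance` function.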

In [None]:
data Color = R | B

data Tree a = Empty | Node Color (Tree a) a (Tree a)

-- | Re-balance and locally repair the RBT color property by pushing
-- | one of two consecutive red nodes with a black parent up the path to the root.
balance :: Color -> Tree a -> a -> Tree a -> Tree a
balance B (Node R (Node R a x b) y c) z d = Node R (Node B a x b) y (Node B c z d)
balance B (Node R a x (Node R b y c)) z d = Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R (Node R b y c) z d) = Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R b y (Node R c z d)) = Node R (Node B a x b) y (Node B c z d)
balance color a x b = Node color a x b

instance Set Tree where 

    -- | Construct an empty set in O(1).
    empty = Empty
    
    -- | Check whether this set contains given item.
    -- | Since the underlying RBT is balanced, this function takes O(log(n)) steps in the worst case.
    member _ Empty = False
    member x (Node _ a y b) = case (compare x y) of
        EQ -> True
        LT -> member x a
        GT -> member x b
    
    -- | Add new item to this set if it's not present yet.
    -- |
    -- | A call to 'insert' takes at most O(log(n)) steps because 'balance' repairs the tree
    -- | on the way back up after adding the new node, and because in a RB tree the deepest
    -- | leaf is at most twice as far from the root as the shallowest leaf.
    -- | The root is always colored black.
    insert x Empty = Node B Empty x Empty
    insert x s = let (Node _ a y b) = ins s in Node B a y b
        where
            ins Empty = Node R Empty x Empty
            ins s@(Node color a y b) = case (compare x y) of
                EQ -> s
                LT -> balance color (ins a) y b
                GT -> balance color a y (ins b)
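
The balance claim can be checked with a small sketch. It repeats `balance` and `insert` as plain functions so that it is self-contained; `blackHeight`, `depth` and `sample` are hypothetical helpers added for illustration.

```haskell
data Color = R | B deriving Eq

data Tree a = Empty | Node Color (Tree a) a (Tree a)

balance :: Color -> Tree a -> a -> Tree a -> Tree a
balance B (Node R (Node R a x b) y c) z d = Node R (Node B a x b) y (Node B c z d)
balance B (Node R a x (Node R b y c)) z d = Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R (Node R b y c) z d) = Node R (Node B a x b) y (Node B c z d)
balance B a x (Node R b y (Node R c z d)) = Node R (Node B a x b) y (Node B c z d)
balance color a x b = Node color a x b

insert :: Ord a => a -> Tree a -> Tree a
insert x s = blacken (ins s)
  where
    blacken (Node _ a y b) = Node B a y b
    blacken Empty = Empty
    ins Empty = Node R Empty x Empty
    ins t@(Node color a y b) = case compare x y of
      EQ -> t
      LT -> balance color (ins a) y b
      GT -> balance color a y (ins b)

-- 'Just' the black height if no red node has a red child and every path
-- from the root to an empty leaf has the same number of black nodes.
blackHeight :: Tree a -> Maybe Int
blackHeight Empty = Just 1
blackHeight (Node c a _ b) = do
  ha <- blackHeight a
  hb <- blackHeight b
  if ha /= hb || (c == R && (isRed a || isRed b))
    then Nothing
    else Just (ha + if c == B then 1 else 0)
  where
    isRed (Node R _ _ _) = True
    isRed _ = False

depth :: Tree a -> Int
depth Empty = 0
depth (Node _ a _ b) = 1 + max (depth a) (depth b)

-- Sorted input, the worst case for the unbalanced tree.
sample :: Tree Int
sample = foldl (flip insert) Empty [1 .. 1000]
```

For `sample`, `blackHeight` returns a `Just` value and `depth` stays within 2 * log2(n + 1), which is below 20 for 1000 items.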

## Heaps

|    instance   | persistence | empty | isEmpty |       insert      |       merge       |           findMin          |     deleteMin     |
|:-------------:|:-----------:|:-----:|:-------:|:-----------------:|:-----------------:|:--------------------------:|:-----------------:|
|  Leftist Heap |  ephemeral  |  O(1) |   O(1)  |     O(log(n))     |     O(log(n))     |            O(1)            |     O(log(n))     |
| Binomial Heap |  ephemeral  |  O(1) |   O(1)  | O(log(n)) / O(1)* |     O(log(n))     |     O(log(n)) / O(1)**     |     O(log(n))     |
|   Splay Heap  |  ephemeral  |  O(1) |   O(1)  | O(n) / O(log(n))* |        O(n)       | O(n) / O(log(n))* / O(1)** | O(n) / O(log(n))* |

*\* amortized time*

*\*\* with explicit reference to the minimum element*

In [None]:
class Heap h where

    -- | Construct new (empty) heap
    empty :: Ord a => h a
    
    -- | Check whether a heap is empty
    isEmpty :: h a -> Bool
    
    -- | Add new item to a heap
    insert :: Ord a => a -> h a -> h a
    
    -- | Merge two heaps together
    merge :: Ord a => h a -> h a -> h a
    
    -- | Retrieve the minimum element of a heap
    findMin :: Ord a => h a -> a
    
    -- | Remove the minimum element of a heap and return resulting heap
    deleteMin :: Ord a => h a -> h a

### Leftist Heap
This implementation is based on a [Leftist tree](https://en.wikipedia.org/wiki/Leftist_tree) without any fancy optimization techniques. Notably, in this implementation `merge` takes two passes:
 1. top-down pass consisting of calls to `merge`
 1. bottom-up pass consisting of calls to `makeNode`

In [None]:
data LeftistHeap a = Empty | Node Int a (LeftistHeap a) (LeftistHeap a)

-- | Extract the rank of a heap node (zero for empty heap)
rank :: LeftistHeap a -> Int
rank Empty = 0
rank (Node r _ _ _) = r

-- | Create new heap node with given sub-trees while maintaining that
-- | the rank of the left sub-tree is at least as large as the right one.
makeNode :: Ord a => a -> LeftistHeap a -> LeftistHeap a -> LeftistHeap a
makeNode x a b = if ra >= rb then Node (rb + 1) x a b else Node (ra + 1) x b a
    where
        ra = rank a
        rb = rank b

instance Heap LeftistHeap where

    -- | Construct an empty heap in O(1).
    empty = Empty
    
    -- | Check whether given heap is empty in O(1).
    isEmpty Empty = True
    isEmpty _ = False
    
    -- | Add new item to the heap.
    -- |
    -- | The item is first turned into a singleton heap and then merged into the given heap.
    -- | Therefore this call runs in O(log(n)) steps in the worst case.
    insert x h = merge (Node 1 x Empty Empty) h
    
    -- | Merge two heaps together in O(log(n)) steps.
    -- | Note: The right spine stays short (at most log(n + 1) nodes) because 'makeNode'
    -- |       always puts the sub-tree with the larger 'rank' on the left.
    merge h Empty = h
    merge Empty h = h
    merge h1@(Node _ x a1 b1) h2@(Node _ y a2 b2) = 
        if x <= y then makeNode x a1 (merge b1 h2)
        else makeNode y a2 (merge h1 b2)
    
    -- | Retrieve the minimum element of a heap in O(1) worst case time.
    findMin Empty = error "Heap is empty"
    findMin (Node _ x _ _) = x
    
    -- | Remove the minimum element of a heap.
    -- | 
    -- | Because the minimum element is the root, this function just calls 'merge' on
    -- | both root sub-trees and thus runs in O(log(n)) worst case time.
    deleteMin Empty = error "Heap is empty"
    deleteMin (Node _ _ a b) = merge a b
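
The following self-contained sketch shows the leftist heap used for sorting and checks the length of its right spine. It repeats the core functions without the `Heap` class; `toSortedList`, `rightSpine` and `heap` are hypothetical helpers for illustration.

```haskell
data LeftistHeap a = Empty | Node Int a (LeftistHeap a) (LeftistHeap a)

rank :: LeftistHeap a -> Int
rank Empty = 0
rank (Node r _ _ _) = r

-- Keep the sub-tree with the larger rank on the left.
makeNode :: a -> LeftistHeap a -> LeftistHeap a -> LeftistHeap a
makeNode x a b
  | rank a >= rank b = Node (rank b + 1) x a b
  | otherwise = Node (rank a + 1) x b a

merge :: Ord a => LeftistHeap a -> LeftistHeap a -> LeftistHeap a
merge h Empty = h
merge Empty h = h
merge h1@(Node _ x a1 b1) h2@(Node _ y a2 b2)
  | x <= y = makeNode x a1 (merge b1 h2)
  | otherwise = makeNode y a2 (merge h1 b2)

insert :: Ord a => a -> LeftistHeap a -> LeftistHeap a
insert x = merge (Node 1 x Empty Empty)

-- Repeatedly remove the minimum element (heap sort).
toSortedList :: Ord a => LeftistHeap a -> [a]
toSortedList Empty = []
toSortedList (Node _ x a b) = x : toSortedList (merge a b)

-- Number of nodes on the rightmost path; 'merge' only walks along this path.
rightSpine :: LeftistHeap a -> Int
rightSpine Empty = 0
rightSpine (Node _ _ _ b) = 1 + rightSpine b

heap :: LeftistHeap Int
heap = foldr insert Empty [1 .. 1000]
```

In this sketch `rightSpine heap` equals `rank heap` and stays logarithmic in the number of items, which is why `merge` needs only O(log(n)) steps.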

### Binomial Heap
This implementation is based on a set (forest) of binomial trees described for instance [here](https://en.wikipedia.org/wiki/Binomial_heap).

There are no fancy optimization techniques, for instance the heap described below does not track the minimum element.

#### Benefits of Binomial Heap
 - Compared to a standard *Binary Heap*, it supports merging (melding) in O(log(n)) time
 - Compared to a *Leftist Heap*, it has O(1) amortized inserts and merging is based on the size of the larger heap

In [None]:
data BinomialTree a = Node Int a [BinomialTree a]

newtype BinomialHeap a = BinomialHeap [BinomialTree a]

-- | Extract the rank of a binomial tree
rank :: BinomialTree a -> Int
rank (Node r _ _) = r

-- | Extract the root element of a binomial tree
root :: BinomialTree a -> a
root (Node _ x _) = x

-- | Link two binomial trees of the same rank together in O(1) time.
-- |
-- | The tree with the greater root is added as the leftmost child of the other.
-- | The resulting tree has its rank incremented by one.
link :: Ord a => BinomialTree a -> BinomialTree a -> BinomialTree a
link t1@(Node r x1 c1) t2@(Node _ x2 c2) =
    if x1 <= x2 then Node (r + 1) x1 (t2 : c1)
    else Node (r + 1) x2 (t1 : c2)

-- | Insert a tree in given heap.
-- |
-- | Since there are at most O(log(n)) trees, 'link' takes O(1) and in the worst case we traverse
-- | all the trees, the time complexity is logarithmic.
insTree :: Ord a => BinomialTree a -> BinomialHeap a -> BinomialHeap a
insTree t (BinomialHeap []) = BinomialHeap [t]
insTree t (BinomialHeap ts@(t' : ts')) =
    -- if the tree rank is not present in the heap then just add the tree
    if rank t < rank t' then BinomialHeap (t:ts)
    -- otherwise fold over the heap trees while linking them
    else insTree (link t t') (BinomialHeap ts')

-- | Finds and removes a tree with the minimum element from a heap.
-- |
-- | This function returns both the minimum tree and resulting heap and runs in logarithmic worst case time
-- | as there are only O(log(n)) trees in any heap and we need to check only roots.
removeMinTree :: Ord a => BinomialHeap a -> (BinomialTree a, BinomialHeap a)
removeMinTree (BinomialHeap []) = error "Heap is empty"
removeMinTree (BinomialHeap [t]) = (t, empty)
removeMinTree (BinomialHeap (t:ts)) = let (t', (BinomialHeap ts')) = removeMinTree (BinomialHeap ts) in
    if root t <= root t' then (t, BinomialHeap ts)
    else (t', BinomialHeap (t:ts'))

instance Heap BinomialHeap where
    
    -- | Construct an empty heap in O(1).
    empty = BinomialHeap []
    
    -- | Check whether given heap is empty in O(1).
    isEmpty (BinomialHeap ts) = null ts
    
    -- | Add new item to the heap.
    -- |
    -- | The item is first turned to a trivial binomial tree of rank 0 and inserted into
    -- | the heap via 'insTree'.
    -- |
    -- | The worst case time complexity is O(log(n)) but can be amortized by other updates to just O(1).
    insert x h = insTree (Node 0 x []) h
    
    -- | Merge two heaps together in O(log(n)) steps.
    merge h (BinomialHeap []) = h
    merge (BinomialHeap []) h = h
    merge h1@(BinomialHeap (t1:ts1')) h2@(BinomialHeap (t2:ts2')) =
        case compare (rank t1) (rank t2) of
            LT -> let (BinomialHeap ts) = merge (BinomialHeap ts1') h2 in BinomialHeap (t1:ts)
            GT -> let (BinomialHeap ts) = merge h1 (BinomialHeap ts2') in BinomialHeap (t2:ts)
            EQ -> insTree (link t1 t2) (merge (BinomialHeap ts1') (BinomialHeap ts2'))
    
    -- | Retrieve the minimum element of a heap in O(log(n)) worst case time.
    -- | Note: The time complexity follows from the analysis of 'removeMinTree'.
    findMin h = let (t, _) = removeMinTree h in root t
    
    -- | Remove the minimum element of a heap in O(log(n)) worst case time.
    -- | Note: The time complexity follows from 'removeMinTree' and 'merge' and the fact that
    -- |       there is at most logarithmic number of trees in a heap (for 'reverse').
    deleteMin h = let ((Node _ _ ts1), h') = removeMinTree h in merge (BinomialHeap (reverse ts1)) h'
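
One way to see why `insert` is O(1) amortized is that inserting into a binomial heap behaves like incrementing a binary counter: the ranks of the trees in a heap of n items are the positions of the 1-bits of n. The sketch below demonstrates this; for brevity it uses a bare list of trees (the hypothetical `Forest` alias) instead of the `BinomialHeap` newtype, and `ranks` and `oneBits` are illustrative helpers.

```haskell
data BinomialTree a = Node Int a [BinomialTree a]

-- Trees are stored in order of increasing rank.
type Forest a = [BinomialTree a]

rank :: BinomialTree a -> Int
rank (Node r _ _) = r

link :: Ord a => BinomialTree a -> BinomialTree a -> BinomialTree a
link t1@(Node r x1 c1) t2@(Node _ x2 c2)
  | x1 <= x2 = Node (r + 1) x1 (t2 : c1)
  | otherwise = Node (r + 1) x2 (t1 : c2)

-- Linking two trees of equal rank corresponds to a carry in binary addition.
insTree :: Ord a => BinomialTree a -> Forest a -> Forest a
insTree t [] = [t]
insTree t ts@(t' : ts')
  | rank t < rank t' = t : ts
  | otherwise = insTree (link t t') ts'

insert :: Ord a => a -> Forest a -> Forest a
insert x = insTree (Node 0 x [])

-- Ranks of the trees in a heap built from n items.
ranks :: Int -> [Int]
ranks n = map rank (foldr insert [] [1 .. n])

-- Positions of the 1-bits in the binary representation of n.
oneBits :: Int -> [Int]
oneBits n = [i | (i, b) <- zip [0 ..] (bits n), b == 1]
  where
    bits :: Int -> [Int]
    bits 0 = []
    bits m = m `mod` 2 : bits (m `div` 2)
```

For example, 13 is `1101` in binary, so `ranks 13` and `oneBits 13` both evaluate to `[0,2,3]`.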

### Splay Heap
This `Heap` instance is based on a [Splay Tree](https://en.wikipedia.org/wiki/Splay_tree).

#### Benefits of Splay Heaps
 - Self-balancing (splaying) moves frequently accessed items closer to the root where they can be accessed more quickly in the future
 - One of the fastest heaps when persistence and `merge` are not required (with explicit minimum pointer)
 - Sorting an already sorted array with a `SplayHeap` takes just `O(n)` time
 - Balancing does not require any additional data stored in the tree (resp. nodes)

In [None]:
data SplayHeap a = Empty | Node (SplayHeap a) a (SplayHeap a)

-- | Splay a pivot element to the root of given Splay tree (heap) and return a pair of
-- | heaps: the elements less than or equal to the pivot and the elements greater than the pivot.
-- |
-- | Splaying involves two optimistic balancing operations: Zig-Zig and Zig-Zag and
-- | balances the tree to shorten the leftmost spine (the leftmost element is the minimum).
-- |
-- | Despite splaying, the height of a Splay tree may in the worst case still be linear.
-- | However, the amortized complexity of heap access and update operations is logarithmic.
partition :: Ord a => a -> SplayHeap a -> (SplayHeap a, SplayHeap a)
partition _ Empty = (Empty, Empty)
partition pivot t@(Node a x b) =
    if x <= pivot then case b of
        Empty -> (t, Empty)
        Node b1 y b2 -> if y <= pivot then let (small, big) = partition pivot b2 in (Node (Node a x b1) y small, big)
                        else let (small, big) = partition pivot b1 in (Node a x small, Node big y b2)
    else case a of
        Empty -> (Empty, t)
        Node a1 y a2 -> if y <= pivot then let (small, big) = partition pivot a2 in (Node a1 y small, Node big x b)
                        else let (small, big) = partition pivot a1 in (small, Node big y (Node a2 x b))

instance Heap SplayHeap where
    
    -- | Construct an empty heap in O(1).
    empty = Empty
    
    -- | Check whether given heap is empty in O(1).
    isEmpty Empty = True
    isEmpty _ = False
    
    -- | Add new item to the heap.
    -- |
    -- | The item is used as a pivot to partition the given heap and becomes the new root with
    -- | the smaller elements and greater elements as the left and right sub-trees respectively.
    -- |
    -- | As discussed in the 'partition' function, the heap might end up imbalanced, so the
    -- | worst case complexity is O(n) with amortized time complexity O(log(n)).
    insert x h = let (a, b) = partition x h in Node a x b
    
    -- | Merge two heaps together in O(n) worst case time: every node of the first heap
    -- | is visited and used as a pivot to partition the second heap.
    merge Empty h = h
    merge (Node a x b) h = let (ha, hb) = partition x h in Node (merge ha a) x (merge hb b)
    
    -- | Retrieve the minimum element of a heap in O(log(n)) amortized time.
    -- |
    -- | The minimum element in a Splay Tree is the leftmost element so 'findMin' simply
    -- | traverses the leftmost spine in O(log(n)) amortized time.
    -- |
    -- | Note: The time complexity can be reduced to O(1) by wrapping the whole data structure
    -- |       and explicitly tracking a reference to the minimum element.
    findMin Empty = error "Heap is empty"
    findMin (Node Empty x _) = x
    findMin (Node a _ _) = findMin a
    
    -- | Remove the minimum element of a heap.
    -- | 
    -- | The removal procedure is analogous to 'findMin' and with the same argument runs in
    -- | O(log(n)) amortized time.
    deleteMin Empty = error "Heap is empty"
    deleteMin (Node Empty x b) = b
    deleteMin (Node (Node Empty x b) y c) = Node b y c
    deleteMin (Node (Node a x b) y c) = Node (deleteMin a) x (Node b y c)
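
The sorting use case mentioned in the benefits can be sketched as follows. The block repeats the heap operations as plain functions so that it is self-contained; `splaySort` is a hypothetical helper added for illustration.

```haskell
data SplayHeap a = Empty | Node (SplayHeap a) a (SplayHeap a)

partition :: Ord a => a -> SplayHeap a -> (SplayHeap a, SplayHeap a)
partition _ Empty = (Empty, Empty)
partition pivot t@(Node a x b)
  | x <= pivot = case b of
      Empty -> (t, Empty)
      Node b1 y b2
        | y <= pivot ->
            let (small, big) = partition pivot b2
             in (Node (Node a x b1) y small, big)
        | otherwise ->
            let (small, big) = partition pivot b1
             in (Node a x small, Node big y b2)
  | otherwise = case a of
      Empty -> (Empty, t)
      Node a1 y a2
        | y <= pivot ->
            let (small, big) = partition pivot a2
             in (Node a1 y small, Node big x b)
        | otherwise ->
            let (small, big) = partition pivot a1
             in (small, Node big y (Node a2 x b))

insert :: Ord a => a -> SplayHeap a -> SplayHeap a
insert x h = let (a, b) = partition x h in Node a x b

findMin :: SplayHeap a -> a
findMin Empty = error "Heap is empty"
findMin (Node Empty x _) = x
findMin (Node a _ _) = findMin a

deleteMin :: SplayHeap a -> SplayHeap a
deleteMin Empty = error "Heap is empty"
deleteMin (Node Empty _ b) = b
deleteMin (Node (Node Empty _ b) y c) = Node b y c
deleteMin (Node (Node a x b) y c) = Node (deleteMin a) x (Node b y c)

-- Insert all items, then repeatedly take out the minimum.
splaySort :: Ord a => [a] -> [a]
splaySort = drain . foldl (flip insert) Empty
  where
    drain Empty = []
    drain h = findMin h : drain (deleteMin h)
```

For already sorted input every `insert` in this sketch stops at the root, because the new item is greater than the root and the root has no right sub-tree.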