%%html
<style>h1{text-align:center;}h1{text-transform:none;}.rendered_html h4{color:#17b6eb;font-size: 1.6em;}img[alt=dia1]{width:35%;}img[alt=book]{width:20%;font-size: 3em;}img[alt=dia2]{width:50%;}.author{font-size:8px;}</style>

# Lecture 10: Invariants, 2-3 Trees, Red-Black Trees

## 0. Data Structure Invariants
<div class="author">src: chalmers.instructure.com</div>

### 0.1 Recap last week - BSTs

A *binary search tree* is a binary tree, where:

- each node has a __key__ 
- each node's key is greater than all the keys in the left subtree, and less than all the keys in the right subtree

![dia1](img/10bst.png)

### 0.2 Invariants

A so-called __invariant__ on a data structure restricts valid elements of a type. Dependent types can capture such invariants, so that only valid elements are well-typed.

The property

> *each node's key is greater than all the keys in the left subtree, and less than all the keys in the right subtree*
    
is an example of a __data structure invariant__.

A data structure invariant is a property of the data structure that always needs to hold.

### 0.2.1 Maintaining Invariants

- When implementing data structure operations, it is fairly simple to make sure that the invariant holds
- When modifying a data structure that has an invariant, it is often challenging to make sure that the invariant still holds afterwards. This is called __maintaining the invariant__
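
The BST invariant from section 0.2 can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `Node` class with `key`, `left` and `right` fields (these names are illustrative and not part of the lecture material). Each recursive call narrows the range of keys that is still allowed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node, low=None, high=None):
    """Return True if every key in the subtree lies strictly between low and high."""
    if node is None:
        return True
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    # keys on the left must stay below node.key, keys on the right above it
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)


valid = Node(8, Node(3, Node(1), Node(6)), Node(10))
broken = Node(8, Node(3, Node(1), Node(9)), Node(10))  # 9 sits in the left subtree of 8
print(is_bst(valid), is_bst(broken))  # True False
```

Note that `broken` is locally fine (9 is greater than its parent 3), which is why the check has to carry bounds down from all ancestors.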

## 1. 2-3 trees

### 1.1 Recap BST performance

![dia2](img/9bstbalanced.png)
<div class="author">src: chalmers.instructure.com</div>

__Best Case = "balanced"__ - height of tree is $O(\log n)$

__Average Case = "random order"__ - if keys are added in random order, height of tree is $O(\log n)$

__Worst Case = "unbalanced"__ - height of tree is $O(n)$ => BAD
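
The difference between the best and the worst case can be reproduced with a small experiment. This is a sketch only: the BST is stored as nested dictionaries, and the helper names `insert`, `height` and `build` are assumptions made for this example. Height is counted in edges.

```python
def insert(tree, key):
    """Insert key into a BST stored as nested dicts; return the (possibly new) root."""
    if tree is None:
        return {"key": key, "left": None, "right": None}
    side = "left" if key < tree["key"] else "right"
    tree[side] = insert(tree[side], key)
    return tree


def height(tree):
    """Height counted in edges; the empty tree has height -1."""
    if tree is None:
        return -1
    return 1 + max(height(tree["left"]), height(tree["right"]))


def build(keys):
    """Insert the keys one by one, in the given order, without any rebalancing."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root


sorted_keys = [1, 2, 3, 4, 5, 6, 7]
balanced_order = [4, 2, 6, 1, 3, 5, 7]
print(height(build(sorted_keys)))     # 6: the tree degenerates into a chain
print(height(build(balanced_order)))  # 2: the same keys form a balanced tree
```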

### 1.2 Using invariants to balance BSTs

Let's try the following invariant:

> *height of tree must be logarithmic*
    
![book](img/10balanced.png)

__Conclusion:__ This invariant is too restrictive. A perfectly balanced binary tree of height $h$ has exactly $2^{h+1} - 1$ nodes, so the invariant only works if the number of nodes has this form, for example: 1, 3, 7, 15, ...

### 1.2 2-3 tree

A better solution is the so-called __2-3 tree__, which is a __perfectly balanced search tree__.

__Perfectly balanced__ means that every path from the root to each leaf has the same length.

Hence, the 2-3 tree data structure guarantees worst case __O(log n)__ time complexity for search and insert operations.

### __Invariant__: All children of each node *always* have the same height

- A __2-node__ contains a __single value__ and has __two children__ (unless it is a leaf) ![book](img/10twonode.png)

- A __3-node__ contains __two values__ and has __three children__ (unless it is a leaf) ![book](img/2threenode.png)


A __4-node__, with three data elements, may be temporarily created during manipulation of the tree but is never persistently stored in the tree. 

We say that $T$ is a 2–3 tree if and only if one of the following statements holds:

- $T$ is empty. In other words, $T$ does not have any nodes.
- $T$ is a 2-node with data element $a$. If $T$ has left child $p$ and right child $q$, then
    - $p$ and $q$ are 2–3 trees of the same height;
    - $a$ is greater than each data element in $p$; and
    - $a$ is less than each data element in $q$.
- $T$ is a 3-node with data elements $a$ and $b$, where $a < b$. If $T$ has left child $p$, middle child $q$, and right child $r$, then
    - $p$, $q$, and $r$ are 2–3 trees of equal height;
    - $a$ is greater than each data element in $p$ and less than each data element in $q$; and
    - $b$ is greater than each data element in $q$ and less than each data element in $r$.
        
<div class="author">src: wikipedia.org</div>

##### Simplified properties

- every internal node is a 2-node or a 3-node
- all leaves are at the same level
- all data is kept in sorted order
- __the tree is always perfectly balanced!__

![dia2](img/1023example.png)
<div class="author">src: chalmers.instructure.com</div>
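
The definition above translates fairly directly into a recursive check. The sketch below assumes a hypothetical `Node23` class that stores its keys and children in lists (one key for a 2-node, two for a 3-node, no children for a leaf); this representation is an assumption for illustration, not the one used in the lecture.

```python
class Node23:
    def __init__(self, keys, children=None):
        self.keys = list(keys)                # 1 key (2-node) or 2 keys (3-node)
        self.children = list(children or [])  # empty for a leaf


def height_if_valid(node, low=None, high=None):
    """Return the height (in levels) of a valid 2-3 subtree, or None if it is invalid."""
    if node is None:
        return 0                              # the empty tree is a valid 2-3 tree
    k = node.keys
    if len(k) not in (1, 2) or k != sorted(set(k)):
        return None                           # wrong node size, unsorted or duplicate keys
    if (low is not None and k[0] <= low) or (high is not None and k[-1] >= high):
        return None                           # key outside the range allowed by ancestors
    if not node.children:
        return 1                              # a leaf
    if len(node.children) != len(k) + 1:
        return None                           # 2-node needs 2 children, 3-node needs 3
    bounds = [low] + k + [high]
    heights = [height_if_valid(c, bounds[i], bounds[i + 1])
               for i, c in enumerate(node.children)]
    if None in heights or len(set(heights)) != 1:
        return None                           # invalid child, or children of unequal height
    return heights[0] + 1


good = Node23([4], [Node23([2]), Node23([6, 8])])
bad = Node23([4], [Node23([2], [Node23([1]), Node23([3])]), Node23([6])])
print(height_if_valid(good), height_if_valid(bad))  # 2 None
```

The tree `bad` keeps its keys in sorted order but its two subtrees have different heights, so it violates the balance invariant.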

#### Exercise 1


Which of the following trees are 2-3 trees?
![dia2](img/10ex1.png)
<div class="author">src: chalmers.instructure.com</div>

##### Solution:

C

### 1.3 Operations

##### 1.3.1 Searching

When searching for a key $K$ in a given 2-3 tree $T$:

__Base cases__

1. return *False* if $T$ is empty
2. return *True* if the current node contains a value equal to $K$
3. return *False* if the current node is a leaf and none of its values is equal to $K$

__Recursive calls__

1. if $K <$ currentNode.leftVal -> explore left subtree
2. else if currentNode.leftVal $< K <$ currentNode.rightVal -> explore middle subtree
3. else if $K >$ currentNode.rightVal -> explore right subtree

In a 2-node, which has only one value and no middle subtree, $K >$ currentNode.leftVal leads directly to the right subtree.
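
The base cases and recursive calls can be sketched as follows. The `Node23` class with `keys` and `children` lists is a hypothetical representation chosen for this example; counting how many keys are smaller than $K$ selects the left, middle or right child in one step and also covers 2-nodes.

```python
class Node23:
    def __init__(self, keys, children=None):
        self.keys = list(keys)                # 1 key (2-node) or 2 keys (3-node)
        self.children = list(children or [])  # empty for a leaf


def search(node, key):
    """Return True if key occurs in the 2-3 tree rooted at node."""
    if node is None:           # base case 1: empty tree
        return False
    if key in node.keys:       # base case 2: key found in the current node
        return True
    if not node.children:      # base case 3: leaf reached without finding the key
        return False
    # recursive case: child 0 is the left subtree, the last child the right subtree
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    return search(node.children[i], key)


tree = Node23([4, 8], [Node23([1, 3]), Node23([5]), Node23([9])])
print(search(tree, 5), search(tree, 7), search(tree, 3))  # True False True
```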

##### Example: Searching for number 5
![dia2](img/10search.png)
<div class="author">src: geeksforgeeks.org</div>

##### 1.3.2 Inserting

There are 3 possible cases for insertion:

1. insert into a node with only __one__ data element
2. insert into a node with __two__ data elements whose parent contains only __one__ data element
3. insert into a node with __two__ data elements whose parent contains __two__ data elements

![](img/10insert.png)
<div class="author">src: CC BY-SA 4.0 Diaa abdelmoneim via wikimedia.org</div>

##### Splitting a temporary 4-node

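A temporary 4-node is split into three 2-nodes: the middle key becomes the parent of the two outer keys, which temporarily creates an extra level in the tree. The middle (*pink*) node is then absorbed (elevated) into its parent. If the parent becomes a 4-node as a result, it is split in the same way.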

![dia2](img/10split.png)
<div class="author">src: chalmers.instructure.com</div>
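
Insertion with absorption and splitting can be sketched recursively. This is one possible formulation, not necessarily the one from the lecture: the hypothetical helper `_insert` reports a split to its caller as a triple (middle key, left 2-node, right 2-node), and the caller absorbs the middle key. `Node23` is an assumed list-based representation.

```python
class Node23:
    def __init__(self, keys, children=None):
        self.keys = list(keys)                # 1 or 2 keys (3 only temporarily)
        self.children = list(children or [])  # empty for a leaf


def _insert(node, key):
    """Insert key below node. Return None, or (middle, left, right) if node was split."""
    if key in node.keys:
        return None                                  # ignore duplicates
    if not node.children:
        node.keys = sorted(node.keys + [key])        # a leaf takes the key directly
    else:
        i = sum(1 for k in node.keys if key > k)     # index of the child to descend into
        split = _insert(node.children[i], key)
        if split is None:
            return None
        middle, left, right = split
        node.keys.insert(i, middle)                  # absorb the elevated key
        node.children[i:i + 1] = [left, right]       # replace the child by its two halves
    if len(node.keys) < 3:
        return None
    # temporary 4-node: the outer keys become 2-nodes, the middle key moves up
    a, b, c = node.keys
    return b, Node23([a], node.children[:2]), Node23([c], node.children[2:])


def insert(root, key):
    """Insert key and return the (possibly new) root of the 2-3 tree."""
    if root is None:
        return Node23([key])
    split = _insert(root, key)
    if split is None:
        return root
    middle, left, right = split
    return Node23([middle], [left, right])           # splitting the root adds a level


root = None
for k in [1, 2, 3, 4, 5, 6, 7]:
    root = insert(root, k)
print(root.keys, [c.keys for c in root.children])    # [4] [[2], [6]]
```

Even though the keys arrive in sorted order, which is the worst case for a plain BST, the tree stays perfectly balanced because it only ever grows at the root.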

### 1.4 Summary

- 2-3 trees are *perfectly balanced*, which is enabled by allowing 3 children
- invariant is maintained using *absorption* and *splitting* (for 4-nodes)
- complexity is logarithmic
- conceptually simple
- implementation of deletion is complicated

## 2. Red-black tree

A red–black tree is a kind of self-balancing binary search tree. 

Each node stores an extra bit representing "color" ("red" or "black"), used to ensure that the tree remains balanced during insertions and deletions.

When the tree is modified, the new tree is rearranged and "repainted" to restore the coloring properties that constrain how unbalanced the tree can become in the worst case. The properties are designed such that this rearranging and recoloring can be performed efficiently. 

![dia1](img/10redblack.png)
<div class="author">src: CC BY-SA 4.0 Nomen4Omen via wikimedia.org</div>

### 2.1 Properties

- each internal node has two children
- each internal node has a color, such that:
    - the root is black
    - all leaf nodes (NIL nodes) are black
    - for each node, all paths to descendant leaves contain the same number of black nodes
    - if a node is red, then both its children are black
- the __null__ pointers in a BST are replaced by pointers to special null-vertices that do not carry any object-data

![dia1](img/10blackred2.png)
<div class="author">src: Ernst Mayr, in.tum.de</div>

Each node has the following attributes:

- color
- key
- leftChild
- rightChild
- parent (except root)
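
The color properties from section 2.1 can be verified with a short recursive function. This is a sketch under stated assumptions: the class `RBNode` is hypothetical, `None` stands in for the black NIL leaves, and the BST ordering of the keys is not checked here.

```python
RED, BLACK = "red", "black"


class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right
        self.parent = None                 # stays None for the root
        for child in (left, right):
            if child is not None:
                child.parent = self


def black_height(node):
    """Black nodes on every path from node to a NIL leaf, or None if a property fails."""
    if node is None:
        return 1                           # a NIL leaf counts as one black node
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return None                # a red node must have black children
    lh, rh = black_height(node.left), black_height(node.right)
    if lh is None or rh is None or lh != rh:
        return None                        # paths with different numbers of black nodes
    return lh + (1 if node.color == BLACK else 0)


def is_red_black(root):
    """Check the root color, the red rule and the black-count rule."""
    if root is not None and root.color != BLACK:
        return False
    return black_height(root) is not None


ok = RBNode(13, BLACK, RBNode(8, RED), RBNode(17, RED))
bad = RBNode(13, BLACK, RBNode(8, BLACK), None)  # left paths contain one more black node
print(is_red_black(ok), is_red_black(bad))       # True False
```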

### 2.2 Rebalancing

A red-black tree is efficient for search, insert, and delete operations. The balancing is not perfect, but it guarantees searching in __O(log n)__ time, where $n$ is the number of entries. The insert and delete operations, along with the tree rearrangement and recoloring, are also performed in __O(log n)__ time.

|Type|Average|Worst case|
|:---|:---|:---|
|Space|O(n)|O(n)|
|Search|O(log n)|O(log n)|
|Insert|O(log n)|O(log n)|
|Delete|O(log n)|O(log n)|

##### 2.2.1 Rebalancing by Rotation

In a rotation, the positions of the nodes of a subtree are interchanged while the BST ordering of the keys is preserved.

Rotations are used to restore the properties of a red-black tree when they are violated by other operations such as insertion and deletion. A subtree can be rotated to the left ("*left rotate*") or to the right ("*right rotate*").

##### Left Rotation

After rotating on node $x$, node $y$ will become the new root of the subtree and its left child will become $x$. The previous left child of $y$ will now become the right child of $x$.


![dia1](img/10leftrot.png)
![book](img/10leftrot.gif)
<div class="author">src: codesdope.com</div>

##### Right Rotation
![dia1](img/10rightrot.png)
![book](img/10rightrot.gif)
<div class="author">src: codesdope.com</div>
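
Both rotations can be sketched in a few lines. The `Node` class and the helper `_replace_child` are assumptions made for this example; the right rotation is written as the mirror image of the left rotation described above, and colors are left out because a rotation only rearranges pointers.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.parent = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self


def _replace_child(parent, old, new):
    """Hook new into the place old occupied below parent (None means subtree root)."""
    new.parent = parent
    if parent is not None:
        if parent.left is old:
            parent.left = new
        else:
            parent.right = new


def left_rotate(x):
    """Rotate left on x; its right child y becomes the subtree root and is returned."""
    y = x.right
    x.right = y.left              # y's former left child becomes x's right child
    if y.left is not None:
        y.left.parent = x
    _replace_child(x.parent, x, y)
    y.left = x                    # x becomes y's left child
    x.parent = y
    return y


def right_rotate(y):
    """Mirror image: y's left child x becomes the subtree root and is returned."""
    x = y.left
    y.left = x.right              # x's former right child becomes y's left child
    if x.right is not None:
        x.right.parent = y
    _replace_child(y.parent, y, x)
    x.right = y                   # y becomes x's right child
    y.parent = x
    return x


x = Node(10, Node(5), Node(20, Node(15), Node(30)))
y = left_rotate(x)
print(y.key, y.left.key, y.left.right.key)  # 20 10 15
```

Calling `right_rotate(y)` afterwards restores the original shape, which shows that the two rotations are inverses of each other.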

To represent a 2-3 tree as a red-black tree, each 3-node becomes two BST nodes. A 3-node is always translated into a node and its left child:

![dia2](img/10rbt.png)

<div class="author">src: chalmers.instructure.com</div>

### 2.4 Inserting into a Red-Black Tree
<div class="author">src: programiz.com</div>

New nodes are inserted into a red-black tree in a similar way as in normal BSTs. The newly inserted node will be colored red. Doing so can violate some properties of red-black trees. Hence, we have to create and call a function to fix any violations afterwards.

Inserting a black node instead would be more difficult, since rotation and recoloring are much easier on a path that contains two successive red nodes than on a path that contains an extra black node.

The new node will replace an existing leaf. A series of rotation(s) and recoloring(s) may have to occur until the tree is balanced.

__Algorithm to insert a node__

1. let `y` be the leaf (i.e. NIL) and `x` be the root of the tree.
2. check if the tree is empty. If yes, insert `newNode` as the root node and color it black.
3. else, repeat the following steps until a leaf (NIL) is reached:
    - compare `newKey` with `rootKey` (the key of the current node)
    - if `newKey` > `rootKey`, traverse through the right subtree
    - else traverse through the left subtree
4. assign the parent of the leaf as the parent of `newNode`
5. if `newKey` > `leafKey` (the key of that parent), make `newNode` its `rightChild`
6. else, make `newNode` its `leftChild`
7. assign NIL to the `leftChild` and `rightChild` of `newNode`
8. assign RED color to `newNode`
9. call the insertFix algorithm (below) to recolor/rotate the tree so that it does not violate any properties

![dia1](img/10rbinsert.png)
<div class="author">src: codesdope.com</div>

__insertFix Algorithm__

1. if root node does not exist, `newNode` ($N$) becomes the new root node and is colored black
2. if the `parentNode` ($P$) is black, no adjustments are necessary
3. if `parentNode` and `uncleNode` ($U$) are red, `parentNode` and `uncleNode` will be colored black. `grandfatherNode` ($G$) will be colored red. This adjustment is then repeated recursively with `grandfatherNode` taking the role of `newNode`.

![dia2](img/10ins1.png)


<div class="author">src: cc.edu.tw</div>

__insertFix Algorithm (continued)__

4. if `parentNode` is red AND `uncleNode` is either black OR nonexistent, AND `newNode` is `parentNode`'s left child AND `parentNode` is `grandfatherNode`'s left child, a right rotation will occur on `grandfatherNode`. Afterwards, `parentNode` is colored black and `grandfatherNode` red.
![dia2](img/10ins2.png)

5. if `parentNode` is red AND `uncleNode` is either black OR nonexistent, AND `newNode` is `parentNode`'s right child AND `parentNode` is `grandfatherNode`'s left child, a left rotation will be performed on `parentNode`. Then (4) is executed.
![dia2](img/10ins3.png)

6. Cases 3 to 5 apply symmetrically (with left and right swapped) if `parentNode` is a right child of `grandfatherNode`.

<div class="author">src: cc.edu.tw</div>

##### Pseudocode
```
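INSERT(T, n)
  y = T.NIL              // y trails one step behind temp and ends up as n's parent
  temp = T.root

  // walk down as in an ordinary BST until a NIL leaf is reached
  while temp != T.NIL
      y = temp
      if n.data < temp.data
          temp = temp.left
      else
          temp = temp.right
  n.parent = y
  if y == T.NIL          // the tree was empty, n becomes the root
      T.root = n
  else if n.data < y.data
      y.left = n
  else
      y.right = n

  n.left = T.NIL
  n.right = T.NIL
  n.color = RED          // new nodes are always inserted red
  insertFix(T, n)        // restore the red-black properties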
```
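
The pseudocode and the insertFix cases can be combined into a runnable sketch. The class layout, the single `_rotate` helper with a `left` flag, and the handling of the mirrored cases via `p_is_left` are choices made for this example and are not prescribed by the sources above. As in the pseudocode, a shared black sentinel `NIL` replaces all null pointers.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, color=RED):
        self.key, self.color = key, color
        self.left = self.right = self.parent = None


class RedBlackTree:
    def __init__(self):
        self.NIL = Node(None, BLACK)               # shared black sentinel leaf
        self.root = self.NIL

    def _rotate(self, x, left):
        """Rotate on x: a left rotation if left is True, otherwise a right rotation."""
        y = x.right if left else x.left            # y moves up, x moves down
        inner = y.left if left else y.right        # subtree that changes its parent
        if left:
            x.right = inner
        else:
            x.left = inner
        if inner is not self.NIL:
            inner.parent = x
        y.parent = x.parent
        if x.parent is self.NIL:
            self.root = y
        elif x is x.parent.left:
            x.parent.left = y
        else:
            x.parent.right = y
        if left:
            y.left = x
        else:
            y.right = x
        x.parent = y

    def insert(self, key):
        n = Node(key)                              # new nodes start out red
        n.left = n.right = self.NIL
        y, temp = self.NIL, self.root
        while temp is not self.NIL:                # ordinary BST descent
            y = temp
            temp = temp.left if key < temp.key else temp.right
        n.parent = y
        if y is self.NIL:
            self.root = n
        elif key < y.key:
            y.left = n
        else:
            y.right = n
        self._insert_fix(n)

    def _insert_fix(self, n):
        while n.parent.color == RED:               # case 2 (black parent) ends the loop
            p, g = n.parent, n.parent.parent
            p_is_left = p is g.left                # False selects the mirrored cases
            u = g.right if p_is_left else g.left
            if u.color == RED:                     # case 3: recolor and continue at G
                p.color = u.color = BLACK
                g.color = RED
                n = g
                continue
            if n is (p.right if p_is_left else p.left):
                self._rotate(p, left=p_is_left)    # case 5: rotate N to the outside
                n, p = p, n
            p.color, g.color = BLACK, RED          # case 4: recolor and rotate on G
            self._rotate(g, left=not p_is_left)
        self.root.color = BLACK                    # case 1: the root is always black


t = RedBlackTree()
for k in [21, 32, 64, 75, 15]:
    t.insert(k)
print(t.root.key, t.root.left.key, t.root.right.key)  # 32 21 64
```

Inserting 64 triggers a left rotation on 21 (the mirrored case 4), and inserting 75 triggers a recoloring (case 3), after which the root is painted black again.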

## 3. Tree Exercises 
#### Exercise 2

Insert items with the following keys into an initially empty binary search tree: 30, 40, 24, 58, 48, 26, 11, 13

##### Solution
![dia2](img/10ex2.png)


#### Exercise 3

Choose a set of 7 distinct, positive, integer keys. Draw two binary search trees for your set: one of height 2 and one of height 5.

##### Solution
![dia2](img/10ex3-1.png)
![dia2](img/10ex3-2.png)


#### Exercise 4

Are the following statements TRUE or FALSE? Justify your answer.

1. The subtree of the root of a red-black tree is always itself a red-black tree.


2. The sibling of a null child reference in a red-black tree is either another null child reference or a red node.

##### Solution

1. FALSE: The root of a red-black tree must be black, by definition. It is possible for a child of the root of a red-black tree to be red. Therefore, it is possible for a subtree of the root of a red-black tree to have a red root, meaning that it cannot be a red-black tree.

2. TRUE: Let x represent the parent of the null reference and suppose, without loss of generality, that x.right is the null reference. Suppose x.left refers to a black node. Then the number of black nodes on the path from the root to the x.right null reference must be less than the number of black nodes on any path from the root to a null reference in the subtree rooted at x.left. This violates the definition of a red-black tree. So, x.left must be a null reference or a red node.
![dia1](img/10blackred2.png)
<div class="author">src: Ernst Mayr, in.tum.de</div>

#### Exercise 5

Draw the red-black tree that results after inserting the following integer keys into an initially empty red-black tree: 21, 32, 64, 75, 15.

Clearly show the tree at each step and indicate recolorings and rotations.

##### Solution
![](img/10ex5.png)
