# Fundamental Data Structures and Algorithms 05 - Trees

## Unit 3: Basic Data Structures (continued)
---

### Objective

- Introduce the tree data structure
 - Binary Search Tree
 - AVL Tree
---

### Recap
> What is a *linked list*?

![linked_list](https://i.ibb.co/gzXZfrm/Slide7.png)

---

## Trees

Trees in nature  
![image.png](https://i.ibb.co/vhgTqHH/trees-01b.png)

Trees in computers  
![image.png](https://i.ibb.co/JpkZqRB/trees-01b.png)

The *tree* basically resembles a family tree: parent and child  
![Family Tree](https://i.ibb.co/PMYYJr9/Slide15.png)

---

## Tree Structures and Terminologies

- With the exception of the first node, each node has a ***parent*** node and zero or more ***child*** nodes
- Think of a linked list in which each node can have multiple `next` links
- In this course, each node has at most 2 child nodes: a ***left*** child and a ***right*** child
 - Such trees are called ***binary trees***

A Binary Search Tree is a node-based binary tree data structure which has the following properties:  

- The left subtree of a node contains only nodes with keys less than the node’s key.
- The right subtree of a node contains only nodes with keys greater than the node’s key.
- The left and right subtrees must each also be a binary search tree.
- There must be no duplicate nodes.
 
![image.png](attachment:image.png)

The above properties of a Binary Search Tree provide an ordering among keys so that operations like search, minimum and maximum can be done fast. Without this ordering, we may have to compare every key to search for a given key.
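
The properties above can be checked mechanically. The following is a minimal sketch, not part of the original notes: the `Node` constructor with optional `left`/`right` arguments and the function name `is_bst` are assumptions made for illustration. Each recursive call narrows the range of keys allowed in a subtree.

```python
class Node:
    """Minimal binary tree node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.val = key
        self.left = left
        self.right = right


def is_bst(root, low=None, high=None):
    """Return True if every key lies strictly between low and high (None = no bound)."""
    if root is None:
        return True  # an empty tree is trivially a BST
    if low is not None and root.val <= low:
        return False
    if high is not None and root.val >= high:
        return False
    # Keys on the left must stay below root.val, keys on the right above it.
    return (is_bst(root.left, low, root.val) and
            is_bst(root.right, root.val, high))


valid = Node(8, Node(3, Node(1), Node(6)), Node(10))
# 9 sits in the left subtree of 8, so this is not a BST even though
# every node is correctly ordered with respect to its own children.
invalid = Node(8, Node(3, Node(1), Node(9)), Node(10))
print(is_bst(valid), is_bst(invalid))  # True False
```

The strict comparisons also reject duplicate keys, in line with the last property.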

![Tree_structures_and_terminologies](https://i.ibb.co/6ZrNLN6/Slide16.png)

- Top node is also known as the ***root***
 - used as an 'access point' into the tree data structure

- Each node can be the root of its own ***subtree***
 - Therefore, trees are recursive

- A node is ***external*** if it has no children
 - also referred to as a ***leaf*** node
- A node is ***internal*** if it has one or more children

- an ***edge*** of a tree is a pair of nodes that have a parent-child relationship.
- a ***path*** is a sequence of nodes such that any two consecutive nodes in the sequence form an edge

- nodes in a tree are categorized into ***levels***
 - root node of the entire tree is at level 0
- the ***depth*** of a node is its distance from the root, the distance being the number of levels that separate the two

- the ***height*** of a tree is the number of levels in the tree

- the ***width*** of a tree is the number of nodes on the level containing the most nodes

- the ***size*** of a tree is the number of nodes within the tree

- ***empty tree*** has a height, width and size equal to 0
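
As a rough sketch of these definitions (the helper names `size`, `height` and `width` and the `Node` constructor with optional children are assumptions, not part of the notes), the three measures can be computed as follows. The height counts levels, so an empty tree gives 0 for all three.

```python
from collections import deque


class Node:
    """Minimal binary tree node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.val = key
        self.left = left
        self.right = right


def size(root):
    """Number of nodes in the tree."""
    if root is None:
        return 0
    return 1 + size(root.left) + size(root.right)


def height(root):
    """Number of levels in the tree."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def width(root):
    """Number of nodes on the fullest level, found by walking level by level."""
    if root is None:
        return 0
    best = 0
    level = deque([root])
    while level:
        best = max(best, len(level))
        for _ in range(len(level)):  # consume exactly the current level
            node = level.popleft()
            if node.left is not None:
                level.append(node.left)
            if node.right is not None:
                level.append(node.right)
    return best


tree = Node(8, Node(3, Node(1), Node(6)), Node(10))
print(size(tree), height(tree), width(tree))  # 5 3 2
```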

---

### Types of Binary Trees

- A ***full / proper*** binary tree is a binary tree in which each node has either 0 or 2 children.

- A ***complete*** binary tree is a binary tree where every level, except possibly the last, is completely filled and all nodes in the last level are as far left as possible.

- A ***perfect*** binary tree is a full binary tree in which all leaf nodes are at the same level. The perfect tree has all possible node slots filled from top to bottom with no gaps

![Full_and_Complete_Binary_Trees](https://i.ibb.co/CbS0My4/Slide17.png)

Tree 1 is a full binary tree but it is not complete as the leaf nodes are not left aligned. Tree 2 is not a full binary tree but it is a complete binary tree.
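
The three definitions can be turned into small checks. This is only a sketch: the function names and the three tiny example trees are made up for illustration and are not the trees in the figure.

```python
from collections import deque


class Node:
    """Minimal binary tree node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.val = key
        self.left = left
        self.right = right


def size(root):
    return 0 if root is None else 1 + size(root.left) + size(root.right)


def height(root):
    return 0 if root is None else 1 + max(height(root.left), height(root.right))


def is_full(root):
    """Every node has either 0 or 2 children."""
    if root is None:
        return True
    if (root.left is None) != (root.right is None):
        return False  # exactly one child
    return is_full(root.left) and is_full(root.right)


def is_complete(root):
    """Level-order scan: once an empty slot is seen, no node may follow it."""
    queue = deque([root])
    seen_gap = False
    while queue:
        node = queue.popleft()
        if node is None:
            seen_gap = True
        elif seen_gap:
            return False
        else:
            queue.append(node.left)
            queue.append(node.right)
    return True


def is_perfect(root):
    """A perfect tree with h levels has exactly 2**h - 1 nodes."""
    return size(root) == 2 ** height(root) - 1


full_only = Node(1, Node(2), Node(3, Node(4), Node(5)))   # leaves not left aligned
complete_only = Node(1, Node(2, Node(4)), Node(3))        # node 2 has one child
perfect = Node(1, Node(2), Node(3))
print(is_full(full_only), is_complete(full_only))          # True False
print(is_full(complete_only), is_complete(complete_only))  # False True
print(is_perfect(perfect))                                 # True
```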

---

## Binary Search Tree

A binary search tree (BST) is an ordered binary tree with the following properties:
1. Every node has <u>at most two children</u>.
2. Most importantly, <u>the keys are ordered: every key in a node's left subtree is smaller than the node's key, and every key in its right subtree is larger.</u> This also means that a binary search tree should never contain duplicate nodes.

**Binary Search Tree Implementation**

In [2]:
class Node: 
    def __init__(self, key): 
        self.left = None
        self.right = None
        self.val = key 
  
# A utility function to insert a new node with the given key 
def insert(root, key): 
    if root is None: 
        return Node(key) # This becomes the root node as Node(key), i.e. constructor is called.
    else:
        if root.val == key: 
            return root 
        elif root.val < key: 
            root.right = insert(root.right, key) 
        else: 
            root.left = insert(root.left, key) 
    return root 
        
# A function to do inorder tree traversal 
def printInorder(root): 
    if root: 
        # First recur on left child 
        printInorder(root.left) 
  
        # then print the data of node 
        print(root.val)
  
        # now recur on right child 
        printInorder(root.right) 
        
# A function to do postorder tree traversal 
def printPostorder(root): 
    if root: 
        # First recur on left child 
        printPostorder(root.left) 
  
        # then recur on right child 
        printPostorder(root.right) 
  
        # now print the data of node 
        print(root.val)
  
  
# A function to do preorder tree traversal 
def printPreorder(root):   
    if root: 
        # First print the data of node 
        print(root.val)
  
        # Then recur on left child 
        printPreorder(root.left) 
  
        # Finally recur on right child 
        printPreorder(root.right) 
        
# A utility function to search a given key in BST 
def search(root,key):   
   
    # Base case: we fell off the tree, so the key is not present
    if root is None:
        print(f"Element {key} does not exist")
        return False
   
    # key is present at root  
    elif root.val == key:
        print(f"Element {root.val} found in the Tree")
        return True
   
    # Key is greater than root's key
    elif root.val < key:
        return search(root.right,key)
   
    # Key is smaller than root's key
    else:
        return search(root.left,key)

![Binary_Search_Tree](https://i.ibb.co/nc3rXsd/Slide18.png)


To create the same structure as shown above, we can do the following:

In [3]:
r = Node(8) 
insert(r, 3) 
insert(r, 10) 
insert(r, 1) 
insert(r, 6) 
insert(r, 9) 
insert(r, 12)
insert(r, 4)
insert(r, 7)
insert(r, 11)
insert(r, 14) 

<__main__.Node at 0x1c58718fe08>

---

**Tree Traversal**

- a *traversal* of a tree is a systematic way of accessing, or "visiting", all the nodes of the tree.

![image.png](attachment:image.png)

#### Depth First Traversals:

(a) Inorder (Left, Root, Right) : 4 2 5 1 3<br> <mark>IN</mark>: <mark>Root</mark> is printed <mark>IN</mark>-between left and right.<br>
(b) Preorder (Root, Left, Right) : 1 2 4 5 3<br> <mark>PRE</mark>: <mark>Root</mark> is printed <mark>PRE</mark> of left and right.<br>
(c) Postorder (Left, Right, Root) : 4 5 2 3 1<br> <mark>POST</mark>: <mark>Root</mark> is printed <mark>POST</mark> of left and right.<br>

To traverse a tree <u>inorder, you print the left subtree, then the root, and then the right subtree.</u> To verify, we can print the BST by invoking the `printInorder(root)` function. This is one of the three tree traversal methods.

1. *Inorder* traversals of a BST will always give nodes in a sorted (ascending) order. 

In [4]:
print("\nInorder traversal of binary tree is")
printInorder(r)


Inorder traversal of binary tree is
1
3
4
6
7
8
9
10
11
12
14


What about `postOrder` and `preOrder`?
It may seem like it is merely printing the nodes in different orders but there are several use cases. You may think of the print function as a placeholder for other things i.e. it can be replaced by other functions depending on the application.
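
To make the "placeholder" idea concrete, here is a small sketch in which the traversal takes the action as a parameter. The names `inorder` and `visit` and the `Node` constructor with optional children are assumptions introduced for this example only.

```python
class Node:
    """Minimal binary tree node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.val = key
        self.left = left
        self.right = right


def inorder(root, visit):
    """Inorder traversal that calls visit(key) instead of printing."""
    if root is not None:
        inorder(root.left, visit)
        visit(root.val)
        inorder(root.right, visit)


root = Node(8, Node(3, Node(1), Node(6)), Node(10))

keys = []
inorder(root, keys.append)   # collect the keys instead of printing them
print(keys)                  # [1, 3, 6, 8, 10]

doubled = []
inorder(root, lambda k: doubled.append(2 * k))  # any other action works too
print(doubled)               # [2, 6, 12, 16, 20]
```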

2. One of the use cases for the *preorder* traversal is that it is typically used to create a copy of the BST. If we were to invoke `printPreorder(root)`, you will see that the order in which the nodes are printed can be used to create a copy of the BST <u>because the contents of the root appear before the contents of the children</u>.

Another interesting use case of a *preorder* traversal can be seen in these notes: accessing the chapters or sub-chapters of a book or document in order.

![preorder_traversal](https://i.ibb.co/pQk5LJN/Slide25.png)

In [5]:
print("Preorder traversal of binary tree is")
printPreorder(r) 

Preorder traversal of binary tree is
8
3
1
6
4
7
10
9
12
11
14
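
The preorder output above can be replayed to rebuild the same tree. The sketch below assumes a `Node` and `insert` equivalent to the ones defined earlier; `preorder_keys`, `copy_bst` and `same_shape` are hypothetical helper names. Since every parent is inserted before its children, each key lands in the same position as in the original.

```python
class Node:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key


def insert(root, key):
    """BST insert that ignores duplicates and returns the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.val:
        root.left = insert(root.left, key)
    elif key > root.val:
        root.right = insert(root.right, key)
    return root


def preorder_keys(root):
    """Keys in preorder: root first, then left subtree, then right subtree."""
    if root is None:
        return []
    return [root.val] + preorder_keys(root.left) + preorder_keys(root.right)


def copy_bst(root):
    """Rebuild the tree by inserting its keys in preorder."""
    new_root = None
    for key in preorder_keys(root):
        new_root = insert(new_root, key)
    return new_root


def same_shape(a, b):
    """True if both trees have identical keys in identical positions."""
    if a is None or b is None:
        return a is b
    return (a.val == b.val and same_shape(a.left, b.left)
            and same_shape(a.right, b.right))


original = None
for key in [8, 3, 10, 1, 6, 9, 12, 4, 7, 11, 14]:
    original = insert(original, key)

duplicate = copy_bst(original)
print(preorder_keys(duplicate))         # [8, 3, 1, 6, 4, 7, 10, 9, 12, 11, 14]
print(same_shape(original, duplicate))  # True
```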


3. On the other hand, *postorder* traversal is typically used to delete the tree. <u>If you print the subtrees first and then the root node, you get the order of traversal which is called postorder.</u>

In [6]:
print("\nPostorder traversal of binary tree is")
printPostorder(r) 


Postorder traversal of binary tree is
1
4
7
6
3
9
11
14
12
10
8


If we follow the nodes listed above, we can see that each node is printed only after all the nodes in its subtrees. By the time a node is reached, its children have already been handled, so it can be treated like a leaf node, which can be deleted without affecting the other parts of the tree. However, this form of deletion is only applicable if the nodes we intend to delete are leaf nodes. To remove interior nodes, we have to consider a different approach. This will be left as an exercise for readers.

---

**Search, Minimum and Maximum Values**

Binary search trees are particularly useful for searching items. Searching in a BST follows a few simple rules:  
1. Start from the root node.
2. If target node < root node, search the left subtree.
3. If target node > root node, search the right subtree.
4. If target node = root node, the search is successful; if the subtree to search is empty, the target is not in the tree.

In fact, the same comparisons can be seen in the `insert()` method.

**Example:** 

![Binary_Search_Tree](https://i.ibb.co/nc3rXsd/Slide18.png)

<u>Search for Node 11</u>  
We simply have to compare the node we are visiting with node $11$:
- moving to the right subtree if node $11$ is greater than the visited node, or to the left subtree if it is less than the visited node. 
  - Using this simple algorithm, we can see that we will visit the root node $8$, then node $10$, and then node $12$ before finally reaching the targeted node $11$.

In [None]:
search(r, 11)

<u>Search for Node 5</u>
- Starting from the root node, we move to the left child (since $5$ is smaller than $8$), which is node $3$.
- We then repeat the comparison process by visiting node $6$ and finally node $4$.
- Since $5$ is greater than $4$, we expect node $5$ to be the right child of node $4$. However, node $4$ is a leaf node, so we can return a *node not found* message or an error.

In [None]:
search(r,5)
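
To see which nodes are visited in the two searches above, here is an iterative sketch that records the path. `search_path` is a hypothetical helper, and `Node`/`insert` are assumed to behave like the versions defined earlier.

```python
class Node:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key


def insert(root, key):
    """BST insert that ignores duplicates and returns the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.val:
        root.left = insert(root.left, key)
    elif key > root.val:
        root.right = insert(root.right, key)
    return root


def search_path(root, key):
    """Return (found, keys visited) for a BST search starting at root."""
    path = []
    node = root
    while node is not None:
        path.append(node.val)
        if key == node.val:
            return True, path
        # Smaller keys live on the left, larger keys on the right.
        node = node.left if key < node.val else node.right
    return False, path  # fell off the tree: the key is not present


tree = None
for k in [8, 3, 10, 1, 6, 9, 12, 4, 7, 11, 14]:
    tree = insert(tree, k)

print(search_path(tree, 11))  # (True, [8, 10, 12, 11])
print(search_path(tree, 5))   # (False, [8, 3, 6, 4])
```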

Based on the same algorithm, we can easily find the minimum and maximum values contained in the BST. To reach the minimum value, all we have to do is keep moving to the left child until we reach a node that has no left child, so we can ignore step 3 entirely. The maximum value can be found in a similar fashion by following the right children instead. 

In [None]:
def minValue(root): 
     
    # loop down to find the leftmost node 
    while(root.left is not None): 
        root = root.left 
  
    return root.val 

minValue(r)

In [None]:
def maxValue(root): 
    # loop down to find the rightmost node 
    while(root.right is not None): 
        root = root.right
  
    return root.val 

maxValue(r)

---

## Efficiency of Binary Search Trees

- Assume the tree contains *n* nodes.
- In searching for a target node, the function starts at the root node and works its way down into the tree until either the node is located or a null link is encountered. 
- The worst case time for the search operation depends on the number of nodes that have to be examined, which can be as many as $n$. Think of a tree made up of only single children (effectively a linked list)
- A similar argument applies to insertion, and a traversal always visits all $n$ nodes
- Therefore, each operation in BST has a worst case time complexity of O(n)
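
The "linked list" worst case is easy to reproduce. In this sketch (`build` and `height` are hypothetical helpers, and `Node`/`insert` are assumed to match the earlier definitions) the same 15 keys are inserted in two different orders and the resulting heights are compared.

```python
class Node:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key


def insert(root, key):
    """BST insert that ignores duplicates and returns the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.val:
        root.left = insert(root.left, key)
    elif key > root.val:
        root.right = insert(root.right, key)
    return root


def height(root):
    """Number of levels in the tree (0 for an empty tree)."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


sorted_keys = list(range(1, 16))  # 1, 2, ..., 15
balanced_order = [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]

chain = build(sorted_keys)     # every node only has a right child
bushy = build(balanced_order)  # every level is completely filled

print(height(chain), height(bushy))  # 15 4
```

A search in the first tree may examine all 15 nodes, while in the second it examines at most 4.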

*Class Discussion*
> So what's the point of BST?

Answer:  
BST is only useful if the tree is <u>balanced</u>.

---

## AVL Tree

- As mentioned, the worst case for each operation in BST is O(n) (linked list)
- However, the best case for search and insertion in BST is O($\log{n}$) (balanced tree)
- How can we achieve this?
 - Answer: build a BST with height proportional to $\log{n}$
 - Next question is How?
 - Solution: AVL tree 
 
AVL refers to the inventors:
- Adel'son-Vel'skii, G. M.
- Landis, Y. M.

What is AVL tree?
- an improved version of BST
- always produces a tree that is height balanced
- this significantly improves the efficiency of operations on the tree

*Recall*
> What is considered a **balanced** tree?  
<u>Answer: If the heights of left and the right subtrees of every node differ by at most 1</u>

**Balance Factor**
- indicates the height difference between left and right sub-trees
- the BF can be one of three states:
 1. left high - When the left subtree is higher than the right subtree.
 2. equal high - When the two subtrees have equal height.
 3. right high - When the right subtree is higher than the left subtree.  

- BF illustrated by the following symbols:
 - $>$ for left high state
 - $=$ for the equal high state
 - $<$ for a right high state.
 - When a node is out of balance, we will use either $<<$ or $>>$ to indicate which subtree is higher.
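
As a sketch of how the symbols relate to subtree heights, the helper below recomputes heights on demand. A real AVL implementation would store the balance factor (or height) in each node instead. The names `balance_symbol` and `is_height_balanced` are assumptions for this example.

```python
class Node:
    """Minimal binary tree node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.val = key
        self.left = left
        self.right = right


def height(root):
    """Number of levels in the tree (0 for an empty tree)."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def balance_symbol(node):
    """Map the height difference (left minus right) to the notes' symbols."""
    diff = height(node.left) - height(node.right)
    if diff >= 2:
        return ">>"   # out of balance, left subtree too high
    if diff <= -2:
        return "<<"   # out of balance, right subtree too high
    return {1: ">", 0: "=", -1: "<"}[diff]


def is_height_balanced(root):
    """True if no node's subtrees differ in height by more than 1."""
    if root is None:
        return True
    return (abs(height(root.left) - height(root.right)) <= 1
            and is_height_balanced(root.left)
            and is_height_balanced(root.right))


print(balance_symbol(Node(2, Node(1), Node(3))))  # =
print(balance_symbol(Node(2, Node(1))))           # >
print(balance_symbol(Node(1, None, Node(2))))     # <
print(balance_symbol(Node(3, Node(2, Node(1)))))  # >>
```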

---  

## Efficiency of AVL Trees

- The search and traversal operations are implemented the same way for an AVL tree as for a binary search tree. Traversal is still O($n$), but because the tree stays balanced, search takes O($\log{n}$) time even in the worst case.
- The insertion and deletion operations have to be modified in order to maintain the balance property of the tree as new nodes are inserted and existing ones removed.
- By maintaining a balanced tree, we ensure its height never exceeds O($\log{n}$), which gives O($\log{n}$) time operations even in the worst case.

---

**Insertions**

- begins with the same process used for a BST
 - search for the new node in the tree
 - add the node where we 'fall off' the tree (i.e. reached a null link)
- the difference is in what happens after the insertion
 - must check if the tree is balanced after insertion
 - if unbalanced, have to rebalance

**Example 1:**  
![AVL_insertion](https://i.ibb.co/D1ZZrcq/Slide27.png)  

- Inserting node 120 to tree (a)
- Figure shows a simple insertion into an AVL tree: (a) original; and (b) with node 120 inserted.
- tree remains balanced since the insertion does not change the height of any subtree
- does cause a change in the balance factors
- After the node is inserted, the balance factors have to be adjusted in order to determine if any subtree is out of balance.
- There is a limited set of nodes that can be affected when a new node is added. This set is limited to the nodes along the path to the insertion point.
- Tree (b) also shows the new balance factors after node 120 is added.

**Example 2:**  
![AVL_insertion_unbalance](https://i.ibb.co/tLffwqR/Slide28.png)

- Inserting node 28 to tree (b)
- new node is inserted as the left child of node 30
- When the balance factors are recalculated, we can see all of the subtrees along the path that are above node 30 are now out of balance
- correct the imbalance by *rotating* the subtree rooted at node 35. 

---

**Rotations**

- Any binary search tree containing at least two nodes can be drawn in one of the two forms:  

![tree rotations](https://i.ibb.co/V2LxKxf/Slide26.png)

- B and D are the required two nodes to be rotated
- A, C and E are binary search sub-trees (any of which may be empty)
- any nodes in sub-tree A would simply be shifted up the tree by a right rotation, and any nodes in sub-tree E would simply be shifted up the tree by a left rotation.

 - Multiple subtrees can become unbalanced after inserting a new node, all of which have roots along the insertion path. 
  - But only one will have to be rebalanced: the one deepest in the tree and closest to the new node.
 - After inserting the node, the balance factors are adjusted during the unwinding of the recursion.
 - The first subtree encountered that is out of balance has to be rebalanced.
 - The root node of this subtree is known as the *pivot node*.
 - An AVL subtree is rebalanced by performing a rotation around the pivot node.
 - This involves rearranging the links of the pivot node, its children, and possibly one of its grandchildren
 - The actual modifications depend on which descendant's subtree of the pivot node the new node was inserted into and the balance factors.

There are 4 possible cases:

---

**Case 1**

- occurs when the balance factor of the pivot node (P) is left high before the insertion and the new node is inserted into the left child (C) of the pivot node.

![tree_rotations_case_1](https://i.ibb.co/BLj6Fq8/Slide29.png)

- To rebalance the subtree, the pivot node has to be rotated right over its left child.
- The rotation is accomplished by changing the links such that P becomes the right child of C and the right child of C becomes the left child of P.
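
The link changes for Case 1 amount to two assignments. The sketch below uses a hypothetical `rotate_right` function and small made-up keys; it does not update balance factors, it only rearranges the links.

```python
class Node:
    """Minimal binary tree node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.val = key
        self.left = left
        self.right = right


def rotate_right(pivot):
    """Rotate the pivot P right over its left child C; return the new subtree root."""
    child = pivot.left
    pivot.left = child.right   # right child of C becomes left child of P
    child.right = pivot        # P becomes right child of C
    return child


def inorder_keys(root):
    if root is None:
        return []
    return inorder_keys(root.left) + [root.val] + inorder_keys(root.right)


# P = 30 was left high; inserting 5 below C = 20 puts P out of balance.
pivot = Node(30, Node(20, Node(10, Node(5)), Node(25)), Node(40))
new_root = rotate_right(pivot)

print(new_root.val, new_root.right.val, new_root.right.left.val)  # 20 30 25
print(inorder_keys(new_root))  # [5, 10, 20, 25, 30, 40]
```

The inorder sequence is unchanged by the rotation, so the result is still a binary search tree.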

---

**Case 2**

- involves three nodes: the pivot (P), the left child of the pivot (C), and the right child of C, which is a grandchild (G) of P
- For this case to occur, the balance factor of the pivot is left high before the insertion and the new node is inserted into the right subtree of C

![tree_rotations_case_2](https://i.ibb.co/PZ26Bhv/Slide30.png)

- Node C has to be rotated left and the pivot node has to be rotated right.
- The link modifications required to accomplish this rotation include
 - setting the right child of G as the new left child of the pivot node,
 - changing the left child of G to become the right child of C,
 - setting C to be the new left child of G, and
 - setting the pivot node to be the new right child of G.
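
The Case 2 link changes can be sketched as a single function. `rotate_left_right` is a hypothetical name, and balance factor updates are left out. The first example uses the keys 60, 25 and 35; the second uses made-up keys with non-empty subtrees.

```python
class Node:
    """Minimal binary tree node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.val = key
        self.left = left
        self.right = right


def rotate_left_right(pivot):
    """Double rotation for Case 2: C rotates left, then P rotates right."""
    child = pivot.left
    grandchild = child.right
    pivot.left = grandchild.right   # right child of G -> left child of P
    child.right = grandchild.left   # left child of G -> right child of C
    grandchild.left = child         # C -> left child of G
    grandchild.right = pivot        # P -> right child of G
    return grandchild               # G is the new root of the subtree


def preorder_keys(root):
    if root is None:
        return []
    return [root.val] + preorder_keys(root.left) + preorder_keys(root.right)


small = rotate_left_right(Node(60, Node(25, None, Node(35))))
print(preorder_keys(small))   # [35, 25, 60]

# P = 50, C = 20, G = 30; the new key 25 was inserted below G.
larger = rotate_left_right(
    Node(50, Node(20, Node(10), Node(30, Node(25))), Node(60)))
print(preorder_keys(larger))  # [30, 20, 10, 25, 50, 60]
```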

---

**Case 3 and 4**

- The third case is a mirror image of the first case and the fourth case is a mirror image of the second case.
- The difference is the new node is inserted in the right subtree of the pivot node or a descendant of its right subtree.

---  

**More on Balance Factors**

- When a new key is inserted into the tree, the balance factors of the nodes along the path from the root to the insertion point may have to be modified to reflect the insertion.

- The balance factor of a node along the path changes if the subtree into which the new node was inserted grows taller.

- The new balance factor of a node depends on its current balance factor and the subtree into which the new node was inserted.

- The resulting balance factors are provided here:

| Current BF | New BF<br />(after inserting into left subtree) | New BF<br />(after inserting into right subtree) |
| ---------- | ----------------------------------------------- | ------------------------------------------- |
| >          | >>                                              | =                                           |
| =          | >                                               | <                                           |
| <          | =                                               | <<                                          |

- Modifications to the balance factors are made in reverse order as the recursion unwinds.

- When a node has a left high balance and the new node is inserted into its left child or it has a right high balance and the new node is inserted into its right child, the node is out of balance and its subtree has to be rebalanced.

- After rebalancing, the subtree will shrink by one level, which results in the balance factors of its ancestors remaining the same.

- The balance factors of the ancestors will also remain the same when the balance factor changes to equal high.

- After a rotation is performed, the balance factors of the impacted nodes have to be changed to reflect the new node heights.

- The changes required depend on which of the four cases triggered the rotation.

- The balance factor settings in cases 2 and 4 depend on the balance factor of the original pivot node's grandchild (the right child of node L or the left child of node R).

- The new balance factors for the nodes involved in a rotation are provided below:

| Case | original G      | new P           | new L           | new  R          | new G           |
| ---- | --------------- | --------------- | --------------- | --------------- | --------------- |
| 1    | .               | =               | =               | .               | .               |
| 2    | ><br />=<br />< | <<br />=<br />= | =<br />=<br />> | .<br />.<br />. | =<br />=<br />= |
| 3    | .               | =               | .               | =               | .               |
| 4    | ><br />=<br />< | =<br />=<br />> | .<br />.<br />. | <<br />=<br />= | =<br />=<br />= |

---

**Example**

The figure below illustrates the construction of an AVL tree by inserting the nodes from the list [60, 25, 35, 100, 17, 80], one node at a time. Each tree in the figure shows the results after performing the indicated operation. Two double rotations are required to construct the tree: one after node 35 is inserted and one after node 80 is inserted.

| ![Building AVL tree 1](https://i.ibb.co/S34tsdn/Slide31.png) |
| :----------------------------------------------------------: |
| ![Building AVL tree 2](https://i.ibb.co/Qvr8Fqm/Slide32.png) |
| ![Building AVL tree 3](https://i.ibb.co/jL7TLvb/Slide33.png) |
| Fig 3.18. Building an AVL tree from the list of keys [60, 25, 35, 100, 17, 80]. |
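
The construction can be reproduced with a compact sketch. Note the swap: instead of the $>$, $=$, $<$ balance factor symbols used in these notes, this version stores a `height` in every node and derives the imbalance from it, which keeps the code short. All names here (`AVLNode`, `avl_insert`, and so on) are assumptions for illustration.

```python
class AVLNode:
    def __init__(self, key):
        self.val = key
        self.left = None
        self.right = None
        self.height = 1  # number of levels in the subtree rooted here


def h(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(p):
    c = p.left
    p.left, c.right = c.right, p   # C's right subtree moves under P; P under C
    update(p)
    update(c)
    return c


def rotate_left(p):
    c = p.right
    p.right, c.left = c.left, p    # mirror image of rotate_right
    update(p)
    update(c)
    return c


def avl_insert(node, key):
    """Insert key, rebalancing on the way back up; returns the subtree root."""
    if node is None:
        return AVLNode(key)
    if key < node.val:
        node.left = avl_insert(node.left, key)
    elif key > node.val:
        node.right = avl_insert(node.right, key)
    else:
        return node                        # duplicates are ignored
    update(node)
    diff = h(node.left) - h(node.right)
    if diff > 1:                           # left side too high
        if key > node.left.val:            # case 2: rotate the child left first
            node.left = rotate_left(node.left)
        return rotate_right(node)          # case 1, or second half of case 2
    if diff < -1:                          # right side too high
        if key < node.right.val:           # case 4: rotate the child right first
            node.right = rotate_right(node.right)
        return rotate_left(node)           # case 3, or second half of case 4
    return node


def preorder_keys(root):
    if root is None:
        return []
    return [root.val] + preorder_keys(root.left) + preorder_keys(root.right)


root = None
for key in [60, 25, 35, 100, 17, 80]:
    root = avl_insert(root, key)

print(preorder_keys(root))  # [35, 25, 17, 80, 60, 100]
print(root.height)          # 3
```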

---

**Deletions**

- When an entry is removed from an AVL tree, we must ensure the balance property is maintained.
- As with the insert operation, deletion begins by using the corresponding operation from the binary search tree.
- After removing the targeted entry, subtrees may have to be rebalanced.

**Example**

![Building AVL tree 1](https://i.ibb.co/KDwVNsv/Slide34.png)

**Remove node 17 from AVL tree (a)**

- After removing the leaf node, the subtree rooted at node 25 is out of balance, as shown in AVL tree (b)

- As with an insertion, the only subtrees that can become unbalanced are those along the path from the root to the original node containing the target.

- Remember, if the key being removed is in an interior node, its successor is located and copied to the node and the successor's original node is removed. 

- In the insertion operation, at most one subtree can become unbalanced.
 - After the appropriate rotation is performed on the subtree, the balance factors of the node's ancestors do not change.
 - Thus, it restores the height-balance property both locally at the subtree and globally for the entire tree. 
<br>
<br>
- This is not the case with a deletion.
 - When a subtree is rebalanced due to a deletion, it can cause the ancestors of the subtree to then become unbalanced.
 - This effect can ripple up all the way to the root node.
<br>
<br>
- Therefore, all of the nodes along the path have to be evaluated and rebalanced if necessary.
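
The successor rule mentioned above can be sketched for a plain binary search tree as follows. This sketch covers only the removal step and does not rebalance; an AVL deletion would additionally re-check the balance of every node on the way back up the path. `delete` and `inorder_keys` are hypothetical helper names, and `Node`/`insert` are assumed to match the earlier definitions.

```python
class Node:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key


def insert(root, key):
    """BST insert that ignores duplicates and returns the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.val:
        root.left = insert(root.left, key)
    elif key > root.val:
        root.right = insert(root.right, key)
    return root


def delete(root, key):
    """Remove key from the BST (no rebalancing); returns the subtree root."""
    if root is None:
        return None                      # key not found, nothing to do
    if key < root.val:
        root.left = delete(root.left, key)
    elif key > root.val:
        root.right = delete(root.right, key)
    else:
        if root.left is None:            # zero or one child: splice the node out
            return root.right
        if root.right is None:
            return root.left
        succ = root.right                # successor: leftmost node on the right
        while succ.left is not None:
            succ = succ.left
        root.val = succ.val              # copy the successor's key here
        root.right = delete(root.right, succ.val)  # remove its original node
    return root


def inorder_keys(root):
    if root is None:
        return []
    return inorder_keys(root.left) + [root.val] + inorder_keys(root.right)


tree = None
for k in [8, 3, 10, 1, 6, 9, 12, 4, 7, 11, 14]:
    tree = insert(tree, k)

tree = delete(tree, 3)       # interior node with two children
print(tree.left.val)         # 4
print(inorder_keys(tree))    # [1, 4, 6, 7, 8, 9, 10, 11, 12, 14]
```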