In [2]:
%%html
<style>h1{text-align:center;}h1{text-transform:none;}.rendered_html h4{color:#17b6eb;font-size: 1.6em;}img[alt=dia1]{width:35%;}img[alt=book]{width:20%;font-size: 3em;}img[alt=dia2]{width:50%;}.author{font-size:8px;}</style>

# Lecture 9: Trees

## 0. Review - Sets and Maps
<div class="author">src: chalmers.instructure.com</div>

### 0.1 Sets

A __set__ is a collection of items, in which duplicates are not allowed.

$$\{1, 3, 5, 6, 8, 9, 10\}$$

Basic operations include:

- adding items
- removing items
- checking whether an item is present
- looping through all items
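Python's built-in `set` type supports each of these operations directly. A short illustration using the example set above:

```python
# Python's built-in set type: duplicates are silently ignored
numbers = {1, 3, 5, 6, 8, 9, 10}

numbers.add(4)        # adding an item
numbers.add(4)        # adding a duplicate has no effect
numbers.discard(9)    # removing an item (no error if it is absent)

print(5 in numbers)   # checking membership -> True
print(9 in numbers)   # -> False

# looping through all items; sorted() because set iteration order is not guaranteed
for item in sorted(numbers):
    print(item, end=" ")  # 1 3 4 5 6 8 10
```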

### 0.2 Maps

A __map__ is a *set* of keys in which each key is associated with a value; in other words, a set of *key-value pairs*.

$$\{1 \mapsto \text{"strawberry"},\ 4 \mapsto \text{"banana"},\ 8 \mapsto \text{"apple"},\ \dots\}$$

Basic operations include:

- adding key-value pairs
- removing an entry by its key
- checking whether a key is present and, if so, which value it is associated with
- looping through all keys
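The same operations on Python's built-in `dict`, using the example map above plus a hypothetical extra entry `5 → "cherry"`:

```python
# Python's dict is a built-in map from keys to values
fruits = {1: "strawberry", 4: "banana", 8: "apple"}

fruits[5] = "cherry"      # adding a key-value pair
fruits[4] = "kiwi"        # an existing key is updated, not duplicated
del fruits[8]             # removing an entry by its key

print(4 in fruits)        # checking whether a key is present -> True
print(fruits.get(4))      # looking up its value -> kiwi
print(fruits.get(8))      # a missing key gives None

# looping through all keys (dicts keep insertion order since Python 3.7)
for key in fruits:
    print(key, fruits[key])
```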

### 0.2.1 Maps are useful!

Maps can be used in:

- __Databases__: look up a person's address by name (the name is mapped to the address)

- __File Systems__: find a file by its name (the name is mapped to the file's content)

- __Search Engines__: find websites containing a word or phrase (the word is mapped to the pages that contain it)

- __etc__: everywhere!

### 0.2.2 Names for maps

Maps are referred to by various names:

- Symbol tables
- Dictionaries (Python)
- Associative arrays

In Python, maps are built into the language as dictionaries (type `dict`).

In [3]:
# named "courses" to avoid shadowing Python's built-in dict type
courses = {"501ALGO": "Algorithmen und Datenstrukturen"}

courses["501ALGO"]

'Algorithmen und Datenstrukturen'

### 0.2.3 Implementing maps

Maps can be implemented using:
    
- A __dynamic array of key-value pairs__ (search complexity: O(n))
- A __linked list__ (search complexity: O(n))
- A __sorted dynamic array__ (search complexity of binary search: O(log n))
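A minimal sketch of the third option, assuming a hypothetical `SortedArrayMap` class built on Python's standard `bisect` module. Lookup is a binary search, but adding or removing a key still has to shift the later elements of the list:

```python
import bisect

class SortedArrayMap:
    """Hypothetical map backed by a sorted list of keys and a parallel list of values."""

    def __init__(self):
        self._keys = []
        self._values = []

    def get(self, key, default=None):
        # binary search: O(log n) comparisons
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # key exists: update in place
        else:
            # finding the position is O(log n), but list.insert shifts the
            # later elements, so adding a new key is O(n) overall
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def remove(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]                # also O(n) because of shifting
            del self._values[i]

    def keys(self):
        return list(self._keys)

m = SortedArrayMap()
m.put(8, "apple")
m.put(1, "strawberry")
m.put(4, "banana")
print(m.get(4))     # banana
print(m.keys())     # [1, 4, 8]
```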

#### Exercise 1: 
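Is it possible to create a map where searching, adding, and removing keys all take logarithmic time?

##### Solution

Yes, using trees (see below). A sorted dynamic array already allows $O(\log n)$ search, but adding or removing a key takes $O(n)$ time because later elements have to be shifted; a balanced binary search tree supports all three operations in $O(\log n)$.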

## 1. Trees
<div class="author">src: wikipedia.org</div>

- In computer science, a __tree__ is a widely used abstract data type that represents a __hierarchical__ tree structure with a set of connected nodes. 

- Each node in the tree can be connected to many children (depending on the type of tree), but must be connected to exactly one parent, except for the root node, which has no parent. 

- These constraints mean there are no cycles or "loops" (no node can be its own ancestor), and also that each child can be treated like the root node of its own subtree, making recursion a useful technique for tree traversal. 

- In contrast to linear data structures, many trees cannot be represented by relationships between neighboring nodes in a single straight line. 

### 1.1 Tree Properties
<div class="author">src: Problem Solving with Algorithms and Data Structures Using Python, N. Miller</div>

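- Hierarchical
- The children of one node are independent of the children of another node
- The path from the root to each leaf node is unique
- Every node, together with all of its descendants, forms a subtree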


### 1.2 Tree Examples

![dia2](img/9example1.jpg)
<div class="author">src: Problem Solving with Algorithms and Data Structures Using Python, N. Miller</div>

![dia2](img/9example2.jpg)
<div class="author">src: Problem Solving with Algorithms and Data Structures Using Python, N. Miller</div>

![dia2](img/9example3.jpg)
<div class="author">src: Problem Solving with Algorithms and Data Structures Using Python, N. Miller</div>

### 1.3 Terminology

- __Node:__ A node is a fundamental part of a tree. It can have a name, which we call the "key".

- __Edge:__ An edge connects two nodes to show that there is a relationship between them. Every node except the root has exactly one incoming edge; a node can have several outgoing edges.

- __Root:__ The root of the tree is the only node in the tree that has no incoming edges.

![dia1](img/9tree1.webp)
<div class="author">src: programiz.com</div>

- __Path:__ A path is an ordered list of nodes that are connected by edges

- __Children:__ The set of nodes c that have incoming edges from the same node are said to be the children of that node

- __Parent:__ A node is the parent of all the nodes it connects to with outgoing edges

- __Sibling:__ Nodes in the tree that are children of the same parent are said to be siblings

- __Subtree:__ A subtree is a set of nodes and edges consisting of a parent and all of its descendants

- __Leaf Node:__ A leaf node is a node that has no children


- __Level/Depth:__ The level of a node n is the number of edges on the path from the root node to n

- __Height of a Tree:__ The height of a tree is equal to the maximum level of any node in the tree

![dia1](img/9tree2.webp)
<div class="author">src: programiz.com</div>

- __Height of a Node:__ The height of a node is the number of edges on the longest path from the node down to a leaf

- __Degree of a Node:__ The degree of a node is the total number of branches (children) of that node.

- __Forest:__ A collection of disjoint trees is called a forest.

![dia1](img/9tree3.webp)
<div class="author">src: programiz.com</div>
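The terms above can be tried out on a small hypothetical tree (not one of the figures), stored as a dictionary from each node to the list of its children. This sketch assumes the dictionary really describes a tree; it computes every node's level with a breadth-first walk from the root and takes the height of the tree as the maximum level, as defined above:

```python
from collections import deque

# A hypothetical example tree, stored as node -> list of children
children = {
    "A": ["B", "C"],
    "B": ["D", "E", "F"],
    "C": [],
    "D": [],
    "E": ["G"],
    "F": [],
    "G": [],
}
root = "A"

def levels(children, root):
    """Level (depth) of every node: number of edges on the path from the root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            level[child] = level[node] + 1
            queue.append(child)
    return level

level = levels(children, root)
height = max(level.values())                           # height of the tree = maximum level
leaves = [n for n, cs in children.items() if not cs]   # leaf nodes have no children
degree = {n: len(cs) for n, cs in children.items()}    # degree = number of branches
parent = {c: p for p, cs in children.items() for c in cs}
siblings_of_D = [c for c in children[parent["D"]] if c != "D"]

print(level)          # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 2, 'F': 2, 'G': 3}
print(height)         # 3
print(leaves)         # ['C', 'D', 'F', 'G']
print(degree["B"])    # 3
print(siblings_of_D)  # ['E', 'F']
```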

### 1.4 Requirements

For a collection of nodes to be a tree:
    
- each node must have a __unique parent__, except for the root
- one node of the tree is designated as the __root node__
- there must be exactly __one root node__: every node n except the root is connected by an edge from exactly one other node p, where p is the parent of n
- there is a unique path from the root to each node
- __cycles__ are not allowed
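A minimal sketch of these requirements as a check, assuming a hypothetical representation: a set of node names and a list of directed `(parent, child)` edges whose endpoints all appear in the node set. It tests for a unique parent per node and exactly one root, then confirms that every node can be reached from the root; any cycle would form a part that is unreachable from the root, so it is rejected as well:

```python
def is_tree(nodes, edges):
    """Hypothetical helper: check the tree requirements for a set of nodes and
    a list of directed (parent, child) edges between those nodes."""
    parent = {}
    for p, c in edges:
        if c in parent:              # a node with two parents
            return False
        parent[c] = p

    roots = [n for n in nodes if n not in parent]
    if len(roots) != 1:              # exactly one node without a parent
        return False

    # Every node must be reachable from the root. Nodes on a cycle only have
    # parents inside that cycle, so they can never be reached from the root.
    children = {n: [] for n in nodes}
    for p, c in edges:
        children[p].append(c)
    seen = set()
    stack = [roots[0]]
    while stack:
        node = stack.pop()
        seen.add(node)
        stack.extend(children[node])
    return seen == set(nodes)

print(is_tree({"a", "b", "c"}, [("a", "b"), ("a", "c")]))              # True
print(is_tree({"a", "b", "c"}, [("a", "b"), ("b", "c"), ("c", "a")]))  # False: cycle, no root
print(is_tree({"a", "b", "c", "d"}, [("a", "b"), ("c", "d")]))         # False: two roots
print(is_tree({"a", "b", "c"}, [("a", "b"), ("a", "c"), ("b", "c")]))  # False: c has two parents
```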

#### Exercise 2:

Which of the following collections of nodes are trees, and which are not?
![](img/9ex2.jpg)



##### Solution

- A: Non-tree - two disconnected parts, more than one root
- B: Non-tree - contains a cycle, a node with more than one parent
- C: Non-tree - contains a cycle, a node with more than one parent
- D: Tree
- E: Non-tree - contains a cycle, the root also has a parent

### 1.5 Recursive Definition

A tree $T$ is either empty or it consists of a root and zero or more non-empty subtrees $T_1, T_2, \ldots, T_k$, the root of each of which is connected by an edge from the root of $T$.

![dia2](img/9recursivedefinition.jpg)
<div class="author">src: Problem Solving with Algorithms and Data Structures Using Python, N. Miller</div>
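The recursive definition translates almost directly into code. In this sketch (the `Tree` class and the `size`/`height` functions are hypothetical names), `None` plays the role of the empty tree and every other tree is a root key plus a list of subtrees, so functions on trees simply recurse into each subtree:

```python
class Tree:
    """Hypothetical tree: a root key plus a list of zero or more subtrees.
    The empty tree is represented by None."""

    def __init__(self, key, subtrees=None):
        self.key = key
        self.subtrees = subtrees if subtrees is not None else []

def size(tree):
    # base case: the empty tree has no nodes
    if tree is None:
        return 0
    # recursive case: the root plus the nodes of every subtree
    return 1 + sum(size(t) for t in tree.subtrees)

def height(tree):
    # height of a (non-empty) tree: edges on the longest path from the root to a leaf
    if not tree.subtrees:
        return 0          # a single node (a leaf) has height 0
    return 1 + max(height(t) for t in tree.subtrees)

t = Tree("A", [Tree("B", [Tree("D"), Tree("E")]), Tree("C")])
print(size(t))     # 5
print(height(t))   # 2
print(size(None))  # 0
```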

### 1.6 Tree Traversal

Traversing a tree means visiting every node in the tree.

Unlike linear data structures, which have only one logical way to traverse them, trees can be traversed in different ways.

### 1.6.1 Inorder Traversal

1. visit all nodes in the left subtree
2. visit the root node
3. visit all nodes in the right subtree

```
inorder(root):
    if root is not NULL:
        inorder(root->left)
        display(root->data)
        inorder(root->right)
```

### 1.6.2 Preorder Traversal

1. visit the root node
2. visit all nodes in the left subtree
3. visit all nodes in the right subtree

```
preorder(root):
    if root is not NULL:
        display(root->data)
        preorder(root->left)
        preorder(root->right)
```

### 1.6.3 Postorder Traversal

1. visit all nodes in the left subtree
2. visit all nodes in the right subtree
3. visit the root node

```
postorder(root):
    if root is not NULL:
        postorder(root->left)
        postorder(root->right)
        display(root->data)
```

### 1.7 Implementation

##### Example: Inorder Tree Traversal

![book](img/9inordertraversal.png)
<div class="author">src: maxnilz.com</div>

First the left subtree, then the root node, then the right subtree.

![](img/9inorderstack.jpg)
<div class="author">src: programiz.com</div>

To keep track of what still has to be visited, the pending nodes and subtrees are put on a __stack__; the subtree on top of the stack is always traversed next.
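The recursive functions below use Python's call stack implicitly. As a sketch of the same idea with an explicit stack (a hypothetical `inorder_iterative` function, using the same `Node` shape as the implementation below), each node is pushed while walking left and visited when it is popped:

```python
class Node:
    def __init__(self, item):
        self.left = None
        self.right = None
        self.val = item

def inorder_iterative(root):
    """Inorder traversal with an explicit stack instead of recursion."""
    result = []
    stack = []
    node = root
    while stack or node is not None:
        # walk as far left as possible, remembering the nodes we pass
        while node is not None:
            stack.append(node)
            node = node.left
        # the top of the stack is the leftmost node not yet visited
        node = stack.pop()
        result.append(node.val)
        # continue with its right subtree
        node = node.right
    return result

root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.left.right = Node(5)
print(inorder_iterative(root))  # [4, 2, 5, 1, 3]
```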

In [4]:
# Tree traversal in Python, source: programiz.com
class Node:
    def __init__(self, item):
        self.left = None
        self.right = None
        self.val = item

def inorder(root):
    if root:
        # Traverse left
        inorder(root.left)
        # Traverse root
        print(str(root.val) + "->", end='')
        # Traverse right
        inorder(root.right)

def postorder(root):
    if root:
        # Traverse left
        postorder(root.left)
        # Traverse right
        postorder(root.right)
        # Traverse root
        print(str(root.val) + "->", end='')

def preorder(root):
    if root:
        # Traverse root
        print(str(root.val) + "->", end='')
        # Traverse left
        preorder(root.left)
        # Traverse right
        preorder(root.right)


root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.left.right = Node(5)

print("Inorder traversal ")
inorder(root)

print("\nPreorder traversal ")
preorder(root)

print("\nPostorder traversal ")
postorder(root)

Inorder traversal 
4->2->5->1->3->
Preorder traversal 
1->2->4->5->3->
Postorder traversal 
4->5->2->3->1->

#### Exercise 3:

In [5]:
r'''
Using the class above, construct the following tree
              1
            /   \
           /     \
          2       3
         /      /   \
        /      /     \
       4      5       6
             / \
            /   \
           7     8

and perform an inorder, preorder and postorder traversal
'''


In [6]:
#Solution
root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.right.left = Node(5)
root.right.right = Node(6)
root.right.left.left = Node(7)
root.right.left.right = Node(8)
 
inorder(root)
print("\n")
preorder(root)
print("\n")
postorder(root)

4->2->1->7->5->8->3->6->

1->2->4->3->5->7->8->6->

4->2->7->8->5->6->3->1->

## 2. Binary Search Trees (BST)
<div class="author">src: wikipedia.org</div>

- In computer science, a binary search tree (BST), also called an ordered or sorted binary tree, is a rooted binary tree data structure whose internal nodes each store a key greater than all the keys in the node's left subtree and less than those in its right subtree. 

- The time complexity of searching, inserting, and deleting in a binary search tree grows linearly with the height $h$ of the tree, i.e. it is $O(h)$.

- Binary search trees allow binary search for fast lookup, addition, and removal of data items, and can be used to implement dynamic sets and lookup tables. 

### 2.1 Properties

- each tree node has at most two children

- if the tree is balanced, a key can be found in $O(\log n)$ time

- all keys in the left subtree are less than the root node's key

- all keys in the right subtree are greater than the root node's key

- both subtrees of each node are also BSTs

![](img/9bst.jpg)
<div class="author">src: chalmers.instructure.com</div>
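A sketch of checking these properties, assuming a hypothetical `Node` class with `key`, `left` and `right` attributes. Comparing each node only with its two children is not enough (a key deep in the left subtree could still exceed the root), so the check passes down the range of keys allowed by all ancestors:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def is_bst(node, low=None, high=None):
    """Check that every key lies strictly between the bounds inherited
    from its ancestors (None means unbounded)."""
    if node is None:
        return True                       # an empty tree is a BST
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    # left subtree keys must be < node.key, right subtree keys > node.key
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)

good = Node(8, Node(3, Node(1), Node(6)), Node(10, None, Node(14)))
bad = Node(8, Node(3, Node(1), Node(9)), Node(10))  # 9 sits in 8's left subtree
print(is_bst(good))  # True
print(is_bst(bad))   # False
```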

### 2.2 Searching BSTs

Searching for a target key in a BST:

- if the tree is empty: FAIL
- if the target matches the root node's key: FOUND
- if the target is *less* than the root node's key, recursively search the left subtree
- if the target is *greater* than the root node's key, recursively search the right subtree

![dia1](img/9bstsearch.jpg)
<div class="author">src: techiedelight.com</div>
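The search rules above as a recursive Python sketch. The `Node` class (which stores a `value` next to each key so the tree can act as a map) and the `search` function are hypothetical:

```python
class Node:
    def __init__(self, key, value=None, left=None, right=None):
        self.key = key
        self.value = value
        self.left = left
        self.right = right

def search(node, target):
    """Return the node whose key equals target, or None if it is absent."""
    if node is None:
        return None                          # empty tree: FAIL
    if target == node.key:
        return node                          # FOUND
    if target < node.key:
        return search(node.left, target)     # continue in the left subtree
    return search(node.right, target)        # continue in the right subtree

root = Node(8, "h", Node(3, "c", Node(1, "a"), Node(6, "f")), Node(10, "j"))
print(search(root, 6).value)  # f
print(search(root, 7))        # None
```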

### 2.3 Inserting in BSTs

Inserting a key-value pair in a BST:

- start by searching for the key
- if the search reaches *NULL* (an empty subtree), create a node for the key-value pair and place it there
- if the key already exists: UPDATE its value

![dia1](img/9bstinsert.png)
<div class="author">src: techiedelight.com</div>
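A recursive sketch of insertion with a hypothetical `Node` class. Returning the subtree root from each call lets the parent re-link a newly created child; listing the keys with an inorder traversal afterwards is a handy sanity check, since for a BST it yields them in sorted order:

```python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.left = None
        self.right = None

def insert(node, key, value):
    """Insert key/value into the BST rooted at node; return the (possibly new) root."""
    if node is None:
        return Node(key, value)              # empty spot reached: create the node here
    if key == node.key:
        node.value = value                   # key already exists: UPDATE
    elif key < node.key:
        node.left = insert(node.left, key, value)
    else:
        node.right = insert(node.right, key, value)
    return node

def inorder_keys(node):
    if node is None:
        return []
    return inorder_keys(node.left) + [node.key] + inorder_keys(node.right)

root = None
for k in [8, 3, 10, 1, 6, 14]:
    root = insert(root, k, str(k))
root = insert(root, 6, "six")                # updates the existing key 6
print(inorder_keys(root))     # [1, 3, 6, 8, 10, 14]
print(root.left.right.value)  # six
```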

### 2.4 Finding min/max in BSTs

Finding the maximum key in a BST:

- repeatedly go right from the root
- when you reach a node whose right child is null: MAX FOUND

Finding the minimum key works the same way, going left instead of right.
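Both directions as short iterative sketches (hypothetical `Node` class and `find_max`/`find_min` functions; an empty tree returns `None`):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def find_max(node):
    """Largest key: keep going right until a node has no right child."""
    if node is None:
        return None                  # an empty tree has no maximum
    while node.right is not None:
        node = node.right
    return node.key

def find_min(node):
    """Smallest key: the mirror image, keep going left."""
    if node is None:
        return None
    while node.left is not None:
        node = node.left
    return node.key

root = Node(8, Node(3, Node(1), Node(6)), Node(10, None, Node(14)))
print(find_max(root))  # 14
print(find_min(root))  # 1
```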

### 2.5 Deleting a node from a BST

__CASE 1__: Deleting a leaf (node with no children)

- remove the leaf



![dia1](img/9bstdel1.png)
<div class="author">src: techiedelight.com</div>

__CASE 2__: Deleting a node with one child

- replace the node with its child



![dia1](img/9bstdel3.png)
<div class="author">src: techiedelight.com</div>

__CASE 3__: Deleting a node with two children

- find and delete the *biggest* key in the *left subtree*
- overwrite the node's key with the key we just deleted
- to find the biggest key: repeatedly descend into the right child until you find a node without a right child
- since that node has no right child, deleting it is simple (CASE 1 or CASE 2)



![dia1](img/9bstdel2.png)
<div class="author">src: techiedelight.com</div>
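The three cases combined in one recursive sketch (hypothetical `Node` and `delete`). Only keys are stored here; in a key-value map the value would have to be copied along with the key in case 3:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def delete(node, key):
    """Delete key from the BST rooted at node; return the new root of this subtree."""
    if node is None:
        return None                          # key not present: nothing to do
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        # CASE 1 (leaf) and CASE 2 (one child): replace the node by its
        # only child, or by None if it has no children
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # CASE 3 (two children): find the biggest key in the left subtree
        biggest = node.left
        while biggest.right is not None:
            biggest = biggest.right
        node.key = biggest.key                       # overwrite this node's key
        node.left = delete(node.left, biggest.key)   # that node has no right child
    return node

def inorder_keys(node):
    if node is None:
        return []
    return inorder_keys(node.left) + [node.key] + inorder_keys(node.right)

root = Node(8, Node(3, Node(1), Node(6, Node(4), Node(7))), Node(10, None, Node(14)))
root = delete(root, 1)    # CASE 1: leaf
root = delete(root, 10)   # CASE 2: one child (14)
root = delete(root, 8)    # CASE 3: two children, replaced by 7
print(inorder_keys(root))  # [3, 4, 6, 7, 14]
print(root.key)            # 7
```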

### 2.6 BST Performance

![dia2](img/9bstbalanced.png)
<div class="author">src: chalmers.instructure.com</div>

__Best Case = "balanced"__ - height of tree is $O(\log n)$

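__Average Case = "random order"__ - if keys are added in random order, the expected height of the tree is $O(\log n)$

__Worst Case = "unbalanced"__ - height of tree is $O(n)$ (e.g. when keys are added in sorted order, the tree degenerates into a linked list) => BAD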
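To see the gap between these cases, the sketch below inserts the same 1000 keys once in sorted order and once shuffled, and measures the height. Insertion and height are written iteratively because a degenerate tree this deep would exceed Python's default recursion limit; the function names are hypothetical, and the exact height of the shuffled tree depends on the random seed:

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Iterative BST insert (equal keys go right; the keys used here are distinct)."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right

def height(root):
    """Height in edges (-1 for an empty tree), computed level by level."""
    h, level = -1, ([root] if root is not None else [])
    while level:
        h += 1
        level = [c for node in level for c in (node.left, node.right) if c is not None]
    return h

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

n = 1000
keys = list(range(n))
print(height(build(keys)))   # sorted order: a chain, height n - 1 = 999

random.seed(0)
random.shuffle(keys)
print(height(build(keys)))   # random order: a small multiple of log2(n), far below 999
```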