# TCSS503 - Week 1 Tree Review

In this simple interactive tutorial, we will create a Binary Search Tree (BST) using two underlying representations.

Before we do, we need to remind ourselves what a Binary Search Tree is, what properties it has, what is valuable about those properties, and how we interact with a BST.

## Binary Search Tree

A Binary Search Tree is a special tree with the following special property:
* For each node in the tree all nodes in its **left** subtree are less than its value.
* For each node in the tree all nodes in its **right** subtree are greater than or equal to its value.

_Note: The left side can instead be less than or equal to (inclusive), with the right side strictly greater than (exclusive).  That detail is left up to the implementation.  The ordering can also be mirrored, with smaller values on the right and larger values on the left.  As long as the rule is applied consistently, the value of these properties holds._

Given this property, we can search for any value in the tree using Binary Search, which quickly locates a particular item and runs in $O(\log n)$ time when the tree is balanced.
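
As a minimal sketch of the property above, the helper below checks it on a tree written as nested tuples `(value, left, right)`. The tuple encoding and the name `is_bst` are assumptions made for this illustration only; they are not used elsewhere in this tutorial.

```python
def is_bst(node, low=None, high=None):
    """Return True if the tree obeys: left subtree < value <= right subtree.

    `node` is either None (empty tree) or a tuple (value, left, right).
    `low` is an inclusive lower bound and `high` an exclusive upper bound.
    """
    if node is None:
        return True
    value, left, right = node
    if low is not None and value < low:
        return False
    if high is not None and value >= high:
        return False
    # Left subtree must stay strictly below `value`; right subtree at or above it.
    return is_bst(left, low, value) and is_bst(right, value, high)


example = (10, (5, (3, None, None), (6, None, None)), (11, None, (15, None, None)))
broken = (10, (5, None, (12, None, None)), (11, None, None))  # 12 sits in 10's left subtree

print(is_bst(example))  # True
print(is_bst(broken))   # False
```

Passing the bounds down the recursion matters: comparing each node only with its direct children would wrongly accept the `broken` tree.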

### Common Methods
The most common methods for a BST are `put` or `insert` where a value is added to the data structure.  The second common method is `get` or `search` where data is pulled out of the data structure.  Lastly, `remove` or `delete` will first locate a particular value and remove it from the tree.  Details of the implementation will be explained in each section.


### Representations
The two most common representations of a Binary Search Tree are `Dynamic` and `Sequential`.  A `Dynamic` representation uses nodes connected through parent and child links.  A `Sequential` implementation of a tree uses an _Array_ of objects representing the complete structure of the tree.  Each representation has advantages.  A `Sequential` implementation has random access to all nodes via index and has less overhead in maintaining links.  However, it assumes a complete tree, so for sparse trees it consumes more memory than is needed.  The `Dynamic` representation uses only the memory needed for the nodes in the tree, but visiting every node requires a tree traversal algorithm.


You may have a tree representation that looks something like this in a Dynamic structure, where each integer represents a node and the lines represent the links for left and right.

        10
       /  \
      5    11
     / \     \
    3   6     15
    
The sequential representation would look like this, where index 0 is the root, 1 and 2 are the first level, and 3-6 are the second level.  Empty nodes are represented by None or null.


    +--------+-----+-----+-----+-----+-----+------+-----+
    | Index  | 0   | 1   | 2   | 3   | 4   | 5    | 6   |
    +--------+-----+-----+-----+-----+-----+------+-----+
    | Value  | 10  | 5   | 11  | 3   | 6   | NULL | 15  |
    +--------+-----+-----+-----+-----+-----+------+-----+
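
To make the table concrete, here is a small sketch that reads a node's children straight out of that array. It assumes the usual layout in which the children of index `i` sit at `2*i + 1` and `2*i + 2`; the function name `children` is hypothetical.

```python
tree = [10, 5, 11, 3, 6, None, 15]  # the array from the table above


def children(tree, index):
    """Return (left_value, right_value) for the node at `index`.

    A child that is empty or falls past the end of the array is reported as None.
    """
    left, right = 2 * index + 1, 2 * index + 2
    left_value = tree[left] if left < len(tree) else None
    right_value = tree[right] if right < len(tree) else None
    return left_value, right_value


print(children(tree, 0))  # (5, 11)
print(children(tree, 2))  # (None, 15)
print(children(tree, 3))  # (None, None)
```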




# BST using Dynamic (Node-based) Representation 

## Creating the Class

**Note:** We would normally define methods inside the class and reference `self`, but to break the code into cells for Jupyter we will pass an instantiated object into functions that perform the necessary manipulations.  In a regular Python script, put the functions into the class and change `bst` to `self`.

As a side note, this kind of programming, where a common data structure is passed into functions like this, is still Object Oriented Programming.  It is how you can do OOP in languages that don't natively support classes (e.g. C).


## Node Representation (Nested Class) (TCSS501 Review)

The dynamic representation requires a node object, which can be easily implemented as a nested class as shown below.

In [9]:
class BinarySearchTreeDynamic:

    class TreeNode:
        def __init__(self, data):
            self.parent = None
            self.data = data
            self.left = None
            self.right = None

    def __init__(self, max_children=2):
        self.root = None


We are going to instantiate a bst and use it in subsequent functions.

In [10]:
bst = BinarySearchTreeDynamic()

## Insert (Dynamic Structure)

When inserting, starting with the root node, if it is none (the empty tree) simply set the root and be done.  If there is data in root, compare the new data to the data in root.  If it is less than root, traverse left, otherwise traverse right.  Continue comparing until you find a child node that does not exist (e.g. `None`).  Set that child node to the new node and return.

In [11]:
    def insert(bst, data):
        n = BinarySearchTreeDynamic.TreeNode(data)
        curr = bst.root
        if bst.root is None:
            bst.root = n
            return

        while True:
            if n.data < curr.data:
                if curr.left is None:
                    n.parent = curr
                    curr.left = n
                    return
                else:
                    curr = curr.left
            else:
                if curr.right is None:
                    n.parent = curr
                    curr.right = n
                    return
                else:
                    curr = curr.right


## Testing Insert

If we insert the following nodes in this order:
[10, 5, 20, 25, 15] we should see the following behavior:


- `bst.root.data` will be 10
- `bst.root.left.data` will be 5
- `bst.root.right.data` will be 20
- `bst.root.right.right.data` will be 25
- `bst.root.right.left.data` will be 15.


        10
       /  \
      5    20
          /  \
        15    25


In [12]:
bst = BinarySearchTreeDynamic()
inputs = [10, 5, 20, 25, 15]

for input in inputs:
    insert(bst, input)

print(f"bst.root.data = {bst.root.data}")
print(f"bst.root.left.data = {bst.root.left.data}")
print(f"bst.root.right.data = {bst.root.right.data}")
print(f"bst.root.right.right.data = {bst.root.right.right.data}")
print(f"bst.root.right.left.data = {bst.root.right.left.data}")

bst.root.data = 10
bst.root.left.data = 5
bst.root.right.data = 20
bst.root.right.right.data = 25
bst.root.right.left.data = 15
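
Another way to sanity-check an insert routine is an in-order walk (left subtree, node, right subtree), which should produce the values in sorted order. The sketch below is self-contained, so it uses a trimmed stand-in `Node` class and a hypothetical `insert_node` helper that applies the same left/right rule as `insert`; neither name is part of the tutorial's code.

```python
class Node:
    """Stand-in for TreeNode, trimmed to the fields this check needs."""

    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def insert_node(root, data):
    """Insert `data` using the same rule as `insert`; returns the (possibly new) root."""
    if root is None:
        return Node(data)
    curr = root
    while True:
        if data < curr.data:
            if curr.left is None:
                curr.left = Node(data)
                return root
            curr = curr.left
        else:
            if curr.right is None:
                curr.right = Node(data)
                return root
            curr = curr.right


def in_order(node):
    """Yield values in the order: left subtree, node, right subtree."""
    if node is not None:
        yield from in_order(node.left)
        yield node.data
        yield from in_order(node.right)


root = None
for value in [10, 5, 20, 25, 15]:
    root = insert_node(root, value)

print(list(in_order(root)))  # [5, 10, 15, 20, 25]
```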


## STUDENT EXAMPLE

---
<span style="color:green">
    Using the below sample code, insert some records and determine where they will land in the data structure.
    Replace #'s with proper syntax.  Note the cell will NOT execute unless you update the code properly.
</span>

---

In [13]:
student_inputs = [#,#,#,#]

for input in student_inputs:
    insert(bst, input)

print(f"bst.root.#####.data = {bst.root.#####.data}")
print(f"bst.root.#####.data = {bst.root.#####.data}")
print(f"bst.root.#####.data = {bst.root.#####.data}")
print(f"bst.root.#####.data = {bst.root.#####.data}")
print(f"bst.root.#####.data = {bst.root.#####.data}") 
    

SyntaxError: invalid syntax (<ipython-input-13-716a7fe0e8f1>, line 3)

## Search (Dynamic Structure)

Searching follows a similar pattern to inserting where you begin at the root and traverse right or left until you find the value you are searching for, or reach the end of the tree.

In [14]:
    def search(bst, data):
        curr = bst.root

        while True:
            if curr is None:
                return None
            elif curr.data == data:
                return data
            elif curr.data > data:
                curr = curr.left
            else:
                curr = curr.right

## Testing Search

Let's insert a few values and search for them.  We should test for when the tree is empty, as well as for data that is not in the tree.

In [15]:
bst = BinarySearchTreeDynamic()
r = search(bst, 10)
print(f"Testing the empty case.  This should be None: r = {r}")

Testing the empty case.  This should be None: r = None


In [16]:
inputs = [10, 5, 20, 25, 15]

for input in inputs:
    insert(bst, input)

queries = [10,15,15,20,21]
for query in queries:
    r = search(bst, query)
    print(f"Searching for {query}, found {r}")

Searching for 10, found 10
Searching for 15, found 15
Searching for 15, found 15
Searching for 20, found 20
Searching for 21, found None

## STUDENT EXAMPLE

---
<span style="color:green">
Using the below code, insert some data and search for it in a brand new BST.  There are more ###'s marked out below than in previous examples. See if you can figure out what needs to go where!
</span>

---

In [17]:
student_bst = ###################

inputs = [######]
    
for ##### in ########:
    insert(####,####)

queries = [#######]

for ### in ####:
    r = search(####, ####)
    print(#############)


SyntaxError: invalid syntax (<ipython-input-17-f6c0201f28fd>, line 1)

## Removal (Dynamic Structure)
Removal of nodes is more difficult than simple insertion, because the tree needs to maintain its properties, and you may be removing a node from anywhere in the tree: a leaf node (no children), a node with a single child, or a node with two.

- **No Children:** This one is easy, simply delete the node.
- **One Child:** This one is almost as easy as no children. Simply link the deleted node's parent and its child to one another.
- **Two Children:** A little less intuitive: replace the node's value with that of its in-order successor (the smallest value in its right subtree), then remove the successor node.

It should go without saying, but before you can delete a node you need to search for it, so you know which node to delete.  We are going to use the same algorithm as search, but rather than getting the "data" out of the search, we're going to return a reference to the node object.  That will give us access to its parent and its left and right children.

That search will be followed by a case analysis to determine how many children the node has.  Based on the child count we will perform the necessary actions.

In [18]:
def node_search(bst, data):
    curr = bst.root

    while curr is not None:
        if curr.data == data:
            return curr
        elif curr.data > data:
            curr = curr.left
        else:
            curr = curr.right

In [19]:
    def remove(bst, data):

        node = node_search(bst, data)

        # IF NO NODE IS FOUND, DO NOT DELETE AND DO NOTHING
        if node is None:
            return False        

        parent = node.parent  # Thanks Tom for catching the bug.
        # CAPTURE CHILD COUNT
        children = 0
        if node.left and node.right:
            children = 2
        elif node.left or node.right:
            children = 1

        if children == 0:  # NO CHILDREN, JUST DELETE THE NODE FROM ITS PARENT
            if parent is None:  # THIS IS JUST THE ROOT NODE W/O CHILDREN, DELETE IT
                bst.root = None
            elif parent.right is node:
                parent.right = None
            else:
                parent.left = None

        elif children == 1:  # SINGLE CHILD, JUST BYPASS IT
            next_n = node.left if node.left else node.right
            next_n.parent = parent  # KEEP THE CHILD'S PARENT LINK IN SYNC

            if parent is None:  # THIS IS THE ROOT NODE, SPECIAL CASE
                bst.root = next_n
            elif parent.left is node:
                parent.left = next_n
            else:
                parent.right = next_n

        else:  # TWO CHILDREN - FIND THE IN-ORDER SUCCESSOR, COPY ITS DATA AND DELETE THE SUCCESSOR NODE
            left_parent = node
            leftmost_node = node.right
            while leftmost_node.left:
                left_parent = leftmost_node
                leftmost_node = leftmost_node.left

            node.data = leftmost_node.data

            # THE SUCCESSOR HAS NO LEFT CHILD, SO SPLICE ITS RIGHT SUBTREE INTO ITS PLACE
            if left_parent.left is leftmost_node:
                left_parent.left = leftmost_node.right
            else:
                left_parent.right = leftmost_node.right
            if leftmost_node.right:
                leftmost_node.right.parent = left_parent  # KEEP THE PARENT LINK IN SYNC



## Testing Remove

Time is short at the time of this writing, so in this iteration we will only show that removing a node with two children works, as it is the most complicated case.  When we construct the same tree as before, 10 is the root and 15 is its in-order successor.  If we remove the root, we should see 15 in its place.  We will also see that its previous position, `bst.root.right.left`, is now `None`.

In [20]:
bst = BinarySearchTreeDynamic()
inputs = [10, 5, 20, 25, 15]

for input in inputs:
    insert(bst, input)

print(f"bst.root.data = {bst.root.data}")
print(f"bst.root.left.data = {bst.root.left.data}")
print(f"bst.root.right.data = {bst.root.right.data}")
print(f"bst.root.right.right.data = {bst.root.right.right.data}")
print(f"bst.root.right.left.data = {bst.root.right.left.data}")

bst.root.data = 10
bst.root.left.data = 5
bst.root.right.data = 20
bst.root.right.right.data = 25
bst.root.right.left.data = 15


In [21]:
remove(bst,10)

In [22]:
print(f"Root is now: {bst.root.data}")
print(f"bst.root.right.left is now: {bst.root.right.left}")

Root is now: 15
bst.root.right.left is now: None
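
The other two cases can be exercised the same way. The sketch below is a self-contained illustration that swaps in a different technique from the `remove` function above: a recursive removal that returns the new subtree root, on a stand-in `Node` class without parent links. The names `Node`, `remove_rec`, `in_order`, and `build` are assumptions for this example only.

```python
class Node:
    """Stand-in node without a parent link (not the tutorial's TreeNode)."""

    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def remove_rec(node, data):
    """Remove one occurrence of `data` below `node`; return the new subtree root."""
    if node is None:
        return None
    if data < node.data:
        node.left = remove_rec(node.left, data)
    elif data > node.data:
        node.right = remove_rec(node.right, data)
    elif node.left is None:   # no children, or only a right child
        return node.right
    elif node.right is None:  # only a left child
        return node.left
    else:                     # two children: copy the in-order successor up
        succ = node.right
        while succ.left is not None:
            succ = succ.left
        node.data = succ.data
        node.right = remove_rec(node.right, succ.data)
    return node


def in_order(node):
    """Return the values as a sorted list (left subtree, node, right subtree)."""
    return [] if node is None else in_order(node.left) + [node.data] + in_order(node.right)


def build():
    """Same shape as the tree built from [10, 5, 20, 25, 15]."""
    return Node(10, Node(5), Node(20, Node(15), Node(25)))


print(in_order(remove_rec(build(), 5)))   # leaf:         [10, 15, 20, 25]
print(in_order(remove_rec(build(), 10)))  # two children: [5, 15, 20, 25]

# Once 15 is gone, 20 has a single child (25), which takes its place.
root = remove_rec(remove_rec(build(), 15), 20)
print(in_order(root), root.right.data)    # one child:    [5, 10, 25] 25
```

Returning the subtree root from each call is what lets this version avoid parent links: the caller reattaches whatever comes back.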


# BST using Sequential (Array-based) Representation 

## Creating the Class

We will create a class like we did above that contains just the data, and we will use functions outside of the class that take the object as an argument, so the code can be broken into cells for Jupyter.

## Array Representation (TCSS501 Review)

The sequential representation contains an initial size, as it has to grow over time.  The default for this example is 64.  We will resize it as the tree grows.

I'm including a few helper functions in this cell as well that will make the code later a little more clear.

To calculate the left child of a node based on the index, a simple expression of $2 * index + 1$ can be used.  It follows that the right child will be only one index to the right of the left child and thus $2 * index + 2$ can be used.

In [23]:
class BinarySearchTreeSequential:

    def __init__(self, init_size=64):
        self.data = [None] * init_size
        self.count = 0
        
    def __expand__(self):
        """ Replicates the existing data, doubling in size, inserting None into the newly allocated memory."""
        self.data = self.data + [None] * len(self.data)

    def __get_idx_of_left__(self, index):
        return 2 * index + 1

    def __get_idx_of_right__(self, index):
        return 2 * index + 2
    
    def __get_idx_of_parent__(self, index):
        """ Returns the parent of a given index.  Returns -1 for root."""
        offset = 1 if index % 2 == 1 else 2
        return (index - offset) // 2
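
As a quick check of the index arithmetic, the sketch below confirms that the parent formula undoes both child formulas. It uses standalone functions with hypothetical names (`left`, `right`, `parent`) rather than the class methods, and it relies on Python's floor division, under which the single expression `(index - 1) // 2` gives the same results as the odd/even offset version above, including -1 for the root.

```python
def left(index):
    return 2 * index + 1


def right(index):
    return 2 * index + 2


def parent(index):
    """Parent index via one floor division; gives -1 for the root (index 0)."""
    return (index - 1) // 2


# Going down to either child and back up must land on the starting index.
for i in range(7):
    assert parent(left(i)) == i
    assert parent(right(i)) == i

print([parent(i) for i in range(7)])  # [-1, 0, 0, 1, 1, 2, 2]
```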

## Insert (Sequential Structure)

When inserting in the sequential structure, the logic is similar to the dynamic structure.  However, rather than following parent and child links, you simply traverse directly to the specific location in the array based on the index of the current element.

There is one item to remember, because we are dealing with fixed allocated memory, the new location for a child may be outside of the array's range.  If that is the case, we must expand the data structure prior to inserting the element.

When inserting, starting with the root node, if it is none (the empty tree) simply set the root and be done.  If there is data in root, compare the new data to the data in root.  If it is less than root, traverse left, otherwise traverse right.  Continue comparing until you find a child node that does not exist (e.g. `None`).  Set that child node to the new node and return.

In [24]:
    def insert_s(bst, data):
        idx = 0  # START AT THE ROOT; AN EMPTY ROOT SKIPS THE LOOP ENTIRELY
        while bst.data[idx] is not None:
            if data < bst.data[idx]:
                idx = bst.__get_idx_of_left__(idx)
            else:
                idx = bst.__get_idx_of_right__(idx)

            while idx > len(bst.data) - 1:  # YOU HAVE REACHED CAPACITY, EXPAND BEFORE INSERTING
                bst.__expand__()

        bst.data[idx] = data
        bst.count += 1  # COUNT EVERY INSERT, INCLUDING THE ROOT

## Testing Insert 

Using similar test inputs as before, we should see that the root (i.e. `bst.data[0]`) is 10.

* 10 (the first value / root) should be in position `data[0]`.
* 20 (the second inserted value) should be in position `data[2]`.
* 5 (the third inserted value) should be in position `data[1]`.
* 25 is inserted next, to the right of 20, and thus in position `data[6]`.
* 15 is larger than 10 and smaller than 20, so it will be in position `data[5]`.


In [25]:
bst = BinarySearchTreeSequential()

inputs = [10, 20, 5, 25, 15]

for input in inputs:
    insert_s(bst,input)

print(bst.data[0:10])

[10, 5, 20, None, None, 15, 25, None, None, None]


If we insert a number larger than 25, it will flow all the way to the right to position $6*2+2 =14$.

In [26]:
insert_s(bst, 27)

print(bst.data[0:15])

[10, 5, 20, None, None, 15, 25, None, None, None, None, None, None, None, 27]


A number smaller than 25 but larger than 20 will fall into $6*2+1=13$, and a number smaller than 10 but larger than 5 will fall into $1*2+2=4$.

In [27]:
insert_s(bst, 24)
insert_s(bst, 6)
print(bst.data[0:15])

[10, 5, 20, None, 6, 15, 25, None, None, None, None, None, None, 24, 27]


The below represents the values of the tree in a structural form. `N = None`

               10    
             _/  \_
           _/      \_
          5          20
        /   \       /  \
       N     6     15   25  
      / \   / \    / \  / \
     N   N N   N  N  N 24 27   
     
The below represents what indexes are represented by each location of the tree.

                0    
             _/   \_
           _/       \_
          1            2
        /   \         /  \
       3     4       5    6  
      / \   / \     / \  /  \
     7   8 9   10  11 12 13 14 
     
## Expanding the Tree
As you can see, because we are not balancing this tree, adding successively larger numbers keeps extending its right side, and the number of indexes needed grows significantly.  The tree will start to look like this, where the number in parentheses is the index value.  When the tree is unbalanced it is very sparse, and thus an array is **not** a very efficient way to store the values.

       0  (0)
        \
         1 (2)
          \
           2 (6)
            \ 
             3 (14)
              \
               4 (30)
                \
                 5 (62)
                  \ 
                   6 (126)
                    ...
                     \
                     ...
                      20 (2,097,150)
                      
It's for this (and other) reasons that we want to keep these trees balanced.  We will learn how to balance trees in week 2.
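
The growth along the right edge can be reproduced with a few lines. This is a minimal sketch with a hypothetical helper name, `rightmost_index`; it repeatedly applies the right-child formula $2 * index + 2$ and checks the result against the closed form $2^{depth+1} - 2$.

```python
def rightmost_index(depth):
    """Array index of the node reached by taking `depth` right-child steps from the root."""
    index = 0
    for _ in range(depth):
        index = 2 * index + 2
    return index


# The closed form 2**(depth + 1) - 2 matches the step-by-step calculation.
for depth in (1, 2, 3, 6, 20):
    assert rightmost_index(depth) == 2 ** (depth + 1) - 2

print(rightmost_index(6))   # 126
print(rightmost_index(20))  # 2097150
```

Each extra level roughly doubles the index, so a chain of only 21 nodes already needs an array with more than two million slots.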

In [28]:
bst = BinarySearchTreeSequential()

for i in range(1, 20):
    insert_s(bst, i)

print(f"How bloated is it: BST Contains {bst.count} records in an array of size {len(bst.data)}")

Inserting the 19 values 1 through 19 in ascending order places the last one at index $2^{19} - 2 = 524286$, so the array grows from 64 to 524,288 slots.

## Searching (Sequential Structure)

One may be tempted to simply perform linear search because we have an array. But remember, linear search is $O(n)$ whereas Binary Search is $O(\log n)$ on a balanced tree.  Binary Search using a sequential structure works the same as dynamic.  We start at the root and then traverse "left" or "right" until we find the data we want, or reach a `None`.

In [29]:
    def search_s(bst, data):
        idx = 0
        while bst.data[idx] is not None:
            if data == bst.data[idx]:
                return idx
            elif data < bst.data[idx]:
                idx = bst.__get_idx_of_left__(idx)
            else:
                idx = bst.__get_idx_of_right__(idx)

            if idx > len(bst.data) - 1:
                return None

Let's create some basic input and just test it out, make sure it works.

For this implementation of search, we are returning the **index** of where the data live.  This is a choice, we could just return the data, but returning the exact information that was requested isn't very interesting.

In [30]:
bst = BinarySearchTreeSequential()

inputs = [10, 20, 5, 25, 15]

for input in inputs:
    insert_s(bst,input)

idx = search_s(bst,10)

print(f"I expect the index to be 0 for this one. idx:{idx} Pass:{idx==0}")

I expect the index to be 0 for this one. idx:0 Pass:True


## STUDENT EXAMPLE

---
<span style="color:green">
Using the below code, try and write a few inserts and test to make sure you understand how tree sequences work depending on the order of insertion!
</span>

---



In [31]:
student_bst = BinarySearchTreeSequential()

# inputs = [#,#,#,#,#,#,#,#,#,#]
# search_values = [#,#,#,#]
# expected_indexes = [#,#,#,#]

inputs = [1,2,3,4,5,6,7,8,9,10]
search_values = [1,2,3,4]
expected_indexes = [0,2,6,14]

for input in inputs:
    insert_s(student_bst,input)

for sv, exp in zip(search_values, expected_indexes):
    r = search_s(student_bst,sv)
    print(f"I searched for {sv}, expected: {exp} and got: {r}.  Pass: {exp==r}")


I searched for 1, expected: 0 and got: 0.  Pass: True
I searched for 2, expected: 2 and got: 2.  Pass: True
I searched for 3, expected: 6 and got: 6.  Pass: True
I searched for 4, expected: 14 and got: 14.  Pass: True


# Deleting from the Sequential BST

The method for deleting from a Sequential BST is much more involved than deleting from a dynamic tree, primarily because we have to shift entire subtrees around in the array rather than just updating a few links.

We will discuss the methods for doing this in later lectures when we discuss in more depth the various traversal alg