# Binary Search Tree

### Overview

This notebook showcases my implementation of a Binary Search Tree! 

For this I use a very simple CSV file that contains the top 25 highest-grossing movies worldwide as of June 2022!

For a closer look at the code, check out 'bst.py'!

---

#### Definition of a Binary Search Tree
- A Binary Search Tree has a root

- Each node can have up to 2 children

- Each node holds a key that is

  - less than any key in its right subtree

  - greater than any key in its left subtree
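
The definition above can be turned into a small check. This is a minimal sketch, not the code from 'bst.py': `SketchNode` and `is_bst` are hypothetical names introduced only for illustration, and keys are assumed to be unique.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for the Node class in 'bst.py'."""
    key: int
    left: Optional["SketchNode"] = None
    right: Optional["SketchNode"] = None


def is_bst(node, low=None, high=None):
    """Return True if every key in the subtree lies strictly between low and high."""
    if node is None:
        return True
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    # Keys on the left must stay below node.key, keys on the right above it
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)


valid = SketchNode(5, SketchNode(2), SketchNode(9, SketchNode(7)))
invalid = SketchNode(5, SketchNode(2), SketchNode(9, SketchNode(4)))
print(is_bst(valid))    # True
print(is_bst(invalid))  # False: 4 sits in the right subtree of 5
```

Passing the bounds down the recursion matters: comparing each node only with its direct children would wrongly accept the second tree.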

---

#### Features for the time being are:

**Main features**
- Insertion: Inserts a node into the BST

- Search: Finds and returns a node in the BST

- Deletion: Deletes a node from the BST

**Additional features**
- In-order traversal: Returns all nodes of the BST in ascending key order

- Maximum/Minimum: Returns the node with the max/min key

- Height: Returns the height of the BST (or a subtree)

---
#### Time complexity

**Worst time complexity:** $O(n)$
> The worst case occurs when the height equals the number of nodes in the tree, $h = n$.

> This essentially means our tree is functioning as a linked list! In the worst case, inserting, finding or deleting a node requires traversing every node in this 'tree'. This consideration is especially important when inserting a list of already sorted nodes! A naive way to avoid this is to shuffle the data randomly. More sophisticated solutions are self-balancing trees such as AVL trees or red-black trees.

**Average time complexity:** $O(\log n)$
> Average time complexity depends on the height of the tree, which is expected to be $O(\log n)$ when the keys are inserted in random order.

*Notice:* Time complexity depends on the height of the tree. How the height is bounded determines the worst-case complexity.
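
To see the effect of insertion order on the height, here is a small self-contained experiment. It is only a sketch: `SketchNode`, `build` and `height` are hypothetical helpers, not the API of 'bst.py', and the height is counted in nodes.

```python
import random


class SketchNode:
    """Hypothetical minimal node; not the Node class from 'bst.py'."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def build(keys):
    """Insert the keys one by one (iteratively) and return the root."""
    root = None
    for key in keys:
        new = SketchNode(key)
        if root is None:
            root = new
            continue
        current = root
        while True:
            if key < current.key:
                if current.left is None:
                    current.left = new
                    break
                current = current.left
            else:
                if current.right is None:
                    current.right = new
                    break
                current = current.right
    return root


def height(root):
    """Height counted in nodes, computed level by level (no recursion)."""
    level = [root] if root is not None else []
    levels = 0
    while level:
        levels += 1
        level = [c for n in level for c in (n.left, n.right) if c is not None]
    return levels


keys = list(range(1, 26))
print(height(build(keys)))  # 25: sorted input degenerates into a chain

random.seed(0)  # fixed seed only to make the sketch repeatable
random.shuffle(keys)
print(height(build(keys)))  # noticeably smaller than 25 after shuffling
```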

---

**Sources**
- [Wikipedia](https://en.wikipedia.org/wiki/Binary_search_tree#Optimal_binary_search_trees)

- [GeeksforGeeks](https://www.geeksforgeeks.org/binary-search-tree-data-structure/?ref=lbp)

- [Visualization](https://www.cs.usfca.edu/~galles/visualization/BST.html)

### Importing and preparing data

In [1]:
# Import the necessary tools
from bst import BinarySearchTree, Node
import pandas as pd
import random

In [2]:
# Read in data with pandas
df = pd.read_csv('example_data.csv', delimiter = ';')

# Get an idea what our data looks like
df.head()

| Unnamed: 0 | Rank | Title | Release |
|---|---|---|---|
| 0 | 1 | Avatar | 2009 |
| 1 | 2 | Avengers: Endgame | 2019 |
| 2 | 3 | Titanic | 1997 |
| 3 | 4 | Star Wars: Episode VII - The Force Awakens | 2015 |
| 4 | 5 | Avengers: Infinity War | 2018 |


In [3]:
# assign columns
rank = df['Rank']
title = df['Title']
year = df['Release']

In [4]:
# Create empty list that will contain all the nodes
nodes_list = []
# Loop on data to create Nodes
for i in range(0, len(title)):
    # The rank can be used as unique identifier
    # The title will be carried as data
    node = Node(key = rank[i], data = title[i])
    # After creation put node into nodes_list
    nodes_list.append(node)

# Get a look at a node
print(nodes_list[10])

Object with key: 11 and data: Frozen II


In [5]:
# Initialize our 'movie_tree'
movie_tree = BinarySearchTree()

## Insertion

To preserve the BST property, each new node is inserted as a leaf. In my implementation the insertion is carried out iteratively.
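
An iterative leaf insertion could look roughly like the following. This is a sketch under assumptions: `SketchNode` and `SketchTree` are hypothetical classes, and the real `bst_insert` in 'bst.py' may differ, for example in how it treats duplicate keys.

```python
class SketchNode:
    """Hypothetical node with a parent pointer, loosely modelled on the notebook's Node."""
    def __init__(self, key, data=None):
        self.key = key
        self.data = data
        self.parent = None
        self.left = None
        self.right = None


class SketchTree:
    """Hypothetical tree class; only insertion is sketched here."""
    def __init__(self):
        self.root = None

    def insert(self, node):
        parent = None
        current = self.root
        # Walk down until we fall off the tree, remembering the last node seen
        while current is not None:
            parent = current
            current = current.left if node.key < current.key else current.right
        node.parent = parent
        if parent is None:
            self.root = node          # the tree was empty
        elif node.key < parent.key:
            parent.left = node        # attach as left leaf
        else:
            parent.right = node       # attach as right leaf


tree = SketchTree()
for key in (5, 2, 9, 7):
    tree.insert(SketchNode(key))
print(tree.root.key, tree.root.left.key, tree.root.right.key, tree.root.right.left.key)
# 5 2 9 7
```

Because the new node always ends up as a leaf, no existing node has to move, which is why the BST property is preserved.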

In [6]:
# Shuffle the list so our tree doesn't become a linked list (sorted data will do that to our simple BST)
random.shuffle(nodes_list)

# Insert all of the nodes in our nodes_list into our movie_tree
for node in nodes_list:
    movie_tree.bst_insert(node)

In [7]:
# Get a look at our movie_tree
movie_tree.visualize()

        1: Avatar
    2: Avengers: Endgame
            3: Titanic
        4: Star Wars: Episode VII - The Force Awakens
5: Avengers: Infinity War
                6: Spider-Man: No Way Home
            7: Jurassic World
                8: The Lion King
        9: Marvel's The Avengers
                    10: Furious 7
                            11: Frozen II
                        12: Avengers: Age of Ultron
                13: Black Panther
                    14: Harry Potter and the Deathly Hallows
                            15: Star Wars: Episode VIII - The Last Jedi
                        16: Jurassic World: Fallen Kingdom
            17: Frozen
                        18: Beauty and the Beast
                    19: Incredibles 2
                        20: The Fate of the Furious
                21: Iron Man 3
    22: Minions
            23: Captain America
        24: Aquaman
            25: The Lord of the Rings: The Return of the King


## Search

Search takes a given key and follows the same path a node with that key would take on insertion, comparing the key with every node along the way!
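
As a rough illustration of that walk, here is a minimal iterative search. `SketchNode` and `search_iter` are hypothetical names, not the `bst_search_iter` method from 'bst.py'; returning `None` for a missing key is an assumption of this sketch.

```python
class SketchNode:
    """Hypothetical node holding a key and some data."""
    def __init__(self, key, data, left=None, right=None):
        self.key = key
        self.data = data
        self.left = left
        self.right = right


def search_iter(root, key):
    """Follow the insertion path of `key`; return the node or None if absent."""
    current = root
    while current is not None and current.key != key:
        # Smaller keys live in the left subtree, larger keys in the right one
        current = current.left if key < current.key else current.right
    return current


root = SketchNode(
    5, "Avengers: Infinity War",
    left=SketchNode(2, "Avengers: Endgame"),
    right=SketchNode(22, "Minions"),
)
print(search_iter(root, 22).data)  # Minions
print(search_iter(root, 16))       # None: key 16 is not in this small tree
```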

In [8]:
# E.g. we want to find the node that stores the 16th placed movie!
my_node = movie_tree.bst_search_iter(16)
# Let's see our result
print(my_node)
# And what about its parent node?
print(my_node.parent)

Object with key: 16 and data: Jurassic World: Fallen Kingdom
Object with key: 14 and data: Harry Potter and the Deathly Hallows


## Deletion

Deletion in a BST is a bit more tricky - we distinguish between 3 cases!

The node to be removed is:

1. A leaf-node
2. A node with only one child
3. A node with 2 children

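For cases 1 & 2 the procedure is fairly simple:
- Remove the node and, if there is a single child, 'pull it up' to where the removed node was.

For case 3 we have to get a little bit more creative:
- Find a replacement node with 1 or no children, typically the in-order successor (the smallest key in the right subtree) or the in-order predecessor (the largest key in the left subtree), and put it in place of the node to be removed.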

*See code in 'bst.py'*
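
For readers without 'bst.py' at hand, the three cases can be sketched as follows. This is not the notebook's `bst_delete`: the sketch deletes by key instead of by node, works recursively without parent pointers, and uses the in-order successor for case 3, all of which are assumptions. `SketchNode`, `delete` and `in_order` are hypothetical names.

```python
class SketchNode:
    """Hypothetical node that only carries a key."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def delete(root, key):
    """Return the root of the subtree after removing `key` (if present)."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Cases 1 and 2: at most one child, so pull that child (or None) up
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Case 3: copy the in-order successor's key, then remove the successor
        successor = root.right
        while successor.left is not None:
            successor = successor.left
        root.key = successor.key
        root.right = delete(root.right, successor.key)
    return root


def in_order(root):
    """Keys of the subtree in ascending order."""
    return [] if root is None else in_order(root.left) + [root.key] + in_order(root.right)


root = SketchNode(5, SketchNode(2, SketchNode(1)), SketchNode(9, SketchNode(7), SketchNode(12)))
root = delete(root, 7)  # case 1: leaf
root = delete(root, 2)  # case 2: one child
root = delete(root, 5)  # case 3: two children
print(root.key, in_order(root))  # 9 [1, 9, 12]
```

The successor has no left child by construction, so removing it always falls back to case 1 or 2.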

In [9]:
# E.g. we want to remove node 11
node_11 = movie_tree.bst_search_iter(11)
movie_tree.bst_delete(node_11)
# Let's take a look and see if it worked
movie_tree.visualize()

        1: Avatar
    2: Avengers: Endgame
            3: Titanic
        4: Star Wars: Episode VII - The Force Awakens
5: Avengers: Infinity War
                6: Spider-Man: No Way Home
            7: Jurassic World
                8: The Lion King
        9: Marvel's The Avengers
                    10: Furious 7
                        12: Avengers: Age of Ultron
                13: Black Panther
                    14: Harry Potter and the Deathly Hallows
                            15: Star Wars: Episode VIII - The Last Jedi
                        16: Jurassic World: Fallen Kingdom
            17: Frozen
                        18: Beauty and the Beast
                    19: Incredibles 2
                        20: The Fate of the Furious
                21: Iron Man 3
    22: Minions
            23: Captain America
        24: Aquaman
            25: The Lord of the Rings: The Return of the King


Yay, I never liked Frozen II anyways.
Let's pretend a better movie achieved its ranking!

## In-order-traversal

In-order tree walk: The nodes of the left subtree are visited first, followed by the root node and then the nodes of the right subtree.
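
A recursive walk in that order might look like this. It is a sketch with hypothetical names (`SketchNode`, `in_order`); the method in 'bst.py' prints the nodes, whereas this version yields the keys so they can be collected into a list.

```python
class SketchNode:
    """Hypothetical node that only carries a key."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def in_order(node):
    """Yield keys: left subtree first, then the node itself, then the right subtree."""
    if node is not None:
        yield from in_order(node.left)
        yield node.key
        yield from in_order(node.right)


root = SketchNode(
    5,
    SketchNode(2, SketchNode(1), SketchNode(4)),
    SketchNode(22, SketchNode(9)),
)
print(list(in_order(root)))  # [1, 2, 4, 5, 9, 22]
```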

In [10]:
# Create new_node
new_node_11 = Node(11, data = 'Atlantis: The Lost Empire')
# Insert new_node
movie_tree.bst_insert(new_node_11)
# Let's take a look at our new list with the in-order-traversal method
movie_tree.in_order_traversal(movie_tree.root)

Object with key: 1 and data: Avatar
Object with key: 2 and data: Avengers: Endgame
Object with key: 3 and data: Titanic
Object with key: 4 and data: Star Wars: Episode VII - The Force Awakens
Object with key: 5 and data: Avengers: Infinity War
Object with key: 6 and data: Spider-Man: No Way Home
Object with key: 7 and data: Jurassic World
Object with key: 8 and data: The Lion King
Object with key: 9 and data: Marvel's The Avengers
Object with key: 10 and data: Furious 7
Object with key: 11 and data: Atlantis: The Lost Empire
Object with key: 12 and data: Avengers: Age of Ultron
Object with key: 13 and data: Black Panther
Object with key: 14 and data: Harry Potter and the Deathly Hallows
Object with key: 15 and data: Star Wars: Episode VIII - The Last Jedi
Object with key: 16 and data: Jurassic World: Fallen Kingdom
Object with key: 17 and data: Frozen
Object with key: 18 and data: Beauty and the Beast
Object with key: 19 and data: Incredibles 2
Object with key: 20 and data: The Fate of

## Maximum & Minimum
The right-most node of a BinarySearchTree is the node with the largest key.

The left-most node of a BinarySearchTree is the node with the smallest key.
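
Both lookups are a simple walk along one side of the tree. A minimal sketch, assuming a non-empty tree; `SketchNode`, `minimum` and `maximum` are hypothetical names rather than the methods in 'bst.py'.

```python
class SketchNode:
    """Hypothetical node that only carries a key."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def minimum(node):
    """Follow left children until there is none left."""
    while node.left is not None:
        node = node.left
    return node


def maximum(node):
    """Follow right children until there is none left."""
    while node.right is not None:
        node = node.right
    return node


root = SketchNode(
    5,
    SketchNode(2, SketchNode(1)),
    SketchNode(22, right=SketchNode(24, right=SketchNode(25))),
)
print(minimum(root).key, maximum(root).key)  # 1 25
```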

In [11]:
# Let's find our highest ranking movie
first_rank_movie = movie_tree.bst_minimum()
print(first_rank_movie)
# And our lowest ranking movie
last_rank_movie = movie_tree.bst_maximum()
print(last_rank_movie)

Object with key: 1 and data: Avatar
Object with key: 25 and data: The Lord of the Rings: The Return of the King


## Height
Since the height of a tree determines the worst-case complexity of operations, it is important to keep track of it!

Our height method takes a node as its starting point. This also allows us to check the height of subtrees.
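
One way to compute the height from an arbitrary starting node is a short recursion. This is a sketch with hypothetical names; the height is counted in nodes here, and whether `bst_height` in 'bst.py' counts nodes or edges is an assumption.

```python
class SketchNode:
    """Hypothetical node that only carries a key."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Number of nodes on the longest path from `node` down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


root = SketchNode(5, SketchNode(2), SketchNode(22, SketchNode(9, SketchNode(7))))
print(height(root))       # 4: the path 5 -> 22 -> 9 -> 7
print(height(root.left))  # 1: the subtree rooted at 2 is a single leaf
```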

In [12]:
# Let's retrieve the height of the entire tree
my_root = movie_tree.root
height = movie_tree.bst_height(my_root)
print(height)

8


## Count

In [25]:
# Let's see what height a well-balanced tree of this size could have
import math
# number of nodes we inserted earlier
n = movie_tree.bst_count(my_root)
print(n)
# the binary log of n approximates the height of a balanced tree
expected_height = math.log2(n)
print(expected_height)

25
4.643856189774724


There is definitely room for improvement!
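
To make the gap concrete: a binary tree with $h$ levels holds at most $2^h - 1$ nodes, so the smallest possible height (counted in nodes) for $n$ nodes is $\lfloor \log_2 n \rfloor + 1$. The sketch below computes this bound; `min_height` is a hypothetical helper, and counting the height in nodes is an assumption about how 'bst.py' measures it.

```python
def min_height(n):
    """Smallest possible height (in nodes) of a binary tree with n nodes.

    For n >= 1 this equals floor(log2(n)) + 1, which is exactly the
    number of bits needed to write n in binary.
    """
    return n.bit_length()


print(min_height(25))  # 5: a perfectly balanced tree needs only 5 levels
print(min_height(31))  # 5: 2**5 - 1 nodes still fit into 5 levels
print(min_height(32))  # 6
```

Compared with the height of 8 measured above, a balanced tree with 25 nodes would save three levels.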

This subject will be discussed further in my next notebook, where I plan to implement a balanced BST!