## PCWbook Session 9.1

## 1️⃣ - Readings 

### Q1 

Q: What is the role of randomization in the efficiency of several operations supported by a randomly-generated BST?

A: Randomization keeps the tree balanced in expectation. When keys are inserted in a uniformly random order, the expected height of the BST is O(log n), so search, insertion, and deletion, whose cost is proportional to the height, take O(log n) time on average. Without randomization, an already-sorted or adversarial insertion order produces a degenerate tree of height n − 1, and the same operations cost O(n). Randomization does not guarantee that every node's left and right subtrees are the same size, but it makes badly skewed trees unlikely, which also benefits operations such as selection and rank, whose cost likewise grows with the height.
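A small experiment makes the contrast concrete. The sketch below does not use the PCWbook BST class; `insert`, `height`, and `build` are hypothetical stand-ins that store each node as a `[key, left, right]` list. It compares the height after inserting sorted keys with the height after inserting the same keys in shuffled order.

```python
import random

def insert(root, key):
    """Insert key into a BST of [key, left, right] lists (hypothetical helper)."""
    new = [key, None, None]
    if root is None:
        return new
    cur = root
    while True:
        idx = 1 if key < cur[0] else 2  # 1 = left child slot, 2 = right child slot
        if cur[idx] is None:
            cur[idx] = new
            return root
        cur = cur[idx]

def height(root):
    """Height in edges, computed level by level to avoid deep recursion."""
    if root is None:
        return -1
    h = -1
    level = [root]
    while level:
        h += 1
        level = [c for node in level for c in node[1:] if c is not None]
    return h

def build(keys):
    """Insert the keys one by one, in the given order."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root

n = 500
sorted_height = height(build(range(n)))       # degenerate chain
random.seed(0)                                # fixed seed for repeatability
shuffled_height = height(build(random.sample(range(n), n)))
print(sorted_height, shuffled_height)
```

The sorted insertion order yields a chain of height n − 1, while the shuffled order should give a height that is a small multiple of log n.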

## 3️⃣ - PCW Problems 
### Q1

In [None]:
def depth(bst, node):
    """
    Finds the depth of the node in a BST. Depth of root is 0.
        
    Parameters
    ----------
    bst: BinarySearchTree
        The binary search tree where the node belongs to
    node: Node
        An existing node in the BST that we need to compute the depth of

    Returns
    ----------
    int
        the depth from root of the tree to node
    """

    steps = 0
    parent = node.parent
    
    # walk up the parent pointers to the root, counting one step per edge
    while parent is not None:
        steps += 1
        parent = parent.parent
    return steps

### Q2 

Note: this code works in the "explore" function of the original PCWbook, which defines the BST class from the previous section in a hidden code cell.

In [None]:
def average_comparisons(bst):
    """
    Finds the average number of comparisons required 
    to search for a randomly chosen element of a standard BST. 

    Parameters
    ----------
    bst: BinarySearchTree
        The binary search tree where we wish to find the average number of comparisons

    Returns
    ----------
    float
        the average number of comparisons
    """
    
    comparisons = 0
    lst_nodes = bst.inorder()
    if len(lst_nodes) == 0:
        return None
    # searching for a node at depth d takes d + 1 comparisons;
    # sum these over all nodes, then divide by the node count
    for el in lst_nodes:
        comparisons += depth(bst, bst.search(el)) + 1

    return comparisons/len(lst_nodes)

### Q3

Note: this code works in the "explore" function of the original PCWbook, which defines the BST class from the previous section in a hidden code cell.

In [None]:
def max_depth(bst):
    """
    Finds the maximum depth over all nodes of the BST with a brute-force approach.
    
    Input:
    bst: BinarySearchTree
        The binary search tree whose maximum depth we want
    
    Output:
    h: int
        The maximum depth of any node (0 for an empty tree)
    """
    lst_nodes = bst.inorder()
    if len(lst_nodes) == 0:
        return 0
    # computes the depth of every node
    lst_depths = []
    for el in lst_nodes:
        lst_depths.append(depth(bst, bst.search(el)))
    return max(lst_depths)
    
## testing your code
bst = BinarySearchTree()
nodes = [Node(15), Node(6), Node(18), Node(3), Node(7), 
         Node(17), Node(20), Node(2), Node(4)]
for node in nodes:
    bst.insert(node)
print(max_depth(bst))

# test
assert max_depth(bst) == 3
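The brute-force version performs one search per node, so its cost is proportional to the sum of all node depths. As an alternative sketch, a single traversal that carries the current depth along visits each node once. The hidden BST class is not available here, so `SimpleNode`, `simple_insert`, and `depth_stats` are hypothetical stand-ins that assume `left` and `right` child attributes.

```python
class SimpleNode:
    """Hypothetical minimal node; the PCWbook Node class may differ."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def simple_insert(root, key):
    """Standard iterative BST insertion; returns the (possibly new) root."""
    if root is None:
        return SimpleNode(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = SimpleNode(key)
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = SimpleNode(key)
                return root
            cur = cur.right

def depth_stats(root):
    """Return (max_depth, average_depth) in one pass using an explicit stack."""
    if root is None:
        return 0, 0.0
    total, count, deepest = 0, 0, 0
    stack = [(root, 0)]  # pairs of (node, depth of that node)
    while stack:
        node, d = stack.pop()
        total += d
        count += 1
        deepest = max(deepest, d)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, d + 1))
    return deepest, total / count

# same keys as the test tree above
root = None
for key in [15, 6, 18, 3, 7, 17, 20, 2, 4]:
    root = simple_insert(root, key)
stats = depth_stats(root)
print(stats)
```

For the nine-key test tree, the node depths sum to 16, so the average depth is 16/9 and the maximum depth is 3, matching the assertion above.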

### Q4 

In [None]:
def avg_depth(bst):
    """
    Computes the average depth of a BST
    
    Input:
    bst: BinarySearchTree
        The binary search tree
    
    Output:
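    avg_d: float
        Average depth of the binary search tree
    """
    # a search for a node at depth d takes d + 1 comparisons,
    # so the average depth is the average number of comparisons minus 1
    avg_comparisons = average_comparisons(bst)
    if avg_comparisons is None:
        return 0
    return avg_comparisons - 1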

### Q5

Note: this code works in the "explore" function of the original PCWbook, which defines the BST class from the previous section in a hidden code cell.


Q: Now, insert randomly shuffled lists into BSTs, and measure the average depth and the maximum depth. How do these statistics scale as you increase 𝑁, the number of nodes? 

A: The plot below was created by generating 50 random BSTs for each input size from 0 to 990 in steps of 10 and averaging the two statistics over those 50 trees. The x-axis shows the input size, and the two curves show the mean maximum depth and the mean average depth for each size. Both statistics appear to grow logarithmically with 𝑁.

In [None]:
import random
import matplotlib.pyplot as plt

def lst_to_bst(lst_nodes):
    """
    Creates a binary search tree object of a given list
    
    Input:
    lst_nodes: list
        The list of nodes to be inserted into the binary search tree
    
    Output:
    bst: BinarySearchTree
        The binary search tree
    """
    
    bst = BinarySearchTree()
    for node in lst_nodes:
        bst.insert(node)
    return bst

def random_lst_size_k(k):
    """
    Returns 50 randomly shuffled lists, each of size k
    
    Input:
    k: int
        size of each randomly shuffled list
    
    Output:
    lst_random: list of lists
        List with 50 randomly shuffled lists of k distinct keys inside
    """
    lst_random = []
    for i in range(50):
        # shuffle the distinct keys 0..k-1; random.randint(0, 99) would
        # produce many duplicate keys once k exceeds 100
        lst_random.append(random.sample(range(k), k))
    return lst_random

# main
input_sizes = list(range(0, 1000, 10))
results_avg_depth = []
results_max_depth = []
for input_size in input_sizes:
    avg_depths = []
    max_depths = []
    for keys in random_lst_size_k(input_size):
        current_tree = lst_to_bst(keys)
        avg_depths.append(avg_depth(current_tree))
        max_depths.append(max_depth(current_tree))
    avg_depth_mean = sum(avg_depths) / len(avg_depths)
    max_depth_mean = sum(max_depths) / len(max_depths)
    results_avg_depth.append(avg_depth_mean)
    results_max_depth.append(max_depth_mean)

plt.plot(input_sizes, results_avg_depth, label='Average Depth')
plt.plot(input_sizes, results_max_depth, label='Max Depth')
plt.xlabel('Input Size')
plt.ylabel('Tree Depth')
plt.title('Binary Search Tree Depth vs. Input Size')
plt.legend()
plt.show()

### Q6 

Q: Is the plot above in agreement with the theoretically expected result? Explain.

A: Yes. According to Cormen et al. (2015), the expected height of a randomly built binary search tree on n distinct keys is O(log n). The maximum-depth curve, which is exactly the height of the tree, grows logarithmically in the plot and so agrees with this result. The average depth can never exceed the height, so it is also expected to grow at most logarithmically, which matches the second curve.
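The average-depth curve can also be checked against a closed form. Assuming n distinct keys inserted in uniformly random order, the expected sum of all node depths satisfies the same recurrence as the expected number of quicksort comparisons, giving 2(n + 1)H_n − 4n, where H_n is the n-th harmonic number. The sketch below evaluates that formula; `expected_avg_depth` is a hypothetical helper, not part of the PCWbook code.

```python
import math

def expected_avg_depth(n):
    """Expected average node depth of a BST built from n distinct keys
    inserted in uniformly random order (root has depth 0)."""
    if n == 0:
        return 0.0
    harmonic = sum(1.0 / i for i in range(1, n + 1))      # H_n
    expected_path_length = 2 * (n + 1) * harmonic - 4 * n  # sum of all depths
    return expected_path_length / n

# print the formula next to 2 ln n, its leading-order growth
for n in (10, 100, 1000):
    print(n, round(expected_avg_depth(n), 2), round(2 * math.log(n), 2))
```

Plotting these values over the measured average-depth curve would be one way to compare the experiment with the theory, provided the experiment uses distinct keys.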