<div align="center">
    <h1>DS-210: Programming for Data Science</h1>
    <h1>Lecture 18</h1>
</div>

1. Binary trees
2. Binary search trees


# Binary trees

## Trees vs Graphs

Trees are a special type of graph with a few key features:
* Hierarchical structure with a root node
* Acyclic
* Parent and child nodes with well-defined ancestry (every node besides the root has exactly one parent)
* Exactly n-1 edges for n nodes
* May constrain the number of children (in a binary tree each parent can have at most two children)


A general tree:
<div align="center">
<img src="tree-example.jpg" alt="[non-binary tree example]" width="100%">
</div>

<div align="center">
<img src="binary-tree-example.jpg" alt="[binary tree example]" width="70%">
</div>


## Why are trees useful?


### ✅ Anything with a hierarchical data structure (e.g., File systems, HTML DOM structure)

* File systems: Directories and files form a natural tree: folders contain subfolders and files, and this nesting forms a hierarchy.
* HTML DOM: The structure of a webpage is a tree where elements nest inside each other; the root is usually the `<html>` tag.

### 🗜 Data compression (Huffman coding)

* Builds a binary tree where the most frequent characters are near the top and less frequent ones further down.
* Encoding: Each character gets a unique binary string based on its path from the root, shorter for frequent characters.

### 🧠 Compilers use syntax trees (Abstract Syntax Trees - ASTs)

* Represents code structure: A program's nested expressions, blocks, and statements naturally form a tree.
* Compiler passes: Syntax trees are walked recursively for semantic checks, optimization, and code generation.
* In Rust: The Rust compiler (rustc) builds and transforms ASTs as part of its pipeline.


### 🔡 Prefix trees (Tries) (discussed below)

* Spell checking and autocomplete: Fast prefix search to suggest or validate words.
* Internet routing: IP addresses share prefixes, and tries are used for longest prefix match routing.
* Memory vs speed tradeoff: Tries use more memory but offer O(k) search time, where k is the key length.


### 💾 Databases use trees for indexing (e.g., B-trees, B+ trees)

* Efficient range queries and lookups: Balanced trees ensure O(log n) insert/search/delete.
* Disk-friendly: B-trees minimize disk reads by keeping nodes wide and shallow.
* Used in practice: PostgreSQL, SQLite, and others use variants of B-trees for indexing.

### 🔐 Merkle trees

* Used in blockchains: Efficient verification of large data sets by checking only a small number of hashes.
* Tamper detection: Changing one leaf changes hashes all the way up, which makes them well suited to integrity checking.
* Applications: Bitcoin, Git, BitTorrent, and more rely on Merkle trees.


### 🌐 Spanning trees in networks

* Minimal spanning tree (MST): Ensures all nodes in a network are connected with minimal total edge weight.
* Used in networking: Kruskal's and Prim's algorithms compute MSTs, and Ethernet's Spanning Tree Protocol relies on a spanning tree to avoid forwarding loops.

### 🌳 Decision trees

* Machine learning: Trees split data based on feature thresholds to make predictions.
* Easy to interpret: Each path in the tree represents a rule for classification.

### 🔃 Sorting algorithms

* Binary heaps: Used in heap sort; trees where each parent is no larger (min-heap) or no smaller (max-heap) than its children.
* BST-based sorts: In-order traversal of a Binary Search Tree gives sorted order.
* AVL/Red-Black Trees: Self-balancing trees used to maintain order during dynamic inserts/deletes.

### 🔍 Search algorithms

* Binary Search Trees (BSTs): Allow for fast search, insert, and delete in O(log n) time if balanced.
* Trie-based search: Efficient for prefix and word lookup.
* Tree traversal: DFS and BFS strategies apply directly; DFS uses recursion or an explicit stack, and BFS uses a queue.

## Important Terms

* **Root** - the topmost node of the tree; the only node without a parent
* **Ancestor vs Descendant** - If two nodes lie on the same path down from the root, the node closer to the root is an ancestor of the other node, and the node further from the root is its descendant
* **Parent vs Child** - A node's immediate descendant is its child, and a node's immediate ancestor is its parent
* **Leaf** - a node with no child nodes
* **Subtree** - A node of the original tree together with all of its descendants; that node becomes the root of the subtree
* **Depth** - the greatest number of edges on a path from the root to a leaf node
* **Level** - the number of edges between a node and the root. The root node has level 0
* **Sibling** - nodes that share the same parent
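
To make a few of these terms concrete, here is a minimal sketch that computes the depth of a tree (counted in edges) and counts its leaves. The `Node` struct and the `depth` and `count_leaves` helpers are illustrative names chosen for this example; children are assumed to be stored as indices into a vector.

```rust
// Hypothetical node type for illustration: children are indices into a slice.
struct Node {
    left: Option<usize>,
    right: Option<usize>,
}

// Depth of the subtree rooted at `index`, counted in edges.
// A subtree consisting of a single leaf has depth 0.
fn depth(nodes: &[Node], index: usize) -> usize {
    let node = &nodes[index];
    let left = node.left.map(|i| 1 + depth(nodes, i)).unwrap_or(0);
    let right = node.right.map(|i| 1 + depth(nodes, i)).unwrap_or(0);
    left.max(right)
}

// A leaf is a node with no children.
fn count_leaves(nodes: &[Node], index: usize) -> usize {
    let node = &nodes[index];
    match (node.left, node.right) {
        (None, None) => 1,
        (l, r) => {
            l.map_or(0, |i| count_leaves(nodes, i)) + r.map_or(0, |i| count_leaves(nodes, i))
        }
    }
}

fn main() {
    // Node 0 is the root with children 1 and 2; node 1 has a left child 3.
    let nodes = vec![
        Node { left: Some(1), right: Some(2) },
        Node { left: Some(3), right: None },
        Node { left: None, right: None },
        Node { left: None, right: None },
    ];
    println!("depth = {}", depth(&nodes, 0)); // depth = 2
    println!("leaves = {}", count_leaves(&nodes, 0)); // leaves = 2
}
```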

## Why are binary trees useful?

* Simple to analyze
* Many algorithms boil down to a series of "True/False" decisions that map well to binary trees
* Ordering in particular fits well (either you are "<" or you aren't)
* Easy to use with recursive algorithms 

## What types of binary trees are there?

* Complete (every level except possibly the last is full, and the last level is filled from left to right)
* Perfect (all leaves are at the same level, all interior nodes have 2 children)
* Balanced (at every node, the heights of the left and right subtrees differ by at most 1)
* Special
  * Binary Heaps
  * Binary Search Trees
  
## Any other interesting trees?

* Prefix trees or tries

<div align="center">
<img src="binary-tree-example.jpg" alt="[binary tree example]" width="70%">
</div>

## Using Vectors to implement a Binary Tree

### TreeNode struct and new() method

In [2]:
#[derive(Debug)]
struct TreeNode {
    value: usize,
    left: Option<usize>,
    right: Option<usize>,
}

impl TreeNode {
    // Constructor method
    // This method creates a new TreeNode with the given value
    // and initializes left and right child pointers to None
    fn new(value: usize) -> Self {
        TreeNode {
            value,
            left: None,
            right: None,
        }
    }
}

### BinaryTree struct and methods

In [3]:
#[derive(Debug)]
struct BinaryTree {
    // A vector of TreeNode structs
    // The index of the vector is the node index
    nodes: Vec<TreeNode>,
    root: Option<usize>,
}

impl BinaryTree {
    // Constructor method
    // This method creates a new BinaryTree with an empty vector of nodes
    // and a root pointer set to None
    fn new() -> Self {
        BinaryTree { nodes: Vec::new(), root: None }
    }

    fn insert(&mut self, value: usize) {
        let new_node_index = self.nodes.len();
        self.nodes.push(TreeNode::new(value));

        match self.root {
            Some(root_index) => self.insert_at(root_index, new_node_index),
            None => self.root = Some(new_node_index),
        }
    }

    // Inserts a new node by filling the first free child slot it finds.
    // * Start at the specified current node index:
    //   * if the left child is empty, insert the new node as the left child
    //   * otherwise, if the right child is empty, insert it as the right child
    //   * if both children are occupied, recursively attempt to insert the new
    //     node in the left subtree
    fn insert_at(&mut self, current_index: usize, new_node_index: usize) {
        let current_node = &mut self.nodes[current_index];
        if current_node.left.is_none() {
            current_node.left = Some(new_node_index);
        } else if current_node.right.is_none() {
            current_node.right = Some(new_node_index);
        } else {
            // Insert in left subtree for simplicity, could be more complex to keep balanced
            let left = current_node.left.unwrap();
            self.insert_at(left, new_node_index);
        }
    }
    
}


> Remember that `left` and `right` are indices, but `value` is the value we 
> associated with the node.

* Predict the BinaryTree structure as we repeatedly insert.

In [4]:
let mut tree = BinaryTree::new();
tree.insert(1);
tree.insert(2);
tree.insert(3);
tree.insert(6);
tree.insert(7);
tree.insert(11);
tree.insert(23);
tree.insert(34);

println!("{:#?}", tree);


BinaryTree {
    nodes: [
        TreeNode {
            value: 1,
            left: Some(
                1,
            ),
            right: Some(
                2,
            ),
        },
        TreeNode {
            value: 2,
            left: Some(
                3,
            ),
            right: Some(
                4,
            ),
        },
        TreeNode {
            value: 3,
            left: None,
            right: None,
        },
        TreeNode {
            value: 6,
            left: Some(
                5,
            ),
            right: Some(
                6,
            ),
        },
        TreeNode {
            value: 7,
            left: None,
            right: None,
        },
        TreeNode {
            value: 11,
            left: Some(
                7,
            ),
            right: None,
        },
        TreeNode {
            value: 23,
            left: None,
            right: None,
        },
        TreeNode {
        

> Reminder: The numbers in the circles are the values, not the indices of the node.

<div align="center">
<img src="binary_tree.png" alt="[binary tree]" width="100%">
</div>

This isn't very readable. Is there a better way to output this tree?

# Tree Traversal

There are several ways to traverse trees using algorithms we have seen before:
* Using BFS
  * Level-order Traversal
* Using DFS
  * Pre-order Traversal
  * In-order Traversal
  * Post-order Traversal

We are going to use level-order traversal to visit each level of the tree in order.

## Level-Order Traversal (BFS)

Just like before.

* Create an empty queue
* Add the root of tree to queue
* While the queue is not empty
  * Remove node from queue and visit it
  * Add the left child of node to the queue if it exists
  * Add the right child of node to the queue if it exists
  
 

In [5]:
use std::collections::VecDeque;
fn level_order_traversal(tree: &BinaryTree) {
  if let Some(root_index) = tree.root {
    let mut queue = VecDeque::new();
    queue.push_back(root_index);
    while let Some(node_index) = queue.pop_front() {
      let node = &tree.nodes[node_index];
      println!("{}", node.value);
      if let Some(left_index) = node.left {
        queue.push_back(left_index);
      }
      if let Some(right_index) = node.right {
        queue.push_back(right_index);
      }
    }
  }
}

let mut tree2 = BinaryTree::new();
tree2.insert(1);
tree2.insert(2);
tree2.insert(3);
tree2.insert(6);
tree2.insert(7);
tree2.insert(11);
tree2.insert(23);
tree2.insert(34);
level_order_traversal(&tree2);

1
2
3
6
7
11
23
34


In [6]:
// Slightly more complex version that prints each level on a different line
use std::collections::VecDeque;
fn level_order_traversal2(tree: &BinaryTree) {
  if let Some(root_index) = tree.root {
    let mut queue = VecDeque::new();
    queue.push_back(root_index);
    while !queue.is_empty() {
      let level_size = queue.len();
      for _ in 0..level_size {
        if let Some(node_index) = queue.pop_front() {
          let node = &tree.nodes[node_index];
          print!("{} ", node.value);

          if let Some(left_index) = node.left {
            queue.push_back(left_index);
          }
          if let Some(right_index) = node.right {
            queue.push_back(right_index);
          }
        }
      }
      println!(); // New line after each level
    }
  }
}

let mut tree2 = BinaryTree::new();
tree2.insert(1);
tree2.insert(2);
tree2.insert(3);
tree2.insert(6);
tree2.insert(7);
tree2.insert(11);
tree2.insert(23);
tree2.insert(34);
level_order_traversal2(&tree2);

1 
2 3 
6 7 
11 23 
34 


## Depth Traversals

### Pre-Order Traversal

Often used when making a copy of a tree. It is a form of DFS.

Algorithm:
* Visit the current node (e.g. print the current node value)
* Recursively traverse the current node's left subtree
* Recursively traverse the current node's right subtree

Pre-order traversal visits the nodes in a topologically sorted order, because each parent node is processed before any of its child nodes.


In [7]:
fn pre_order_traversal(tree: &BinaryTree, node_index: Option<usize>) {
  if let Some(index) = node_index {        // If the node index is not None
    let node = &tree.nodes[index];         // Get the node at the index
    println!("{}", node.value);            // Visit (print) the current node value
    pre_order_traversal(tree, node.left);  // Traverse the left subtree
    pre_order_traversal(tree, node.right); // Traverse the right subtree
  }
}

pre_order_traversal(&tree2, tree2.root)

1
2
6
11
34
23
7
3


()

### In-Order Traversal

In-Order traversal:

* Recursively traverse the current node's left subtree.
* Visit the current node (e.g. print the node's current value).
* Recursively traverse the current node's right subtree.

In a binary search tree ordered such that in each node the value is greater than
all values in its left subtree and less than all values in its right subtree,
in-order traversal retrieves the keys in ascending sorted order.



How do we update `pre_order_traversal()` to get `in_order_traversal()`?

In [8]:
fn in_order_traversal(tree: &BinaryTree, node_index: Option<usize>) {
  if let Some(index) = node_index {       // If the node index is not None
    let node = &tree.nodes[index];        // Get the node at the index
    in_order_traversal(tree, node.left);  // Traverse the left subtree
    println!("{}", node.value);           // Visit (print) the current node value
    in_order_traversal(tree, node.right); // Traverse the right subtree
  }
}

in_order_traversal(&tree2, tree2.root)

34
11
6
23
2
7
1
3


()

### Post-Order traversal

Good for deleting the tree or parts of it, since children must be deleted before their parents:

* Recursively traverse the current node's left subtree.
* Recursively traverse the current node's right subtree.
* Visit the current node.


In [9]:
fn post_order_traversal(tree: &BinaryTree, node_index: Option<usize>) {
  if let Some(index) = node_index {         // If the node index is not None
    let node = &tree.nodes[index];          // Get the node at the index
    post_order_traversal(tree, node.left);  // Traverse the left subtree
    post_order_traversal(tree, node.right); // Traverse the right subtree
    println!("{}", node.value);             // Visit (print) the current node value
  }
}

post_order_traversal(&tree2, tree2.root)

34
11
23
6
7
2
3
1


()

# Binary search trees

* Organize data into a binary tree

* Similar to binary heaps where the parent was greater (or lesser) than either of its children

* **Binary Search Tree:** the value of a node is greater than the values of all nodes
  in its left subtree and less than the values of all nodes in its right subtree
  (assuming distinct keys).

* Enables _binary search_ for efficient lookup, addition and removal of items.

* In a balanced tree, each comparison skips about half of the remaining tree.

* Complexity for search, insert and delete is $O(\log n)$, assuming balanced trees.

<div align="center">
<img src="./binary_search_tree.png" alt="[sample tree]" width="37%">
</div>

* Invariant at each node:
  * all left descendants ${}\le{}$ parent
  * parent ${}\le{}$ all right descendants

<div align="center">
<img src="relationship.png" alt="[sample tree]" width="37%">
</div>

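* Compared to binary heaps:
  * different ordering of elements: a heap only orders each parent relative to its own children, while a BST orders a node relative to its entire left and right subtrees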

## Basic operations: find a key $k$

How can we do this?

* Descend recursively from the root until $k$ is found or there is nowhere left to go:
  * If the current node's key is $k$, return it.
  * If $k<{}$ value at the current node, go left
  * If $k>{}$ value at the current node, go right
  * If the child you need to move to does not exist, return not found
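
The descent above can be sketched as a loop over node indices. This is a minimal sketch that assumes the same index-based layout used earlier in the lecture (`nodes` plus `root`); the `find` method is an illustrative addition, and the small tree in `main` is built by hand for the example.

```rust
struct TreeNode {
    value: usize,
    left: Option<usize>,
    right: Option<usize>,
}

struct BinaryTree {
    nodes: Vec<TreeNode>,
    root: Option<usize>,
}

impl BinaryTree {
    // Returns the index of a node holding `k`, or None if `k` is absent.
    fn find(&self, k: usize) -> Option<usize> {
        let mut current = self.root;
        while let Some(index) = current {
            let node = &self.nodes[index];
            if k == node.value {
                return Some(index);
            }
            // Smaller keys live in the left subtree, larger keys in the right.
            current = if k < node.value { node.left } else { node.right };
        }
        None // Fell off the tree without finding k
    }
}

fn main() {
    // Hand-built BST:   7
    //                  / \
    //                 4   11
    //                  \
    //                   6
    let tree = BinaryTree {
        nodes: vec![
            TreeNode { value: 7, left: Some(1), right: Some(2) },
            TreeNode { value: 4, left: None, right: Some(3) },
            TreeNode { value: 11, left: None, right: None },
            TreeNode { value: 6, left: None, right: None },
        ],
        root: Some(0),
    };
    println!("{:?}", tree.find(6)); // Some(3)
    println!("{:?}", tree.find(5)); // None
}
```

The work done is proportional to the length of the path followed, not to the total number of nodes.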

<br>
<br>
<div align="center">
    <b>[Example: Find the key 7 in the tree above.]</b>
</div>

## Basic operations: insert a key $k$

How can we do this?

* Keep descending from the root until you leave the tree
  * If $k\le{}$ value at the current node, go left
  * If $k>{}$ value at the current node, go right
* Create a new node containing $k$ there

<br>
<br>
<div align="center">
    <b>[Example: Insert value 6 in tree above.]</b>
</div>

<br><br>

<div align="center">
  <img src="./bst_insert_6.png" width="80%">
</div>

## Basic operations: delete a node

How can we do this?

This can be more complicated because we might have to find a replacement.

### Case 1 -- Leaf Node

If node is a leaf node, just remove and return.




### Case 2 -- One Child

If the node has only one child, remove the node and move its child up into its place.

Example: Remove 4.

```
      7
     / \
    4   11
     \
      6
```

Remove 4 and move 6 up.

```
      7
     / \
    6   11
```




### Case 3 -- Else

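Otherwise the node has two children. Find the _left-most_ descendant of the _right_ subtree and move it up.

This is the _inorder successor_. If the successor has a right child, that child takes the successor's old place.

Example: Remove 4.

```
        7
       / \
      4   11
     / \
    2   6
   / \ /
  1  3 5
```

In this case the successor is 5, so replace 4 with 5.

```
        7
       / \
      5   11
     / \
    2   6
   / \
  1  3
```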
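
The three cases can be sketched in code. This is a minimal sketch, not the lecture's implementation: it assumes a pointer-based `Node` (using `Option<Box<Node>>`) instead of the index-based vector, because removal is shorter to express that way. The names `insert`, `min_value`, `delete`, and `in_order` are illustrative.

```rust
// Hypothetical pointer-based node, used here only to keep removal short.
struct Node {
    value: u32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Standard BST insert; ties go to the left.
fn insert(tree: Option<Box<Node>>, value: u32) -> Option<Box<Node>> {
    match tree {
        None => Some(Box::new(Node { value, left: None, right: None })),
        Some(mut node) => {
            if value <= node.value {
                node.left = insert(node.left.take(), value);
            } else {
                node.right = insert(node.right.take(), value);
            }
            Some(node)
        }
    }
}

// Smallest value in a non-empty subtree: keep going left.
fn min_value(node: &Node) -> u32 {
    match &node.left {
        Some(left) => min_value(left),
        None => node.value,
    }
}

// Returns the subtree with one occurrence of `value` removed (if present).
fn delete(tree: Option<Box<Node>>, value: u32) -> Option<Box<Node>> {
    let mut node = tree?; // Empty subtree: nothing to delete
    if value < node.value {
        node.left = delete(node.left.take(), value);
        Some(node)
    } else if value > node.value {
        node.right = delete(node.right.take(), value);
        Some(node)
    } else {
        match (node.left.take(), node.right.take()) {
            // Case 1: leaf, just remove it
            (None, None) => None,
            // Case 2: one child, move the child up
            (Some(child), None) | (None, Some(child)) => Some(child),
            // Case 3: two children, copy the in-order successor's value
            // into this node, then delete the successor from the right subtree
            (Some(left), Some(right)) => {
                let successor = min_value(&right);
                node.value = successor;
                node.left = Some(left);
                node.right = delete(Some(right), successor);
                Some(node)
            }
        }
    }
}

// Collects the values in sorted order.
fn in_order(tree: &Option<Box<Node>>, out: &mut Vec<u32>) {
    if let Some(node) = tree {
        in_order(&node.left, out);
        out.push(node.value);
        in_order(&node.right, out);
    }
}

fn main() {
    let mut tree = None;
    for v in [7, 4, 11, 2, 6, 1, 3, 5] {
        tree = insert(tree, v);
    }
    tree = delete(tree, 4); // Node 4 has two children: Case 3
    let mut values = Vec::new();
    in_order(&tree, &mut values);
    println!("{:?}", values); // [1, 2, 3, 5, 6, 7, 11]
}
```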


## Cost of these operations?

<div align="center">
    <b> O( depth of the tree ) </b>
</div>

**Bad news:** the depth can be made proportional to $n$, the number of nodes. How?

**Good news:** there are smart ways to keep the depth at $O(\log n)$

## Balanced binary search trees

There are smart ways to rebalance the tree!

* Depth: $O(\log n)$

# Binary search trees -- Implementation

1. Applications (range searching)
2. Rust: `BTreeMap` and `BTreeSet`
3. Tries (Prefix Trees)


In [10]:
#[derive(Debug)]
struct TreeNode {
    value: usize,
    left: Option<usize>,
    right: Option<usize>,
}

impl TreeNode {
    fn new(value: usize) -> Self {
        TreeNode {
            value,
            left: None,
            right: None,
        }
    }
}

#[derive(Debug)]
struct BinaryTree {
    nodes: Vec<TreeNode>,
    root: Option<usize>,
}

impl BinaryTree {
    fn new() -> Self {
        BinaryTree { nodes: Vec::new(), root: None }
    }

    fn insert(&mut self, value: usize) {
        let new_node_index = self.nodes.len();
        self.nodes.push(TreeNode::new(value));

        match self.root {
            Some(root_index) => self.insert_at(root_index, new_node_index, value),
            None => self.root = Some(new_node_index),
        }
    }

    // Walks down from `current_index`, going right when the new value is
    // larger than the current node's value and left otherwise (so ties go left),
    // and attaches the new node at the first empty child slot on that path.
    fn insert_at(&mut self, current_index: usize, new_node_index: usize, value: usize) {
        let current_node = &mut self.nodes[current_index];
        if current_node.value < value {
            if current_node.right.is_none() {
                current_node.right = Some(new_node_index);
            } else {
                let right = current_node.right.unwrap();
                self.insert_at(right, new_node_index, value);
            }
        } else {
            if current_node.left.is_none() {
                current_node.left = Some(new_node_index);
            } else {
                let left = current_node.left.unwrap();
                self.insert_at(left, new_node_index, value);
            }
        }
    }
}


* What happens if we insert the following values?

In [11]:

let mut tree = BinaryTree::new();
tree.insert(1);
tree.insert(2);
tree.insert(3);
tree.insert(6);
tree.insert(7);
tree.insert(11);
tree.insert(23);
tree.insert(34);
println!("{:#?}", tree);


BinaryTree {
    nodes: [
        TreeNode {
            value: 1,
            left: None,
            right: Some(
                1,
            ),
        },
        TreeNode {
            value: 2,
            left: None,
            right: Some(
                2,
            ),
        },
        TreeNode {
            value: 3,
            left: None,
            right: Some(
                3,
            ),
        },
        TreeNode {
            value: 6,
            left: None,
            right: Some(
                4,
            ),
        },
        TreeNode {
            value: 7,
            left: None,
            right: Some(
                5,
            ),
        },
        TreeNode {
            value: 11,
            left: None,
            right: Some(
                6,
            ),
        },
        TreeNode {
            value: 23,
            left: None,
            right: Some(
                7,
            ),
        },
        TreeNode {
        

```
1
 \
  2
   \
    3
     \
      6
       \
        7
         \
         11
          \
          23
           \
           34
```

Unbalanced binary search trees can be inefficient!

This is a degenerate tree -- basically a linked list.

## Balanced binary search trees

There are smart ways to rebalance the tree!

* Depth: $O(\log n)$

* Usually additional information has to be kept at each node

* Many popular, efficient examples:
  * Red-black trees
  * AVL trees
  * B-trees (used by Rust's `BTreeMap` and `BTreeSet`)
  * ...
  
  Red-black and AVL trees rebalance using some form of tree rotation; B-trees mainly split and merge nodes instead.

  We'll look at some simple approaches.

## Basic operations: rebalance a tree

How can we do this?

* Quite a bit more complicated

* One basic idea: find the branch that has gotten too long
  * Swap the parent with the child at the top of that branch, so that the child becomes the parent and the old parent becomes its child
  * If you picked a left child, take its right subtree and make it the left subtree of the old parent
  * If you picked a right child, take its left subtree and make it the right subtree of the old parent
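
The swap described above is usually called a rotation. Here is a minimal sketch using an index-based node layout like the one in the lecture code. `rotate_left` and `rotate_right` are illustrative names, not functions defined elsewhere in these notes. Each returns the index of the new subtree root, and the caller is assumed to store that index wherever the old one was referenced (the tree's `root` or the parent's child slot).

```rust
struct TreeNode {
    value: usize,
    left: Option<usize>,
    right: Option<usize>,
}

// Left rotation: the right child of `index` moves up and `index` moves down.
fn rotate_left(nodes: &mut [TreeNode], index: usize) -> usize {
    let child = nodes[index].right.expect("rotate_left needs a right child");
    // The child's left subtree becomes the old parent's right subtree.
    nodes[index].right = nodes[child].left;
    // The old parent becomes the child's left child.
    nodes[child].left = Some(index);
    child
}

// Right rotation: the mirror image, promoting the left child.
fn rotate_right(nodes: &mut [TreeNode], index: usize) -> usize {
    let child = nodes[index].left.expect("rotate_right needs a left child");
    // The child's right subtree becomes the old parent's left subtree.
    nodes[index].left = nodes[child].right;
    // The old parent becomes the child's right child.
    nodes[child].right = Some(index);
    child
}

fn main() {
    // Degenerate chain 1 -> 2 -> 3, each node the right child of the previous.
    let mut nodes = vec![
        TreeNode { value: 1, left: None, right: Some(1) },
        TreeNode { value: 2, left: None, right: Some(2) },
        TreeNode { value: 3, left: None, right: None },
    ];
    let root = rotate_left(&mut nodes, 0);
    let left = nodes[root].left.unwrap();
    let right = nodes[root].right.unwrap();
    // Prints: 1 <- 2 -> 3
    println!("{} <- {} -> {}", nodes[left].value, nodes[root].value, nodes[right].value);

    // Rotating right at the new root restores the original chain.
    let restored = rotate_right(&mut nodes, root);
    println!("root value after rotating back: {}", nodes[restored].value); // 1
}
```

Both rotations keep the in-order sequence of values unchanged, which is why they preserve the binary search tree property.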
  
<br>
<br>

### Example

Let's go back to our degenerate tree. It's just one long branch.

```
1
 \
  2
   \
    3
     \
      6
       \
        7
         \
         11
          \
          23
           \
           34
```

Step 1: Rotate around 1-2-3.

```
  2
 / \
1   3
     \
      6
       \
        7
         \
         11
          \
          23
           \
           34
```

Step 2: Rotate around 3-6-7.

```
  2
 / \
1   6
   / \
  3   7
       \
       11
        \
        23
         \
         34
```

Step 3: Rotate around 7-11-23.

```
  2
 / \
1   6
   / \
  3   11
     /  \
    7    23
          \
          34
```

## A simple way to rebalance a binary tree

* First do an in-order traversal to get the nodes in sorted order
* Then use the middle of the sorted vector to be the root of the tree and recursively build the rest

In [12]:
impl BinaryTree {
    fn in_order_traversal(&self, node_index: Option<usize>)->Vec<usize> {
        let mut u: Vec<usize> = vec![];
        if let Some(index) = node_index {
            let node = &self.nodes[index];
            u = self.in_order_traversal(node.left);
            let mut v: Vec<usize> = vec![node.value];
            let mut w: Vec<usize> = self.in_order_traversal(node.right);
            u.append(&mut v);
            u.append(&mut w);
        }
        return u;
    }
}



In [13]:
let mut z = tree.in_order_traversal(tree.root);
z.sort(); // Not strictly needed: an in-order traversal of a BST is already sorted
println!("{:?}", z);

[1, 2, 3, 6, 7, 11, 23, 34]


In [14]:
impl BinaryTree {
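    // This function recursively builds a balanced BST from the sorted slice
    // v[start..end] (start inclusive, end exclusive) by:
    // 1. Finding the middle element of the range to use as the root
    // 2. Recursively building the left and right subtrees using the elements
    //    before and after the middle element, respectively.
    // Note: insert() appends the node (and sets the root on the first call);
    // the assignments below then set its child links explicitly.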
    fn balance_bst(&mut self, v: &[usize], start:usize, end:usize) -> Option<usize> {
        if start >= end {
            return None;
        }
        let mid = (start+end) / 2;
        let node_index = self.nodes.len();
        self.insert(v[mid]);
        self.nodes[node_index].left = self.balance_bst(v, start, mid);
        self.nodes[node_index].right = self.balance_bst(v, mid+1, end);
        Some(node_index)
    }
}



In [15]:
let mut bbtree = BinaryTree::new();
bbtree.balance_bst(&z, 0, z.len());
println!("{:#?}", bbtree);
println!("{:?}", level_order_traversal2(&bbtree));

BinaryTree {
    nodes: [
        TreeNode {
            value: 7,
            left: Some(
                1,
            ),
            right: Some(
                5,
            ),
        },
        TreeNode {
            value: 3,
            left: Some(
                2,
            ),
            right: Some(
                4,
            ),
        },
        TreeNode {
            value: 2,
            left: Some(
                3,
            ),
            right: None,
        },
        TreeNode {
            value: 1,
            left: None,
            right: None,
        },
        TreeNode {
            value: 6,
            left: None,
            right: None,
        },
        TreeNode {
            value: 23,
            left: Some(
                6,
            ),
            right: Some(
                7,
            ),
        },
        TreeNode {
            value: 11,
            left: None,
            right: None,
        },
        TreeNode {
        

Produces the result:

```
        7
      /   \
    3      23
   / \    /  \
  2   6  11  34
 /
1
```

## Why use binary search trees?

* Don't hash maps and hash sets already give us $O(1)$ time operations?

### Reason 1: 

* Good worst-case behavior: $O(\log n)$ operations when the tree is balanced, with no reliance on a good hash function

### Reason 2:
* Can answer efficiently questions such as:
  * What is the smallest/greatest element?
  * What is the smallest element greater than $x$?
  * List all elements between $x$ and $y$
  

## Example: find the smallest element greater than $x$

**Question:** How can you list all elements in order in $O(n)$ time?

<br>
<br>
<div align="center">
    <b>[Work out on the board]</b>
</div>


**Answer:** recursively starting from the root

* visit left subtree
* output current node
* visit right subtree

**Outputting smallest element greater than $x$:**

* Like above, but skipping whole subtrees whose values are all at most $x$
* Will get the first element greater than $x$ in $O(\log n)$ time (for balanced trees)

For balanced trees: listing the first $t$ elements greater than $x$ takes $O(t + \log n)$ time
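
The search for the smallest element greater than $x$ can be sketched as a single walk down the tree. This is a minimal sketch assuming an index-based node layout like the one in the lecture code; `next_greater` is an illustrative name, and the tree in `main` is written out by hand to match the balanced tree built earlier.

```rust
struct TreeNode {
    value: usize,
    left: Option<usize>,
    right: Option<usize>,
}

// Smallest value strictly greater than `x` in the BST rooted at `root`.
fn next_greater(nodes: &[TreeNode], root: Option<usize>, x: usize) -> Option<usize> {
    let mut best = None;
    let mut current = root;
    while let Some(index) = current {
        let node = &nodes[index];
        if node.value > x {
            // Candidate answer; anything better must be in the left subtree.
            best = Some(node.value);
            current = node.left;
        } else {
            // This node and its whole left subtree are <= x: skip them.
            current = node.right;
        }
    }
    best
}

fn main() {
    //         7
    //       /   \
    //     3      23
    //    / \    /  \
    //   2   6  11  34
    //  /
    // 1
    let nodes = vec![
        TreeNode { value: 7, left: Some(1), right: Some(5) },
        TreeNode { value: 3, left: Some(2), right: Some(4) },
        TreeNode { value: 2, left: Some(3), right: None },
        TreeNode { value: 1, left: None, right: None },
        TreeNode { value: 6, left: None, right: None },
        TreeNode { value: 23, left: Some(6), right: Some(7) },
        TreeNode { value: 11, left: None, right: None },
        TreeNode { value: 34, left: None, right: None },
    ];
    println!("{:?}", next_greater(&nodes, Some(0), 6)); // Some(7)
    println!("{:?}", next_greater(&nodes, Some(0), 7)); // Some(11)
    println!("{:?}", next_greater(&nodes, Some(0), 34)); // None
}
```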

# $B$-Trees

Are there binary search trees in Rust's standard library?

* Not exactly

* Binary Search Trees are computationally very efficient for search/insert/delete ($O(\log n)$ when balanced).

* In practice, they can be _very inefficient_ on modern computer architectures.
  * Every insert triggers a heap allocation
  * Nearly every comparison can be a _cache miss_, because each node lives in a different place in memory.


Enter $B$-trees:

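1. B-trees are balanced search trees where each node contains roughly between B and 2B keys (the exact bounds vary by definition, and the root may contain fewer), and each internal node has one more subtree than keys.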

2. All leaf nodes are at the same depth, ensuring consistent O(log n) performance for search, insert, and delete operations.

3. B-trees are widely used in database systems for indexing and efficient range queries.

4. They're implemented in Rust's standard library as `BTreeMap` and `BTreeSet` for in-memory operations.

5. The structure is optimized for both disk and memory operations, with nodes sized appropriately for the storage medium.


<div align="center">
  <img src="./B-tree.svg" width="70%">
</div>

_By CyHawk, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=11701365_

What does the B stand for?

<em> Invented by Bayer and McCreight at Boeing. Suggested explanations have been Boeing, balanced, broad, bushy, and Bayer. McCreight has said that "the more you think about what the B in B-trees means, the better you understand B-trees." </em>

## `BTreeSet` and `BTreeMap`

```
std::collections::BTreeSet
std::collections::BTreeMap
```

Sets and maps, respectively

In [16]:
// let's create a set
use std::collections::BTreeSet;
let mut set: BTreeSet<i32> = BTreeSet::new();
set.insert(11);
set.insert(7);
set.insert(5);
set.insert(23);
set.insert(25);

In [17]:
// listing a range
set.range(7..25).for_each(|x| println!("{}", x));

7
11
23


In [18]:
// listing a range: left and right bounds and 
// whether they are included or not
use std::ops::Bound::{Included,Excluded};
set.range((Excluded(5),Included(11))).for_each(|x| println!("{}", x));

7
11


In [19]:
// Iterating through the items is the in-order traversal that will give you sorted output
for i in &set {
    println!("{}", *i);
}

5
7
11
23
25


()
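
The ordered queries listed earlier (smallest and greatest element, smallest element greater than $x$) map onto `BTreeSet` methods. A minimal sketch, assuming a Rust version where `BTreeSet::first` and `BTreeSet::last` are available (1.66 or later); `next_greater` is a helper name made up for this example.

```rust
use std::collections::BTreeSet;
use std::ops::Bound::{Excluded, Unbounded};

// Hypothetical helper: smallest element strictly greater than `x`.
fn next_greater(set: &BTreeSet<i32>, x: i32) -> Option<i32> {
    // A range with an excluded lower bound and no upper bound;
    // its first item is the answer.
    set.range((Excluded(x), Unbounded)).next().copied()
}

fn main() {
    let set = BTreeSet::from([11, 7, 5, 23, 25]);
    println!("{:?}", set.first()); // Some(5)
    println!("{:?}", set.last()); // Some(25)
    println!("{:?}", next_greater(&set, 11)); // Some(23)
    println!("{:?}", next_greater(&set, 25)); // None
}
```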

In [23]:
// let's make a map now
use std::collections::BTreeMap;
// We can rely on type inference to avoid having to fully write out the types
let mut map = BTreeMap::new();
map.insert("DS310", "Data Mechanics");
map.insert("DS210", "Programming for Data Science");
map.insert("DS120", "Foundations of Data Science I");
map.insert("DS121", "Foundations of Data Science II");
map.insert("DS122", "Foundations of Data Science III");



In [24]:
// Try to find a course
if !map.contains_key("DS111") {
    println!("Course not found");
}

Course not found


()

In [25]:
for (course, name) in &map {
    println!("{course}: \"{name}\"");
}

DS120: "Foundations of Data Science I"
DS121: "Foundations of Data Science II"
DS122: "Foundations of Data Science III"
DS210: "Programming for Data Science"
DS310: "Data Mechanics"


()

> Note that the keys are sorted.

In [26]:
// listing a range
map.range("DS000".."DS199").for_each(|(x,y)| println!("{}: \"{}\"", x, y));

DS120: "Foundations of Data Science I"
DS121: "Foundations of Data Science II"
DS122: "Foundations of Data Science III"


## BTreeMap vs HashMap

1. **Use HashMap when:**
   - You need O(1) average-case lookup time for specific keys
   - You don't need to maintain any particular order of elements
   - You're doing mostly individual key lookups rather than range queries
   - Your keys implement `Hash`, `Eq`, and `PartialEq` traits

2. **Use BTreeMap when:**
   - You need to efficiently list all values in a specific range
   - You need ordered iteration over keys/values
   - You need to find the smallest/greatest element or elements between two values
   - You need consistent O(log n) performance for all operations
   - Your keys implement `Ord` trait

3. **Key Differences:**
   - HashMap provides O(1) average-case lookup but can have O(n) worst-case
   - BTreeMap provides guaranteed O(log n) performance for all operations
   - BTreeMap maintains elements in sorted order
   - HashMap is generally faster for individual lookups when you don't need ordering

4. **Practical Examples:**
   - Use HashMap for: counting word frequencies, caching, quick lookups
   - Use BTreeMap for: maintaining ordered data, range queries, finding nearest values

5. **Memory Considerations:**
   - HashMap may use more memory due to its underlying array structure, which needs resizing (doubling and copying) when full
   - BTreeMap's memory usage is more predictable and grows linearly with the number of elements -- adds new nodes as needed.
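
To illustrate the trade-off, here is a small sketch that counts word frequencies with a `HashMap` and then moves the counts into a `BTreeMap` when sorted output or a range query is needed. The `word_counts` helper and the sample sentence are made up for this example.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical helper: count how often each word occurs.
fn word_counts(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let text = "the quick brown fox jumps over the lazy dog the end";

    // HashMap: fast individual lookups, but iteration order is unspecified.
    let counts = word_counts(text);
    println!("the -> {}", counts["the"]); // the -> 3

    // BTreeMap: the same data, with keys kept in sorted order.
    let sorted: BTreeMap<&str, usize> = counts.into_iter().collect();
    println!("{:?}", sorted.keys().next()); // Some("brown")

    // Range query: words from "l" up to (but not including) "q".
    for (word, n) in sorted.range("l".."q") {
        println!("{word}: {n}");
    }
    // lazy: 1
    // over: 1
}
```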


# Prefix Tree (Trie)

Can be pronounced /ˈtraɪ/ (like "try") or /ˈtriː/ (like "tree").

A very efficient data structure for dictionary search, word suggestions, error corrections etc.

<div align="center">
  <img src="Trie_example.svg.png" width="80%">
</div>

_A trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn". Each complete English word has an arbitrary integer value associated with it. [Wikipedia](https://en.wikipedia.org/wiki/Trie)_

Available in Rust as an external crate: https://docs.rs/trie-rs/latest/trie_rs/


In [2]:
:dep trie-rs="0.1.1"
fn testme() {
    use std::str;
    use trie_rs::TrieBuilder;
    let mut builder = TrieBuilder::<u8>::new();  
    
    builder.push("to");
    builder.push("tea");
    builder.push("ted");
    builder.push("ten");
    builder.push("teapot");
    builder.push("in");
    builder.push("inn");
    let trie = builder.build();
    println!("Find suffixes of \"te\"");
    let results_in_u8s: Vec<Vec<u8>> = trie.predictive_search("te");
    let results_in_str: Vec<&str> = results_in_u8s
        .iter()
        .map(|u8s| str::from_utf8(u8s).unwrap())
        .collect();
    println!("{:?}", results_in_str);
}

testme();


Find suffixes of "te"
["tea", "teapot", "ted", "ten"]


### To Insert a Word

1. **Start at the Root**
   - Begin at the root node of the trie
   - The root node represents an empty string

2. **Process Each Character**
   - For each character in the word you want to insert:
     - Check if the current node has a child node for that character
     - If yes: Move to that child node
     - If no: Create a new node for that character and make it a child of the current node

3. **Mark the End**
   - After processing all characters, mark the final node as an end-of-word node
   - This indicates that a complete word ends at this node

Let me illustrate with an example. Suppose we want to insert the word "tea" into an empty trie:

```
Step 1: Start at root
   (root)

Step 2: Insert 't'
   (root)
     |
     t

Step 3: Insert 'e'
   (root)
     |
     t
     |
     e

Step 4: Insert 'a'
   (root)
     |
     t
     |
     e
     |
     a*

Step 5: Mark 'a' as end-of-word (shown with *)
```

If we then insert "ted", the trie would look like:
```
   (root)
     |
     t
     |
     e
    / \
   a*  d*
```

The asterisk (*) marks nodes where words end. This structure allows us to:
- Share common prefixes between words
- Efficiently search for words
- Find all words with a given prefix
- Check if a word exists in the trie
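
The insertion steps can be sketched with a small hand-written trie. This is a minimal illustration, separate from the `trie-rs` crate used above: the `TrieNode` and `Trie` types and their methods are names chosen for this example, and children are assumed to be stored in a `HashMap` keyed by character.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct TrieNode {
    children: HashMap<char, TrieNode>,
    is_end: bool, // True if a complete word ends at this node
}

#[derive(Default)]
struct Trie {
    root: TrieNode, // The root represents the empty string
}

impl Trie {
    fn insert(&mut self, word: &str) {
        let mut node = &mut self.root;
        for ch in word.chars() {
            // Move to the child for `ch`, creating it if it does not exist.
            node = node.children.entry(ch).or_default();
        }
        node.is_end = true; // Mark the end of the word
    }

    // Follows `prefix` from the root; None if the path does not exist.
    fn walk(&self, prefix: &str) -> Option<&TrieNode> {
        let mut node = &self.root;
        for ch in prefix.chars() {
            node = node.children.get(&ch)?;
        }
        Some(node)
    }

    fn contains(&self, word: &str) -> bool {
        self.walk(word).map_or(false, |node| node.is_end)
    }

    // All stored words that start with `prefix`, in sorted order.
    fn words_with_prefix(&self, prefix: &str) -> Vec<String> {
        let mut out = Vec::new();
        if let Some(node) = self.walk(prefix) {
            collect(node, &mut prefix.to_string(), &mut out);
        }
        out.sort();
        out
    }
}

// Depth-first walk that records every end-of-word node below `node`.
fn collect(node: &TrieNode, current: &mut String, out: &mut Vec<String>) {
    if node.is_end {
        out.push(current.clone());
    }
    for (ch, child) in &node.children {
        current.push(*ch);
        collect(child, current, out);
        current.pop();
    }
}

fn main() {
    let mut trie = Trie::default();
    for word in ["to", "tea", "ted", "ten", "teapot", "in", "inn"] {
        trie.insert(word);
    }
    println!("{}", trie.contains("tea")); // true
    println!("{}", trie.contains("te")); // false: a prefix, not a stored word
    println!("{:?}", trie.words_with_prefix("te")); // ["tea", "teapot", "ted", "ten"]
}
```

Insertion and lookup both take time proportional to the length of the word, independent of how many words are stored.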



### Most likely your spellchecker is based on a trie

In this case, we will compare possible close matches, sort them by frequency of occurrence in some corpus, and present the top 3-5.

If your word is not in the trie do the following:

* Step 1: Find the largest prefix that is present and find the trie words with that prefix
* Step 2: Delete the first letter from your word and redo Step 1
* Step 3: Insert a letter (for all letters) to the beginning of the word and redo Step 1
* Step 4: Replace the beginning letter with a different one (for all letters) and redo Step 1
* Step 5: Transpose the first two letters and redo Step 1
* Step 6: Collect all words from Steps 1-5, sort by frequency of occurrence, and present the top 3-5 to the user
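
Steps 2-5 each produce modified versions of the misspelled word, which are then looked up in the trie. The candidate generation might be sketched as follows, assuming lowercase ASCII words; the `variants` function is a hypothetical helper, and the trie lookup and frequency ranking are left out.

```rust
// Generates the modified words described in Steps 2-5.
// Assumes the alphabet is the lowercase letters 'a' to 'z'.
fn variants(word: &str) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut out = Vec::new();
    if chars.is_empty() {
        return out;
    }
    let rest: String = chars[1..].iter().collect();

    // Step 2: delete the first letter.
    out.push(rest.clone());

    for letter in 'a'..='z' {
        // Step 3: insert a letter at the beginning.
        out.push(format!("{letter}{word}"));
        // Step 4: replace the first letter with a different one.
        if letter != chars[0] {
            out.push(format!("{letter}{rest}"));
        }
    }

    // Step 5: transpose the first two letters.
    if chars.len() >= 2 {
        let tail: String = chars[2..].iter().collect();
        out.push(format!("{}{}{}", chars[1], chars[0], tail));
    }
    out
}

fn main() {
    let candidates = variants("hte");
    // The transposition from Step 5 recovers the intended word.
    println!("{}", candidates.contains(&"the".to_string())); // true
    println!("{} candidates", candidates.len());
}
```

Each candidate would then go through Step 1 (longest matching prefix in the trie) before the final ranking in Step 6.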

## In-Class Poll

https://piazza.com/class/m5qyw6267j12cj/post/411
