# Data Structures

## 1. Data Structure Operations

There are common ways that we can interact with different data structures. 

It is useful to judge how appropriate a data structure is for a given task by the speed of the operations that the task requires most.

- Read: Look up the value at a particular location in the data structure
- Search: Look for a particular value in the data structure.
- Insert: Add a new value to the data structure.
- Delete: Remove a value from the data structure.

Reading is "find by key", searching is "find by value".


## 2. Arrays

An array is a list of elements.

It is stored in memory as a block of contiguous memory addresses. When the array is declared, the computer stores the head of the array, i.e. the memory address of the first element.

The **size** of an array is the number of elements in the array.
The **index** denotes where a particular piece of data resides in that array.


| Operation | Complexity (Worst) | Complexity (Best)    |
|-----------|--------------------|----------------------|
| Read      | O(1)               | O(1)                 |
| Search    | O(N)               | O(1)                 |
| Insertion | O(N)               | O(1)                 |
| Deletion  | O(N)               | O(1)                 |


### 2.1. Reading an Array

A computer can look up a given memory address in constant time.

We know the head of the array and the index to look up. So we can read in **O(1)** time.

Example:

1. Head of the array is memory address 1063.
2. We want to look up index 5.
3. Read memory address 1068 (because 1063 + 5 = 1068).
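
The address arithmetic in this example can be sketched in Python. This is only an illustration: `memory` is a hypothetical dict standing in for the computer's memory, and the stored letters are made-up values.

```python
# Hypothetical model: a dict mapping memory addresses to stored values.
HEAD_ADDRESS = 1063
values = ["a", "b", "c", "d", "e", "f"]
memory = {HEAD_ADDRESS + offset: value for offset, value in enumerate(values)}


def read(head_address, index):
    """Read an element by computing its address directly: one step, no searching."""
    return memory[head_address + index]


print(read(HEAD_ADDRESS, 5))  # reads address 1068, which holds "f"
```

However large the array is, `read` performs a single addition and a single lookup.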

> If you were asked to raise your right pinky finger, you wouldn't need to search through all of your fingers to find it


### 2.2. Searching an Array

A computer has immediate access to all of its memory addresses but does not know ahead of time their *contents*.

So to find a particular value, we will potentially have to search through every element.

Searching an array is therefore **O(N)**.


### 2.3. Insertion into an Array

The efficiency of inserting into an array depends on **where** in the array you are inserting.

If inserting an element at the end, we simply place the new value at that memory address (assuming the memory address is empty). This is a constant time operation O(1).

But if we insert at any other position, we need to:

1. Shift each existing element at or to the right of the insert index one position rightwards.
2. Insert the new value in the gap created.

So to insert at index $i$, there are $N - i$ elements to shift (Step 1), then 1 more operation to insert the new value (Step 2).

In the worst case - inserting at the start of an array - insertion is **O(N)**.
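
To make the shifting concrete, here is a minimal sketch that performs the insert by hand and counts the shifts. The helper name `insert_at` is hypothetical, and a Python list stands in for the array (in practice `list.insert` does this work for us).

```python
def insert_at(array, index, value):
    """Insert value at index by shifting elements rightwards; return the number of shifts."""
    array.append(None)  # make room for one extra element at the end
    shifts = 0
    # Walk backwards from the end, moving each element one place to the right.
    for position in range(len(array) - 1, index, -1):
        array[position] = array[position - 1]
        shifts += 1
    array[index] = value
    return shifts


letters = ["a", "b", "c", "d"]
print(insert_at(letters, 0, "z"), letters)  # prints: 4 ['z', 'a', 'b', 'c', 'd']
print(insert_at(letters, 5, "e"), letters)  # prints: 0 ['z', 'a', 'b', 'c', 'd', 'e']
```

Inserting at the start shifts every existing element, while inserting at the end shifts none.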


### 2.4. Deletion from an Array

Similarly, the efficiency of deletion depends on the index being deleted.

If deleting the last element, there is simply 1 operation to clear the memory address, so this is a constant time operation O(1).

But if we delete an element in any other position, we need to:

1. Delete the element. This leaves a gap in the middle of the array.
2. Shift each of the elements to the right of the gap leftwards, to fill the gap.

So to delete at index $i$, we do 1 operation to delete that element (Step 1), then shift the remaining $N - i - 1$ elements to its right leftwards (Step 2).

In the worst case - deleting the first element of the array - deletion is **O(N)**.
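
The same idea can be sketched for deletion. As before, `delete_at` is a hypothetical helper that does by hand what `list.pop(index)` does internally, so that we can count the shifts.

```python
def delete_at(array, index):
    """Delete the element at index by shifting later elements leftwards; return the number of shifts."""
    shifts = 0
    # Each element to the right of the gap moves one place to the left.
    for position in range(index, len(array) - 1):
        array[position] = array[position + 1]
        shifts += 1
    array.pop()  # drop the now-duplicated last slot
    return shifts


letters = ["a", "b", "c", "d"]
print(delete_at(letters, 0), letters)  # prints: 3 ['b', 'c', 'd']
print(delete_at(letters, 2), letters)  # prints: 0 ['b', 'c']
```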


## 3. Sets

A set is a collection of *unique* elements, i.e. duplicate values are not allowed.

There are different ways of implementing sets: array-based sets and hash-sets are discussed here.

Note that Python already has sets, but we'll give outline implementations for clarity.


### 3.1. Array-based Sets

An array is used to store elements. As with standard arrays, elements are stored in contiguous memory locations, and each element has a unique index.

Example in Python:

In [None]:
class ArraySet:
    
    def __init__(self):
        self.elements = []
        
    def __repr__(self):
        return str(self.elements)

    def add(self, element):
        # Search the array for `element`, then append it if it is not a duplicate.
        if element not in self.elements:
            self.elements.append(element)

    def remove(self, element):
        # Search the array for the value, then remove it.
        if element in self.elements:
            self.elements.remove(element)

    def contains(self, element):
        return element in self.elements

In [13]:
set_arr = ArraySet()
set_arr.add(1)
set_arr.add(2)
set_arr.add(3)
print(set_arr)

[1, 2, 3]


If we try to add a duplicate value it does not get added to the array:

In [14]:
set_arr.add(2)
print(set_arr)

[1, 2, 3]


The read, search and delete operations for an array-based set are identical to the standard array.

**Insert** operations are where array-based sets diverge. We always insert at the end of a set, which is constant time. *But* we need to search the array every time to ensure the new value is not a duplicate.

So we always need to do a search of all N elements, and then 1 insert operation at the end.

This means even in the best case, insertion into an array-based set is O(N) compared to O(1) when inserting at the end of a standard array.

The reason for using a set is that the use case requires no duplicates, not that it is inherently "quicker" than a standard array.


| Operation | Complexity (Worst) | Complexity (Best)    |
|-----------|--------------------|----------------------|
| Read      | O(1)               | O(1)                 |
| Search    | O(N)               | O(1)                 |
| Insertion | O(N)               | **O(N)**             |
| Deletion  | O(N)               | O(1)                 |

### 3.2. Hash Sets

A hash-based set computes the hash of each element and uses this to store elements. 

One possible implementation stores the set as key-value pairs, where the keys are the hashes of the elements and the values are either a placeholder such as `True` or an array that handles hash collisions.

When there is a hash collision between multiple elements, a typical approach is to store all of these elements in an array under the same hash key.

The worst case is the extreme edge case where hash collisions are so frequent that *every* element has the same hash, essentially reducing the hash set to an array. This is generally avoided as long as the hash algorithm is decent.

For this reason, the **average** complexity is more meaningful in the table below. (Note that best has been replaced with average in the table headings.)

Hash-based sets do not support reading by index, unlike array-based sets. But all other operations are typically constant time.

| Operation | Complexity (Worst) | Complexity **(Average)** |
|-----------|--------------------|--------------------------|
| Read      | N/A                | N/A                      |
| Search    | O(N)               | O(1)                     |
| Insertion | O(N)               | O(1)                     |
| Deletion  | O(N)               | O(1)                     |


Example in Python:

In [None]:
class HashSet:
    def __init__(self):
        # Use a dict to represent the hash set.
        self.elements = {}
        
    def __repr__(self):
        return str(self.elements)

    def add(self, element):
        # The key is the element and the value is an arbitrary placeholder.
        # Python's dict hashes the key internally, so we do not call a hash function ourselves.
        # A from-scratch implementation would need to:
        #   1. Compute the *hash* of the element and use it to decide where the element is stored.
        #   2. Handle hash collisions, e.g. by storing an array of elements under each hash
        #      and appending to it when a collision occurs.
        self.elements[element] = True

    def remove(self, element):
        if element in self.elements:
            del self.elements[element]

    def contains(self, element):
        return element in self.elements

In [21]:
set_hash = HashSet()
set_hash.add(1)
set_hash.add(2)
set_hash.add(3)
print(set_hash)

{1: True, 2: True, 3: True}


If we try to add a duplicate value it simply overwrites the previous value:

In [23]:
set_hash.add(2)
print(set_hash)

{1: True, 2: True, 3: True}


## 4. Ordered Arrays

These are identical to regular arrays with the additional condition that elements are always ordered.

Keeping elements ordered relies on efficient sorting, which is a topic unto itself; see the notes on sorting for more info.

When **inserting** into an ordered array, we need to:

1. Search for the correct position - look at each element in turn until we find the first one that is greater than the value being inserted
2. Insert into the array

These two terms increase in opposite directions depending on the insert position. The further into the array we need to search (Step 1), the fewer elements we need to shift for the insertion (Step 2).
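
A minimal sketch of this trade-off, assuming an ascending list and a linear search for the insert position. The helper name `ordered_insert` is hypothetical; it returns the number of comparisons (Step 1) and shifts (Step 2) so we can see them move in opposite directions.

```python
def ordered_insert(ordered_array, value):
    """Insert value into an ascending list, keeping it ordered; return (comparisons, shifts)."""
    # Step 1: linear search for the first element greater than the new value.
    position = 0
    comparisons = 0
    while position < len(ordered_array):
        comparisons += 1
        if ordered_array[position] > value:
            break
        position += 1

    # Step 2: list.insert shifts the elements from `position` onwards one place right.
    shifts = len(ordered_array) - position
    ordered_array.insert(position, value)
    return comparisons, shifts


numbers = [1, 3, 5, 7]
print(ordered_insert(numbers, 0), numbers)  # prints: (1, 4) [0, 1, 3, 5, 7]
print(ordered_insert(numbers, 9), numbers)  # prints: (5, 0) [0, 1, 3, 5, 7, 9]
```

Either way the total work is roughly $N$ steps, so insertion into an ordered array with a linear search is $O(N)$.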


### 4.1. Binary Search

In a typical (unordered) array, the only option for searching is a *linear search*: we loop through each element in turn until we find our target.

For an ordered array, we can improve on this using a **binary search**.

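1. Pick the middle element. If it equals the target value, we are done.
2. If the target value is greater than the middle element, search the right half, otherwise search the left half.
3. Repeat this on the remaining half until we find our target or there are no elements left to search.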


This approach splits the search region in half for every constant time comparison operation.

Or put another way, if we doubled the number of elements in the array, the binary search would only have to perform 1 extra step. For $N$ elements we need about $\log_2(N)$ binary splits.

Therefore, the time complexity is **O(log(N))**.
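
As a quick check of the "doubling adds one step" claim, the sketch below computes the worst-case number of midpoint comparisons, $\lfloor \log_2(N) \rfloor + 1$. The function name `max_binary_search_steps` is hypothetical, and the formula assumes the standard halving scheme described above.

```python
def max_binary_search_steps(n):
    """Worst-case number of midpoint comparisons for a binary search over n elements."""
    # For n > 0, int.bit_length() equals floor(log2(n)) + 1, using exact integer arithmetic.
    return n.bit_length()


for n in (1_000, 2_000, 1_000_000, 2_000_000):
    print(n, max_binary_search_steps(n))
# prints:
# 1000 10
# 2000 11
# 1000000 20
# 2000000 21
```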

In [23]:
def binary_search(ordered_array, target):
    """Perform a binary search for the target value on the given ordered array.

    Parameters
    ----------
    ordered_array: list
        The array to search in.
    target: int
        The target value we are searching for.

    Returns
    -------
    target_index: int
        The index of the target value.
        Returns None if the value does not exist in the array.
    """
    # Establish the lower and upper bounds of our search.
    # Initially, this is the entire array
    idx_lower = 0
    idx_upper = len(ordered_array) - 1

    while idx_lower <= idx_upper:
        # Find the midpoint between our bounds
        idx_midpoint = (idx_upper + idx_lower) // 2
        value_at_midpoint = ordered_array[idx_midpoint]

        # Compare to our target value and narrow the upper or lower bound accordingly
        if value_at_midpoint == target:
            # We have found the target!
            return idx_midpoint
        elif value_at_midpoint < target:
            # The target is bigger so must be to the right side
            idx_lower = idx_midpoint + 1
        elif value_at_midpoint > target:
            # The target is smaller so must be on the left side
            idx_upper = idx_midpoint - 1

    # If the lower bound passes the upper bound we have exhausted the search region, so the target is not in the array
    return None

Let's try this on a few examples.

In [34]:
ordered_array = [1, 2, 4, 5, 7, 8, 9, 10, 13, 14]

In [35]:
binary_search(ordered_array, 7)

4

In [36]:
binary_search(ordered_array, 14)

9

Now a value that's not in the array:

In [37]:
binary_search(ordered_array, 6)

This returns `None`, so the notebook displays no output.

Compare this with a linear search:

In [46]:
def linear_search(array, target):
    """Perform a linear search for the target value on the given array.

    Parameters
    ----------
    array: list
        The array to search in.
    target: int
        The target value we are searching for.

    Returns
    -------
    target_index: int
        The index of the target value.
        Returns None if the value does not exist in the array.
    """
    # Loop through every element in the array.
    # Note: we should really use enumerate() rather than range(len()) but I wanted to keep this generic 
    # without too many python-specific helpers
    for idx in range(len(array)):
        if array[idx] == target:
            return idx
    
    # If we reach the end of the array without returning a value, then the target does not exist in the array.
    return None

Let's compare how they perform for a reasonably big array with 1 million elements.

In [None]:
array = [k for k in range(1000000)]

In [51]:
%%timeit
binary_search(array, 987654)

1.13 µs ± 56.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [52]:
%%timeit
linear_search(array, 987654)

15.9 ms ± 320 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


The binary search is ~14000x faster than the linear search!

## 5. Hash Tables

Hash tables store key:value pairs. We can look up the value for a given key in $O(1)$ time on average.

They are also known as hash maps, dictionaries, maps or associative arrays.


### 5.1. Hashing

The process of taking characters and converting them to numbers is called **hashing**.

The code that performs this conversion is the **hash function**.

A hash function must satisfy one condition:

> Consistency: A hash function must convert the same string to the same number every time it's applied.

In practice, for the hash function to be useful, it should also be collision resistant:

> Collision resistant: Different inputs should, as far as possible, hash to different outputs.

As an extreme example, the following hash function is consistent but not collision resistant:
```python
def crappy_hash(input_str: str) -> int:
    """This is the world's worst hash function."""
    return 1
```
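
By contrast, a hash function that mixes in every character and its position spreads inputs out far better. The sketch below is a toy polynomial hash: the name `toy_hash`, the multiplier 31 and the default of 101 cells are illustrative choices, and this is not how Python's built-in `hash` works.

```python
def toy_hash(input_str: str, num_cells: int = 101) -> int:
    """Toy polynomial hash: consistent, and far less collision-prone than a constant."""
    hash_value = 0
    for char in input_str:
        # Multiply before adding so that character order affects the result.
        hash_value = (hash_value * 31 + ord(char)) % num_cells
    return hash_value


print(toy_hash("Name") == toy_hash("Name"))  # True: same input, same output
print(toy_hash("ab") == toy_hash("ba"))      # False: reordering changes the hash
```

Collisions are still possible, since there are more possible strings than cells, but they are no longer guaranteed.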


### 5.2. Hash Table Lookups

We want to insert the following key:value pair into our hash table:
```
key = "Name"
value = "Gurp"
```

Let's say we have a hash function that is actually good, and in this particular case `hash_function(key)` returns `12`.

The hash table will then insert `value` at memory address `12` (or more specifically, the memory address of the head of the dictionary + 12).

This means if we ever want to look up the key `"Name"`, we hash it and immediately know to access memory address 12 and return the value `"Gurp"`.

So hash table lookups are $O(1)$.

More specifically, looking up by **key** is $O(1)$. Searching by **value** is essentially searching through an array, so is $O(N)$.


### 5.3. Dealing with Collisions

A **collision** occurs when we try to add data to a cell that is already occupied.

One approach to handling this is called **separate chaining**.

Instead of placing a single value in a cell, we place a *pointer to an array*. This array contains length-2 subarrays where the first element is the key and the second element is the value.

If there are no collisions, a hash table look up is $O(1)$.
In the worst case, ALL keys collide and so we essentially have to search through an array which is $O(N)$.
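
A minimal sketch of separate chaining, assuming Python's built-in `hash()` as the hash function and a fixed number of cells. The class name `ChainedHashTable` and its methods are hypothetical, not a real library API.

```python
class ChainedHashTable:
    """Toy hash table using separate chaining (illustrative, not production code)."""

    def __init__(self, num_cells=8):
        # Each cell holds a list ("chain") of [key, value] pairs.
        self.cells = [[] for _ in range(num_cells)]

    def _chain_for(self, key):
        # Python's built-in hash() stands in for the hash function.
        return self.cells[hash(key) % len(self.cells)]

    def set(self, key, value):
        chain = self._chain_for(key)
        for pair in chain:
            if pair[0] == key:
                pair[1] = value  # key already present: overwrite its value
                return
        chain.append([key, value])  # new key (possibly a collision): extend the chain

    def get(self, key):
        # Only the one chain is searched, not the whole table.
        for stored_key, stored_value in self._chain_for(key):
            if stored_key == key:
                return stored_value
        return None


table = ChainedHashTable(num_cells=1)  # a single cell forces every key to collide
table.set("Name", "Gurp")
table.set("Language", "Python")
print(table.get("Name"))      # Gurp
print(table.get("Language"))  # Python
print(table.get("Missing"))   # None
```

With a single cell every lookup degenerates into a linear search of one chain, which is the $O(N)$ worst case described above.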


### 5.4. Hash Table Efficiency
A hash table's efficiency depends on:

1. How much **data** we're storing in it
2. How many **cells** are available
3. Which **hash function** we use

A good hash function (3) should distribute the data (1) evenly across all cells (2).

This ensures the memory is used efficiently while avoiding collisions.

The **load factor** is the ratio of data to cells, and ideally should be around 0.7, i.e. 7 elements for every 10 cells.
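
The load factor is simple arithmetic, sketched below. The helper names are hypothetical, and the 0.7 threshold is the rule of thumb quoted above; what an implementation does once the threshold is exceeded (typically allocating more cells) is not covered here.

```python
def load_factor(num_items, num_cells):
    """Ratio of stored items to available cells."""
    return num_items / num_cells


def is_overloaded(num_items, num_cells, max_load_factor=0.7):
    """True once the table holds more data than the target load factor allows."""
    return load_factor(num_items, num_cells) > max_load_factor


print(load_factor(7, 10))    # 0.7
print(is_overloaded(7, 10))  # False
print(is_overloaded(9, 10))  # True
```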


| Operation | Complexity (Worst) | Complexity **(Average)** |
|-----------|--------------------|--------------------------|
| Read      | O(N)               | O(1)                     |
| Search    | O(N)               | Keys: O(1) Values: O(N)  |
| Insertion | O(N)               | O(1)                     |
| Deletion  | O(N)               | O(1)                     |

The worst case corresponds to when all keys collide, effectively reducing the hash table to an array.

## 6. Stacks

A stack is stored in memory the same as an array, but it has 3 constraints:

1. Data can only be **inserted** at the end (push)
2. Data can only be **deleted** from the end (pop)
3. Only the last element can be **read** (peek)

> Read, insert and delete can only happen at the end.

This makes them useful as Last-In First-Out (LIFO) data stores: the last item pushed to the stack is the first to be popped.

Example in Python:

In [5]:
class Stack:

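    def __init__(self, initial_elements=None):
        # We can pass a list to initialise the Stack.
        # Copy it so the Stack never shares a list with the caller or with other Stacks
        # (a mutable default argument such as `[]` would be shared between instances).
        self.data = list(initial_elements) if initial_elements is not None else []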

    def __repr__(self):
        return str(self.data)

    def push(self, element):
        self.data.append(element)

    def pop(self):
        return self.data.pop()

    def peek(self):
        if self.data:
            return self.data[-1]


In [15]:
stack = Stack([1, 2, 3, 4])
stack

[1, 2, 3, 4]

In [16]:
stack.push(5)
print(stack)

[1, 2, 3, 4, 5]


In [17]:
stack.peek()

5

In [18]:
print(stack)

[1, 2, 3, 4, 5]


In [19]:
stack.pop()

5

In [20]:
print(stack)

[1, 2, 3, 4]


The benefits of stacks, and other constrained data structures, are:

1. Prevent potential bugs when using certain algorithms. For example, an algorithm that relies on stacks may break if removing elements from the middle of the array, so using a standard array is more error-prone.
2. A new mental model for tackling problems. In the case of stacks, this is the LIFO approach.


The concept of stacks is a useful precursor to recursion: each recursive call is pushed onto the call stack, and popped off again when it returns.
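
As an example of the LIFO mental model, here is a sketch of a bracket checker, a common use of stacks. The function name `is_balanced` is hypothetical, and a plain list is used as the stack, touched only via `append` (push) and `pop`.

```python
def is_balanced(text):
    """Return True if every bracket in text is closed in the right order."""
    closing_to_opening = {")": "(", "]": "[", "}": "{"}
    stack = []  # a plain list used only via append (push) and pop
    for char in text:
        if char in closing_to_opening.values():
            stack.append(char)
        elif char in closing_to_opening:
            # The most recently opened bracket must match this closing one.
            if not stack or stack.pop() != closing_to_opening[char]:
                return False
    return not stack  # leftover opening brackets were never closed


print(is_balanced("(a[b]{c})"))  # True
print(is_balanced("(a[b)]"))     # False
```

The most recently opened bracket is always the next one that must be closed, which is exactly the last-in first-out behaviour.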

## 7. Queues

A queue is conceptually similar to a stack - it is a constrained array. This time, it is First-In First-Out (FIFO) like a queue of people; the first person to arrive is the first to leave.

Queue restrictions:

1. Data can only be **inserted** at the **end** (enqueue)
2. Data can only be **deleted** from the **front** (dequeue)
3. Data can only be **read** from the **front** (peek)

Points (2) and (3) are the opposite of the stack.

In [21]:
class Queue:

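    def __init__(self, initial_elements=None):
        # We can pass a list to initialise the Queue.
        # Copy it so the Queue never shares a list with the caller or with other Queues
        # (a mutable default argument such as `[]` would be shared between instances).
        self.data = list(initial_elements) if initial_elements is not None else []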

    def __repr__(self):
        return str(self.data)

    def enqueue(self, element):
        self.data.append(element)

    def dequeue(self):
        return self.data.pop(0)

    def peek(self):
        if self.data:
            return self.data[0]

In [22]:
q = Queue([1, 2, 3, 4])
q

[1, 2, 3, 4]

In [23]:
q.enqueue(5)
print(q)

[1, 2, 3, 4, 5]


In [24]:
q.dequeue()

1

In [25]:
print(q)

[2, 3, 4, 5]


In [26]:
q.peek()

2

In [27]:
print(q)

[2, 3, 4, 5]
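
Note that `list.pop(0)` has to shift every remaining element leftwards, so dequeuing from a list-backed queue is $O(N)$. One alternative, sketched below, is to back the queue with `collections.deque` from the standard library, which supports fast appends and pops at both ends. The class name `DequeQueue` is hypothetical.

```python
from collections import deque


class DequeQueue:
    """Queue backed by collections.deque (the class name is illustrative)."""

    def __init__(self, initial_elements=None):
        self.data = deque(initial_elements or [])

    def __repr__(self):
        return str(list(self.data))

    def enqueue(self, element):
        self.data.append(element)

    def dequeue(self):
        return self.data.popleft()  # no shifting of the remaining elements

    def peek(self):
        if self.data:
            return self.data[0]


q = DequeQueue([1, 2, 3, 4])
q.enqueue(5)
print(q.dequeue())  # 1
print(q)            # [2, 3, 4, 5]
```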


## 8. Linked Lists

A linked list represents a list of items as **non-contiguous** blocks of memory.

Like an array, it is a list of items. Unlike a linked list, though, an array occupies a contiguous block of memory.


| Operation | Complexity (Worst) | Complexity (Best)*       |
|-----------|--------------------|--------------------------|
| Read      | O(N)               | O(1)                     |
| Search    | O(N)               | O(1)                     |
| Insertion | O(N)               | O(1)                     |
| Deletion  | O(N)               | O(1)                     |

\* The best case corresponds to operating on the head node.

In a linked list, each element is contained in a **node** that can be in scattered positions in memory. The node contains the data element and a "link" which is a pointer to the memory address of the next element.

Benefits of a linked list over an array:

1. Memory flexibility: we don't need a contiguous block of memory
2. $O(1)$ inserts and deletes from the beginning of the list
3. Useful when we want to traverse through a data structure while making inserts and deletes, because we do not have to shift the entire data structure each time as we would have to with an array

A node contains two pieces of information:

Data | Link
-----|-----
"a"  | 1666

These nodes can then be linked together in a list... a linked list!

```{mermaid}
flowchart LR

  A("'a'|1666") --> B("'b'|1984") --> C("'c'|1066") --> D("...") --> E("'z'|null")

```

The link of the last node is null to indicate the end of the list.


### 8.1. Implementing a Node

We first need a node data structure, which will hold our data and a link to the next node.

We'll point to the next node itself, rather than its memory address. This still has the same effect as nodes are scattered throughout different memory locations.

In [115]:
class Node:

    def __init__(self, data, link=None):
        self.data = data
        self.link = link
    
    def __repr__(self) -> str:
        return f"Data: {self.data}\tLink: \n{self.link}"

Create some nodes and link them together

In [116]:
node1 = Node("a")
node2 = Node("b")
node3 = Node("c")
node4 = Node("d")

This is what a single node looks like:

In [117]:
print(node1)

Data: a	Link: 
None


Now we link them

In [118]:
node1.link = node2
node2.link = node3
node3.link = node4

In [119]:
node1

Data: a	Link: 
Data: b	Link: 
Data: c	Link: 
Data: d	Link: 
None

### 8.2. Implementing a Linked List

The linked list simply keeps track of the **head**, i.e. the first node in the list.

When using linked lists, we only have immediate access to this first node. For any other values, we need to start at the head node and traverse the list.

In [120]:
class LinkedList:

    def __init__(self, head):
        self.head = head

    def __repr__(self) -> str:
        return str(self.head)

In [121]:
ll = LinkedList(node1)
ll

Data: a	Link: 
Data: b	Link: 
Data: c	Link: 
Data: d	Link: 
None

### 8.3. Reading from a Linked List

We start at the head and traverse the list until we reach the desired index.

This means reads are $O(N)$ in the worst case.

In [122]:
class LinkedList:

    def __init__(self, head):
        self.head = head

    def __repr__(self) -> str:
        return str(self.head)
    
    def read(self, index):
        """Read the node at the given index."""
        current_idx = 0
        current_node = self.head

        while (index > current_idx):
            if current_node.link is None:
                # The index does not exist in the linked list, we have reached the end
                return None
            
            current_node = current_node.link
            current_idx += 1          

        return current_node


In [123]:
ll = LinkedList(node1)
ll.read(2)

Data: c	Link: 
Data: d	Link: 
None

In [124]:
ll.read(10)

### 8.4. Searching a Linked List

To search for a value, we again start at the head and may have to traverse the whole list.

This means the worst case complexity is $O(N)$.

The mechanics of searching are the same as reading - we traverse the list. The difference is that we keep going until we find the value or reach the end of the list, rather than stopping at a predetermined index as with `read`.

In [125]:
class LinkedList:

    def __init__(self, head):
        self.head = head

    def __repr__(self) -> str:
        return str(self.head)
    
    def read(self, index):
        """Read the node at the given index."""
        # Start at the head
        current_idx = 0
        current_node = self.head

        # Traverse the list until we find the desired index
        while (index > current_idx):
            if current_node.link is None:
                # The index does not exist in the linked list, we have reached the end
                return None
            
            current_node = current_node.link
            current_idx += 1          

        return current_node
    
    def search(self, value):
        """Find the index of the given value."""
        # Start at the head
        current_idx = 0
        current_node = self.head

        # Loop until we reach the None value which denotes the end of the list
        while current_node:
            if current_node.data == value:
                # We've found our target value
                return current_idx
            # Try the next node
            current_node = current_node.link
            current_idx += 1

        # We have traversed the whole list without finding a matching value
        return None


In [126]:
ll = LinkedList(node1)
ll.search('c')

2

### 8.5. Inserting into a Linked List

Inserting a node into a linked list **where we already have the current node** is an $O(1)$ operation. 

1. **Point to the next node**. `new_node.link = current_node.link`
2. **Link from the previous node**. `current_node.link = new_node`

With a linked list, we only have the head node, so we can insert at the start in $O(1)$ time. 

But to insert at any other point, we have to traverse there first (an $O(N)$ operation) and then do the insert.

This is the key point of linked lists: insertion at the beginning is $O(1)$ but at the end is $O(N)$. This is the opposite of arrays, meaning linked lists are useful in cases where insertions are mostly at the beginning.



In [127]:
class LinkedList:

    def __init__(self, head):
        self.head = head

    def __repr__(self) -> str:
        return str(self.head)
    
    def read(self, index):
        """Read the node at the given index."""
        # Start at the head
        current_idx = 0
        current_node = self.head

        # Traverse the list until we find the desired index
        while (index > current_idx):
            if current_node.link is None:
                # The index does not exist in the linked list, we have reached the end
                return None
            
            current_node = current_node.link
            current_idx += 1          

        return current_node
    
    def search(self, value):
        """Find the index of the given value."""
        # Start at the head
        current_idx = 0
        current_node = self.head

        # Loop until we reach the None value which denotes the end of the list
        while current_node:
            if current_node.data == value:
                # We've found our target value
                return current_idx
            # Try the next node
            current_node = current_node.link
            current_idx += 1

        # We have traversed the whole list without finding a matching value
        return None
    

    def insert(self, value, index):
        """Insert the value at the given index."""
        new_node = Node(value)

        if index == 0:
            # Link to the old head and update the linked list's head
            new_node.link = self.head
            self.head = new_node
            return
        
        # Traverse the linked list until we find our node
        current_node = self.head
        current_idx = 0
        while current_idx < index - 1:
            current_node = current_node.link
            current_idx += 1

        # Update the links to insert the new node
        new_node.link = current_node.link
        current_node.link = new_node
        return 


        


Insert a new head of our linked list

In [128]:
ll = LinkedList(node1)
ll

Data: a	Link: 
Data: b	Link: 
Data: c	Link: 
Data: d	Link: 
None

In [129]:
ll.insert('new_head', 0)

In [130]:
ll

Data: new_head	Link: 
Data: a	Link: 
Data: b	Link: 
Data: c	Link: 
Data: d	Link: 
None

Insert in the middle

In [131]:
ll.insert("I'm new here", 3)

In [132]:
ll

Data: new_head	Link: 
Data: a	Link: 
Data: b	Link: 
Data: I'm new here	Link: 
Data: c	Link: 
Data: d	Link: 
None

### 8.6. Deleting from a Linked List

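It is quick to delete from the beginning of a linked list for the same reason as insertion: we already have the head node. To delete from any other position, we first have to traverse to the node before the one being deleted.

The deletion itself is a single step: make the previous node point to the node after the one being deleted, so that the deleted node is skipped.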

In [135]:
class LinkedList:

    def __init__(self, head):
        self.head = head

    def __repr__(self) -> str:
        return str(self.head)
    
    def read(self, index):
        """Read the node at the given index."""
        # Start at the head
        current_idx = 0
        current_node = self.head

        # Traverse the list until we find the desired index
        while (index > current_idx):
            if current_node.link is None:
                # The index does not exist in the linked list, we have reached the end
                return None
            
            current_node = current_node.link
            current_idx += 1          

        return current_node
    
    def search(self, value):
        """Find the index of the given value."""
        # Start at the head
        current_idx = 0
        current_node = self.head

        # Loop until we reach the None value which denotes the end of the list
        while current_node:
            if current_node.data == value:
                # We've found our target value
                return current_idx
            # Try the next node
            current_node = current_node.link
            current_idx += 1

        # We have traversed the whole list without finding a matching value
        return None
    

    def insert(self, value, index):
        """Insert the value at the given index."""
        new_node = Node(value)

        if index == 0:
            # Link to the old head and update the linked list's head
            new_node.link = self.head
            self.head = new_node
            return
        
        # Traverse the linked list until we find our node
        current_node = self.head
        current_idx = 0
        while current_idx < index - 1:
            current_node = current_node.link
            current_idx += 1

        # Update the links to insert the new node
        new_node.link = current_node.link
        current_node.link = new_node
        return 
    
    def delete(self, index):
        """Delete the value at the given index."""
        if index == 0:
            # We are deleting the head node, so point at the second node instead
            self.head = self.head.link
            return
    
        # Traverse the linked list until we find our node
        current_node = self.head
        current_idx = 0
        while current_idx < index - 1:
            current_node = current_node.link
            current_idx += 1

        # Skip the next node (which we are deleting) and point to its link instead
        current_node.link = current_node.link.link
        return       

In [136]:
ll = LinkedList(node1)
ll

Data: a	Link: 
Data: b	Link: 
Data: I'm new here	Link: 
Data: c	Link: 
Data: d	Link: 
None

Delete the head node

In [137]:
ll.delete(0)
print(ll)

Data: b	Link: 
Data: I'm new here	Link: 
Data: c	Link: 
Data: d	Link: 
None


Delete a middle node

In [138]:
ll.delete(1)
print(ll)

Data: b	Link: 
Data: c	Link: 
Data: d	Link: 
None


### 8.7. Doubly Linked Lists

A doubly linked list is a variant where each node contains pointers to the previous node and the next node.

Data | Previous | Next
-----|----------|------- 
"a"  | null     | 1666

The linked list tracks the head and tail.

This makes it quicker to read/insert/delete from either the beginning or end. We can also traverse backwards or forwards through the list.


```{mermaid}
flowchart LR

  A("'a'|null|1666") <---> B("'b'|1234|1984") <--> C("'c'|1666|1066") <--> D("...") <--> E("'z'|1993|null")

```


Doubly linked lists are a good data structure to use for queues, since we can insert at one end and delete from the other in $O(1)$ time.
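
A minimal sketch of a doubly linked list used as a queue. The class and method names (`DoublyLinkedNode`, `DoublyLinkedList`, `insert_at_end`, `remove_from_front`) are hypothetical, and only the two operations a queue needs are implemented.

```python
class DoublyLinkedNode:
    def __init__(self, data):
        self.data = data
        self.previous = None
        self.next = None


class DoublyLinkedList:
    """Tracks both ends so that either can be reached without traversal."""

    def __init__(self):
        self.head = None
        self.tail = None

    def insert_at_end(self, data):
        new_node = DoublyLinkedNode(data)
        if self.tail is None:
            # Empty list: the new node is both head and tail.
            self.head = new_node
        else:
            new_node.previous = self.tail
            self.tail.next = new_node
        self.tail = new_node

    def remove_from_front(self):
        if self.head is None:
            return None
        removed_node = self.head
        self.head = removed_node.next
        if self.head is None:
            self.tail = None  # the list is now empty
        else:
            self.head.previous = None
        return removed_node.data


queue = DoublyLinkedList()
for letter in "abc":
    queue.insert_at_end(letter)  # enqueue
print(queue.remove_from_front())  # a
print(queue.remove_from_front())  # b
```

Neither operation traverses the list, so both run in constant time however long the queue is.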

## 9. Binary Search Trees

We can have some use cases where we want to keep our data sorted. 

Sorting is expensive, $O(N \log N)$ at the best of times, so we want to avoid sorting often. Ideally we would keep our data sorted at all times.
An ordered array *could* do the job, but insertions and deletions are slow as we have to shift a chunk of the array every time.

We want a data structure that:

1. Maintains order
2. Has fast inserts, deletes and search

This is where a binary search tree comes in.


### 9.1. Trees 

Trees are another node-based data structure, where each node can point to multiple other nodes.


```{mermaid}
flowchart TD


  A(a) --> B(b)
  A(a) --> C(c)

  B(b) --> D(d)
  B(b) --> E(e)

  C(c) --> F(f)
```

- The **root** is the uppermost node.
- `a` is the **parent** of `b` and `c`; `b` and `c` are **children** of `a`.
- The **descendants** of a node are all of its children, its children's children, and so on. The **ancestors** of a node are its parent, its parent's parent, and so on.
- Each horizontal layer of the tree is a **level**.
- A tree is **balanced** if each node's subtrees contain the same number of nodes.


### 9.2. Binary Search Tree Rules

A **binary tree** is one in which each node can have at most 2 children.

A **binary search tree** must abide by the following rules:

1. Each node can have at most one "left" child and one "right" child
2. A node's left descendants are all smaller than the node. Its right descendants are all larger.


### 9.3. Implementation

In [161]:
class TreeNode:

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def __repr__(self):
        return f"TreeNode with value: {self.value}"


class Tree: 

    def __init__(self, root_node):
        self.root_node = root_node

    def __repr__(self) -> str:
        return f"Tree with root: {self.root_node}"


In [162]:
tree_node2 = TreeNode('b')
tree_node3 = TreeNode('c')
tree_node1 = TreeNode('a', tree_node2, tree_node3)

tree = Tree(tree_node1)

In [163]:
tree_node1

TreeNode with value: a

In [164]:
tree_node1.left

TreeNode with value: b

In [165]:
tree_node1.right

TreeNode with value: c

In [166]:
tree

Tree with root: TreeNode with value: a
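
The binary search tree rules can be sketched as follows. This is a hedged illustration rather than a full implementation: `BSTNode`, `bst_insert` and `bst_search` are hypothetical names, the block defines its own node class so that it runs on its own, and ignoring duplicate values is an assumption.

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def bst_insert(node, value):
    """Insert value below node, preserving the ordering rule; return the subtree root."""
    if node is None:
        return BSTNode(value)
    if value < node.value:
        node.left = bst_insert(node.left, value)
    elif value > node.value:
        node.right = bst_insert(node.right, value)
    return node  # equal values are ignored in this sketch


def bst_search(node, value):
    """Return the node holding value, or None if it is absent."""
    while node is not None and node.value != value:
        # Each comparison discards one whole subtree.
        node = node.left if value < node.value else node.right
    return node


root = None
for value in [50, 25, 75, 10, 33]:
    root = bst_insert(root, value)

print(bst_search(root, 33).value)         # 33
print(bst_search(root, 99))               # None
print(root.left.value, root.right.value)  # 25 75
```

Searching follows a single path from the root downwards, so in a reasonably balanced tree it takes about $\log_2(N)$ steps, much like a binary search on an ordered array.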