# Learning Objectives

- [ ] *1.2.3 Implement search programs.
    - Hash Table Search
- [ ] 1.3.1 Understand the concept of static allocation of memory.
- [ ] 1.3.2 Understand the concept of dynamic allocation of memory.
- [ ] 1.3.3 Create, insert, and delete operations for stack and queue (linear and circular).
- [ ] 1.3.4 Understand the concept of free space list (which could be another linked list or an array).
- [ ] 1.3.5 Create, update (edit, insert, delete) and search operations for a linear linked list. *Exclude: doubly-linked list and circular linked list*
- [ ] 1.3.6 Create, update (edit, insert, delete) and search operations for a binary search tree. *Exclude: deletion of nodes from binary search tree*
- [ ] 1.3.7 Understand pre-order, in-order and post-order tree traversals; and application of in-order tree traversal for binary search tree.
- [ ] 2.3.2 Implement search programs.
    - Hash Table Search
- [ ] 2.3.3 Write programs to implement operations for stacks, queues (linear and circular), linear linked lists and binary search trees. *Exclude: doubly-linked list and circular linked list*

# References

1. Leadbetter, C., Blackford, R., & Piper, T. (2012). Cambridge international AS and A level computing coursebook. Cambridge: Cambridge University Press.
2. https://www.geeksforgeeks.org/tree-traversals-inorder-preorder-and-postorder/

# 10.0 Abstract Data Type

An abstract data type (ADT) is a mathematical model for data types. It is defined by its behavior from the point of view of a user of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This is analogous to an algebraic structure in mathematics.

For example, integers are an ADT, defined as the values ..., −2, −1, 0, 1, 2, ..., and by the operations of addition, subtraction, multiplication, and division, together with greater than, less than, etc., which behave according to familiar mathematics (with care for integer division), independently of how the integers are represented by the computer. Explicitly, "behavior" includes obeying various axioms (associativity and commutativity of addition, etc.), and preconditions on operations (cannot divide by zero).

This mathematical model contrasts with **data structures**, which are concrete representations of data, and are the point of view of an implementer, not a user.

Typically, integers are represented in a data structure as binary numbers, most often in two's complement, though they might also be stored as binary-coded decimal or in ones' complement. The user is abstracted from the concrete choice of representation and can simply use the data as a data type.

In conclusion, **Abstract Data Type** (ADT): 
- specifies interactions/operations of the objects of the data type. 
- has no code. It is not the actual implementation.
- often has more than one way to be implemented.

In contrast, a **Data Structure** (DS) is the actual representation of the data together with the algorithms that manipulate the data elements, i.e. a concrete implementation of an ADT. Each allowable operation of the ADT corresponds to one function implemented in the associated DS.

As such, the **main advantage of ADT** is that users only need to understand the allowable operations of an ADT before using it. No knowledge of actual implementation is required.

## 10.0.1 Static vs Dynamic Data Structures

**Static data structures** do not change in size while the program is running. A typical static data structure is an array. Once you declare its size (or upper bound), it cannot be changed. Some programming languages do allow the size of arrays to be changed (like Python's `list`), in which case they are dynamic data structures. For the purposes of A-Level syllabus, <u>an array is considered a static data structure</u>.

**Dynamic data structures** can increase and decrease in size while a program is running. A typical dynamic data structure is a linked list, which we will see later in this chapter.

### Advantages and Disadvantages of Static Data Structures
- Advantages
    - Space allocated during compilation
    - Easier to program
    - Easy to check for overflow
    - Random access, i.e. you can read or write any element directly by its index
- Disadvantages
    - Space allocation is fixed
    - Wastes space when only partially used
    
### Advantages and Disadvantages of Dynamic Data Structures
- Advantages
    - Only uses space required
    - Efficient use of memory
    - Emptied storage can be returned to the system
- Disadvantages
    - Difficult to program
    - Searches can be slow
    - Serial access, i.e. you can only read and write items sequentially, starting from the beginning of the structure.

# 10.1 Stack


A **stack** is an ADT which stores items in the order in which they are added, and the **last item to join is the first item to leave**.
* Items can only be <u>added to</u> and <u>removed from</u> the **top of the stack**.
* The order is also called **Last-In-First-Out (LIFO)**.

<center>
<img src="images/mario_2.jpg" width="250" align="center"/>
</center>

## 10.1.1 Stack Operations
The basic operations of a stack are to add an item to its top and to remove an item from its top.
- `push()`: Add item to the stack
- `pop()`: Remove item from the stack

Other supporting functions that can be added are:
- `is_empty()`: return `True` if the stack is empty, return `False` if it's not
- `size()`: return the number of items in the stack
- `peek()`: return the element on the top of the stack

### Exercise 1

Define a `Stack` class which implements the operations of a Stack:
* Initialize an empty list `_items` in its initializer method.
* Implement `push()` and `pop()` methods with the basic operations of a stack. HINT: which list methods do you think are useful here?

Test your code using the following code block.
>```python
> s = Stack()
> s.push('apple')
> s.push('banana')
> print(s._items)
> print(s.pop())
> print(s.pop())
> print(s.pop())
>```

In [None]:
#YOUR_CODE_HERE

### Exercise 2

Define a `BetterStack` class, which inherits from the `Stack` class and adds the supplementary methods
- `is_empty()`, 
- `size()`, 
- `peek()`.

In [None]:
#YOUR_CODE_HERE

## 10.1.2 Example Uses of Stack

- Reverse a sequence.
- Detect missing symbols, e.g. missing opening or closing bracket.
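
As a minimal sketch of the second use, the function below checks brackets with a plain Python list standing in for the stack. The function name `is_balanced` and the set of bracket pairs are illustrative assumptions, not part of the syllabus.

```python
def is_balanced(text):
    """Return True if every bracket in text is correctly opened and closed."""
    pairs = {')': '(', ']': '[', '}': '{'}  # closing bracket -> matching opening bracket
    stack = []                               # a plain list used as a stack
    for ch in text:
        if ch in pairs.values():
            stack.append(ch)                 # push every opening bracket
        elif ch in pairs:
            # a closing bracket must match the most recently pushed opening bracket
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack                         # leftover openers mean a missing closer


print(is_balanced('(a + b) * [c - d]'))   # True
print(is_balanced('(a + b] * c'))         # False
print(is_balanced('((a + b)'))            # False
```

Because the most recently opened bracket must be the first one to close, the LIFO order of a stack matches the problem directly.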


# 10.2 Queue
A **queue** is an ADT which stores items in the order in which they are added, and the **first item to join is the first item to leave**. This order is also called First-In-First-Out (FIFO).

There are 2 types of queue:
- **linear queue** arranges data in sequential order. Items are added to the rear and removed from the front.
- **circular queue** arranges data in a circle by connecting the last element back to the first element.

Imagine a linear queue based on an array where index $0$ is always the first item and index $n$ is always the last. To remove an item from the linear queue, all items at indices $1$ to $n$ must be shifted forward: the item at index 1 moves to index 0, the item at index 2 moves to index 1, and so on. This takes a considerable amount of time for large queues and/or frequent operations on the queue. In a circular queue, however, removing an item only requires pointing the head of the queue to the next item, which is a single assignment, so far fewer operations are needed to update the queue.

## 10.2.1 Linear Queue Operations
The basic operations of a queue are to add an item to the rear and to remove an item from the front.
- `enqueue()`: Add item to the queue
- `dequeue()`: Remove item from the queue

Other supporting functions that can be added are:
- `is_empty()`: return `True` if the queue is empty, return `False` if it's not
- `size()`: return the number of items in the queue
- `peek()`: return the element on the front of the queue

### Exercise 2

Define a `LQueue` class which implements the operations of a linear queue:
* Initialize an empty list `_items` in its initializer method.
* Implement `enqueue()` and `dequeue()` methods with the basic operations of a queue. HINT: which list methods do you think are useful here?
* Also implement the supplementary methods 
    - `is_empty()`, 
    - `size()`, 
    - `peek()`.

Test your code using the following code block.
>```python
> q = LQueue()
> q.enqueue('apple')
> q.enqueue('banana')
> print(q.size())
> q.dequeue()
> print(q.peek())
> print(q.dequeue())
> print(q.is_empty())
>```

In [None]:
#YOUR_CODE_HERE

## 10.2.2 Circular Queue Operations
The basic operations of a circular queue are the same as those of a linear queue: add an item to the rear and remove an item from the front.
- `enqueue()`: Add item to the queue
- `dequeue()`: Remove item from the queue
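
One possible sketch of these two operations on a fixed-size list is shown below. The class name `CircularQueue`, the attributes `_head` and `_count`, and the choice to raise exceptions on overflow and underflow are assumptions made for illustration.

```python
class CircularQueue:
    """Fixed-capacity circular queue (a sketch; names are illustrative)."""

    def __init__(self, capacity):
        self._items = [None] * capacity  # fixed-size storage, like a static array
        self._head = 0                   # index of the front item
        self._count = 0                  # number of items currently stored

    def enqueue(self, item):
        if self._count == len(self._items):
            raise OverflowError('queue is full')
        # the rear wraps around to index 0 once it passes the last slot
        tail = (self._head + self._count) % len(self._items)
        self._items[tail] = item
        self._count += 1

    def dequeue(self):
        if self._count == 0:
            raise IndexError('queue is empty')
        item = self._items[self._head]
        self._items[self._head] = None
        # moving the head is a single assignment; no items are shifted
        self._head = (self._head + 1) % len(self._items)
        self._count -= 1
        return item


q = CircularQueue(3)
q.enqueue('a')
q.enqueue('b')
q.enqueue('c')
print(q.dequeue())   # a
q.enqueue('d')       # reuses the slot freed by 'a'
print(q._items)      # ['d', 'b', 'c']
```

The modulo operator `%` is what connects the last index back to the first, so neither `enqueue()` nor `dequeue()` ever has to move existing items.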

In [None]:
#YOUR_CODE_HERE

### Exercise 3

Implement a **priority queue** where a job with a higher weight will be processed first. We need to code two classes, `Job` and `PriorityQueue`.

The `Job` class has only one instance attribute, `weight`. 
* Implement its `__str__()` method which returns its weight in string format.

The `PriorityQueue` class inherits from the `LQueue` class and overrides its `enqueue()` method. 
* The new `enqueue()` method inserts each item at the appropriate position so that items with higher weight are dequeued first.

In [None]:
#YOUR_CODE_HERE

## 10.2.3 Example Uses of Queue

- Printer Job Queue
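
A small sketch of a printer job queue, using `collections.deque` from the Python standard library rather than the classes from the exercises. The file names are made up for illustration.

```python
from collections import deque

print_jobs = deque()              # deque removes from the front without shifting items
print_jobs.append('report.pdf')   # enqueue at the rear
print_jobs.append('photo.png')
print_jobs.append('essay.docx')

served = []
while print_jobs:
    job = print_jobs.popleft()    # dequeue from the front
    served.append(job)
    print('Printing', job)
```

The jobs are printed in the same order in which they were submitted, which is the FIFO behavior described above.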

# 10.3 Linked List

A **linear linked list** is a linear data structure which holds a collection of elements, called **nodes**. Unlike the usual list, these nodes may <u>not be stored at contiguous memory locations</u>. Each node contains <u>**data** and **pointer(s)**</u> pointing to other node(s).

* Nodes can be accessed in a sequential way.
* Linked list does not provide random access to a node.

When the nodes are connected with only the `next` pointer, the list is called a **Singly Linked List**; when they are connected by both `next` and `previous` pointers, the list is called a **Doubly Linked List**. Doubly Linked List is not in the scope of the A-Level syllabus. As such, for our purpose, Linked List refers to Linear Singly Linked List, unless otherwise stated.

<center>
<img src="images/mario_2.jpg" width="250" align="center"/>
</center>

<img src="./images/adt-linked-list.png" alt="Queue" style="width: 350px;"/>
<center>https://medium.com/@lucasmagnum/sidenotes-linked-list-abstract-data-type-and-data-structure-fd2f8276ab53</center>

## 10.3.1 Linked List Operations
The basic operations of a linked list are to add nodes to the list and to remove nodes from it.
- `prepend()`: Add a node in the beginning
- `pop_first()`: Remove a node from the beginning
- `append()`: Add a node in the end
- `pop()`: Remove a node from the end
- `remove()`: Remove a node, which matches a value, from the list

Other supporting functions that can be added are:
- `is_empty()`: return `True` if the linked list is empty, return `False` if it's not
- `size()`: return the number of nodes in the linked list
- `peek()`: return the element at the head of the linked list

### Exercise 1: Node

Implement a class `Node` for Linked List.
* It has an instance attribute `data` which holds data of the node, and another instance attribute `next` pointing to next node. 
* Both instance attributes are initialized by input parameters in initializer method.
* It implements `__repr__()` method which returns string `Node(data->next.data)`, e.g. `Node(A->B)` if the value for current and next nodes are `A` and `B` respectively.

In [None]:
#YOUR_CODE_HERE

### Exercise 2: Linked List

A Linked List contains an attribute `head` which points to the first node of the linked list. 

Implement a `LinkedList` class with following methods:
* Initializer method which initializes `head` to `None` since the initial linked list is empty.
* `is_empty()` method which returns `True` if linked list is empty, `False` otherwise
* `size()` method returns number of nodes in the list
* `contains()` method which returns `True` if an item is found in the linked list, `False` otherwise

In [None]:
#YOUR_CODE_HERE

### Exercise 3: Linked List

A Linked List typically contains following methods.
* `prepend()`: Add a node in the beginning
* `pop_front()`: Remove a node from the beginning
* `remove()`: Remove Node, which matches a value, from the list

The `remove()` method returns `True` if a matching value is found in the linked list; otherwise it returns `False`. The implementation needs to take care of 4 scenarios:
* When the linked list is empty, i.e `head` is pointing to `None`
* When the item to be removed is the head node
* When the item to be removed is in any other node
* When the item to be removed is not found

Implement the above methods in a class `BetterLinkedList` which inherits from the `LinkedList` class.

In [None]:
#YOUR_CODE_HERE

## 10.3.2 Example Uses of Linked List

- **Free-space list** is the list which keeps track of free space in memory. When we create a file, the free-space list is searched for the required amount of space, and space is allocated to accommodate the new file. This space is then removed from the free-space list. Conversely, when a file is deleted, the freed-up disk space is added back to the free-space list. A linked list can be used to keep track of the free blocks. A free-space list could also be implemented with an array. 
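
The same idea applies when a linked list itself is stored in fixed-size arrays: unused slots are chained together as a free-space list, a new node takes its slot from that list, and a removed node gives its slot back. The sketch below assumes a capacity of at least 1. The names `ArrayLinkedList`, `NULL`, `data`, `next`, `head` and `free` are illustrative choices.

```python
NULL = -1  # marks "no node", since valid array indices start at 0


class ArrayLinkedList:
    def __init__(self, capacity):
        self.data = [None] * capacity
        # initially every slot is free: slot i points to slot i + 1
        self.next = list(range(1, capacity)) + [NULL]
        self.head = NULL   # start of the list holding real data
        self.free = 0      # start of the free-space list

    def prepend(self, value):
        if self.free == NULL:
            raise OverflowError('no free node left')
        node = self.free                  # take the first free slot
        self.free = self.next[node]       # unlink it from the free-space list
        self.data[node] = value
        self.next[node] = self.head       # link it in front of the data list
        self.head = node

    def pop_front(self):
        if self.head == NULL:
            raise IndexError('list is empty')
        node = self.head
        self.head = self.next[node]       # unlink it from the data list
        value = self.data[node]
        self.data[node] = None
        self.next[node] = self.free       # return the slot to the free-space list
        self.free = node
        return value


lst = ArrayLinkedList(3)
lst.prepend('A')
lst.prepend('B')
print(lst.head, lst.free)   # 1 2
print(lst.pop_front())      # B
print(lst.head, lst.free)   # 0 1
```

Both lists share the same `next` array, so every slot is always on exactly one of them: either the data list or the free-space list.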

# 10.4 Hash Table 

A **hash table (hash map)** is a data structure that implements an associative array abstract data type, a structure that can map **keys** to **values (data)**. A hash table must come with a **hash function** to compute an index value (also called a **hash code**) which points into an array of **hash buckets**, the places where data is stored. This is similar to a dictionary. 

<center>
<img src="images/mario_2.jpg" width="250" align="center"/>
</center>

<u>For example</u>, to store a phone book, you use a person's name as the key, and the phone number is the value (data) to be looked up.
- Key is passed into the hash function to generate an index value which points to a location where data is stored.
- Potentially multiple data may be stored in the same bucket, i.e. multiple keys may point to same bucket.

## 10.4.1 Hash Table Operations
The basic operations of a hash table are to add, find and remove items. 
- `add(key,value)`: Add `value` data associated with key `key`
- `find(key)`: return `value` data with provided `key`
- `remove(key)`: remove `key` and its associated `value` data from the table

### Exercise 1: 
Implement a hash table for a phone book. Each entry in the phone book is a pair of `Name` and `Phone`.
* `Name` is used as the key.
* `(Name, Phone)` tuple is saved as the data.

We will define a class `HashTable` to store the data.
* It has a list attribute `buckets` which keeps all data.
* Initialize the list size, i.e. how many buckets, by input parameter `size`.
* It has a <u>static</u> function `_hash()` which returns an `index` value based on input parameter `key`. The logic to be implemented in the `_hash()` function is straightforward: we simply return the length of the `key` as the `index` value.
* The `index` value specifies which bucket to put the data.

In [None]:
#YOUR_CODE_HERE

### Exercise 2
Let's try to add following items into the Hash Table.
* Create a hash table of 10 buckets.
* For each element in the list `contacts` below, 
    >```python
    contacts = [
        ('Ben', '357-0394'),
        ('Alan', '558-9171'),
        ('Freddi', '760-2466'),
        ('Stephanie', '299-5109')]
    >```
    
    * Use `_hash()` function to find out which bucket it belongs to;
    * Put the contact in the bucket.
    * Print out the `buckets` to view how contacts are stored.

In [None]:
#YOUR_CODE_HERE

### Exercise 3

With the populated hash table, how do you retrieve the data for a name, e.g. `'Freddi'`?
* Use `_hash()` function to find `index` value.
* Locate the bucket by index.
* Return the bucket.

In [None]:
#YOUR_CODE_HERE

### Exercise 4

We may need to remove an item, e.g. `'Freddi'`, from the hash table.
* Use `_hash()` function to find index value.
* Locate the bucket by index and set it to `None`.

In [None]:
#YOUR_CODE_HERE

In this example, the hash function generates a different index value for each entry, so each entry is stored in its own bucket. As such, search and delete have $O(1)$ time complexity. 

## 10.4.2 Hash Table Collision

Ideally, the hash function will assign each key to a unique bucket. But since a hash function maps a large set of possible keys to a small range of index values, it is possible for two keys to produce the same value. This is called a **hash table collision**.

### Example 5

Consider the following list `contacts` where **the hash function generates same index value for all entries**, and thus, all data are stored in same bucket. 

>```python
    contacts = [
        ('Amanda', '357-0394'),
        ('Christ', '558-9171'),
        ('Freddi', '760-2466'),
        ('Steven', '299-5109')]
>```

Since every contact's name is 6 characters long, their hashed indexes all point to the same bucket. Thus the bucket at index 6 needs to be able to hold multiple contacts.

For simplicity, we will implement a bucket as a list.


<center>
<img src="images/mario_2.jpg" width="250" align="center"/>
</center>

In [None]:
#YOUR_CODE_HERE

This is the worst case, where a hash table acts like a list and searching takes $O(n)$ time. To improve efficiency, we need a better hash function.

A good hashing mechanism needs a good hash function that meets the following basic requirements:
* It should produce **as few collisions as possible**.
* It should be **easy to compute** for any key.
* It should result in a **uniform distribution** of data across the hash table and thereby **prevent clustering**.

As collisions are bound to occur, we have to use appropriate collision resolution techniques to take care of the collisions.
- Linear Probing : When the hash function causes a collision by mapping a new key to a cell of the hash table that is already occupied by another key, linear probing searches the table for the closest following free location and inserts the new key there. Lookups are performed in the same way, by searching the table sequentially starting at the position given by the hash function, until finding a bucket with a matching key or an empty bucket.
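
Linear probing can be sketched as below, with one `(key, value)` pair per bucket. The class name `ProbingHashTable` is hypothetical, and taking the key length modulo the table size is an assumption added here so that the index always stays inside the table. Removal is left out, since deleting an entry under linear probing needs extra care to avoid breaking later lookups.

```python
class ProbingHashTable:
    def __init__(self, size):
        self.buckets = [None] * size          # each bucket holds one (key, value) pair

    def _hash(self, key):
        # length-based hash as in the exercises, wrapped to the table size (assumption)
        return len(key) % len(self.buckets)

    def add(self, key, value):
        index = self._hash(key)
        for _ in range(len(self.buckets)):    # visit each bucket at most once
            slot = self.buckets[index]
            if slot is None or slot[0] == key:
                self.buckets[index] = (key, value)
                return
            index = (index + 1) % len(self.buckets)   # try the next bucket
        raise OverflowError('hash table is full')

    def find(self, key):
        index = self._hash(key)
        for _ in range(len(self.buckets)):
            slot = self.buckets[index]
            if slot is None:                  # an empty bucket ends the search
                return None
            if slot[0] == key:
                return slot[1]
            index = (index + 1) % len(self.buckets)
        return None


table = ProbingHashTable(10)
for name, phone in [('Amanda', '357-0394'), ('Christ', '558-9171'), ('Freddi', '760-2466')]:
    table.add(name, phone)
print(table.buckets[6:9])     # the three colliding entries sit in buckets 6, 7 and 8
print(table.find('Freddi'))   # 760-2466
print(table.find('Steven'))   # None
```

All three names hash to index 6, so the second and third entries are pushed to the next free buckets, and a lookup follows the same path until it finds the key or reaches an empty bucket.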


# 10.5 Binary Search Tree

## 10.5.1 Tree and Binary Tree

A **tree** is a data structure which – similar to a linked list – sets up link pointers between various data items. 

A tree with only two possible descendants from each value is called a **binary tree**.

Tree terminology
- Each data item in the tree is called a **node**.
- The first item added to the tree is called the **root value**.
- All items to the left of the root form the **left sub-tree**.
- All items to the right of the root form the **right sub-tree**.
- Any node may have its own sub-tree.
- An item which has no descendants is called a **leaf node**.

### Exercise 1: 
Implement a class `Node` for Binary Tree node.

- It has an instance attribute `data` which holds the data of the node, an instance attribute `left` pointing to the left node, and an instance attribute `right` pointing to the right node.
- It implements `__repr__()` method which returns string `data(left.data,right.data)`, e.g. 
    >```
    n1 = Node(10, Node(5), Node(15))
    print(n1)
    >```
    
    outputs `10(5,15)`

In [None]:
#YOUR_CODE_HERE

### Exercise 2: 
Implement a class `BinaryTree` for Binary Tree.
- It has an instance attribute `root` which holds a specified node. If no node is specified, the default value of `root` is `None`.


## 10.5.2 Binary Search Tree

A Binary Search Tree (BST) is a type of binary tree with the following special properties:
* The left subtree of a node contains only nodes with keys less than the node’s key.
* The right subtree of a node contains only nodes with keys greater than the node’s key.
* The left and right subtree each must also be a binary search tree.

<img src="./images/binary_search_tree.jpg" alt="Queue ADT" style="width: 350px;"/>

<center>https://www.tutorialspoint.com/data_structures_algorithms/images/binary_search_tree.jpg</center>

By most definitions, BST only allow distinct values, and <u>duplicates are not allowed</u>. 

This is because allowing duplicate values will bring much more complexity than convenience.

## 10.5.3 Binary Search Tree Operations

### 10.5.3.1 Insert a Node

Inserting a value is a **recursive process** applied at each node of the tree. 

Assume current node is not `None`,
* if the incoming value `val` is less than current node's value, 
    * if left child is `None`, create a new node with the value and assign to it,
    * else recurse into left subtree.
* if the incoming value is greater than or equal to current node's value, 
    * if right child is `None`, create a new node with the value and assign to it,
    * else recurse into right subtree.
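
The steps above can be sketched as a recursive function. The `Node` class here is a minimal stand-in with `data`, `left` and `right` attributes, and the function name `insert` is an illustrative choice.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def insert(node, val):
    """Insert val into the subtree rooted at node, which must not be None."""
    if val < node.data:
        if node.left is None:
            node.left = Node(val)      # free spot found on the left
        else:
            insert(node.left, val)     # keep looking in the left subtree
    else:  # val is greater than or equal to node.data
        if node.right is None:
            node.right = Node(val)     # free spot found on the right
        else:
            insert(node.right, val)    # keep looking in the right subtree


root = Node(10)
for val in [5, 15, 3, 7]:
    insert(root, val)
print(root.left.data, root.right.data)            # 5 15
print(root.left.left.data, root.left.right.data)  # 3 7
```

An empty tree has to be handled separately (for example by creating the root node directly), because the function assumes `node` is not `None`.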

### 10.5.3.2 Find a Node

To search for a given value in a Binary Search Tree, starting at the root node:
* If the value matches the current node's data, return the node. 
* If the value is greater than the current node's data, recur into the right subtree of the current node.
* Otherwise, recur into the left subtree.

A recursive function `_find(node, val)` can implement this search, where `node` is the root of the tree being searched.

## 10.5.4 Tree Traversals

Tree traversal is defined to be a way to visit the nodes in some particular order. 

There are 3 such tree traversals:
- in-order : We traverse the left sub-tree first, then visit the current node, and finally traverse the right sub-tree. In practice, this means going as far left as possible, reading that node, then reading its parent, and then moving into the parent's right sub-tree, repeating the process until there are no branches left to follow. So, for in-order traversal, the nodes are visited in (Left, Root, Right) order. For a binary search tree, this visits the keys in ascending order.
- pre-order : We visit the current node first, then traverse its left sub-tree, and finally traverse its right sub-tree. Starting from the root, we read each node as soon as we reach it, keep moving left until there is no left child, and then back up to read the right sub-trees that have not been read yet. So, for pre-order traversal, the nodes are visited in (Root, Left, Right) order.
- post-order : We traverse the left sub-tree first, then the right sub-tree, and visit the current node last. A node is read only after all of its descendants have been read, so the root of the tree is always read last. So, for post-order traversal, the nodes are visited in (Left, Right, Root) order.

In [None]:
# Python 3 program for tree traversals


# A class that represents an individual node in a binary tree
class Node:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key


# A function to do inorder tree traversal
def printInorder(root):
    if root:
        # First recur on left child
        printInorder(root.left)
        # then print the data of node
        print(root.val, end=" ")
        # now recur on right child
        printInorder(root.right)


# A function to do postorder tree traversal
def printPostorder(root):
    if root:
        # First recur on left child
        printPostorder(root.left)
        # then recur on right child
        printPostorder(root.right)
        # now print the data of node
        print(root.val, end=" ")


# A function to do preorder tree traversal
def printPreorder(root):
    if root:
        # First print the data of node
        print(root.val, end=" ")
        # Then recur on left child
        printPreorder(root.left)
        # Finally recur on right child
        printPreorder(root.right)


# Driver code
root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.left.right = Node(5)
print("Preorder traversal of binary tree is")
printPreorder(root)

print("\nInorder traversal of binary tree is")
printInorder(root)

print("\nPostorder traversal of binary tree is")
printPostorder(root)

## 10.5.5 Example Uses of Binary Search Tree

- Sorting an unordered list via in-order traversal.
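
As a rough sketch of this use, the values are first inserted into a BST and then read back with an in-order traversal. The names `tree_sort`, `insert` and `in_order` are illustrative, and this `insert` returns the subtree root so that an empty tree needs no special case.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def insert(node, val):
    """Insert val into the subtree rooted at node and return the subtree root."""
    if node is None:
        return Node(val)
    if val < node.data:
        node.left = insert(node.left, val)
    else:
        node.right = insert(node.right, val)
    return node


def in_order(node, out):
    """Append the values of the subtree to out in (Left, Root, Right) order."""
    if node is not None:
        in_order(node.left, out)
        out.append(node.data)
        in_order(node.right, out)


def tree_sort(values):
    root = None
    for v in values:
        root = insert(root, v)     # build the BST one value at a time
    result = []
    in_order(root, result)         # in-order traversal yields ascending order
    return result


print(tree_sort([7, 3, 9, 1, 5]))   # [1, 3, 5, 7, 9]
```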