# Hash Functions and Hash Tables
## Collision Resolution
- Separate chaining: if a collision occurs, use a list to store multiple keys in the same location.
- Open addressing: if a collision occurs, store the entry in the next available location.

### Search with Linear Probing

Let h<sub>0</sub>(k) = h(k)

h<sub>j</sub>(k) = [h(k) + j] mod(N)

- Consider a hash table A that uses linear probing.
- `findElement(k)`
    - We start at cell h(k).
    - We probe consecutive locations until one of the following occurs:
        - An item with key k is found, or
        - An empty cell is found, or
        - N cells have been unsuccessfully probed.
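
The search above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the table layout (a list of `(key, element)` tuples with `None` for never-used cells), the function name `find_element`, and the hash function `mod7` are assumptions made for the example. The sketch also stops early at a never-used cell, since a key cannot lie beyond one in its probe sequence.

```python
def find_element(table, key, h):
    """Linear-probing search; returns the stored element or None."""
    n_cells = len(table)
    for j in range(n_cells):            # at most N probes
        i = (h(key) + j) % n_cells      # h_j(k) = [h(k) + j] mod N
        entry = table[i]
        if entry is None:               # never-used cell: key cannot be further along
            return None
        if entry[0] == key:             # item with key k found
            return entry[1]
    return None                         # N cells probed without success


def mod7(key):
    """Hypothetical hash function for a 7-cell table."""
    return key % 7


# Keys 3 and 10 both hash to cell 3, so key 10 sits one cell later.
table = [None] * 7
table[3] = (3, "a")
table[4] = (10, "b")
```

Here `find_element(table, 10, mod7)` starts at cell 3, skips the item with key 3, and finds key 10 in cell 4.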

### Updates with Linear Probing
- To handle insertions and deletions, we introduce a special object, called `AVAILABLE`, which replaces deleted elements.
- `removeElement(k)`
    - We search for an item with key k.
    - If such an item (k, o) is found, we replace it with the special item `AVAILABLE` and we return element o.
    - Else, we return `NO_SUCH_KEY`.
- `insertItem(k, o)`
    - We throw an exception if the table is full.
    - We start at cell h(k).
    - We probe consecutive cells until one of the following occurs:
        - A cell i is found that is either empty or stores `AVAILABLE`, or
        - N cells have been unsuccessfully probed.
    - We store item (k, o) in cell i.
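
A compact sketch of these update operations, under some assumptions: the class name `LinearProbingTable`, the helper `find_index`, the use of Python's built-in `hash`, and `OverflowError` as the exception type are all choices made for this example, not something the notes prescribe. As in the description above, `insert_item` does not check whether the key is already present.

```python
AVAILABLE = object()    # marker left behind by a removed item
NO_SUCH_KEY = object()  # sentinel returned when a key is absent


class LinearProbingTable:
    def __init__(self, capacity=7):
        self.cells = [None] * capacity   # None means "never used"
        self.size = 0

    def _probe(self, key):
        """Yield the N cell indices h(k), h(k)+1, ... taken mod N."""
        n = len(self.cells)
        start = hash(key) % n
        for j in range(n):
            yield (start + j) % n

    def find_index(self, key):
        for i in self._probe(key):
            cell = self.cells[i]
            if cell is None:             # never-used cell ends the search
                return None
            if cell is not AVAILABLE and cell[0] == key:
                return i
        return None

    def remove_element(self, key):
        i = self.find_index(key)
        if i is None:
            return NO_SUCH_KEY
        _, element = self.cells[i]
        self.cells[i] = AVAILABLE        # keep later items reachable
        self.size -= 1
        return element

    def insert_item(self, key, element):
        if self.size == len(self.cells):
            raise OverflowError("hash table is full")
        for i in self._probe(key):
            if self.cells[i] is None or self.cells[i] is AVAILABLE:
                self.cells[i] = (key, element)
                self.size += 1
                return
```

Replacing a removed item with `AVAILABLE` rather than clearing the cell matters because a search stops at a never-used cell; clearing would hide any item stored further along the same probe sequence.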

### Performance of Linear Probing
- Search: average number of probes = C(&#945;)

Experimental results for a hash table with load factor &#945;

| &#945; = n/N | C(&#945;) |
| ----- | ----- |
| 0.1 (10%) | 1.06 |
| 0.5 (50%) | 1.50 |
| 0.75 (75%) | 2.50 |
| 0.9 (90%) | 5.50 |

- With linear probing, entries tend to accumulate in long contiguous runs of occupied cells (primary clustering), which lengthens later probe sequences. To reduce this effect, we use a non-linear probe.

### Quadratic Probing

h<sub>j</sub>(k) = [h(k) + j<sup>2</sup>] mod(N)

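N should be prime.

- The mod operation is comparatively expensive to compute.
- Even with N prime, the probe sequence reaches only about half of the cells, so an insertion can fail to find a free cell although the table is not full.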
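
The limited coverage of the quadratic probe sequence can be checked directly. The helper name `quadratic_probe_cells` and the table size 11 are assumptions chosen for this small illustration.

```python
def quadratic_probe_cells(home, n_cells):
    """Cells visited by h_j(k) = [h(k) + j^2] mod N for j = 0 .. N-1."""
    return [(home + j * j) % n_cells for j in range(n_cells)]


# With N = 11 (prime) and h(k) = 0, only 6 of the 11 cells are ever probed.
visited = sorted(set(quadratic_probe_cells(0, 11)))
print(visited)   # [0, 1, 3, 4, 5, 9]
```

For a prime N the sequence reaches (N + 1) / 2 distinct cells, which is why quadratic probing is usually paired with a load factor kept at or below one half.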

### Performance of Quadratic Probing
Experimental results for a hash table with load factor &#945;

| &#945; = n/N | C(&#945;) |
| ----- | ----- |
| 0.1 (10%) | 1.05 |
| 0.5 (50%) | 1.44 |
| 0.75 (75%) | 1.99 |
| 0.9 (90%) | 2.79 |

- With quadratic probing, two keys that hash to the same cell still follow the same probe sequence (secondary clustering). To resolve this, we use double hashing, which makes the step size depend on the key.

### Double Hashing

h<sub>j</sub>(k) = [h(k) + j*d(k)] mod(N)

OR

h<sub>j</sub>(k) = [h(k) + j<sup>2</sup>*d(k)] mod(N)

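- Performance depends on the choice of the primary hashing function h().
- It also depends on the choice of the secondary hashing function d(), which must never evaluate to 0; otherwise the probe sequence never leaves cell h(k).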
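
A small sketch of the first (linear-in-j) formula. The notes do not fix h() or d(); the choices below, h(k) = k mod N and d(k) = q - (k mod q) with a prime q < N, are one common textbook option and are assumptions for this example, as are the function names.

```python
def secondary_hash(key, q):
    """d(k) = q - (k mod q): always in 1..q, so never 0."""
    return q - (key % q)


def probe_sequence(key, n_cells, q, probes):
    """First `probes` cells of h_j(k) = [h(k) + j*d(k)] mod N, with h(k) = k mod N."""
    home = key % n_cells
    step = secondary_hash(key, q)
    return [(home + j * step) % n_cells for j in range(probes)]


# Keys 18 and 31 collide at cell 5 in a 13-cell table, then diverge.
print(probe_sequence(18, 13, 7, 4))   # [5, 8, 11, 1]
print(probe_sequence(31, 13, 7, 4))   # [5, 9, 0, 4]
```

Because the step size comes from the key rather than from the home cell, two keys that collide at h(k) usually separate on the next probe, which is what avoids secondary clustering.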

### Performance of Double Hashing

Experimental results for a hash table with load factor &#945;

| &#945; = n/N | C(&#945;) |
| ----- | ----- |
| 0.1 (10%) | 1.05 |
| 0.5 (50%) | 1.38 |
| 0.75 (75%) | 1.83 |
| 0.9 (90%) | 2.55 |

## Performance of Hashing: Summary
- In the worst case, searches, insertions, and removals on a hash table take O(n) time.
- The worst case occurs when all the keys inserted into the dictionary collide.
- The load factor &#945; = n/N affects the performance of a hash table.
- Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is approximately 1/(1 - &#945;).
- The **expected running time** of all the dictionary ADT operations in a hash table is O(1).
- In practice, hashing is very fast provided the load factor is not close to 100%.
- Applications of hash tables:
    - Small databases
    - Compilers
    - Browser caches
    - P2P
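
To make the 1/(1 - &#945;) estimate from the summary concrete, the sketch below evaluates it for a few load factors. The function name `expected_insert_probes` is an assumption for this example, and the values are the approximation itself, not measurements.

```python
def expected_insert_probes(n_items, n_cells):
    """Approximate expected probes for an insertion: 1 / (1 - alpha), alpha = n/N."""
    alpha = n_items / n_cells
    if alpha >= 1:
        raise ValueError("table is full; the estimate is undefined")
    return 1 / (1 - alpha)


print(expected_insert_probes(50, 100))   # 2.0
print(expected_insert_probes(75, 100))   # 4.0
```

At &#945; = 0.9 the estimate is about 10 probes, which illustrates why hashing slows down sharply as the load factor approaches 100%.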

# Graphs
- A graph is a pair (V, E) where
    - V is a set of nodes, called vertices.
    - E is a collection of pairs of vertices, called edges.
    - Vertices and edges are positions and store elements.

## Edge Types
- Directed edge:
    - Ordered pair of vertices (u, v).
    - First vertex u is the origin.
    - Second vertex v is the destination.
- Undirected edge:
    - Unordered pair of vertices {u, v}.
- Directed graph:
    - All the edges are directed.
- Undirected graph:
    - All the edges are undirected.
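
One way to mirror the ordered/unordered distinction in code is to use a tuple for a directed edge and a `frozenset` for an undirected one. This is only an illustrative sketch; the variable and function names are assumptions.

```python
# Directed edge: an ordered pair, so (u, v) and (v, u) are different edges.
directed_edge = ("u", "v")
reverse_edge = ("v", "u")

# Undirected edge: an unordered pair, so {u, v} and {v, u} are the same edge.
undirected_edge = frozenset({"u", "v"})


def origin(edge):
    """First vertex of a directed edge."""
    return edge[0]


def destination(edge):
    """Second vertex of a directed edge."""
    return edge[1]


print(directed_edge == reverse_edge)                 # False
print(undirected_edge == frozenset({"v", "u"}))      # True
```

A `frozenset` of two identical vertices collapses to one element, so a representation like this would need extra care if self-loops are allowed.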

## Applications
- Electronic circuits
    - PCB
    - Integrated circuit
- Transportation networks
- Computer networks
- Databases
    - Entity-relationship diagram

## Terminology
- End vertices (or endpoints) of an edge:
    - U and V are the endpoints of a.
- Edges incident on a vertex:
    - a, d, and b are incident on V.
- Adjacent vertices:
    - U and V are adjacent.
- Degree of a vertex:
    - X has degree 5.
- Parallel edges:
    - h and i are parallel edges.
- Self-loop:
    - j is a self-loop.

![Graph Example 1](./Resources/GraphExample1.png)
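
The terminology above can be checked on a small edge table. The endpoints below are inferred from the statements and examples in these notes rather than read from the figure, and the vertex carrying the self-loop j is an assumption (Z is used here); all function names are likewise illustrative.

```python
# Undirected edges as name -> (endpoint, endpoint).
EDGES = {
    "a": ("U", "V"), "b": ("V", "X"), "c": ("U", "W"), "d": ("W", "V"),
    "e": ("W", "X"), "f": ("Y", "W"), "g": ("X", "Y"),
    "h": ("X", "Z"), "i": ("X", "Z"),
    "j": ("Z", "Z"),  # assumption: the text does not say which vertex j loops on
}


def incident_edges(v):
    """Names of the edges that have v as an endpoint."""
    return sorted(name for name, ends in EDGES.items() if v in ends)


def degree(v):
    """Number of edge ends at v; a self-loop counts twice."""
    return sum(ends.count(v) for ends in EDGES.values())


def are_adjacent(u, v):
    """True if some edge has u and v as its endpoints."""
    return any(set(ends) == {u, v} for ends in EDGES.values())


def parallel_edges():
    """Groups of edges that share the same pair of endpoints."""
    groups = {}
    for name, ends in EDGES.items():
        groups.setdefault(frozenset(ends), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def self_loops():
    """Edges whose two endpoints coincide."""
    return [name for name, (u, v) in EDGES.items() if u == v]


print(incident_edges("V"))   # ['a', 'b', 'd']
print(degree("X"))           # 5
print(parallel_edges())      # [['h', 'i']]
```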

- Path:
    - Sequence of alternating vertices and edges.
    - Begins with a vertex.
    - Ends with a vertex.
    - Each edge is preceded and followed by its endpoints.
- Simple path:
    - Path such that all its vertices and edges are distinct.
- Ex:
    - P<sub>1</sub> = (V, b, X, h, Z) is a simple path.
    - P<sub>2</sub> = (U, c, W, e, X, g, Y, f, W, d, V) is a path that is not simple.

![Graph Example 2](./Resources/GraphExample2.png)
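
The two example paths can be validated mechanically. In this sketch a path is a Python list alternating vertex and edge names; the edge endpoints are inferred from the examples in these notes, and the function names are assumptions.

```python
# Endpoints of the undirected edges used by P1 and P2.
ENDPOINTS = {
    "b": ("V", "X"), "c": ("U", "W"), "d": ("W", "V"), "e": ("W", "X"),
    "f": ("Y", "W"), "g": ("X", "Y"), "h": ("X", "Z"),
}


def is_path(seq):
    """True if seq alternates vertices and edges and each edge joins its neighbours."""
    if len(seq) % 2 == 0:            # must begin and end with a vertex
        return False
    vertices, edges = seq[0::2], seq[1::2]
    return all(
        set(ENDPOINTS[e]) == {vertices[i], vertices[i + 1]}
        for i, e in enumerate(edges)
    )


def is_simple_path(seq):
    """True if seq is a path whose vertices and edges are all distinct."""
    vertices, edges = seq[0::2], seq[1::2]
    return (
        is_path(seq)
        and len(set(vertices)) == len(vertices)
        and len(set(edges)) == len(edges)
    )


P1 = ["V", "b", "X", "h", "Z"]
P2 = ["U", "c", "W", "e", "X", "g", "Y", "f", "W", "d", "V"]
print(is_simple_path(P1))                  # True
print(is_path(P2), is_simple_path(P2))     # True False
```

P<sub>2</sub> fails the simplicity test because vertex W appears twice.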

- Cycle:
    - Circular sequence of alternating vertices and edges.
    - Each edge is preceded and followed by its endpoints.
- Simple cycle:
    - Cycle such that all its vertices and edges are distinct (apart from the first and last vertex, which coincide).
- Ex:
    - C<sub>1</sub> = (V, b, X, g, Y, f, W, c, U, a, V) is a simple cycle.
    - C<sub>2</sub> = (U, c, W, e, X, g, Y, f, W, d, V, a, U) is a cycle that is not simple.

![Graph Example 3](./Resources/GraphExample3.png)
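
A similar check works for the two example cycles. As before, this is a sketch under assumptions: a cycle is written as a list that starts and ends with the same vertex, the edge endpoints are inferred from the examples in these notes, and the function names are illustrative.

```python
# Endpoints of the undirected edges used by C1 and C2.
ENDPOINTS = {
    "a": ("U", "V"), "b": ("V", "X"), "c": ("U", "W"), "d": ("W", "V"),
    "e": ("W", "X"), "f": ("Y", "W"), "g": ("X", "Y"),
}


def is_cycle(seq):
    """True if seq is a closed alternating sequence whose edges join their neighbours."""
    if len(seq) < 3 or len(seq) % 2 == 0:
        return False
    vertices, edges = seq[0::2], seq[1::2]
    if vertices[0] != vertices[-1]:          # a cycle returns to its start
        return False
    return all(
        set(ENDPOINTS[e]) == {vertices[i], vertices[i + 1]}
        for i, e in enumerate(edges)
    )


def is_simple_cycle(seq):
    """True if seq is a cycle with no repeated edge and no repeated vertex
    other than the shared first/last one."""
    inner, edges = seq[0:-1:2], seq[1::2]
    return (
        is_cycle(seq)
        and len(set(inner)) == len(inner)
        and len(set(edges)) == len(edges)
    )


C1 = ["V", "b", "X", "g", "Y", "f", "W", "c", "U", "a", "V"]
C2 = ["U", "c", "W", "e", "X", "g", "Y", "f", "W", "d", "V", "a", "U"]
print(is_simple_cycle(C1))                  # True
print(is_cycle(C2), is_simple_cycle(C2))    # True False
```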