# hparra/hgpa

Switch branches/tags
Nothing to show
Fetching contributors…
Cannot retrieve contributors at this time
176 lines (134 sloc) 4.93 KB

# Hash Tables

Can we implement a Dictionary such that all operations: insertion, deletion, and searching, are O(1)?

Yes. Use a hash table.

Hash tables are fundamental data structures. They're also analogous to the Dictionary ADT, aka associative arrays. Most dictionaries are implemented with a hash table. You really should learn them!

## tl;dr

Complexity:

• all operations are O(1) on average
• all operations are O(n) in worst-case (all collisions, etc.)
• space is O(n)
```# Naive (and Wrong) Implementation
# but it communicates the concept
class HashTable:

def __init__(self):
self.array = []

def set(self, key, value):
index = hash(key) % len(self.array)
self.array[index] = value

def get(self, key):
index = hash(key) % len(self.array)
return self.array[index]

# but...```

Two Fundamental Problems:

1. collisions: "Someone's already in here!"
2. array sizing: "There's no more space!"

Collision Resolution:

• separate chaining: "Each spot should have more spots!"
• self-balancing binary search trees
• open addressing: "Let's stick it in another spot!"
• linear probing: "Just pick the next open spot!"
• double hashing: "Hash it again and see if that's open!"

Array Resizing:

• on insert check array size
• if array is too small (or getting there) then resize
• create new array twice as big
• copy everything over

## Implementations

You may see some hash tables or maps described as weak. This means that hash table uses a weak reference for its values. Once there are no strong (regular) references to the value the garbage collector is free to eliminate it.

Implementations differ:

• collision resolution:
• array resizing: some 2x others 1.75x

### Python

Python's `dict`:

• is a hash table
• is a built-in type, e.g. `somevar = {}`
```# Example Python's dict
secrets = {}
secrets["You"] = "idk"
secrets["HGPA"] = "doesn't like watermelon"
secrets["Tony"] = "has a WWF belt"
secrets["Tyler"] = "listen's to Bjork"
del secrets["You"]
for k in secrets:
print("{} {}").format(k, secrets[k])
for k,v in d.iteritems(): # Python 2
print("{} {}").format(k, v)```

https://docs.python.org/2/library/weakref.html

### C++

C++'s `unordered_map`:

• is a hash table
```#include <iostream>
#include <unordered_map>
using namespace std;
int main() {
unordered_map<string, string> secrets;
secrets["You"] = "idk";
secrets["HGPA"] = "doesn't like watermelon";
secrets["Tony"] = "has a WWF belt";
secrets["Tyler"] = "listen's to Bjork";
delete secrets["You"];
unordered_map<string,string>::iterator i;
// actually iterator across pair<string, string>
for (i = secrets.begin(); i != secrets.end(); i++) {
cout << i->first << "  " << i->second << endl;
}
}```

### Java

As usual Java has various implementations.

Java's `HashMap<K,V>`:

• uses separate chaining with a Linked List (Java 7)
• uses separate chaining with a Red-Black Tree (Java 8)
• is not thread-safe (but see `Hashtable`)
```// Example Java's HashMap
import java.util.HashMap;
import java.util.Map;
import java.util.Iterator;
import java.util.Set;
public class Example {
public static void main(String args[]) {
HashMap<String, String> secrets = new HashMap<String, String>();
// setting
secrets.put("You", "idk");
secrets.put("HGPA", "doesn't like watermelon");
secrets.put("Tony", "has a WWF belt");
secrets.put("Tyler", "listen's to Bjork");
// getting
System.out.println(secrets.get("HGPA"));
// removing
secrets.remove("You");
// traverse -- also example why I don't like java
Set set = secrets.entrySet();
Iterator iterator = set.iterator();
while (iterator.hasNext()) {
Map.Entry entry = (Map.Entry) iterator.next();
System.out.print(entry.getKey() + " " + entry.getValue());
}
}
}```

## RESOURCES

Hash table. Wikipedia.

The Mighty Dictionary. Brandon Craig Rhodes. PyCon 2010. An optimized explanation of Python's Dictionary implementation using a hash table.

How std::unordered_map is implemented?. Stack Overflow.

## TODO

A Proposal to Add Hash Tables to the Standard Library (revision 4). Matthew Austern. 2003-04-03.

Hash Tables (C). Eternally Confuzzled.