# EC2202 Hashing

**Disclaimer.**
These code examples are based on
1. [MIT 6.006 (Professor Erik Demaine, Dr. Jason Ku, and Professor Justin Solomon)](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-spring-2020/index.htm)
2. [KAIST CS206 (Professor Otfried Cheong)](https://otfried.org/courses/cs206/)
3. [LeetCode](https://leetcode.com/)
4. [GeeksForGeeks](https://practice.geeksforgeeks.org/)
5. Coding Interviews

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/-_gOqXzexsg" title="YouTube video player" frameborder="0" allowfullscreen></iframe>
<iframe width="560" height="315" src="https://www.youtube.com/embed/tWk8iaflxm4" title="YouTube video player" frameborder="0" allowfullscreen></iframe>

## Chaining

In [5]:
# Node class for the linked lists used by chaining
class _Node():
  def __init__(self, key, value, next = None):
    self.key = key
    self.value = value
    self.next = next

def _hash(key):
  return key % 100 # a deliberately simple hash function for now

# Dictionary implementation (note: this name shadows Python's built-in dict)
class dict():
  def __init__(self):
    self._data = [None]*100 # assume the available memory space has 100 slots

  def __contains__(self, key): # is this key in the dictionary or not?
    return self._findnode(key) is not None

  def _findnode(self, key):
    # To find a node we first need its slot in the memory space; the hash value gives it
    i = _hash(key)       # i = memory location, found using the hash value
    p = self._data[i]    # the head of the linked list at memory location i
    while p is not None: # the loop body never runs if nothing is stored here
      if p.key == key:   # is the key stored in node p the one we are looking for?
        return p
      # else: move on to the next node in the chain
      p = p.next
    return None # reached the end of the chain without a match: the key is absent

  # print(d[k]) --> return the value with the key k
  def __getitem__(self, key):
    p = self._findnode(key)
    if p: # if p is not None
      return p.value
    raise ValueError(key) # key not found (the built-in dict raises KeyError here)

  # ppp exercise
  # method that sets an item via d[k] = value
  def __setitem__(self, key, value):
    p = self._findnode(key)
    if p:
      p.value = value
    else: # make a new node
      h = _hash(key) # use the hash function to find the memory location for the key
      self._data[h] = _Node(key, value, self._data[h]) # the new node becomes the head of the chain
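To see chaining at work, the following self-contained sketch stores several keys that hash to the same slot and then inspects the resulting chain. The class name `ChainedMap`, the `size` parameter, and the helper `chain` are assumptions made for this illustration; they are not part of the lecture code, and `KeyError` is used here only to mirror the built-in `dict`.

```python
# Minimal sketch of a chained hash table (hypothetical names throughout).
class _Node:
    def __init__(self, key, value, next=None):
        self.key, self.value, self.next = key, value, next

class ChainedMap:
    def __init__(self, size=100):
        self._size = size
        self._data = [None] * size      # one linked list (chain) per slot

    def _bucket(self, key):
        return key % self._size         # same simple hash as above

    def _findnode(self, key):
        p = self._data[self._bucket(key)]
        while p is not None:
            if p.key == key:
                return p
            p = p.next
        return None

    def __contains__(self, key):
        return self._findnode(key) is not None

    def __getitem__(self, key):
        p = self._findnode(key)
        if p is None:
            raise KeyError(key)
        return p.value

    def __setitem__(self, key, value):
        p = self._findnode(key)
        if p is not None:
            p.value = value             # existing key: update in place
        else:
            h = self._bucket(key)
            self._data[h] = _Node(key, value, self._data[h])  # new head

    def chain(self, i):
        """Return the keys stored in slot i, head of the chain first."""
        keys, p = [], self._data[i]
        while p is not None:
            keys.append(p.key)
            p = p.next
        return keys

d = ChainedMap()
d[5] = "a"
d[105] = "b"    # 105 % 100 == 5, so this collides with key 5
d[205] = "c"    # another collision in slot 5
d[5] = "z"      # updating an existing key does not add a node
print(d.chain(5))            # [205, 105, 5]
print(d[5], d[105], 7 in d)  # z b False
```

Because new nodes are inserted at the head, the most recently added key appears first in the chain.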

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/Lrm0wxm1H8U" title="YouTube video player" frameborder="0" allowfullscreen></iframe>
<iframe width="560" height="315" src="https://www.youtube.com/embed/WBvohZehMPE" title="YouTube video player" frameborder="0" allowfullscreen></iframe>

## Open addressing



In [None]:
class _Entry():
  def __init__(self, key, value):
    self.key = key
    self.value = value

def _hash(key): # named _hash to match the call in _findkey and to avoid shadowing the built-in hash()
  return key % 100

class dict():
  def __init__(self):
    self._data = [None] * 100

  def _findkey(self, key):
    # Linear probing; assumes at least one slot is always empty, otherwise searching for a missing key never terminates
    i = _hash(key)
    while self._data[i] is not None:
      # slot i is occupied: check it, then probe the following slots linearly
      if self._data[i].key == key: # slot i holds the key we are looking for
        return (True, i)
      # else: keep searching in the next slot, wrapping around at the end
      i = (i+1) % 100
    # reached an empty slot: the key is absent, and i is where it would be inserted
    return (False, i)

  def __contains__(self, key):
    found, i = self._findkey(key)
    return found # True or False

  def __getitem__(self, key):
    found, i = self._findkey(key)
    if found:
      return self._data[i].value # return the value stored at that index
    raise ValueError(key)        # key not found

  # ppp exercise
  def __setitem__(self, key, value):
    found, i = self._findkey(key) # check whether the key already exists (found is True / False)
    if found:
      self._data[i].value = value # key exists: update it with the new value
    else:                         # key does not exist: store a new _Entry(key, value)
      self._data[i] = _Entry(key, value)
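The probe sequence is easier to follow on a small table. The sketch below is a self-contained variant with only 10 slots; the name `LinearProbingMap`, the use of `(key, value)` tuples instead of `_Entry`, and the bounded probe loop that detects a full table are assumptions added for this illustration.

```python
# Minimal sketch of open addressing with linear probing (hypothetical names).
class LinearProbingMap:
    def __init__(self, size=10):
        self._size = size
        self._data = [None] * size       # each slot: None or (key, value)

    def _findkey(self, key):
        i = key % self._size
        for _ in range(self._size):      # visit each slot at most once
            entry = self._data[i]
            if entry is None:
                return (False, i)        # empty slot: key absent, insert here
            if entry[0] == key:
                return (True, i)
            i = (i + 1) % self._size     # probe the next slot, wrapping around
        return (False, None)             # table is full and key is absent

    def __contains__(self, key):
        return self._findkey(key)[0]

    def __getitem__(self, key):
        found, i = self._findkey(key)
        if not found:
            raise KeyError(key)
        return self._data[i][1]

    def __setitem__(self, key, value):
        found, i = self._findkey(key)
        if i is None:
            raise RuntimeError("hash table is full")
        self._data[i] = (key, value)

m = LinearProbingMap(size=10)
m[3] = "a"     # slot 3
m[13] = "b"    # slot 3 is taken, so it goes to slot 4
m[23] = "c"    # slots 3 and 4 are taken, so it goes to slot 5
m[9] = "d"     # slot 9
m[19] = "e"    # slot 9 is taken, wraps around to slot 0
slots = [e[0] if e else None for e in m._data]
print(slots)   # [19, None, None, 3, 13, 23, None, None, None, 9]
```

Keys 3, 13, and 23 form a contiguous run of occupied slots, which is the clustering effect that makes lookups slower as the table fills up.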

In [None]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/EAgLywwjmQY" title="YouTube video player" frameborder="0" allowfullscreen></iframe>
<iframe width="560" height="315" src="https://www.youtube.com/embed/P4kDR7Zdjrw" title="YouTube video player" frameborder="0" allowfullscreen></iframe>

## Practical issues

Naive implementation: two `Point` objects with the same x and y are reported as different.

In [10]:
class Point:
  def __init__(self, x, y):
    self.x = x
    self.y = y

  def __repr__(self):
    return "Point(%s, %s)" % (self.x, self.y)

**WWPP**

In [14]:
s = set()
s.add(Point(3, 5))
print(s)
print(Point(3, 5) in s)

{Point(3, 5)}
False


Even though we can see that s contains a Point(3, 5), we cannot find it in the set. The reason becomes clear when we try the following:

In [15]:
p = Point(3, 5)
q = Point(3, 5)
print(p == q)

False


In [16]:
print(hash(p))
print(hash(q))

8399651890516
8399651890942


Even though two points have the same coordinates, Python does not consider them equal, and they have different hash codes—so there is no way that the set could find the entry.
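The default behaviour can be checked directly: without `__eq__` and `__hash__`, equality means "is the same object". The class name `PlainPoint` below is a hypothetical stand-in for the naive `Point` class.

```python
class PlainPoint:  # hypothetical name; defines neither __eq__ nor __hash__
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = PlainPoint(3, 5)
q = PlainPoint(3, 5)
s = {p}

same_object_found = p in s     # True: p is the very object that was stored
equal_coords_found = q in s    # False: q is a different object
print(same_object_found, equal_coords_found)
print(p == p, p == q)          # True False
```

So the set does work for the exact object that was inserted; it only fails for a second object with the same coordinates.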

### Implementing `__eq__`

In [18]:
class Point():
  def __init__(self, x, y):
    self.x = x
    self.y = y

  def __repr__(self):
    return "Point(%s, %s)" % (self.x, self.y)

  def __eq__(self, rhs):
    return self.x == rhs.x and self.y == rhs.y

In [19]:
p = Point(3, 5)
q = Point(3, 5)
print(p == q) # now that __eq__ is implemented, this returns True

True


In [20]:
s = set()
s.add(Point(3, 5))
print(s)
print(Point(3, 5) in s)

TypeError: unhashable type: 'Point'

The object could not even be stored: a `Point` can no longer be placed in a hash table, i.e. it is not hashable.

A class whose objects are used as keys must implement both `__eq__` and `__hash__`.

Python can now determine that the two points are equal—but it tells us that Point objects cannot be used in a hash table. In fact, it’s the hash function that no longer works:

In [None]:
print(hash(p))

The Python interpreter will not use its default implementation of the hash function for objects with an equality operator. Why not? Because the hash code of equal objects needs to be the same, and Python has no way to ensure this.
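A small check makes this visible. In the sketch below, the class name `PointEqOnly` is hypothetical; it defines `__eq__` but no `__hash__`, and Python then sets the class's `__hash__` attribute to `None`.

```python
class PointEqOnly:  # hypothetical name: __eq__ without __hash__
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, rhs):
        return self.x == rhs.x and self.y == rhs.y

print(PointEqOnly.__hash__)   # None

message = None
try:
    hash(PointEqOnly(3, 5))
except TypeError as e:
    message = str(e)
print(message)                # unhashable type: 'PointEqOnly'
```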

### Implementing `__hash__`

(This is the professor's code.)

In [21]:
class Point():
  def __init__(self, x, y):
    self.x = x
    self.y = y

  def __repr__(self):
    return "Point(%s, %s)" % (self.x, self.y)

  def __eq__(self, rhs):
    return self.x == rhs.x and self.y == rhs.y

  def __hash__(self):
    return hash((self.x, self.y)) # hash the (x, y) tuple; hash() accepts any hashable object and returns an int

In [22]:
s = set()
s.add(Point(3, 5))
print(s)
print(Point(3, 5) in s)

{Point(3, 5)}
True


In [24]:
p = Point(3, 5)
q = Point(3, 5)
print(p == q)

True


In [25]:
print(hash(p))
print(hash(q))

7586885779985432798
7586885779985432798


Python's built-in `id` function still returns different values for `p` and `q`, because they are in fact two different objects.

In [23]:
print(id(p))
print(id(q))

134394430250368
134394430256464


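The lesson is: hash tables require that keys satisfy the following “contract”: keys must provide an equality test (`__eq__`) and a hash function (`__hash__`), and whenever two keys are equal, their hash codes must be the same.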
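The requirement that equal keys have equal hash codes can be tested for any pair of objects. The helper `satisfies_contract` and the class `BadPoint` below are hypothetical names introduced for this sketch; `BadPoint` deliberately breaks the rule by hashing on `id`.

```python
def satisfies_contract(a, b):
    """Return False only if a == b while hash(a) != hash(b)."""
    return a != b or hash(a) == hash(b)

class BadPoint:  # hypothetical class that violates the contract on purpose
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, rhs):
        return self.x == rhs.x and self.y == rhs.y

    def __hash__(self):
        return id(self)   # equal points get different hash codes

ok_builtin = satisfies_contract(1, 1.0)        # 1 == 1.0 and their hashes agree
ok_tuple = satisfies_contract((3, 5), (3, 5))  # equal tuples hash alike
a, b = BadPoint(3, 5), BadPoint(3, 5)
ok_bad = satisfies_contract(a, b)              # equal, but hashes differ
print(ok_builtin, ok_tuple, ok_bad)            # True True False
```

A class like `BadPoint` is accepted by `set`, but lookups with an equal object fail, just as with the naive `Point`.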

## More issues

In [26]:
p = Point(3, 5)
s = set()
s.add(p)
print(s)

{Point(3, 5)}


In [27]:
p.y = 9
# mutable objects
print(s)
print(Point(3, 9) in s)
print(Point(3, 5) in s)

# immutable objects: tuple

{Point(3, 9)}
False
False


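Python's built-in mutable containers (such as `list`, `set`, and `dict`) are not hashable, so as a rule they cannot be stored in a set or used as dict keys.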


In [28]:
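# Example. Here `dict` is the custom class defined above, so the error below comes from `key % 100`; the built-in dict would instead raise TypeError: unhashable type: 'list'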

d = dict()
d[[1, 2, 3]] = 5

TypeError: unsupported operand type(s) for %: 'list' and 'int'

### Mutable keys

In [None]:
p = Point(3, 5)
s = set()
s.add(p)
print(s)

**WWPP**

In [None]:
p.y = 9
print(s)
print(Point(3, 9) in s)
print(Point(3, 5) in s)

Even though s clearly contains Point(3,9), the set cannot find it. The reason is that p’s hash code has changed after it was added to the hash table, so p is simply in the wrong slot of the hash table!

The lesson here: Never modify keys after they have been added to a hash table.
In fact, I would go further and recommend: Never use mutable objects as keys in a hash table. This is yet another example of why immutable objects make programming safer and easier.
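If a key really has to change, one possible workaround is to take it out of the set first, modify it, and then put it back, so that it is always stored under its current hash code. This is only a sketch: the helper name `move_in_set` is hypothetical and not part of the lecture code, and avoiding mutable keys altogether remains the safer choice.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return "Point(%s, %s)" % (self.x, self.y)

    def __eq__(self, rhs):
        return self.x == rhs.x and self.y == rhs.y

    def __hash__(self):
        return hash((self.x, self.y))

def move_in_set(s, p, new_y):  # hypothetical helper
    """Change p.y while keeping the set s consistent."""
    s.discard(p)   # remove while the hash code still matches the stored slot
    p.y = new_y
    s.add(p)       # re-insert under the new hash code

p = Point(3, 5)
s = {p}
move_in_set(s, p, 9)
print(s)                  # {Point(3, 9)}
print(Point(3, 9) in s)   # True
print(Point(3, 5) in s)   # False
```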

Python encourages this idea: Python lists and Python sets are themselves not hashable. You cannot put a Python list, or a Python set into a set! What you can do instead is to use a tuple or a frozenset. These objects are hashable, and can be used as keys in a map or as elements of a set.

In [None]:
d[[1, 2, 3]] = 5  # not allowed!
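The hashable alternatives can be sketched as follows, using Python's built-in `dict` and `set`. The class name `FrozenPoint` is hypothetical; `dataclass(frozen=True)` from the standard library is used here as one way to get an immutable point with matching `__eq__` and `__hash__`, which is an addition to the lecture material.

```python
from dataclasses import dataclass, FrozenInstanceError

# Tuples and frozensets are immutable and hashable, so they work as keys.
table = {(1, 2, 3): 5, frozenset({1, 2}): "pair"}
print(table[(1, 2, 3)])           # 5
print(table[frozenset({2, 1})])   # pair (element order does not matter)

@dataclass(frozen=True)
class FrozenPoint:  # hypothetical immutable counterpart of Point
    x: int
    y: int

s = {FrozenPoint(3, 5)}
print(FrozenPoint(3, 5) in s)     # True

p = FrozenPoint(3, 5)
blocked = False
try:
    p.y = 9                       # assignment is rejected for frozen dataclasses
except FrozenInstanceError:
    blocked = True
print(blocked)                    # True
```

Because a `FrozenPoint` cannot change after construction, its hash code stays valid for as long as it sits in the set.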