

# Hash function
---

- Multiplicative hashing with prime number - near-universal hashing



## Multiplicative hashing with prime number - near-universal hashing

$h_{a, p}(x) =  (ax \mod p)\mod m$
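
As a quick worked check of the formula, the sketch below fixes the multiplier $a$ once and evaluates $h_{a,p}$ on small numbers that can be verified by hand. The helper name `make_multiplicative_hash` and the values $a = 3$, $p = 7$, $m = 4$ are illustrative assumptions, not part of the notebook.

```python
import random

def make_multiplicative_hash(p, m, a=None):
    """Return h(x) = ((a * x) mod p) mod m with a fixed multiplier a."""
    if a is None:
        a = random.randint(1, p - 1)  # drawn once, then reused for every key

    def h(x):
        return ((a * x) % p) % m

    return h

h = make_multiplicative_hash(p=7, m=4, a=3)
print(h(5))  # (3 * 5) mod 7 = 1, then 1 mod 4 = 1
print(h(6))  # (3 * 6) mod 7 = 4, then 4 mod 4 = 0
```

Returning a closure keeps $a$ fixed for the lifetime of the table, which is what makes repeated lookups of the same key land in the same slot.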



In [1]:
import random

def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

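def generate_prime_list(hash_table_size):
    # Generate the first 10 prime numbers greater than hash_table_size
    primes = []
    num = hash_table_size + 1
    while len(primes) < 10:  # stop once 10 primes have been collected
        if is_prime(num):
            primes.append(num)
        num += 1
    return primes

_multipliers = {}  # one random multiplier a per prime p, drawn once

def multiplicative_hash(x, p, m):
    # Reuse the same a for a given p: redrawing a on every call would send
    # the same key to different slots, so lookups could never find it again.
    if p not in _multipliers:
        _multipliers[p] = random.choice(generate_prime_list(2 * p))
    a = _multipliers[p]
    return ((a * x) % p) % m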




Hashed value: 54


In [6]:
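# Example usage
# Hash 10,000 random integer keys drawn from [10**6, 10**7]
p = 99991  # a large prime number
m = 100  # hash table size
hash_table = {}
collisions = 0
for _ in range(10000):
  x = random.randint(10**6, 10**7)  # randint needs integer bounds

  hashed_value = multiplicative_hash(x, p, m)
  # Detect collision: the slot already holds an earlier key
  if hashed_value in hash_table:
    collisions += 1
  else:
    hash_table[hashed_value] = x
print(f"Collisions detected: {collisions} of 10000 keys")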


## Hash function - XOR probing

$h_i(x) := h(x)⊕i$
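
One property of this probe sequence is worth seeing on a small case: when the table size is a power of two and a single base hash $h(x)$ is used, XOR-ing it with every $i$ in $[0, \text{size})$ visits each slot exactly once. The sketch below checks this for one value; the function name `xor_probe_sequence` and the sample hash `0b1011` are hypothetical.

```python
def xor_probe_sequence(h, table_size):
    """Slots visited by h_i = (h XOR i) mod table_size for i = 0..table_size-1."""
    return [(h ^ i) % table_size for i in range(table_size)]

seq = xor_probe_sequence(h=0b1011, table_size=8)
print(seq)                            # [3, 2, 1, 0, 7, 6, 5, 4]
print(sorted(seq) == list(range(8)))  # True: every slot is reached once
```

The guarantee relies on the power-of-two size; for other sizes the modulo can map two different probes to the same slot.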

In [7]:
import random

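# Example hash functions derived from Python's built-in hash().
# hash() returns at most 64 bits, so shifting by 64 would leave only the sign;
# h3 salts the input instead. These are different, not truly independent, functions.
def h1(x):
    return hash(x) & 0xFFFFFFFF

def h2(x):
    return (hash(x) >> 32) & 0xFFFFFFFF

def h3(x):
    return hash((x, 3)) & 0xFFFFFFFF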

# List of independent hash functions
hash_functions = [h1, h2, h3]

def hash_with_probing(x, hash_table_size, probe_count=10):
    # Try to place item in the hash table
    for i in range(probe_count):
        # Select hash function h_i(x) using XOR probing
        h = hash_functions[i % len(hash_functions)]
        index = h(x) ^ i  # XOR with the probe index to get a new position

        # Use modulo to ensure the index is within bounds of the table size
        index = index % hash_table_size

        # Check if the spot is available (you can customize this with your own collision resolution)
        if hash_table[index] is None:
            return index  # Found an open slot

    return None  # No free slot found within probe_count probes

# Example usage
hash_table_size = 128  # Define size of the hash table
hash_table = [None] * hash_table_size  # Initialize the hash table with None

# Example key-value pairs to insert
keys = ['apple', 'banana', 'cherry', 'date']

for key in keys:
    position = hash_with_probing(key, hash_table_size)
    if position is not None:
        hash_table[position] = key
    else:
        print(f"Could not insert {key} into hash table after probing.")

# Print the hash table
print(hash_table)


[None, None, None, None, None, 'apple', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 'date', None, None, None, None, None, None, 'banana', None, None, None, None, None, None, None, None, None, None, None, 'cherry', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]


# Bloom filter

In [8]:
import hashlib

class BloomFilter:
    def __init__(self, size, num_hashes):
        self.size = size
        self.num_hashes = num_hashes
        self.bit_array = [0] * size

    def _hashes(self, item):
        # Derive num_hashes indices from one MD5 digest by adding the probe number i.
        # Note: the indices are consecutive slots, not independent hashes.
        hash_value = int(hashlib.md5(item.encode('utf-8')).hexdigest(), 16)
        hashes = []
        for i in range(self.num_hashes):
            hashes.append((hash_value + i) % self.size)
        return hashes

    def insert(self, item):
        for hash_index in self._hashes(item):
            self.bit_array[hash_index] = 1

    def check(self, item):
        for hash_index in self._hashes(item):
            if self.bit_array[hash_index] == 0:
                return False
        return True

# Example usage
bloom = BloomFilter(size=10, num_hashes=3)
bloom.insert("apple")
bloom.insert("banana")

print(bloom.check("apple"))  # True
print(bloom.check("banana"))  # True
print(bloom.check("cherry"))  # False


True
True
False
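
The filter above derives all of its indices from a single digest plus an offset. A common alternative, sketched here as an assumption rather than as the notebook's method, is double hashing: two base hashes $h_1$ and $h_2$ are combined as $h_1 + i \cdot h_2$ to spread the $k$ indices across the bit array. The class name `DoubleHashBloomFilter` and the choice of SHA-256 are illustrative.

```python
import hashlib

class DoubleHashBloomFilter:
    """Bloom filter whose k indices come from two base hashes: h1 + i * h2."""

    def __init__(self, size, num_hashes):
        self.size = size
        self.num_hashes = num_hashes
        self.bit_array = [0] * size

    def _hashes(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def insert(self, item):
        for index in self._hashes(item):
            self.bit_array[index] = 1

    def check(self, item):
        # False means "definitely absent"; True means "possibly present"
        return all(self.bit_array[index] for index in self._hashes(item))

bloom = DoubleHashBloomFilter(size=1024, num_hashes=3)
for word in ["apple", "banana"]:
    bloom.insert(word)
print(bloom.check("apple"), bloom.check("banana"))  # True True (no false negatives)
```

A Bloom filter never reports an inserted item as absent, but a query for an item that was never inserted can still return `True` with some small probability that depends on `size` and `num_hashes`.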


# Ransom Note

Given two strings `ransomNote` and `magazine`, return true if `ransomNote` can be constructed from the letters of `magazine`, and false otherwise.

Each letter in `magazine` can be used only once in `ransomNote`.


```
Example 1:

Input: ransomNote = "a", magazine = "b"
Output: false
Example 2:

Input: ransomNote = "aa", magazine = "ab"
Output: false
Example 3:

Input: ransomNote = "aa", magazine = "aab"
Output: true
```

## Approach - Hashtables for fast lookup

Create a dictionary (hashtable) that stores each character of magazine together with the frequency of its occurrence.

Then look up each character of ransomNote and deduct one from the count of that letter. If every letter of ransomNote is found with a count still above zero, return True. Otherwise, return False.

In [None]:
def FindCharacters(ransomNote, magazine):
  '''
  Input: two strings: ransomNote and magazine
  Output: boolean, if all characters in ransomNote can be found in magazine, returns true. Otherwise, false.
  '''
  CharFreq = {}
  for char in magazine:
    CharFreq[char] = CharFreq.get(char, 0) + 1

  for char in ransomNote:
    if char not in CharFreq or CharFreq[char] == 0:
      return False
    CharFreq[char] -= 1
  return True

# Example usage
ransomNote = "aa"
magazine = "aab"
print(FindCharacters(ransomNote, magazine))

True
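
The same counting idea can be written more compactly with `collections.Counter` from the standard library. This is a minimal sketch; the function name `can_construct` is an assumption for illustration.

```python
from collections import Counter

def can_construct(ransom_note, magazine):
    # Counter subtraction drops non-positive counts, so the result is empty
    # exactly when the magazine covers every letter of the note.
    return not (Counter(ransom_note) - Counter(magazine))

print(can_construct("a", "b"))     # False
print(can_construct("aa", "ab"))   # False
print(can_construct("aa", "aab"))  # True
```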


# Isomorphic Strings

Given two strings s and t, determine if they are isomorphic.

Two strings s and t are isomorphic if the characters in s can be replaced to get t.

All occurrences of a character must be replaced with another character while preserving the order of characters. No two characters may map to the same character, but a character may map to itself.


```
Example 1:

Input: s = "egg", t = "add"

Output: true

Explanation:

The strings s and t can be made identical by:

Mapping 'e' to 'a'.
Mapping 'g' to 'd'.
Example 2:

Input: s = "foo", t = "bar"

Output: false

Explanation:

The strings s and t can not be made identical as 'o' needs to be mapped to both 'a' and 'r'.

Example 3:

Input: s = "paper", t = "title"

Output: true
```

## Approach: Hashtable to encode

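Encode the first string with a hashtable by replacing each character with the key it is assigned on its first occurrence, then encode the second string the same way. If the two encoded lists are equal, the two strings are isomorphic. The code below performs an equivalent check with two hashtables, one mapping characters of `s` to `t` and one mapping back, and returns false as soon as a character would need two different partners.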

In [None]:
def is_isomorphic(s, t):
    if len(s) != len(t):
      return False

    s_to_t = {}
    t_to_s = {}
    for char_s, char_t in zip(s, t):
      if char_s in s_to_t:
        if s_to_t[char_s] != char_t:
          return False
      if char_t in t_to_s:
        if t_to_s[char_t] != char_s:
          return False

      s_to_t[char_s]= char_t
      t_to_s[char_t] = char_s

    return True

# Example usage
s = "paper"
t = "title"
print(is_isomorphic(s, t))


True
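
The encoding idea from the approach text can also be sketched directly: each character is replaced by the order in which it first appeared, and two strings are isomorphic when their encodings match. The names `encode` and `is_isomorphic_by_encoding` are hypothetical.

```python
def encode(string):
    """Replace each character by the rank of its first appearance."""
    codes = {}
    # setdefault assigns the next free code only to unseen characters
    return [codes.setdefault(ch, len(codes)) for ch in string]

def is_isomorphic_by_encoding(s, t):
    return encode(s) == encode(t)

print(encode("paper"))  # [0, 1, 0, 2, 3]
print(encode("title"))  # [0, 1, 0, 2, 3]
print(is_isomorphic_by_encoding("foo", "bar"))  # False
```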


# Longest Consecutive Sequence

Given an unsorted array of integers nums, return the length of the longest consecutive elements sequence.

You must write an algorithm that runs in O(n) time.


```
Example 1:

Input: nums = [100,4,200,1,3,2]
Output: 4
Explanation: The longest consecutive elements sequence is [1, 2, 3, 4]. Therefore its length is 4.
Example 2:

Input: nums = [0,3,7,2,5,8,4,6,0,1]
Output: 9
```

## Approach - hash set for fast lookups

1. Use a Set for Fast Lookups: Insert all elements of the array into a set. This allows for $O(1)$ average-time complexity for checking if a number is in the set.

2. Find Consecutive Sequences: Iterate through each number in the set. For each number, check if it's the start of a sequence (i.e., the number just before it isn't in the set). Then, count the length of this sequence by continuously checking for consecutive numbers.

3. Track the Longest Sequence: Keep track of the length of the longest sequence found during the iteration.


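__Time Complexity:__ $O(n)$: the inner loop runs only from the start of a sequence, so each number is visited a constant number of times.

__Space complexity__: $O(n)$ for the set.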

In [None]:
def longest_consecutive(nums):
    if not nums:
        return 0

    num_set = set(nums)
    longest_streak = 0

    for num in num_set:
        # Check if it's the start of a sequence
        if num - 1 not in num_set:
            current_num = num
            current_streak = 1

            # Count the length of the consecutive sequence
            while current_num + 1 in num_set:
                current_num += 1
                current_streak += 1

            # Update the longest streak if necessary
            longest_streak = max(longest_streak, current_streak)

    return longest_streak

# Example usage:
print(longest_consecutive([100, 4, 200, 1, 3, 2]))  # Output: 4 (sequence: 1, 2, 3, 4)
print(longest_consecutive([0, -1, 1, 2, -2, -3]))  # Output: 6 (sequence: -3, -2, -1, 0, 1, 2)
print(longest_consecutive([1, 1, 0, 0, 2, 3, 4]))  # Output: 5 (sequence: 0, 1, 2, 3, 4)


4
6
5


# Group Anagrams



Given an array of strings strs, group the anagrams together (an anagram is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once). You can return the answer in any order.


```
Example 1:

Input: strs = ["eat","tea","tan","ate","nat","bat"]

Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Explanation:

There is no string in strs that can be rearranged to form "bat".
The strings "nat" and "tan" are anagrams as they can be rearranged to form each other.
The strings "ate", "eat", and "tea" are anagrams as they can be rearranged to form each other.
Example 2:

Input: strs = [""]

Output: [[""]]

Example 3:

Input: strs = ["a"]

Output: [["a"]]
```

## Approach - hashtable keyed by sorted characters

- Use a dictionary (hashtable) whose key is the sorted characters of a string. All anagrams of a word sort to the same key, so each string is appended to the list stored under its key.

__Time Complexity:__ $O(n⋅k\log k)$, where $n$ is the number of strings and $k$ is the maximum length of a string. Sorting each string creates a unique key for each group of anagrams and takes $O(k \log k)$.

__Space complexity__: $O(n⋅k)$, for storing all strings and their grouped anagrams in the dictionary.

In [None]:
def group_anagrams(strs):
    if not strs:
        return []

    anagram_groups = {}

    for string in strs:
        # Sort the string to get a unique key for anagrams
        sorted_str = ''.join(sorted(string))

        # If the key is not in the dictionary, add it with the current string as its value
        if sorted_str not in anagram_groups:
            anagram_groups[sorted_str] = []

        # Append the original string to the corresponding anagram group
        anagram_groups[sorted_str].append(string)

    # Return the grouped anagrams as a list of lists
    return list(anagram_groups.values())

# Example usage:
print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# Output: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]


[['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
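
If the inputs are known to contain only lowercase English letters (an assumption made here, as in the examples), the sort can be replaced by a 26-entry letter count used as the dictionary key, which lowers the per-string cost from $O(k \log k)$ to $O(k)$. The function name `group_anagrams_by_count` is illustrative.

```python
from collections import defaultdict

def group_anagrams_by_count(strs):
    groups = defaultdict(list)
    for string in strs:
        counts = [0] * 26  # assumes lowercase a-z only
        for ch in string:
            counts[ord(ch) - ord("a")] += 1
        # Lists are unhashable, so the counts are frozen into a tuple key
        groups[tuple(counts)].append(string)
    return list(groups.values())

print(group_anagrams_by_count(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```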


# Two Sum

Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.

You may assume that each input would have exactly one solution, and you may not use the same element twice.

You can return the answer in any order.

```

Example 1:

Input: nums = [2,7,11,15], target = 9
Output: [0,1]
Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].
Example 2:

Input: nums = [3,2,4], target = 6
Output: [1,2]
Example 3:

Input: nums = [3,3], target = 6
Output: [0,1]
```

## Approach - use a hashtable to look up the complement of each number

- For each integer in the list, compute its complement (target - item) and look it up in the hashtable.
- If the complement is not there yet, store the current number with its index so that later items can find it.



In [None]:
def two_sum(nums, target):
    # Dictionary to store the number and its index
    num_to_index = {}

    for index, num in enumerate(nums):
        # Calculate the complement
        complement = target - num

        # Check if the complement is already in the dictionary
        if complement in num_to_index:
            return [num_to_index[complement], index]

        # Store the index of the current number
        num_to_index[num] = index

    # If no solution is found, return an empty list (or handle as needed)
    return []

# Example usage:
print(two_sum([2, 7, 11, 15], 9))  # Output: [0, 1]
print(two_sum([3, 2, 4], 6))       # Output: [1, 2]
print(two_sum([3, 3], 6))          # Output: [0, 1]


[0, 1]
[1, 2]
[0, 1]


# Happy Number

Write an algorithm to determine if a number n is happy.

A happy number is a number defined by the following process:

Starting with any positive integer, replace the number by the sum of the squares of its digits.
Repeat the process until the number equals 1 (where it will stay), or it loops endlessly in a cycle which does not include 1.
Those numbers for which this process ends in 1 are happy.
Return true if n is a happy number, and false if not.


```
Example 1:

Input: n = 19
Output: true
Explanation:
1^2 + 9^2 = 82
8^2 + 2^2 = 68
6^2 + 8^2 = 100
1^2 + 0^2 + 0^2 = 1
Example 2:

Input: n = 2
Output: false
```

## Approach - use a set to detect repeated numbers (cycle)

- create a function to compute the sum of the squares of the digits
- keep every generated number in a set; stop when the number reaches 1 (happy) or when it is already in the set (a cycle, so not happy)

__Time complexity__: $O(\log n)$ for each iteration, since a number $n$ has $O(\log n)$ digits.

__Space complexity__: $O(K)$, where $K$ is the number of unique outputs from the function that computes the sum of the squares of the digits.

In [None]:
def HappyNumber(n):
  def compute_sum_of_squares(num):
    return sum([int(digit) ** 2 for digit in str(num)])

  seen_outputs = set()
  while n != 1 and n not in seen_outputs:
    seen_outputs.add(n)
    n = compute_sum_of_squares(n)

  return n == 1

# Example
print(f"Is 19 a happy number?: {HappyNumber(19)}")
print(f"Is 2 a happy number?: {HappyNumber(2)}")

Is 19 a happy number?: True
Is 2 a happy number?: False
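
The set can be avoided altogether with Floyd's cycle detection, a different technique from the one used above: a slow pointer advances one step at a time and a fast pointer two, and they meet if the sequence cycles. This is a sketch with hypothetical names `next_number` and `is_happy_floyd`; it trades the $O(K)$ set for constant extra space.

```python
def next_number(n):
    """Sum of the squares of the decimal digits of n."""
    total = 0
    while n > 0:
        n, digit = divmod(n, 10)
        total += digit * digit
    return total

def is_happy_floyd(n):
    slow, fast = n, next_number(n)
    # Stop when the fast pointer reaches 1 or catches up with the slow one
    while fast != 1 and slow != fast:
        slow = next_number(slow)
        fast = next_number(next_number(fast))
    return fast == 1

print(is_happy_floyd(19))  # True
print(is_happy_floyd(2))   # False
```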


# Word Pattern

Given a pattern and a string s, find if s follows the same pattern.

Here follow means a full match, such that there is a bijection between a letter in pattern and a non-empty word in s. Specifically:

- Each letter in pattern maps to exactly one unique word in s.
- Each unique word in s maps to exactly one letter in pattern.
- No two letters map to the same word, and no two words map to the same letter.

```
Example 1:

Input: pattern = "abba", s = "dog cat cat dog"

Output: true

Explanation:

The bijection can be established as:

'a' maps to "dog".
'b' maps to "cat".
Example 2:

Input: pattern = "abba", s = "dog cat cat fish"

Output: false

Example 3:

Input: pattern = "aaaa", s = "dog cat cat dog"

Output: false
```


## Approach - Hashtables

- Use hashtables to map each character in `pattern` to the word at the corresponding position in `s`, and each word back to its character. If any character or word would need two different partners, `s` does not follow the pattern.

In [None]:
def WordPattern(pattern, s):
    words = s.split()
    if len(pattern) != len(words):
        return False

    char_to_word = {}
    word_to_char = {}

    for c, word in zip(pattern, words):
        # Check if the character is already in the mapping
        if c in char_to_word:
            if char_to_word[c] != word:
                return False
        else:
            char_to_word[c] = word

        # Check if the word is already mapped to a different character
        if word in word_to_char:
            if word_to_char[word] != c:
                return False
        else:
            word_to_char[word] = c

    return True


# Example usage
pattern = "abba"
s = "dog cat cat dog"
print(WordPattern(pattern, s))

# Example 2
pattern = "abba"
s = "dog cat cat fish"
print(WordPattern(pattern, s))

# Example 3
pattern = "aaaa"
s = "dog cat cat dog"
print(WordPattern(pattern, s))

True
False
False
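
The bijection condition can also be checked by counting: the mapping is one-to-one exactly when the number of distinct (letter, word) pairs equals both the number of distinct letters and the number of distinct words. A minimal sketch, with the hypothetical name `word_pattern_by_counts`:

```python
def word_pattern_by_counts(pattern, s):
    words = s.split()
    if len(pattern) != len(words):
        return False
    pairs = set(zip(pattern, words))
    # Extra pairs would mean some letter or word has two partners
    return len(pairs) == len(set(pattern)) == len(set(words))

print(word_pattern_by_counts("abba", "dog cat cat dog"))   # True
print(word_pattern_by_counts("abba", "dog cat cat fish"))  # False
print(word_pattern_by_counts("aaaa", "dog cat cat dog"))   # False
```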


# Valid Anagram

Given two strings s and t, return true if t is an anagram of s, and false otherwise.

An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, using all the original letters exactly once.


```
Example 1:

Input: s = "anagram", t = "nagaram"

Output: true

Example 2:

Input: s = "rat", t = "car"

Output: false
```


## Approach - use a hashtable (dictionary) to create a count table

Use a dictionary to count the characters in $s$. Then check whether the word $t$ can be built from the characters in the dictionary.

In [None]:
def CheckAnagram(s, t):
  if len(s) != len(t):
    return False

  char_to_count = {}
  for char in s:
    if char in char_to_count:
      char_to_count[char] +=1
    else:
      char_to_count[char] = 1

  for char in t:
    if char not in char_to_count:
      return False
    char_to_count[char] -=1
    if char_to_count[char] == 0:
      del char_to_count[char]

  return len(char_to_count) == 0

# Example
print(CheckAnagram("anagram", "nagaram"))
print(CheckAnagram("rat", "car"))

True
False


# Contains Duplicate II

Given an integer array `nums` and an integer $k$, return true if there are two distinct indices $i$ and $j$ in the array such that `nums[i] == nums[j]` and `abs(i - j) <= k`.


```
Example 1:

Input: nums = [1,2,3,1], k = 3
Output: true
Example 2:

Input: nums = [1,0,1,1], k = 1
Output: true
Example 3:

Input: nums = [1,2,3,1,2,3], k = 2
Output: false
```

## Approach - building a dictionary of positions for each integer

Build a dictionary that stores the most recent position of each integer in `nums`. If an integer has already been stored, check whether the distance between the current index and the stored position is at most $k$.

In [None]:
def CountingDuplicate(nums, k):
  if len(nums) <= 1:
    return False
  position = {}
  for i, num in enumerate(nums):
    if num in position and i - position[num] <= k:
      return True
    position[num] = i
  return False

# Example
print(CountingDuplicate([1, 2, 3, 1], 3))
print(CountingDuplicate([1, 0, 1, 1], 1))
print(CountingDuplicate([1, 1, 1, 3, 1, 2, 3], 2))


True
True
True
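
A variant that bounds the extra memory by $k$ keeps only the last $k$ values in a set (a sliding window) instead of every position seen so far. This is a sketch under that assumption; the name `contains_nearby_duplicate_window` is hypothetical.

```python
def contains_nearby_duplicate_window(nums, k):
    window = set()  # values at the (at most) k most recent indices
    for i, num in enumerate(nums):
        if num in window:
            return True
        window.add(num)
        if len(window) > k:
            # Drop the value that is now more than k positions behind
            window.remove(nums[i - k])
    return False

print(contains_nearby_duplicate_window([1, 2, 3, 1], 3))        # True
print(contains_nearby_duplicate_window([1, 0, 1, 1], 1))        # True
print(contains_nearby_duplicate_window([1, 2, 3, 1, 2, 3], 2))  # False
```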
