# **Hashing**

A hash function is a function that takes an input and deterministically converts it to an integer that is less than a fixed size set by the programmer. Inputs are called keys, and the same input will always be converted to the same integer.
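To make the definition concrete, here is a toy hash function for string keys. It is a minimal sketch for illustration only. The name `toy_hash` and the multiplier 31 are arbitrary choices, and Python's own dictionaries use the built-in `hash()` instead. The function is deterministic, and the `% size` keeps every result between `0` and `size - 1`.

```python
def toy_hash(key: str, size: int) -> int:
    """Hypothetical polynomial string hash, for illustration only."""
    h = 0
    for ch in key:
        # Mix in each character's code point; 31 is an arbitrary multiplier
        h = (h * 31 + ord(ch)) % size
    return h  # always in range(size)


size = 10
print(toy_hash("apple", size))
print(toy_hash("apple", size) == toy_hash("apple", size))  # True: same key, same integer
```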

When a hash function is combined with an array, it creates a ***hash map***, also known as a ***hash table*** or ***dictionary***. With arrays, we ***map*** indices to values. With hash maps, we map ***keys*** to values, and a key can be almost anything. Typically, the only constraint on a hash map's key is that it has to be ***immutable*** (this is language dependent but generally a good rule of thumb; in Python, the precise requirement is that the key is hashable). Values can be anything.
Most major languages have a built-in implementation of a hash map. For example, in Python they're called dictionaries, and declaring one is as simple as `dic = {}`.

To summarize, a hash map is an unordered data structure that stores key-value pairs. It does not keep its keys sorted, although Python's `dict` does remember insertion order as of Python 3.7. On average, a hash map can add and remove elements, update the value associated with a key, and check whether a key exists, all in ***O(1)*** time.

The biggest disadvantage of hash maps is overhead: for smaller input sizes, they can be slower than simpler structures such as arrays. Big O ignores constants, so the O(1) time complexity can be deceiving. Every key has to go through the hash function first, and there can also be ***collisions***, which we will talk about in the next section. In practice, a hash map operation often costs something more like "O(10)" than a single step, even though both are formally O(1).

# Collisions

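When different keys convert to the same integer, it is called a collision. If collisions are not handled, older keys get overwritten and data is lost. There are multiple ways to handle collisions; here we'll talk about a common one called ***chaining***. With chaining, each slot of the underlying array holds a list (a "chain") of key-value pairs instead of a single value. A colliding key is appended to its slot's chain, and a lookup scans that chain, comparing the stored keys with the one it is looking for. As long as the chains stay short, operations remain O(1) on average. If many keys collide, operations can degrade to O(n) in the worst case.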
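Below is a minimal sketch of a hash map that resolves collisions by chaining. It assumes a fixed number of buckets and no resizing. Real implementations grow the table as it fills, and Python's `dict` actually uses open addressing rather than chaining. The class name `ChainedHashMap` and its methods are hypothetical and were chosen only for this example. Every bucket is a Python list of `[key, value]` pairs.

```python
class ChainedHashMap:
    """Hypothetical hash map that resolves collisions by chaining."""

    def __init__(self, size: int = 8):
        # Each bucket is a list ("chain") of [key, value] pairs
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Python's built-in hash() picks the bucket; % keeps the index in range
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for pair in bucket:
            if pair[0] == key:  # key already present: update in place
                pair[1] = value
                return
        bucket.append([key, value])  # new key (possibly colliding): extend the chain

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


# With only 2 buckets, these odd keys all collide, yet no data is lost
m = ChainedHashMap(size=2)
for key, value in [(1, "one"), (3, "three"), (5, "five")]:
    m.put(key, value)
print(m.get(3))      # three
m.put(3, "THREE")    # update an existing key
print(m.get(3))      # THREE
print(m.remove(1))   # True
print(m.get(1))      # None
```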

# Sets

A set is another data structure that is very similar to a hash table. It uses the same mechanism for hashing keys into integers. The difference between a set and a hash table is that sets do not map their keys to anything. Sets are more convenient when you only care about checking whether elements exist. On average, you can add an element, remove an element, and check whether an element exists in a set, all in ***O(1)***.

An important thing to note about sets is that they don't track frequency. If you have a set and add the same element 100 times, the first operation adds it and the next 99 do nothing.

    # Declaration: a hash map is declared like any other variable. The syntax is {}
    hash_map = {}

    # If you want to initialize it with some key value pairs, use the following syntax:
    hash_map = {1: 2, 5: 3, 7: 2}

    # Checking if a key exists: simply use the `in` keyword
    1 in hash_map # True
    9 in hash_map # False

    # Accessing a value given a key: use square brackets, similar to an array.
    hash_map[5] # 3

    # Adding or updating a key: use square brackets, similar to an array.
    # If the key already exists, the value will be updated
    hash_map[5] = 6

    # If the key doesn't exist yet, the key value pair will be inserted
    hash_map[9] = 15

    # Deleting a key: use the del keyword. Key must exist or you will get an error.
    del hash_map[9]

    # Get size
    len(hash_map) # 3

    # Get keys: use .keys(). You can iterate over this using a for loop.
    keys = hash_map.keys()
    for key in keys:
        print(key)

    # Get values: use .values(). You can iterate over this using a for loop.
    values = hash_map.values()
    for val in values:
        print(val)

    my_hash_map = {}

    my_hash_map[4] = 83
    print(my_hash_map[4]) # Prints 83

    print(4 in my_hash_map) # Prints True
    print(854 in my_hash_map) # Prints False

    my_hash_map[8] = 327
    my_hash_map[45] = 82523

    for key, val in my_hash_map.items():
        print(f"{key}: {val}")
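The block above covers hash maps. Sets use very similar syntax. Here is a short sketch of the set operations described earlier, which also shows that adding a duplicate has no effect:

```python
# Declaration: use set(); note that {} creates an empty dict, not a set
my_set = set()

# Initialize with some elements
my_set = {1, 2, 3}

# Adding: adding an element that is already present does nothing
my_set.add(4)
for _ in range(100):
    my_set.add(5)  # only the first call changes the set

# Checking if an element exists
print(5 in my_set)    # True
print(10 in my_set)   # False

# Removing: remove() raises KeyError if the element is missing; discard() does not
my_set.remove(1)
my_set.discard(42)    # no error even though 42 is absent

print(len(my_set))    # 4
```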

Example 1: 1. Two Sum

Given an array of integers nums and an integer target, return indices of two numbers such that they add up to target. You cannot use the same index twice.

In [2]:
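def twoSum(nums, target):
    # Maps each number seen so far to its index
    my_dict = {}

    for i, num in enumerate(nums):
        comp = target - num  # the value that would pair with num
        if comp in my_dict:
            # comp appeared earlier: return the current index and its index
            return [i, my_dict[comp]]
        my_dict[num] = i

    # No valid pair exists
    return [-1, -1]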

In [3]:
nums = [5,2,7,3,7]
twoSum(nums, 8)

[3, 0]

Example 2: 2351. First Letter to Appear Twice

Given a string s, return the first character to appear twice. It is guaranteed that the input will have a duplicate character.

In [18]:
def repeatedCharacter(s):
    seen = set()
    for c in s:
        if c in seen:
            return c
        seen.add(c)
    return -1



In [19]:
s = "abccbaacz"
repeatedCharacter(s)

'c'

Example 3: Given an integer array nums, find all the unique numbers x in nums that satisfy the following: x + 1 is not in nums, and x - 1 is not in nums.

In [20]:
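def find_numbers(nums):
    nums_set = set(nums)  # O(1) membership checks instead of O(n) list scans
    ans = set()           # a set keeps each qualifying number only once
    for num in nums_set:
        if num + 1 not in nums_set and num - 1 not in nums_set:
            ans.add(num)
    return ans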

# **Counting**

Counting is a very common pattern with hash maps. By "counting", we are referring to tracking the frequency of things. This means our hash map will be mapping keys to integers. Anytime you need to count anything, think about using a hash map to do it.
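As a quick illustration of counting, here are two equivalent ways to build a frequency map in Python. The first uses a plain dictionary with `dict.get`. The second uses `collections.Counter` from the standard library, which does the same bookkeeping for you.

```python
from collections import Counter

s = "mississippi"

# Option 1: a plain dict; get(c, 0) supplies 0 for characters not seen yet
counts = {}
for c in s:
    counts[c] = counts.get(c, 0) + 1
print(counts)  # {'m': 1, 'i': 4, 's': 4, 'p': 2}

# Option 2: Counter builds the same mapping in one call
counter = Counter(s)
print(counter == counts)  # True
```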

Example 1: You are given a string s and an integer k. Find the length of the longest substring that contains at most k distinct characters.

For example, given s = "eceba" and k = 2, return 3. The longest substring with at most 2 distinct characters is "ece".

In [21]:
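from collections import defaultdict

def find_longest_substring(s, k):
    counts = defaultdict(int)  # character -> frequency inside the current window
    left = ans = 0
    for right in range(len(s)):
        counts[s[right]] += 1  # extend the window to the right
        # Shrink from the left until the window has at most k distinct characters
        while len(counts) > k:
            counts[s[left]] -= 1
            if counts[s[left]] == 0:
                del counts[s[left]]  # drop the key so len(counts) stays accurate
            left += 1

        ans = max(ans, right - left + 1)

    return ans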

Example 2: 2248. Intersection of Multiple Arrays

Given a 2D array nums that contains n arrays of distinct integers, return a sorted array containing all the numbers that appear in all n arrays.

For example, given nums = [[3,1,2,4,5],[1,2,3,4],[3,4,5,6]], return [3, 4]. 3 and 4 are the only numbers that are in all arrays.

In [44]:
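def intersection(nums):
    counts = {}  # number -> how many arrays contain it
    result = []
    for arr in nums:
        seen = set()  # guards against counting a duplicate within one array twice
        for i in arr:
            if i not in seen:
                seen.add(i)
                if i in counts:
                    counts[i] += 1
                else:
                    counts[i] = 1

    # A number is in every array exactly when its count equals the number of arrays
    for key, val in counts.items():
        if val == len(nums):
            result.append(key)
    return sorted(result)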



In [45]:
nums = [[7,34,45,10,12,27,13],[27,21,45,10,12,13]]
intersection(nums)

[10, 12, 13, 27, 45]
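Because each inner array contains distinct integers, the same answer can also be computed with Python's built-in set intersection. This is an alternative to the counting approach above, not a replacement for learning it. The name `intersection_with_sets` is hypothetical, and the sketch assumes `nums` contains at least one array.

```python
def intersection_with_sets(nums):
    # Assumes nums contains at least one array
    # Start from the first array's elements and keep only those found in every other array
    common = set(nums[0])
    for arr in nums[1:]:
        common &= set(arr)
    return sorted(common)


print(intersection_with_sets([[3, 1, 2, 4, 5], [1, 2, 3, 4], [3, 4, 5, 6]]))  # [3, 4]
```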

Example 3: 1941. Check if All Characters Have Equal Number of Occurrences

Given a string s, determine if all characters have the same frequency.

For example, given s = "abacbc", return true. All characters appear twice. Given s = "aaabb", return false. "a" appears 3 times, "b" appears 2 times. 3 != 2.

In [None]:
def areOccurrencesEqual(s):
    md = {}  # character -> frequency
    for c in s:
        if c in md:
            md[c] += 1
        else:
            md[c] = 1

    # Method 1: put all frequencies in a set; they are all equal iff exactly one distinct value remains
    return len(set(md.values())) == 1

    # Method 2: compare every frequency against the first one
    # first_val = next(iter(md.values()))
    # for val in md.values():
    #     if val != first_val:
    #         return False
    # return True


# Count the number of subarrays with an "exact" constraint

Example 4: 560. Subarray Sum Equals K

Given an integer array nums and an integer k, find the number of subarrays whose sum is equal to k.

In [77]:
from collections import defaultdict

def subarraySum(nums, k):
    # count[s] = how many prefixes seen so far have sum s
    count = defaultdict(int)
    count[0] = 1  # the empty prefix has sum 0
    prefix_sum = 0
    ans = 0
    for num in nums:
        prefix_sum += num  # update the running sum
        # A subarray ending here sums to k iff some earlier prefix summed to prefix_sum - k
        if prefix_sum - k in count:
            ans += count[prefix_sum - k]
        count[prefix_sum] += 1  # record the current prefix sum
    return ans

In [78]:
nums = [1, 2, 1, 2, 1]
subarraySum(nums, 3)

4
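The hash map solution relies on one identity. If `prefix[j]` is the sum of the first `j` elements, then the subarray `nums[i:j]` sums to `prefix[j] - prefix[i]`. The quadratic sketch below checks that identity for every pair of prefixes. The name `subarraySum_bruteforce` is hypothetical. The hash map version avoids the inner loop by looking up how many earlier prefixes equal `prefix_sum - k`.

```python
from itertools import accumulate

def subarraySum_bruteforce(nums, k):
    # prefix[j] = sum(nums[:j]); prefix[0] = 0 represents the empty prefix
    prefix = [0] + list(accumulate(nums))
    ans = 0
    # nums[i:j] has sum prefix[j] - prefix[i]; try every pair i < j
    for j in range(1, len(prefix)):
        for i in range(j):
            if prefix[j] - prefix[i] == k:
                ans += 1
    return ans


print(subarraySum_bruteforce([1, 2, 1, 2, 1], 3))  # 4
```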

Example 5: 1248. Count Number of Nice Subarrays

Given an array of positive integers nums and an integer k, find the number of subarrays with exactly k odd numbers in them.

For example, given nums = [1, 1, 2, 1, 1], k = 3, the answer is 2. The subarrays with 3 odd numbers in them are [1, 1, 2, 1] and [1, 2, 1, 1].

In [None]:
from collections import defaultdict
from typing import List

class Solution:
    def numberOfSubarrays(self, nums: List[int], k: int) -> int:
        counts = defaultdict(int)  # counts[c] = number of prefixes containing c odd numbers
        counts[0] = 1              # the empty prefix has 0 odd numbers
        ans = curr = 0

        for num in nums:
            # num % 2 is 1 for odd numbers and 0 for even numbers
            curr += num % 2
            # Each earlier prefix with curr - k odds starts a subarray with exactly k odds
            ans += counts[curr - k]
            counts[curr] += 1

        return ans

 

# More hashing examples

Hash maps are nearly ubiquitous. We've talked about some of the most common patterns, but there is an unlimited number of ways you can incorporate hash maps into an algorithm. Because of how important hash maps are, we'll look at a couple more examples of how hash maps can be used in various problems. It is crucial that you are comfortable with hash maps if you want to pass interviews.

Example 1: 49. Group Anagrams

Given an array of strings strs, group the anagrams together.

For example, given strs = ["eat","tea","tan","ate","nat","bat"], return [["bat"],["nat","tan"],["ate","eat","tea"]].

In [35]:
from collections import defaultdict

def groupAnagrams(strs):
    # Anagrams share the same sorted spelling, so use it as the grouping key
    anagrams = defaultdict(list)
    for word in strs:
        sorted_word = ''.join(sorted(word))
        anagrams[sorted_word].append(word)
    # Convert the dict_values view into a list of groups
    return list(anagrams.values())


In [36]:
strs = ["eat","tea","tan","ate","nat","bat"]
groupAnagrams(strs)

[['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
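Sorting each word costs O(m log m) for a word of length m. An alternative key is a tuple of 26 letter counts, which can be built in O(m). This assumes the strings contain only lowercase English letters, as in this example. The function name `groupAnagramsByCount` is hypothetical. The tuple conversion also illustrates the immutability rule for keys mentioned earlier.

```python
from collections import defaultdict

def groupAnagramsByCount(strs):
    # Assumes every character is a lowercase letter 'a'..'z'
    groups = defaultdict(list)
    for word in strs:
        counts = [0] * 26
        for c in word:
            counts[ord(c) - ord('a')] += 1
        # Lists are mutable (unhashable), so convert to a tuple before using it as a key
        groups[tuple(counts)].append(word)
    return list(groups.values())


print(groupAnagramsByCount(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```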

Example 2: 2260. Minimum Consecutive Cards to Pick Up

Given an integer array cards, find the length of the shortest subarray that contains at least one duplicate. If the array has no duplicates, return -1.

In [62]:
from collections import defaultdict

def minimumCardPickup(cards):
    # card value -> every index where it appears, in increasing order
    val_pos_dict = defaultdict(list)
    ans = float('inf')
    for i in range(len(cards)):
        val_pos_dict[cards[i]].append(i)

    for pos in val_pos_dict.values():
        # Values that appear only once contribute nothing, because this range is empty
        for i in range(1, len(pos)):
            # Length of the subarray spanning two adjacent occurrences (inclusive)
            length = pos[i] - pos[i - 1] + 1
            ans = min(ans, length)
    return ans if ans < float("inf") else -1

In [63]:
cards = [3,4,2,3,4,7]
minimumCardPickup(cards)

4
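Storing every index is not strictly necessary. For the current card, only the most recent earlier occurrence of the same value can give the shortest window ending at it. Here is a single-pass sketch based on that observation; the name `minimumCardPickupOnePass` is hypothetical. It keeps O(n) time and stores only one index per distinct value.

```python
def minimumCardPickupOnePass(cards):
    last_seen = {}  # card value -> index of its most recent occurrence
    ans = float('inf')
    for i, card in enumerate(cards):
        if card in last_seen:
            # Length of the subarray from the previous occurrence to i, inclusive
            ans = min(ans, i - last_seen[card] + 1)
        last_seen[card] = i
    return ans if ans != float('inf') else -1


print(minimumCardPickupOnePass([3, 4, 2, 3, 4, 7]))  # 4
print(minimumCardPickupOnePass([1, 0, 5, 3]))         # -1
```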