#### [Python <img src="../../assets/pythonLogo.png" alt="py logo" style="height: 1em; vertical-align: sub;">](../README.md) | Easy 🟢 | [Arrays & Hashing](README.md)
# [705. Design HashSet](https://leetcode.com/problems/design-hashset/description/) (In prog 👷)

Design a HashSet without using any built-in hash table libraries.

Implement `MyHashSet` class:
- `void add(key)` Inserts the value `key` into the HashSet.
- `bool contains(key)` Returns whether the value `key` exists in the HashSet or not.
- `void remove(key)` Removes the value `key` in the HashSet. If `key` does not exist in the HashSet, do nothing.
 
#### Example 1:
> **Input**:  
> `["MyHashSet", "add", "add", "contains", "contains", "add", "contains", "remove", "contains"]`  
> `[[], [1], [2], [1], [3], [2], [2], [2], [2]]`  
> **Output**:  
> `[null, null, null, true, false, null, true, null, false]`  
> 
> **Explanation**  
> `MyHashSet myHashSet = new MyHashSet();`  
> `myHashSet.add(1);`      // set = [1]  
> `myHashSet.add(2);`      // set = [1, 2]  
> `myHashSet.contains(1);` // return True  
> `myHashSet.contains(3);` // return False, (not found)  
> `myHashSet.add(2);`      // set = [1, 2]  
> `myHashSet.contains(2);` // return True  
> `myHashSet.remove(2);`   // set = [1]  
> `myHashSet.contains(2);` // return False, (already removed)

#### Constraints:
- $0 \leq$ `key` $ \leq 10^6$
- At most $10^4$ calls will be made to add, remove, and contains.


## Problem Explanation
- The task here is to design a simple HashSet, which is a data structure that stores a collection of unique elements.
- Unlike a list or array, a HashSet does not allow for duplicate elements, and the order of elements is not guaranteed.
- The challenge is to implement this functionality without using or relying on built-in hash table libraries. There are several methods and data structures we can use to do this.
***

# Approach 1: Dynamic Array as a Bucket
For this approach we'll use a dynamic array (a Python `list`) to store the elements of the HashSet. Each element is placed directly into the array, with a check in place to ensure that no duplicates are inserted.

## Intuition
- The dynamic array offers a straightforward way to store elements. Appending is amortized constant time, but checking for a key and removing a key both require a linear scan, so this approach is only practical when the number of stored keys is small.
- This approach leverages Python's built-in list operations to manage elements, which keeps the implementation of the HashSet simple.

## Algorithm
1. **Initialization**: Create an empty dynamic array to represent the `HashSet`.
2. **`Add(key)`:** 
    - Before adding a new `key`, check if it already exists in the `HashSet` to maintain the uniqueness of elements.
    - If the key is not present, append it to the array.
3. **`Remove(key)`:** 
    - Check if the key already exists in the `HashSet`.
    - If so, remove the key from the array.
4. **`Contains(key)`:** 
    - Return `True` if the key exists in the `HashSet`; otherwise return `False`.

## Code Implementation

In [1]:
class MyHashSet:

    def __init__(self):
        # Initialize the dynamic array for the HashSet
        self.hashset = []

    def add(self, key: int) -> None:
        # Add a key to the HashSet if it's not already present
        if not self.contains(key):
            self.hashset.append(key)

    def remove(self, key: int) -> None:
        # Remove the key from the HashSet if it exists
        if self.contains(key):
            self.hashset.remove(key)

    def contains(self, key: int) -> bool:
        # Check if the key exists in the HashSet
        return key in self.hashset

# Your MyHashSet object will be instantiated and called as such:
# obj = MyHashSet()
# obj.add(key)
# obj.remove(key)
# param_3 = obj.contains(key)

## Testing

In [13]:
def test_myHashSet(HashSetClass):
    print("Testing Implementation of MyHashSet")
    print("-----------------------------------")

    # Initialize the HashSet with the given class
    myHashSet = HashSetClass()
    simulated_state = set()  # Simulate the expected state of the HashSet for display
    operations = [
        ('add', 1),
        ('add', 2),
        ('contains', 1, True),
        ('contains', 3, False),
        ('add', 2),  # Duplicate add to test idempotence
        ('contains', 2, True),
        ('remove', 2),
        ('contains', 2, False)
    ]

    test_passed = True
    test_counter = 1

    for operation in operations:
        op, key, *expected_result = operation

        if op == 'add':
            myHashSet.add(key)
            simulated_state.add(key)
            print(f"After '{op}({key})': HashSet state = {sorted(simulated_state)}")
        elif op == 'remove':
            myHashSet.remove(key)
            simulated_state.discard(key)  # Use discard to avoid KeyError if key is not present
            print(f"After '{op}({key})': HashSet state = {sorted(simulated_state)}")
        elif op == 'contains':
            result = myHashSet.contains(key)
            expected = expected_result[0]
            print(f"Operation '{op}({key})': Expected = {expected}, Got = {result}")
            if result == expected:
                print(f"Test {test_counter}: Passed.")
            else:
                print(f"Test {test_counter}: Failed. Expected = {expected}, Got = {result}")
                test_passed = False
            test_counter += 1
            continue  # Skip the next print statement for 'contains' operation

        print(f"Test {test_counter}: Passed.")
        test_counter += 1

    if test_passed:
        print("All tests passed for MyHashSet.")
    else:
        print("Some tests failed for MyHashSet.")

print("Approach: Dynamic Array as a Bucket")
test_myHashSet(MyHashSet)


Approach: Dynamic Array as a Bucket
Testing Implementation of MyHashSet
-----------------------------------
After 'add(1)': HashSet state = [1]
Test 1: Passed.
After 'add(2)': HashSet state = [1, 2]
Test 2: Passed.
Operation 'contains(1)': Expected = True, Got = True
Test 3: Passed.
Operation 'contains(3)': Expected = False, Got = False
Test 4: Passed.
After 'add(2)': HashSet state = [1, 2]
Test 5: Passed.
Operation 'contains(2)': Expected = True, Got = True
Test 6: Passed.
After 'remove(2)': HashSet state = [1]
Test 7: Passed.
Operation 'contains(2)': Expected = False, Got = False
Test 8: Passed.
All tests passed for MyHashSet.


## Complexity Analysis
- ### Time Complexity: 
    - **`add(key)`:** $O(n)$ in the worst case, since we need to check if the key exists before adding it.
    - **`remove(key)`:** $O(n)$ since it needs to search the key before removing it.
    - **`contains(key)`:** $O(n)$ due to the linear search through the array to find the key.
- ### Space Complexity: $O(n)$
    - $n$ is the number of unique keys added to the HashSet. This space is required to store the keys in the dynamic array.
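
To make the linear cost concrete, the sketch below counts how many elements a left-to-right scan inspects before it stops. `count_comparisons` is a hypothetical helper written only for illustration; it mimics what `key in self.hashset` does conceptually and is not part of the solution.

```python
def count_comparisons(items: list, key: int) -> int:
    """Return how many elements a left-to-right scan inspects while looking for key."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == key:
            break  # stop as soon as the key is found
    return comparisons


keys = list(range(1000))                 # 1000 unique keys in insertion order
first = count_comparisons(keys, 0)       # key stored at the front
last = count_comparisons(keys, 999)      # key stored at the back
missing = count_comparisons(keys, 5000)  # key that was never added

print(first, last, missing)
```

A key at the front is found after one comparison, while a key at the back or a missing key forces the scan to touch every stored element.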
***

# Approach 2: Array of Buckets with Chaining
We'll stick with an array, but this time use it to store buckets, where each bucket is a list that chains together all the elements that hash to the same index. This is a classic approach to handling collisions in a hash table.

## Intuition
- The main idea is to distribute all possible keys evenly across a fixed number of buckets to minimize the chances of collisions.
- When a collision does occur (i.e. two different keys hash to the same bucket), the colliding elements are stored together in a list (the chain) at that bucket's index. 
- By doing so, the load is distributed, and while searching for an element, we only need to look through the elements in the corresponding bucket.
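
As a small illustration of the idea above, the sketch below places a few keys into buckets with a modulo hash. The bucket count of 1000 and the helper name `bucket_index` are assumptions chosen for the example.

```python
NUM_BUCKETS = 1000  # assumed bucket count, chosen for illustration


def bucket_index(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a key to a bucket index with a simple modulo hash."""
    return key % num_buckets


buckets = [[] for _ in range(NUM_BUCKETS)]
for key in (1, 2, 1001, 2001):
    buckets[bucket_index(key)].append(key)

chain_one = buckets[1]  # 1, 1001 and 2001 collide and share this chain
chain_two = buckets[2]  # 2 sits alone in its bucket

print(chain_one, chain_two)
```

Looking up `2001` only has to scan the three keys chained in bucket 1, not every key in the set.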

## Algorithm
1. **Initialization:** Create an array of empty buckets (lists) based on a predefined size. (We can call this `numBuckets`.)
2. **Hash Function (`_hash`):** A simple mod operation, `key % numBuckets`, maps each key to one of the available buckets. It spreads keys uniformly as long as the keys themselves are spread evenly.
3. **`add(key)`:**
    - Compute the index for the key
    - Add the key to the bucket if it's not already present to ensure uniqueness within the `HashSet`.
4. **`remove(key)`:**
    - Find the bucket where the key would be
    - Remove the key from the bucket if present
5. **`contains(key)`:**
    - Check if the key is in the corresponding bucket

## Code Implementation

In [10]:
class MyHashSet2:
    def __init__(self, numBuckets=1000):
        # Initialize the buckets for the HashSet
        self.buckets = [[] for _ in range(numBuckets)]
        self.numBuckets = numBuckets    # Number of buckets in the HashSet

    # The leading underscore marks this method as private by convention
    def _hash(self, key):
        return key % self.numBuckets    # Hash function to determine the bucket index
    
    def add(self, key: int) -> None:
        bucket_index = self._hash(key)  # Find the bucket index for the key
        if key not in self.buckets[bucket_index]:  # Add the key to the bucket if it's not present
            self.buckets[bucket_index].append(key)
    
    def remove(self, key: int) -> None:
        bucket_index = self._hash(key)  # Find the bucket index for the key
        bucket = self.buckets[bucket_index] # Get the bucket for the key
        if key in bucket:   # Remove the key from the bucket if it's present
            bucket.remove(key)
    
    def contains(self, key: int) -> bool:
        bucket_index = self._hash(key)  # Find the bucket index for the key
        return key in self.buckets[bucket_index]    # Check if the key exists in the bucket

## Testing

In [14]:
print("Approach: Array of Buckets with Chaining")
test_myHashSet(MyHashSet2)

Approach: Array of Buckets with Chaining
Testing Implementation of MyHashSet
-----------------------------------
After 'add(1)': HashSet state = [1]
Test 1: Passed.
After 'add(2)': HashSet state = [1, 2]
Test 2: Passed.
Operation 'contains(1)': Expected = True, Got = True
Test 3: Passed.
Operation 'contains(3)': Expected = False, Got = False
Test 4: Passed.
After 'add(2)': HashSet state = [1, 2]
Test 5: Passed.
Operation 'contains(2)': Expected = True, Got = True
Test 6: Passed.
After 'remove(2)': HashSet state = [1]
Test 7: Passed.
Operation 'contains(2)': Expected = False, Got = False
Test 8: Passed.
All tests passed for MyHashSet.


## Complexity Analysis
- **Variables**:
    - $n$ is the number of elements
    - $k$ is the number of buckets
    - $m$ is the number of unique elements inserted into the HashSet.
    
- ### Time Complexity: 
    - **Average case for add, remove, & contains:** $O(1 + \frac{n}{k})$
        - Ideally, the hash function should distribute the elements uniformly across the buckets. 
        - Assuming a good distribution, the average time complexity is $O(1 + \frac{n}{k})$, where $\frac{n}{k}$ is the average number of elements per bucket, also known as the load factor.
        - When $k$ is large enough that the load factor $\frac{n}{k}$ stays small, the average time is effectively $O(1)$. Here there are at most $10^4$ calls and $k = 1000$ buckets by default, so the load factor never exceeds $10$.
    - **Worst Case:**
        - The operations degrade to $O(n)$ when the hash function doesn't distribute the elements evenly and most of the elements end up in a single bucket.
- ### Space Complexity: $O(k + m)$
    - This reflects the space being used since $k$ is the number of buckets which is a fixed amount, and $m$ is the number of unique elements being inserted into the `HashSet`.
    - $O(k)$ accounts for the initial array of bucket lists.
    - $O(m)$ accounts for the actual data (unique elements) being stored into the `HashSet`.
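
The average and worst cases above can be checked numerically. The sketch below counts how many keys land in each bucket for two key sets of the same size. The helper `bucket_lengths` and the choice of $k = 100$ buckets are assumptions made for illustration, with $k$ picked so the clustered keys stay within the $10^6$ constraint.

```python
def bucket_lengths(keys, num_buckets: int) -> list:
    """Count how many of the given (unique) keys fall into each bucket."""
    lengths = [0] * num_buckets
    for key in keys:
        lengths[key % num_buckets] += 1
    return lengths


k = 100      # assumed number of buckets
n = 10_000   # number of unique keys in each experiment

spread = bucket_lengths(range(n), k)            # consecutive keys 0..9999
clustered = bucket_lengths(range(0, n * k, k), k)  # multiples of k: 0, 100, 200, ...

load_factor = n / k
print(load_factor, max(spread), max(clustered))
```

With consecutive keys every chain has exactly `n / k` elements, which is the average case. With keys that are all multiples of `k`, every key lands in bucket 0, so a single chain holds all `n` elements and each operation becomes a linear scan.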
***

# Approach 3: LinkedList as Bucket
This approach implements the HashSet with an array of linked lists as buckets, where each bucket is responsible for storing all the keys that hash to the same index. It is similar to the previous approach and also uses "chaining".

## Intuition
- The primary intuition behind using a LinkedList is that it lets us handle collisions efficiently.
- When two or more keys hash to the same index, rather than overwriting the existing key, the new key is added to the end of the linked list at that index. This way each bucket can store multiple keys, and the linked list structure allows a node to be inserted or deleted by updating a pointer once its position is found.

## Algorithm
1. **Initialization:** Create an array where each element is the head of a linked list (which is initially `null`) to represent the buckets.
2. **Hash Function:** A simple hash function that maps a key to a bucket index. This is usually done by the modulus operation (i.e. `key % number_of_buckets`).
3. **Add key:**
    - Compute the bucket index for the key.
    - Check if the key already exists in the linked list at that bucket index. If it doesn't, insert it to the beginning or end of the list.
4. **Remove key:**
    - Find the bucket for the key.
    - Search the linked list at that index for the key and remove it if found.
5. **Contains Key**:
    - Determine the bucket index for the key.
    - Search the linked list at that bucket index to check if the key exists.
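
The add step allows inserting at either end of the list. As a minimal sketch of the head-insertion variant, the code below works on a single chain; `Node`, `add_at_head`, and `to_list` are hypothetical names used only here, and the implementation in this notebook appends at the tail instead.

```python
class Node:
    """A singly linked list node holding one key."""

    def __init__(self, key, next=None):
        self.key = key
        self.next = next


def add_at_head(head, key):
    """Return the head of the chain after adding key, skipping duplicates."""
    curr = head
    while curr:
        if curr.key == key:
            return head        # key already present, chain unchanged
        curr = curr.next
    return Node(key, head)     # new node becomes the head of the chain


def to_list(head) -> list:
    """Collect the keys of a chain into a Python list for inspection."""
    keys = []
    while head:
        keys.append(head.key)
        head = head.next
    return keys


head = None
for key in (1, 1001, 1, 2001):   # the second 1 is a duplicate
    head = add_at_head(head, key)

chain = to_list(head)
print(chain)
```

The duplicate check still walks the whole chain, so head insertion does not change the asymptotic cost; it only avoids keeping track of the tail.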

## Code Implementation

In [15]:
class ListNode:
    def __init__(self, key, next=None):
        self.key = key
        self.next = next

class MyHashSet3:
    def __init__(self):
        self.size = 1000                     # number of buckets in the HashSet
        self.buckets = [None] * self.size    # Initialize the buckets for the HashSet

    def _hash(self, key):
        return key % self.size    # Hash function to determine the bucket index
    
    def add(self, key: int) -> None:
        index = self._hash(key)    # Find the bucket index for the key
        if not self.buckets[index]:         # If the bucket is empty, create a new ListNode with the key
            self.buckets[index] = ListNode(key)
        else:                    # If the bucket is not empty, traverse the list to find the key
            curr = self.buckets[index]
            while curr:
                if curr.key == key:     # key already exists, return
                    return
                if not curr.next:       # reached the tail without finding the key
                    curr.next = ListNode(key)       # insert the key at the end of the list
                    return
                curr = curr.next

    def remove(self, key: int) -> None: 
        index = self._hash(key)    # Find the bucket index for the key
        curr = self.buckets[index] # Get the head node for the bucket
        prev = None               # Initialize the previous node to None
        while curr:
            if curr.key == key:     # If the key is found
                if prev:        # If the previous node exists
                    prev.next = curr.next       # remove by skipping the current node
                else:       # If the previous node does not exist
                    self.buckets[index] = curr.next     # remove the head node
                return    # return after removing the key
            prev = curr                     # update the previous node
            curr = curr.next                # move to the next node
    
    def contains(self, key: int) -> bool:
        index = self._hash(key)    # Find the bucket index for the key
        curr = self.buckets[index]
        while curr:             # Traverse the list to find the key
            if curr.key == key:    # If the key is found, return True
                return True
            curr = curr.next        # Move to the next node
        return False        # If the key is not found, return False

## Testing

In [16]:
print("Approach: LinkedList as Bucket")
test_myHashSet(MyHashSet3)

Approach: LinkedList as Bucket
Testing Implementation of MyHashSet
-----------------------------------
After 'add(1)': HashSet state = [1]
Test 1: Passed.
After 'add(2)': HashSet state = [1, 2]
Test 2: Passed.
Operation 'contains(1)': Expected = True, Got = True
Test 3: Passed.
Operation 'contains(3)': Expected = False, Got = False
Test 4: Passed.
After 'add(2)': HashSet state = [1, 2]
Test 5: Passed.
Operation 'contains(2)': Expected = True, Got = True
Test 6: Passed.
After 'remove(2)': HashSet state = [1]
Test 7: Passed.
Operation 'contains(2)': Expected = False, Got = False
Test 8: Passed.
All tests passed for MyHashSet.


## Complexity Analysis
- **Variables**:
    - $N$ is the total number of keys
    - $K$ is the number of buckets
    - $M$ is the number of unique elements inserted into the HashSet.
    
- ### Time Complexity: 
    - **Average case for add, remove, & contains:** $O(1+\frac{N}{K})$
        - On average, each operation pays for the hash computation plus a search through a linked list of average length $\frac{N}{K}$.
    - **Worst case:** $O(N)$
        - When all the keys land in the same bucket, every operation has to walk a single list of length $N$.
- ### Space Complexity: $O(K + M)$
    - $O(K)$ accounts for the space of the buckets
    - $O(M)$ accounts for the space of the LinkedList nodes.
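
The shared test harness only uses the keys 1, 2, and 3, which all hash to different buckets when there are 1000 of them, so it never exercises a chain with more than one key. The sketch below is one way to cover that case. `check_colliding_keys` is a hypothetical helper, and `ListSet` is a stand-in class included only so the example runs on its own.

```python
class ListSet:
    """Hypothetical stand-in with the MyHashSet interface, used only to run the check."""

    def __init__(self):
        self.items = []

    def add(self, key: int) -> None:
        if key not in self.items:
            self.items.append(key)

    def remove(self, key: int) -> None:
        if key in self.items:
            self.items.remove(key)

    def contains(self, key: int) -> bool:
        return key in self.items


def check_colliding_keys(hashset_class, num_buckets: int = 1000, count: int = 5) -> bool:
    """Return True if the set handles keys that share a bucket under key % num_buckets."""
    hs = hashset_class()
    keys = [7 + i * num_buckets for i in range(count)]  # 7, 1007, 2007, ...
    for key in keys:
        hs.add(key)
    if not all(hs.contains(key) for key in keys):
        return False               # a colliding key was lost during add
    hs.remove(keys[0])             # remove the first key that was chained
    if hs.contains(keys[0]):
        return False               # the removed key is still reported
    return all(hs.contains(key) for key in keys[1:])  # the rest must survive


result = check_colliding_keys(ListSet)
print(result)
```

Passing `MyHashSet2` or `MyHashSet3` instead of `ListSet` runs the same check against the bucketed versions; the default `num_buckets=1000` matches their bucket count.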
***

# Approach 4:


## Intuition


## Algorithm


# Approach 5: Using Python Built-in Set 😔


## Intuition


## Algorithm
