# **Problem Statement**  
## **31. Implement a Bloom Filter**

Implement a Bloom Filter, a space-efficient probabilistic data structure that is used to test whether an element is possibly in a set or definitely not in a set.

Bloom filters allow false positives but no false negatives: if the filter says an item is not in the set, the item is guaranteed to be absent; if it says the item is present, it may be wrong with a small probability (a false positive).

You must design the following methods:
1. add(item) – Add an item to the Bloom filter.
2. check(item) – Return True if the item might be present, or False if definitely not present.

### Constraints & Example Inputs/Outputs

- n ≤ 10^6 expected insertions
- False positive rate p ≤ 0.01
- Elements are hashable (e.g., strings, integers).

### Example 1:
```python
bloom = BloomFilter(n=1000, p=0.01)
bloom.add("apple")
bloom.add("banana")

bloom.check("apple")   # → True (probably)
bloom.check("grape")   # → False (definitely not)
```

### Solution Approach

A Bloom filter uses:
- A bit array of size m initialized to all 0s.
- k independent hash functions that map each element to a bit index in [0, m-1].

When adding an element:
- Compute all k hash values for the item.
- Set those indices in the bit array to 1.

When checking membership:
- Compute all k hash indices again.
- If any of those bits is 0, the item is definitely not in the set.
- If all of them are 1, the item is probably in the set.
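
The add and check steps above can be sketched with the standard library alone. This is a minimal illustration, not the solution used later: `bit_indexes` and `TinyBloom` are hypothetical names, salted BLAKE2b stands in for the k hash functions, and one byte per bit is used purely for readability.

```python
import hashlib


def bit_indexes(item, k, m):
    """Yield k bit positions in [0, m-1] for item, one per salted hash."""
    data = str(item).encode("utf-8")
    for seed in range(k):
        # A different salt per seed simulates k separate hash functions.
        salt = seed.to_bytes(8, "little")
        digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
        yield int.from_bytes(digest, "little") % m


class TinyBloom:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability only

    def add(self, item):
        # Set every position the item hashes to.
        for i in bit_indexes(item, self.k, self.m):
            self.bits[i] = 1

    def check(self, item):
        # Any 0 bit proves absence; all 1s only suggests presence.
        return all(self.bits[i] for i in bit_indexes(item, self.k, self.m))


tiny = TinyBloom(m=64, k=3)
tiny.add("apple")
print(sorted(set(bit_indexes("apple", 3, 64))))  # positions set by "apple"
print(tiny.check("apple"))  # True: an added item is never reported absent
```

Because `check` recomputes exactly the positions that `add` set, an added item can never be reported absent; a false positive arises only when other items happen to have set all k of a missing item's positions.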

### Solution Code

```python
# Approach 1: Brute-force baseline (for conceptual comparison)
# An exact set gives exact membership checks and serves as a reference baseline.

class ExactSet:
    def __init__(self):
        self.data = set()

    def add(self, item):
        self.data.add(item)

    def check(self, item):
        return item in self.data
```

Time Complexity: O(1) on average per operation

Space Complexity: O(n), since every element is stored in full (can be large)

### Alternative Solution

```python
# The original run installed mmh3 5.2.0 and bitarray 3.7.2 on CPython 3.12.
!pip3 install mmh3
!pip3 install bitarray
```


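```python
# Approach 2: Optimized approach (Bloom filter implementation)
import math

import mmh3  # MurmurHash3
from bitarray import bitarray


class BloomFilter:
    def __init__(self, n, p):
        """
        n: Expected number of elements
        p: Desired false positive probability (0 < p < 1)
        """
        if n <= 0 or not 0 < p < 1:
            raise ValueError("n must be positive and p must be in (0, 1)")
        self.size = self.get_size(n, p)
        self.hash_count = self.get_hash_count(self.size, n)
        self.bit_array = bitarray(self.size)
        self.bit_array.setall(0)

    def _indexes(self, item):
        """Yield hash_count bit positions for item, one per MurmurHash3 seed."""
        # mmh3.hash accepts only str or bytes, so other hashable items
        # (e.g. integers) are converted to their string form first.
        key = item if isinstance(item, (str, bytes)) else str(item)
        for seed in range(self.hash_count):
            # Python's % returns a non-negative index even for a negative hash.
            yield mmh3.hash(key, seed) % self.size

    def add(self, item):
        for index in self._indexes(item):
            self.bit_array[index] = 1

    def check(self, item):
        # A single 0 bit proves the item was never added.
        return all(self.bit_array[index] for index in self._indexes(item))

    @staticmethod
    def get_size(n, p):
        """Returns the size of bit array (m) to use for n items and false positive rate p"""
        m = -(n * math.log(p)) / (math.log(2) ** 2)
        return math.ceil(m)  # round up rather than truncate

    @staticmethod
    def get_hash_count(m, n):
        """Returns the optimal number of hash functions (k)"""
        k = (m / n) * math.log(2)
        return max(1, round(k))  # nearest integer, at least one hash function
```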


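### Step-by-step walkthrough

Example Initialization

For 10,000 expected items and a 1% false positive rate:

- Bit array size m = -n·ln(p) / (ln 2)² ≈ 95,851 bits (~12 KB)
- Number of hash functions k = (m / n)·ln 2 ≈ 6.64, rounded to 7

So we accept a small false-positive probability in exchange for large space savings: about 9.6 bits per item regardless of item size, with O(k) time per add or lookup.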
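
The sizing arithmetic can be reproduced with a few lines of standard-library Python. This is a sketch under the usual Bloom filter formulas; `bloom_parameters` and `expected_fp_rate` are hypothetical helper names, and the rate estimate uses the standard approximation (1 - e^(-kn/m))^k.

```python
import math


def bloom_parameters(n, p):
    """Return (m, k): bit-array size and hash count for n items at rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))
    return m, k


def expected_fp_rate(m, k, n):
    """Approximate false-positive rate after n insertions."""
    return (1 - math.exp(-k * n / m)) ** k


m, k = bloom_parameters(10_000, 0.01)
print(m, k)                       # 95851 7
print(f"{m / 8 / 1024:.1f} KiB")  # 11.7 KiB
print(f"{expected_fp_rate(m, k, 10_000):.4f}")
```

The same helper applied to the upper constraint (n = 10^6, p = 0.01) gives roughly 9.6 million bits, a little over 1 MB, which is why the structure stays practical at that scale.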

### Alternative Approaches

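| Method                    | Description                                             | Pros                                                               | Cons                                                                         |
| ------------------------- | ------------------------------------------------------- | ------------------------------------------------------------------ | ---------------------------------------------------------------------------- |
| **Counting Bloom Filter** | Uses small counters instead of single bits              | Supports deletion                                                  | Several times more memory (commonly 4 bits per counter)                      |
| **Cuckoo Filter**         | Stores short fingerprints in buckets via cuckoo hashing | Supports deletion; can use less space at low false-positive rates  | More complex to implement; insertions can fail when the table is nearly full |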
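
To make the first row concrete, here is a minimal counting Bloom filter sketch that supports removal. It is an illustration under assumptions rather than a tuned implementation: `CountingBloomFilter` and `remove` are hypothetical names, salted BLAKE2b replaces MurmurHash3 so the block needs only the standard library, and a plain Python list of integers stands in for packed 4-bit counters.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter variant that keeps a counter per slot so items can be removed."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _indexes(self, item):
        data = str(item).encode("utf-8")
        for seed in range(self.k):
            salt = seed.to_bytes(8, "little")
            digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, item):
        for i in self._indexes(item):
            self.counters[i] += 1

    def check(self, item):
        return all(self.counters[i] > 0 for i in self._indexes(item))

    def remove(self, item):
        # Callers should only remove items they actually added: removing a
        # false positive would decrement counters that belong to other items.
        if not self.check(item):
            return False
        for i in self._indexes(item):
            self.counters[i] -= 1
        return True


cbf = CountingBloomFilter(m=1024, k=3)
cbf.add("apple")
cbf.add("banana")
cbf.remove("apple")
print(cbf.check("banana"))  # True: removing "apple" leaves "banana" intact
```

Removal works because each slot records how many insertions touched it, so decrementing undoes exactly one insertion without clearing slots that other items still rely on.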


### Test Cases 

```python
def test_bloom_filter():
    bloom = BloomFilter(n=1000, p=0.01)
    elements_added = ["apple", "banana", "cherry", "mango"]
    elements_not_added = ["grape", "pineapple", "kiwi"]

    # Add elements
    for e in elements_added:
        bloom.add(e)

    # Check elements (added ones)
    for e in elements_added:
        assert bloom.check(e), f"{e} should probably be in the Bloom Filter"

    # Check elements (not added)
    false_positive_count = 0
    for e in elements_not_added:
        if bloom.check(e):
            false_positive_count += 1

    print("Elements added correctly detected.")
    print(f"False positives detected: {false_positive_count}/{len(elements_not_added)}")

test_bloom_filter()
```

Output:

```
Elements added correctly detected.
False positives detected: 0/3
```
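
Three probes are too few to say anything about the false-positive rate, so a larger measurement can be sketched as follows. This block is self-contained and does not use the `BloomFilter` class above: `StdlibBloom` and `measure_false_positive_rate` are hypothetical names, and the k indexes are derived by double hashing from a single BLAKE2b digest, which is a substitution for the seeded MurmurHash3 calls in the solution.

```python
import hashlib
import math


class StdlibBloom:
    def __init__(self, n, p):
        self.m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
        self.k = max(1, round((self.m / n) * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)  # packed, 8 bits per byte

    def _indexes(self, item):
        # Double hashing: derive k indexes from two 64-bit halves of one digest.
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for i in self._indexes(item):
            self.bits[i // 8] |= 1 << (i % 8)

    def check(self, item):
        return all(self.bits[i // 8] & (1 << (i % 8)) for i in self._indexes(item))


def measure_false_positive_rate(n=1000, p=0.01, probes=10_000):
    bloom = StdlibBloom(n, p)
    for i in range(n):
        bloom.add(f"member-{i}")
    # None of the probe keys were added, so every hit is a false positive.
    hits = sum(bloom.check(f"probe-{i}") for i in range(probes))
    return hits / probes


print(f"observed false-positive rate: {measure_false_positive_rate():.4f}")
```

With the filter filled to its design capacity, the observed rate should land near the requested `p`; a value far above it would point to a sizing or hashing bug.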


### Complexity Analysis

| Operation | Time Complexity   | Space Complexity | Notes                                                 |
| --------- | ----------------- | ---------------- | ----------------------------------------------------- |
| Add       | O(k)              | O(m) bits total  | k hash evaluations and k bit writes                   |
| Check     | O(k)              | O(m) bits total  | False positives possible; no false negatives          |
| Delete    | Not supported     | N/A              | Clearing a bit could remove other items that share it |
| Accuracy  | Controlled by `p` | N/A              | Lower `p` = larger memory                             |

