# DGIM Implementation

The DGIM algorithm estimates the number of 1s ("True" values) among the last N elements of a stream of 0s and 1s. These N elements form the sliding window. The implementation uses a compact data structure that takes $O(\log^2 N)$ bits of space. An explanation can be found in Chapter 4 of the book.
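As a rough sketch of where that space bound comes from, assuming the standard description of DGIM (bucket sizes are powers of two, only a constant number of buckets of each size is kept, and each bucket stores a timestamp modulo a value proportional to N):

$$
\underbrace{O(\log N)}_{\text{distinct bucket sizes}} \times \underbrace{O(1)}_{\text{buckets per size}} \times \underbrace{O(\log N)}_{\text{bits per timestamp}} = O(\log^2 N) \text{ bits}
$$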

In the initialize function below, initialize the following:
1. Set the error rate. Make sure that the error rate is in (0, 1]. Let c be the actual count and e be the estimate produced by the implementation. Then $|c - e| < error\_rate \cdot c$.

2. Now set the maximum number of buckets of the same size (b) that the algorithm maintains. Set $b = \lceil 1/error\_rate \rceil$ and make sure that b is at least 2.

3. The data structure that holds the buckets can be an array of queues. queue[i] holds the timestamps of the buckets of size $2^i$ in descending order (newest first). This makes it easy to update the buckets and to know at any time how many buckets of a given size exist.
        a. First, initialize an array.
        b. Next, find the largest queue index that can be needed, which is $\lceil \log_2 N \rceil$.
        c. Initialize one queue for each index from 0 up to that value.
4. Set two variables, currentTimeStamp and oldestTimeStamp. They start at 0 and -1, where -1 means that there is no bucket yet.
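To make the sizing rules above concrete, here is a minimal sketch that only computes b and the number of queues for a given window. The helper name `dgim_sizing` is hypothetical and not part of the assignment; it assumes that b is rounded up and clamped to at least 2.

```python
import math

def dgim_sizing(N, error_rate):
    """Return (b, number_of_queues) for a window of N elements (illustrative helper)."""
    if not (0 < error_rate <= 1):
        raise ValueError("error_rate should be in (0, 1]")
    # maximum number of buckets of the same size, never fewer than 2
    b = max(math.ceil(1 / error_rate), 2)
    # one queue per bucket size 2**0 .. 2**ceil(log2(N)); none for an empty window
    number_of_queues = 0 if N == 0 else math.ceil(math.log2(N)) + 1
    return b, number_of_queues

print(dgim_sizing(32, 0.5))     # (2, 6)
print(dgim_sizing(1000, 0.25))  # (4, 11)
```

A smaller error rate raises b, so more buckets of each size are kept, while the number of queues depends only on N.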

In [40]:
## WRITE THE FUNCTION ACCORDINGLY AS INSTRUCTED
import math
from collections import deque

def initialize(N, error_rate=0.5):
    if not (0 < error_rate <= 1):
        error_msg = ("Invalid value for error_rate: {}. "
                     "Error rate should be in (0, 1].".format(error_rate))
        raise ValueError(error_msg)
    # maximum number of buckets of the same size, at least 2
    b = max(math.ceil(1 / error_rate), 2)
    # highest queue index: buckets can grow up to size 2**ceil(log2(N))
    if N == 0:
        max_index = -1
    else:
        max_index = int(math.ceil(math.log2(N)))
    # queues[i] holds the timestamps of the buckets of size 2**i, newest first
    queues = [deque() for _ in range(max_index + 1)]
    currentTimeStamp = 0
    oldest_timestamp = -1  # -1 means that there is no bucket yet
    return [currentTimeStamp, oldest_timestamp, b, queues]

Now write the update function for adding a new element by doing the following:

1. For every new element (0 or 1):
    1. Increment currentTimeStamp modulo 2*N.
    2. Check whether the oldest bucket is too old, that is, whether it has left the window. If it has, delete it.
2. If the new element is 0, stop here.
3. If there is no bucket yet, set oldestTimeStamp to currentTimeStamp.
4. Starting from the queue of the smallest buckets, do the following for each queue:
    1. Add the new timestamp to the front of the queue.
    2. If the queue now holds at most b buckets, stop.
    3. Otherwise, pop the two oldest buckets from the queue and merge them. The merged bucket keeps the more recent of the two timestamps and is carried over to the next queue.
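The cascading merge in the last step can be traced on a toy example. The sketch below is a simplified illustration, not the full update function: it ignores the window (no expiry and no modular timestamps), and the helper name `insert_one` is hypothetical. It assumes at most b = 2 buckets of each size.

```python
from collections import deque

def insert_one(queues, timestamp, b=2):
    """Insert a bucket of size 1 and merge upwards while a queue overflows."""
    carry_over = timestamp
    for queue in queues:
        queue.appendleft(carry_over)   # the newest bucket goes to the front
        if len(queue) <= b:
            break                      # no overflow, nothing to merge
        queue.pop()                    # oldest bucket, absorbed by the merge
        second_oldest = queue.pop()
        # the merged bucket keeps the more recent of the two timestamps
        carry_over = second_oldest

queues = [deque() for _ in range(4)]   # bucket sizes 1, 2, 4, 8
for t in range(1, 8):                  # seven 1s arriving at times 1..7
    insert_one(queues, t)

print([list(q) for q in queues])       # [[7], [6], [4], []]
```

After seven 1s there is one bucket each of size 1, 2 and 4, which together account for all seven elements.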

In [48]:
def update(element):
    # N, b and queues are read from module-level names set up with initialize();
    # the two timestamps are reassigned here, so they must be declared global
    global currentTimeStamp, oldest_timestamp
    if N == 0:
        return
    currentTimeStamp = (currentTimeStamp + 1) % (2 * N)
    # check if the oldest bucket has left the window
    if oldest_timestamp >= 0 and checkIfBucketTooOld(oldest_timestamp):
        deleteOldestBucket()
    if not element:
        # nothing to add for a 0
        return
    carry_over = currentTimeStamp
    if oldest_timestamp == -1:
        oldest_timestamp = currentTimeStamp
    for queue in queues:
        queue.appendleft(carry_over)
        if len(queue) <= b:
            break
        last = queue.pop()
        second_last = queue.pop()
        # merge the two oldest buckets; the merged bucket keeps the newer timestamp
        carry_over = second_last
        if last == oldest_timestamp:
            oldest_timestamp = second_last

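Now write the count function to return an estimate of the number of 1s among the last N elements.
1. For all the queues:
    1. Find the queue length.
    2. For every non-empty queue, add the length times the bucket size to the count. The bucket size is a power of two, starting from 1 for the first queue. Remember the size of the largest non-empty bucket.
2. Subtract half of the size of the largest bucket from the count and return the result. Only part of this oldest bucket may still lie inside the window.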
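As a worked example of the arithmetic, the sketch below applies the rule used by the count function in this notebook: sum all bucket sizes, then subtract half of the largest bucket size. The helper name `estimate_from_lengths` and the queue lengths are made up for illustration.

```python
def estimate_from_lengths(queue_lengths):
    """queue_lengths[i] is the number of buckets of size 2**i."""
    total = 0
    largest = 0
    for i, length in enumerate(queue_lengths):
        size = 1 << i                  # bucket size 2**i
        if length > 0:
            largest = size             # largest non-empty bucket size so far
            total += length * size
    return total - largest // 2

# 2*1 + 1*2 + 2*4 + 1*8 + 1*16 = 36, minus 16 // 2 = 8
print(estimate_from_lengths([2, 1, 2, 1, 1]))  # 28
```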

In [45]:
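def count():
    result = 0
    max_value = 0
    power_of_two = 1
    # queues is the module-level list of queues set up with initialize()
    for queue in queues:
        queue_length = len(queue)
        if queue_length > 0:
            # size of the largest non-empty bucket seen so far
            max_value = power_of_two
            result += queue_length * power_of_two
        power_of_two = power_of_two << 1
    # only part of the oldest bucket may still be inside the window
    result -= max_value // 2
    return result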

You can write auxiliary functions such as deleteOldestBucket() and checkIfBucketTooOld().

In [46]:
def deleteOldestBucket():
    """Drop the oldest bucket and refresh oldest_timestamp."""
    global oldest_timestamp
    # the oldest bucket sits at the back of the highest non-empty queue
    for queue in reversed(queues):
        if len(queue) > 0:
            queue.pop()
            break
    # update the oldest bucket timestamp (-1 if no bucket is left)
    oldest_timestamp = -1
    for queue in reversed(queues):
        if len(queue) > 0:
            oldest_timestamp = queue[-1]
            break

In [47]:
def checkIfBucketTooOld(bucket_timestamp):
    # the buckets are stored modulo 2 * N
    return (currentTimeStamp - bucket_timestamp) % (2 * N) >= N
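A small worked example may help with the modular arithmetic in the age check above. It assumes N = 32, so timestamps wrap around at 2 * N = 64. The function name `age` is hypothetical.

```python
N = 32

def age(current, bucket_timestamp, N):
    """Age of a bucket when both timestamps are stored modulo 2 * N."""
    return (current - bucket_timestamp) % (2 * N)

print(age(40, 16, N))  # 24, still inside the window
print(age(48, 16, N))  # 32, too old (>= N)
print(age(0, 32, N))   # 32, too old even though the clock wrapped from 63 to 0
print(age(4, 60, N))   # 8, a recent bucket seen across the wrap-around
```

Python's `%` returns a non-negative result for a positive modulus, so the difference stays meaningful after the clock wraps. This relies on the oldest bucket being checked at every update, so that a bucket is dropped as soon as its age reaches N.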

Now make a call to initialize, make some updates and check the result that you get.
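One way to check the result is to compare it with an exact count over the same window. The sketch below is such a baseline and not part of DGIM itself: it keeps all of the last N elements, so it uses O(N) memory. The class name `ExactWindowCounter` is hypothetical.

```python
from collections import deque

class ExactWindowCounter:
    """Exact number of 1s among the last N elements (keeps the whole window)."""

    def __init__(self, N):
        # a bounded deque drops the oldest element automatically
        self.window = deque(maxlen=N)

    def update(self, element):
        self.window.append(1 if element else 0)

    def count(self):
        return sum(self.window)

exact = ExactWindowCounter(32)
for _ in range(100):
    exact.update(True)
print(exact.count())  # 32
```

With an error rate of 0.5, the DGIM estimate for the same stream is expected to stay within 50% of this exact value.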

In [38]:
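N = 32
# unpack the state returned by initialize() into the module-level names
# that update(), count() and the auxiliary functions rely on
currentTimeStamp, oldest_timestamp, b, queues = initialize(N, 0.5)
for i in range(100):
    update(True)
c = count()  # This should return an estimate close to the actual count of 32.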
