# 12.6 Takeaway - Subarray covering keywords or Sliding Window

## Problem:
Find the smallest subarray of one array (the paragraph) that covers all the keywords in a second array (the keywords).

## Example:
paragraph = ['a', 'b', 'c', 'b', 'a', 'd', 'c', 'a', 'e', 'a', 'a', 'b', 'e']
keywords = ['b', 'c', 'e']

Output = Subarray(3, 8)
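
A quick way to read that output, assuming both `start` and `end` are inclusive indexes into the paragraph (which is how the code in this section treats them):

```python
paragraph = ['a', 'b', 'c', 'b', 'a', 'd', 'c', 'a', 'e', 'a', 'a', 'b', 'e']
keywords = ['b', 'c', 'e']

start, end = 3, 8
window = paragraph[start:end + 1]  # inclusive range, so slice up to end + 1

print(window)                        # ['b', 'a', 'd', 'c', 'a', 'e']
print(set(keywords) <= set(window))  # True: every keyword appears in the window
```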

## Brainstorming

- This problem is doable with a hashtable that maps each keyword to the most recent index where it appears in the paragraph.

- We can iterate through the paragraph array once, and we only need 2 variables (start and end) to track the window spanning the keywords.
  - The start and end variables should be updated whenever we find a keyword in the paragraph.

- To make this problem simpler, we can create a helper function for comparing subarrays.

- This problem also benefits from using a namedtuple class since we are returning a range including the start and end variables.


## First Algorithm (O(n * k))

My first version called min() and max() on the keyword hashmap's values every time the hashmap was updated, to work out where the start and end need to be.

This is a problem for the runtime. With n entries in the paragraph and k keywords, each min()/max() call scans up to k values, and that can happen on every iteration of the paragraph array, so the total ends up being O(n * k).

In [1]:
import collections

Subarray = collections.namedtuple('Subarray', ('start', 'end'))

def find_smallest_subarray_covering_set(paragraph, keywords):
    # Convenience function for comparing minimum Subarrays
    def min_subarray(sub1, sub2):
        return sub2 if (sub2.end - sub2.start) < (sub1.end - sub1.start) else sub1

    keywords = set(keywords)  # O(1) membership tests; duplicate keywords collapse
    kw_map = {}
    smallest_subarray = Subarray(0, len(paragraph))

    start, end = 0, 0

    for i, entry in enumerate(paragraph):
        if entry in keywords:
            # If a match is found, record the latest index for this keyword
            kw_map[entry] = i
            # Recompute the window bounds; each call scans up to k values
            start = min(kw_map.values())
            end = max(kw_map.values())
            if len(kw_map.keys()) == len(keywords):
                smallest_subarray = min_subarray(smallest_subarray, Subarray(start, end))
    return smallest_subarray

    

paragraph = ['a', 'b', 'c', 'b', 'a', 'd', 'c', 'a', 'e', 'a', 'a', 'b', 'e']
keywords = ['b', 'c', 'e']

print(find_smallest_subarray_covering_set(paragraph, keywords))

Subarray(start=3, end=8)
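
One way to sanity-check a result like this is a brute-force reference that tries every start index. The helper below is a hypothetical name added for illustration, not part of the original solution. It runs in O(n^2), and it assumes that returning `None` is acceptable when no window covers all the keywords.

```python
import collections

Subarray = collections.namedtuple('Subarray', ('start', 'end'))

def brute_force_smallest_cover(paragraph, keywords):
    """Hypothetical reference helper: O(n^2), for cross-checking only."""
    needed = set(keywords)
    best = None
    for start in range(len(paragraph)):
        seen = set()
        for end in range(start, len(paragraph)):
            if paragraph[end] in needed:
                seen.add(paragraph[end])
            if seen == needed:
                # The first end that completes the cover is the shortest for this start
                if best is None or (end - start) < (best.end - best.start):
                    best = Subarray(start, end)
                break
    return best

paragraph = ['a', 'b', 'c', 'b', 'a', 'd', 'c', 'a', 'e', 'a', 'a', 'b', 'e']
keywords = ['b', 'c', 'e']

print(brute_force_smallest_cover(paragraph, keywords))  # Subarray(start=3, end=8)
```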



## O(n) Optimization with a Doubly Linked List

We want to shoot for O(n). To do this, we need a solution that doesn't call min() and max() on the kw_map's values on every update.

The end/max is *always* going to be the current index we are on because it is the latest we've seen so far! This is an easy fix.
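
A small check of that claim on the example input, as a sketch rather than part of the solution. It uses a plain dict for `kw_map` and asserts that the maximum stored index equals the current index right after every update:

```python
paragraph = ['a', 'b', 'c', 'b', 'a', 'd', 'c', 'a', 'e', 'a', 'a', 'b', 'e']
keywords = {'b', 'c', 'e'}

kw_map = {}
for i, entry in enumerate(paragraph):
    if entry in keywords:
        kw_map[entry] = i
        # i was just written, and every other stored index is older (smaller)
        assert max(kw_map.values()) == i

print(kw_map)  # {'b': 11, 'c': 6, 'e': 12}
```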

Now the hard part is updating the start/min inside kw_map. (This'll be a good refresher on linked lists :) )

We can keep track of this with a doubly linked list. Appending is O(1) whether we use a deque or a linked list, but removing from the middle is only O(1) with a doubly linked list (given a reference to the node). With collections.deque, removing from the middle is O(n), so let's implement the doubly linked list!

Because indexes are appended in increasing order, the head of the list always holds the smallest one, so we can track the start variable as the head of the list.

The hashmap stores linked list nodes, so that when it comes time to replace a value in the hashmap, we can look up the node directly and unlink it via its prev and next pointers in O(1), rather than iterating to find it in O(n) time.

We must also update the tail to be the latest entry.

In [3]:
import collections

Subarray = collections.namedtuple('Subarray', ('start', 'end'))

class DoublyLinkedListNode:
    def __init__(self, data=None):
        self.data = data
        self.next = self.prev = None

class LinkedList:
    def __init__(self):
        self.head = self.tail = None
        self.length = 0
    
    def __len__(self):
        return self.length

    def append(self, value):
        node = DoublyLinkedListNode(value)
        node.prev = self.tail
        if self.tail:
            self.tail.next = node
        else:
            self.head = node
        self.tail = node
        self.length += 1

    def remove(self, node):
        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next
        
        node.next = node.prev = None
        self.length -= 1

In [4]:
def find_smallest_subarray_covering_set_w_linkedlist(paragraph, keywords):
    # Convenience function for comparing minimum Subarrays
    def min_subarray(sub1, sub2):
        return sub2 if (sub2.end - sub2.start) < (sub1.end - sub1.start) else sub1

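    # A set makes `entry in keywords` O(1); with a list it is O(k) per lookup,
    # which would bring the whole loop back to O(n * k)
    keywords = set(keywords)
    kw_map = {}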
    link_list = LinkedList()
    smallest_subarray = Subarray(0, len(paragraph))

    start, end = 0, 0

    for i, entry in enumerate(paragraph):
        if entry in keywords:
            
            # Look up this keyword's existing node, if any
            node = kw_map.get(entry)

            # If the keyword was seen before, unlink its old node in O(1)
            # so each keyword has at most one node in the list
            if node is not None:
                link_list.remove(node)

            # Append the current index; the new tail becomes this keyword's node
            link_list.append(i)
            kw_map[entry] = link_list.tail

            # Start will always be at the beginning of the linked list
            start = link_list.head.data
            end = i
            
            if len(kw_map.keys()) == len(keywords):
                smallest_subarray = min_subarray(smallest_subarray, Subarray(start, end))
    return smallest_subarray

paragraph = ['a', 'b', 'c', 'b', 'a', 'd', 'c', 'a', 'e', 'a', 'a', 'b', 'e']
keywords = ['b', 'c', 'e']

print(find_smallest_subarray_covering_set_w_linkedlist(paragraph, keywords))

Subarray(start=3, end=8)
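
For comparison, the same hashmap-plus-ordering idea can be sketched with the standard library's `collections.OrderedDict`, whose `move_to_end` method takes the place of the remove-then-append step. This is a swapped-in alternative, not the hand-rolled list above. The function name is hypothetical, and it assumes that returning `None` is acceptable when the keywords are never all covered.

```python
import collections

Subarray = collections.namedtuple('Subarray', ('start', 'end'))

def smallest_cover_ordered_dict(paragraph, keywords):
    """Hypothetical variant: OrderedDict stands in for the linked list."""
    keywords = set(keywords)
    latest = collections.OrderedDict()  # keyword -> latest index, oldest first
    best = None
    for i, entry in enumerate(paragraph):
        if entry not in keywords:
            continue
        latest[entry] = i
        latest.move_to_end(entry)  # this keyword now holds the newest index
        if len(latest) == len(keywords):
            start = next(iter(latest.values()))  # oldest index is the window start
            if best is None or (i - start) < (best.end - best.start):
                best = Subarray(start, i)
    return best

paragraph = ['a', 'b', 'c', 'b', 'a', 'd', 'c', 'a', 'e', 'a', 'a', 'b', 'e']
keywords = ['b', 'c', 'e']

print(smallest_cover_ordered_dict(paragraph, keywords))  # Subarray(start=3, end=8)
```

Writing the linked list by hand is still the better refresher, but this version is shorter if the goal is just the O(n) behavior.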
