**5th June 2021 to 19th June 2021**

# Problems covered

1. Find the topK most frequent items in a list. Return in any order. Solution must be better than $O(N \log N)$
2. Median of two sorted lists of sizes `m` and `n`. Solution must run in `O(log(m + n))` or better.
3. Make sentences by breaking a string into valid words.
4. Regex matching for `.` and `*`
5. Merge two sorted linked lists by splicing.
6. Find the minimum substring that contains all characters in `t`
7. CountAndSay
8. Check if a string can be broken into words using a dictionary.
9. $(6^*)$ Find the minimum substring that contains all characters in `t`; revisited.
10. Deepcopy linked-list with random pointers, and no extra space.

## 1. Find the topK most frequent items in a list. Return in any order. Solution must be better than $O(N \log N)$

`[1 2 1 1 2 3]` -- for `k=2`, you should return [1, 2]

### Questions

You can run a counter to get the frequency of each item -- O(n)

To get the top K most frequent, you could sort by counts and then pick the first K. Sorting will take $O(N \log N)$, which is too slow.

**Solution**
* Start a min-heap with the first K (frequency, item) pairs --> K pushes will take $O(K \log K)$ time
* For each of the remaining (N - K) pairs, if its frequency beats the smallest one in the heap, pop the smallest and push the new pair

$K \log K + (N - K) \log K$ ==> $O(N \log K)$ time total.
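
The same bounded-heap idea can also be sketched with the standard library's `heapq.nlargest`, which keeps at most `k` candidates while it scans the counts. The function name `top_k_frequent` is only an illustrative choice and not part of the original solution.

```python
import heapq
from collections import Counter


def top_k_frequent(items, k):
    """Return the k most frequent items, in any order (illustrative helper)."""
    counts = Counter(items)  # one O(N) counting pass
    # nlargest keeps a heap of at most k entries, ranked by frequency.
    return heapq.nlargest(k, counts, key=counts.get)


print(top_k_frequent([1, 2, 1, 1, 2, 3], 2))
```

Ties between equally frequent items are broken arbitrarily here, which is fine because the problem allows any order.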

In [7]:
import heapq
from collections import Counter

class TopKMostFrequent:
    def solve(self, items, k):
        counts = Counter(items).items()
        size = len(counts)
        counts = iter(counts)
        
        holder = []
        for _ in range(k):
            item, freq = next(counts)
            heapq.heappush(holder, (freq, item))
            
        for _ in range(size - k):
            item, freq = next(counts)
            if freq > holder[0][0]:
                heapq.heappop(holder)
                heapq.heappush(holder, (freq, item))
                
        return [t[1] for t in holder]

In [8]:
o = TopKMostFrequent()

In [13]:
o.solve([1, 2, 1, 1, 2, 3], 2)

[2, 1]

## 2. Median of two sorted lists of sizes `m` and `n`. Solution must run in `O(log(m + n))` or better.

There is a linear scan solution -- put everything in one list, and then compute its median.

This will be `O(m+n)`.
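
As a baseline to test faster solutions against, the linear scan could look roughly like this. It is a minimal sketch; `median_by_merge` is a made-up name, and raising on two empty lists is an assumption about the desired behaviour.

```python
import heapq


def median_by_merge(arr1, arr2):
    """O(m + n) baseline: merge both sorted lists, then read off the middle."""
    merged = list(heapq.merge(arr1, arr2))  # lazy merge of two sorted inputs
    n = len(merged)
    if n == 0:
        # Assumption: the median of nothing is treated as an error.
        raise ValueError("both lists are empty")
    mid = n // 2
    if n % 2 == 1:
        return merged[mid]
    return (merged[mid - 1] + merged[mid]) / 2.0


print(median_by_merge([0, 0, 1], [1, 2, 3, 4, 9]))
```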

### Questions
```
   x1 | x2 x3 x4
y1 y2 y3 y4 | y5 y6
```

The median value will split the JOINT lists into two parts, such that:
* y4 <= x2 and x1 <= y5

Assuming that the first list is the shorter one, we just need to run a binary search over its split point, and check each time if we meet this condition. That takes $O(\log \min(m, n))$ steps.

In [37]:
class MedianOfTwoSorted:
    def solve(self, arr1, arr2):
        s1, s2 = len(arr1), len(arr2)
        if len(arr2) < len(arr1):
            arr1, arr2 = arr2, arr1
            s1, s2 = s2, s1
            
        i = 0
        j = s1
        
        while i <= j:
            mid = i + (j - i)//2
            x2 = arr1[mid] if mid < s1 else float('inf')
            x1 = arr1[mid-1] if mid > 0 else -float('inf')
            
            other = (s1+s2)//2 - mid
            
            print(mid, other)
            
            y2 = arr2[other] if other < s2 else float('inf')
            y1 = arr2[other-1] if other > 0 else -float('inf')
            
            if x1 <= y2 and y1 <= x2:
                if (s1+s2)%2 == 0:
                    return (max(x1, y1) + min(x2, y2))/2.0
                else:
                    return min(x2, y2)
            elif x1 > y2:
                j = mid - 1
            else:
                i = mid + 1

In [38]:
o = MedianOfTwoSorted()

In [42]:
o.solve([0, 0, 1], [1, 2, 3, 4, 9])

1 3
2 2
3 1


1.5

## 3. From a string, construct all possible valid sentences by splitting into different words.

`applet, {'app', 'applet', 'ap', 'let'}` ==> `[applet, app let]`

`bluesky, {'blues', 'sky'}` ==> `[]`

All words in the sentence should be in the dictionary.

A word from the dictionary may be used/repeated multiple times.

### Questions.

I have to create all possible sentences, starting from the first character.

1. Start from the 1st char, and keep extending to the next char as long as the chars read so far can still be the prefix of SOME word.
2. At each point, check IF the chars read so far form a word AND the remainder, starting at the NEXT char, can be split as well ...
    * If yes, then you can make a recursive call at this letter
    * Else, continue on the original path ...

In [241]:
from collections import defaultdict

class MakeSentences:
    def solve(self, text, words):
        """
        text: Input text, which you have to split to make words.
        words: List of valid words. Can be reused.
        
        Soln1
        =====
        
        Start iterating from the first letter
            Check all prefixes in the set.
                If you find a valid prefix:
                    Prefix can be added to sentence IFF you can split the remainder as well.
                    If remainder can be split, then take a cross with its sentences, and add to master.
                    
        return master
        """
        words = set(words)

        if len(text) == 0:
            return []
        if len(text) == 1:
            return [] if text not in words else [text]
        
        self.ans = []
        def search(start, temp):
            if start == len(text):
                self.ans.append(' '.join(temp))
            
            for i in range(start, len(text)):
                prefix = text[start:i+1]
                if prefix in words:
                    search(i+1, temp+[prefix])
        
        # search(0, [])
        # return self.ans
    
        self.memo = {}
        def withmemo(start):
            if start in self.memo:
                return self.memo[start]
            
            master = []
            for i in range(start, len(text)):
                prefix = text[start : i+1]
                if prefix in words:
                    if i+1 == len(text):
                        master.append([prefix])
                    else:
                        future = withmemo(i+1)
                        if len(future):
                            for sent in future:
                                master.append([prefix]+sent)
        
            self.memo[start] = master
            return master
        
        master = withmemo(0)
        return [' '.join(s) for s in master]

`applet, {'app', 'applet', 'ap', 'let'}` ==> `[applet, app let]`

In [242]:
o = MakeSentences()

In [243]:
o.solve('applet', ['app', 'applet', 'ap', 'let', 'p'])

['ap p let', 'app let', 'applet']

## 4. Regex matching for `.` and  `*`.

Given a pattern and an input string, check if the entire string matches the full pattern.

`s = "ab", p = ".*"` ==> `true`
* `*` means that the preceding character can repeat 0 or more times
* `.` means any character can appear once
* `.*` means any character can appear 0 or more times

`s = "aab", p = "c*a*b"` ==> `true`
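
One way to pin these rules down is a short memoized recursion that can be cross-checked against Python's own `re.fullmatch`. This is a sketch under the assumption that the pattern is well formed (every `*` follows a letter or `.`); `is_match` and `go` are illustrative names.

```python
import re
from functools import lru_cache


def is_match(s, p):
    """Full-string match supporting only '.' and '*' (illustrative sketch)."""

    @lru_cache(maxsize=None)
    def go(i, j):
        # Pattern exhausted: succeed only if the string is exhausted too.
        if j == len(p):
            return i == len(s)
        first = i < len(s) and p[j] in (s[i], '.')
        if j + 1 < len(p) and p[j + 1] == '*':
            # Either skip the "x*" block, or consume one char and stay on it.
            return go(i, j + 2) or (first and go(i + 1, j))
        return first and go(i + 1, j + 1)

    return go(0, 0)


for s, p in [("ab", ".*"), ("aab", "c*a*b"), ("ab", "ab.")]:
    # The built-in engine should agree on this subset of regex syntax.
    assert is_match(s, p) == (re.fullmatch(p, s) is not None)
```

Memoizing on `(i, j)` bounds the work by the number of (string position, pattern position) pairs, instead of exploring every branch repeatedly.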

In [269]:
from string import ascii_lowercase
from collections import deque

class RegexMatch:
    def solve(self, s, p):
        if len(s) == len(p) == 0:
            return True
        if len(p) == 0:
            return False
        
        pattern = []
        letters = set(ascii_lowercase)
        
        i = len(p) - 1
        while i >= 0:
            if p[i] in letters or p[i] == '.':
                pattern.append(p[i])
                i -= 1
            else:
                pattern.append(p[i-1:i+1])
                i -= 2
                
        pattern = pattern[::-1]
        print("pattern:", pattern)
        
        lower = len([x for x in pattern if x[-1] != '*'])
        if len(s) < lower:
            return False
        
        # Can I just solve it via DFS?
        def dfs(i, j):
            if i < len(s) and j < len(pattern):
                if pattern[j] in letters:
                    if s[i] == pattern[j]:
                        return dfs(i+1, j+1)
                    else:
                        return False
                elif pattern[j] == '.':
                    return dfs(i+1, j+1)
                else:
                    skip = dfs(i, j+1)
                    if skip:
                        return True
                    else:
                        char = pattern[j][0]
                        if char in letters:
                            if char == s[i]:
                                return dfs(i+1, j)
                            else:
                                return False
                        else:
                            return dfs(i+1, j)
            
            elif i == len(s) and j == len(pattern):
                return True
            
            elif i == len(s):
                # Blocks remain.
                if pattern[j] in letters or pattern[j] == '.':
                    return False  # You can't add anything more.
                else:
                    # This is a *, but you can only skip.
                    # Because you can't add anything more.
                    return dfs(i, j+1)
            else:
                # j == len(pattern)
                # You are out of patterns.
                if pattern[-1] in letters or pattern[-1] == '.':
                    return False  # You can't add anything.
                else:
                    char = pattern[-1][0]
                    if char in letters:
                        if char == s[i]:
                            return dfs(i+1, j)
                        else:
                            return False
                    else:
                        return dfs(i+1, j)
                    
        return dfs(0, 0)

In [270]:
o = RegexMatch()
assert o.solve('a', 'ab*') == True
assert o.solve('a', '.*') == True
assert o.solve('ab', '.*') == True
assert o.solve('ab', '.*.') == True
assert o.solve('ab', '.*..') == True
assert o.solve('ab', 'ab.') == False
assert o.solve('mississipi', 'mis*is*p.*') == False

pattern: ['a', 'b*']
pattern: ['.*']
pattern: ['.*']
pattern: ['.*', '.']
pattern: ['.*', '.', '.']
pattern: ['a', 'b', '.']
pattern: ['m', 'i', 's*', 'i', 's*', 'p', '.*']


## 5. Merge two sorted linked lists by splicing.

1 => 2 => 3

1 => 3 => 4

Return 1 => 1 => 2 => 3 => 3 => 4
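
A common alternative to tracking a `prev` pointer is a throwaway dummy head, which removes the special case for the first node. The sketch below assumes the same `ListNode` shape used in these notes; `merge_with_dummy`, `build` and `values` are hypothetical helper names.

```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def merge_with_dummy(l1, l2):
    """Splice two sorted lists together without allocating new value nodes."""
    dummy = ListNode()  # placeholder; the real head ends up in dummy.next
    tail = dummy
    while l1 is not None and l2 is not None:
        if l1.val <= l2.val:
            tail.next = l1
            l1 = l1.next
        else:
            tail.next = l2
            l2 = l2.next
        tail = tail.next
    # At most one list still has nodes; attach it as-is.
    tail.next = l1 if l1 is not None else l2
    return dummy.next


def build(vals):
    """Build a linked list from a Python list (test helper)."""
    head = None
    for v in reversed(vals):
        head = ListNode(v, head)
    return head


def values(node):
    """Collect node values into a Python list (test helper)."""
    out = []
    while node is not None:
        out.append(node.val)
        node = node.next
    return out


print(values(merge_with_dummy(build([1, 2, 3]), build([1, 3, 4]))))
```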

In [280]:
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


class MergeTwoSorted:
    def solve(self, l1, l2):
        """
        i, j
        
        i: 1 => 2 => 3
        j: 1 => 3
        """
        if l1 is None and l2 is None:
            return None
        
        if l1 is None:
            return l2
        
        if l2 is None:
            return l1
        
        if l2.val < l1.val:
            l1, l2 = l2, l1
            
        # Now, l1 is the smallest. Eventually, return l1
        i = l1
        j = l2
        
        prev = None
        while i is not None and j is not None:
            if i.val < j.val:
                prev = i
                i = i.next
            else:
                # Add node from L2 before i.
                if prev is None:
                    # j becomes the 1st node.
                    l1 = j
                    
                    prev = j
                    after = j.next
                    j.next = i
                    j = after
                else:
                    prev.next = j
                    after = j.next
                    j.next = i
                    prev = prev.next
                    j = after
        
        if i is None:
            if j is not None:
                prev.next = j
        
        return l1

In [282]:
o = MergeTwoSorted()
one = ListNode(1)
two = ListNode(2)
three = ListNode(3)

_one = ListNode(2)
_three = ListNode(3)
# _four = ListNode(4)

l1 = one
one.next = two
two.next = three

l2 = _one
_one.next = _three
# _three.next = _four

ans = o.solve(l1, l2)
while ans is not None:
    print(ans, ans.val)
    ans = ans.next

<__main__.ListNode object at 0x7fcb6c2d1e90> 1
<__main__.ListNode object at 0x7fcb6c2d1f90> 2
<__main__.ListNode object at 0x7fcb6c2d1910> 2
<__main__.ListNode object at 0x7fcb6c2d1d90> 3
<__main__.ListNode object at 0x7fcb6c2d1250> 3


## 6. Find the minimum substring that contains all characters in `t`

Consider string `s = 'abdobecodebanc'` and target string `t = 'abc'`.

You need to find the smallest substring in `s` which contains all characters of `t`.

Each testcase will have a unique answer.

If no such substring exists, return an empty string `''`.
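
Before optimizing, a brute-force reference is handy for checking faster versions on small inputs. The sketch below tries every window from shortest to longest; `min_window_bruteforce` is an illustrative name, and it is far too slow for long strings.

```python
from collections import Counter


def min_window_bruteforce(s, t):
    """Try every window, shortest first (reference implementation only)."""
    if not s or not t:
        return ''
    need = Counter(t)
    for size in range(len(t), len(s) + 1):
        for start in range(len(s) - size + 1):
            window = s[start:start + size]
            # Counter subtraction keeps only positive counts, so an empty
            # result means the window covers every character of t.
            if not (need - Counter(window)):
                return window
    return ''


print(min_window_bruteforce('abdobecodebanc', 'abc'))
```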

In [32]:
from collections import Counter, defaultdict

class MinimumSubstringContains:
    def solve(self, s, t):
        if len(s) == 0 or len(t) == 0:
            return ''
        
        include = Counter(t)
        relevant = []
        for i, c in enumerate(s):
            if c in include:
                relevant.append((i, c))
                
        # Now, you have a sorted list of relevant characters,
        # with their positions.
        # Use two pointers.
        
        i = 0
        ans = None
        best = len(s) + 1
        temp = defaultdict(int)
        j = 0
        
        while j < len(relevant):
            if len(temp) == len(include) and all([include[k]<=temp[k] for k in temp.keys()]):
                while i < j and len(temp) == len(include) and all([include[k]<=temp[k] for k in temp.keys()]):
                    size = relevant[j-1][0] - relevant[i][0] + 1
                    if size < best:
                        best = size
                        ans = (relevant[i][0], relevant[j-1][0])
                    
                    _, char = relevant[i]
                    temp[char] -= 1
                    if temp[char] == 0:
                        del temp[char]
                    i += 1
                    
            else:
                _, char = relevant[j]
                temp[char] += 1
                j += 1
                

        if len(temp) == len(include) and all([include[k]<=temp[k] for k in temp.keys()]):
            while i < j and len(temp) == len(include) and all([include[k]<=temp[k] for k in temp.keys()]):
                size = relevant[j-1][0] - relevant[i][0] + 1
                if size < best:
                    best = size
                    ans = (relevant[i][0], relevant[j-1][0])

                _, char = relevant[i]
                temp[char] -= 1
                if temp[char] == 0:
                    del temp[char]
                i += 1
        
        if best == len(s)+1:
            return ''
        
        return s[ans[0]:ans[1]+1]

In [36]:
o = MinimumSubstringContains()
o.solve('AA', 'ABC')

''

## 7. CountAndSay

Given a digit string `332211`, you will:
* group by the contiguous digits (33, 22, 11)
* count the number of each digit => (Two 3, Two 2, Two 1)
* and return this as a string => `232221`

You have to return `CountAndSay(n)` for some `n`, built recursively starting from `n=1`:
* `CountAndSay(1) = '1'`
* `CountAndSay(2) = One 1 => 11`
* `CountAndSay(3) = Two 1s => 21`
* `CountAndSay(4) = One 2 and One 1 => 1211`
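
The grouping step maps directly onto `itertools.groupby`, so a compact sketch of the same recurrence could look like this. The names `say` and `count_and_say` are illustrative.

```python
from itertools import groupby


def say(digits):
    """Describe a digit string as <count><digit> for each contiguous run."""
    return ''.join(str(len(list(run))) + digit for digit, run in groupby(digits))


def count_and_say(n):
    """Apply `say` n - 1 times, starting from '1'."""
    s = '1'
    for _ in range(n - 1):
        s = say(s)
    return s


assert say('332211') == '232221'
assert count_and_say(4) == '1211'
```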

In [41]:
class CountAndSay:
    def solve(self, n):
        if n == 1:
            return '1'
        
        def count(string):
            sequence = ''
            i = 0
            prev = None
            for char in string:
                if prev is not None:
                    if prev == char:
                        i += 1
                    else:
                        sequence += str(i)+prev
                        i = 1
                        
                else:
                    i = 1
                    
                prev = char
                
            sequence += str(i)+prev
            return sequence
        
        
        s = '1'
        for i in range(2, n+1):
            s = count(s)
            
        return s

In [43]:
o = CountAndSay()
o.solve(5)

1 1
2 11
3 21
4 1211
5 111221


'111221'

## 8. Check if a string can be broken into words using a dictionary.

*Every sub-word should be present in the dictionary!*

* dict: [apple, pen, app]
* string: 'applepenapple'
* output: True

**Time Complexity**:
* Any string of length $n$ will have $O(n^2)$ sub-strings.
* Slicing and hashing each sub-string costs up to $O(n)$, so checking all of them costs $O(n^3)$ in the worst case.
* With memoization, you have ensured that each start position is solved only once.

Thus, Time Complexity will be $O(n^3)$.

Space Complexity will be $O(n)$ -- depth of recursion, the memo (one entry per start position), and the size of holding the string.
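
The same analysis applies to a bottom-up table, which may be preferable for long strings because it avoids deep recursion. A minimal sketch, with `can_break` and `reachable` as illustrative names:

```python
def can_break(dictionary, string):
    """reachable[i] is True when string[:i] splits fully into dictionary words."""
    words = set(dictionary)
    n = len(string)
    reachable = [False] * (n + 1)
    reachable[0] = True  # the empty prefix needs no words
    for end in range(1, n + 1):
        for start in range(end):
            # O(n^2) (start, end) pairs, each with an O(n) slice and lookup.
            if reachable[start] and string[start:end] in words:
                reachable[end] = True
                break
    return reachable[n]


print(can_break(['apple', 'pen', 'app'], 'applepenapple'))
```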

In [57]:
from functools import lru_cache

class SubWordDict:
    def solve(self, dictionary, string):
        dictionary = set(dictionary)
        
        @lru_cache(maxsize=None)
        def check(start):
            if start == len(string):
                return True  # Reached the end. Valid.
            
            for i in range(start, len(string)):
                prefix = string[start : i+1]
                if prefix in dictionary:
                    if check(i+1):
                        return True
                
            return False
        
        return check(0)

In [55]:
o = SubWordDict()

In [56]:
%timeit o.solve(['apple', 'app', 'pen', 'lepe'], 'applepenappleapplepenappleapplepenappleapplepenapple')

93.2 µs ± 1.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [58]:
%timeit o.solve(['apple', 'app', 'pen', 'lepe'], 'applepenappleapplepenappleapplepenappleapplepenapple')

92.3 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


## 9. $(6^*)$ Find the minimum substring that contains all characters in `t`; revisited.

Given string `s: 'ababaacdba'` and a target string `t: 'abc'`, find the smallest substring in `s` that contains all the characters (including repetitions) of `t`.

*Test cases will ensure that there is only one unique answer.*

In [72]:
from collections import Counter
from collections import defaultdict


class MinimumSubstring:
    def solve(self, s, t):
        """
        Find the distribution of `t`
        target = Counter(t)
        
        Two pointer.
        j will keep moving forward, adding new chars to temp.
        i will chase j, dropping characters.
        
        Each char will be visited at most twice.
        """
        if len(t) == 0:
            return ''
        
        target = Counter(t)
        temp = defaultdict(int)
        
        complete = 0
        
        size = len(s)+1
        start, end = None, None
        
        i = 0
        
        for j in range(len(s)):
            if s[j] in target:
                temp[s[j]] += 1
                
                if temp[s[j]] == target[s[j]]:
                    complete += 1
                    
            while i <= j and complete == len(target):
                if j-i+1 < size:
                    size = j-i+1
                    start, end = i, j
                    
                if s[i] in temp:
                    temp[s[i]] -= 1
                    if temp[s[i]] < target[s[i]]:
                        complete -= 1
                        
                i += 1
        
        if size < len(s)+1:
            return s[start:end+1]
        
        return ''

In [77]:
o = MinimumSubstring()
o.solve('ADBDDABCNAB', 'CAB')

'ABC'

## 10. Deepcopy a linked-list with random-pointers to nodes. Do not use any additional space.
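
For contrast with the no-extra-space requirement, the straightforward baseline keeps a dictionary from each original node to its clone. This sketch does use $O(n)$ extra space, so it is not a solution to the problem as stated, but it is useful for checking the in-place version. `copy_with_map` is an illustrative name.

```python
class Node:
    def __init__(self, val, next=None, random=None):
        self.val = val
        self.next = next
        self.random = random


def copy_with_map(head):
    """Deep copy using an original -> clone dictionary (O(n) extra space)."""
    if head is None:
        return None

    # Pass 1: create one clone per original node.
    clones = {}
    node = head
    while node is not None:
        clones[node] = Node(node.val)
        node = node.next

    # Pass 2: wire up pointers; .get(None) yields None for missing links.
    node = head
    while node is not None:
        clone = clones[node]
        clone.next = clones.get(node.next)
        clone.random = clones.get(node.random)
        node = node.next

    return clones[head]
```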

In [91]:
class Node:
    def __init__(self, val, next=None, random=None):
        self.val = val
        self.next = next
        self.random = random
        
class DeepcopyLL:
    def solve(self, og):
        if og is None:
            return None
        
        start = og
        while start is not None:
            n = Node(start.val)
            after = start.next
            
            start.next = n
            n.next = after
            start = after
            
        # Now: A => A' => B => B' => C => C' => null
        old = og  # A
        ans = og.next
        
        # Assign the random pointers to the copy.
        while old is not None:
            new = old.next  # A'
            if old.random is not None:
                new.random = old.random.next
                
            old = new.next
                
        # Fix the `next` pointers for both.
        old = og
        while old is not None:
            new = old.next
            old.next = None if new is None else new.next
            old = old.next
            new.next = None if old is None else old.next
            new = new.next
            
        return ans

In [92]:
zero = Node(0)
one = Node(1)
two = Node(2)
three = Node(3)

zero.next = one
one.next = two
two.next = three

zero.random = None
one.random = zero
two.random = three
three.random = one

start = zero
while start is not None:
    print(start.val)
    print(" ", None if start.random is None else start.random.val)
    start = start.next
    
print("="*50)

o = DeepcopyLL()
ans = o.solve(zero)
while ans is not None:
    print(ans.val)
    print(" ", None if ans.random is None else ans.random.val)
    ans = ans.next

0
  None
1
  0
2
  3
3
  1
0
  None
1
  0
2
  3
3
  1
