## Subarray average of size k

Problem Statement: Given an array, find the average of all contiguous subarrays of size ‘K’ in it.

Brute force is as follows:

In [2]:
def subarray_average(a, k):
    n = len(a)
    ret = []
    
    for i in range(n-k+1):
        ret.append( sum(a[i:i+k]) / k)
    
    return ret

In [3]:
a = [1, 3, 2, 6, -1, 4, 1, 8, 2]
k = 5

In [4]:
subarray_average(a, k)

[2.2, 2.8, 2.4, 3.6, 2.8]

**Time: O(n*k)**

**Space: O(k) for the temporary slice `a[i:i+k]` (not including the space required for the return array)**

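With a sliding window, we can do better: each step subtracts the element leaving the window and adds the element entering it, so no window is summed from scratch.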

In [5]:
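def subarray_average_sw(a, k):
    # No window of size k fits, so there are no averages to report
    if k > len(a):
        return []

    running_sum = sum(a[0:k])  # sum of the first window
    ret = []
    i, j = 0, k  # i: element leaving the window, j: element entering it

    while j < len(a):
        ret.append(running_sum/k)

        # Slide the window one step to the right
        running_sum -= a[i]
        running_sum += a[j]
        i += 1
        j += 1

    ret.append(running_sum/k)  # average of the last window

    return ret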

In [6]:
subarray_average_sw(a, k)

[2.2, 2.8, 2.4, 3.6, 2.8]

**Time: O(n)**

**Space: O(1)** (Not including required space for return array)

In [7]:
import random
inputs_list = [[random.randint(0, 50) for _ in range(random.randint(1, 20))] for _ in range(50)]

In [8]:
k_list = [random.randint(1, len(a)) for a in inputs_list]

In [9]:
all(subarray_average(a, k)==subarray_average_sw(a, k) for a, k in zip(inputs_list, k_list))

True
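
The exact `==` comparison above works because the random inputs are integers, so both functions divide the same integer sum by `k`. With float inputs, the running sum can accumulate small rounding differences relative to summing each window from scratch, so a tolerance-based check is safer. A minimal sketch, assuming a hypothetical helper `averages_close` (not part of the original notebook) and illustrative tolerance values:

```python
import math

def averages_close(xs, ys, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if two lists of averages match element-wise within a tolerance.

    The helper name and the default tolerances are assumptions for illustration.
    """
    if len(xs) != len(ys):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(xs, ys)
    )
```

It could replace `==` in the check above, e.g. `averages_close(subarray_average(a, k), subarray_average_sw(a, k))`.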

## Maximum Sum Subarray of Size K

Problem Statement: Given an array of positive numbers and a positive number ‘k’, find the maximum sum of any contiguous subarray of size ‘k’.

Pretty easy and natural sliding window approach:

In [10]:
import math

def subarray_sum(a, k):
    running_sum = sum(a[0:k])
    max_sum = -math.inf
    i, j = 0, k
    
    while j < len(a):
        max_sum = max(max_sum, running_sum)
        
        running_sum -= a[i]
        running_sum += a[j]
        
        i += 1
        j += 1
        
    max_sum = max(max_sum, running_sum)
    
    return max_sum 

In [60]:
a=[2, 1, 5, 1, 3, 2, 24]
k=3 

In [61]:
subarray_sum(a, k)

29

**Time: O(n)**
    
**Space: O(1)**
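
As with the averages problem, the sliding window can be cross-checked against a brute-force reference on random inputs. The following is a sketch under stated assumptions: `max_sum_brute_force` and `max_sum_sliding` are hypothetical names introduced here, and `max_sum_sliding` is a compact restatement of the same idea rather than the function defined above.

```python
import random

def max_sum_brute_force(a, k):
    """Hypothetical reference: sum every window of size k and keep the largest."""
    return max(sum(a[i:i + k]) for i in range(len(a) - k + 1))

def max_sum_sliding(a, k):
    """Sliding window: update the window sum in O(1) per step."""
    window = sum(a[:k])  # sum of the first window
    best = window
    for j in range(k, len(a)):
        window += a[j] - a[j - k]  # add entering element, drop leaving element
        best = max(best, window)
    return best

# Randomized agreement check on positive integers with 1 <= k <= len(arr)
random.seed(0)
for _ in range(100):
    arr = [random.randint(1, 50) for _ in range(random.randint(1, 20))]
    size = random.randint(1, len(arr))
    assert max_sum_brute_force(arr, size) == max_sum_sliding(arr, size)
```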

## Smallest Subarray with a given sum

Problem Statement: Given an array of positive numbers and a positive number ‘S’, find the length of the smallest contiguous subarray whose sum is greater than or equal to ‘S’. Return 0, if no such subarray exists.

Simple variable length sliding window with ugly edge case handling.

In [28]:
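def smallest_subarray(s, arr):
    import math

    if not arr:
        return 0

    l, r = 0, 0
    running_sum = arr[0]
    min_len = math.inf

    while (r < len(arr) and l <= r):
        if running_sum < s:
            # Window sum too small: extend to the right, guarding the array bound
            r += 1
            if r < len(arr):
                running_sum += arr[r]
        else:
            # Window is valid: record its length, then shrink it from the left
            min_len = min(min_len, r-l+1)
            running_sum -= arr[l]
            l += 1
    # Return 0 when no subarray reaches s, as the problem statement requires
    return min_len if min_len != math.inf else 0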

In [29]:
smallest_subarray(4, [1, 2, 3, 10])

1

**Time: O(n)** - each element is visited at most twice, once by `r` and once by `l`, giving O(2n)=O(n) time complexity

**Space: O(1)**

Cleaner solution with no need for explicit edge case handling:

In [55]:
## Cleaner solution
import math

def smallest_subarray_clean(s, arr):
    running_sum, l = 0, 0
    min_len = math.inf
    
    for r in range(0, len(arr)):
        running_sum += arr[r]
        
        while running_sum >= s:
            min_len = min(min_len, r-l+1)
            running_sum -= arr[l]
            l += 1
    
    return min_len if min_len != math.inf else 0

In [56]:
smallest_subarray_clean(4, [1, 2, 3, 10])

1
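
To check the sliding-window answers, including the case where no subarray reaches `s`, a quadratic brute-force reference can be sketched as follows. The name `smallest_subarray_brute_force` is hypothetical and not part of the original notebook; it assumes positive numbers, as in the problem statement.

```python
import math

def smallest_subarray_brute_force(s, arr):
    """Hypothetical O(n^2) reference: for each start, extend until the sum reaches s."""
    best = math.inf
    for start in range(len(arr)):
        total = 0
        for end in range(start, len(arr)):
            total += arr[end]
            if total >= s:
                best = min(best, end - start + 1)
                break  # extending further from this start only gives longer windows
    # Problem statement: return 0 if no such subarray exists
    return best if best != math.inf else 0

print(smallest_subarray_brute_force(4, [1, 2, 3, 10]))       # 1, the window [10]
print(smallest_subarray_brute_force(7, [2, 1, 5, 2, 3, 2]))  # 2, the window [5, 2]
print(smallest_subarray_brute_force(100, [1, 2, 3]))         # 0, no window reaches 100
```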

## Longest Substring with K Distinct Characters

**Problem Statement**: Given a string, find the length of the longest substring in it with no more than K distinct characters.

https://leetcode.com/problems/longest-substring-with-at-most-k-distinct-characters/  

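A brute force enumeration can be used here. This would enumerate all $\sum_{j=1}^{n} (n - j + 1) = \frac{n(n+1)}{2}$ substrings, and counting the distinct characters of each one takes up to $O(n)$ time, so our time complexity would be $O(n^3)$ (or $O(n^2)$ if the distinct-character count is updated incrementally as the substring grows). Using a sliding window, we can do better.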
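
For reference, a brute-force baseline can be sketched by fixing each start index and extending the end while tracking the set of distinct characters seen so far. This is an illustrative sketch, not the notebook's own code; the name `longest_substring_brute_force` is hypothetical. It visits each (start, end) pair at most once, so it does quadratic work in the length of the string.

```python
def longest_substring_brute_force(s, k):
    """Hypothetical baseline: try every start, extend while at most k distinct chars."""
    best = 0
    for start in range(len(s)):
        distinct = set()
        for end in range(start, len(s)):
            distinct.add(s[end])
            if len(distinct) > k:
                break  # any longer substring from this start is also invalid
            best = max(best, end - start + 1)
    return best

print(longest_substring_brute_force("araaci", 2))  # 4, from "araa"
print(longest_substring_brute_force("cbbebi", 3))  # 5, from "cbbeb" or "bbebi"
```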

In [69]:
from collections import Counter

def longest_substring_with_k_distinct(s, k):
    l = 0
    running_len = 0
    max_len = 0
    seen = Counter()  # character counts inside the current window s[l..r]

    for r in range(len(s)):
        seen[s[r]] += 1
        running_len += 1

        while len(seen) > k: # Loop invariant is violated - fix it
            seen[s[l]] -= 1
            if seen[s[l]] == 0:
                del seen[s[l]]
            l += 1
            running_len -= 1

        max_len = max(max_len, running_len)

    return max_len

This algorithm is correct because the loop invariant ensures that the window under consideration is always a valid one (i.e. it has k or fewer distinct characters). We consider each such valid window and keep track of the longest one. If we violate the invariant, the while loop executes and restores it by shrinking the window from the left, and only then is the window length compared to the previous maximum. It's clear that the longest substring will be found because the window is "greedy": for each right endpoint r, the window extends as far to the left as the invariant allows, so the longest valid substring ending at r is always considered.
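
To make the invariant concrete, the sketch below records the window after the invariant has been restored for each right endpoint. The generator `trace_windows` is a hypothetical helper written for this illustration; it mirrors the loop above but yields the window instead of tracking only its length.

```python
from collections import Counter

def trace_windows(s, k):
    """Yield (l, r, window) for each r, after the invariant has been restored."""
    seen = Counter()
    l = 0
    for r, ch in enumerate(s):
        seen[ch] += 1
        while len(seen) > k:  # too many distinct characters: shrink from the left
            seen[s[l]] -= 1
            if seen[s[l]] == 0:
                del seen[s[l]]
            l += 1
        yield l, r, s[l:r + 1]

for l, r, window in trace_windows("araaci", 2):
    print(l, r, window)
```

For `"araaci"` with `k = 2`, the windows are `a`, `ar`, `ara`, `araa`, `aac`, `ci`: every one has at most two distinct characters, and the longest has length 4.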

**Time**: Each element is encountered at most twice (once when `r` adds it to the window and once when `l` removes it), hence **O(n)** time. The `len` call might look like it could be O(n), but `len` is O(1) for Python's built-in collections, including `dict` and `Counter`.

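**Space**: We keep a Counter dictionary to keep track of counts. It holds at most k+1 distinct keys at any time, hence **O(k)** space (and never more than O(n), since the string has at most n distinct characters)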