Skip to content
Liu Xinyu edited this page Mar 19, 2017 · 20 revisions

Min free number puzzle

In the book, we only showed the divide and conqueror method to achieve the linear time O(n), and constant space O(1) solution. There are other ways to achieve such performance. Here are another two.

Before we diving into the detail, let's review the important fact we discussed in the book.

1 <= answer <= n + 1

Where n is the length of the sequence, and the answer is n + 1 when the original sequence is some permutation of 1 to n.

Pigeonhole sort

Any comparison based sort will downgrade the time performance to O(n lg n). So all methods, like heap, BST and so on are out. The idea is that we can put the i-th number x to the position where it should be, the x-th number. When x is greater than n, we know that it cann't be the answer, we can skip it. Otherwise, as the result, we exchange the number at the i-th and the x-th positions. Since the number we swapped to i-th position may not be i, we need repeatedly do this until either it exceeds n, or it becomes i. We iterate i from 1 to n, process all numbers. As each number will only swapped one time, this processing is linear (can you prove it formally?).

Next we scan the array 2nd round, if at any position i, the number isn't i, we find the answer. Otherwise, it means the array is 'full', and we return n+1 as the min free number.

Here is the C-style, C++ code.

int findMinFree1(vector<int> nums) {
    int i, n = nums.size();
    for (i = 0; i < n; ++i)
        while (nums[i] <= n && nums[i] != i + 1)
            swap(nums[i], nums[nums[i] - 1]);
    for (i = 0; i < n; ++i)
        if (i + 1 != nums[i])
            return i + 1;
    return n + 1;
}

Sng encoding

No matter using a flag array, or put the number to its 'right' position, we want to know if a number does exist. Exit or not is a kind of binary information. we can encode it as the sign of the number. The idea is that, if number x exists in the sequence, we mark the x-th number as negative. After we did this marking for all numbers if they are not greater than n, we can scan the sequence from left, find the first positive number. And that position is the answer.

Here is the C-style, C++ code

int findMinFree2(vector<int> nums) {
    int i, k, n = nums.size();
    for (i = 0; i < n; ++i)
        if ((k = abs(nums[i])) <= n)
            nums[k - 1] = - abs(nums[k - 1]);
    for (i = 0; i < n; ++i)
        if (nums[i] > 0)
            return i + 1;
    return n + 1;
}

Missing and Duplicated puzzle

Given numbers from 1 to n, after some processing, there are some changes. 1) The order is shuffled; 2) One number x is mutated to y, here both x, y are from 1 to n. Develop a method to find the x and y in linear time with constant space.

Examples [3, 1, 3, 5, 4] ==> x = 2, y = 3

Divide and Conquer

We can divide the list with the middle number m = floor((1 + n) / 2). Move all the numbers <= m to the left; and the rest to the right.

If the length of left < m, we immediately know that the missing number is on the left. denote s = 1 + 2 + ... + m = m(m + 1)/2, then x = s - sum(left). At the same time, we know that, the duplicated numbers are on the right. Denote s' = (m + 1) + (m + 2) + ... + n = (n + m + 1)(n - m)/2, then y = sum(right) - s';

If the length of left > m, we immediately know that the duplicated number is on the left. Use the similar method, we have the missing number x = s' - sum(right), and the duplicated number y = sum(left) - s

Otherwise, it means the length of left is equal to m. We know that there are m numbers not greater than m, but we don't know if they are some permutation of 1, 2, ..., m. We can further compare between sum(left) and s. if they are equal, we know the left side is OK, we can drop the whole left side, and recursively find x, y on the right; Otherwise, we drop the right and recursively find the answer on the left.

During the recursion, we need change 1 in the above discussion to the lower bound of the list. Because each time we halve the list, so the total time, according to the Master Theory, is O(n). Alternatively, we can deduce the time as O(n + n/2 + n/4 + ...) = O(2n) = O(n).

Below are the example code in Haskell and Python.

Haskell divide and conquer example code:

missDup xs = solve xs 1 (toInteger $ length xs) where
  solve xs@(_:_:_) l u | k < m - l + 1 = (sl - sl', sr' - sr)
                       | k > m - l + 1 = (sr - sr', sl' - sl)
                       | sl == sl' = solve bs (m + 1) u
                       | otherwise = solve as l m
      where
          m = (l + u) `div` 2
          (as, bs) = partition (<=m) xs
          k = toInteger $ length as
          sl = (l + m) * (m - l + 1) `div` 2
          sr = (m + 1 + u) * (u - m) `div` 2
          (sl', sr') = (sum as, sum bs)

For Python, if we don't want to generate new lists during partition, we can developed a in-place partition method. It's similar to the N. Lomuto's method in QuickSort.

def partition(xs, l, u, x):
    left = l
    for right in range(l, u):
        if xs[right] < x:
            (xs[left], xs[right]) = (xs[right], xs[left])
            left = left + 1
    return left

def solve_inplace(xs):
    (l, u) = (0, len(xs))
    while l<u:
        m = (l+u)/2
        m1 = partition(xs, l, u, m)
        (nl, nr) = (m1 - l, u - m1);
        (sl, sr) = (sum(xs[l:m1]), sum(xs[m1:u]))
        sl1 = (l + m - 1)*(m - l)/2
        sr1 = (m + u - 1)*(u - m)/2
        if m1 < m:
            return (sl1 - sl, sr - sr1)
        elif m1 > m:
            return (sr1 - sr, sl - sl1)
        else:
            if sl == sl1:
                l = m1
            else:
                u = m1

Here is the Scala example: https://github.com/liuxinyu95/AlgoXY/blob/algoxy/others/problems/miss-dup/MissDup.scala

Pigeonhole Sort

Since all the numbers range from 1 to n, we can use pigeonhole sort to re-arrange them. We scan the number from left to the right, for each position i, if the number x doesn't equal to i, we exchange it with the number at position x. Denote the number at position x as y, if x and y are same, we found the duplicated number, and the missed one is i. We repeat this until either x equals to i or find the duplicated one. Because every number is swapped one time to its right position, the time is bound to O(n).

Here is the example Python program:

def solve3(xs):
    (miss, dup) = (-1, -1)
    for i in range(len(xs)):
        while xs[i] != i:
            j = xs[i]
            if xs[j] == xs[i]:
                dup = xs[j]
                miss = i
                break
            else:
                j = xs[i]
                (xs[i], xs[j]) = (xs[j], xs[i])
    return (miss, dup)

Sgn encoding

Suppose we have a flag array of length n, for each number x, we mark the x-th element in the flag array. When we meet the duplicated number, we'll find that flag has been marked before. Denote the duplicated number as d, We know the result of s = 1 + 2 + ... + n = n * (n + 1) / 2; and sum of all numbers in the list as s'. We can calculate the missing number as m = d + s - s'. However, this solution need an extra space of length n. since the existence of a number is a binary information (yes or not), we can encode it as the sign of the number and reuse the given list as the flag array. For each number x, we make the |x|-th element in the list as negative, where |x| is the absolute value of x. When we meet the element is already negative, we find the duplicated one, and we can next calculate the missing number out.

Below is the example Python program based on this idea:

def solve4(xs):
    miss, dup = -1, -1
    n = len(xs)
    s = sum(xs)
    for i in range(n):
        j = abs(xs[i]) - 1
        if xs[j] < 0:
            dup = j
            miss = dup + n * (n + 1) / 2 - s
            break
        xs[j] = - abs(xs[j])
    return (miss, dup)

Equations

There is another similar puzzle, but it's quite easy. Given a list of numbers from 1 to n, remove an element, and shuffle the list, develop a method to find the missing element. We can sum all the elements in the list, then substract it from n * (n + 1) / 2 to get the answer. This is can be expressed as the equation

m = s - s'

Where m is the missing number, s is the sum of 1 to n, and s' is the sum of the elements in the list. However, for this problem, there is a duplicated element and a missing one. we can't solve two unknow numbers with only one equation:

sum(x[i] - i) = d - m ----(1)

Where the right side is the sum of all difference between the i-th number and i. Can we build a second equation? The idea is to square every number. If we sum all the difference between the square of i-th number and i^2, we can get the below result:

sum(x[i]^2 - i^2) = d^2 - m^2 = (d + m) * (d - m) ----(2)

Since d - m isn't zero, we can divide (2) by (1), to get another equation:

sum(x[i]^2 - i^2) / sum(x[i] - i) = d + m ----(3)

Compare (1), and (3), we have two equations and two unknown number, it's easy to get the result:

m = (sum(x[i]^2 - i^2) / sum(x[i] - i) - sum(x[i] - i)) / 2

d = (sum(x[i]^2 - i^2) / sum(x[i] - i) + sum(x[i] - i)) / 2

Below is the example code in Python and Haskell

Python:

def solve2(xs):
    ys = zip(xs, range(len(xs)))
    a = sum(map(lambda (a, b): a-b, ys))
    b = sum(map(lambda (a, b): a*a - b*b, ys))
    return ((b/a - a)/2, (a + b/a)/2)

Haskell:

missDup' xs = ((b `div` a - a) `div` 2, (b `div` a + a) `div` 2) where
  ys = zip xs [1..]
  a = sum [x - y | (x, y) <- ys]
  b = sum [x^2 - y^2 | (x, y) <- ys]

End