add another solution and edit comments on a couple of previous solutions
johnmee committed Jan 29, 2016
1 parent bd38627 commit d4dda25
Showing 4 changed files with 236 additions and 24 deletions.
11 changes: 6 additions & 5 deletions README.md
So far, their test cases follow a predictable methodology:
* edge cases - test cases written to root out those awkward 'off-by-one' scenarios that inevitably suck up 80% of the time required to devise a solution
* a simple, or 'small', test case or two - just some basic examples, as you might reasonably anticipate

### Performance tests

* the worst-case scenario is tested - the biggest possible numbers in the biggest result sets - with the intent to test the speed and space constraints
* not always, but as the problem dictates, some medium-sized test cases, eg: arrays of length ~100 to ~5000
* always some 'extreme' test cases, typically involving generating maximal random datasets

## Other notes

* you're safe to assume they won't test, or mark you down for, failing to guard against the explicit assumptions described. So if it says N is 0..1000, they won't feed in an N=1001 just to see if you protected against it.
* the "Open reading material", currently at the top of each lesson, is worth reading before attempting the exercises as they are short and focus exactly on what you'll need to solve the following puzzles
* during the actual interview testing/exam, the report sent to the candidate is much more sparesly detailed than the one sent to the company?!
* if you use the browser to actually build your solution - every edit and run is recorded and presented to the client
* if you are given multiple tasks, you are permitted to read them, and commence them, and submit them in any order.
* if there seems to be a lack of specificity in every puzzle around what is the correct response to error conditions; look, read, look again, as
after seeing the solution that apparent lack always seems like a debateably reasonable assumption implied by the specs.
For example:
the largest value—plus one. Again, seems perfectly reasonable in hindsight, but a source of uncertainty in the moment
* before submitting your solution, there is no feedback regarding its efficiency; but it does affect your score and report
* if it tells you the expected solution is O(1), you know to go looking for a formulaic solution; O(N) means no loops inside loops, etc.
* the python `in` operator is a list loop and could contribute an O(N) all on its own
  * `foo in bar` is fine
  * `foo in bar.keys()` impacts your time complexity
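
A quick illustration of that last point, as a Python 2 sketch (this repo's code targets Python 2, where `dict.keys()` builds a list; in Python 3 it returns a view, so membership on it is O(1) as well):

```python
bar = {i: None for i in xrange(100000)}

print 99999 in bar          # O(1): a hash lookup straight on the dict
print 99999 in bar.keys()   # O(N) in Python 2: .keys() builds a list, then scans it
print 99999 in range(3)     # also O(N): `in` on a list is a linear scan
```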
11 changes: 6 additions & 5 deletions ex-5-2-PassingCars.py
Count the number of passing cars on the road.
********
I found this problem description very difficult to appreciate:
***
Note:
The cars travelling east only pass cars travelling west that appear after them in the series.
The only indication of this feature is the example, and even when I started to suspect it, there is
no second example with which to establish whether this observation is a feature or a coincidence.
* you can avoid reversing the list (strict prefix-sum style) by inverting the problem into a count of cars
  travelling east that pass cars travelling west.
********
A non-empty zero-indexed array A consisting of N integers is given. The consecutive elements of array A represent consecutive cars on a road.
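
A minimal sketch of the counting idea described in the notes above (my own illustration; the file's actual code is collapsed in this view). For every westbound car (1) it adds the number of eastbound cars (0) already seen:

def passing_cars(A):
    east_so_far = 0    # cars travelling east (0) seen so far
    passings = 0
    for car in A:
        if car == 0:
            east_so_far += 1
        else:
            passings += east_so_far
            if passings > 1000000000:   # the task caps the count at one billion
                return -1
    return passings

assert passing_cars([0, 1, 0, 1, 1]) == 5   # the task's worked example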
33 changes: 19 additions & 14 deletions ex-5-3-GenomicRangeQuery.py
https://codility.com/programmers/task/genomic_range_query/
If you're not fussy, this is a fairly straightforward problem: you go through every query, pull out the sliced
sequence, and inspect it for the minimal impact factor. Taking advantage of the fact that the impact factors
correspond to alphabetical order, sorting the slice puts the minimum-impact nucleotide at the front of the
string; map it to its value and you're done. Right? See 'slow_solution' below.
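
A sketch of what that first idea produces (my own illustration; the file's slow_solution itself is not shown in this hunk), assuming the usual impact factors A=1, C=2, G=3, T=4:

def sketch_slow_solution(S, P, Q):
    impact = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    answers = []
    for start, end in zip(P, Q):
        # alphabetical order matches impact order, so the sorted slice's
        # first character is the minimal-impact nucleotide
        answers.append(impact[sorted(S[start:end + 1])[0]])
    return answers

assert sketch_slow_solution("CAGCCTA", [2, 5, 0], [4, 5, 6]) == [2, 4, 1]   # the task's worked example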
No. Not done.
The straightforward solution visits every nucleotide in every slice. If the slices overlap a lot, then
the solution revisits the same nucleotide many times. If you could arrange the solution to
visit each nucleotide only once, it would be substantially faster.
But how?
We need to pass over the sequence and produce an intermediary data structure which aggregates the data we
need into a directly accessible form; that is, one we can query without stepping through any part of the sequence again.
Enter the "prefix sum" pattern.
For this problem, we create an array for each type of item then, at each step through the sequence, record
the count of how many of each type we've seen. When we're done, for the example sequence
"CAGCCTA", we finish up with:
sumA = [0,0,1,1,1,1,1,2]
sumC = [0,1,1,1,2,3,3,3]
sumG = [0,0,0,1,1,1,1,1]
sumT = [0,0,0,0,0,0,1,1]
Now it's plain as day where each nucleo of each type appears. So, when we ask "Are there any
'C' types between points 2 and 4?" we can look up sumC for the answer:
at point 2 we had seen 1 type-C nucleo, and at point 4 we'd seen 2. Thus we can establish that
there is exactly 1 (2 - 1) type-C nucleo between points 2 and 4 by looking only at the two end-points,
saving us from having to inspect every point between them.
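
In code that lookup is just a subtraction of two prefix values:

sumC = [0, 1, 1, 1, 2, 3, 3, 3]    # prefix counts of 'C' in "CAGCCTA"
assert sumC[4] - sumC[2] == 1      # exactly one 'C' between points 2 and 4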
So now we don't need the original sequence at all. For each query we just plug the start and end indexes
into the prefix sums and compare the values at those two points. Thus we can quickly identify which
nucleotides appear in each query and determine the 'minimum impact' value within each.
See 'fast_solution'.
"""
import unittest
import random
def fast_solution(S, P, Q):
    # Eg: the sums for "C" in "CAGCCTA" are [0,1,1,1,2,3,3,3]
    sumA = [0]; sumC = [0]; sumG = [0]; sumT = [0]
    for nuke in S:
        # copy the counts in the last cell into this one
        for sum in (sumA, sumC, sumG, sumT):
            sum.append(sum[-1])
        # increment the sum corresponding to the current nuke
        if nuke == 'A':
            sumA[-1] += 1
        elif nuke == 'C':
            sumC[-1] += 1
        elif nuke == 'G':
            sumG[-1] += 1
        elif nuke == 'T':
            sumT[-1] += 1
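
The remainder of fast_solution is collapsed in this view; presumably it answers each query by subtracting prefix values, along these lines (my sketch, continuing the function above):

    answers = []
    for start, end in zip(P, Q):
        # probe each nucleotide in impact order; the first one present wins
        if sumA[end + 1] - sumA[start] > 0:
            answers.append(1)
        elif sumC[end + 1] - sumC[start] > 0:
            answers.append(2)
        elif sumG[end + 1] - sumG[start] > 0:
            answers.append(3)
        else:
            answers.append(4)
    return answers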
205 changes: 205 additions & 0 deletions ex-5-4-MinAvgTwoSlice.py
"""
MinAvgTwoSlice
Find the minimal average of any slice containing at least two elements.
https://codility.com/programmers/task/min_avg_two_slice/
----------------
# My Analysis
This problem doesn't even need prefix sums, so that was a big red herring. Well, mostly.
The 'trick' is not in the coding at all, but in a realisation about the nature of the problem...
You're looking for the smallest average of a series of numbers. At first it looks like you
have to iterate over an ever-increasing collection of averages of various lengths.
But, at some point, you'll realize that a small number will always pull the average down,
no matter what numbers are around it.
Thus, given that it takes a minimum of two numbers to make an average, you're really only looking for the pair
of numbers which combine to give the smallest total.
To verify this presumption, consider the slope of the curve you'd get by graphing the moving average
of each pair. Irrespective of the length of the average, the gradient will always tilt
down, however slightly, when you come across a small number.
Coding a two-point average is dead simple: just walk through the sequence from left to right adding each pair
together and tracking the position of the smallest pair.
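
That pairs-only attempt looks something like this (a sketch; 'pairs_only' is my name for it, not the submitted code):

def pairs_only(A):
    best_avg = (A[0] + A[1]) / 2.0
    best_idx = 0
    for i in range(1, len(A) - 1):
        avg = (A[i] + A[i + 1]) / 2.0
        if avg < best_avg:
            best_avg = avg
            best_idx = i
    return best_idx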
I couldn't believe it would be this easy, but couldn't resist, and gave it a run... 50%.
After a run the report tells you which tests it failed and I couldn't help but notice it failed
for 3-point moving averages. That was perplexing because I couldn't see how to create a three-point
average that would change the result over a two-point average! So I googled it.
The explanation is sequences with an odd number of integers... namely 3.
For example: [-8, -6, -10]
In this sequence the two-point averages are -7 and -8, so the answer would be index point 1 (the -6).
But note that the three-point average is also -8, and commences one point earlier, at index point 0.
So the correct answer is 0.
And we're back questioning whether this scenario will play out for sequences of length 4, 5, 6 and beyond.
Will it?
Well, consider a sequence of 4 values: [1,2,2,1]. It has three two-point pairs, [1,2], [2,2], [2,1],
which average to [1.5, 2, 1.5]. The four-point average is 1.5.
How about [1,-1,1,-1]? The two-point averages are [0,0,0] and the four-point average is [0].
You can play this out all day, only to find that a four-point average can never be less than one
of the two-point averages within it.
Ok, then why do we need the three-point average?
Consider [1, -1, 1, -1]. The two-point averages all come to 0. And we've already established that
a four-point average (of 0) can never best the two-point averages.
But the three-point averages are 0.33 and -0.33.
So the correct answer is index point 1.
And a 5-point sequence?
Ok, [-1, 1, -1, 1, -1].
Two points = [0, 0, 0, 0] (best average is 0, starting at index 0)
Three points = [-0.33, 0.33, -0.33] (a better answer: -0.33, starting at index 0)
Four points = [0, 0] (just like the two points)
Five points = [-0.2] (no better than the three-point answer).
By this point my understanding is that the two- and three-point averages act as the building blocks of all the
longer averages: any longer slice can be split into slices of length two and three, so its average may match
one of them but can never beat the best of them.
Thus we can confidently write some trivial code to do a single-pass solution which considers just
the two- and three-point averages.
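
A quick brute-force check of that claim (my own throwaway sketch, not part of the submission): for small random arrays, the best slice of length two or three starts at the same index that an exhaustive search over every slice finds.

import random

def brute_force_start(A):
    # examine every slice (P, Q) with at least two elements
    best_avg, best_idx = None, 0
    for p in range(len(A) - 1):
        for q in range(p + 1, len(A)):
            avg = sum(A[p:q + 1]) / float(q - p + 1)
            if best_avg is None or avg < best_avg:
                best_avg, best_idx = avg, p
    return best_idx

def two_three_start(A):
    # only consider slices of length two and three
    best_avg, best_idx = None, 0
    for p in range(len(A) - 1):
        for length in (2, 3):
            if p + length <= len(A):
                avg = sum(A[p:p + length]) / float(length)
                if best_avg is None or avg < best_avg:
                    best_avg, best_idx = avg, p
    return best_idx

for _ in range(1000):
    A = [random.randint(-10, 10) for _ in range(random.randint(2, 8))]
    assert brute_force_start(A) == two_three_start(A), A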
-------------------
# Problem Description
A non-empty zero-indexed array A consisting of N integers is given. A pair of integers (P, Q),
such that 0 <= P < Q < N, is called a slice of array A (notice that the slice contains at least
two elements). The average of a slice (P, Q) is the sum of A[P] + A[P + 1] + ... + A[Q] divided
by the length of the slice. To be precise, the average equals (A[P] + A[P + 1] + ... + A[Q]) / (Q - P + 1).
For example, array A such that:
A[0] = 4
A[1] = 2
A[2] = 2
A[3] = 5
A[4] = 1
A[5] = 5
A[6] = 8
contains the following example slices:
slice (1, 2), whose average is (2 + 2) / 2 = 2;
slice (3, 4), whose average is (5 + 1) / 2 = 3;
slice (1, 4), whose average is (2 + 2 + 5 + 1) / 4 = 2.5.
The goal is to find the starting position of a slice whose average is minimal.
Write a function:
def solution(A)
that, given a non-empty zero-indexed array A consisting of N integers, returns the starting
position of the slice with the minimal average. If there is more than one slice with a minimal
average, you should return the smallest starting position of such a slice.
For example, given array A such that:
A[0] = 4
A[1] = 2
A[2] = 2
A[3] = 5
A[4] = 1
A[5] = 5
A[6] = 8
the function should return 1, as explained above.
Assume that:
N is an integer within the range [2..100,000];
each element of array A is an integer within the range [-10,000..10,000].
Complexity:
expected worst-case time complexity is O(N);
expected worst-case space complexity is O(N), beyond input storage (not counting the
storage required for input arguments).
Elements of input arrays can be modified.
"""


import unittest
import random


RANGE_A = (2, 100000)
RANGE_N = (-10000, 10000)


def solution(A):
    """
    :param A: array of integers
    :return: an integer
    """
    # the lowest average we've ever seen
    lowest_avg = RANGE_N[1]
    # the starting point of the lowest average seen
    lowest_idx = 0
    # the value we saw two iterations ago
    second_last = None
    # the value we saw last iteration
    last = None

    # for every number in the sequence
    for idx, this in enumerate(A):

        # if we have seen three numbers calculate the three-point average
        # and, if necessary, keep it.
        if second_last is not None:
            three_avg = (second_last + last + this) / 3.0
            if three_avg < lowest_avg:
                lowest_avg = three_avg
                lowest_idx = idx - 2

        # if we have seen two numbers calculate the two-point average
        # and, if necessary, keep it.
        if last is not None:
            two_avg = (last + this) / 2.0
            if two_avg < lowest_avg:
                lowest_avg = two_avg
                lowest_idx = idx - 1

        # print idx, second_last, last, this, '\t\t', two_avg, three_avg, '\t\t', lowest_avg, lowest_idx

        second_last = last
        last = this

    return lowest_idx


class TestExercise(unittest.TestCase):
    def test_example(self):
        self.assertEqual(solution([4, 2, 2, 5, 1, 5, 8]), 1)
        self.assertEqual(solution([5, 2, 2, 100, 1, 1, 100]), 4)
        self.assertEqual(solution([11, 2, 10, 1, 100, 2, 9, 2, 100]), 1)

    def test_three(self):
        # self.assertEqual(solution([-3, -5, -8, -4, -10]), 2)
        # self.assertEqual(solution([-8, -6, -10]), 0)
        self.assertEqual(solution([1, -1, 1, -1]), 1)

    def test_random(self):
        A = [random.randint(*RANGE_N) for _ in xrange(2, 10)]
        print A
        print solution(A)

    def test_large_ones(self):
        """Numbers from -1 to 1, N = ~100000"""
        # how to test?

    def test_extreme(self):
        A = [RANGE_N[1]] * (RANGE_A[1] / 3) + [RANGE_N[0]] * (RANGE_A[1] / 3)
        idx = solution(A)
        print idx, A[idx-3:idx+3]


if __name__ == '__main__':
    unittest.main()
