add another solution and edit comments on a couple of previous solutions
johnmee committed Jan 29, 2016
1 parent bd38627 commit d4dda25
Showing 4 changed files with 236 additions and 24 deletions.
11 changes: 6 additions & 5 deletions README.md
So far, their test cases follow a predictable methodology:
* edge cases - test cases written to root out those awkward 'off-by-one' scenarios that inevitably suck up 80% of the time required to devise a solution
* a simple, or 'small', test case or two - just some basic examples, as you might reasonably anticipate

### Performance tests

* the worst-case scenario is tested - the biggest possible numbers in the biggest result sets - with the intent to test the speed and space constraints
* not always, but as the problem dictates, some medium-sized test cases, eg: arrays of length ~100 to ~5000
* always some 'extreme' test cases, typically involving generating maximal random datasets

## Other notes

* you're safe to assume they won't test, or mark you down for, failing to guard against the explicit assumptions described. So if it says N is 0..1000, they won't feed in an N=1001 just to see if you protected against it.
* the "Open reading material", currently at the top of each lesson, is worth reading before attempting the exercises as they are short and focus exactly on what you'll need to solve the following puzzles
* during the actual interview testing/exam, the report sent to the candidate is much more sparesly detailed than the one sent to the company?!
* if you use the browser to actually build your solution - every edit and run is recorded and presented to the client
* if you are given multiple tasks, you are permitted to read them, and commence them, and submit them in any order.
* if there seems to be a lack of specificity in every puzzle around what is the correct response to error conditions; look, read, look again, as
after seeing the solution that apparent lack always seems like a debateably reasonable assumption implied by the specs.
For example:
the largest value—plus one. Again, seems perfectly reasonable in hindsight, but a source of uncertainty in the moment
* before submitting your solution, there is no feedback regarding its efficiency; but it does affect your score and report
* if it tells you the expected solution is O(1), you know to go looking for a formulaic solution; O(N) means no loops inside loops, etc.
* the python `in` operator is a list loop and could contribute an O(N) all on its own
  * `foo in bar` is fine
  * `foo in bar.keys()` impacts your time complexity
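
A quick illustration of that last point, as a Python 2 sketch (this repo's code targets Python 2, where `dict.keys()` builds a list; in Python 3 it returns a view, so membership on it is O(1) as well):

```python
bar = {i: None for i in xrange(100000)}

print 99999 in bar          # O(1): a hash lookup straight on the dict
print 99999 in bar.keys()   # O(N) in Python 2: .keys() builds a list, then scans it
print 99999 in range(3)     # also O(N): `in` on a list is a linear scan
```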
11 changes: 6 additions & 5 deletions ex-5-2-PassingCars.py
Count the number of passing cars on the road.
********
I found this problem description very difficult to appreciate:
***
Note:
The cars travelling east only pass cars travelling west that appear after them in the series.
The only indication of this feature is the example, and even when I started to suspect it, there is
no second example with which to establish whether this observation is a feature or a coincidence.
* you can avoid reversing the list (strict prefix-sum style) by inverting the problem into a count of cars
  travelling east that pass cars travelling west.
********
A non-empty zero-indexed array A consisting of N integers is given. The consecutive elements of array A represent consecutive cars on a road.
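
A minimal sketch of the counting idea described in the notes above (my own illustration; the file's actual code is collapsed in this view). For every westbound car (1) it adds the number of eastbound cars (0) already seen:

def passing_cars(A):
    east_so_far = 0    # cars travelling east (0) seen so far
    passings = 0
    for car in A:
        if car == 0:
            east_so_far += 1
        else:
            passings += east_so_far
            if passings > 1000000000:   # the task caps the count at one billion
                return -1
    return passings

assert passing_cars([0, 1, 0, 1, 1]) == 5   # the task's worked example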
33 changes: 19 additions & 14 deletions ex-5-3-GenomicRangeQuery.py
https://codility.com/programmers/task/genomic_range_query/
If you're not fussy, this is a fairly straightforward problem: you go through every query, pull out the sliced
sequence, and inspect it for the minimal impact factor. Taking advantage of the fact that the impact factors
correspond to alphabetical order, sorting the slice puts the minimum-impact nucleotide at the front of the
string; map it to its value and you're done. Right? See 'slow_solution' below.
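
A sketch of what that first idea produces (my own illustration; the file's slow_solution itself is not shown in this hunk), assuming the usual impact factors A=1, C=2, G=3, T=4:

def sketch_slow_solution(S, P, Q):
    impact = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    answers = []
    for start, end in zip(P, Q):
        # alphabetical order matches impact order, so the sorted slice's
        # first character is the minimal-impact nucleotide
        answers.append(impact[sorted(S[start:end + 1])[0]])
    return answers

assert sketch_slow_solution("CAGCCTA", [2, 5, 0], [4, 5, 6]) == [2, 4, 1]   # the task's worked example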
No. Not done.
The straightforward solution visits every nucleotide in every slice. If the slices overlap a lot, then
the solution revisits the same nucleotide many times. If you could arrange the solution to
visit each nucleotide only once, it would be substantially faster.
But how?
We need to pass over the sequence and produce an intermediary data structure which aggregates the data we
need into a directly accessible form; that is, one we can query without stepping through any part of the sequence again.
Enter the "prefix sum" pattern.
For this problem, we create an array for each type of item then, at each step through the sequence, record
the count of how many of each type we've seen. When we're done, for the example sequence
"CAGCCTA", we finish up with:
sumA = [0,0,1,1,1,1,1,2]
sumC = [0,1,1,1,2,3,3,3]
sumG = [0,0,0,1,1,1,1,1]
sumT = [0,0,0,0,0,0,1,1]
Now it's plain as day where each nucleo of each type appears. So, when we ask "Are there any
'C' types between points 2 and 4?" we can look up sumC for the answer:
at point 2 we had seen 1 type-C nucleo, and at point 4 we'd seen 2. Thus we can establish that
there is exactly 1 (2 - 1) type-C nucleo between points 2 and 4 by looking only at the two end-points,
saving us from having to inspect every point between them.
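
In code that lookup is just a subtraction of two prefix values:

sumC = [0, 1, 1, 1, 2, 3, 3, 3]    # prefix counts of 'C' in "CAGCCTA"
assert sumC[4] - sumC[2] == 1      # exactly one 'C' between points 2 and 4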
So now we don't need the original sequence at all. For each query we just plug the start and end indexes
into the prefix sums and compare the values at those two points. Thus we can quickly identify which
nucleotides appear in each query and determine the 'minimum impact' value within each.
See 'fast_solution'.
"""
import unittest
import random
def fast_solution(S, P, Q):
    # Eg: the sums for "C" in "CAGCCTA" are [0,1,1,1,2,3,3,3]
    sumA = [0]; sumC = [0]; sumG = [0]; sumT = [0]
    for nuke in S:
        # copy the counts in the last cell into this one
        for sum in (sumA, sumC, sumG, sumT):
            sum.append(sum[-1])
        # increment the sum corresponding to the current nuke
        if nuke == 'A':
            sumA[-1] += 1
        elif nuke == 'C':
            sumC[-1] += 1
        elif nuke == 'G':
            sumG[-1] += 1
        elif nuke == 'T':
            sumT[-1] += 1
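
The remainder of fast_solution is collapsed in this view; presumably it answers each query by subtracting prefix values, along these lines (my sketch, continuing the function above):

    answers = []
    for start, end in zip(P, Q):
        # probe each nucleotide in impact order; the first one present wins
        if sumA[end + 1] - sumA[start] > 0:
            answers.append(1)
        elif sumC[end + 1] - sumC[start] > 0:
            answers.append(2)
        elif sumG[end + 1] - sumG[start] > 0:
            answers.append(3)
        else:
            answers.append(4)
    return answers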
205 changes: 205 additions & 0 deletions ex-5-4-MinAvgTwoSlice.py
"""
MinAvgTwoSlice
Find the minimal average of any slice containing at least two elements.
https://codility.com/programmers/task/min_avg_two_slice/
----------------
# My Analysis
This problem doesn't even need prefix sums, so that was a big red herring. Well, mostly.
The 'trick' is not in the coding at all, but in a realisation about the nature of the problem...
You're looking for the smallest average of a series of numbers. At first it looks like you
have to iterate over an ever-increasing collection of averages of various lengths.
But, at some point, you'll realize that a small number will always pull the average down,
no matter what numbers are around it.
Thus, given that it takes a minimum of two numbers to make an average, you're really only looking for the pair
of numbers which combine to give the smallest total.
To verify this presumption, consider the slope of the curve you'd get by graphing the moving average
of each pair. Irrespective of the length of the average, the gradient will always tilt
down, however slightly, when you come across a small number.
Coding a two-point average is dead simple: just walk through the sequence from left to right adding each pair
together and tracking the position of the smallest pair.
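
That pairs-only attempt looks something like this (a sketch; 'pairs_only' is my name for it, not the submitted code):

def pairs_only(A):
    best_avg = (A[0] + A[1]) / 2.0
    best_idx = 0
    for i in range(1, len(A) - 1):
        avg = (A[i] + A[i + 1]) / 2.0
        if avg < best_avg:
            best_avg = avg
            best_idx = i
    return best_idx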
I couldn't believe it would be this easy, but couldn't resist, and gave it a run... 50%.
After a run the report tells you which tests it failed and I couldn't help but notice it failed
for 3-point moving averages. That was perplexing because I couldn't see how to create a three-point
average that would change the result over a two-point average! So I googled it.
The explanation is sequences with an odd number of integers... namely 3.
For example: [-8, -6, -10]
In this sequence the two-point averages are -7 and -8, so the answer would be index point 1 (the -6).
But note that the three-point average is also -8, and commences one point earlier, at index point 0.
So the correct answer is 0.
And we're back questioning whether this scenario will play out for sequences of length 4, 5, 6 and beyond.
Will it?
Well, consider a sequence of 4 values: [1,2,2,1]. It has three two-point pairs, [1,2], [2,2], [2,1],
which average to [1.5, 2, 1.5]. The four-point average is 1.5.
How about [1,-1,1,-1]? The two-point averages are [0,0,0] and the four-point average is [0].
You can play this out all day, only to find that a four-point average can never be less than one
of the two-point averages within it.
Ok, then why do we need the three-point average?
Consider [1, -1, 1, -1]. The two-point averages all come to 0. And we've already established that
a four-point average (of 0) can never best the two-point averages.
But the three-point averages are 0.33 and -0.33.
So the correct answer is index point 1.
And a 5-point sequence?
Ok, [-1, 1, -1, 1, -1].
Two points = [0, 0, 0, 0] (best average is 0, starting at index 0)
Three points = [-0.33, 0.33, -0.33] (a better answer: -0.33, starting at index 0)
Four points = [0, 0] (just like the two points)
Five points = [-0.2] (no better than the three-point answer).
By this point my understanding is that the two- and three-point averages act as the building blocks of all the
longer averages: any longer slice can be split into slices of length two and three, so its average may match
one of them but can never beat the best of them.
Thus we can confidently write some trivial code to do a single-pass solution which considers just
the two- and three-point averages.
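
A quick brute-force check of that claim (my own throwaway sketch, not part of the submission): for small random arrays, the best slice of length two or three starts at the same index that an exhaustive search over every slice finds.

import random

def brute_force_start(A):
    # examine every slice (P, Q) with at least two elements
    best_avg, best_idx = None, 0
    for p in range(len(A) - 1):
        for q in range(p + 1, len(A)):
            avg = sum(A[p:q + 1]) / float(q - p + 1)
            if best_avg is None or avg < best_avg:
                best_avg, best_idx = avg, p
    return best_idx

def two_three_start(A):
    # only consider slices of length two and three
    best_avg, best_idx = None, 0
    for p in range(len(A) - 1):
        for length in (2, 3):
            if p + length <= len(A):
                avg = sum(A[p:p + length]) / float(length)
                if best_avg is None or avg < best_avg:
                    best_avg, best_idx = avg, p
    return best_idx

for _ in range(1000):
    A = [random.randint(-10, 10) for _ in range(random.randint(2, 8))]
    assert brute_force_start(A) == two_three_start(A), A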
-------------------
# Problem Description
A non-empty zero-indexed array A consisting of N integers is given. A pair of integers (P, Q),
such that 0 <= P < Q < N, is called a slice of array A (notice that the slice contains at least
two elements). The average of a slice (P, Q) is the sum of A[P] + A[P + 1] + ... + A[Q] divided
by the length of the slice. To be precise, the average equals (A[P] + A[P + 1] + ... + A[Q]) / (Q - P + 1).
For example, array A such that:
A[0] = 4
A[1] = 2
A[2] = 2
A[3] = 5
A[4] = 1
A[5] = 5
A[6] = 8
contains the following example slices:
slice (1, 2), whose average is (2 + 2) / 2 = 2;
slice (3, 4), whose average is (5 + 1) / 2 = 3;
slice (1, 4), whose average is (2 + 2 + 5 + 1) / 4 = 2.5.
The goal is to find the starting position of a slice whose average is minimal.
Write a function:
def solution(A)
that, given a non-empty zero-indexed array A consisting of N integers, returns the starting
position of the slice with the minimal average. If there is more than one slice with a minimal
average, you should return the smallest starting position of such a slice.
For example, given array A such that:
A[0] = 4
A[1] = 2
A[2] = 2
A[3] = 5
A[4] = 1
A[5] = 5
A[6] = 8
the function should return 1, as explained above.
Assume that:
N is an integer within the range [2..100,000];
each element of array A is an integer within the range [-10,000..10,000].
Complexity:
expected worst-case time complexity is O(N);
expected worst-case space complexity is O(N), beyond input storage (not counting the
storage required for input arguments).
Elements of input arrays can be modified.
"""


import unittest
import random


RANGE_A = (2, 100000)
RANGE_N = (-10000, 10000)


def solution(A):
    """
    :param A: array of integers
    :return: an integer
    """
    # the lowest average we've ever seen
    lowest_avg = RANGE_N[1]
    # the starting point of the lowest average seen
    lowest_idx = 0
    # the value we saw two iterations ago
    second_last = None
    # the value we saw last iteration
    last = None

    # for every number in the sequence
    for idx, this in enumerate(A):

        # if we have seen three numbers calculate the three-point average
        # and, if necessary, keep it.
        if second_last is not None:
            three_avg = (second_last + last + this) / 3.0
            if three_avg < lowest_avg:
                lowest_avg = three_avg
                lowest_idx = idx - 2

        # if we have seen two numbers calculate the two-point average
        # and, if necessary, keep it.
        if last is not None:
            two_avg = (last + this) / 2.0
            if two_avg < lowest_avg:
                lowest_avg = two_avg
                lowest_idx = idx - 1

        # print idx, second_last, last, this, '\t\t', two_avg, three_avg, '\t\t', lowest_avg, lowest_idx

        second_last = last
        last = this

    return lowest_idx


class TestExercise(unittest.TestCase):
    def test_example(self):
        self.assertEqual(solution([4, 2, 2, 5, 1, 5, 8]), 1)
        self.assertEqual(solution([5, 2, 2, 100, 1, 1, 100]), 4)
        self.assertEqual(solution([11, 2, 10, 1, 100, 2, 9, 2, 100]), 1)

    def test_three(self):
        # self.assertEqual(solution([-3, -5, -8, -4, -10]), 2)
        # self.assertEqual(solution([-8, -6, -10]), 0)
        self.assertEqual(solution([1, -1, 1, -1]), 1)

    def test_random(self):
        A = [random.randint(*RANGE_N) for _ in xrange(2, 10)]
        print A
        print solution(A)

    def test_large_ones(self):
        """Numbers from -1 to 1, N = ~100000"""
        # how to test?

    def test_extreme(self):
        A = [RANGE_N[1]] * (RANGE_A[1] / 3) + [RANGE_N[0]] * (RANGE_A[1] / 3)
        idx = solution(A)
        print idx, A[idx-3:idx+3]


if __name__ == '__main__':
    unittest.main()
