# Sub-problem: Is the Pancake Stack Sorted?

Let's start with a relatively straightforward sub-problem. You and I could eyeball a stack of pancakes and pretty quickly judge if it's ordered or not. Python can't. Instead, we'll write a function that takes a pancake stack as input and returns `True` if it is sorted (smallest to largest) and `False` otherwise.

Since this is a relatively straightforward sub-problem, we'll spend a little time considering alternative solutions.

## Building a Test Function

It's usually a good idea to test your solution(s) to make sure your code works the way you expect. Since we're going to consider a few alternatives, let's build a function that we can reuse each time we want to test a solution.

In [None]:
def test_is_sorted(func_to_test):
    assert func_to_test([1, 2, 3, 4, 5]) == True  # returns True when the stack is sorted
    assert func_to_test([1, 2, 3, 5, 4]) == False # returns False when misordered at end of stack
    assert func_to_test([2, 1, 3, 4, 5]) == False # returns False when misordered at beginning of stack
    assert func_to_test([]) == True # returns True when the stack is empty (so ordered)
    assert func_to_test([3]) == True # returns True when the stack has only one item (so ordered)
    return 'passed'

When we want to test a solution, we'll pass the function we want to test as an argument. (Stop for a second and appreciate how *rad* it is that you can pass one function to another function.)

Those `assert` statements will raise an `AssertionError` if `func_to_test` doesn't return the expected result. Otherwise, `test_is_sorted` returns `'passed'`. Look at the comments to see what each test is testing. When writing tests, try to think of all the different sorts of input the function might receive, including unlikely **edge cases** (like an empty list or a list with a single element).

## Simple(-Looking) Solution

Let's start with a pretty simple-looking solution:

In [None]:
def is_sorted_simple(pancakes):
    return pancakes == sorted(pancakes)

And let's test it:

In [None]:
test_is_sorted(is_sorted_simple)

I said "simple**-looking**" because there's actually a lot of work going on "under the hood":
  - First, the Python interpreter has to sort the list of pancakes, building a brand-new sorted copy in the process. In general, sorting with `sorted` takes time proportional to `n log n`, which is called **linearithmic** time. That means that the time it takes to sort a list grows **faster** than the list size itself. Linearithmic time isn't terrible, but it's worth noting.
  - Second, the Python interpreter has to compare each item of the list to its corresponding item in the sorted list.
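A quick way to see the hidden cost of that first step: `sorted` never touches the original list. It builds and returns a new one, so every call allocates a second list the same size as the stack before `==` compares the two item by item.

```python
pancakes = [3, 1, 2]

goal = sorted(pancakes)   # builds and returns a brand-new list
print(goal)               # [1, 2, 3]
print(pancakes)           # [3, 1, 2] (the original is untouched)
print(goal is pancakes)   # False: two separate list objects in memory

# == then compares the two lists item by item
print(pancakes == goal)   # False
```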

Is this a "bad" solution? No. It's fairly easy to understand, and the performance penalty for lists with 10 or 15 or 20 items isn't so bad. But we'll likely call this function thousands of times, so even small costs can add up.

## Index-Range Solution

Let's give it another try. In this case, we'll use a `for` loop that will execute `n - 1` times where `n` is the length of the pancake stack. On each iteration of the loop, we'll compare the "current" pancake to the one that comes after it. If a pancake is ever bigger than the pancake that follows it, we'll know the stack isn't sorted, so we'll return `False` and thus stop the function. If we get all the way through the list and never find an out-of-order pair, we'll know the list is sorted, so we'll return `True`.

In [None]:
def is_sorted_index_range(pancakes):
    for index in range(len(pancakes) - 1): # sub 1 because you don't need to compare the last pancake to a "next" pancake
        pancake, next_pancake = pancakes[index], pancakes[index + 1]
        if pancake > next_pancake:
            return False
    return True

Let's use our tests to make sure it's working the way we want.

In [None]:
test_is_sorted(is_sorted_index_range)

This solution doesn't require the linearithmic sort. It needs at most `n - 1` iterations (one per adjacent pair), so its running time grows in step with the size of the input list. It also takes less memory, since it doesn't have to store a (sorted) copy of the list. True, it stores a few extra variables, but we don't need *more* variables as the size of the input list grows. In fact, we could do without a couple of the variable assignments. This is equivalent:

```python
def is_sorted_index_range_alt(pancakes):
    for index in range(len(pancakes) - 1):
        if pancakes[index] > pancakes[index + 1]:
        return False
    return True
```
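To make the "grows in step with the input" claim concrete, here is a hypothetical instrumented copy of the loop (`count_comparisons` is not part of the original code) that also reports how many comparisons it made. A sorted stack of *n* pancakes costs `n - 1` comparisons, while a stack that is misordered at the beginning bails out after just one.

```python
def count_comparisons(pancakes):
    """Hypothetical instrumented variant of is_sorted_index_range.

    Returns a tuple: (is the stack sorted?, number of comparisons made)."""
    comparisons = 0
    for index in range(len(pancakes) - 1):
        comparisons += 1
        if pancakes[index] > pancakes[index + 1]:
            return False, comparisons   # stop at the first out-of-order pair
    return True, comparisons

print(count_comparisons([1, 2, 3, 4, 5]))   # (True, 4): n - 1 comparisons
print(count_comparisons([2, 1, 3, 4, 5]))   # (False, 1): stops immediately
print(count_comparisons([]))                # (True, 0)
```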

So it's more ***efficient*** than the "simple" solution. But as is often the case, more efficient code can be harder to read and understand. I have to work a little harder to understand what this code does. It's a little easier to make a mistake. For example, see what would happen if you don't subtract 1 in the `range` you create for the loop. So maybe you find yourself writing more comments to explain quirky bits of code or creating extra variables to make it more ***readable***.
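Here is that off-by-one mistake in action. `is_sorted_off_by_one` is a deliberately broken, hypothetical variant: on the last pass through the loop, `index + 1` points one past the end of the list.

```python
def is_sorted_off_by_one(pancakes):
    # BUG (deliberate): range(len(pancakes)) instead of range(len(pancakes) - 1)
    for index in range(len(pancakes)):
        if pancakes[index] > pancakes[index + 1]:
            return False
    return True

try:
    is_sorted_off_by_one([1, 2, 3])
except IndexError as error:
    print("IndexError:", error)   # IndexError: list index out of range
```

Notice that an unsorted stack such as `[2, 1, 3]` returns `False` before the loop ever reaches the end, so the bug only shows up on sorted (or single-pancake) stacks. That is exactly the kind of case the tests above are there to catch.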

## "Enumerate" Solution

Now let's look at a slight modification of the "index-range" solution. 

In [None]:
def is_sorted_enumerate(pancakes):
    for index, pancake in enumerate(pancakes[:-1]):
        next_pancake = pancakes[index + 1]
        if pancake > next_pancake:
            return False
    return True

In [None]:
test_is_sorted(is_sorted_enumerate)

Using `range` to get a sequence of indices and then using those indices to access the values in the list (as we did in the "index-range" solution) is super common, especially if you need to access more than one list element at a time, for example, to compare an element to its neighbor.

`enumerate` gives us a way to access *both* the current element *and* its index. To see how, take a look at what `enumerate` does:

In [None]:
example_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
list(enumerate(example_list))

`enumerate` creates an *iterator*. We'll learn more about those later. For now, to make it easier to see what's happening, I converted it to a list. And what does that list contain? It has the same length as `example_list`, but in place of each element in `example_list`, there's a **tuple** (an immutable, fixed-length sequence, a bit like a list you can't change) with two elements, the index number and the original value: `(<index>, <value>)`.

In the `for` loop, we can then *destructure* the elements of each tuple, assigning the first element in the tuple to the variable `index` and the second element in the tuple to the variable `pancake`. Then we already have the value of the "current" pancake but also have access to its position (index) in the list so we can look up its successor.

Is it better? Not obviously. It's more a matter of opinion. But it's good for us to know that there's more than one way to go about it.
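If destructuring is new to you, this small sketch shows the same unpacking outside of the pancake function, plus a check that `enumerate(pancakes[:-1])` produces indices that still line up with the original list (the slice starts at position 0, so nothing shifts). The `letters` list is just for illustration.

```python
letters = ['a', 'b', 'c']

# Destructuring a single (index, value) tuple by hand
index, value = (0, 'a')
print(index, value)   # 0 a

# A for loop over enumerate does the same unpacking on every pass
for index, value in enumerate(letters):
    print(index, value)

# enumerate(pancakes[:-1]) counts from 0, so each index matches the original
# list and pancakes[index + 1] is always the next pancake
pancakes = [1, 3, 2]
pairs = [(pancake, pancakes[index + 1]) for index, pancake in enumerate(pancakes[:-1])]
print(pairs)          # [(1, 3), (3, 2)]
```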

## "Zip" Solution

Okay, one more. Before I show you the complete solution, let me (re-)introduce you to the `zip` function.

In [None]:
list_one = ["red", "blue", "green"]
list_two = ["blood", "sky", "lime"]
list(zip(list_two, list_one))

`zip` "zips" together two or more lists (or other iterables). Like `enumerate`, it produces an iterator, but again I've converted it into a list. Its first element is a tuple of the first element from each list, its second element is a tuple of the second element from each list, and so on.
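A short sketch of two details: `zip` accepts more than two inputs, and, like `enumerate`, the iterator it returns can only be consumed once. The third list here is made up for illustration.

```python
list_one = ["red", "blue", "green"]
list_two = ["blood", "sky", "lime"]
list_three = [1, 2, 3]

print(list(zip(list_one, list_two, list_three)))
# [('red', 'blood', 1), ('blue', 'sky', 2), ('green', 'lime', 3)]

# zip returns a lazy iterator: once consumed, it is empty
zipped = zip(list_one, list_two)
print(list(zipped))   # [('red', 'blood'), ('blue', 'sky'), ('green', 'lime')]
print(list(zipped))   # []
```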

One more preliminary. Look at what I get if I offset a list by one and then `zip` it with the original:

In [None]:
ordered = [1, 2, 3, 4, 5, 6]
offset = ordered[1:]
list(zip(ordered, offset))

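`zip` stops as soon as its shortest input runs out, and `offset` is one element shorter than `ordered`, so the last element of `ordered` never appears as the first item of a pair. That's no problem for our purposes: the last pancake has no successor to compare against anyway. To me, this looks like a promising way to compare one element of a list with the element that follows it.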

In [None]:
def is_sorted_zip(pancakes):
    for pancake, next_pancake in zip(pancakes, pancakes[1:]):
        if pancake > next_pancake:
            return False
    return True

In [None]:
test_is_sorted(is_sorted_zip)

It's more efficient than the simple solution and very nearly as efficient as the index-range or enumerate solutions. Is it more readable? I think so, but opinions can vary.

## Timing the Solutions

Curious how long each of these solutions takes to run? Python notebooks have special commands called "magics", and one of them, `%%time`, times how long a cell takes to run.

In [None]:
%%time
is_sorted_simple([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

"Wall time" means the amount of actual time that passed from when the code started executing to when it stopped. The total "CPU" time, on the other hand, is how much processor time it took up. Why are the two not the same? Because the CPU is shared with other processes, some "wall time" is spent doing work for those other processes, and some may be spent simply waiting (for example, on disk or network I/O).
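One way to see the difference for yourself, using only the standard library: `time.perf_counter()` measures wall time and `time.process_time()` measures the CPU time used by the current process. A `time.sleep` call spends wall time waiting without doing any computation, so in this sketch you should see roughly 0.2 seconds of wall time and close to zero CPU time.

```python
import time

wall_start = time.perf_counter()   # wall-clock timer
cpu_start = time.process_time()    # CPU time used by this process only

time.sleep(0.2)                    # waiting, not computing

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print(f"wall: {wall_elapsed:.3f} s, CPU: {cpu_elapsed:.3f} s")
```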

Your results may vary, but when I ran the `%%time` cell above, `is_sorted_simple` took about 10 microseconds of CPU time. That's 10 **millionths** of a second. If you run it again, you'll likely get a different result, and because the number is so small to start with, the difference could be relatively large. For example, I just ran the cell a few more times and got 10, 11, 10, 12, and 18 microseconds. A difference of 1 microsecond doesn't seem like much (and in absolute terms, it's not), but if I'm benchmarking a function, 11 microseconds is 10% slower than 10 microseconds. It sounds more significant when you put it that way. And 18 microseconds is a whopping 80% slower.

With that sort of variation between executions of the same function, we're unlikely to be able to draw any firm conclusions when we time different functions.

That doesn't mean we can't compare. To get more reliable comparisons, we need to:
1. increase the size of the list that we're checking (since we've been analyzing performance in terms of the size of the input, we need inputs big enough to make a difference)
2. try the function many, many times and take the average time

Luckily, Python notebooks make that pretty easy.

We'll first create a list with 10,000 elements that we can use for our benchmarks.

In [None]:
timing_list = list(range(10000))

And then we'll use the `%timeit` magic. Instead of running the code just once (as `%time` or `%%time` does), `%timeit` executes your code many times in a loop, repeats that loop several times, and then reports the **average** time per execution along with its standard deviation. It picks the number of executions automatically based on how fast the code is, so a quick function may run tens of thousands of times or more. The average of that many runs is a far more reliable measure of the function's performance.
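`%timeit` is a notebook feature. If you want a similar measurement in a plain Python script, the standard-library `timeit` module can do it. This is a minimal sketch: the `number` and `repeat` values are arbitrary choices, not the ones `%timeit` picks automatically, and the Python docs point out that the minimum of the trials is often a more useful summary than the mean.

```python
import timeit

def is_sorted_simple(pancakes):
    return pancakes == sorted(pancakes)

timing_list = list(range(10000))

# Time `number` calls per trial, and run `repeat` separate trials
number, repeat = 200, 5
totals = timeit.repeat(lambda: is_sorted_simple(timing_list), number=number, repeat=repeat)

per_call = [total / number for total in totals]   # seconds for a single call, per trial
mean = sum(per_call) / len(per_call)
print(f"{mean * 1e6:.1f} microseconds per call (mean of {repeat} trials of {number} calls)")
```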

In [None]:
%timeit is_sorted_simple(timing_list)

In [None]:
%timeit is_sorted_enumerate(timing_list)

In [None]:
%timeit is_sorted_index_range(timing_list)

In [None]:
%timeit is_sorted_zip(timing_list)

Wait, what? Our analysis led us to think that `is_sorted_simple` would be the LEAST performant solution, but it was the fastest. By a lot. In my trials, `is_sorted_simple` was about 5 times faster than `is_sorted_zip` and about 12-13 times faster than `is_sorted_enumerate` and `is_sorted_index_range`! Yikes.

So ~were we~ was I just wrong? Maybe. But maybe not.

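`sorted`, the part of `is_sorted_simple` that we said would make it slower, is highly optimized. Both the sort and the `==` comparison that follows it run as compiled C code inside the interpreter, while the loop-based solutions run Python bytecode for every pair of pancakes. And `sorted` is especially fast on lists that are already sorted or nearly so. (If you're curious, it uses a hybrid algorithm called *timsort* that was created for Python and has since been adopted by other languages. Timsort looks for stretches of its input that are already in order, so a fully sorted list is one long stretch that needs no rearranging.) And what we used for our benchmarks was an already-sorted list.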
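You can watch timsort take advantage of existing order by counting comparisons. The `CountingPancake` class below is a hypothetical wrapper (not part of the pancake code) that increments a counter every time `sorted` compares two items with `<`. On an already-sorted list the count should be close to `n - 1`; on a shuffled list it should be far larger, on the order of *n* log *n*. Exact counts may vary between Python versions.

```python
import random

class CountingPancake:
    """Hypothetical wrapper that counts how often sorted() compares items."""
    comparisons = 0   # shared counter, reset before each experiment

    def __init__(self, size):
        self.size = size

    def __lt__(self, other):   # sorted() only ever uses <
        CountingPancake.comparisons += 1
        return self.size < other.size

def comparisons_to_sort(sizes):
    CountingPancake.comparisons = 0
    sorted([CountingPancake(size) for size in sizes])
    return CountingPancake.comparisons

n = 1000
already_sorted = list(range(n))
shuffled = already_sorted.copy()
random.shuffle(shuffled)

print(comparisons_to_sort(already_sorted))   # close to n - 1
print(comparisons_to_sort(shuffled))         # far more: on the order of n log n
```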

What would happen if we used an unsorted list for our benchmarks? Let's see.

I'll make a copy of our 10,000 element `timing_list`, but then I'll use the `random` library to *shuffle* it. The result will be more realistic.

In [None]:
import random

timing_list_randomized = timing_list.copy()
random.shuffle(timing_list_randomized)
timing_list_randomized[:10]  # peek at the first 10 of the 10,000 shuffled elements
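Two details about that cell are worth spelling out. `random.shuffle` rearranges the list *in place* and returns `None`, which is why we copy `timing_list` first instead of shuffling the original. And if you want the same shuffle every time you rerun the notebook, you can shuffle with a seeded `random.Random` instance. The seed `42` and the small list below are arbitrary, chosen only for readability.

```python
import random

small_list = list(range(10))
small_list_shuffled = small_list.copy()

rng = random.Random(42)           # seeded generator: the same shuffle on every run
result = rng.shuffle(small_list_shuffled)

print(result)                     # None: shuffle works in place
print(small_list)                 # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] (the original is untouched)
print(sorted(small_list_shuffled) == small_list)  # True: same numbers, new order
```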

In [None]:
%timeit is_sorted_simple(timing_list_randomized)

In [None]:
%timeit is_sorted_index_range(timing_list_randomized)

In [None]:
%timeit is_sorted_enumerate(timing_list_randomized)

In [None]:
%timeit is_sorted_zip(timing_list_randomized)

These results tell a *very* different story. When I ran these benchmarks, `is_sorted_index_range` was blazingly fast: only 330 nanoseconds (330 *billionths* of a second). `is_sorted_enumerate` and `is_sorted_zip` were much slower, but still fast: about 33 microseconds. 

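These results make sense. `is_sorted_index_range` can stop as soon as it finds one out-of-order pair, and on a shuffled list that usually happens almost immediately (the very first pair is out of order half the time). It also has almost no setup work to do. `is_sorted_enumerate` and `is_sorted_zip` can bail early too, but first they have to build a slice (`pancakes[:-1]` or `pancakes[1:]`), which copies nearly the entire list before a single comparison is made.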
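You can check the "setup work" claim directly: a slice like `pancakes[1:]` is a new list, so building it copies nearly every element before the first comparison happens. Below is a sketch of a hypothetical variant, `is_sorted_zip_islice`, that swaps the slice for `itertools.islice`, which lazily skips the first element instead of copying. Whether it's actually faster on your machine is something to confirm with `%timeit`, not assume.

```python
from itertools import islice

pancakes = [1, 2, 3, 4, 5]
offset = pancakes[1:]
print(offset is pancakes)   # False: slicing built a new list of n - 1 items

def is_sorted_zip_islice(pancakes):
    """Hypothetical variant of is_sorted_zip that avoids copying the list.

    islice(pancakes, 1, None) lazily yields pancakes[1], pancakes[2], ...
    without building a new list first."""
    for pancake, next_pancake in zip(pancakes, islice(pancakes, 1, None)):
        if pancake > next_pancake:
            return False
    return True

print(is_sorted_zip_islice([1, 2, 3, 4, 5]))   # True
print(is_sorted_zip_islice([2, 1, 3, 4, 5]))   # False
```

On Python 3.10 and later, `itertools.pairwise(pancakes)` yields the same adjacent pairs without a copy.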

And `is_sorted_simple`? In my results, at least, it took on average 1.2 milliseconds (a millisecond is 1/1000 of a second). That's about 30 times slower than `is_sorted_enumerate` and `is_sorted_zip` and a whopping 3000 times slower than `is_sorted_index_range`. Unlike the other three, it can't bail early: it has to sort the entire shuffled list before it makes a single comparison.

If we wanted to be even more thorough, we'd try reshuffling the numbers on each execution. That way, a random ordering favorable to one approach or the other wouldn't skew the results. But I'd speculate that the results won't be dramatically different.
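If you want to try that more thorough approach, here is one possible sketch. `time_with_reshuffle` is a hypothetical helper, not a notebook magic: it reshuffles the list before every trial and times only the function call. Keep in mind that a single call to a very fast function (a few hundred nanoseconds) is not much longer than the overhead of reading the timer itself, so treat the numbers as rough.

```python
import random
import time

def is_sorted_index_range(pancakes):
    for index in range(len(pancakes) - 1):
        if pancakes[index] > pancakes[index + 1]:
            return False
    return True

def time_with_reshuffle(func, base_list, trials=100):
    """Hypothetical helper: average time of func over freshly shuffled lists.

    The shuffle happens outside the timed region, so only func is measured."""
    shuffled = base_list.copy()
    total = 0.0
    for _ in range(trials):
        random.shuffle(shuffled)   # a new random order for each trial
        start = time.perf_counter()
        func(shuffled)
        total += time.perf_counter() - start
    return total / trials          # average seconds per call

average = time_with_reshuffle(is_sorted_index_range, list(range(10000)))
print(f"{average * 1e9:.0f} ns per call")
```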

## Conclusion

So which would you choose? The simplicity of `is_sorted_simple` is attractive, but looking ahead, there are going to be lots of places where we're really taxing the CPU. If we don't have to here, why should we? I'll leave it up to you.

## On Further Consideration

Wait, isn't there some way we could make the simpler, more readable solution more performant? I think so. 

The problem with `is_sorted_simple` is that it has to re-sort the stack on every execution. But that isn't really necessary, is it? The goal state (the ordered stack) never changes. We could just pass in the goal state along with the pancake stack we want to test, couldn't we?

In [None]:
def is_sorted_simple_revised(pancakes, goal):
    return pancakes == goal

In [None]:
%timeit is_sorted_simple_revised(timing_list_randomized, timing_list)

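Much better. `is_sorted_simple_revised` is the most readable and, in my trial, also the fastest. Comparing two lists with `==` runs in C and stops at the first position where they differ, so it can bail early just like the index-range loop. We were doing lots of pointless work re-sorting the stack every time.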
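In practice, that means computing the goal once and reusing it. A minimal sketch: the `candidates` list is made up for illustration, and the approach only works if every stack you check contains exactly the same pancakes as the one the goal was built from (flips change the order of the pancakes, not which pancakes are in the stack).

```python
def is_sorted_simple_revised(pancakes, goal):
    return pancakes == goal

stack = [3, 1, 4, 2]
goal = sorted(stack)   # computed once, up front, and reused for every check

# Hypothetical stacks to check; each holds the same pancakes as `stack`
candidates = [[1, 3, 4, 2], [2, 4, 1, 3], [1, 2, 3, 4]]
print([is_sorted_simple_revised(candidate, goal) for candidate in candidates])
# [False, False, True]
```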

Sometimes you just need to step back and rethink your approach.