# Shuffling

In this bonus section we consider a bad and a better shuffling algorithm. The bad one is shown below. It swaps cards until every card in the deck is flagged as swapped.

In [1]:
import random

def shuffle1(deck):
    "Norvig's teacher's algorithm."
    n = len(deck)
    swapped = [False] * n
    while not all(swapped):
        i, j = random.randrange(n), random.randrange(n)
        swapped[i] = swapped[j] = True
        swap(deck, i, j)

def swap(deck, i, j):
    "Swap elements i and j of a collection."
    # print(f'Swap {i} and {j}')
    deck[i], deck[j] = deck[j], deck[i]

How long does this approach take? In the quiz the following three choices are given:

1. $O(N)$
2. $O(N^2)$
3. $O(N!)$

I got the wrong answer (I said 3). The correct answer is $O(N^2)$. To see why, consider the state in which every element but one is flagged as swapped. Each random draw hits the remaining element with probability $1/N$, so it takes on the order of $N$ draws on average to flag it. Repeating this argument for up to $N$ elements gives the $O(N^2)$ bound.
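The estimate above can be probed empirically. The sketch below is not part of the quiz; `count_iterations` and `average_iterations` are hypothetical helpers that mirror the loop of `shuffle1` but skip the swap and only count how many times the loop body runs. Note that this counts iterations only; the `all(swapped)` check adds further work per iteration.

```python
import random

def count_iterations(n, rng=random):
    """Return how many loop iterations the teacher's algorithm needs for n cards.

    Hypothetical helper: it mirrors the loop in shuffle1 but skips the swap.
    """
    swapped = [False] * n
    iterations = 0
    while not all(swapped):
        i, j = rng.randrange(n), rng.randrange(n)
        swapped[i] = swapped[j] = True
        iterations += 1
    return iterations

def average_iterations(n, trials=200):
    """Average the iteration count over several independent runs."""
    return sum(count_iterations(n) for _ in range(trials)) / trials

if __name__ == "__main__":
    # Print the average for doubling deck sizes to see how it grows with n.
    for n in (10, 20, 40, 80):
        print(n, average_iterations(n))
```

Printing the averages for doubling values of `n` gives a rough feel for how the iteration count grows compared with $N$ and $N^2$.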

A better shuffling algorithm, which Knuth calls Algorithm P, is shown below.

In [2]:
def shuffle(deck):
    "Knuth's algorithm P"
    n = len(deck)
    for i in range(n-1):
        swap(deck, i, random.randrange(i, n))

def swap(deck, i, j):
    "Swap elements i and j of a collection."
    # print(f'Swap {i} and {j}')
    deck[i], deck[j] = deck[j], deck[i]

Norvig asks what happens if we replace `range(n-1)` with `range(n)` in the `for` loop above. The choices are:

1. We get an `IndexError`.
2. We get a `ValueError`.
3. We get no error but the results would be unfair, because one result would be more common than any other.
4. We would get no error, and the result would still be fair but a little bit slower.

Let's see with an example.

In [3]:
tmp = list(range(5))
n = len(tmp)
for i in range(n):
    swap(tmp, i, random.randrange(i, n))

My answer was 3, but I was wrong again (reminder to self: think a bit longer before answering). The correct answer is 4. The only difference is that the final iteration swaps the last element with itself, which is harmless but costs one extra step.
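The claim that the extra iteration leaves the shuffle fair can be checked exhaustively for a small deck. The sketch below assumes a hypothetical helper, `enumerate_outcomes`, that replaces the random draws with every possible sequence of choices. Since `random.randrange(i, n)` picks uniformly, each sequence is equally likely, so the shuffle is fair exactly when every permutation is reached by the same number of sequences.

```python
from collections import Counter
from itertools import product

def enumerate_outcomes(deck, stop):
    """Count the final orderings over every possible sequence of random choices.

    `stop` is the loop bound: len(deck) - 1 for Algorithm P as shown,
    len(deck) for the variant in the quiz. Hypothetical helper.
    """
    n = len(deck)
    outcomes = Counter()
    # Step i may pick any index j in range(i, n); try all combinations.
    for choices in product(*(range(i, n) for i in range(stop))):
        cards = list(deck)
        for i, j in enumerate(choices):
            cards[i], cards[j] = cards[j], cards[i]
        outcomes[''.join(cards)] += 1
    return outcomes

deck = 'abc'
print(enumerate_outcomes(deck, len(deck) - 1))  # range(n-1)
print(enumerate_outcomes(deck, len(deck)))      # range(n)
```

For both loop bounds, each of the six permutations of `'abc'` is reached by exactly one sequence of choices, because the last step of the `range(n)` variant has only one possible choice.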

## Is it random?

In Algorithm P every permutation has the same probability (prove it). Is this also true for the teacher's algorithm? The possibilities are:

1. Yes, they have the same probability, it just takes longer.
2. No, some will have a different probability.
3. No, some will have zero probability.

The correct answer is 2. The cell below provides code to test shufflers, along with two modified versions of the teacher's shuffler. For each of these shufflers there are two questions: what is its runtime, and is it correct, in the sense that every permutation has the same probability? We can answer the runtime question by comparing these algorithms with the ones we have seen so far. For correctness, we can run the code below.

| Shuffler  | $O(N)$ | $O(N^2)$ | Correct? |
|-----------|--------|----------|----------|
| Shuffle   | Yes    | No       | Yes      |
| Shuffle 1 | No     | Yes      | No       |
| Shuffle 2 | No     | Yes      | Yes      |
| Shuffle 3 | Yes    | No       | No       |

In [4]:
from collections import defaultdict

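def test_shuffler(shuffler, deck='abcd', n=10000):
    "Shuffle deck n times and report how often each permutation appears."
    counts = defaultdict(int)
    for _ in range(n):
        cards = list(deck)  # fresh copy; avoids shadowing the builtin input()
        shuffler(cards)
        counts[''.join(cards)] += 1
    e = n / factorial(len(deck))  # expected count per permutation if fair
    # 'ok' means every observed permutation is within 10% of the expected count
    ok = all((0.9 <= counts[item]/e <= 1.1) for item in counts)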
    name = shuffler.__name__
    print('%s(%s) %s' % (name, deck, ('ok' if ok else '*** BAD ***')))
    print()
    for item, count in sorted(counts.items()):
        print('%s:%4.1f' % (item, count*100./n))
    print()

def test_shufflers(shufflers=[shuffle, shuffle1], decks=['abc', 'ab']):
    for deck in decks:
        print()
        for f in shufflers:
            test_shuffler(f, deck)

def factorial(n):
    return 1 if (n <= 1) else n*factorial(n-1)

def shuffle2(deck):
    "A modification of the teacher's algorithm."
    n = len(deck)
    swapped = [False] * n
    while not all(swapped):
        i, j = random.randrange(n), random.randrange(n)
        swapped[i] = True
        swap(deck, i, j)

def shuffle3(deck):
    "An easier modification of the teacher's algorithm."
    n = len(deck)
    for i in range(n):
        swap(deck, i, random.randrange(n))

In [5]:
test_shufflers(shufflers=[shuffle2, shuffle3])


shuffle2(abc) ok

abc:15.7
acb:16.3
bac:17.2
bca:16.7
cab:16.6
cba:17.5

shuffle3(abc) *** BAD ***

abc:14.6
acb:18.2
bac:18.4
bca:18.6
cab:15.3
cba:14.9


shuffle2(ab) ok

ab:50.2
ba:49.8

shuffle3(ab) ok

ab:49.5
ba:50.5
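
The `*** BAD ***` verdict for `shuffle3` on `'abc'` can also be checked exactly, without sampling. The sketch below uses a hypothetical helper, `enumerate_shuffle3`, which runs the `shuffle3` loop over every possible sequence of random choices. For three cards there are $3^3 = 27$ equally likely sequences, and 27 is not divisible by the 6 permutations, so the outcomes cannot be uniform.

```python
from collections import Counter
from itertools import product

def enumerate_shuffle3(deck):
    """Count final orderings of shuffle3 over all n**n choice sequences.

    Hypothetical helper: same loop as shuffle3, with the random draws
    replaced by an exhaustive enumeration.
    """
    n = len(deck)
    outcomes = Counter()
    for choices in product(range(n), repeat=n):
        cards = list(deck)
        for i, j in enumerate(choices):
            cards[i], cards[j] = cards[j], cards[i]
        outcomes[''.join(cards)] += 1
    return outcomes

counts = enumerate_shuffle3('abc')
for perm, count in sorted(counts.items()):
    print(f'{perm}: {count}/27 = {100 * count / 27:.1f}%')
```

The enumeration gives 4/27 (about 14.8%) for `abc`, `cab` and `cba`, and 5/27 (about 18.5%) for `acb`, `bac` and `bca`, in line with the sampled percentages above. For `'ab'` the four sequences split evenly, which is why that test reports `ok`.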



## Computing or doing

The shuffle functions seen above, like `random.shuffle` from the Python Standard Library, all return `None` because they modify their input *in place*. There is a tension between *computing* something and *doing* something. Some functions, like `math.sin()` and `math.sqrt()`, compute something. Others, like `shuffle`, take an input and modify it. We refer to the former as **pure functions** and to the latter as **impure functions** or **subroutines**. Subroutines are not functions in the mathematical sense, since they have "an effect on the world".
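
As a quick illustration of the in-place behaviour, the snippet below calls `random.shuffle` from the standard library and inspects both the return value and the list. The variable names are only for illustration.

```python
import random

deck = list('abcd')
result = random.shuffle(deck)  # shuffles deck in place

# The call returns None; the effect is visible only through deck itself.
assert result is None
# Same cards as before, possibly in a different order.
assert sorted(deck) == list('abcd')
```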

We will see more examples of pure functions than of subroutines, because pure functions are easier to test. To test a subroutine, we need to create a state, modify that state, and verify that the modified state equals the expected one. Compare the examples below:

In [16]:
# Pure function: sorted()
lst = [3, 2, 1]
assert sorted(lst) == [1, 2, 3]

# Impure function: list.sort()
lst.sort()
assert lst == [1, 2, 3]
# Note: the following would raise an AssertionError, because list.sort()
# returns None rather than the sorted list:
# assert [3, 2, 1].sort() == [1, 2, 3]
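
The same contrast can be applied to shuffling. The sketch below is an assumption, not something from the course: `shuffled` is a hypothetical counterpart to `shuffle`, in the way `sorted()` relates to `list.sort()`. It copies the input, applies Algorithm P to the copy, and returns it. It still consumes random numbers, so it is not pure in the strict sense, but passing a seeded `random.Random` makes its result reproducible and easy to assert on.

```python
import random

def shuffled(deck, rng=None):
    """Return a shuffled copy of deck, leaving deck itself unchanged.

    Hypothetical helper; rng is an optional random.Random instance
    (defaults to the random module's global generator).
    """
    if rng is None:
        rng = random
    cards = list(deck)
    n = len(cards)
    for i in range(n - 1):
        j = rng.randrange(i, n)
        cards[i], cards[j] = cards[j], cards[i]
    return cards

original = [1, 2, 3, 4]
result = shuffled(original, random.Random(0))

assert original == [1, 2, 3, 4]                        # input untouched
assert sorted(result) == original                      # same cards
assert result == shuffled(original, random.Random(0))  # same seed, same result
```

The test needs no separate state setup: it calls the function and compares the returned value, just like the `sorted()` example.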