# Lesson 1

In this unit we write a poker program. This is an example of a general process where we start with a vague understanding of the problem and then we refine our understanding so that it has a formal specification, then we specify that into something that is amenable to be coded, and we end up with working code. In a diagram this looks like this:

```text
      1              2                     3
?? -----> Problem -----> Specifications -----> Code
  Understand     Specify                Design
```

The first thing we need to do is to make an *inventory* of the concepts we will have to deal with.

1. **Hands**: a hand consists of 5 cards. A card has a rank and a suit. The program we are trying to write takes a *list of hands* and returns the *best hand*.
2. In order to select the best hand we need to *rank* hands. Another concept is, therefore, **hand rank**. What are the concepts that make a hand rank? This [Wikipedia page](https://en.wikipedia.org/wiki/List_of_poker_hands) details all the rules of poker, but we are primarily interested in the following three concepts:
   1. **n-kind**: when a hand contains $n$ cards of the same *rank* (not suit).
   2. **straight**: when we have *5 consecutive ranks*.
   3. **flush**: when all 5 cards have the same suit, and the ranks don't matter.

Now we are ready to move to the desigh phase.

## Representing hands

What is a good representation of a hand? Consider these three examples:

- `['JS', 'JD', '2S', '2C', '7H']`
- `[(11, 'S'), (11, 'D'), (2, 'S'), (2, 'C'), (7, 'H')]`
- `set(['JS', 'JD', '2S', '2C', '7H'])`
- `'JS JS 2S 2C 7H'`

Which one(s) are best suited to our problem? Which ones are not? The set representation wouldn't work if we used two decks, otherwise all the above representations will work, if not with the same efficacy. For example, the first and the last one are equivalent, but for the string representation you will need to call `split()`. Overall, the first two are the best ones.

## The `poker()` function

We can start thinking about the `poker()` function. Out of a list of hands, this function should return the highest ranking hand, therefore we can intuitively imagine that a call to `max()` should appear here.

In [1]:
def poker(hands):
    "Return the best hand: poker([hand, ...]) => hand"
    pass

To make sure we understand how `max()` works, look at the example below. The returned values might be counterintuitive at first. Note the use of `key=abs` (and not `key=abs()`). Look at the documentation for [`list.sort()`](https://docs.python.org/3/library/stdtypes.html#list.sort) and at the [Sorting HOWTO](https://docs.python.org/3/howto/sorting.html#sortinghowto).

In [2]:
max([3, 4, 5, 0]), max([3, 4, -5, 0], key=abs)

(5, -5)

For now we can write the `poker()` function based on a yet-to-define `hand_rank()` function.

In [3]:
def poker(hands):
    "Return the best hand: poker([hand, .   ..]) => hand"
    return max(hands, key=hand_rank)

def hand_rank(hand):
    return None

We should also have some tests in place so that we can check whether our functions are doing the right thing. We include tests for a single hand (which should correspond to the return value) and for 100 hands.

In [4]:
def test():
    "Test cases for the functions in poker program"
    sf = "6C 7C 8C 9C TC".split() # => ['6C', '7C', '8C', '9C', 'TC']
    fk = "9D 9H 9S 9C 7D".split() # Four of a kind
    fh = "TD TC TH 7C 7D".split() # Full House
    assert poker([sf, fk, fh]) == sf
    assert poker([fk, fh]) == fk
    assert poker([fh, fh]) == fh
    assert poker([fk]) == fk
    assert poker([sf, fk] + [fh] * 98) == fh


## Hand Rank Attempt

This is our most complicated function. It takes a hand, but what does it return? It should return a value that is comparable by `max()`, but we don't yet know what it should be. Why not a number? We can rank hands from 0 to 8, with 0 for an empty hand and 8 for a Straight Flush (note that Royal Flushes are ace-high Straight Flushes). We know it is going to be important to look at ranks, so let's imagine we have written a function `card_ranks()` that returns the ranks of the cards in a hand.

We may start enumerating all the cases, starting from a Straight Flush and moving on. Note that in the definition below we are including several functions we have not yet defined, like `straight()`, `flush()` and `kind(n, ranks)`.

```python
def hand_rank(hand):
    "Return a value indicating the ranking of a hand."
    ranks = card_ranks(hand) # We haven't written this one yet.
    if straight(ranks) and flush(hand):
        return 8
    elif kind(4, ranks):
        return 7
    # etc etc
```

The question is, would such a function work? The answer is, sometimes it would, and sometimes it would not. It would work for comparisons between a full house and a straight, for example, but what if two players had a pair of 10s and of 9s respectively. We need to disambiguate and rank these two pairs.

## Representing Rank

We need a better way to represent rank. There are several possibilities. We may continue to use integers, but bigger ones. For example, if we have two four-of-a-kind hands, with ranks [9, 9, 9, 9, 5] and [3, 3, 3, 3, 2] respectively. In the formulation above they would be both ranked as 7. We may represent them as 70905 and 70302, respectively. The first digit would be the rank, the third the rank of the four cards, and the last digit the rank of the remaining card. The zeros allow for ranks greater than 9.
We could otherwise use real numbers, like 7.0905 and 7.0302. Or we could use tuples like (7, 9, 5) and (7, 3, 2).

All these representations would actually work, and `max()` would do the right thing, but the tuple-based representation is the most convenient, since the other require fairly complicated arithmetic.

### Ordering in Tuples

We want to compare tuples. Tuples are compared element by element, just like strings. This is called **lexigraphic ordering**. Let's look at some examples.

In [2]:
print((7, 9, 5) > (7, 3, 2))
print((7, 9, 5) > (7, 5, 9))
print((7, 9, 5) > (7, 9, 4))
print((7, 9, 5) > (7, 9, 5, 1))
print((3, 2, 1) > (2, 11, 3))


True
True
True
False
True


## Wild West Poker

To understand the new ordering mechanism we can look at a few examples.

- Straight Flush, Jack High: `['JC', 'TC', '9C', '8C', '7C']`: (8, 11 )
- Four Aces and a Queen Kicker: `['AS', 'AH', 'AD', 'AC', 'QH']`: (7, 14, 12). 14 is the rank of an Ace.
- Full House, 8s over Kings: `['8S', '8H', '8D', 'KS', 'KC']`: (6, 8, 13)
- Flush 10-8: `['TD', '8D', '7D', '5D', '3D']`: usually the two highest cards are enough to break ties, but this is not always the case. In actuality we may want all the ranks, therefore we return (5, [10, 8, 7, 5, 3]). It is still possible that another player will have the same cards in another suit.
- Straight, Jack High: `['JC', 'TS', '9D', '8C', '7C']` (4, 11).
- Three Sevens: `['7H', '7D', '7C', '5C', '2C']`: (3, 7, [7, 7, 7, 5, 2]).
- Two pairs, Jacks and Threes: `['JD', 'JC', '3S', '3H', 'KH']`: (2, 11, 3, [13, 11, 11, 3, 3]).
- Pair of Twos, Jack High: `['2H', '2S', 'JD', '6H', '3C']`: (1, 2, [11, 6, 3, 2, 2]).
- I've got nothing: `['7C', '5C', '4C', '3C', '2D']`: (0, 7, 5, 4, 3, 2)

In some cases, in the examples above we return a tuple that contains a list. We cannot compare a number with a list. For example, this comparison would return an error: `(2, 7, 6) > (2, 7, [7, 7, 5, 5, 2])`.

Let's go back to `hand_rank()`. Note that we are using `kind(4, ranks)` both as a logical value (in the `elif` statement) and to return the ranks that appears 4 times and 4 times respectively.

Note that we have to start our `if elif else` statements with the highest ranks at the top. Four of a kind is also three of a kind and also a pair.

In [6]:
def hand_rank(hand):
    ranks = card_ranks(hand)
    if straight(ranks) and flush(hand):            # straight flush
        return (8, max(ranks))
    elif kind(4, ranks):                           # 4 of a kind
        return (7, kind(4, ranks), kind(1, ranks))
    elif kind(3, ranks) and kind(2, ranks):        # full house
        return (6, kind(3, ranks), kind(2, ranks))
    elif flush(hand):                              # flush
        return (5, ranks)
    elif straight(ranks):                          # straight
        return (4, max(ranks))
    elif kind(3, ranks):                           # 3 of a kind
        return (3, kind(3, ranks), ranks)
    elif two_pair(ranks):                          # 2 pair
        return (2, two_pair(ranks))
    elif kind(2, ranks):                           # kind
        return (1, kind(2, ranks), ranks)
    else:                                          # high card
        return (0, ranks)

We can now define the `card_ranks()` function. Note how simple is the implementation based on `.index(r)`, where we just put two `-` placeholders at the beginning, so that the ranking is correct. My initial attempt used a dictionary. Note also the use of `.sort(reverse=True)`.

In [7]:
def card_ranks(hand):
    "Return a list of the ranks, sorted with higher first."
    ranks = ['--23456789TJQKA'.index(r) for r, s in hand]
    return ranks.sort(reverse=True)

Similarly, we can implement the `straight()` and `flush()` functions. Note the arguments: `straigh(ranks)` and `flush(hand)`. The former only need the ranks to decide whether it is a straight or not. The latter needs the suits. Since the ranks are needed by other functions, it makes sense to have a `card_ranks(hand)` function that extracts the ranks once.

In [8]:
def straight(ranks):
    "Return True if the ordered ranks form a 5-card straight."
    return (max(ranks) - min(ranks) == 4) and (len(set(ranks)) == 5)

def flush(hand):
    "Return True if all the cards have the same suit."
    # Note how neat: this only works because all ranks are 1-character long.
    suits = [s for r, s in hand]
    return len(set(suits)) == 1

For the `kind()` function I initially used a counter, but this is not necessary, and we can just use the `.count()` method for lists.

Note that in the code below we use `return r` that will break the loop at the first occurrence of a `True` condition. This is a frequent pattern: when you expect that multiple elements may satisfy a condition, but you are only interested in the first one, `return` is your friend.

In [9]:
def kind(n, ranks):
    """Return the first rank that this hand has exactly n of.
    Return None if there is no n-of-a-kind in the hand."""
    for r in ranks:
        if ranks.count(r) == n:
            return r
    return None

The `kind()` function works fine when there is one single pair. If there are two pairs, only the highest ranked will be returned, since the ranks are sorted in decreasing order. If we apply `kind(2, reversed(ranks))` this will return the lowest ranked pair.

In [10]:
def two_pair(ranks):
    """If there are two pair, return the two ranks as a tuple: (highest, lowest)
    otherwise return None."""
    pair = kind(2, ranks)
    # Remember to put reversed() into a list.
    lowpair = kind(2, list(reversed(ranks)))
    if pair and lowpair != pair:
        return (pair, lowpair)
    else:
        return None

We can also update our `test()` function.

In [11]:
def test():
    "Test cases for the functions in poker program"
    sf1 = "6C 7C 8C 9C TC".split() # Straight Flush
    sf2 = "6D 7D 8D 9D TD".split() # Straight Flush
    fk = "9D 9H 9S 9C 7D".split() # Four of a Kind
    fh = "TD TC TH 7C 7D".split() # Full House
    tp = "TD 9H TH 7C 3S".split() # Two Pair
    s1 = "AS 2S 3S 4S 5C".split() # A-5 straight - TRICKY!
    s2 = "2C 3C 4C 5S 6S".split() # 2-6 straight
    ah = "AS 2S 3S 4S 6C".split() # Ace high
    sh = "2S 3S 4S 6C 7D".split() # 7 high
    al = "AC 2D 4H 3D 5S".split() # Ace-Low Straight
    fkranks = card_ranks(fk)
    tpranks = card_ranks(tp)
    assert poker([sf1, fk, fh]) == sf1
    assert poker([s1, s2, ah, fh]) == s2 # This one fails
    assert poker([fk, fh]) == fk
    assert poker([fh, fh]) == fh
    assert poker([sf1]) == sf1
    assert poker([sf1] + 99*[fh]) == sf1
    # New assertions
    assert hand_rank(sf1) == (8, 10)
    assert hand_rank(fk) == (7, 9, 7)
    assert hand_rank(fh) == (6, 10, 7)
    assert card_ranks(sf1) == [10, 9, 8, 7, 6]
    assert card_ranks(fk) == [9, 9, 9, 9, 7]
    assert card_ranks(fh) == [10, 10, 10, 7, 7]
    assert straight(card_ranks(al)) == True 

There is a tricky bit here. The hand `"AS 2S 3S 4S 5C"` is an exception, and it should be counted as a straight [5, 4, 3, 2, 1] and not as [14, 5, 4, 3, 2]. In other words, in a hand like this the aces should be considered as 1, not as the highest card. We need to change our code somewhere in order to cover this exception, but where? The amount of change should be proportional to the amount of change in the conceptualization, i.e., we want to isolate the special case. We have the following candidates:

1. `poker()`.
2. `hand_rank()`.
3. `card_ranks()`.
4. `straight()`.

### Changing `poker()`

`poker()` returns the max of the hands based on `hand_rank()`, so it would make more sense to change `hand_rank()` than `poker()`.

### Changing `hand_ranks()`

The ranking depends on the hand being correctly classified. We want `"AS 2S 3S 4S 5C"` to be classified as a Straight, so it would make more sense to operate on `straight()`.

### Changing `card_ranks()`

Here the rank of the ace changes depending on the cards in the same hand, and we have this information in this function. This would work.

### Changing `straight()`

The limitation with this approach is that it takes a single `if-then` to identify the case and classify it as a Straight, but `straight()` only returns a boolean value, which means that the value of the ace would be unchanged.

ULtimately, the best solution is to change `card_ranks()`.

In [12]:
def card_ranks(hand):
    "Return a list of the ranks, sorted with higher first."
    ranks = ['--23456789TJQKA'.index(r) for r, s in hand]
    ranks.sort(reverse=True)
    return [5, 4, 3, 2, 1] if (ranks == [14, 5, 4, 3, 2]) else ranks

## Handling Ties

In the initial specification we did not worry about ties, but now we want to account for them. What shall we modify?

- `hand_rank()`
- `poker()`
- Or shall we add a new function?

We are dealing with the case where two hands have the same rank, and we don't want to change that rank, so there is no need to touch `hand_rank()`. It can make sense to modify `poker()`, or to add a new function, say `poker_with_ties()`. Changing `poker()` looks like the best option. We are still adding a new function `allmax(iterable, key=None)`, that returns a list of all items equal to the max of the iterable, according to the function specified by key.

In [13]:
def allmax(iterable, key=None):
    "Return a list of all items equal to the max of the iterable."
    # Note, if you compute max(iterable, key=key) you will get the item with
    # the maximum score, not the maximum score, which is what you actually want.
    max_rank = max([key(item) for item in iterable])
    return [item for item in iterable if key(item) == max_rank]

# We need to redefine poker()
def poker(hands):
    "Return the best hand: poker([hand,...]) => hand"
    return allmax(hands, key=hand_rank)

## Deal

We want to create a function that deals hands of cards. More precisely we want to write a function `deal(numhands, n=5, deck)`, that deals `numhands` hands with `n` cards each.

In [14]:
import random

mydeck = [r+s for r in '23456789TJQK' for s in 'SHDC']

def deal(numhands, n=5, deck=mydeck):
    random.shuffle(deck)
    return [deck[n*i:n*(i + 1)] for i in range(numhands)]

## Hand Frequencies

This [Wikipedia page](https://en.wikipedia.org/wiki/Poker_probability) contains the probabilities for the various hands. More precisely

| Hand | Probability |
|------|-------------|
| Royal Flush | 0.000154 %|
| Straight Flush (w/o Royal Flush) | 0.00139% |
| Four of a kind | 0.02401% |
| Full House | 0.1441% |
| Flush (excl. Royal and straight) | 0.1965% |
| Straight (excl. Royal and straight flush)| 0.3925% |
| Three of a kind | 2.1128% |
| Two pair | 4.7539% |
| Pair | 42.2569% |
| High card | 50.1177% |

In [15]:
hand_names = ['High Card',
 'Pair',
 'Two pair',
 'Three of a kind',
 'Straight',
 'Flush',
 'Full house',
 'Four of a kind',
 'Straigh Flush']


def hand_percentages(n=1000_1000):
    "Sample random hands and print a table of frequencies."
    counts = [0] * 9
    for i in range(n // 10):
        for hand in deal(10):
            ranking = hand_rank(hand)[0] # The first element identifies the hand
            counts[ranking] += 1
    for i in reversed(range(9)):
        print(f"%14s: %6.3f %%" % (hand_names[i], 100 * counts[i] / n))

In [16]:
%%time
hand_percentages()

 Straigh Flush:  0.002 %
Four of a kind:  0.028 %
    Full house:  0.163 %
         Flush:  0.673 %
      Straight:  0.423 %
Three of a kind:  2.245 %
      Two pair:  4.995 %
          Pair: 41.794 %
     High Card: 49.677 %
CPU times: user 1min 4s, sys: 1.81 ms, total: 1min 4s
Wall time: 1min 4s


## Dimensions of Programming

The dimensions we consider are:

1. Correctness.
2. Efficiency.
3. Features.
4. Elegance.

Voltaire: "The best is the enemy of the good". There is a trade-off between the cost of improving things and the actual gains we derive.

### Refactoring

Refactoring means changing a program in a way that it doesn't do anything different, but it becomes easier to read and to maintain. For example, in `hands_rank()` there is this repetitive snippet.

```python
elif kind(3, ranks) and kind(2, ranks):  # full house
    return (6, kind(3, ranks), kind(2, ranks))
```

while trying to remove this repetitiveness, we may come across a different way to represent hands. For this purpose we can write a function `group()` that returns two values: a tuple with the count of each rank in the hand, in decreasing order, and a tuple with the corresponding card ranks. Therefore, if we have a hand with ranks (7, 10, 7, 9, 7), `group()` will return `[(3, 7), (1, 10), (1, 9)]`.

The snippet below is another remarkable example. My solution was more complicated, with the extraction of the element with the highest count, the sorting of the remaining elements by rank etc. Here a single sort is enough. Why? Let's consider the following possible forms of the `counts` tuple.

- (4, 1)
- (3, 2)
- (3, 1, 1)
- (2, 2, 1)
- (2, 1, 1, 1)
- (1, 1, 1, 1, 1)

In all these cases, sorting will do the right thing.

In [17]:
def group(items):
    """Returns a list of [(count, x)...], highest count first,
    then highest x first."""
    groups = [(items.count(x), x) for x in set(items)]
    return sorted(groups, reverse=True)

This function would allow to rewrite `hands_rank()` in a different and more concise way.

In [18]:
def hand_rank(hand):
    "Return a value indicating how high the hand ranks."
    # We create the groups straight away for each hand.
    groups = (['--23456789TJQKA'.index(r) for r, s in hand])
    counts, ranks = unzip(groups) # Not yet defined.
    if ranks == (14, 5, 4, 3, 2):
        ranks = (5, 4, 3, 2, 1)
    # We check for straights and flushes here, without the need
    # for helper functions
    straight = len(ranks) == 5 and max(ranks) - min(ranks) == 4
    flush = len(set([s for r,s in hand])) == 1
    return (9 if (5,) == counts else
            8 if straight and flush else
            7 if (4, 1) == counts else
            6 if (3, 2) == counts else
            5 if flush else
            4 if straight else
            3 if (3, 1, 1) == counts else
            2 if (2, 2, 1) == counts else
            1 if (2, 1, 1, 1) == counts else
            0), ranks

The `unzip()` function is small but interesting. It pretty much transposes a list of lists or tuples. In our case, for example, we have a list `[(count1, rank1), (count2, rank2), ...]` and we want to assign `[count1, count2,...]` to `counts` and `[rank1, rank2, ...]` to `ranks. This is what this function does.

In [19]:
def unzip(pairs):
    return zip(*pairs)

list(unzip([(1, 2), (3, 4)]))

[(1, 3), (2, 4)]

This new version is not only shorter, but it also highlights the fact that the various hand types are partitions of 5 sorted lexicographically.

## Final refactoring

We can further simplify `hand_rank()`: instead of having a long "case" statement, we can use a dictionary as a lookup table as shown below. Note that the ranking of a straight flush is now 9, but that's OK, as the relative ranking is unchanged. Note also that we have a five-of-a-kind scenario, which is not possible when using a single deck.

In [20]:
count_rankings = {
    (5,): 10,
    (4, 1): 7,
    (3, 2): 6,
    (3, 1, 1): 3,
    (2, 2, 1): 2,
    (2, 1, 1, 1): 1,
    (1, 1, 1, 1, 1): 0
}

def hand_rank(hand):
    "Return a value indicating how high the hand ranks."
    groups = (['--23456789TJQKA'.index(r) for r, s in hand])
    counts, ranks = unzip(groups) # Not yet defined.
    if ranks == (14, 5, 4, 3, 2):
        ranks = (5, 4, 3, 2, 1)
    straight = len(ranks) == 5 and max(ranks) - min(ranks) == 4
    flush = len(set([s for r,s in hand])) == 1
    return max(count_rankings[counts], 4*straight, 5*flush), ranks

## Lessons Learned

1. Understand the problem. Does it make sense? Does it not? Do you need more information? Which parts are obscure or unclear or vague?
2. Define the various "pieces". Figure out everything that needs to be represented in the problem.
3. As much as you can, try to reuse the pieces you have already.
4. Make sure you have tests and run them.
5. Explore the design space. Move across the four dimensions of correctness, efficiency, features, and elegance.