# Hockey Tournament

In [1]:
from soccer_functions import sequences
from collections import defaultdict
import numpy as np
import pandas as pd

## Background

The 4 Nations Face-Off features four national teams in competitive ice hockey. There are no ties, and points are awarded as follows:

<ul>
    <li>3 points for a win in regulation</li>
    <li>2 points for a win in overtime</li>
    <li>1 point for a loss in overtime</li>
    <li>0 points for a loss in regulation</li>
    </ul>
    
Note that each game awards exactly three points in total.

There are six games in total. Without loss of generality, we can order them as follows: a plays b, then a plays c, then a plays d, then b plays c, then b plays d, then c plays d. Each game has four possible outcomes. For the first game, either:

<ul>
    <li>a defeats b in regulation (3 points to a, 0 points to b), or</li>
    <li>a defeats b in overtime (2 points to a, 1 point to b), or</li>
    <li>a loses to b in overtime (1 point to a, 2 points to b), or</li>
    <li>a loses to b in regulation (0 points to a, 3 points to b)</li>
    </ul>
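
As a quick check on the scoring rule, the four outcomes of the first game can be written as a small lookup table. This is only an illustrative sketch: the name `GAME_ONE_POINTS` is hypothetical and does not come from the earlier soccer work.

```python
# Points (team a, team b) for each possible outcome of the first game.
# GAME_ONE_POINTS is a hypothetical name used only for illustration.
GAME_ONE_POINTS = {
    "a wins in regulation": (3, 0),
    "a wins in overtime": (2, 1),
    "b wins in overtime": (1, 2),
    "b wins in regulation": (0, 3),
}

# Every outcome hands out exactly three points in total.
for outcome, (points_a, points_b) in GAME_ONE_POINTS.items():
    print(f"{outcome}: a={points_a}, b={points_b}, total={points_a + points_b}")
```

Since each of the six games distributes three points, every complete tournament distributes 18 points among the four teams.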

## Comparison to World Cup Analysis

In previous work we carried out a similar [analysis](https://www.github.com/gadamico/soccer_and_statistics) of the group stage of soccer's World Cup. Let's bring in our sequences function from that earlier work.

In [3]:
print(sequences.__doc__)


    We assume that we want to build a sequence of characters
    of a given length where we have a set of choices for each
    position in the sequence. This function generates all possible
    such sequences. The function assumes that all choice sets have
    the same length.
    
    Example
    --------
    
    sequences(['ab', 'cd']) = ['ac', 'ad', 'bc', 'bd']
    
    


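Suppose we use a capital letter to denote a regulation win for a particular team and a lowercase letter to denote an overtime win. For the first game, for example, 'A' means a wins in regulation, 'a' means a wins in overtime, 'b' means b wins in overtime, and 'B' means b wins in regulation. Then we can generate all possible results as follows: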

In [3]:
poss = sequences(['AabB', 'AacC', 'AadD', 'BbcC', 'BbdD', 'CcdD'])
len(poss)

4096

That's four possible results for each of six games.

In [5]:
4**6

4096
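
The implementation of sequences lives in the earlier soccer repository and is not reproduced here. As a rough sketch of how the same enumeration could be done with the standard library, `itertools.product` yields one choice per position. The name `all_sequences` is hypothetical, and this is not claimed to be how sequences itself is written.

```python
from itertools import product

def all_sequences(choices):
    """Return every string formed by picking one character from each
    entry of `choices`, in order (hypothetical stand-in for sequences)."""
    return ["".join(picks) for picks in product(*choices)]

# Same example as in the docstring of sequences.
print(all_sequences(["ab", "cd"]))   # ['ac', 'ad', 'bc', 'bd']

# One choice set per game, in the order a-b, a-c, a-d, b-c, b-d, c-d.
games = ["AabB", "AacC", "AadD", "BbcC", "BbdD", "CcdD"]
print(len(all_sequences(games)))     # 4096
```

Unlike the function described in the docstring, this sketch does not require the choice sets to have the same length.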

For the World Cup analysis, we had a "standings" function that converted a sequence of game results into a set of standings. For soccer, it looked like this:

In [6]:
def standings(results):
    """This function will return a DataFrame of the rankings
    of Teams a, b, c, and d given an input string of the
    outcomes of the six games, where those games are arranged
    in the following order: a vs. b, a vs. c, a vs. d, b vs. c,
    b vs. d, c vs. d."""
    
    # Initialize the point totals at 0
    a = 0
    b = 0
    c = 0
    d = 0
    
    # Identify the ties in the outcome string
    ties = [j for j in range(6) if results[j] == 't']
    
    # Assign the ties' points accordingly
    if 0 in ties:
        a += 1
        b += 1
    if 1 in ties:
        a += 1
        c += 1
    if 2 in ties:
        a += 1
        d += 1
    if 3 in ties:
        b += 1
        c += 1
    if 4 in ties:
        b += 1
        d += 1
    if 5 in ties:
        c += 1
        d += 1
    
    # Assign the wins' points accordingly
    a += 3*results.count('a')
    b += 3*results.count('b')
    c += 3*results.count('c')
    d += 3*results.count('d')
    
    return pd.DataFrame(np.array([a, b, c, d]),
                        index=['a', 'b', 'c', 'd'],
                        columns=['points']).sort_values(by='points', ascending=False)

## From Results to Standings

We just need to modify this slightly for our new hockey format:

In [8]:
def hockeyStandings(results):
    """This function will return a DataFrame of the rankings
    of Teams a, b, c, and d given an input string of the
    outcomes of the six games, where those games are arranged
    in the following order: a vs. b, a vs. c, a vs. d, b vs. c,
    b vs. d, c vs. d."""
    
    # Initialize the point totals at 0
    a = 0
    b = 0
    c = 0
    d = 0
    
    # Assign the regulation wins' points accordingly
    a += 3*results.count('A')
    b += 3*results.count('B')
    c += 3*results.count('C')
    d += 3*results.count('D')
    
    # Assign the overtime points accordingly
    if results[0] == 'a':
        a += 2
        b += 1
    if results[0] == 'b':
        a += 1
        b += 2
    if results[1] == 'a':
        a += 2
        c += 1
    if results[1] == 'c':
        a += 1
        c += 2
    if results[2] == 'a':
        a += 2
        d += 1
    if results[2] == 'd':
        a += 1
        d += 2
    if results[3] == 'b':
        b += 2
        c += 1
    if results[3] == 'c':
        b += 1
        c += 2
    if results[4] == 'b':
        b += 2
        d += 1
    if results[4] == 'd':
        b += 1
        d += 2
    if results[5] == 'c':
        c += 2
        d += 1
    if results[5] == 'd':
        c += 1
        d += 2
    
    return pd.DataFrame(np.array([a, b, c, d]),
                        index=['a', 'b', 'c', 'd'],
                        columns=['points']).sort_values(by='points', ascending=False)

Let's try it out!

In [9]:
hockeyStandings('aaDbbd')

   points
d       6
b       5
a       4
c       3
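
The chain of if statements in hockeyStandings follows a regular pattern, so the same scoring can be driven by a list of matchups. The following is a minimal sketch under that assumption, not a replacement for hockeyStandings. The names `MATCHUPS` and `hockey_points` are hypothetical, and the function returns a plain dictionary rather than a DataFrame so that it runs without pandas. It also assumes the input is valid, meaning each letter belongs to one of the two teams in that game.

```python
# Game order: a vs. b, a vs. c, a vs. d, b vs. c, b vs. d, c vs. d.
# MATCHUPS and hockey_points are hypothetical names for this sketch.
MATCHUPS = [("a", "b"), ("a", "c"), ("a", "d"),
            ("b", "c"), ("b", "d"), ("c", "d")]

def hockey_points(results):
    """Return {team: points} for a six-character results string.

    An uppercase letter marks a regulation win (3-0) for that team,
    a lowercase letter an overtime win (2-1)."""
    points = {team: 0 for team in "abcd"}
    for (first, second), outcome in zip(MATCHUPS, results):
        winner = outcome.lower()
        loser = second if winner == first else first
        if outcome.isupper():
            # Regulation: 3 points to the winner, 0 to the loser.
            points[winner] += 3
        else:
            # Overtime: 2 points to the winner, 1 to the loser.
            points[winner] += 2
            points[loser] += 1
    return points

totals = hockey_points("aaDbbd")
print(sorted(totals.items(), key=lambda item: item[1], reverse=True))
# [('d', 6), ('b', 5), ('a', 4), ('c', 3)]
```

The result agrees with the standings shown above for the same input string.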


## Counting

Let's now gather all possible sets of standings.

In [13]:
coll = []
for stand in poss:
    coll.append(hockeyStandings(stand))

In [15]:
# These DataFrames may prove unwieldy for our purposes. Let's
# make a large NumPy array instead:

big_arr = np.concatenate([df.values for df in coll], axis=1)

big_arr

array([[9, 9, 9, ..., 8, 8, 9],
       [6, 6, 6, ..., 7, 7, 6],
       [3, 2, 2, ..., 3, 3, 3],
       [0, 1, 1, ..., 0, 0, 0]])

In [16]:
# Let's now use defaultdict to count how many of each set of final
# standings we have:

counter = defaultdict(int)

# Technical note: We can't use the NumPy arrays themselves as keys,
# since they're not hashable. So we'll turn them into strings first!
for stand in big_arr.T:
    counter[str(stand)] += 1

len(counter.keys())

37
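
Converting each column to a string works, but a tuple is also hashable and keeps the values as numbers. As a sketch of that alternative (not the approach used in this notebook), `collections.Counter` can tally tuples directly. The six sample columns below are the ones visible in the truncated printout of big_arr, and `sample_columns` is a hypothetical name. With the real data one would iterate over big_arr.T instead.

```python
from collections import Counter

# The six columns visible in the truncated printout of big_arr
# (first three and last three); sample_columns is a hypothetical name.
sample_columns = [
    [9, 6, 3, 0], [9, 6, 2, 1], [9, 6, 2, 1],
    [8, 7, 3, 0], [8, 7, 3, 0], [9, 6, 3, 0],
]

# Tuples are hashable, so they can serve as dictionary keys directly.
# int() also turns NumPy integers into plain Python ints.
tally = Counter(tuple(int(p) for p in column) for column in sample_columns)

for standing, count in tally.most_common():
    print(standing, count)
```

With tuple keys, a lookup such as `tally[(9, 6, 3, 0)]` needs no string formatting, and the point totals remain available for further arithmetic.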

In [17]:
counter

defaultdict(int,
            {'[9 6 3 0]': 24,
             '[9 6 2 1]': 24,
             '[9 5 3 1]': 48,
             '[9 5 2 2]': 24,
             '[9 5 4 0]': 24,
             '[9 4 3 2]': 72,
             '[9 4 4 1]': 24,
             '[9 3 3 3]': 16,
             '[8 6 3 1]': 96,
             '[8 6 2 2]': 48,
             '[8 6 4 0]': 48,
             '[8 5 3 2]': 168,
             '[8 5 4 1]': 120,
             '[8 5 5 0]': 24,
             '[8 4 3 3]': 120,
             '[8 4 4 2]': 96,
             '[8 7 3 0]': 24,
             '[8 7 2 1]': 24,
             '[7 6 3 2]': 240,
             '[7 6 4 1]': 168,
             '[7 6 5 0]': 72,
             '[7 5 3 3]': 192,
             '[7 5 4 2]': 312,
             '[7 5 5 1]': 96,
             '[7 4 4 3]': 216,
             '[7 7 4 0]': 24,
             '[7 7 3 1]': 48,
             '[7 7 2 2]': 24,
             '[6 6 3 3]': 120,
             '[6 6 4 2]': 192,
             '[6 6 5 1]': 120,
             '[6 6 6 0]': 16,
            

In [18]:
sum(counter.values())

4096