# Bingo Generator Testing and Design

We need a way to generate a batch of randomly arranged bingo cards from a list of words.

This presents us with the following requirements:
* a `main.py` file that accepts CLI flags
* a specifiable input file and a default output folder
* the ability to ingest a list and automatically generate randomly arranged tables
* the ability to format and style those tables, and export them as images for printing
* the ability to ensure that any generated card is at least xx% different from every other generated card
* default behavior that falls back to standard Bingo card generation
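
The CLI requirements above could be sketched with `argparse`. The flag names (`--input`, `--output-dir`, `--count`) and their defaults are assumptions for illustration, not a settled interface.

```python
import argparse


def build_parser():
    """Build a hypothetical CLI for main.py; flag names are illustrative."""
    parser = argparse.ArgumentParser(description="Generate bingo cards.")
    # Optional word list; when omitted, fall back to standard Bingo numbers
    parser.add_argument("--input", default=None,
                        help="newline-delimited file of terms")
    parser.add_argument("--output-dir", default="output",
                        help="folder that receives the generated images")
    parser.add_argument("--count", type=int, default=10,
                        help="number of cards to generate")
    return parser


# Example: parse an explicit argument list instead of sys.argv
args = build_parser().parse_args(["--input", "words.txt"])
print(args.input, args.output_dir, args.count)
```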

A utility function will build an array from a newline-delimited list of terms. We'll then use NumPy and pandas to generate random arrangements of these terms across 25 squares (with the center square always being a FREE space, of course). From there, we'll use pandas' `to_html()` to format and style the text with CSS, and imgkit to export this all as images in a designated output directory.
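
A minimal sketch of that utility function might look like the following. The names `parse_terms` and `load_terms` are hypothetical, and skipping blank lines is an assumption about the input file rather than a stated requirement.

```python
def parse_terms(text):
    """Split newline-delimited text into a list of stripped, non-empty terms."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_terms(path):
    """Read a newline-delimited file and return its terms as a list."""
    with open(path, encoding="utf-8") as handle:
        return parse_terms(handle.read())


print(parse_terms("cat\n\n dog \nbird\n"))  # ['cat', 'dog', 'bird']
```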

## Random Arrangement Testing

In [1]:
import pandas as pd
import numpy as np
import imgkit
import time
import os

In [2]:
num_list = np.arange(0, 50)

def table_gen(source_list):
    table = []
    np.random.shuffle(source_list)
    for i in range(0, 5): # rows
        row = []
        for j in range(0, 5): # columns
            if i == 2 and j == 2:
                row.append("FREE")
            else:
                row.append(source_list[(i * 5) + j])
        table.append(row)
    return table

In [3]:
print(table_gen(num_list))
print(table_gen(num_list))
print(table_gen(num_list))
print(table_gen(num_list))

[[1, 6, 7, 19, 29], [14, 12, 22, 28, 38], [10, 23, 'FREE', 2, 24], [40, 21, 33, 44, 46], [27, 32, 25, 4, 47]]
[[22, 41, 8, 18, 40], [33, 9, 32, 46, 42], [17, 49, 'FREE', 20, 35], [31, 25, 44, 47, 6], [45, 11, 28, 19, 21]]
[[20, 28, 35, 42, 14], [19, 11, 23, 8, 31], [9, 13, 'FREE', 6, 45], [17, 30, 32, 10, 7], [21, 41, 40, 49, 27]]
[[22, 45, 16, 1, 34], [15, 29, 46, 37, 4], [30, 39, 'FREE', 24, 19], [27, 20, 21, 14, 38], [2, 9, 48, 11, 35]]
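
Note that `table_gen` shuffles its argument in place, so `num_list` is reordered after every call. If that side effect is unwanted, a non-mutating variant could look like the sketch below. `table_gen_copy` is a hypothetical name, and it uses the standard library's `random.sample` instead of NumPy.

```python
import random


def table_gen_copy(source_list, rng=random):
    """Build a 5x5 card without reordering source_list (hypothetical helper)."""
    # Draw 24 distinct terms; the original list is left untouched
    picks = rng.sample(list(source_list), 24)
    # Put the FREE space at flat index 12, the center of a 5x5 grid
    cells = picks[:12] + ["FREE"] + picks[12:]
    # Slice the 25 cells into 5 rows of 5
    return [cells[r * 5:(r + 1) * 5] for r in range(5)]


card = table_gen_copy(range(50))
print(card[2][2])  # FREE
```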


## Determining Randomness

Determining that a card is xx% unique is difficult, especially on a tight timetable. Generally speaking, there are three degrees of similarity between two cards, from most to least severe:
* A value is on another card in the same exact spot
* A value is on another card in a similar row/column
* A value is on another card

Checking for all of these is also computationally expensive, especially at scale. That said, most people won't be generating 50,000 unique bingo cards in one go. For this initial test, we'll use a point system: 5 points when a value sits at an identical spot on another card, and 1 point when the value appears anywhere else on that card.

To do this, we'll flatten each 2D array, compare each item against the other card's flattened list, and verify that for each pair of cards the score does not exceed 18 (ignoring the FREE space, of course). There's no logic to choosing this number, but score limits that were too low resulted in an endlessly running loop. There's likely an optimal computation here (!n-items / 24 spaces / x number of cards, or similar).

In [4]:
def bingo_match_score_exceed(arr1, arr2, score_limit=18):
    score = 0
    for i in range(0, len(arr1)):
        if arr1[i] == 'FREE':
            pass
        elif arr1[i] == arr2[i]:
            score += 5
        elif arr1[i] in arr2:
            score += 1
        else:
            score += 0
        
        if score > score_limit:
            return True
    return False

def iteration_test(runs=20, score_limit=18):
    start_time = time.time()
    arrays = []
    itr = 0
    while(len(arrays) < runs):
        cur_list = table_gen(num_list)
        if len(arrays) == 0:
            arrays.append(cur_list)
        else:
            # Reject the candidate if it scores too high against any accepted card
            too_similar = any(
                bingo_match_score_exceed(np.ravel(cur_list), np.ravel(arr), score_limit)
                for arr in arrays)
            if not too_similar:
                arrays.append(cur_list)
        itr += 1
    print("It took {} iterations and {} seconds to generate {} arrays under a score limit of {}"
          .format(itr, round(time.time() - start_time, 3), len(arrays), score_limit))

In [5]:
# Testing score limits
iteration_test(10, 20)
iteration_test(10, 19)
iteration_test(10, 18)
iteration_test(10, 17)
iteration_test(10, 16)
iteration_test(10, 15)
iteration_test(10, 14)
iteration_test(10, 13)
# Note how drastically iterations increase once the limit drops to 14 or 13. Below this, we get into serious trouble.

It took 11 iterations and 0.048 seconds to generate 10 arrays under a score limit of 20
It took 14 iterations and 0.112 seconds to generate 10 arrays under a score limit of 19
It took 12 iterations and 0.057 seconds to generate 10 arrays under a score limit of 18
It took 19 iterations and 0.139 seconds to generate 10 arrays under a score limit of 17
It took 51 iterations and 1.083 seconds to generate 10 arrays under a score limit of 16
It took 42 iterations and 1.054 seconds to generate 10 arrays under a score limit of 15
It took 242 iterations and 3.176 seconds to generate 10 arrays under a score limit of 14
It took 192 iterations and 1.288 seconds to generate 10 arrays under a score limit of 13


In [6]:
# Trying the same thing with a greater number of cards.
iteration_test(50, 20)
iteration_test(50, 19)
iteration_test(50, 18)
# With 50 cards, even a limit of 18 gets expensive pretty quickly.

It took 208 iterations and 6.692 seconds to generate 50 arrays under a score limit of 20
It took 404 iterations and 13.009 seconds to generate 50 arrays under a score limit of 19
It took 989 iterations and 33.415 seconds to generate 50 arrays under a score limit of 18
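
One way to see why the iteration counts climb so quickly near a limit of 13 or 14 is to estimate the score that two independent random cards receive under this point system. Assuming 50 source values and 24 scored squares, as with `num_list` above, each scored square matches in the same spot with probability 1/50 and elsewhere on the other card with probability 23/50, giving an expected score of 24 * (5/50 + 23/50) = 13.44. The sketch below checks that estimate by simulation. `random_card`, `full_score`, and `expected_score` are hypothetical helpers, and `full_score` computes the whole score rather than exiting early.

```python
import random


def random_card(n_items, rng):
    """Flat 25-cell card drawn from range(n_items), FREE in the center."""
    picks = rng.sample(range(n_items), 24)
    return picks[:12] + ["FREE"] + picks[12:]


def full_score(card_a, card_b):
    """Total score under the 5/1 point system, with no early exit."""
    score = 0
    for a, b in zip(card_a, card_b):
        if a == "FREE":
            continue
        if a == b:
            score += 5   # same value in the same spot
        elif a in card_b:
            score += 1   # same value somewhere else on the card
    return score


def expected_score(n_items, squares=24):
    """Analytic expectation for two independent random cards."""
    same_spot = 1 / n_items
    elsewhere = (squares - 1) / n_items
    return squares * (5 * same_spot + elsewhere)


rng = random.Random(0)
trials = 5000
mean = sum(full_score(random_card(50, rng), random_card(50, rng))
           for _ in range(trials)) / trials
print(round(expected_score(50), 2), round(mean, 2))
```

With an average pairwise score of about 13.4, a limit near 13 likely sits around the middle of the score distribution, and each new card has to pass against every card already accepted. That is consistent with the sharp rise in iterations seen in the timings above.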


## Pandas Image Output

Note that imgkit requires an install of `wkhtmltopdf`. This is simple on Linux: `sudo apt install wkhtmltopdf`, and likely on Mac if you're using Homebrew, but I'm unsure how Windows will fare.
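
Before exporting, it may help to confirm that the binary is reachable. The check below is a sketch that assumes imgkit looks for the `wkhtmltoimage` executable (shipped alongside `wkhtmltopdf`) on the `PATH`; `find_wkhtmltoimage` is a hypothetical helper name.

```python
import shutil


def find_wkhtmltoimage():
    """Return the path to the wkhtmltoimage binary, or None if not on PATH."""
    return shutil.which("wkhtmltoimage")


path = find_wkhtmltoimage()
if path is None:
    print("wkhtmltoimage not found; install wkhtmltopdf before exporting images")
else:
    print("Using", path)
```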

In [7]:
css = """
<style type="text/css">

table {
color: #333;
font-family: Helvetica, Arial, sans-serif;
border-collapse: collapse;
border-spacing: 0;
}

td, th {
border: 1px solid transparent; /* No more visible border */
height: 180px;
width: 240px;
text-align: center;
}

th {
background: #414141; /* Darken header a bit */
font-weight: bold;
font-size: 92px;
color: #DFDFDF;
}

td {
background: #FAFAFA;
font-size: 32px;
}

table tr:nth-child(odd) td:nth-child(even),
tr:nth-child(even) td:nth-child(odd) {
background-color: #DFDFDF;
}
</style>
"""

def bingo_table_to_image(table, css, dest_name="bingo.jpg", format="jpg"):
    df = pd.DataFrame(table, columns=["B", "I", "N", "G", "O"])

    # Opening in "w" mode overwrites any bingo.html left over from a previous run
    with open("bingo.html", "w") as f:
        f.write(css)
        f.write(df.to_html(index=False))  # index=False drops the row index column

    # Render the intermediate HTML file to an image at dest_name
    imgkit.from_file("bingo.html", dest_name, {"format": format})

In [8]:
bingo_table_to_image(table_gen(num_list), css)

Loading page (1/2)


## Extracting rows and columns

This is needed for the new table class. This is mostly just a sanity check before implementation.

In [18]:
tbl = table_gen(num_list)

def generate_win_states(tbl):
    win_states = []

    # get list of rows
    for x in range(5):
        win_states.append([tbl[x][y] for y in range(5)])

    # get list of columns
    for y in range(5):
        win_states.append([tbl[x][y] for x in range(5)])

    # get diagonals: top-left to bottom-right, then top-right to bottom-left
    win_states.append([
        tbl[0][0],
        tbl[1][1],
        tbl[2][2],
        tbl[3][3],
        tbl[4][4]])

    win_states.append([
        tbl[0][4],
        tbl[1][3],
        tbl[2][2],
        tbl[3][1],
        tbl[4][0]])

    return win_states
    
win_states = generate_win_states(tbl)

print(len(win_states))

for state in win_states:
    print(state)

12
[14, 33, 3, 25, 32]
[8, 21, 36, 10, 11]
[7, 1, 'FREE', 24, 0]
[46, 41, 4, 34, 20]
[39, 26, 23, 12, 37]
[14, 8, 7, 46, 39]
[33, 21, 1, 41, 26]
[3, 36, 'FREE', 4, 23]
[25, 10, 24, 34, 12]
[32, 11, 0, 20, 37]
[14, 21, 'FREE', 34, 37]
[32, 10, 'FREE', 41, 39]
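
As a cross-check on the loops above, the same twelve win states can be built with `zip` and index arithmetic. This is a sketch for comparison only (`win_states_compact` is a hypothetical name), not the planned table class.

```python
def win_states_compact(tbl):
    """Return rows, columns, then both diagonals of a square table."""
    size = len(tbl)
    rows = [list(row) for row in tbl]
    # zip(*tbl) transposes the table, so each tuple is one column
    cols = [list(col) for col in zip(*tbl)]
    main_diag = [tbl[i][i] for i in range(size)]
    anti_diag = [tbl[i][size - 1 - i] for i in range(size)]
    return rows + cols + [main_diag, anti_diag]


# A 5x5 grid numbered 0..24 makes the result easy to verify by eye
grid = [[r * 5 + c for c in range(5)] for r in range(5)]
states = win_states_compact(grid)
print(len(states))  # 12
print(states[11])   # [4, 8, 12, 16, 20]
```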


## List Comparison

I'd like to avoid sorting. Sorting a 5-element array is fast, but there are a lot of arrays to sort.

In [23]:
l1 = [42, 32, 25, 16, 1]
l2 = [42, 31, 25, 16, 1]
l3 = [32, 16, 25, 42, 1]

# l2 is not quite equal to l1
# l3 has the same contents as l1, but in a different order

print(set(l1) - set(l2))
print(type(set(l1) - set(l2)))
print(set(l1) - set(l3))

print(set(l1) == set(l3))
print(set(l1) == set(l2))

{32}
<class 'set'>
set()
True
False
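
Building on the set comparison above, one possible way to compare win states across two cards without sorting is to convert each state to a `frozenset`, which is hashable and ignores order. `shared_win_states` is a hypothetical helper name. Treating a state as a set discards duplicates, which should be harmless here because a card holds no repeated values.

```python
def shared_win_states(states_a, states_b):
    """Return win states (as frozensets) that appear on both cards."""
    # frozenset is hashable, so whole win states can live inside a set
    set_a = {frozenset(state) for state in states_a}
    set_b = {frozenset(state) for state in states_b}
    return set_a & set_b


l1 = [42, 32, 25, 16, 1]
l2 = [42, 31, 25, 16, 1]
l3 = [32, 16, 25, 42, 1]

print(shared_win_states([l1], [l3]) == {frozenset(l1)})  # True
print(shared_win_states([l1], [l2]))                     # set()
```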
