## Advent of Code - Day 2

In [1]:
import tempfile
from contextlib import contextmanager

In [2]:
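@contextmanager
def test_file(test_input):
    """Yields a temporary file pre-filled with `test_input`."""
    with tempfile.NamedTemporaryFile('r+') as f:
        f.write(test_input)
        f.seek(0)  # also flushes the write, so reopening f.name sees the data
        yield f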

### Part 1

In [3]:
from collections import Counter
from functools import reduce
from operator import mul

In [4]:
def get_box_ids(path):
    """Yields each box id from the file at path."""
    with open(path, 'r') as id_file:
        for id_ in id_file:
            yield id_.strip()
            
def letter_counts(box_id):
    """Returns the set of distinct letter counts for `box_id`.

    Example:
        >>> # c, a and b appear once, twice and three times respectively
        >>> letter_counts('bababc')
        {1, 2, 3}
        >>> # a and b both appear three times
        >>> letter_counts('ababab')
        {3}
    """
    return set(Counter(box_id).values())

def repeat_counts(path):
    """Returns a Counter mapping each letter count to the number of box ids containing it."""
    counts = Counter()
    for box_id in get_box_ids(path):
        counts.update(letter_counts(box_id))
    return counts

def checksum(path):
    """Returns the checksum for the file of box ids at path."""
    counts = repeat_counts(path)
    # Only exact doubles and triples contribute; counts of 4 or more are ignored.
    return reduce(mul, (counts[c] for c in (2, 3)), 1)

In [5]:
# test to see that it's working on example

part1_test_input = """abcdef
bababc
abbcde
abcccd
aabcdd
abcdee
ababab
"""

with test_file(part1_test_input) as f:
    print(checksum(f.name))

12
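
To see where the 12 comes from, here is a minimal sketch that recomputes the two factors directly on the example ids. The names `example_ids`, `with_double` and `with_triple` are illustrative only and are not used elsewhere in this notebook.

```python
from collections import Counter

example_ids = ['abcdef', 'bababc', 'abbcde', 'abcccd',
               'aabcdd', 'abcdee', 'ababab']

# Ids containing at least one letter that appears exactly twice / three times.
with_double = [i for i in example_ids if 2 in Counter(i).values()]
with_triple = [i for i in example_ids if 3 in Counter(i).values()]

print(with_double)  # ['bababc', 'abbcde', 'aabcdd', 'abcdee']
print(with_triple)  # ['bababc', 'abcccd', 'ababab']
print(len(with_double) * len(with_triple))  # 4 * 3 = 12
```

Note that `bababc` contributes to both factors, while `abcdef` contributes to neither.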


In [6]:
!ls

day_2.ipynb  input  requirements.txt


In [7]:
checksum('input')

6474

### Part 2

The idea is to iterate over each of the `n` box ids of length `m`. For each box id we replace, in turn, the character at each position `1, 2, ..., m` with a `?`. For each of these `m` masked strings we perform a lookup in a trie. If the string is present, we have previously seen a box id that matches the current one everywhere except possibly at the masked position. If it is not present, we insert the string into the trie and continue. The question only asks for the letters the two ids have in common so, on a hit, we just remove the `?` and return the string.
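
The masking idea can be tried out without a trie. The following is a minimal sketch that swaps in a plain Python `set` for the trie (an assumption made for brevity, not the data structure described above); `masked_variants` and `first_near_match` are hypothetical names used only for this illustration.

```python
def masked_variants(box_id, unknown='?'):
    """Yield box_id with each position in turn replaced by `unknown`."""
    for i in range(len(box_id)):
        yield box_id[:i] + unknown + box_id[i + 1:]


def first_near_match(box_ids, unknown='?'):
    """Return the shared letters of the first pair of ids differing at one position."""
    seen = set()  # stands in for the trie: masked strings seen so far
    for box_id in box_ids:
        for masked in masked_variants(box_id, unknown):
            if masked in seen:
                return masked.replace(unknown, '')
            seen.add(masked)
    return None  # no two ids differ at exactly one position


ids = ['abcde', 'fghij', 'klmno', 'pqrst', 'fguij', 'axcye', 'wvxyz']
print(list(masked_variants('fghij')))  # ['?ghij', 'f?hij', 'fg?ij', 'fgh?j', 'fghi?']
print(first_near_match(ids))           # fgij
```

Here `fghij` stores `fg?ij`, and `fguij` later produces the same masked string, which is what triggers the match.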

There are `n` box ids and we perform at most `m` lookups and `m` insertions for each id. Each lookup or insertion walks a key of length `m`, so the time complexity is $O(nm^2)$. The space usage is also $O(nm^2)$, as we store at most `m` entries of length `m` for each of the `n` ids and, in the worst case, the entries share no prefixes. `m` is typically a small constant (e.g. 26 for my given input).
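
As a rough sanity check on the space bound, the sketch below counts the masked strings and their total characters for a tiny made-up input in which no two ids are near matches. The ids and the helper name `masked_variants` are assumptions for illustration; a real trie may use fewer nodes than this character count because of shared prefixes.

```python
def masked_variants(box_id, unknown='?'):
    """Yield box_id with each position in turn replaced by `unknown`."""
    for i in range(len(box_id)):
        yield box_id[:i] + unknown + box_id[i + 1:]


ids = ['aaaa', 'bbbb', 'cccc']  # illustrative worst case: nothing ever matches
n, m = len(ids), len(ids[0])

stored = {masked for box_id in ids for masked in masked_variants(box_id)}
total_chars = sum(len(s) for s in stored)

print(len(stored), n * m)          # 12 12
print(total_chars, n * m * m)      # 48 48
```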

In [8]:
import pygtrie

In [9]:
def all_single_unknown(box_id, unknown='?'):
    """Yields all variations of `box_id` with one char replaced with `unknown`.
    
    Example:
        >>> list(all_single_unknown('abcde'))
        ['?bcde', 'a?cde', 'ab?de', 'abc?e', 'abcd?']
    """
    for i in range(len(box_id)):
        yield box_id[:i] + unknown + box_id[i + 1:]

In [10]:
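def find_correct_ids(path):
    """Returns the letters shared by the two box ids at path that differ at one position.

    Returns None if no such pair exists.
    """
    # CharTrie keys the trie by character; StringTrie would treat each id
    # (which contains no '/' separator) as a single path component.
    trie = pygtrie.CharTrie()
    for box_id in get_box_ids(path):
        for unk_id in all_single_unknown(box_id):
            if unk_id in trie:
                return unk_id.replace('?', '')
            trie[unk_id] = None
    return None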

In [11]:
# test to see that it's working on example

part2_test_input = """abcde
fghij
klmno
pqrst
fguij
axcye
wvxyz
"""

with test_file(part2_test_input) as f:
    print(find_correct_ids(f.name))

fgij


In [13]:
print(find_correct_ids('input'))

mxhwoglxgeauywfkztndcvjqr
