In [1]:
from collections import defaultdict

In [2]:
def get_puzzle_input(file):
    with open(file) as fh:
        lines = fh.readlines()
    return [l.strip() for l in lines if l.strip()]

### Part One

For example, if you see the following box IDs:

```
abcdef contains no letters that appear exactly two or three times.
bababc contains two a and three b, so it counts for both.
abbcde contains two b, but no letter appears exactly three times.
abcccd contains three c, but no letter appears exactly two times.
aabcdd contains two a and two d, but it only counts once.
abcdee contains two e.
ababab contains three a and three b, but it only counts once.
```

Of these box IDs, four of them contain a letter which appears exactly twice, and three of them contain a letter which appears exactly three times. Multiplying these together produces a checksum of 4 * 3 = 12.
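As a quick check of the arithmetic above, the two counts can be reproduced with `collections.Counter`. This is only a minimal sketch, separate from the implementation in the cells that follow; the names `example_ids` and `has_count` are illustrative and not part of the notebook.

```python
from collections import Counter

# The seven example box IDs from the puzzle text.
example_ids = ["abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab"]

def has_count(box_id, n):
    """Return True if any letter occurs exactly n times in box_id."""
    return n in Counter(box_id).values()

# Each ID counts at most once per category, however many letters qualify.
twos = sum(has_count(b, 2) for b in example_ids)
threes = sum(has_count(b, 3) for b in example_ids)
print(twos, threes, twos * threes)  # 4 3 12
```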

In [3]:
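def repeated_letters(box_id, repeats=(2, 3)):
    """Return a tuple of 0/1 flags, one per entry in repeats.

    A flag is 1 if some letter occurs exactly that many times in box_id.
    """
    letters_seen = defaultdict(int)

    # Tally how often each letter occurs.
    for letter in box_id:
        letters_seen[letter] += 1

    return tuple(
        int(r in letters_seen.values())
        for r in repeats
    )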

In [4]:
assert repeated_letters('abcdef') == (0, 0)
assert repeated_letters('bababc') == (1, 1)
assert repeated_letters('abbcde') == (1, 0)
assert repeated_letters('abcccd') == (0, 1)
assert repeated_letters('aabcdd') == (1, 0)
assert repeated_letters('abcdee') == (1, 0)
assert repeated_letters('ababab') == (0, 1)

In [5]:
def checksum_boxes(box_ids):
    total_2s = 0
    total_3s = 0
    for box_id in box_ids:
        num_2s, num_3s = repeated_letters(box_id)
        total_2s += num_2s
        total_3s += num_3s
    
    return total_2s * total_3s   

In [6]:
assert checksum_boxes(('abcdef', 'bababc', 'abbcde', 'abcccd', 'aabcdd', 'abcdee', 'ababab')) == 12

In [7]:
puzzle_input = get_puzzle_input('/tmp/input.txt')

In [8]:
checksum_boxes(puzzle_input)

6642

### Part Two

Confident that your list of box IDs is complete, you're ready to find the boxes full of prototype fabric.

The boxes will have IDs which differ by exactly one character at the same position in both strings. For example, given the following box IDs:

```
abcde
fghij
klmno
pqrst
fguij
axcye
wvxyz
```

The IDs abcde and axcye are close, but they differ by two characters (the second and fourth). However, the IDs fghij and fguij differ by exactly one character, the third (h and u). Those must be the correct boxes.

What letters are common between the two correct box IDs? (In the example above, this is found by removing the differing character from either ID, producing fgij.)

In [9]:
len(puzzle_input)

250

With 250 IDs there are only 250 * 249 / 2 = 31,125 pairs to compare, so a brute-force comparison of every pair is fast enough and no heuristic is needed.
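The number of unordered pairs can be checked directly. The sketch below assumes Python 3.8 or later for `math.comb`; the variable names are illustrative.

```python
import math

n_boxes = 250  # len(puzzle_input) from the cell above

# Number of unordered pairs: n * (n - 1) / 2
pairs = math.comb(n_boxes, 2)
print(pairs)  # 31125
```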

In [10]:
def difference(left, right):
    return sum(1 for l, r in zip(left, right) if l != r)

In [11]:
assert difference('abc', 'abd') == 1
assert difference('abc', 'abc') == 0

In [12]:
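def find_partial_match(boxes):
    """Return the first pair of IDs that differ in exactly one position, or None."""
    for num, test_box in enumerate(boxes):
        # Compare only against later IDs so each pair is checked once.
        for other in boxes[num + 1:]:
            if difference(test_box, other) == 1:
                return test_box, other
    return None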

In [13]:
find_partial_match(puzzle_input)

('cvfqlbidheyujgtrswxmckqnap', 'cvzqlbidheyujgtrswxmckqnap')
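The puzzle asks for the letters common to the two IDs rather than the pair itself. One way to get them, sketched here with a hypothetical helper `common_letters` that is not part of the original notebook, is to keep only the characters that match position by position:

```python
def common_letters(left, right):
    """Keep only the characters that are equal at the same position."""
    return "".join(l for l, r in zip(left, right) if l == r)

# Example from the puzzle text: fghij and fguij share fgij.
assert common_letters("fghij", "fguij") == "fgij"

# The pair found above differs only in its third character (f vs z).
answer = common_letters(
    "cvfqlbidheyujgtrswxmckqnap",
    "cvzqlbidheyujgtrswxmckqnap",
)
print(answer)  # cvqlbidheyujgtrswxmckqnap
```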