# Day 21 - set operations

* https://adventofcode.com/2020/day/21

We are asked to figure out which ingredients contain allergens. We can solve this using [set operations](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset): for each allergen name, intersect the ingredient sets of all the foods that list that allergen. Only an ingredient present in every one of those foods can contain it.
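As a minimal sketch of that intersection step, here are the two foods from the puzzle's example that list `dairy`. The variable names are purely illustrative and not part of the solution below.

```python
# Ingredient sets of the two example foods that list "dairy".
food_a = {"mxmxvkd", "kfcds", "sqjhc", "nhms"}
food_b = {"trh", "fvjkl", "sbzzf", "mxmxvkd"}

# Only an ingredient present in every such food can carry the allergen.
dairy_candidates = food_a & food_b
print(dairy_candidates)  # {'mxmxvkd'}
```

With a single candidate left, `dairy` is already resolved; other allergens may need further elimination.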

We also need to count how often the ingredients that can't contain any allergen appear across all the foods. A `Counter` can take care of that.
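A rough sketch of the counting idea, again on the puzzle's example data. The three allergen-carrying ingredients are taken from the puzzle's worked example, and `appearances` and `safe_total` are illustrative names.

```python
from collections import Counter

# Tally every appearance of every ingredient across the example foods.
appearances = Counter()
for ingredients in (
    {"mxmxvkd", "kfcds", "sqjhc", "nhms"},
    {"trh", "fvjkl", "sbzzf", "mxmxvkd"},
    {"sqjhc", "fvjkl"},
    {"sqjhc", "mxmxvkd", "sbzzf"},
):
    appearances.update(ingredients)

# Remove the ingredients known to carry an allergen in the example.
for ingredient in ("mxmxvkd", "sqjhc", "fvjkl"):
    del appearances[ingredient]

# What remains are the appearances of allergen-free ingredients.
safe_total = sum(appearances.values())
print(safe_total)  # 5
```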

In [1]:
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet

@dataclass
class Food:
    ingredients: FrozenSet[str]
    allergens: FrozenSet[str]
    
    def __str__(self):
        return (
            f"{' '.join(sorted(self.ingredients))} "
            f"(contains {', '.join(sorted(self.allergens))})"
        )
    
    @classmethod
    def from_line(cls, line):
        ingr, _, allergens = line.partition(" (contains ")
        return cls(frozenset(ingr.split()), frozenset(allergens.rstrip(")").split(", ")))


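def identify_allergens(foods):
    # For each allergen, intersect the ingredient sets of every food listing it
    candidates = {}
    # Count how often each ingredient appears across all foods
    ingredients = Counter()
    for f in foods:
        ingredients.update(f.ingredients)
        for allergen in f.allergens:
            candidates.setdefault(allergen, f.ingredients)
            candidates[allergen] &= f.ingredients
    
    allergens = {}
    while candidates:
        # The smallest candidate set must hold exactly one ingredient;
        # the tuple unpacking raises ValueError otherwise
        allergen, (ingr,) = min(candidates.items(), key=lambda ac: len(ac[1]))
        allergens[allergen] = ingr
        # Drop the identified ingredient from the count and the other candidates
        del ingredients[ingr]
        del candidates[allergen]
        candidates = {a: ic - {ingr} for a, ic in candidates.items()}
    return allergens, sum(ingredients.values())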


test_foods = [Food.from_line(line) for line in """\
mxmxvkd kfcds sqjhc nhms (contains dairy, fish)
trh fvjkl sbzzf mxmxvkd (contains dairy)
sqjhc fvjkl (contains soy)
sqjhc mxmxvkd sbzzf (contains fish)
""".splitlines()]

assert identify_allergens(test_foods)[1] == 5

In [2]:
import aocd
foods = [Food.from_line(line) for line in aocd.get_data(day=21, year=2020).splitlines()]

In [3]:
allergens, ingr_count = identify_allergens(foods)
print("Part 1:", ingr_count)

Part 1: 2324


## Part 2

I inadvertently did most of the work for part 2 in part 1 already: working out which allergen goes with which ingredient, using the same process of elimination we used for the train ticket data on [day 16](./Day%2016.ipynb). All that is left is to list the ingredients in alphabetical order of their allergen names.
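The elimination and the final ordering can be sketched on their own as follows. `resolve` is a hypothetical helper, not part of the solution, and it assumes that at every step at least one allergen has exactly one remaining candidate. The candidate sets are the ones that follow from the puzzle's example foods.

```python
def resolve(candidates):
    """Assign one value to each key by repeated elimination."""
    remaining = {key: set(options) for key, options in candidates.items()}
    resolved = {}
    while remaining:
        # Pick the key with the fewest options; assumed to have exactly one.
        key = min(remaining, key=lambda k: len(remaining[k]))
        (value,) = remaining.pop(key)
        resolved[key] = value
        # That value is now taken, so no other key can use it.
        for options in remaining.values():
            options.discard(value)
    return resolved


example = resolve({
    "dairy": {"mxmxvkd"},
    "fish": {"mxmxvkd", "sqjhc"},
    "soy": {"sqjhc", "fvjkl"},
})

# Order the ingredients by allergen name, not by ingredient name.
canonical_example = ",".join(example[name] for name in sorted(example))
print(canonical_example)  # mxmxvkd,sqjhc,fvjkl
```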

In [4]:
canonical = [allergens[n] for n in sorted(allergens)]
print("Part 2:", ",".join(canonical))

Part 2: bxjvzk,hqgqj,sp,spl,hsksz,qzzzf,fmpgn,tpnnkc
