# Exponential growth in polymers

If we were to do as the puzzle suggests and actually build the string up, we'd run out of memory quite fast. A starting template of length $k$ has $k - 1$ pairs and every round doubles the number of pairs, so the output length after $n$ steps is $(k - 1) 2^n + 1$. At one byte per element, you'd need about $k$ kilobytes after 10 steps, $k$ megabytes after 20 steps, and $k$ gigabytes after 30 steps. The puzzle input gives us a template of 20 characters, so 30 steps would need about 20 GB, which would _just_ fit into my laptop memory, but not comfortably!
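To get a feel for the numbers, here is a minimal sketch of that length formula. The helper name `polymer_length` is made up for illustration, and it assumes every pair has an insertion rule, so that every pair really does split into two each round.

```python
def polymer_length(template_length: int, steps: int) -> int:
    """Length of the polymer after `steps` rounds: (k - 1) * 2^n + 1."""
    pairs = template_length - 1  # a template of k elements has k - 1 pairs
    return pairs * 2**steps + 1  # every round doubles the number of pairs


# Growth for a 20-character template, as in the puzzle input.
for steps in (10, 20, 30):
    print(f"{steps} steps: {polymer_length(20, steps):,} elements")
```

For the 4-character test template this gives 3073 elements after 10 steps, which matches the length quoted in the puzzle description.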

So we need an alternative approach. The order of the polymer pairs doesn't actually matter here, so we can just _count pairs_, and map each pair to two new pairs. For example, the first pair in the template, `NN`, maps to two new pairs, `NC` and `CN`. For every *pair -> inserted element* we can generate a *pair -> (pair1, pair2)* mapping instead, and then count how many pairs there are at each step.

E.g. the input template `NNCB` should really be interpreted as the pairs `NN`, `NC`, and `CB`, which then map to `NC`, `CN`, `NB`, `BC`, `CH` and `HB`, all still unique pairs. The next step then gives us:

| pair | count |
| ---- | ----: |
| `NB` | 2 |
| `BC` | 2 |
| `CC` | 1 |
| `CN` | 1 |
| `BB` | 2 |
| `CB` | 2 |
| `BH` | 1 |
| `HC` | 1 |

We only need to keep at most 16 counts this way, one per rule in the test input (the number of rules may differ for the puzzle input).
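The pair table for step two can be cross-checked by brute force. The sketch below builds the string naively, which is only viable for a handful of steps, and then counts adjacent pairs. The `expand` helper is a throwaway name for this check, and `rules` holds only the eight test-input rules that the first two steps actually use.

```python
from collections import Counter

# Subset of the test rules; enough for two steps starting from NNCB.
rules = {
    "NN": "C", "NC": "B", "CB": "H", "CN": "C",
    "NB": "B", "BC": "B", "CH": "B", "HB": "C",
}


def expand(polymer: str, rules: dict[str, str]) -> str:
    """Naively insert the rule's element between every adjacent pair."""
    out = [polymer[0]]
    for left, right in zip(polymer, polymer[1:]):
        out.append(rules[left + right])
        out.append(right)
    return "".join(out)


polymer = "NNCB"
for _ in range(2):
    polymer = expand(polymer, rules)

# Count the overlapping pairs in the expanded string.
pairs = Counter(a + b for a, b in zip(polymer, polymer[1:]))
for pair, count in pairs.items():
    print(pair, count)
```

The 13-element string after two steps has 12 overlapping pairs, spread over 8 distinct pair types.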

The final step is to turn the pair counts into counts for individual elements. We have to _halve_ these counts, because every element is shared by two overlapping pairs and so gets counted twice. The exceptions are the first and last element of the original template, which are each part of only one pair; e.g. for the test input template `NNCB`, `N` and `B` start and end the template, and those two elements stay in place throughout the growing polymer. If you count the elements from the pair counts, after step one we have:

| elem | count |
| ---- | ----: |
| `B` | 3 |
| `C` | 4 |
| `H` | 2 |
| `N` | 3 |

You can find the actual counts by subtracting 1 for the `N` and `B`, and then halving the counts, then adding back 1 for the ends; so the actual counts are `B` x 2, `C` x 2, `H` x 1 and `N` x 2.
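As a worked example of that correction, here is a small sketch using the six pairs from step one of the test input. The function name `element_counts` is invented for this illustration; it returns both the raw per-pair tallies and the corrected counts.

```python
from collections import Counter


def element_counts(pair_counts: Counter, start: str, end: str):
    """Return (raw tallies from pairs, corrected element counts)."""
    raw = Counter()
    for pair, count in pair_counts.items():
        for elem in pair:
            raw[elem] += count
    counts = {}
    for elem, tally in raw.items():
        # How many of the two template ends this element occupies (0, 1 or 2).
        ends = (elem == start) + (elem == end)
        # Take the ends out, halve the double-counted rest, add the ends back.
        counts[elem] = (tally - ends) // 2 + ends
    return raw, counts


step1_pairs = Counter({"NC": 1, "CN": 1, "NB": 1, "BC": 1, "CH": 1, "HB": 1})
raw, counts = element_counts(step1_pairs, start="N", end="B")
print(dict(raw))
print(counts)
```

Counting the ends separately for `start` and `end` also keeps the result correct for a template that begins and ends with the same element.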

In [1]:
from __future__ import annotations
from collections import Counter
from dataclasses import dataclass, replace
from functools import cached_property, reduce
from itertools import chain, islice
from typing import Iterator, NamedTuple


Rules = dict[str, tuple[str, str]]


class Extremes(NamedTuple):
    min: int
    max: int


@dataclass(frozen=True)
class Polymerization:
    chain: Counter[str]
    start: str
    end: str
    rules: Rules

    @classmethod
    def from_instructions(cls, instructions: str) -> Polymerization:
        templ, rulelines = instructions.split("\n\n")
        template = Counter(f"{l1}{l2}" for l1, l2 in zip(templ, templ[1:]))
        rule_pairs = (line.split(" -> ") for line in rulelines.splitlines())
        rules = {
            pair: (f"{pair[0]}{target}", f"{target}{pair[1]}")
            for pair, target in rule_pairs
        }
        return cls(template, templ[0], templ[-1], rules)

    def __len__(self) -> int:
        return self.chain.total() + 1

    def __iter__(self) -> Iterator[Polymerization]:
        step, rules = self, self.rules
        while True:
            newchain = Counter()
            for pair, count in step.chain.items():
                for newpair in rules[pair]:
                    newchain[newpair] += count
            yield (step := replace(step, chain=newchain))

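    @cached_property
    def extremes(self) -> Extremes:
        # Sum each element over all pairs; interior elements get counted twice.
        elems = Counter()
        for pair, count in self.chain.items():
            for elem in pair:
                elems[elem] += count
        # The template's first and last elements are only counted once: take
        # them out, halve, then add them back (twice if start == end).
        elems.subtract((self.start, self.end))
        counts = [
            v // 2 + (k == self.start) + (k == self.end) for k, v in elems.items()
        ]
        return Extremes(min(counts), max(counts))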


test_instructions = """\
NNCB

CH -> B
HH -> N
CB -> H
NH -> C
HB -> C
HC -> B
HN -> C
NN -> C
BH -> H
NC -> B
NB -> B
BN -> B
BB -> N
BC -> B
CC -> N
CN -> C
"""

test_reaction = Polymerization.from_instructions(test_instructions)
test_step10 = next(islice(test_reaction, 9, None))
assert test_step10.extremes == (161, 1749)


In [2]:
import aocd
reaction = Polymerization.from_instructions(aocd.get_data(day=14, year=2021))
step10 = next(islice(reaction, 9, None))
print("Part 1:", step10.extremes.max - step10.extremes.min)

Part 1: 3259


# Part 2: run it until you would run out of memory

As I suspected, part 2 is to run the reaction 40 times, which would require about 20 TB for the puzzle input ($(k - 1) 2^{40} + 1 = 20890720927745$ elements). Good thing we are only tracking pair counts!

In [3]:
test_step40 = next(islice(test_step10, 29, None))
assert test_step40.extremes == (3849876073, 2192039569602)

In [4]:
step40 = next(islice(step10, 29, None))
print("Part 2:", step40.extremes.max - step40.extremes.min)

Part 2: 3459174981021
