# Advent of Code Parsers examples: 2021

This notebook goes through the inputs of the [Advent of Code 2021](https://adventofcode.com/2021/) problems and tries to apply `aocp` to parse each of them.

In [1]:
from pathlib import Path
from pprint import pprint
from dataclasses import dataclass
from collections import namedtuple

from aocp import (
    IntParser, 
    ListParser, 
    IntListParser, 
    TupleParser, 
    BoolParser, 
    SortTransform, 
    SetParser,
)

from aocd.models import Puzzle


def get_raw_data(day: int) -> str:
    puzzle = Puzzle(year=2021, day=day)
    return puzzle.input_data

def print_head(string: str, limit: int = 10):
    for line in string.splitlines()[:limit]:
        print(line)

def print_start(string: str, limit: int = 50):
    print(string[:limit])


## [Day 1](https://adventofcode.com/2021/day/1)

In [2]:
raw_data = get_raw_data(1)
print_head(raw_data)

141
140
160
161
162
172
178
185
184
186


The input is just a series of lines with integers, so we use a `ListParser` with a nested `int`.

Splitters are deduced, so no need to specify them!

In [3]:
parser = ListParser(int)
pprint(parser.parse(raw_data)[:10])

[141, 140, 160, 161, 162, 172, 178, 185, 184, 186]


Note that we could also use `IntListParser` if we wished, which is less flexible but faster.

In [4]:
parser = IntListParser()
pprint(parser.parse(raw_data)[:10])

[141, 140, 160, 161, 162, 172, 178, 185, 184, 186]
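
For comparison, here is a minimal plain-Python sketch that produces the same result, assuming one integer per line. It is not how `aocp` works internally; `parse_int_lines` and `sample` are hypothetical names used only for illustration.

```python
def parse_int_lines(text: str) -> list[int]:
    """Convert every non-empty line of ``text`` to an int."""
    return [int(line) for line in text.splitlines() if line.strip()]


# First lines of the input shown above.
sample = "141\n140\n160\n161\n162"
depths = parse_int_lines(sample)
print(depths)
```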


## [Day 2](https://adventofcode.com/2021/day/2)

In [5]:
raw_data = get_raw_data(2)
print_head(raw_data)

forward 3
down 9
forward 5
up 1
forward 2
down 1
down 7
down 5
up 6
forward 3


Here, we parse for a list of tuples with two elements: a string and an integer.

Splitters are deduced, so no need to specify them!

In [6]:
parser = ListParser(TupleParser((str, int)))
pprint(parser.parse(raw_data)[:10])

[('forward', 3),
 ('down', 9),
 ('forward', 5),
 ('up', 1),
 ('forward', 2),
 ('down', 1),
 ('down', 7),
 ('down', 5),
 ('up', 6),
 ('forward', 3)]
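
As a point of reference, the same `(str, int)` pairs can be built by hand. The sketch below assumes each line holds exactly one word and one integer separated by whitespace; `parse_command` is a hypothetical helper, not part of `aocp`.

```python
def parse_command(line: str) -> tuple[str, int]:
    """Split a line such as 'forward 3' into ('forward', 3)."""
    direction, amount = line.split()
    return direction, int(amount)


# First lines of the input shown above.
sample = "forward 3\ndown 9\nforward 5\nup 1"
commands = [parse_command(line) for line in sample.splitlines()]
print(commands)
```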


## [Day 3](https://adventofcode.com/2021/day/3)

In [7]:
raw_data = get_raw_data(3)
print_head(raw_data)

000000011010
011001111011
100101011101
000110000110
101010001010
010010000011
011001111001
100111000000
011101011010
000000110001


This example depends more on what you want to do with each line of input.

For instance, we can parse for a list of lists of integers.

In [8]:
parser = ListParser(ListParser(int))
pprint(parser.parse(raw_data)[:10])

[[0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0],
 [0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1],
 [1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1],
 [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0],
 [1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0],
 [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1],
 [0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1],
 [1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
 [0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
 [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1]]


Alternatively, we could parse a list of binary integers (although this is not very practical for this particular problem).

In [9]:
parser = ListParser(IntParser(base=2))
pprint(parser.parse(raw_data)[:10])

[26, 1659, 2397, 390, 2698, 1155, 1657, 2496, 1882, 49]
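
The values above are what Python's built-in `int` returns when given an explicit base of 2. A short sketch using the first three input lines (the variable names are only illustrative):

```python
# First three lines of the input shown above.
sample = "000000011010\n011001111011\n100101011101"

# int(text, 2) reads the text as a base-2 number; leading zeros are allowed.
values = [int(line, 2) for line in sample.splitlines()]
print(values)
```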


We can also parse booleans instead of integers.

Note that for this we need to use `BoolParser`, since Python's built-in `bool` will not interpret the string input properly.

In [10]:
parser = ListParser(ListParser(BoolParser()))
pprint(parser.parse(raw_data)[:10], width=120)

[[False, False, False, False, False, False, False, True, True, False, True, False],
 [False, True, True, False, False, True, True, True, True, False, True, True],
 [True, False, False, True, False, True, False, True, True, True, False, True],
 [False, False, False, True, True, False, False, False, False, True, True, False],
 [True, False, True, False, True, False, False, False, True, False, True, False],
 [False, True, False, False, True, False, False, False, False, False, True, True],
 [False, True, True, False, False, True, True, True, True, False, False, True],
 [True, False, False, True, True, True, False, False, False, False, False, False],
 [False, True, True, True, False, True, False, True, True, False, True, False],
 [False, False, False, False, False, False, True, True, False, False, False, True]]
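
The reason the built-in `bool` is not suitable is that Python treats every non-empty string as truthy, including `"0"`. The sketch below shows the pitfall and one plain-Python workaround (comparing against `"1"`); it illustrates the behaviour only and says nothing about how `BoolParser` is implemented.

```python
# Any non-empty string is truthy, so "0" does not become False.
naive = [bool(char) for char in "0110"]

# Comparing each character against "1" gives the intended booleans.
bits = [char == "1" for char in "0110"]

print(naive)
print(bits)
```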


## [Day 4](https://adventofcode.com/2021/day/4)

In [11]:
raw_data = get_raw_data(4)
print_head(raw_data, 20)

46,79,77,45,57,34,44,13,32,88,86,82,91,97,89,1,48,31,18,10,55,74,24,11,80,78,28,37,47,17,21,61,26,85,99,96,23,70,3,54,5,41,50,63,14,64,42,36,95,52,76,68,29,9,98,35,84,83,71,49,73,58,56,66,92,30,51,20,81,69,65,15,6,16,39,43,67,7,59,40,60,4,90,72,22,0,93,94,38,53,87,27,12,2,25,19,8,62,33,75

84 94 24 52 44
96 33 74 35 13
60 51 41 19 95
50 93 27 40  1
67 23 37 88 85

12 85  6 97 77
79 28 24 70 51
71 72 78 55 73
11 36  5 98 19
30 67 89 95 62

54 38 70 29 51
16 19 80 96 63
76 23 10 30 24
45 81 97 82 90
60 94 28 11 83



This is a bit more interesting!

In this problem we have essentially two inputs:
 1. A sequence of comma-separated non-negative integers
 2. A series of non-negative integer matrices.
  * The matrices are separated by two newlines
  * The lines in each matrix are separated by single newlines
  * The integers in each line are separated by spaces (one or more)

Thankfully, AOCP can handle this without much trouble, while deducing the separators (we could also specify them if we needed to).

 * First, we use a `TupleParser` to separate the two inputs. Because we give it two subparsers, it knows we are looking for a 2-tuple and can deduce how to split the input accordingly.
 * For the first part, we simply parse for a list of integers.
 * For the second part, we can use a list of matrices, which are a list of lists of integers.

In [12]:
parser = TupleParser(
    (
        IntListParser(),
        ListParser(ListParser(IntListParser())),
    )
)
sequence, matrices = parser.parse(raw_data)
pprint(sequence[:10])
pprint(matrices[:5])

[46, 79, 77, 45, 57, 34, 44, 13, 32, 88]
[[[84, 94, 24, 52, 44],
  [96, 33, 74, 35, 13],
  [60, 51, 41, 19, 95],
  [50, 93, 27, 40, 1],
  [67, 23, 37, 88, 85]],
 [[12, 85, 6, 97, 77],
  [79, 28, 24, 70, 51],
  [71, 72, 78, 55, 73],
  [11, 36, 5, 98, 19],
  [30, 67, 89, 95, 62]],
 [[54, 38, 70, 29, 51],
  [16, 19, 80, 96, 63],
  [76, 23, 10, 30, 24],
  [45, 81, 97, 82, 90],
  [60, 94, 28, 11, 83]],
 [[50, 56, 42, 68, 48],
  [6, 70, 78, 22, 27],
  [75, 11, 63, 24, 47],
  [29, 99, 91, 73, 97],
  [7, 16, 28, 12, 44]],
 [[20, 62, 50, 36, 12],
  [3, 10, 40, 8, 56],
  [78, 61, 66, 37, 89],
  [72, 26, 19, 65, 22],
  [30, 91, 27, 5, 63]]]
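
To make the three levels of separators concrete, here is a plain-Python sketch of the same structure on a shortened, hand-made sample (two 2-row matrices instead of the real 5x5 boards). It only mirrors the shape of the result; it is not `aocp`'s splitter-deduction logic.

```python
# Shortened sample with the same layout as the real input.
sample = (
    "46,79,77\n"
    "\n"
    "84 94 24 52 44\n"
    "96 33 74 35 13\n"
    "\n"
    "12 85  6 97 77\n"
    "79 28 24 70 51\n"
)

# Level 1: blank lines separate the sequence from each matrix.
header, *blocks = sample.strip().split("\n\n")

# The sequence is comma-separated.
sequence = [int(number) for number in header.split(",")]

# Level 2: newlines separate rows. Level 3: runs of spaces separate numbers.
matrices = [
    [[int(number) for number in row.split()] for row in block.splitlines()]
    for block in blocks
]

print(sequence)
print(matrices)
```

Calling `str.split()` with no argument collapses runs of whitespace, which is what handles right-aligned single-digit entries such as ` 6`.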


## [Day 5](https://adventofcode.com/2021/day/5)

In [13]:
raw_data = get_raw_data(5)
print_head(raw_data)

60,28 -> 893,861
934,945 -> 222,233
125,246 -> 125,306
490,255 -> 490,847
457,868 -> 364,961
610,46 -> 610,826
338,711 -> 982,67
199,581 -> 295,581
578,489 -> 522,545
180,516 -> 180,904


This example depends on what you want to do with each line of input, so we will use it to explore a few different possibilities.

The quickest way is to just use `IntListParser`.

In [14]:
parser = ListParser(IntListParser())
pprint(parser.parse(raw_data)[:10])

[[60, 28, 893, 861],
 [934, 945, 222, 233],
 [125, 246, 125, 306],
 [490, 255, 490, 847],
 [457, 868, 364, 961],
 [610, 46, 610, 826],
 [338, 711, 982, 67],
 [199, 581, 295, 581],
 [578, 489, 522, 545],
 [180, 516, 180, 904]]


There are other ways to do this problem. For instance, we can get a list of 2-tuples with 2-tuples inside.

In [15]:
parser = ListParser(TupleParser(TupleParser(int)))
pprint(parser.parse(raw_data)[:10])

[((60, 28), (893, 861)),
 ((934, 945), (222, 233)),
 ((125, 246), (125, 306)),
 ((490, 255), (490, 847)),
 ((457, 868), (364, 961)),
 ((610, 46), (610, 826)),
 ((338, 711), (982, 67)),
 ((199, 581), (295, 581)),
 ((578, 489), (522, 545)),
 ((180, 516), (180, 904))]


However, you might find it more practical to get a 4-tuple for each line. In that case we need to specify that we want to use both `","` and `"->"` as splitters.

In [16]:
parser = ListParser(TupleParser(int, splitter=[",", "->"]))
pprint(parser.parse(raw_data)[:10])

[(60, 28, 893, 861),
 (934, 945, 222, 233),
 (125, 246, 125, 306),
 (490, 255, 490, 847),
 (457, 868, 364, 961),
 (610, 46, 610, 826),
 (338, 711, 982, 67),
 (199, 581, 295, 581),
 (578, 489, 522, 545),
 (180, 516, 180, 904)]
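
Splitting on several separators at once can be sketched in plain Python with `re.split`. This is an illustrative equivalent of the result above, not a description of what `TupleParser` does internally; `parse_vent_flat` is a hypothetical name.

```python
import re


def parse_vent_flat(line: str) -> tuple[int, ...]:
    """Split on either ',' or '->' and convert every piece to int."""
    # int() ignores the spaces left around the '->' separator.
    return tuple(int(piece) for piece in re.split(r",|->", line))


print(parse_vent_flat("60,28 -> 893,861"))
```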


If we want to spend the time, AOCP lets you go even fancier and use dataclasses!

In [17]:
@dataclass
class Point:
    x: int
    y: int


@dataclass
class Vent:
    start: Point
    end: Point

parser = ListParser(TupleParser(TupleParser(int, dataclass=Point), dataclass=Vent))

pprint(parser.parse(raw_data)[:10])

[Vent(start=Point(x=60, y=28), end=Point(x=893, y=861)),
 Vent(start=Point(x=934, y=945), end=Point(x=222, y=233)),
 Vent(start=Point(x=125, y=246), end=Point(x=125, y=306)),
 Vent(start=Point(x=490, y=255), end=Point(x=490, y=847)),
 Vent(start=Point(x=457, y=868), end=Point(x=364, y=961)),
 Vent(start=Point(x=610, y=46), end=Point(x=610, y=826)),
 Vent(start=Point(x=338, y=711), end=Point(x=982, y=67)),
 Vent(start=Point(x=199, y=581), end=Point(x=295, y=581)),
 Vent(start=Point(x=578, y=489), end=Point(x=522, y=545)),
 Vent(start=Point(x=180, y=516), end=Point(x=180, y=904))]
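
For readers who want to see what the nested construction amounts to, the sketch below builds the same objects by hand. It assumes the exact `x,y -> x,y` line format shown above, and `parse_vent` is a hypothetical helper rather than anything provided by `aocp`.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Vent:
    start: Point
    end: Point


def parse_vent(line: str) -> Vent:
    """Turn '60,28 -> 893,861' into Vent(Point(60, 28), Point(893, 861))."""
    start_text, end_text = line.split(" -> ")
    # Each half is 'x,y'; map(int, ...) converts both coordinates.
    start = Point(*map(int, start_text.split(",")))
    end = Point(*map(int, end_text.split(",")))
    return Vent(start, end)


print(parse_vent("60,28 -> 893,861"))
```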


It also works with namedtuples, which you might want to use for brevity in AoC solutions:

In [18]:
point = namedtuple("Point", "x y")
vent = namedtuple("Vent", "start end")

parser = ListParser(TupleParser(TupleParser(int, dataclass=point), dataclass=vent))

pprint(parser.parse(raw_data)[:10])

[Vent(start=Point(x=60, y=28), end=Point(x=893, y=861)),
 Vent(start=Point(x=934, y=945), end=Point(x=222, y=233)),
 Vent(start=Point(x=125, y=246), end=Point(x=125, y=306)),
 Vent(start=Point(x=490, y=255), end=Point(x=490, y=847)),
 Vent(start=Point(x=457, y=868), end=Point(x=364, y=961)),
 Vent(start=Point(x=610, y=46), end=Point(x=610, y=826)),
 Vent(start=Point(x=338, y=711), end=Point(x=982, y=67)),
 Vent(start=Point(x=199, y=581), end=Point(x=295, y=581)),
 Vent(start=Point(x=578, y=489), end=Point(x=522, y=545)),
 Vent(start=Point(x=180, y=516), end=Point(x=180, y=904))]


## [Day 6](https://adventofcode.com/2021/day/6)

In [19]:
raw_data = get_raw_data(6)
print(raw_data)

1,3,1,5,5,1,1,1,5,1,1,1,3,1,1,4,3,1,1,2,2,4,2,1,3,3,2,4,4,4,1,3,1,1,4,3,1,5,5,1,1,3,4,2,1,5,3,4,5,5,2,5,5,1,5,5,2,1,5,1,1,2,1,1,1,4,4,1,3,3,1,5,4,4,3,4,3,3,1,1,3,4,1,5,5,2,5,2,2,4,1,2,5,2,1,2,5,4,1,1,1,1,1,4,1,1,3,1,5,2,5,1,3,1,5,3,3,2,2,1,5,1,1,1,2,1,1,2,1,1,2,1,5,3,5,2,5,2,2,2,1,1,1,5,5,2,2,1,1,3,4,1,1,3,1,3,5,1,4,1,4,1,3,1,4,1,1,1,1,2,1,4,5,4,5,5,2,1,3,1,4,2,5,1,1,3,5,2,1,2,2,5,1,2,2,4,5,2,1,1,1,1,2,2,3,1,5,5,5,3,2,4,2,4,1,5,3,1,4,4,2,4,2,2,4,4,4,4,1,3,4,3,2,1,3,5,3,1,5,5,4,1,5,1,2,4,2,5,4,1,3,3,1,4,1,3,3,3,1,3,1,1,1,1,4,1,2,3,1,3,3,5,2,3,1,1,1,5,5,4,1,2,3,1,3,1,1,4,1,3,2,2,1,1,1,3,4,3,1,3


Not much to say here: the parsing is trivial in any case :)

In [20]:
parser = IntListParser()
pprint(parser.parse(raw_data)[:10])

[1, 3, 1, 5, 5, 1, 1, 1, 5, 1]


It also works with a plain `ListParser(int)`:

In [21]:
parser = ListParser(int)
pprint(parser.parse(raw_data)[:10])

[1, 3, 1, 5, 5, 1, 1, 1, 5, 1]


## [Day 7](https://adventofcode.com/2021/day/7)

In [22]:
raw_data = get_raw_data(7)
print_start(raw_data)

1101,1,29,67,1102,0,1,65,1008,65,35,66,1005,66,28,


Another trivial parse.

In [23]:
parser = IntListParser()
pprint(parser.parse(raw_data)[:10])

[1101, 1, 29, 67, 1102, 0, 1, 65, 1008, 65]


## [Day 8](https://adventofcode.com/2021/day/8)

In [24]:
raw_data = get_raw_data(8)
print_head(raw_data)

ecdbfag deacfb acdgb cdg acdbf gdfb efacdg gd cagdbf beacg | cdg dcebgaf gbdf bdacg
fadecg gdbecaf agbfd fgdcb gab ebagdf feabcg deab gdefa ab | adfbg ab fcgdbae bfgecda
cgebad edfagcb fg fedg ebfca gcefb fcedgb dbagcf cgf cdbeg | cfg acfbe bcgdafe dgeafcb
bgcde cbefg gd dbeafc afbcgde bedgca gacd dbg cedba fbegda | agfcebd adgfbce dgb fgceb
gadcbe gcade debfac fdagce egdbf cfedg fbgcade gafc dcf fc | fcga eacfdg gfca fdcbea
dbgcae gdeaf cefga cfa dcbgfa cgfabe cefb cf ebfgcda acgbe | gebacdf fcgdeab bacdge cfeb
fgde efc dacgf cbdgfa fe abdcfe afdbecg gaefdc gcfae abceg | ef agdcbf bfdagec efdg
egfacd bfcdeg ac facbg acbe cfa fbgace fgecb gfdba fbdcgae | ac ca ca fgeacb
acefg dae dfbec abfcedg cfdea dgeafc ad dfag eacgbd bcagfe | dfag gfad dgfa beagfcd
gae cabfgd fbcag ecbgad gfec ge agbfe gdefbca bfeda cfbgea | geabf gecf efgc gafbc


Again, this depends on how you want your formatted input to be.

For a basic case, we might go for a 2-tuple of lists of strings for each line.

In [25]:
parser = ListParser(TupleParser(ListParser()))
pprint(parser.parse(raw_data)[:10], width=120)

[(['ecdbfag', 'deacfb', 'acdgb', 'cdg', 'acdbf', 'gdfb', 'efacdg', 'gd', 'cagdbf', 'beacg'],
  ['cdg', 'dcebgaf', 'gbdf', 'bdacg']),
 (['fadecg', 'gdbecaf', 'agbfd', 'fgdcb', 'gab', 'ebagdf', 'feabcg', 'deab', 'gdefa', 'ab'],
  ['adfbg', 'ab', 'fcgdbae', 'bfgecda']),
 (['cgebad', 'edfagcb', 'fg', 'fedg', 'ebfca', 'gcefb', 'fcedgb', 'dbagcf', 'cgf', 'cdbeg'],
  ['cfg', 'acfbe', 'bcgdafe', 'dgeafcb']),
 (['bgcde', 'cbefg', 'gd', 'dbeafc', 'afbcgde', 'bedgca', 'gacd', 'dbg', 'cedba', 'fbegda'],
  ['agfcebd', 'adgfbce', 'dgb', 'fgceb']),
 (['gadcbe', 'gcade', 'debfac', 'fdagce', 'egdbf', 'cfedg', 'fbgcade', 'gafc', 'dcf', 'fc'],
  ['fcga', 'eacfdg', 'gfca', 'fdcbea']),
 (['dbgcae', 'gdeaf', 'cefga', 'cfa', 'dcbgfa', 'cgfabe', 'cefb', 'cf', 'ebfgcda', 'acgbe'],
  ['gebacdf', 'fcgdeab', 'bacdge', 'cfeb']),
 (['fgde', 'efc', 'dacgf', 'cbdgfa', 'fe', 'abdcfe', 'afdbecg', 'gaefdc', 'gcfae', 'abceg'],
  ['ef', 'agdcbf', 'bfdagec', 'efdg']),
 (['egfacd', 'bfcdeg', 'ac', 'facbg', 'acbe', 'cfa', 'f
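
The 2-tuple-of-lists shape can also be sketched by hand, assuming every line contains exactly one `|` with space-separated words on each side. `parse_entry` is a hypothetical helper used only for illustration.

```python
def parse_entry(line: str) -> tuple[list[str], list[str]]:
    """Split a line into (signal patterns, output values)."""
    patterns_text, outputs_text = line.split(" | ")
    return patterns_text.split(), outputs_text.split()


# First line of the input shown above.
line = (
    "ecdbfag deacfb acdgb cdg acdbf gdfb efacdg gd cagdbf beacg"
    " | cdg dcebgaf gbdf bdacg"
)
patterns, outputs = parse_entry(line)
print(patterns)
print(outputs)
```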

We might also want to sort the strings in each list alphabetically, which is useful for this problem. For that, we can use the `SortTransform` applied to the final elements.

In [26]:
parser = ListParser(TupleParser(ListParser(SortTransform())))
pprint(parser.parse(raw_data)[:10], width=120)

[(['abcdefg', 'abcdef', 'abcdg', 'cdg', 'abcdf', 'bdfg', 'acdefg', 'dg', 'abcdfg', 'abceg'],
  ['cdg', 'abcdefg', 'bdfg', 'abcdg']),
 (['acdefg', 'abcdefg', 'abdfg', 'bcdfg', 'abg', 'abdefg', 'abcefg', 'abde', 'adefg', 'ab'],
  ['abdfg', 'ab', 'abcdefg', 'abcdefg']),
 (['abcdeg', 'abcdefg', 'fg', 'defg', 'abcef', 'bcefg', 'bcdefg', 'abcdfg', 'cfg', 'bcdeg'],
  ['cfg', 'abcef', 'abcdefg', 'abcdefg']),
 (['bcdeg', 'bcefg', 'dg', 'abcdef', 'abcdefg', 'abcdeg', 'acdg', 'bdg', 'abcde', 'abdefg'],
  ['abcdefg', 'abcdefg', 'bdg', 'bcefg']),
 (['abcdeg', 'acdeg', 'abcdef', 'acdefg', 'bdefg', 'cdefg', 'abcdefg', 'acfg', 'cdf', 'cf'],
  ['acfg', 'acdefg', 'acfg', 'abcdef']),
 (['abcdeg', 'adefg', 'acefg', 'acf', 'abcdfg', 'abcefg', 'bcef', 'cf', 'abcdefg', 'abceg'],
  ['abcdefg', 'abcdefg', 'abcdeg', 'bcef']),
 (['defg', 'cef', 'acdfg', 'abcdfg', 'ef', 'abcdef', 'abcdefg', 'acdefg', 'acefg', 'abceg'],
  ['ef', 'abcdfg', 'abcdefg', 'defg']),
 (['acdefg', 'bcdefg', 'ac', 'abcfg', 'abce', 'acf', 'a
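
Sorting the letters of a word is a one-liner in plain Python, which helps show why the transform is useful: two scrambled spellings of the same segment set become equal strings. The sketch below is an illustration under that reading, not `SortTransform`'s source; `sort_letters` is a hypothetical name.

```python
def sort_letters(word: str) -> str:
    """Return the letters of ``word`` in alphabetical order."""
    return "".join(sorted(word))


# Output values from the first line of the input shown above.
outputs = ["cdg", "dcebgaf", "gbdf", "bdacg"]
normalized = [sort_letters(word) for word in outputs]
print(normalized)

# Different scrambles of the same letters now compare equal.
print(sort_letters("ecdbfag") == sort_letters("dcebgaf"))
```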

Alternatively, we could treat the strings as sets of characters, which is also useful for the problem.

In [27]:
parser = ListParser(TupleParser(ListParser(SetParser())))
pprint(parser.parse(raw_data)[:2], width=120)

[([{'b', 'd', 'a', 'e', 'f', 'c', 'g'},
   {'b', 'd', 'a', 'e', 'f', 'c'},
   {'b', 'd', 'a', 'c', 'g'},
   {'d', 'g', 'c'},
   {'b', 'd', 'a', 'f', 'c'},
   {'b', 'd', 'f', 'g'},
   {'d', 'a', 'e', 'f', 'c', 'g'},
   {'d', 'g'},
   {'b', 'd', 'a', 'f', 'c', 'g'},
   {'b', 'a', 'e', 'c', 'g'}],
  [{'d', 'g', 'c'}, {'b', 'd', 'a', 'e', 'f', 'c', 'g'}, {'b', 'd', 'f', 'g'}, {'b', 'd', 'a', 'c', 'g'}]),
 ([{'d', 'a', 'e', 'f', 'c', 'g'},
   {'b', 'd', 'a', 'e', 'f', 'c', 'g'},
   {'b', 'd', 'a', 'f', 'g'},
   {'b', 'd', 'f', 'c', 'g'},
   {'b', 'a', 'g'},
   {'b', 'd', 'a', 'e', 'f', 'g'},
   {'b', 'a', 'e', 'f', 'c', 'g'},
   {'b', 'd', 'a', 'e'},
   {'d', 'a', 'e', 'f', 'g'},
   {'b', 'a'}],
  [{'b', 'd', 'a', 'f', 'g'}, {'b', 'a'}, {'b', 'd', 'a', 'e', 'f', 'c', 'g'}, {'b', 'd', 'a', 'e', 'f', 'c', 'g'}])]
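
Sets are convenient here because they ignore letter order and support subset tests directly. A short plain-Python sketch using words from the first input line (the variable names are only illustrative):

```python
# Two scrambles of the same seven letters, taken from the first input line.
full_a = set("ecdbfag")
full_b = set("dcebgaf")

# A two-letter and a three-letter pattern from the same line.
two_letters = set("gd")
three_letters = set("cdg")

# Order does not matter for equality.
same_letters = full_a == full_b

# Subset and difference operations work directly on the parsed values.
is_contained = two_letters <= three_letters
extra_letter = three_letters - two_letters

print(same_letters, is_contained, extra_letter)
```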
