In [23]:
%load_ext autoreload
%autoreload 2
from typing import Dict, Set

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [2]:
from utils import load_input

# Part 1

As you're walking to yet another connecting flight, you realize that one of the legs of your re-routed trip coming up is on a high-speed train. However, the train ticket you were given is in a language you don't understand. You should probably figure out what it says before you get to the train station after the next flight.

Unfortunately, you can't actually read the words on the ticket. You can, however, read the numbers, and so you figure out the fields these tickets must have and the valid ranges for values in those fields.

You collect the rules for ticket fields, the numbers on your ticket, and the numbers on other nearby tickets for the same train service (via the airport security cameras) together into a single document you can reference (your puzzle input).

The rules for ticket fields specify a list of fields that exist somewhere on the ticket and the valid ranges of values for each field. For example, a rule like class: 1-3 or 5-7 means that one of the fields in every ticket is named class and can be any value in the ranges 1-3 or 5-7 (inclusive, such that 3 and 5 are both valid in this field, but 4 is not).
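As a minimal sketch of how one such rule could be checked, the two inclusive ranges can be pulled out with a regular expression and compared directly. The helper name `in_rule` is an illustrative assumption and is not part of the puzzle or of the solution further down.

```python
import re

def in_rule(rule: str, value: int) -> bool:
    """Return True if value falls in either inclusive range of a rule line."""
    # Hypothetical helper: assumes the rule has the form "name: a-b or c-d".
    lo1, hi1, lo2, hi2 = (int(n) for n in re.findall(r"\d+", rule))
    return lo1 <= value <= hi1 or lo2 <= value <= hi2

print([v for v in range(1, 8) if in_rule("class: 1-3 or 5-7", v)])
# prints [1, 2, 3, 5, 6, 7]
```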

Each ticket is represented by a single line of comma-separated values. The values are the numbers on the ticket in the order they appear; every ticket has the same format. For example, consider this ticket:
```
.--------------------------------------------------------.
| ????: 101    ?????: 102   ??????????: 103     ???: 104 |
|                                                        |
| ??: 301  ??: 302             ???????: 303      ??????? |
| ??: 401  ??: 402           ???? ????: 403    ????????? |
'--------------------------------------------------------'
```
Here, ? represents text in a language you don't understand. This ticket might be represented as 101,102,103,104,301,302,303,401,402,403; of course, the actual train tickets you're looking at are much more complicated. In any case, you've extracted just the numbers in such a way that the first number is always the same specific field, the second number is always a different specific field, and so on - you just don't know what each position actually means!
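For illustration, a ticket line in this format can be turned into a list of integers with a single split. This is only a sketch; `parse_ticket` is a hypothetical name chosen here.

```python
from typing import List

def parse_ticket(line: str) -> List[int]:
    """Split one comma-separated ticket line into integer field values."""
    return [int(n) for n in line.split(",")]

ticket = parse_ticket("101,102,103,104,301,302,303,401,402,403")
print(len(ticket), ticket[0], ticket[-1])
# prints 10 101 403
```

Position `i` in the resulting list always refers to the same (still unknown) field on every ticket.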

Start by determining which tickets are completely invalid; these are tickets that contain values which aren't valid for any field. Ignore your ticket for now.

For example, suppose you have the following notes:
```
class: 1-3 or 5-7
row: 6-11 or 33-44
seat: 13-40 or 45-50

your ticket:
7,1,14

nearby tickets:
7,3,47
40,4,50
55,2,20
38,6,12
```
It doesn't matter which position corresponds to which field; you can identify invalid nearby tickets by considering only whether tickets contain values that are not valid for any field. In this example, the values on the first nearby ticket are all valid for at least one field. This is not true of the other three nearby tickets: the values 4, 55, and 12 are not valid for any field. Adding together all of the invalid values produces your ticket scanning error rate: 4 + 55 + 12 = 71.
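The worked example can be reproduced with a short, self-contained sketch. It assumes the notes are held in a string with blank lines between the three sections; the `example_*` names are introduced here only for illustration.

```python
import re

example_notes = """class: 1-3 or 5-7
row: 6-11 or 33-44
seat: 13-40 or 45-50

your ticket:
7,1,14

nearby tickets:
7,3,47
40,4,50
55,2,20
38,6,12"""

example_rules, _, example_nearby = example_notes.split("\n\n")

# Collect every value that is valid for at least one field.
example_valid = set()
for line in example_rules.splitlines():
    a, b, c, d = (int(n) for n in re.findall(r"\d+", line))
    example_valid |= set(range(a, b + 1)) | set(range(c, d + 1))

# Sum every nearby-ticket value that falls outside that set.
example_error_rate = sum(
    int(n)
    for line in example_nearby.splitlines()[1:]  # skip the "nearby tickets:" header
    for n in line.split(",")
    if int(n) not in example_valid
)
print(example_error_rate)  # prints 71
```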

Consider the validity of the nearby tickets you scanned. What is your ticket scanning error rate?



In [3]:
raw_data = load_input(16, splitlines=False)

In [6]:
raw_rules, raw_my_ticket, raw_nearby_tickets = raw_data.split("\n\n")

In [5]:
raw_rules

'departure location: 30-260 or 284-950\ndeparture station: 29-856 or 863-974\ndeparture platform: 32-600 or 611-967\ndeparture track: 44-452 or 473-965\ndeparture date: 36-115 or 129-950\ndeparture time: 50-766 or 776-972\narrival location: 40-90 or 104-961\narrival station: 40-864 or 887-971\narrival platform: 32-920 or 932-964\narrival track: 45-416 or 427-959\nclass: 47-536 or 557-964\nduration: 33-229 or 246-969\nprice: 25-147 or 172-969\nroute: 32-328 or 349-970\nrow: 50-692 or 709-964\nseat: 49-292 or 307-964\ntrain: 28-726 or 748-954\ntype: 37-430 or 438-950\nwagon: 46-628 or 638-973\nzone: 39-786 or 807-969'

In [33]:
import re

def parse_rules(raw_rules: str) -> Dict[str, Set[int]]:
    """Map each field label to the set of integers its two ranges allow."""
    out = {}
    for rule in raw_rules.splitlines():
        label, ranges = rule.split(": ")
        # Each rule has the form "a-b or c-d"; both ranges are inclusive
        bounds = [int(i) for i in re.findall(r'\d+', ranges)]
        out[label] = set(range(bounds[0], bounds[1]+1)) | set(range(bounds[2], bounds[3]+1))
    return out

rules = parse_rules(raw_rules)

In [13]:
my_ticket = [int(i) for i in raw_my_ticket.splitlines()[1].split(",")]

In [22]:
nearby_tickets = [[int(i) for i in l.split(",")] for l in raw_nearby_tickets.splitlines()[1:]]

In [35]:
valid_numbers = set().union(*rules.values())

In [97]:
sum(n for ticket in nearby_tickets for n in ticket if n not in valid_numbers)

32835

# Part 2

Now that you've identified which tickets contain invalid values, discard those tickets entirely. Use the remaining valid tickets to determine which field is which.

Using the valid ranges for each field, determine what order the fields appear on the tickets. The order is consistent between all tickets: if seat is the third field, it is the third field on every ticket, including your ticket.

For example, suppose you have the following notes:
```
class: 0-1 or 4-19
row: 0-5 or 8-19
seat: 0-13 or 16-19

your ticket:
11,12,13

nearby tickets:
3,9,18
15,1,5
5,14,9
```

Based on the nearby tickets in the above example, the first position must be row, the second position must be class, and the third position must be seat; you can conclude that in your ticket, class is 12, row is 11, and seat is 13.
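The deduction in this example can be sketched as follows: find the candidate positions for each field, then repeatedly fix whichever field has exactly one candidate left. This is one possible approach under the assumption that such a field always exists at each step; the `p2_*` names and the `allowed` helper are hypothetical.

```python
import re

p2_rules = {
    "class": "0-1 or 4-19",
    "row": "0-5 or 8-19",
    "seat": "0-13 or 16-19",
}
p2_nearby = [[3, 9, 18], [15, 1, 5], [5, 14, 9]]
p2_mine = [11, 12, 13]

def allowed(spec: str) -> set:
    """Expand "a-b or c-d" into the set of values it permits (inclusive)."""
    a, b, c, d = (int(n) for n in re.findall(r"\d+", spec))
    return set(range(a, b + 1)) | set(range(c, d + 1))

# Transpose so each tuple holds one position across all tickets.
p2_columns = list(zip(*p2_nearby))

# For each field, record the positions whose values all satisfy its rule.
p2_candidates = {
    name: {i for i, col in enumerate(p2_columns)
           if all(v in allowed(spec) for v in col)}
    for name, spec in p2_rules.items()
}

# Fix the field with a single candidate, then remove that position elsewhere.
# next() raises StopIteration if no field is uniquely determined.
p2_assignment = {}
while p2_candidates:
    name = next(n for n, idxs in p2_candidates.items() if len(idxs) == 1)
    (pos,) = p2_candidates.pop(name)
    p2_assignment[name] = pos
    for idxs in p2_candidates.values():
        idxs.discard(pos)

print({name: p2_mine[pos] for name, pos in sorted(p2_assignment.items())})
# prints {'class': 12, 'row': 11, 'seat': 13}
```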

Once you work out which field is which, look for the six fields on your ticket that start with the word departure. What do you get if you multiply those six values together?



In [98]:
valid_tickets = list(filter(lambda t: all(n in valid_numbers for n in t), nearby_tickets))

In [99]:
len(valid_tickets)

190

In [140]:
# Transpose so each tuple holds one field position across all valid tickets
ticket_cols = list(zip(*valid_tickets))

In [141]:
from collections import defaultdict
label_to_possible_idx = defaultdict(set)
for label, rule in rules.items():
    for idx, col in enumerate(ticket_cols):
        if all(v in rule for v in col):
            label_to_possible_idx[label].add(idx)


In [142]:
len(set([len(v) for v in label_to_possible_idx.values()]))

20

In [143]:
label_to_idx = {}
# Repeatedly fix the label that has exactly one candidate position,
# then remove that position from every remaining label's candidates.
while label_to_possible_idx:
    len_to_label = {len(v): k for k, v in label_to_possible_idx.items()}
    remove_label = len_to_label[1]  # KeyError if no label is uniquely determined
    idx = label_to_possible_idx.pop(remove_label)
    label_to_idx[remove_label] = next(iter(idx))
    for k in label_to_possible_idx:
        label_to_possible_idx[k] = label_to_possible_idx[k] - idx

In [144]:
label_to_idx

{'price': 19,
 'zone': 8,
 'wagon': 0,
 'arrival platform': 5,
 'row': 17,
 'arrival station': 18,
 'departure location': 15,
 'departure date': 14,
 'departure station': 2,
 'departure track': 13,
 'departure time': 1,
 'departure platform': 6,
 'type': 16,
 'route': 3,
 'seat': 4,
 'duration': 10,
 'train': 11,
 'arrival track': 7,
 'arrival location': 9,
 'class': 12}

In [146]:
out = 1
for k, idx in label_to_idx.items():
    # The puzzle asks for fields whose names start with "departure"
    if k.startswith("departure"):
        out *= my_ticket[idx]
out


514662805187