# Day 16

As you're walking to yet another connecting flight, you realize that one of the legs of your re-routed trip coming up is on a high-speed train. However, the train ticket you were given is in a language you don't understand. You should probably figure out what it says before you get to the train station after the next flight.

Unfortunately, you can't actually read the words on the ticket. You can, however, read the numbers, and so you figure out the fields these tickets must have and the valid ranges for values in those fields.

You collect the rules for ticket fields, the numbers on your ticket, and the numbers on other nearby tickets for the same train service (via the airport security cameras) together into a single document you can reference (your puzzle input).

The rules for ticket fields specify a list of fields that exist somewhere on the ticket and the valid ranges of values for each field. For example, a rule like class: 1-3 or 5-7 means that one of the fields in every ticket is named class and can be any value in the ranges 1-3 or 5-7 (inclusive, such that 3 and 5 are both valid in this field, but 4 is not).
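As a minimal sketch of how such a rule line could be read and applied, the snippet below parses the text into a name plus a list of inclusive ranges. The helper names `parse_rule` and `is_valid` are illustrative assumptions and are not part of the puzzle or of the solution further down.

```python
import re

def parse_rule(line):
    """Parse a line such as 'class: 1-3 or 5-7' into (name, [(lo, hi), ...])."""
    name, spec = line.split(": ")
    ranges = [(int(lo), int(hi)) for lo, hi in re.findall(r"(\d+)-(\d+)", spec)]
    return name, ranges

def is_valid(value, ranges):
    """True if value falls inside any of the inclusive ranges."""
    return any(lo <= value <= hi for lo, hi in ranges)

name, ranges = parse_rule("class: 1-3 or 5-7")
print(name, ranges)                               # class [(1, 3), (5, 7)]
print([is_valid(v, ranges) for v in (3, 4, 5)])   # [True, False, True]
```

Both endpoints of a range are included, which is why 3 and 5 pass while 4 does not.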

Each ticket is represented by a single line of comma-separated values. The values are the numbers on the ticket in the order they appear; every ticket has the same format. For example, consider this ticket:

- .--------------------------------------------------------.
- | ????: 101    ?????: 102   ??????????: 103     ???: 104 |
- |                                                        |
- | ??: 301  ??: 302             ???????: 303      ??????? |
- | ??: 401  ??: 402           ???? ????: 403    ????????? |
- '--------------------------------------------------------'

Here, ? represents text in a language you don't understand. This ticket might be represented as 101,102,103,104,301,302,303,401,402,403; of course, the actual train tickets you're looking at are much more complicated. In any case, you've extracted just the numbers in such a way that the first number is always the same specific field, the second number is always a different specific field, and so on - you just don't know what each position actually means!

Start by determining which tickets are completely invalid; these are tickets that contain values which aren't valid for any field. Ignore your ticket for now.

For example, suppose you have the following notes:

- class: 1-3 or 5-7
- row: 6-11 or 33-44
- seat: 13-40 or 45-50

your ticket:
- 7,1,14

nearby tickets:
- 7,3,47
- 40,4,50
- 55,2,20
- 38,6,12

It doesn't matter which position corresponds to which field; you can identify invalid nearby tickets by considering only whether tickets contain values that are not valid for any field. In this example, the values on the first nearby ticket are all valid for at least one field. This is not true of the other three nearby tickets: the values 4, 55, and 12 are not valid for any field. Adding together all of the invalid values produces your ticket scanning error rate: 4 + 55 + 12 = 71.
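The arithmetic in this example can be checked with a few lines of plain Python. This is only a sketch on the example data; the dictionary layout and the helper name `fits_any_rule` are assumptions made for illustration.

```python
# Example rules and nearby tickets copied from the notes above
rules = {
    "class": [(1, 3), (5, 7)],
    "row": [(6, 11), (33, 44)],
    "seat": [(13, 40), (45, 50)],
}
nearby = [[7, 3, 47], [40, 4, 50], [55, 2, 20], [38, 6, 12]]

def fits_any_rule(value):
    """True if the value lies in at least one range of at least one rule."""
    return any(lo <= value <= hi for ranges in rules.values() for lo, hi in ranges)

# Collect every value that no rule accepts, then add them up
invalid_values = [v for ticket in nearby for v in ticket if not fits_any_rule(v)]
error_rate = sum(invalid_values)
print(invalid_values, error_rate)  # [4, 55, 12] 71
```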

Consider the validity of the nearby tickets you scanned. What is your ticket scanning error rate?

In [1]:
import pandas as pd
import numpy as np

In [108]:
# Open file
f = open("input_data/Day16.txt", "r")

# collect the rules as a list of rows (DataFrame.append was removed in pandas 2.0)
rule_rows = []

line = f.readline()
while line != "\n":
    # we have another rule
    line = line.strip('\n')
    rule, ranges = line.split(':')
    range1, range2 = ranges.split(' or ')
    range1 = range1.strip()
    range2 = range2.strip()
    r1_min, r1_max = range1.split('-')
    r2_min, r2_max = range2.split('-')
    rule_rows.append([rule, r1_min, r1_max, r2_min, r2_max])

    line=f.readline()

# build our rules df from the collected rows
rules = pd.DataFrame(rule_rows, columns = ['rule','r1_min','r1_max','r2_min','r2_max'])

# convert ranges to ints
rules[['r1_min','r1_max','r2_min','r2_max']] = rules[['r1_min','r1_max','r2_min','r2_max']].astype(int)

# Get number of rules
num_rules = rules.shape[0]

# Read 'your ticket' line
line = f.readline()
# Read next line with my ticket values
line = f.readline()
my_vals = np.array(line.split(','),dtype='int')

# read empty line
line = f.readline()
# read 'nearby tickets'
line = f.readline()

# initialize our ticket list
tickets = []

# read first ticket
line= f.readline()
while line != '':
    
    # process ticket
    ticket_nums = line.split(',')
    ticket_nums = [ticket.strip('\n') for ticket in ticket_nums]

    tickets.append(ticket_nums)
                             
    line= f.readline()

# convert our list of lists to a numpy array
tickets = np.array(tickets, dtype='int')

# Close file
f.close()

In [109]:
rules

Unnamed: 0,rule,r1_min,r1_max,r2_min,r2_max
0,departure location,47,874,885,960
1,departure station,25,616,622,964
2,departure platform,42,807,825,966
3,departure track,36,560,583,965
4,departure date,37,264,289,968
5,departure time,27,325,346,954
6,arrival location,37,384,391,950
7,arrival station,35,233,244,963
8,arrival platform,26,652,675,949
9,arrival track,41,689,710,954


In [110]:
tickets

array([[191, 477, 199, ..., 302, 376, 252],
       [598, 628, 446, ..., 102, 708, 637],
       [890, 168, 741, ..., 632, 226, 862],
       ...,
       [711, 119, 891, ..., 828, 225, 178],
       [880, 257, 383, ..., 253, 395, 318],
       [183, 940, 255, ..., 645, 855, 380]])

In [111]:
# quick check: how many rules accept the value 3?
rules.apply(lambda x: (x['r1_min'] <= 3 <= x['r1_max']) | (x['r2_min'] <= 3 <= x['r2_max']), axis=1).sum()

0

In [112]:
def test_values(ticket):
    '''takes a ticket (an array of values) and returns a list of booleans, one per value, that is True when the value is valid for at least one rule in the global rules df'''
    
    ret_vals = []
    for val in ticket:
        # loop through our values testing each one against the set of rules.  If it's good add a True if not a False
        if rules.apply(lambda x: (x['r1_min'] <= val <= x['r1_max']) | 
                       (x['r2_min'] <= val <= x['r2_max']), axis=1).sum():
            ret_vals.append(True)
        else:
            ret_vals.append(False)
    
    return ret_vals

In [113]:
invalids = []

# Loop through our tickets
for ticket in tickets:
    
    valid_list = test_values(ticket)
    for idx, val in enumerate(valid_list):
        if not val:
            invalids.append(ticket[idx])

In [114]:
sum(invalids)

19087
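Calling `rules.apply` once per value works, but it runs a Python-level loop over the rules for every number on every ticket. As a hedged alternative, the sketch below does the same check with NumPy broadcasting on the small example from the puzzle text. The array `bounds` is an assumed layout holding the four range columns; in this notebook it could presumably be obtained with `rules[['r1_min','r1_max','r2_min','r2_max']].to_numpy()`.

```python
import numpy as np

# Example rules from the puzzle text, one row per rule: r1_min, r1_max, r2_min, r2_max
bounds = np.array([
    [1, 3, 5, 7],      # class
    [6, 11, 33, 44],   # row
    [13, 40, 45, 50],  # seat
])
tickets = np.array([[7, 3, 47], [40, 4, 50], [55, 2, 20], [38, 6, 12]])

# Add a trailing axis so every value is compared against every rule at once
vals = tickets[..., np.newaxis]                        # shape (tickets, positions, 1)
in_r1 = (bounds[:, 0] <= vals) & (vals <= bounds[:, 1])
in_r2 = (bounds[:, 2] <= vals) & (vals <= bounds[:, 3])
fits_some_rule = (in_r1 | in_r2).any(axis=-1)          # shape (tickets, positions)

error_rate = tickets[~fits_some_rule].sum()            # sum of values no rule accepts
valid_tickets = tickets[fits_some_rule.all(axis=1)]    # tickets with no invalid value
print(error_rate)               # 71
print(valid_tickets.tolist())   # [[7, 3, 47]]
```

The same boolean mask yields both the Part 1 error rate and the filtered tickets that Part 2 needs.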

### Part 2
Now that you've identified which tickets contain invalid values, discard those tickets entirely. Use the remaining valid tickets to determine which field is which.

Using the valid ranges for each field, determine what order the fields appear on the tickets. The order is consistent between all tickets: if seat is the third field, it is the third field on every ticket, including your ticket.

For example, suppose you have the following notes:

- class: 0-1 or 4-19
- row: 0-5 or 8-19
- seat: 0-13 or 16-19

your ticket:
- 11,12,13

nearby tickets:
- 3,9,18
- 15,1,5
- 5,14,9

Based on the nearby tickets in the above example, the first position must be row, the second position must be class, and the third position must be seat; you can conclude that in your ticket, class is 12, row is 11, and seat is 13.
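The reasoning behind this example can be sketched in plain Python: first find which fields accept every value seen at a position, then repeatedly fix any position that has a single candidate left. The names `fits`, `candidates`, and `assigned` are illustrative assumptions, and the loop assumes that some position always has exactly one candidate (otherwise `next` raises `StopIteration`).

```python
rules = {
    "class": [(0, 1), (4, 19)],
    "row": [(0, 5), (8, 19)],
    "seat": [(0, 13), (16, 19)],
}
my_ticket = [11, 12, 13]
nearby = [[3, 9, 18], [15, 1, 5], [5, 14, 9]]

def fits(value, ranges):
    return any(lo <= value <= hi for lo, hi in ranges)

# For each position, collect the fields that accept every value seen there
columns = list(zip(*nearby))
candidates = {
    pos: {name for name, ranges in rules.items() if all(fits(v, ranges) for v in col)}
    for pos, col in enumerate(columns)
}

# Fix any position with a single candidate, then remove that field elsewhere
assigned = {}
while candidates:
    pos = next(p for p, names in candidates.items() if len(names) == 1)
    (field,) = candidates.pop(pos)
    assigned[field] = pos
    for names in candidates.values():
        names.discard(field)

print({field: my_ticket[pos] for field, pos in assigned.items()})
# {'row': 11, 'class': 12, 'seat': 13}
```

Position 0 only fits `row`, which leaves `class` as the sole option for position 1 and `seat` for position 2.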

Once you work out which field is which, look for the six fields on your ticket that start with the word departure. What do you get if you multiply those six values together?

In [115]:
tickets.shape

(236, 20)

In [116]:
val_tickets = []

for idx, ticket in enumerate(tickets):
    
    valid_list = test_values(ticket)
    if sum(valid_list) == len(valid_list):
        val_tickets.append(ticket.tolist())

In [117]:
val_tickets = np.array(val_tickets)

In [118]:
val_tickets.shape

(190, 20)

In [119]:
# let's transpose our valid tickets because we're going to go through them one position at a time
val_tickets_T = np.transpose(val_tickets)

In [120]:
val_tickets_T.shape

(20, 190)

In [121]:
rules.shape

(20, 5)

Good: the number of rules (20) matches the number of positions (20).

In [122]:
def test_pos(values):
    '''takes the array of values seen at one position and returns an array of booleans, one per rule, that is True when every value satisfies that rule'''
    
    test_array = np.array([True for i in range(num_rules)])
    for val in values:
        # Get whether this value is valid for each rule
        valid = np.array(rules.apply(lambda x: (x['r1_min'] <= val <= x['r1_max']) | 
                       (x['r2_min'] <= val <= x['r2_max']), axis=1))
 
        # now and that with our test_array
        test_array = valid & test_array
#        print(val, test_array)
      
    return test_array

Running this on a single column shows which rules that position could satisfy. A position can fit more than one rule, so we build a DataFrame with one row per rule and one column per position, and then work through it by elimination.

In [123]:
valid_positions = pd.DataFrame()

for idx, pos_vals in enumerate(val_tickets_T):

    valid_positions[idx] = test_pos(pos_vals)
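For comparison, the same rule-by-position table can be computed in one step with NumPy broadcasting. This is a sketch on the Part 2 example data rather than the notebook's input, and `bounds`, `rule_names`, and `candidate_matrix` are assumed names introduced only for illustration.

```python
import numpy as np

# Part 2 example rules, one row per rule: r1_min, r1_max, r2_min, r2_max
rule_names = ["class", "row", "seat"]
bounds = np.array([[0, 1, 4, 19], [0, 5, 8, 19], [0, 13, 16, 19]])
valid_tickets = np.array([[3, 9, 18], [15, 1, 5], [5, 14, 9]])

# Compare every value with every rule: result has shape (rules, tickets, positions)
vals = valid_tickets[np.newaxis, :, :]
lo1, hi1, lo2, hi2 = (bounds[:, i][:, np.newaxis, np.newaxis] for i in range(4))
ok = ((lo1 <= vals) & (vals <= hi1)) | ((lo2 <= vals) & (vals <= hi2))

# A rule stays a candidate for a position only if every ticket agrees
candidate_matrix = ok.all(axis=1)                 # shape (rules, positions)
print(candidate_matrix.astype(int).tolist())      # [[0, 1, 1], [1, 1, 1], [0, 0, 1]]
```

Rows correspond to rules and columns to positions, matching the layout of `valid_positions`.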

In [126]:
# Now we loop through there, looking for columns which have only one true, then dropping the corresponding row
positions_to_rules = [-1 for i in range(rules.shape[0])]

while valid_positions.shape[0]>0:
    
    # This gives us the column where we have only one True
    pos_num = np.where((valid_positions.sum(axis=0))==1)[0][0]
    
    # Get position in that column of the True
    rule_num = valid_positions[pos_num][valid_positions[pos_num]==True].index[0]
    
    # Store that rule_num in the positions_to_rules array
    positions_to_rules[pos_num] = rule_num
    
    # drop that row from our array
    valid_positions = valid_positions.drop(rule_num)

In [127]:
positions_to_rules

[5, 3, 10, 11, 9, 19, 16, 12, 8, 2, 7, 6, 14, 13, 1, 0, 17, 4, 18, 15]
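The loop above relies on there always being a column with exactly one `True`; if that ever fails, the `[0][0]` lookup raises an `IndexError`. A hedged variant with an explicit check, written against a plain boolean NumPy matrix instead of the DataFrame, might look like the following. The function name `resolve` and the small test matrix are assumptions for illustration.

```python
import numpy as np

def resolve(candidate_matrix):
    """Return a list mapping each position to its rule index.

    candidate_matrix[r, p] is True when rule r could apply to position p.
    """
    remaining = candidate_matrix.copy()
    position_to_rule = [-1] * remaining.shape[1]
    for _ in range(remaining.shape[0]):
        # positions that currently have exactly one candidate rule
        singles = np.flatnonzero(remaining.sum(axis=0) == 1)
        if singles.size == 0:
            raise ValueError("no position has exactly one candidate rule")
        pos = int(singles[0])
        rule = int(np.flatnonzero(remaining[:, pos])[0])
        position_to_rule[pos] = rule
        remaining[rule, :] = False   # this rule is taken, clear it everywhere
    return position_to_rule

# Rows: class, row, seat; columns: positions 0, 1, 2 (Part 2 example)
matrix = np.array([[False, True, True], [True, True, True], [False, False, True]])
print(resolve(matrix))  # [1, 0, 2]
```

Clearing a row instead of dropping it keeps the row indices stable, so no label lookup is needed.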

In [128]:
rules.head()

Unnamed: 0,rule,r1_min,r1_max,r2_min,r2_max
0,departure location,47,874,885,960
1,departure station,25,616,622,964
2,departure platform,42,807,825,966
3,departure track,36,560,583,965
4,departure date,37,264,289,968


In [130]:
# let's find 6 fields that start with word departure
rules[rules['rule'].str.find('departure')==0]

Unnamed: 0,rule,r1_min,r1_max,r2_min,r2_max
0,departure location,47,874,885,960
1,departure station,25,616,622,964
2,departure platform,42,807,825,966
3,departure track,36,560,583,965
4,departure date,37,264,289,968
5,departure time,27,325,346,954


In [138]:
#looks like that's rules 0 through 5 so find those positions in our positions_to_rules array

answer = 1
for i in range(6):
    answer *= my_vals[positions_to_rules.index(i)]

In [139]:
answer

1382443095281
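The loop over `range(6)` depends on the departure rules happening to be rules 0 through 5. A sketch that selects them by name instead is shown below; the data here is a small hypothetical stand-in, not the puzzle input, and `math.prod` requires Python 3.8 or later. Converting to Python `int` sidesteps any fixed-width overflow in NumPy integers.

```python
import math

# Hypothetical stand-ins for the notebook's data: rule names, the
# position -> rule mapping from the elimination step, and my ticket values
rule_names = ["departure date", "arrival track", "departure time", "seat"]
positions_to_rules = [2, 0, 3, 1]   # position i holds rule positions_to_rules[i]
my_vals = [7, 11, 13, 17]

# Keep the values whose rule name starts with "departure"
departure_values = [
    int(my_vals[pos])
    for pos, rule_idx in enumerate(positions_to_rules)
    if rule_names[rule_idx].startswith("departure")
]
answer = math.prod(departure_values)
print(departure_values, answer)  # [7, 11] 77
```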