**<center><font size="40">Advent of Code 2021</font></center>**

# Puzzle 6
## Part I

--- Day 6: Lanternfish ---
The sea floor is getting steeper. Maybe the sleigh keys got carried this way?


A massive school of glowing lanternfish swims past. They must spawn quickly to reach such large numbers - maybe exponentially quickly? You should model their growth rate to be sure.


Although you know nothing about this specific species of lanternfish, you make some guesses about their attributes. Surely, each lanternfish creates a new lanternfish once every 7 days.


However, this process isn't necessarily synchronized between every lanternfish - one lanternfish might have 2 days left until it creates another lanternfish, while another might have 4. So, you can model each fish as a single number that represents **the number of days until it creates a new lanternfish**.


Furthermore, you reason, a **new** lanternfish would surely need slightly longer before it's capable of producing more lanternfish: two more days for its first cycle.


So, suppose you have a lanternfish with an internal timer value of 3:


* After one day, its internal timer would become 2.
* After another day, its internal timer would become 1.
* After another day, its internal timer would become 0.
* After another day, its internal timer would reset to 6, and it would create a **new** lanternfish with an internal timer of 8.
* After another day, the first lanternfish would have an internal timer of 5, and the second lanternfish would have an internal timer of 7.


A lanternfish that creates a new fish resets its timer to 6, **not** 7 (because 0 is included as a valid timer value). The new lanternfish starts with an internal timer of 8 and does not start counting down until the next day.


Realizing what you're trying to do, the submarine automatically produces a list of the ages of several hundred nearby lanternfish (your puzzle input). For example, suppose you were given the following list:


<code>3,4,3,1,2</code>

This list means that the first fish has an internal timer of 3, the second fish has an internal timer of 4, and so on until the fifth fish, which has an internal timer of 2. Simulating these fish over several days would proceed as follows:


Initial state: 3,4,3,1,2<br>
After  1 day:  2,3,2,0,1<br>
After  2 days: 1,2,1,6,0,8<br>
After  3 days: 0,1,0,5,6,7,8<br>
After  4 days: 6,0,6,4,5,6,7,8,8<br>
After  5 days: 5,6,5,3,4,5,6,7,7,8<br>


Each day, a 0 becomes a 6 and adds a new 8 to the end of the list, while each other number decreases by 1 if it was present at the start of the day.


In this example, after 18 days, there are a total of 26 fish. After 80 days, there would be a total of 5934.
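
The update rule above can be sketched as a naive list simulation. This is only an illustration in plain Python on the example input; `simulate` is a hypothetical helper name and not part of the puzzle text.

```python
def simulate(timers, days):
    """Naive simulation: keep one list entry per fish."""
    fish = list(timers)
    for _ in range(days):
        # fish at timer 0 spawn today
        births = fish.count(0)
        # a 0 resets to 6, every other timer decreases by 1
        fish = [6 if t == 0 else t - 1 for t in fish]
        # newborns start at 8 and only begin counting down the next day
        fish.extend([8] * births)
    return fish


example = [3, 4, 3, 1, 2]
after_two_days = simulate(example, 2)   # matches the "After  2 days" row above
total_after_18 = len(simulate(example, 18))
total_after_80 = len(simulate(example, 80))
```

The list grows with the population, so this approach is only practical for a small number of days.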


Find a way to simulate lanternfish. **How many lanternfish would there be after 80 days?**

In [1]:
#read data into a numpy array
import numpy as np
import csv

with open('data/pz06.txt', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)

pz6_data = np.array([int(i) for i in data[0]])
print(pz6_data)

[1 1 3 5 3 1 1 4 1 1 5 2 4 3 1 1 3 1 1 5 5 1 3 2 5 4 1 1 5 1 4 2 1 4 2 1 4
 4 1 5 1 4 4 1 1 5 1 5 1 5 1 1 1 5 1 2 5 1 1 3 2 2 2 1 4 1 1 2 4 1 3 1 2 1
 3 5 2 3 5 1 1 4 3 3 5 1 5 3 1 2 3 4 1 1 5 4 1 3 4 4 1 2 4 4 1 1 3 5 3 1 2
 2 5 1 4 1 3 3 3 3 1 1 2 1 5 3 4 5 1 5 2 5 3 2 1 4 2 1 1 1 4 1 2 1 2 2 4 5
 5 5 4 1 4 1 4 2 3 2 3 1 1 2 3 1 1 1 5 2 2 5 3 1 4 1 2 1 1 5 3 1 4 5 1 4 2
 1 1 5 1 5 4 1 5 5 2 3 1 3 5 1 1 1 1 3 1 1 4 1 5 2 1 1 3 5 1 1 4 2 1 2 5 2
 5 1 1 1 2 3 5 5 1 4 3 2 2 3 2 1 1 4 1 3 5 2 3 1 1 5 1 3 5 1 1 5 5 3 1 3 3
 1 2 3 1 5 1 3 2 1 3 1 1 2 3 5 3 5 5 4 3 1 5 1 1 2 3 2 2 1 1 2 1 4 1 2 3 3
 3 1 3 5]


In [66]:
#create the list of fish after 80 Days
fish_list = pz6_data.copy()
n_days = 80
days = 0
while days < n_days:
    n_0s = (fish_list == 0).sum()
    if n_0s >=1:
        fish_list -= 1
        fish_list = np.where(fish_list<0, 6, fish_list)
        fish_list = np.append(fish_list, [8] * n_0s)
        days += 1
    else:
        fish_list -= 1
        days += 1

print("Total of %d fish after %d days" % (len(fish_list), days))

Total of 361169 fish after 80 days


## Part II
--- Part Two ---
Suppose the lanternfish live forever and have unlimited food and space. Would they take over the entire ocean?


After 256 days in the example above, there would be a total of **26984457539** lanternfish!


**How many lanternfish would there be after 256 days?**

We can't approach this problem the same way as before, because the list of fish grows exponentially and we would run out of memory. Instead of actually appending 8s to a list, we can just count how many fish have timer 0, timer 1, and so on up to timer 8, creating a histogram!

Our list tracks the total number of fish for each timer value:

[0,1,2,3,4,5,6,7,8]

Since there are only 9 possible timer values (0 to 8), we only need a list of length 9. For example, the fish [3,4,3,1,2] can be represented by counting how many fish have each timer value:

* timer 0: 0 fish
* timer 1: 1 fish
* timer 2: 1 fish
* timer 3: 2 fish
* timer 4: 1 fish
* timer 5: 0 fish
* timer 6: 0 fish
* timer 7: 0 fish
* timer 8: 0 fish

resulting in:

[0,1,1,2,1,0,0,0,0]

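When a day passes, we shift all the counts one position to the left. The old first element (the fish whose timer was 0) is **added** to the element at index 6, since those fish restart at timer 6, and it also becomes the new last element (index 8), since each of those fish spawns one new fish with timer 8.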
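
A minimal sketch of this daily shift, assuming a `collections.deque` is used for the nine counters (the name `count_fish` is hypothetical and only for illustration):

```python
from collections import Counter, deque


def count_fish(timers, days):
    """Count fish per timer value instead of storing every fish."""
    tally = Counter(timers)
    counts = deque(tally.get(t, 0) for t in range(9))
    for _ in range(days):
        # shift left: the old timer-0 count moves to index 8 (the newborns)
        counts.rotate(-1)
        # the parents restart at timer 6
        counts[6] += counts[8]
    return sum(counts)


example = [3, 4, 3, 1, 2]
total_after_256 = count_fish(example, 256)
```

Plain Python integers do not overflow, so the large totals stay exact, and the work per day is constant regardless of how many fish there are.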

In [2]:
from collections import Counter
#data structure: a list that counts the number of fish per timer value:
#[0,1,2,3,4,5,6,7,8]
# test list [3,4,3,1,2] -->[ 0, 1, 1, 2, 1, 0, 0, 0, 0]
#test_list = np.array([0, 1, 1, 2, 1, 0, 0, 0, 0])

def n_lanternfish(array, n_days):
    #create the numpy array for the loop (integer dtype keeps the counts exact)
    f_list = np.zeros(9, dtype=np.int64)
    
    #find frequency of the values
    freqs = dict(Counter(array))
    for k,v in freqs.items():
        f_list[k] = v
    
    for i in range(n_days):
        #fish at timer 0 restart at 6: index 7 becomes index 6 after the shift
        f_list[7] += f_list[0]
        #shift left; the old timer-0 count becomes the newborn (timer 8) count
        f_list = np.append(f_list[1:], f_list[0])

    print("Total of %d after %d days." % (f_list.sum(), n_days))

In [65]:
#test run
n_lanternfish([3,4,3,1,2], 256)

Total of 26984457539 after 256 days.


In [68]:
#n lanternfish after 256 days
n_lanternfish(pz6_data, 256)

Total of 1634946868992 after 256 days.


# Puzzle 7
## Part I

--- Day 7: The Treachery of Whales ---


A giant whale has decided your submarine is its next meal, and it's much faster than you are. There's nowhere to run!


Suddenly, a swarm of crabs (each in its own tiny submarine - it's too deep for them otherwise) zooms in to rescue you! They seem to be preparing to blast a hole in the ocean floor; sensors indicate a massive underground cave system just beyond where they're aiming!


The crab submarines all need to be aligned before they'll have enough power to blast a large enough hole for your submarine to get through. However, it doesn't look like they'll be aligned before the whale catches you! Maybe you can help?


There's one major catch - crab submarines can only move horizontally.


You quickly make a list of **the horizontal position of each crab** (your puzzle input). Crab submarines have limited fuel, so you need to find a way to make all of their horizontal positions match while requiring them to spend as little fuel as possible.


For example, consider the following horizontal positions:


<code>16,1,2,0,4,2,7,1,2,14</code>


This means there's a crab with horizontal position 16, a crab with horizontal position 1, and so on.


Each change of 1 step in horizontal position of a single crab costs 1 fuel. You could choose any horizontal position to align them all on, but the one that costs the least fuel is horizontal position 2:


Move from 16 to 2: 14 fuel<br>
Move from 1 to 2: 1 fuel<br>
Move from 2 to 2: 0 fuel<br>
Move from 0 to 2: 2 fuel<br>
Move from 4 to 2: 2 fuel<br>
Move from 2 to 2: 0 fuel<br>
Move from 7 to 2: 5 fuel<br>
Move from 1 to 2: 1 fuel<br>
Move from 2 to 2: 0 fuel<br>
Move from 14 to 2: 12 fuel<br>


This costs a total of 37 fuel. This is the cheapest possible outcome; more expensive outcomes include aligning at position 1 (41 fuel), position 3 (39 fuel), or position 10 (71 fuel).
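
As a side note, the sum of absolute distances is minimized at a median of the positions, so for Part I the best position can be found without trying every candidate. The sketch below assumes integer positions and uses `statistics.median_low` so the result is one of the input values; `linear_fuel` is a hypothetical helper name.

```python
from statistics import median_low


def linear_fuel(positions, target):
    """Total fuel when every step costs 1."""
    return sum(abs(p - target) for p in positions)


crabs = [16, 1, 2, 0, 4, 2, 7, 1, 2, 14]
best = median_low(crabs)          # a median minimizes the total distance
cost = linear_fuel(crabs, best)
```

For the example this picks position 2 with a cost of 37, in line with the figures quoted above.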


Determine the horizontal position that the crabs can align to using the least fuel possible. **How much fuel must they spend to align to that position?**

In [5]:
#test set
pz7_test = [16,1,2,0,4,2,7,1,2,14]

#find range of positions
pz7_test_range = [i for i in range(min(pz7_test), max(pz7_test) + 1)]

#for each crab calculate its fuel consumption to
#each position in the range and sum, then find min
pz7_test_fuels = []
for pos in pz7_test_range:
    fuel = sum([abs(pos - crab) for crab in pz7_test])
    pz7_test_fuels.append(fuel)
    
pz7_test_min_fuel = min(pz7_test_fuels)
pz7_test_min_fuel

37

In [12]:
#test result is correct, let's import the data
with open('data/pz07.txt', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)

pz7_data = np.array([int(i) for i in data[0]])
print("Data type:", type(pz7_data))
print("Data length:",len(pz7_data))
print("Data range [%d;%d]" % (pz7_data.min(), pz7_data.max()))

Data type: <class 'numpy.ndarray'>
Data length: 1000
Data range [0;1862]


In [25]:
#create a function to return least fuel
def least_fuel(array):
    #convert array to numpy array if not already
    array = np.asarray(array)
    
    #every candidate position between the leftmost and rightmost crab
    pos_range = range(array.min(), array.max()+1)
    
    #total fuel needed to align all crabs at each candidate position
    fuels = np.array([np.abs(array - pos).sum() for pos in pos_range], dtype='int64')
    
    least_f = fuels.min()
    print("Least fuel to align the crabs costs:", least_f)

In [26]:
least_fuel(pz7_test)

Least fuel to align the crabs costs: 37


In [27]:
least_fuel(pz7_data)

Least fuel to align the crabs costs: 328318


## Part II

--- Part Two ---


The crabs don't seem interested in your proposed solution. Perhaps you misunderstand crab engineering?


As it turns out, crab submarine engines don't burn fuel at a constant rate. Instead, each change of 1 step in horizontal position costs 1 more unit of fuel than the last: the first step costs 1, the second step costs 2, the third step costs 3, and so on.


As each crab moves, moving further becomes more expensive. This changes the best horizontal position to align them all on; in the example above, this becomes 5:


Move from 16 to 5: 66 fuel<br>
Move from 1 to 5: 10 fuel<br>
Move from 2 to 5: 6 fuel<br>
Move from 0 to 5: 15 fuel<br>
Move from 4 to 5: 1 fuel<br>
Move from 2 to 5: 6 fuel<br>
Move from 7 to 5: 3 fuel<br>
Move from 1 to 5: 10 fuel<br>
Move from 2 to 5: 6 fuel<br>
Move from 14 to 5: 45 fuel<br>


This costs a total of **168** fuel. This is the new cheapest possible outcome; the old alignment position (2) now costs 206 fuel instead.
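
The cost of moving `n` steps under this rule is 1 + 2 + ... + n, which is the triangular number n(n+1)/2. A minimal sketch using that closed form follows, so no inner loop over individual steps is needed; the helper names are hypothetical and the search still tries every candidate position.

```python
def triangular_fuel(positions, target):
    """Total fuel when the k-th step of a crab costs k."""
    total = 0
    for p in positions:
        n = abs(p - target)
        total += n * (n + 1) // 2   # 1 + 2 + ... + n
    return total


def cheapest_alignment(positions):
    """Return (fuel, position) for the cheapest candidate position."""
    candidates = range(min(positions), max(positions) + 1)
    return min((triangular_fuel(positions, t), t) for t in candidates)


crabs = [16, 1, 2, 0, 4, 2, 7, 1, 2, 14]
fuel, position = cheapest_alignment(crabs)
```

On the example this reproduces position 5 with 168 fuel, and `triangular_fuel(crabs, 2)` gives the 206 quoted for the old position.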


Determine the horizontal position that the crabs can align to using the least fuel possible so they can make you an escape route! **How much fuel must they spend to align to that position?**

In [49]:
#create a function to return least fuel
def least_fuel_v2(array):
    #convert array to numpy array if not already
    if type(array).__module__ != np.__name__:
        array = np.array(array)
    
    pos_range = [i for i in range(array.min(), array.max()+1)]
     
    fuels = np.array([
                     np.array([ 
                              np.array([(n+1) for n in range(abs(pos-crab))]).sum() \
                     for crab in array]).sum() \
            for pos in pos_range], dtype="int64")
    
    least_f = fuels.min()
    print("Least fuel to align the crabs costs:", least_f)

In [50]:
least_fuel_v2(pz7_test)

Least fuel to align the crabs costs: 168


In [51]:
#least_fuel_v2(pz7_data)
# correct answer 89791146

Least fuel to align the crabs costs: 89791146


# Puzzle 8

## Part I

--- Day 8: Seven Segment Search ---
You barely reach the safety of the cave when the whale smashes into the cave mouth, collapsing it. Sensors indicate another exit to this cave at a much greater depth, so you have no choice but to press on.

As your submarine slowly makes its way through the cave system, you notice that the four-digit seven-segment displays in your submarine are malfunctioning; they must have been damaged during the escape. You'll be in a lot of trouble without them, so you'd better figure out what's wrong.

Each digit of a seven-segment display is rendered by turning on or off any of seven segments named a through g:


<pre>  0:      1:      2:      3:      4:
 aaaa    ....    aaaa    aaaa    ....
b    c  .    c  .    c  .    c  b    c
b    c  .    c  .    c  .    c  b    c
 ....    ....    dddd    dddd    dddd
e    f  .    f  e    .  .    f  .    f
e    f  .    f  e    .  .    f  .    f
 gggg    ....    gggg    gggg    ....

  5:      6:      7:      8:      9:
 aaaa    aaaa    aaaa    aaaa    aaaa
b    .  b    .  .    c  b    c  b    c
b    .  b    .  .    c  b    c  b    c
 dddd    dddd    ....    dddd    dddd
.    f  e    f  .    f  e    f  .    f
.    f  e    f  .    f  e    f  .    f
 gggg    gggg    ....    gggg    gggg</pre>



So, to render a 1, only segments c and f would be turned on; the rest would be off. To render a 7, only segments a, c, and f would be turned on.


The problem is that the signals which control the segments have been mixed up on each display. The submarine is still trying to display numbers by producing output on signal wires a through g, but those wires are connected to segments **randomly**. Worse, the wire/segment connections are mixed up separately for each four-digit display! (All of the digits **within** a display use the same connections, though.)


So, you might know that only signal wires b and g are turned on, but that doesn't mean **segment** b and g are turned on: the only digit that uses two segments is 1, so it must mean segments c and f are meant to be on. With just that information, you still can't tell which wire (b/g) goes to which segment (c/f). For that, you'll need to collect more information.


For each display, you watch the changing signals for a while, make a note of **all ten unique signal patterns** you see, and then write down a single **four digit output value** (your puzzle input). Using the signal patterns, you should be able to work out which pattern corresponds to which digit.


For example, here is what you might see in a single entry in your notes:


<pre>acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab |
cdfeb fcadb cdfeb cdbaf</pre>


(The entry is wrapped here to two lines so it fits; in your notes, it will all be on a single line.)


Each entry consists of ten **unique signal patterns**, a | delimiter, and finally the **four digit output value**. Within an entry, the same wire/segment connections are used (but you don't know what the connections actually are). The unique signal patterns correspond to the ten different ways the submarine tries to render a digit using the current wire/segment connections. Because 7 is the only digit that uses three segments, dab in the above example means that to render a 7, signal lines d, a, and b are on. Because 4 is the only digit that uses four segments, eafb means that to render a 4, signal lines e, a, f, and b are on.


Using this information, you should be able to work out which combination of signal wires corresponds to each of the ten digits. Then, you can decode the four digit output value. Unfortunately, in the above example, all of the digits in the output value (cdfeb fcadb cdfeb cdbaf) use five segments and are more difficult to deduce.


For now, **focus on the easy digits**. Consider this larger example:


be cfbegad cbdgef fgaecd cgeb fdcge agebfd fecdb fabcd edb |<br>
**fdgacbe** cefdb cefbgd **gcbe**<br>
edbfga begcd cbg gc gcadebf fbgde acbgfd abcde gfcbed gfec |<br>
fcgedb **cgb dgebacf gc**<br>
fgaebd cg bdaec gdafb agbcfd gdcbef bgcad gfac gcb cdgabef |<br>
**cg cg** fdcagb **cbg**<br>
fbegcd cbd adcefb dageb afcb bc aefdc ecdab fgdeca fcdbega |<br>
efabcd cedba gadfec **cb**<br>
aecbfdg fbg gf bafeg dbefa fcge gcbea fcaegb dgceab fcbdga |<br>
**gecf egdcabf bgf** bfgea<br>
fgeab ca afcebg bdacfeg cfaedg gcfdb baec bfadeg bafgc acf |<br>
**gebdcfa ecba ca fadegcb**<br>
dbcfg fgd bdegcaf fgec aegbdf ecdfab fbedc dacgb gdcebf gf |<br>
**cefg** dcbef **fcge gbcadfe**<br>
bdfegc cbegaf gecbf dfcage bdacg ed bedf ced adcbefg gebcd |<br>
**ed** bcgafe cdgba cbgef<br>
egadfb cdbfeg cegd fecab cgb gbdefca cg fgcdab egfdb bfceg |<br>
**gbdfcae bgc cg cgb**<br>
gcafb gcf dcaebfg ecagb gf abcdeg gaef cafbge fdbac fegbdc |<br>
**fgae** cfgab **fg** bagce<br>


Because the digits 1, 4, 7, and 8 each use a unique number of segments, you should be able to tell which combinations of signals correspond to those digits. Counting **only digits in the output values** (the part after | on each line), in the above example, there are 26 instances of digits that use a unique number of segments (highlighted above).

**In the output values, how many times do digits 1, 4, 7, or 8 appear?**

In [201]:
#import test data
import pandas as pd

#read in the test data
df8_test = pd.read_csv('data/pz08_test.txt', header=None,
                       sep=r"\s\|\s|\s",engine='python',
                       names=[i for i in range(10)] + ['n1','n2','n3','n4'])
df8_test.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,n1,n2,n3,n4
0,be,cfbegad,cbdgef,fgaecd,cgeb,fdcge,agebfd,fecdb,fabcd,edb,fdgacbe,cefdb,cefbgd,gcbe
1,edbfga,begcd,cbg,gc,gcadebf,fbgde,acbgfd,abcde,gfcbed,gfec,fcgedb,cgb,dgebacf,gc
2,fgaebd,cg,bdaec,gdafb,agbcfd,gdcbef,bgcad,gfac,gcb,cdgabef,cg,cg,fdcagb,cbg


Digits with a unique number of segments (digit : number of segments):
* 1 : 2
* 4 : 4
* 7 : 3
* 8 : 7
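
The lookup above can be sketched in plain Python for a single entry. This is only an illustration; `UNIQUE_LENGTHS` and `count_easy_digits` are hypothetical names, and the entry is assumed to be on one line with a ` | ` delimiter.

```python
# segment count -> digit, for the digits with a unique segment count
UNIQUE_LENGTHS = {2: 1, 4: 4, 3: 7, 7: 8}


def count_easy_digits(entry):
    """Count output words whose length identifies a 1, 4, 7 or 8."""
    _, outputs = entry.split(" | ")
    return sum(len(word) in UNIQUE_LENGTHS for word in outputs.split())


first = ("be cfbegad cbdgef fgaecd cgeb fdcge agebfd fecdb fabcd edb | "
         "fdgacbe cefdb cefbgd gcbe")
third = ("fgaebd cg bdaec gdafb agbcfd gdcbef bgcad gfac gcb cdgabef | "
         "cg cg fdcagb cbg")
easy_in_first = count_easy_digits(first)
easy_in_third = count_easy_digits(third)
```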

In [166]:
#convert strings in df to numbers 1,4,7,8 or -1 for else
def decode_string_by_length(string, other=-1):
    #dict of unique segment lengths
    length_dict = {2:1, 4:4, 3:7, 7:8}
    try:
        return length_dict[len(string)]
    except KeyError:
        #length does not identify a digit uniquely
        return other

In [202]:
df8_test2 = df8_test.apply(np.vectorize(decode_string_by_length))
df8_test2.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,n1,n2,n3,n4
0,1,8,-1,-1,4,-1,-1,-1,-1,7,8,-1,-1,4
1,-1,-1,7,1,8,-1,-1,-1,-1,4,-1,7,8,1
2,-1,1,-1,-1,-1,-1,-1,4,7,8,1,1,-1,7


In [203]:
from collections import Counter

def n_1478(df):
    data_flat = df.to_numpy().flatten()
    counter = Counter(i for i in data_flat if i != -1)
    total_1478 = sum(list(counter.values()))
    print("Total number of digits 1,4,7 and 8 in output:", total_1478)
    
n_1478(df8_test2.loc[:,'n1':'n4'])

Total number of digits 1,4,7 and 8 in output: 26


In [204]:
#import challenge set
df8 = pd.read_csv('data/pz08.txt', header=None,
                  sep=r"\s\|\s|\s",engine='python',
                  names=[i for i in range(10)] + ['n1','n2','n3','n4'])
df8.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,n1,n2,n3,n4
0,dcga,cadgbfe,gecba,cbfde,eda,cdbea,gbadfe,fegcba,bedgca,da,bgefdac,bdace,ad,agcd
1,fe,ecf,fdbagec,dcgfab,defbca,efbcga,daceg,cfdea,bfed,fdbca,aefdc,fbde,abdefc,dcgae
2,fbg,cgafe,bf,bfdc,ebdcag,fgcdba,gbdecaf,bcfag,badgc,gdefab,bfg,bf,bf,bdgeca


In [205]:
df8_v2 = df8.apply(np.vectorize(decode_string_by_length))
df8_v2.head(3)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,n1,n2,n3,n4
0,4,8,-1,-1,7,-1,-1,-1,-1,1,8,-1,1,4
1,1,7,8,-1,-1,-1,-1,-1,4,-1,-1,4,-1,-1
2,7,-1,1,4,-1,-1,8,-1,-1,-1,7,1,1,-1


In [206]:
n_1478(df8_v2.loc[:,'n1':'n4'])

Total number of digits 1,4,7 and 8 in output: 479


## Part II

--- Part Two ---


Through a little deduction, you should now be able to determine the remaining digits. Consider again the first example above:


<pre>acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab |
cdfeb fcadb cdfeb cdbaf</pre>


After some careful analysis, the mapping between signal wires and segments only makes sense in the following configuration:


<pre> dddd
e    a
e    a
 ffff
g    b
g    b
 cccc</pre>
 
 
So, the unique signal patterns would correspond to the following digits:


* acedgfb: 8
* cdfbe: 5
* gcdfa: 2
* fbcad: 3
* dab: 7
* cefabd: 9
* cdfgeb: 6
* eafb: 4
* cagedb: 0
* ab: 1


Then, the four digits of the output value can be decoded:


* cdfeb: 5
* fcadb: 3
* cdfeb: 5
* cdbaf: 3


Therefore, the output value for this entry is 5353.
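
One possible way to automate this deduction is to compare each pattern with the known patterns for 1 and 4 using set operations. The sketch below is an assumption about how to do it, not the puzzle's prescribed method; `decode_entry` is a hypothetical name and the entry is assumed to be on a single line.

```python
def decode_entry(entry):
    """Decode the four-digit output value of one entry."""
    patterns_part, output_part = entry.split(" | ")
    by_len = {}
    for word in patterns_part.split():
        by_len.setdefault(len(word), []).append(frozenset(word))

    # digits identified by their unique segment count
    one, seven = by_len[2][0], by_len[3][0]
    four, eight = by_len[4][0], by_len[7][0]
    digits = {one: 1, four: 4, seven: 7, eight: 8}

    # six segments: 9 contains all of 4, 0 contains all of 1, otherwise 6
    for p in by_len[6]:
        digits[p] = 9 if four <= p else 0 if one <= p else 6

    # five segments: 3 contains all of 1, 5 contains the part of 4 not in 1
    for p in by_len[5]:
        digits[p] = 3 if one <= p else 5 if (four - one) <= p else 2

    return int("".join(str(digits[frozenset(w)]) for w in output_part.split()))


entry = ("acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab | "
         "cdfeb fcadb cdfeb cdbaf")
value = decode_entry(entry)
```

Using `frozenset` makes the lookup independent of the letter order, so `cdfeb` and `cdfbe` map to the same digit.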


Following this same process for each entry in the second, larger example above, the output value of each entry can be determined:


* fdgacbe cefdb cefbgd gcbe: 8394
* fcgedb cgb dgebacf gc: 9781
* cg cg fdcagb cbg: 1197
* efabcd cedba gadfec cb: 9361
* gecf egdcabf bgf bfgea: 4873
* gebdcfa ecba ca fadegcb: 8418
* cefg dcbef fcge gbcadfe: 4548
* ed bcgafe cdgba cbgef: 1625
* gbdfcae bgc cg cgb: 8717
* fgae cfgab fg bagce: 4315


Adding all of the output values in this larger example produces 61229.


For each entry, determine all of the wire/segment connections and decode the four-digit output values. **What do you get if you add up all of the output values?**

In [310]:
sample = df8_test

#sort characters in all strings in df
def sort_letters(string):
    return "".join(sorted(string))

# find proper mapping between signal wires and segments
def find_mask(row):
    mask = {}
    signal = [i for i in row.loc[0:9]]
    n_069 = [i for i in signal if len(i) == 6]
    n_235 = [i for i in signal if len(i) == 5]
    
    #find strings for 1,4,7,8
    for string in signal:
        if len(string) == 2:
            mask[1] = string
        if len(string) == 3:
            mask[7] = string
        if len(string) == 4:
            mask[4] = string
        if len(string) == 7:
            mask[8] = string
    
    #find strings for 0,6,9
    for string in n_069:
        if set(mask[4]) == set(string) & set(mask[4]):
            mask[9] = string
        elif set(mask[1]) == set(string) & set(mask[1]):
            mask[0] = string
        else:
            mask[6] = string
    
    #find strings for 2,3,5
    unique_4_set = set(mask[4]).difference(set(mask[1]))
    for string in n_235:
        if set(mask[7]) == set(string) & set(mask[7]):
            mask[3] = string
        elif unique_4_set == set(string) & unique_4_set:
            mask[5] = string
        else:
            mask[2] = string

    #reverse numbers with strings
    mask_decode = {v:k for k,v in mask.items()}
    return mask_decode


#entire workflow
def output_sum(df):
    
    #sort all letters in df alphabetically
    df = df.apply(np.vectorize(sort_letters))
    
    #convert output letters to digits
    for idx in df.index:
        mask = find_mask(df.iloc[idx,:])
        df.loc[idx,"n1":'n4'] = df.loc[idx,"n1":"n4"].apply(lambda x: mask[x])
    
    #convert dtypes to string
    df = df.astype('str')
    
    #join digits into a 4-digit output number
    df['output'] = df[['n1','n2','n3','n4']].agg("".join, axis=1)
    
    #sum the output values
    output_sum = df['output'].astype('int64').sum()
    
    print("Output sum:", output_sum)

In [311]:
output_sum(df8)

Output sum: 1041746


# Puzzle 9
## Part I

--- Day 9: Smoke Basin ---


These caves seem to be lava tubes. Parts are even still volcanically active; small hydrothermal vents release smoke into the caves that slowly settles like rain.


If you can model how the smoke flows through the caves, you might be able to avoid it and be that much safer. The submarine generates a heightmap of the floor of the nearby caves for you (your puzzle input).


Smoke flows to the lowest point of the area it's in. For example, consider the following heightmap:


2**1**9994321**0**<br>
3987894921<br>
98**5**6789892<br>
8767896789<br>
989996**5**678<br>


Each number corresponds to the height of a particular location, where 9 is the highest and 0 is the lowest a location can be.


Your first goal is to find the **low points** - the locations that are lower than any of its adjacent locations. Most locations have four adjacent locations (up, down, left, and right); locations on the edge or corner of the map have three or two adjacent locations, respectively. (Diagonal locations do not count as adjacent.)


In the above example, there are **four** low points, all highlighted: two are in the first row (a 1 and a 0), one is in the third row (a 5), and one is in the bottom row (also a 5). All other locations on the heightmap have some lower adjacent location, and so are not low points.


The **risk level** of a low point is **1 plus its height**. In the above example, the risk levels of the low points are 2, 1, 6, and 6. The sum of the risk levels of all low points in the heightmap is therefore 15.
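
The definition of a low point can be sketched as a direct comparison with the up, down, left and right neighbours. This is a plain-Python illustration on the example heightmap; `low_points` is a hypothetical helper name.

```python
def low_points(rows):
    """Return (row, col, height) for every low point of the heightmap."""
    grid = [[int(ch) for ch in row] for row in rows]
    n_rows, n_cols = len(grid), len(grid[0])
    lows = []
    for r in range(n_rows):
        for c in range(n_cols):
            # heights of the neighbours that exist (edges have fewer)
            neighbours = [grid[nr][nc]
                          for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                          if 0 <= nr < n_rows and 0 <= nc < n_cols]
            if all(grid[r][c] < n for n in neighbours):
                lows.append((r, c, grid[r][c]))
    return lows


heightmap = ["2199943210",
             "3987894921",
             "9856789892",
             "8767896789",
             "9899965678"]
lows = low_points(heightmap)
risk_sum = sum(height + 1 for _, _, height in lows)
```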


Find all of the low points on your heightmap. **What is the sum of the risk levels of all low points on your heightmap?**

In [120]:
from scipy.signal import argrelextrema
#test data
pz9_test = np.array([[2,1,9,9,9,4,3,2,1,0],
                     [3,9,8,7,8,9,4,9,2,1],
                     [9,8,5,6,7,8,9,8,9,2],
                     [8,7,6,7,8,9,6,7,8,9],
                     [9,8,9,9,9,6,5,6,7,8]])


In [127]:
#find risk levels of the low points on heightmap
def risk_lvl_sums(data):
    #pad the bottom and right edges with a value higher than any height;
    #with mode='wrap' the same padding also borders the top and left edges
    data = np.pad(data, pad_width=[(0,1), (0,1)],
                  mode='constant', constant_values=11)
    
    #lows in rows
    row_mins = argrelextrema(data, np.less, axis=1, mode='wrap')
    row_coord = [(x,y) for x,y in zip(row_mins[0],row_mins[1])]
    #lows in cols
    col_mins = argrelextrema(data, np.less, axis=0, mode='wrap')
    col_coord = [(x,y) for x,y in zip(col_mins[0], col_mins[1])]

    low_coords = set(row_coord) & set(col_coord)
    low_coords_l = list(low_coords)
    
    #find low vals and sum for risk assessment
    low_risks = np.array([data[coord[0],coord[1]] + 1 for coord in low_coords_l]).sum()
    
    print("Sum of the risk levels of the low points:", low_risks)

risk_lvl_sums(pz9_test)

Sum of the risk levels of the low points: 15


In [128]:
import pandas as pd

#import raw data
pz9_raw = pd.read_csv("data/pz09.txt", header=None)

#separate rows of numbers into digits
pz9_data = []
for row in np.squeeze(pz9_raw.values):
    pz9_data.append([int(i) for i in row])

pz9_data = np.array(pz9_data)

risk_lvl_sums(pz9_data)

Sum of the risk levels of the low points: 468


## Part II

--- Part Two ---
Next, you need to find the largest basins so you know what areas are most important to avoid.


A **basin** is all locations that eventually flow downward to a single low point. Therefore, every low point has a basin, although some basins are very small. Locations of height 9 do not count as being in any basin, and all other locations will always be part of exactly one basin.


The **size** of a basin is the number of locations within the basin, including the low point. The example above has four basins.

<pre>
The top-left basin, size 3:

2199943210
3987894921
9856789892
8767896789
9899965678

The top-right basin, size 9:

2199943210
3987894921
9856789892
8767896789
9899965678

The middle basin, size 14:

2199943210
3987894921
9856789892
8767896789
9899965678

The bottom-right basin, size 9:

2199943210
3987894921
9856789892
8767896789
9899965678</pre>


Find the three largest basins and multiply their sizes together. In the above example, this is 9 * 14 * 9 = 1134.
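
Since every location other than a 9 belongs to exactly one basin, one way to measure basins is to flood-fill each connected region of non-9 cells. The following is a plain-Python sketch under that assumption; `basin_sizes` is a hypothetical helper name.

```python
def basin_sizes(rows):
    """Sizes of all connected regions of non-9 cells, sorted ascending."""
    grid = [[int(ch) for ch in row] for row in rows]
    n_rows, n_cols = len(grid), len(grid[0])
    seen = set()
    sizes = []
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] == 9 or (r, c) in seen:
                continue
            # flood fill from this unvisited cell
            stack = [(r, c)]
            seen.add((r, c))
            size = 0
            while stack:
                cr, cc = stack.pop()
                size += 1
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and grid[nr][nc] != 9 and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            sizes.append(size)
    return sorted(sizes)


heightmap = ["2199943210",
             "3987894921",
             "9856789892",
             "8767896789",
             "9899965678"]
sizes = basin_sizes(heightmap)
a, b, c = sizes[-3:]
product = a * b * c
```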


**What do you get if you multiply together the sizes of the three largest basins?**

In [151]:
from scipy import ndimage


def top3_clusters(data):
    
    #convert all numbers but 9 to 1
    data = np.where(data == 9, 0, 1)
    
    #label connected clusters; returns the label array and the number of clusters
    label_no, n_clusters = ndimage.label(data)
    
    #cluster sizes
    areas = ndimage.sum(data, label_no, index=range(label_no.max()+1))
    
    #find top 3 clusters
    top3_clusters = np.sort(areas)[-3:]
    
    #product of the top3 elements
    top3_product = np.prod(top3_clusters, dtype='int64')
    print("Product of the 3 largest basins:", top3_product)

top3_clusters(pz9_test)

Product of the 3 largest basins: 1134


In [152]:
top3_clusters(pz9_data)

Product of the 3 largest basins: 1280496


# Puzzle 10
## Part I

--- Day 10: Syntax Scoring ---
You ask the submarine to determine the best route out of the deep-sea cave, but it only replies:


<code>Syntax error in navigation subsystem on line: all of them</code>


All of them?! The damage is worse than you thought. You bring up a copy of the navigation subsystem (your puzzle input).


The navigation subsystem syntax is made of several lines containing chunks. There are one or more chunks on each line, and chunks contain zero or more other chunks. Adjacent chunks are not separated by any delimiter; if one chunk stops, the next chunk (if any) can immediately start. Every chunk must **open** and **close** with one of four legal pairs of matching characters:

- If a chunk opens with (, it must close with ).
- If a chunk opens with [, it must close with ].
- If a chunk opens with {, it must close with }.
- If a chunk opens with <, it must close with >.


So, () is a legal chunk that contains no other chunks, as is []. More complex but valid chunks include ([]), {()()()}, <([{}])>, [<>({}){}[([])<>]], and even (((((((((()))))))))).

Some lines are incomplete, but others are corrupted. Find and discard the corrupted lines first.

A corrupted line is one where a chunk closes with the wrong character - that is, where the characters it opens and closes with do not form one of the four legal pairs listed above.

Examples of corrupted chunks include (], {()()()>, (((()))}, and <([]){()}[{}]). Such a chunk can appear anywhere within a line, and its presence causes the whole line to be considered corrupted.

For example, consider the following navigation subsystem:


[({(<(())[]>[[{[]{<()<>><br>
[(()[<>])]({[<{<<[]>>(<br>
{([(<{}[<>[]}>{[]{[(<()><br>
(((({<>}<{<{<>}{[]{[]{}<br>
[[<[([]))<([[{}[[()]]]<br>
[{[{({}]{}}([{[{{{}}([]<br>
{<[[]]>}<{[{[{[]{()[[[]<br>
[<(<(<(<{}))><([]([]()<br>
<{([([[(<>()){}]>(<<{{<br>
<{([{{}}[<[[[<>{}]]]>[]]<br>


Some of the lines aren't corrupted, just incomplete; you can ignore these lines for now. The remaining five lines are corrupted:


* {([(<{}[<>[]}>{[]{[(<()> - Expected ], but found } instead.
* [[<[([]))<([[{}[[()]]] - Expected ], but found ) instead.
* [{[{({}]{}}([{[{{{}}([] - Expected ), but found ] instead.
* [<(<(<(<{}))><([]([]() - Expected >, but found ) instead.
* <{([([[(<>()){}]>(<<{{ - Expected ], but found > instead.


Stop at the first incorrect closing character on each corrupted line.


Did you know that syntax checkers actually have contests to see who can get the high score for syntax errors in a file? It's true! To calculate the syntax error score for a line, take the **first illegal** character on the line and look it up in the following table:

* ): 3 points.
* ]: 57 points.
* }: 1197 points.
* >: 25137 points.


In the above example, an illegal ) was found twice (2*3 = 6 points), an illegal ] was found once (57 points), an illegal } was found once (1197 points), and an illegal > was found once (25137 points). So, the total syntax error score for this file is 6+57+1197+25137 = 26397 points!
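
The check described above can be sketched with a stack of expected closing characters. This is an illustration on the example lines; the names `PAIRS`, `POINTS` and `first_illegal` are hypothetical.

```python
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}
POINTS = {")": 3, "]": 57, "}": 1197, ">": 25137}


def first_illegal(line):
    """Return the first illegal closing character, or None if there is none."""
    expected = []
    for ch in line:
        if ch in PAIRS:
            # remember which character must close this chunk
            expected.append(PAIRS[ch])
        elif not expected or expected.pop() != ch:
            return ch
    return None


example = ["[({(<(())[]>[[{[]{<()<>>",
           "[(()[<>])]({[<{<<[]>>(",
           "{([(<{}[<>[]}>{[]{[(<()>",
           "(((({<>}<{<{<>}{[]{[]{}",
           "[[<[([]))<([[{}[[()]]]",
           "[{[{({}]{}}([{[{{{}}([]",
           "{<[[]]>}<{[{[{[]{()[[[]",
           "[<(<(<(<{}))><([]([]()",
           "<{([([[(<>()){}]>(<<{{",
           "<{([{{}}[<[[[<>{}]]]>[]]"]
illegal = [first_illegal(line) for line in example]
score = sum(POINTS[ch] for ch in illegal if ch is not None)
```

Incomplete lines return `None` and contribute nothing to the score.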


Find the first illegal character in each corrupted line of the navigation subsystem. **What is the total syntax error score for those errors?**

In [175]:
#error dict
error_dict = {")": 3, "]": 57, "}":1197, ">":25137}

#test data
pz10_test = np.array(["[({(<(())[]>[[{[]{<()<>>",
             "[(()[<>])]({[<{<<[]>>(", 
             "{([(<{}[<>[]}>{[]{[(<()>",
             "(((({<>}<{<{<>}{[]{[]{}",
             "[[<[([]))<([[{}[[()]]]",
             "[{[{({}]{}}([{[{{{}}([]",
             "{<[[]]>}<{[{[{[]{()[[[]",
             "[<(<(<(<{}))><([]([]()",
             "<{([([[(<>()){}]>(<<{{",
             "<{([{{}}[<[[[<>{}]]]>[]]"])

In [183]:
#find illegal chunk ends
def find_illegals(data):
    #values for the chunks
    val_dict = {"(":1, ")":-1,
                "[":2, "]":-2,
                "{":3, "}":-3,
                "<":4, ">":-4}
    
    errors = []
    for string in data:
        #convert chunks to numbers
        chunk_val = [val_dict[i] for i in string]
        
        expectation = []
        for val,char in zip(chunk_val,string):
            if val > 0:
                expectation.append(val)
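            else:
                #a closer is legal only if it matches the most recent opener
                if expectation and val + expectation[-1] == 0:
                    expectation.pop()
                else:
                    #first illegal character: score it and stop reading this line
                    errors.append(error_dict[char])
                    break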
    
    print("Total syntax error:", sum(errors))
                    
                    
find_illegals(pz10_test)

Total syntax error: 26397


In [188]:
#import puzzle input
pz10_df = pd.read_csv("data/pz10.txt", header=None)
pz10_data_array = pz10_df.values.flatten()

find_illegals(pz10_data_array)

Total syntax error: 415953


## Part II

--- Part Two ---
Now, discard the corrupted lines. The remaining lines are **incomplete**.

Incomplete lines don't have any incorrect characters - instead, they're missing some closing characters at the end of the line. To repair the navigation subsystem, you just need to figure out **the sequence of closing characters** that complete all open chunks in the line.

You can only use closing characters (), ], }, or >), and you must add them in the correct order so that only legal pairs are formed and all chunks end up closed.

In the example above, there are five incomplete lines:

* [({(<(())[]>[[{[]{<()<>> - Complete by adding }}]])})].
* [(()[<>])]({[<{<<[]>>( - Complete by adding )}>]}).
* (((({<>}<{<{<>}{[]{[]{} - Complete by adding }}>}>)))).
* {<[[]]>}<{[{[{[]{()[[[] - Complete by adding ]]}}]}]}>.
* <{([{{}}[<[[[<>{}]]]>[]] - Complete by adding ])}>.
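
One way to derive these completion strings is to keep a stack of the closers still owed and read it back from the top once the line ends. The sketch below assumes the line is already known not to be corrupted; `PAIRS` and `completion` are hypothetical names used only here.

```python
# Hypothetical helper names; assumes the input line is incomplete, not corrupted.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}

def completion(line):
    """Return the closing characters needed to complete an incomplete line."""
    stack = []  # closers still owed, innermost last
    for char in line:
        if char in PAIRS:
            stack.append(PAIRS[char])
        else:
            # the line is assumed valid so far, so this closer matches the top
            stack.pop()
    # the innermost chunk must be closed first
    return "".join(reversed(stack))

print(completion("<{([{{}}[<[[[<>{}]]]>[]]"))  # ])}>
print(completion("[(()[<>])]({[<{<<[]>>("))    # )}>]})
```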


Did you know that autocomplete tools also have contests? It's true! The score is determined by considering the completion string character-by-character. Start with a total score of 0. Then, for each character, multiply the total score by 5 and then increase the total score by the point value given for the character in the following table:


* ): 1 point.
* ]: 2 points.
* }: 3 points.
* \>: 4 points.

So, the last completion string above - ])}> - would be scored as follows:


* Start with a total score of 0.
* Multiply the total score by 5 to get 0, then add the value of ] (2) to get a new total score of 2.
* Multiply the total score by 5 to get 10, then add the value of ) (1) to get a new total score of 11.
* Multiply the total score by 5 to get 55, then add the value of } (3) to get a new total score of 58.
* Multiply the total score by 5 to get 290, then add the value of > (4) to get a new total score of 294.
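
The steps above amount to reading the completion string as a base-5 number. A minimal sketch, with `CLOSE_POINTS` and `completion_score` as illustrative names:

```python
# Point table from the puzzle text
CLOSE_POINTS = {")": 1, "]": 2, "}": 3, ">": 4}

def completion_score(completion):
    """Score a completion string: multiply by 5, then add the character's points."""
    score = 0
    for char in completion:
        score = score * 5 + CLOSE_POINTS[char]
    return score

print(completion_score("])}>"))  # 294
```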


The five lines' completion strings have total scores as follows:


* }}]])})] - 288957 total points.
* )}>]}) - 5566 total points.
* }}>}>)))) - 1480781 total points.
* ]]}}]}]}> - 995444 total points.
* ])}> - 294 total points.


Autocomplete tools are an odd bunch: the winner is found by **sorting** all of the scores and then taking the **middle** score. (There will always be an odd number of scores to consider.) In this example, the middle score is **288957** because there are the same number of scores smaller and larger than it.
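
Picking the winner is a plain median of an odd-length list. A short check using the five example scores:

```python
# Scores of the five example completion strings
scores = [288957, 5566, 1480781, 995444, 294]

ordered = sorted(scores)
# with an odd number of scores, the middle one sits at index len // 2
middle = ordered[len(ordered) // 2]
print(middle)  # 288957
```

Plain Python integers are used here, so the result cannot overflow however long the completion strings get.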

Find the completion string for each incomplete line, score the completion strings, and sort the scores. **What is the middle score?**

In [None]:
#points dictionary: not needed, the codes 1-4 used for the chunks below already equal the Part II point values

In [215]:
#exclude corrupted lines 
def find_incomplete(data):
    #values for the chunks: openers are positive, matching closers are their negatives
    val_dict = {"(":1, ")":-1,
                "[":2, "]":-2,
                "{":3, "}":-3,
                "<":4, ">":-4}
    
    idx_corrupt = set()
    for i,string in enumerate(data):
        #convert chunks to numbers
        chunk_val = [val_dict[char] for char in string]
        
        expectation = []
        for val in chunk_val:
            if val > 0:
                expectation.append(val)
            else:
                #a closer that does not match the latest opener marks the line as corrupted
                if expectation and val + expectation[-1] == 0:
                    expectation.pop()
                else:
                    idx_corrupt.add(i)
                    break
    
    #keep the non-corrupted lines in their original order
    return [string for i,string in enumerate(data) if i not in idx_corrupt]

#find closing sequence
def find_closing_seq(data):
    #values for the chunks: openers are positive, matching closers are their negatives
    val_dict = {"(":1, ")":-1,
                "[":2, "]":-2,
                "{":3, "}":-3,
                "<":4, ">":-4}
    
    closing_seq = []
    for string in data:
        #convert chunks to numbers
        chunk_val = [val_dict[i] for i in string]
        
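        expectation = []
        for val in chunk_val:
            if val > 0:
                #remember every opener that still needs closing
                expectation.append(val)
            else:
                #lines are assumed not corrupted, so a closer always matches the latest opener
                expectation.pop()
        #unclosed openers, innermost first; their codes 1-4 equal the closer point values
        closing_seq.append(expectation[::-1])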
        
    return closing_seq

#find the middle score
def find_middle_score(data):
    scores = []
    for complete in data:
        tot_score = 0
        for i in complete:
            tot_score = (tot_score * 5) + i
        scores.append(tot_score)
    
    #sort score values
    scores_sorted = np.sort(scores)
    
    #extract middle value
    middle_score = scores_sorted[(len(scores_sorted) - 1) // 2]
    
    print("The middle score:", middle_score)

find_middle_score(find_closing_seq(find_incomplete(pz10_test)))

The middle score: 288957


In [216]:
find_middle_score(find_closing_seq(find_incomplete(pz10_data_array)))

The middle score: 2292863731
