<a href="https://colab.research.google.com/github/mrklees/adventofcode/blob/master/Advent_of_Code.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advent of Code

[A code challenge a day in December.](https://adventofcode.com) All data is stored on GitHub. Run the following three cells to refresh the repo locally.

In [0]:
cd /content/adventofcode

/content/adventofcode


In [0]:
! git pull

remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 4 (delta 0), reused 4 (delta 0), pack-reused 0
Unpacking objects: 100% (4/4), done.
From https://github.com/mrklees/adventofcode
   5eb5ef3..e20905b  master     -> origin/master
Updating 5eb5ef3..e20905b
Fast-forward
 Day 2/input.txt | 250 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 250 insertions(+)
 create mode 100644 

In [0]:
cd /content

/content


## Day 1


### Part 1

We are given a sequence of signed integers, such as +1, +1, -1, and are asked to find their sum. Perhaps part of the challenge is parsing the values? In this case pandas and numpy make quick work of it: we read the file of values with pandas, convert it to a numpy array, and sum the array. This gives 536, which is the correct answer.

In [0]:
import pandas as pd

data = pd.read_csv("adventofcode/Day 1/input.txt", header=None).values

In [0]:
data.sum()

536
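
For comparison, the same sum can be sketched with the standard library alone. This is a minimal illustration rather than the notebook's method: `parse_changes` is a hypothetical helper, and the inline sample stands in for the contents of `input.txt`, assuming one signed integer per line.

```python
def parse_changes(text):
    """Parse one signed integer per line, e.g. '+1' or '-3'."""
    # int() accepts a leading '+' or '-', so no manual sign handling is needed.
    return [int(line) for line in text.splitlines() if line.strip()]

sample_text = "+1\n+1\n-1\n"  # stand-in for the real input file
changes = parse_changes(sample_text)
total = sum(changes)
print(total)  # prints 1
```

Because `int()` already understands the sign prefix, the parsing step turns out to be a single list comprehension.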

### Part 2


In [0]:
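import numpy as np

# Repeat the list of changes n_repeats times and take the running total.
# n_repeats = 500 is a guess; it must be large enough for a repeat to occur.
n_repeats = 500
running_freq = np.cumsum(np.vstack([data]*n_repeats))
freqs = {}
for freq in running_freq:
  # Count how often each running total has been seen; stop at the first repeat.
  freqs[freq] = freqs.get(freq, 0) + 1
  if freqs[freq] == 2:
    print(freq)
    break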

75108
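
The cell above looks for the first running total that occurs twice, repeating the input a fixed 500 times. Below is a sketch of an alternative that avoids guessing the repeat count, assuming the same stopping rule: cycle through the changes indefinitely and keep the totals seen so far in a set. `first_repeated_total` is a hypothetical name, and like the cell above it does not treat the starting frequency of 0 as already seen.

```python
from itertools import cycle

def first_repeated_total(changes):
    """Return the first running total that is reached a second time."""
    seen = set()
    total = 0
    # cycle() replays the changes forever; the loop never ends if no total repeats.
    for change in cycle(changes):
        total += change
        if total in seen:
            return total
        seen.add(total)

print(first_repeated_total([3, 3, 4, -2, -4]))  # prints 10
```

A set gives constant-time membership checks, so the cost grows with the number of steps taken rather than with a preallocated array.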


## Day 2

### Part 1

We are given a series of strings of the form "aabbbccdddd", and our job is to produce a checksum via the following algorithm. We process each string and identify whether it contains **exactly 2** or **exactly 3** of any character. The checksum is then the product of two counts: the number of strings with a character appearing exactly twice and the number of strings with a character appearing exactly three times.

In [0]:
import pandas as pd

data = pd.read_csv("adventofcode/Day 2/input.txt", header=None)
data.columns = ['strings']
sample = data['strings'][0]

In [0]:

data.strings.count()

250

In [0]:
def get_char_count(string):
  # Map each character to the number of times it appears in the string.
  char_count = {}
  for char in string:
    char_count[char] = char_count.get(char, 0) + 1
  return char_count

def check_char_count(char_count, target):
  # Return 1 if any character appears exactly `target` times, else 0.
  for count in char_count.values():
    if count == target:
      return 1
  return 0

def process_string(string, target):
  # Count the characters, then test the counts against the target.
  char_count = get_char_count(string)
  return check_char_count(char_count, target=target)

In [0]:
data['ExactlyTwo'] = [process_string(string, 2) for string in data['strings']]
data['ExactlyThree'] = [process_string(string, 3) for string in data['strings']]

data.ExactlyTwo.sum() * data.ExactlyThree.sum()

7163
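
The checksum rule described above can also be sketched with `collections.Counter` from the standard library. This is an illustrative alternative, not the notebook's code: `checksum` is a hypothetical name and `sample_ids` is a small made-up list, not the puzzle input.

```python
from collections import Counter

def checksum(strings):
    """Multiply the number of strings with a letter seen exactly twice
    by the number of strings with a letter seen exactly three times."""
    twos = threes = 0
    for s in strings:
        counts = set(Counter(s).values())
        twos += 2 in counts    # a True membership test adds 1
        threes += 3 in counts
    return twos * threes

sample_ids = ["abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab"]
print(checksum(sample_ids))  # prints 12 (4 strings with a pair, 3 with a triple)
```

A string such as "bababc" contributes to both counts, while a string with two different pairs, such as "aabcdd", still counts only once.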

### Part 2

We now have the task of searching our list to find two strings which differ by only one character. To narrow the search, we rank candidate pairs by a Levenshtein-based similarity score (fuzzywuzzy's `fuzz.ratio`), which we calculate for every pair.
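
To make the Levenshtein idea concrete, here is a pure-Python sketch of the textbook dynamic-programming formulation. It is for illustration only: it is not how python-Levenshtein is implemented, and `fuzz.ratio` returns a 0 to 100 similarity score rather than this raw edit distance. The function name `levenshtein` is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # previous[j] is the distance between the prefix of a handled so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

print(levenshtein("fghij", "fguij"))  # prints 1: a single substitution
```

Two strings of equal length that differ in exactly one position have distance 1, which is why ranking by a Levenshtein-based score surfaces the pair we want.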

In [0]:
! pip install python-Levenshtein
! pip install fuzzywuzzy

Collecting python-Levenshtein
  Downloading https://files.pythonhosted.org/packages/42/a9/d1785c85ebf9b7dfacd08938dd028209c34a0ea3b1bcdb895208bd40a67d/python-Levenshtein-0.12.0.tar.gz (48kB)
Building wheels for collected packages: python-Levenshtein
  Running setup.py bdist_wheel for python-Levenshtein ... done
  Stored in directory: /root/.cache/pip/wheels/de/c2/93/660fd5f7559049268ad2dc6d81c4e39e9e36518766eaf7e342
Successfully built python-Levenshtein
Installing collected packages: python-Levenshtein
Successfully installed python-Levenshtein-0.12.0


In [0]:
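import itertools
import numpy as np
from fuzzywuzzy import fuzz

# Assumption: the notebook never shows how `combinations` is built. One plausible
# construction is every unordered pair of strings, in columns str1 and str2.
combinations = pd.DataFrame(list(itertools.combinations(data['strings'], 2)), columns=['str1', 'str2'])

fuzz.ratio(sample, sample)  # sanity check: identical strings score 100

def apply_fuzz(a, b):
  return fuzz.ratio(a, b)

# Vectorize so the scorer can be applied to whole columns at once.
vfuzz = np.vectorize(apply_fuzz)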

In [0]:
%%timeit
import numpy as np

combinations["matchscore"] = vfuzz(combinations.str1.values, combinations.str2.values)
possible_row = combinations[combinations.matchscore != 100].sort_values("matchscore", ascending=False).iloc[0, :]

1 loop, best of 3: 6.53 s per loop


In [0]:
possible_row = combinations[combinations.matchscore != 100].sort_values("matchscore", ascending=False).iloc[0, :]

In [0]:
possible_row.str1, possible_row.str2

('ighfbbyijnoumxjlxevacpwqtr', 'ighfbsyijnoumxjlxevacpwqtr')
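
Since the two strings found have the same length, the "differ by only one character" condition can also be checked directly by comparing positions, without a fuzzy-matching library. The sketch below assumes that approach; `find_near_match` and `common_letters` are hypothetical helpers, the sample list is made up, and `common_letters` is an extra illustration rather than something the notebook computes.

```python
import itertools

def find_near_match(strings):
    """Return the first pair of equal-length strings that differ in exactly one position."""
    for a, b in itertools.combinations(strings, 2):
        if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1:
            return a, b
    return None  # no such pair exists

def common_letters(a, b):
    """Keep the characters that match position by position."""
    return "".join(x for x, y in zip(a, b) if x == y)

pair = find_near_match(["abcde", "fghij", "klmno", "fguij", "axcye"])
print(pair, common_letters(*pair))  # prints ('fghij', 'fguij') fgij
```

This still checks every pair in the worst case, but it can stop at the first match instead of scoring and sorting all of them.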

## Day 3

### Part 1

In this problem we have a large 2d array (at least 1000 x 1000), and we are then given a series of lines which 