# Day 8: Matchsticks

## Part 1

Space on the sleigh is limited this year, and so Santa will be bringing his list as a digital copy. He needs to know how much space it will take up when stored.

It is common in many programming languages to provide a way to escape special characters in strings. For example, C, JavaScript, Perl, Python, and even PHP handle special characters in very similar ways.

However, it is important to realize the difference between the number of characters ***in the code representation of the string literal*** and the number of characters ***in the in-memory string itself***.

For example:

- ```""``` is ```2``` characters of code (the two double quotes), but the string contains zero characters.
- ```"abc"``` is ```5``` characters of code, but ```3``` characters in the string data.
- ```"aaa\"aaa"``` is ```10``` characters of code, but the string itself contains six "a" characters and a single, escaped quote character, for a total of ```7``` characters in the string data.
- ```"\x27"``` is ```6``` characters of code, but the string itself contains just one - an apostrophe (```'```), escaped using hexadecimal notation.


Santa's list is a file that contains many double-quoted string literals, one on each line. The only escape sequences used are ```\\``` (which represents a single backslash), ```\"``` (which represents a lone double-quote character), and ```\x``` plus two hexadecimal characters (which represents a single character with that ASCII code).

Disregarding the whitespace in the file, what is **the number of characters of code for string literals** minus **the number of characters in memory for the values of the strings** in total for the entire file?

For example, given the four strings above, the total number of characters of string code (```2 + 5 + 10 + 6 = 23```) minus the total number of characters in memory for string values (```0 + 3 + 7 + 1 = 11```) is ```23 - 11 = 12```.
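
The totals above can be checked mechanically. The sketch below is one way to do it, assuming Python and the standard library's `ast.literal_eval`, which parses a literal without executing arbitrary code. The names `examples`, `code_total` and `memory_total` are illustrative only.

```python
import ast

# The four sample literals, written as raw strings so the backslashes
# reach Python exactly as they would appear in Santa's file.
examples = [r'""', r'"abc"', r'"aaa\"aaa"', r'"\x27"']

# Characters of code: the literal as written, quotes and escapes included.
code_total = sum(len(s) for s in examples)

# Characters in memory: the value each literal denotes.
memory_total = sum(len(ast.literal_eval(s)) for s in examples)

print(code_total, memory_total, code_total - memory_total)  # 23 11 12
```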

There are far more robust and less hacky ways to do this with regex or string parsing that don't involve the dreaded ```eval()```, but I spent far too long trying to escape the characters. Life's too short and I have a train to catch!

In [1]:
def get_diff(string):
    return len(string) - len(eval(string))

This returns the difference between the length of the string as it is read from the file and the length of the value it evaluates to under Python's ```eval()``` function. A simple solver applies this to each line of the input and adds the result to a running total, which gives the answer for part one.

In [2]:
def solve_part1(lines):
    count = 0
    for line in lines:
        count += get_diff(line.strip())
    print(count)

In [3]:
lines = open("day08_data.txt").readlines()

In [4]:
solve_part1(lines)

1350
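
For comparison, here is a minimal sketch of a regex route that avoids ```eval()``` entirely. It assumes the input contains only the three escape forms the puzzle allows; `ESCAPE` and `get_diff_regex` are hypothetical names and not part of the solution above.

```python
import re

# One escape sequence: \\ or \" or \x followed by two hex digits.
ESCAPE = re.compile(r'\\(?:\\|"|x[0-9a-fA-F]{2})')

def get_diff_regex(literal):
    """Code length minus in-memory length for one double-quoted literal."""
    body = literal[1:-1]  # drop the surrounding quotes
    # Each escape sequence stands for exactly one character in memory.
    decoded_len = len(ESCAPE.sub("_", body))
    return len(literal) - decoded_len

print(get_diff_regex(r'"aaa\"aaa"'))  # 3
```

Because `re.sub` scans left to right without overlapping matches, a doubled backslash is consumed as one escape before the character after it is considered.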


## Part 2

Now, let's go the other way. In addition to finding the number of characters of code, you should now **encode each code representation as a new string** and find the number of characters of the new encoded representation, including the surrounding double quotes.

For example:

- ```""``` encodes to ```"\"\""```, an increase from ```2``` characters to ```6```.
- ```"abc"``` encodes to ```"\"abc\""```, an increase from ```5``` characters to ```9```.
- ```"aaa\"aaa"``` encodes to ```"\"aaa\\\"aaa\""```, an increase from ```10``` characters to ```16```.
- ```"\x27"``` encodes to ```"\"\\x27\""```, an increase from ```6``` characters to ```11```.

Your task is to find **the total number of characters to represent the newly encoded strings** minus **the number of characters of code in each original string literal**. For example, for the strings above, the total encoded length (```6 + 9 + 16 + 11 = 42```) minus the characters in the original code representation (```23```, just like in the first part of this puzzle) is ```42 - 23 = 19```.
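
Encoding only ever adds one backslash in front of each existing ```"``` or ```\```, plus the two new surrounding quotes, so the growth per line can be computed without building the encoded string. A small sketch under that assumption (the name `encoded_growth` is illustrative), checked against the four examples:

```python
def encoded_growth(literal):
    """Extra characters needed to encode one literal as a new string."""
    # Two new enclosing quotes, plus one backslash per quote or backslash.
    return 2 + literal.count('"') + literal.count('\\')

examples = [r'""', r'"abc"', r'"aaa\"aaa"', r'"\x27"']
print(sum(encoded_growth(s) for s in examples))  # 19
```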

In [5]:
def escape_strs(string):
    return (string
            .replace('\\', '\\\\')
            .replace('"', '\\"'))

def get_diff_part_2(string):
    return len(escape_strs(string)) - len(string) + 2


def solve_part2(lines):
    count = 0
    for line in lines:
        count += get_diff_part_2(line.strip())
    print(count)

In [6]:
solve_part2(lines)

2085
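
One way to cross-check ```escape_strs``` is the standard library's `json.dumps`, which for plain ASCII text applies the same two escapes and adds the enclosing quotes. This sketch assumes the input contains only printable ASCII, as the puzzle examples do; `encoded_len` is a hypothetical helper.

```python
import json

def encoded_len(literal):
    """Length of the literal re-encoded as a double-quoted string."""
    # json.dumps escapes " and \ and wraps the result in double quotes.
    return len(json.dumps(literal))

print(encoded_len(r'"\x27"'))  # 11
```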


Returning to this later (and keeping things simple by stepping through each string character by character rather than using ```re```), an approach that avoids ```eval()``` is as follows:



In [14]:
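# Strip the trailing newline so each entry is exactly the quoted literal.
with open("day08_data.txt") as f:
    lines = [line.strip() for line in f]

def count_chars(string):
    return len(string)

def count_escapes(string):
    """Count the in-memory characters between the surrounding quotes."""
    count = 0
    c = 1  # start just after the opening quote
    while c < len(string) - 1:  # stop before the closing quote
        if string[c] == "\\":
            if string[c+1] == "x":
                c += 4  # hex escape such as \x27: backslash, "x", two hex digits
            else:
                c += 2  # \\ or \": backslash plus one character
        else:
            c += 1
        count += 1
    return count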

chars_code, chars_memory = 0, 0
for line in lines:
    chars_code   += count_chars(line)
    chars_memory += count_escapes(line)

print(chars_code-chars_memory)

1350


In [15]:
def encode_string(string):
    new_string = ""
    for c in string:
        if c == '"':
            new_string += '\\"'
        elif c == '\\':
            new_string += '\\\\'
        else:
            new_string += c
    return '"' + new_string + '"'

chars_encoded = 0
for line in lines:
    chars_encoded += len(encode_string(line))

print(chars_encoded-chars_code)

2085
