Space on the sleigh is limited this year, and so Santa will be bringing his list as a digital copy. He needs to know how much space it will take up when stored.

It is common in many programming languages to provide a way to escape special characters in strings. For example, C, JavaScript, Perl, Python, and even PHP handle special characters in very similar ways.

However, it is important to realize the difference between the number of characters in the code representation of the string literal and the number of characters in the in-memory string itself.

For example:

"" is 2 characters of code (the two double quotes), but the string contains zero characters.
"abc" is 5 characters of code, but 3 characters in the string data.
"aaa\"aaa" is 10 characters of code, but the string itself contains six "a" characters and a single, escaped quote character, for a total of 7 characters in the string data.
"\x27" is 6 characters of code, but the string itself contains just one - an apostrophe ('), escaped using hexadecimal notation.

Santa's list is a file that contains many double-quoted string literals, one on each line. The only escape sequences used are \\ (which represents a single backslash), \" (which represents a lone double-quote character), and \x plus two hexadecimal characters (which represents a single character with that ASCII code).
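As a minimal sketch of those three escape rules (not the approach taken in the notebook cells further down), the in-memory length of one literal can be counted with a regular expression. The name `memory_length` and the pattern `ESCAPE` are illustrative assumptions, and the sketch assumes every line is a well-formed literal that uses only the escapes listed above.

```python
import re

# Matches one escape sequence: \\ or \" or \x followed by two hex digits.
ESCAPE = re.compile(r'\\(?:\\|"|x[0-9a-fA-F]{2})')


def memory_length(literal: str) -> int:
    """Return the number of characters the quoted literal holds in memory."""
    body = literal[1:-1]  # drop the surrounding double quotes
    # Each escape sequence collapses to a single in-memory character.
    return len(ESCAPE.sub("?", body))


for literal in ['""', '"abc"', r'"aaa\"aaa"', r'"\x27"']:
    print(literal, len(literal), memory_length(literal))
```

Because the pattern is applied left to right without overlapping, a doubled backslash is consumed as one escape before anything after it is considered.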

Disregarding the whitespace in the file, what is the number of characters of code for string literals minus the number of characters in memory for the values of the strings in total for the entire file?

For example, given the four strings above, the total number of characters of string code (2 + 5 + 10 + 6 = 23) minus the total number of characters in memory for string values (0 + 3 + 7 + 1 = 11) is 23 - 11 = 12.
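The arithmetic above can be cross-checked in a few lines. This sketch leans on Python's own `ast.literal_eval` to parse each literal, which is a shortcut rather than a faithful implementation of the puzzle: Python understands more escape sequences than the three the puzzle allows, so the result only matches the puzzle's rules for input restricted to those three.

```python
import ast

examples = ['""', '"abc"', r'"aaa\"aaa"', r'"\x27"']

code_total = sum(len(literal) for literal in examples)
# literal_eval parses each line as a Python string literal and returns its value.
memory_total = sum(len(ast.literal_eval(literal)) for literal in examples)

print(code_total, memory_total, code_total - memory_total)  # 23 11 12
```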

In [29]:
strings_import = []

with open("Day8Input.txt", encoding="utf-8") as file:
    for line in file:
        strings_import.append(line.strip())
        
print(strings_import)

['"azlgxdbljwygyttzkfwuxv"', '"v\\xfb\\"lgs\\"kvjfywmut\\x9cr"', '"merxdhj"', '"dwz"', '"d\\\\gkbqo\\\\fwukyxab\\"u"', '"k\\xd4cfixejvkicryipucwurq\\x7eq"', '"nvtidemacj\\"hppfopvpr"', '"kbngyfvvsdismznhar\\\\p\\"\\"gpryt\\"jaeh"', '"khre\\"o\\x0elqfrbktzn"', '"nugkdmqwdq\\x50amallrskmrxoyo"', '"jcrkptrsasjp\\\\\\"cwigzynjgspxxv\\\\vyb"', '"ramf\\"skhcmenhbpujbqwkltmplxygfcy"', '"aqjqgbfqaxga\\\\fkdcahlfi\\"pvods"', '"pcrtfb"', '"\\x83qg\\"nwgugfmfpzlrvty\\"ryoxm"', '"fvhvvokdnl\\\\eap"', '"kugdkrat"', '"seuxwc"', '"vhioftcosshaqtnz"', '"gzkxqrdq\\\\uko\\"mrtst"', '"znjcomvy\\x16hhsenmroswr"', '"clowmtra"', '"\\xc4"', '"jpavsevmziklydtqqm"', '"egxjqytcttr\\\\ecfedmmovkyn\\"m"', '"mjulrvqgmsvmwf"', '"o\\\\prxtlfbatxerhev\\xf9hcl\\x44rzmvklviv"', '"lregjexqaqgwloydxdsc\\\\o\\"dnjfmjcu"', '"lnxluajtk\\x8desue\\\\k\\x7abhwokfhh"', '"wrssfvzzn\\"llrysjgiu\\"npjtdli"', '"\\x67lwkks"', '"bifw\\"ybvmwiyi\\"vhol\\"vol\\xd4"', '"aywdqhvtvcpvbewtwuyxrix"', '"gc\\xd3\\"caukdgfdywj"', '"uczy\\\\fk"

In [30]:
# Length of each literal exactly as it appears in the file, quotes included.
code_length = []
for string in strings_import:
    code_length.append(len(string))

total_code_length = sum(code_length)

memory_length = []

for string in strings_import:
    # Count the characters that exist in the code but not in memory.
    number_to_ignore = 0
    current_char = 0
    while current_char < len(string) - 1:
        if string[current_char] == "\\":
            if string[current_char + 1] in ("\\", "\""):
                # \\ and \" take two characters of code for one in memory.
                number_to_ignore += 1
                current_char += 1
            elif string[current_char + 1] == "x":
                # \xNN takes four characters of code for one in memory.
                number_to_ignore += 3
                current_char += 3
        current_char += 1
    # The extra 2 accounts for the surrounding double quotes.
    memory_length.append(len(string) - number_to_ignore - 2)

total_memory_length = sum(memory_length)

for index in range(0, len(strings_import)):
    print(f"{strings_import[index]} = {code_length[index]}, {memory_length[index]}")

print(f"Total number of characters in code: {total_code_length}, total number of characters in memory: {total_memory_length}. Characters in code minus characters in memory: {total_code_length - total_memory_length}")

"azlgxdbljwygyttzkfwuxv" = 24, 22
"v\xfb\"lgs\"kvjfywmut\x9cr" = 28, 18
"merxdhj" = 9, 7
"dwz" = 5, 3
"d\\gkbqo\\fwukyxab\"u" = 23, 18
"k\xd4cfixejvkicryipucwurq\x7eq" = 32, 24
"nvtidemacj\"hppfopvpr" = 23, 20
"kbngyfvvsdismznhar\\p\"\"gpryt\"jaeh" = 38, 32
"khre\"o\x0elqfrbktzn" = 22, 16
"nugkdmqwdq\x50amallrskmrxoyo" = 30, 25
"jcrkptrsasjp\\\"cwigzynjgspxxv\\vyb" = 37, 32
"ramf\"skhcmenhbpujbqwkltmplxygfcy" = 35, 32
"aqjqgbfqaxga\\fkdcahlfi\"pvods" = 32, 28
"pcrtfb" = 8, 6
"\x83qg\"nwgugfmfpzlrvty\"ryoxm" = 32, 25
"fvhvvokdnl\\eap" = 17, 14
"kugdkrat" = 10, 8
"seuxwc" = 8, 6
"vhioftcosshaqtnz" = 18, 16
"gzkxqrdq\\uko\"mrtst" = 22, 18
"znjcomvy\x16hhsenmroswr" = 25, 20
"clowmtra" = 10, 8
"\xc4" = 6, 1
"jpavsevmziklydtqqm" = 20, 18
"egxjqytcttr\\ecfedmmovkyn\"m" = 30, 26
"mjulrvqgmsvmwf" = 16, 14
"o\\prxtlfbatxerhev\xf9hcl\x44rzmvklviv" = 40, 31
"lregjexqaqgwloydxdsc\\o\"dnjfmjcu" = 35, 31
"lnxluajtk\x8desue\\k\x7abhwokfhh" = 34, 25
"wrssfvzzn\"llrysjgiu\"npjtdli" = 31, 27
"\x67lwkks" 

In [31]:
test_strings = ['""', '"abc"', r'"aaa\"aaa"', r'"\x27"']
encoded_strings = []

for string in test_strings:
    # Escape every backslash and double quote, including the original outer
    # quotes, then wrap the result in a new pair of double quotes.
    final_string = "\""
    for char in string:
        if char == "\\" or char == "\"":
            final_string += "\\"
        final_string += char
    final_string += "\""
    encoded_strings.append(final_string)

# A \x escape needs no special handling here: only its backslash is escaped,
# and that extra character is already part of the encoded string's length.
total_test_length = sum(len(string) for string in test_strings)
total_encoded_length = sum(len(string) for string in encoded_strings)

for index in range(0, len(test_strings)):
    print(f"{test_strings[index]} ({len(test_strings[index])}) = {encoded_strings[index]}  ({len(encoded_strings[index])})")

print(f"Total number of characters in encoded: {total_encoded_length}, total number of characters in code: {total_test_length}. Characters in encoded minus characters in code: {total_encoded_length - total_test_length}")

"" (2) = "\"\""  (6)
"abc" (5) = "\"abc\""  (9)
"aaa\"aaa" (10) = "\"aaa\\\"aaa\""  (16)
"\x27" (6) = "\"\\x27\""  (11)
Total number of characters in encoded: 42, total number of characters in code: 23. Characters in encoded minus characters in code: 19
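
Since encoding only ever adds characters, the same difference of 19 can be reached by counting alone: each backslash or double quote in the original literal gains one escaping backslash, and the new surrounding quotes add two more. A minimal sketch under that observation, where `encoded_length` is a hypothetical helper name rather than anything defined in the cells above:

```python
def encoded_length(literal: str) -> int:
    """Length of the literal after escaping it and wrapping it in new quotes."""
    escapes_needed = literal.count("\\") + literal.count('"')
    return len(literal) + escapes_needed + 2  # +2 for the new outer quotes


examples = ['""', '"abc"', r'"aaa\"aaa"', r'"\x27"']
difference = sum(encoded_length(s) - len(s) for s in examples)
print(difference)  # 19
```

This avoids building the encoded strings at all, which is enough when only the length difference is needed.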
