# Day 1: Trebuchet?!
Something is wrong with global snow production, and you've been selected to take a look. The Elves have even given you a map; on it, they've used stars to mark the top fifty locations that are likely to be having problems.

You've been doing this long enough to know that to restore snow operations, you need to check all fifty stars by December 25th.

Collect stars by solving puzzles. Two puzzles will be made available on each day in the Advent calendar; the second puzzle is unlocked when you complete the first. Each puzzle grants one star. Good luck!

You try to ask why they can't just use a weather machine ("not powerful enough") and where they're even sending you ("the sky") and why your map looks mostly blank ("you sure ask a lot of questions") and hang on did you just say the sky ("of course, where do you think snow comes from") when you realize that the Elves are already loading you into a trebuchet ("please hold still, we need to strap you in").

As they're making the final adjustments, they discover that their calibration document (your puzzle input) has been amended by a very young Elf who was apparently just excited to show off her art skills. Consequently, the Elves are having trouble reading the values on the document.

The newly-improved calibration document consists of lines of text; each line originally contained a specific calibration value that the Elves now need to recover. On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.

For example:

1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet

In this example, the calibration values of these four lines are 12, 38, 15, and 77. Adding these together produces 142.

Consider your entire calibration document. What is the sum of all of the calibration values?

### Imports

In [2]:
import re

import pandas as pd
from aocd import get_data, submit

### Get data

In [3]:
input_string = get_data(day=1, year=2023)

### Puzzle Solution

First of all we have to split the input string on linebreaks in order to get the individual strings.

In [4]:
list_of_strings = input_string.split('\n')
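A small aside on the split: `str.split('\n')` leaves a trailing empty string if the text ends with a newline, whereas `str.splitlines()` does not. The sketch below uses a made-up `raw` string (an assumption, not the real puzzle input) to show the difference. Whether the real input ends with a newline depends on how it was loaded, so treat this as a precaution rather than a required change.

```python
# Hypothetical input text with a trailing newline (not the real puzzle input).
raw = "1abc2\npqr3stu8vwx\n"

# split('\n') keeps an empty string after the final newline.
with_split = raw.split("\n")

# splitlines() drops it, so every element is a real line.
with_splitlines = raw.splitlines()

print(with_split)       # ['1abc2', 'pqr3stu8vwx', '']
print(with_splitlines)  # ['1abc2', 'pqr3stu8vwx']
```

An empty line would matter later on, because a line without any digit makes the digit lookup return `None`.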

After that, I tried a few things to get the first character in the string that is a digit.
The approach I settled on loops through the characters until it finds the first digit and returns it.

In [5]:
def extract_first_digit(text):
    for char in text:
        if char.isdigit():
            return str(char)

    return None

Now we also have to find the last digit. We can use a simple trick to get it done: the last digit is actually the first digit when reading from behind. So we just reverse the string with slicing notation and call the `extract_first_digit` function on it.

In [6]:
def extract_last_digit(text):
    reversed_text = text[::-1]
    return extract_first_digit(reversed_text)

Here we can test the function on a few sample inputs.

In [7]:
sample = list_of_strings[0]
sample

'threerznlrhtkjp23mtflmbrzq395three'

In [8]:
extract_first_digit(sample)

'2'

In [9]:
extract_last_digit(sample)

'5'

It is also important that both functions return strings, so that `+` concatenates them. If we returned integers, the following code would just add the two digits together. But that's not what we want.

In [10]:
extract_first_digit(sample) + extract_last_digit(sample)

'25'
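As a quick sanity check, the two helpers can be run against the four example lines from the puzzle description, which should give 12, 38, 15, and 77 with a total of 142. This is a minimal, pandas-free sketch; the names `example_lines` and `calibration_values` are only used here for illustration.

```python
def extract_first_digit(text):
    """Return the first digit character in text, or None if there is none."""
    for char in text:
        if char.isdigit():
            return char
    return None


def extract_last_digit(text):
    """Return the last digit character by scanning the reversed string."""
    return extract_first_digit(text[::-1])


# Example lines from the puzzle description.
example_lines = ["1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet"]

# Concatenate the two digits as strings first, then convert to int.
calibration_values = [
    int(extract_first_digit(line) + extract_last_digit(line))
    for line in example_lines
]

print(calibration_values)       # [12, 38, 15, 77]
print(sum(calibration_values))  # 142
```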

For convenience, I put the data into a pandas.DataFrame, because we can then apply the functions to all values in one go. (I also have a data science background, so pandas is where I feel comfortable.)

In [11]:
df = pd.DataFrame()
df['strings'] = list_of_strings
df.head()

| | strings |
|---|---|
| 0 | threerznlrhtkjp23mtflmbrzq395three |
| 1 | 9sevenvlttm |
| 2 | 3twochzbv |
| 3 | mdxdlh5six5nqfld9bqzxdqxfour |
| 4 | 422268 |


In [12]:
df['first_digit'] = df['strings'].apply(extract_first_digit)
df['last_digit'] = df['strings'].apply(extract_last_digit)

In [13]:
df['two_digits'] = df['first_digit'] + df['last_digit']
df.head()

| | strings | first_digit | last_digit | two_digits |
|---|---|---|---|---|
| 0 | threerznlrhtkjp23mtflmbrzq395three | 2 | 5 | 25 |
| 1 | 9sevenvlttm | 9 | 9 | 99 |
| 2 | 3twochzbv | 3 | 3 | 33 |
| 3 | mdxdlh5six5nqfld9bqzxdqxfour | 5 | 9 | 59 |
| 4 | 422268 | 4 | 8 | 48 |


In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   strings      1000 non-null   object
 1   first_digit  1000 non-null   object
 2   last_digit   1000 non-null   object
 3   two_digits   1000 non-null   object
dtypes: object(4)
memory usage: 31.4+ KB


We can see that two_digits is still a string column (pandas reports it as `object`, which most of the time means strings). But now we want to calculate the sum of all the two-digit numbers. In order to do that, we have to convert the data type to integer.

In [15]:
df['two_digit_number'] = df['two_digits'].astype(int)

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   strings           1000 non-null   object
 1   first_digit       1000 non-null   object
 2   last_digit        1000 non-null   object
 3   two_digits        1000 non-null   object
 4   two_digit_number  1000 non-null   int32 
dtypes: int32(1), object(4)
memory usage: 35.3+ KB


Now we can sum the numbers and get our sum of calibration_values.

In [17]:
sum_of_calibration_values = df['two_digit_number'].sum()

In [18]:
sum_of_calibration_values

55971

## Submission
Lastly, we can submit our answer and check if it's correct.

In [48]:
submit(sum_of_calibration_values)

answer a: None
submitting for part a
coerced int64 value 55971 for 2023/01


That's the right answer!  You are one gold star closer to restoring snow operations. [Continue to Part Two]


<urllib3.response.HTTPResponse at 0x1e9d8190ca0>

# --- Part Two ---
Your calculation isn't quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid "digits".

Equipped with this new information, you now need to find the real first and last digit on each line. For example:

two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen

In this example, the calibration values are 29, 83, 13, 24, 42, 14, and 76. Adding these together produces 281.

What is the sum of all of the calibration values?

## Puzzle Solution

Now we have to account for written digits in the input strings. How can we do that?
We could just replace the written digits with the actual digits. After that, our solution for Part 1 should work fine. So let's try it.

In [19]:
digit_dict = {
    'one':'1',
    'two':'2',
    'three':'3',
    'four':'4',
    'five':'5',
    'six':'6',
    'seven':'7',
    'eight':'8',
    'nine':'9'
}

In [20]:
sample

'threerznlrhtkjp23mtflmbrzq395three'

In [21]:
adjusted_sample = sample

for key in digit_dict.keys():
    if key in sample:
        adjusted_sample = adjusted_sample.replace(key, digit_dict[key])

adjusted_sample

'3rznlrhtkjp23mtflmbrzq3953'

In [22]:
def replace_written_digits(text, replace_dict):
    adjusted_text = text

    for key in replace_dict.keys():
        if key in text:
            # Use the mapping passed in as replace_dict for the lookup.
            adjusted_text = adjusted_text.replace(key, replace_dict[key])

    return adjusted_text

In [23]:
replace_written_digits(sample, digit_dict)

'3rznlrhtkjp23mtflmbrzq3953'

In [24]:
df['adjusted_strings'] = df['strings'].apply(replace_written_digits, replace_dict=digit_dict)

In [25]:
df['adjusted_first_digit'] = df['adjusted_strings'].apply(extract_first_digit)
df['adjusted_last_digit'] = df['adjusted_strings'].apply(extract_last_digit)
df['adjusted_two_digits'] = df['adjusted_first_digit'] + df['adjusted_last_digit']
df['adjusted_two_digit_number'] = df['adjusted_two_digits'].astype(int)

In [26]:
df.head()

| | strings | first_digit | last_digit | two_digits | two_digit_number | adjusted_strings | adjusted_first_digit | adjusted_last_digit | adjusted_two_digits | adjusted_two_digit_number |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | threerznlrhtkjp23mtflmbrzq395three | 2 | 5 | 25 | 25 | 3rznlrhtkjp23mtflmbrzq3953 | 3 | 3 | 33 | 33 |
| 1 | 9sevenvlttm | 9 | 9 | 99 | 99 | 97vlttm | 9 | 7 | 97 | 97 |
| 2 | 3twochzbv | 3 | 3 | 33 | 33 | 32chzbv | 3 | 2 | 32 | 32 |
| 3 | mdxdlh5six5nqfld9bqzxdqxfour | 5 | 9 | 59 | 59 | mdxdlh565nqfld9bqzxdqx4 | 5 | 4 | 54 | 54 |
| 4 | 422268 | 4 | 8 | 48 | 48 | 422268 | 4 | 8 | 48 | 48 |


In [27]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   strings                    1000 non-null   object
 1   first_digit                1000 non-null   object
 2   last_digit                 1000 non-null   object
 3   two_digits                 1000 non-null   object
 4   two_digit_number           1000 non-null   int32 
 5   adjusted_strings           1000 non-null   object
 6   adjusted_first_digit       1000 non-null   object
 7   adjusted_last_digit        1000 non-null   object
 8   adjusted_two_digits        1000 non-null   object
 9   adjusted_two_digit_number  1000 non-null   int32 
dtypes: int32(2), object(8)
memory usage: 70.4+ KB


In [28]:
adjusted_sum_of_calibration_values = df['adjusted_two_digit_number'].sum()

In [29]:
adjusted_sum_of_calibration_values

54303

## Submission Part 2

In [86]:
submit(adjusted_sum_of_calibration_values)

answer a: 55971
submitting for part b (part a is already completed)
coerced int64 value 54303 for 2023/01
wrong answer: That's not the right answer; your answer is too low.  If you're stuck, make sure you're using the full input data; there are also some general tips on the about page, or you can ask for hints on the subreddit.  Please wait one minute before trying again. [Return to Day 1]


<urllib3.response.HTTPResponse at 0x1e9db73f040>

The answer is wrong, so what did we miss?
There are actually strings that have overlapping written digits, such as `oneight`, where `one` and `eight` share a letter. That means that if we replace one of them, the other written digit gets "destroyed" and we cannot find it anymore. Reading the puzzle description and the examples carefully would have helped. One lesson learned.

In [30]:
replace_written_digits('threefourrznlrhtkjp23mtflmbrzq3953', digit_dict)

'34rznlrhtkjp23mtflmbrzq3953'

In [31]:
replace_written_digits('oneightrznlrhtkjp23mtflmbrzq3953', digit_dict)

'1ightrznlrhtkjp23mtflmbrzq3953'
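The same effect shows up on two of the example lines from the Part Two description. The sketch below is a condensed variant of the replacement function (it iterates over the dict items directly, which behaves the same for these inputs) and is only meant to illustrate the failure, not to replace the code above.

```python
digit_dict = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def replace_written_digits(text, replace_dict):
    """Replace every written digit, in dict order (one, two, three, ...)."""
    adjusted_text = text
    for word, digit in replace_dict.items():
        adjusted_text = adjusted_text.replace(word, digit)
    return adjusted_text


# 'two' is replaced before 'eight', which destroys the shared 't'.
print(replace_written_digits("eightwothree", digit_dict))  # eigh23
# 'one' is replaced before 'two', which destroys the shared 'o'.
print(replace_written_digits("xtwone3four", digit_dict))   # xtw134
```

With these results the calibration values would be 23 and 14, while the puzzle description expects 83 and 24.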

Let's try to find all the written digits, and then insert the "real digit" before the first occurrence and after the last one.
This should keep the order intact, and all the other code should work just fine.

First, let's write a function that inserts a string at a given index. We can use slicing to accomplish that.

In [34]:
def insert_string(to_insert, index, string):
    # Split the string at index and put to_insert in between.
    return string[:index] + to_insert + string[index:]

Now we need a way to find the starting index for our written digits.

In [36]:
sample = 'oneighthreerznlrhtkjp23mtflmbrzq3953'

In [33]:
sample.find('one')

0

In [35]:
insert_string('1', sample.find('one'), sample)

'1oneighthreerznlrhtkjp23mtflmbrzq3953'

In [37]:
insert_string('3', sample.find('three'), sample)

'oneigh3threerznlrhtkjp23mtflmbrzq3953'

With the regex `findall` function we can efficiently get all the occurrences of our written digits, including overlapping ones, because the pattern wraps the alternatives in a lookahead.
Conveniently, the resulting list is in order of appearance. So we can just grab the first and last element of the list, and we have our first and last written digits.

I got the regex pattern from __[stackoverflow](https://stackoverflow.com/questions/33406313/how-to-match-any-string-from-a-list-of-strings-in-regular-expressions-in-python)__.

In [38]:
list_of_substrings = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']
pattern = r'(?=(' + '|'.join(list_of_substrings) + r'))'

In [39]:
re.findall(pattern, sample)

['one', 'eight', 'three']
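To see why the lookahead matters, it helps to compare it with a plain alternation. A plain pattern consumes the characters it matches, so a written digit that shares a letter with the previous match is skipped; the lookahead `(?=(...))` matches at a position without consuming anything. The names `plain_pattern` and `lookahead_pattern` below are just for this comparison.

```python
import re

words = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]

# Plain alternation: each match consumes its characters.
plain_pattern = "|".join(words)

# Lookahead with a capture group: matches are zero-width, so overlaps survive.
lookahead_pattern = r"(?=(" + "|".join(words) + r"))"

text = "oneighthree"

plain_matches = re.findall(plain_pattern, text)
overlapping_matches = re.findall(lookahead_pattern, text)

print(plain_matches)        # ['one', 'three']
print(overlapping_matches)  # ['one', 'eight', 'three']
```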

In [46]:
sample

'oneighthreerznlrhtkjp23mtflmbrzq3953'

In [43]:
sample.find('three')

6

In order to locate the last written digit, we need to use our reversing trick again.
Otherwise, if the same written digit occurs multiple times, `find` would give us the index of its first occurrence.

See the example:

In [150]:
example = 'one45three67three89'

In [151]:
re.findall(pattern, example)

['one', 'three', 'three']

In [152]:
example.find('three')

5

The index is 5, the start of the first `three`, instead of 12, where the last `three` starts.

We have to reverse the string as well as the written digit in order to find the last occurrence.

In [155]:
example[::-1]

'98eerht76eerht54eno'

In [156]:
'three'[::-1]

'eerht'

In [157]:
example[::-1].find('three'[::-1])

2

The result, 2, is the number of characters that follow the last written digit. So we also have to count from the back of the original string (not the reversed one) to get the right position.
With the slicing notation we can just add a minus in front of the index.

In [158]:
example[:-2]

'one45three67three'

In [159]:
insert_string('3', -2, example)

'one45three67three389'
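As an aside, Python's built-in `str.rfind` offers another way to reach the same insertion point without reversing anything. This is an alternative sketch, not the approach used in the rest of this post; `start_of_last` and `insert_at` are names chosen for the illustration.

```python
example = "one45three67three89"
word = "three"

# rfind returns the start index of the LAST occurrence of word.
start_of_last = example.rfind(word)

# The digit goes right after that occurrence.
insert_at = start_of_last + len(word)

result = example[:insert_at] + "3" + example[insert_at:]

print(start_of_last)  # 12
print(insert_at)      # 17
print(result)         # one45three67three389
```

Because `insert_at` is a positive index, it keeps working when the written digit sits at the very end of the string.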

Now we put all of that together into a function. After a little testing, I noticed some weird behaviour in cases where the last written digit sits at the very end of the string.
In that case the reversed search returns the index 0, and our nifty trick with counting from behind (negative indexing) results in -0, which is also 0, so the digit would be inserted at the start of the string. Therefore, I included an if statement to account for those cases.
Then everything works as expected.
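The pitfall described above can be reproduced in a few lines. The sketch uses a shortened string, `'7btwo'`, and a copy of the insert helper with the parameter named `to_insert` (both chosen for this illustration). It also shows one possible alternative to the if statement, namely converting to a positive index with `len(text) - last_idx`; that variant is an assumption of this sketch, not what the function below does.

```python
def insert_string(to_insert, index, string):
    """Insert to_insert into string at the given slice index."""
    return string[:index] + to_insert + string[index:]


text = "7btwo"

# Number of characters that follow the last 'two' (found via the reversed string).
last_idx = text[::-1].find("two"[::-1])
print(last_idx)  # 0

# -0 is just 0, so the digit lands at the START of the string.
wrong = insert_string("2", -last_idx, text)
print(wrong)  # 27btwo

# A positive index counted from the front does not have this problem.
right = insert_string("2", len(text) - last_idx, text)
print(right)  # 7btwo2
```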

In [160]:
def replace_first_last_written_digits(text):
    written_digits = re.findall(pattern, text)
    if len(written_digits) > 0:
        first = written_digits[0]
        last = written_digits[-1]
        first_idx = text.find(first)
        text_inserted = insert_string(digit_dict[first], first_idx, text)
        last_idx = text_inserted[::-1].find(last[::-1])
        # If last_idx is 0, then -last_idx is also 0 and the digit would land at the start, so we append it instead.
        if last_idx == 0:
            return text_inserted + digit_dict[last]
        else:
            return insert_string(digit_dict[last], -last_idx, text_inserted)
    else:
        return text

In [161]:
replace_first_last_written_digits(sample)

'1oneighthree3rznlrhtkjp23mtflmbrzq3953'

In [162]:
replace_first_last_written_digits('9seventhree')

'97seventhree3'

In [163]:
replace_first_last_written_digits('6hvfbrqccktfqhnone7btwo')

'6hvfbrqccktfqhn1one7btwo2'

In [164]:
re.findall(pattern, 'one')

['one']

In [165]:
'one'.find('one')

0

In [166]:
'1one'[::-1]

'eno1'

In [167]:
'one'[::-1]

'eno'

In [168]:
'1one'[::-1].find('one'[::-1])

0

In [169]:
replace_first_last_written_digits('12345')

'12345'

In [170]:
replace_first_last_written_digits('')

''
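One more check that is worth doing before touching the full input: running the function against the seven example lines from the Part Two description, which should produce 29, 83, 13, 24, 42, 14, and 76 with a total of 281. The sketch below restates the pieces it needs so it runs on its own; `calibration_value` is a hypothetical helper added only for this check.

```python
import re

digit_dict = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
pattern = r"(?=(" + "|".join(digit_dict) + r"))"


def insert_string(to_insert, index, string):
    return string[:index] + to_insert + string[index:]


def replace_first_last_written_digits(text):
    written_digits = re.findall(pattern, text)
    if not written_digits:
        return text
    first, last = written_digits[0], written_digits[-1]
    # Put the real digit in front of the first written digit.
    text_inserted = insert_string(digit_dict[first], text.find(first), text)
    # Count how many characters follow the last written digit.
    last_idx = text_inserted[::-1].find(last[::-1])
    if last_idx == 0:
        return text_inserted + digit_dict[last]
    return insert_string(digit_dict[last], -last_idx, text_inserted)


def calibration_value(text):
    """Hypothetical helper: first and last digit of the adjusted string."""
    digits = [c for c in replace_first_last_written_digits(text) if c.isdigit()]
    return int(digits[0] + digits[-1])


example_lines = [
    "two1nine", "eightwothree", "abcone2threexyz", "xtwone3four",
    "4nineeightseven2", "zoneight234", "7pqrstsixteen",
]
values = [calibration_value(line) for line in example_lines]

print(values)       # [29, 83, 13, 24, 42, 14, 76]
print(sum(values))  # 281
```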

In [171]:
df['adjusted_strings'] = df['strings'].apply(replace_first_last_written_digits)

In [172]:
df['adjusted_first_digit'] = df['adjusted_strings'].apply(extract_first_digit)
df['adjusted_last_digit'] = df['adjusted_strings'].apply(extract_last_digit)
df['adjusted_two_digits'] = df['adjusted_first_digit'] + df['adjusted_last_digit']
df['adjusted_two_digit_number'] = df['adjusted_two_digits'].astype(int)

In [173]:
df[['strings', 'adjusted_strings', 'adjusted_first_digit',
    'adjusted_last_digit', 'adjusted_two_digits',
    'adjusted_two_digit_number']].sample(10)

| | strings | adjusted_strings | adjusted_first_digit | adjusted_last_digit | adjusted_two_digits | adjusted_two_digit_number |
|---|---|---|---|---|---|---|
| 777 | six4two | 6six4two2 | 6 | 2 | 62 | 62 |
| 33 | threezdbbhkrnrq4seven | 3threezdbbhkrnrq4seven7 | 3 | 7 | 37 | 37 |
| 979 | cpcnkvdbrqrxtfnmzbqgffivesix91fivehgrv | cpcnkvdbrqrxtfnmzbqgf5fivesix91five5hgrv | 5 | 5 | 55 | 55 |
| 776 | 4fivefiveglchzczdstone | 45fivefiveglchzczdstone1 | 4 | 1 | 41 | 41 |
| 65 | eight4mscvrpr7 | 8eight84mscvrpr7 | 8 | 7 | 87 | 87 |
| 824 | six5eightjtnq | 6six5eight8jtnq | 6 | 8 | 68 | 68 |
| 274 | 6pjxcpkpdh | 6pjxcpkpdh | 6 | 6 | 66 | 66 |
| 814 | fourtwolkxrtzdsninenine5pznzrqbcmnph | 4fourtwolkxrtzdsninenine95pznzrqbcmnph | 4 | 5 | 45 | 45 |
| 213 | kqfxgjpnttwo84one | kqfxgjpnt2two84one1 | 2 | 1 | 21 | 21 |
| 643 | sevensevenmjrzvbkkknkfbq2seven5vms5 | 7sevensevenmjrzvbkkknkfbq2seven75vms5 | 7 | 5 | 75 | 75 |


In [174]:
adjusted_sum_of_calibration_values = df['adjusted_two_digit_number'].sum()
adjusted_sum_of_calibration_values

54719
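For comparison, here is a more direct variant that skips the string insertion entirely: it adds `\d` to the lookahead pattern, so numeric and written digits end up in one ordered list, and then maps the first and last entries to digits. This is an alternative sketch, not the method used above, and the names `WORD_TO_DIGIT`, `TOKEN_PATTERN`, and `calibration_value` are made up for it. It is only checked against the example lines from the puzzle description here.

```python
import re

WORD_TO_DIGIT = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

# Zero-width lookahead that captures either a numeric digit or a written one.
TOKEN_PATTERN = re.compile(r"(?=(\d|" + "|".join(WORD_TO_DIGIT) + r"))")


def calibration_value(line):
    """Combine the first and last digit (numeric or written) of a line."""
    tokens = TOKEN_PATTERN.findall(line)
    # Raises IndexError for a line without any digit; fine for a sketch.
    first, last = tokens[0], tokens[-1]
    # Written digits are mapped to characters, numeric digits pass through.
    return int(WORD_TO_DIGIT.get(first, first) + WORD_TO_DIGIT.get(last, last))


example_lines = [
    "two1nine", "eightwothree", "abcone2threexyz", "xtwone3four",
    "4nineeightseven2", "zoneight234", "7pqrstsixteen",
]
total = sum(calibration_value(line) for line in example_lines)

print(total)  # 281
```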

## Submission Part 2 Second Try

In [146]:
submit(adjusted_sum_of_calibration_values)

answer a: 55971
submitting for part b (part a is already completed)
coerced int64 value 54719 for 2023/01


That's the right answer!  You are one gold star closer to restoring snow operations. You have completed Day 1! You can [Share on Twitter Mastodon] this victory or [Return to Your Advent Calendar].


<urllib3.response.HTTPResponse at 0x203c4da1ed0>

## Great! It worked!

I hope you enjoyed this blog, but either way let me know.
I am also super interested in your solutions and in any feedback/advice/question you might have for me.
Have fun learning!