# ***Advent of Code 2023 - Day 1***

***Objective***:
For each line of input, obtain the `calibration value` by combining the `first digit` and the `last digit`, in that order, to form a single two-digit number.

For example:

| Input line | Calibration value |
|-------|-----------------|
| 1abc2 | 12 |
| pqr3stu8vwx | 38 |
| a1b2c3d4e5f | 15 |
| treb7uchet | 77 |

What is the sum of the calibration values across all lines of the given input?
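As a quick sanity check, the rule can be sketched on the four example rows from the table above. This is a minimal pure-Python sketch, not the pandas approach used below; the helper name `calibration_value` is hypothetical, and returning 0 for a line without digits is an assumption.

```python
import string


def calibration_value(line: str) -> int:
    """Combine the first and last digit characters of `line` into a two-digit number."""
    digits = [ch for ch in line if ch in string.digits]
    if not digits:
        return 0  # assumption: a line with no digit contributes nothing
    # With a single digit, digits[0] and digits[-1] are the same character.
    return int(digits[0] + digits[-1])


examples = ["1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet"]
values = [calibration_value(line) for line in examples]
print(values)       # [12, 38, 15, 77]
print(sum(values))  # 142
```

The printed values match the example table, and their sum is the kind of total the puzzle asks for.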

## ***Day 1 - Part A***

### Step 1: Obtain input data

In [26]:
# Obtain input data from input.txt file
import pandas as pd
import numpy as np

df = pd.read_csv('input.txt', header=None)
df.head()

|   | 0 |
|---|---|
| 0 | xt36five77 |
| 1 | two8five6zfrtjj |
| 2 | eightthree8fiveqjgsdzgnnineeight |
| 3 | 7chmvlhnpfive |
| 4 | 1tcrgthmeight5mssseight |


### Step 2: Extract digit from each line

In [27]:
# Use a regular expression to extract the digit characters from the string on each line
# Note: .apply with a lambda calls re.findall once per row in Python; it is not a truly vectorized operation
import re
df['digit'] = df[0].apply(lambda x: re.findall(r'\d', x))
print(df.head())

                                  0         digit
0                        xt36five77  [3, 6, 7, 7]
1                   two8five6zfrtjj        [8, 6]
2  eightthree8fiveqjgsdzgnnineeight           [8]
3                     7chmvlhnpfive           [7]
4           1tcrgthmeight5mssseight        [1, 5]
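It may help to see exactly what `re.findall(r'\d', ...)` returns before building on it. The sketch below reuses three rows from the output above and needs only the standard library; the variable names are illustrative.

```python
import re

# Sample rows copied from the df.head() output above
rows = ["xt36five77", "two8five6zfrtjj", "7chmvlhnpfive"]

# re.findall returns every non-overlapping match, in order, as a list of strings
extracted = [re.findall(r"\d", row) for row in rows]
print(extracted)  # [['3', '6', '7', '7'], ['8', '6'], ['7']]

# Each element is a one-character string, not an int
print(type(extracted[0][0]).__name__)  # str
```

Because the matches are strings, the next step has to call `int()` on them before doing arithmetic.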


In [28]:
# From the extracted digit characters, form the calibration value from the first and last digit
# Edge cases:
# If the list is empty, calibration value = 0
# If the list has only 1 element, the digit is used twice, e.g. ['8'] -> 88
def calibrate(digit):
    if len(digit) == 0:
        return 0
    elif len(digit) == 1:
        return int(digit[0]) * 11
    else:
        return int(digit[0]) * 10 + int(digit[-1])
df['calibrate'] = df['digit'].apply(calibrate)
print(df.head())

                                  0         digit  calibrate
0                        xt36five77  [3, 6, 7, 7]         37
1                   two8five6zfrtjj        [8, 6]         86
2  eightthree8fiveqjgsdzgnnineeight           [8]         88
3                     7chmvlhnpfive           [7]         77
4           1tcrgthmeight5mssseight        [1, 5]         15
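One observation about `calibrate`: when the list has a single element, `digit[0]` and `digit[-1]` refer to the same item, so the general formula already yields the doubled digit. A shorter variant could therefore look like the sketch below; `calibrate_compact` is a hypothetical name, not part of the notebook.

```python
def calibrate_compact(digits):
    """Hypothetical shorter variant of calibrate(): same results, one fewer branch."""
    if not digits:
        return 0  # empty list: no digits were found on the line
    # For a one-element list, digits[0] is digits[-1], so d * 10 + d == d * 11
    return int(digits[0]) * 10 + int(digits[-1])


print(calibrate_compact(["3", "6", "7", "7"]))  # 37
print(calibrate_compact(["8"]))                 # 88
print(calibrate_compact([]))                    # 0
```

Keeping the explicit single-element branch, as the notebook does, is also a reasonable choice because it documents the edge case.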


In [29]:
# Calculate total of df['calibrate'] column
total = df['calibrate'].sum()
print(total)

54573


## ***Day 1 - Part B***

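The request is now more advanced. Besides numeric characters, digits that are spelled out as words, such as `one`, `two`, and so on, also count as valid digits.

For example, row 1 (`two8five6zfrtjj`) should now give a calibration value of 26, not 86 as in Part A.

We revise the approach by adjusting the regular expression so that it also matches the words `one` through `nine`. The alternatives are wrapped in a lookahead, `(?=(...))`, which consumes no characters, so words that share letters are all found.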
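The revised pattern in the next cell wraps its alternatives in a lookahead. The sketch below illustrates why that can matter, using a made-up string, `oneight`, in which `one` and `eight` share the letter `e`; this string is a hypothetical example and does not come from the input shown here.

```python
import re

words = r"\d|one|two|three|four|five|six|seven|eight|nine"
text = "oneight"  # hypothetical input: "one" and "eight" overlap on "e"

# Plain alternation consumes "one", so the shared "e" is no longer available
plain = re.findall(words, text)

# A lookahead matches at every position without consuming characters
overlapping = re.findall(rf"(?=({words}))", text)

print(plain)        # ['one']
print(overlapping)  # ['one', 'eight']
```

With plain alternation the last digit of such a line would be read as 1; with the lookahead it is read as 8.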

In [30]:
# Import data input
df_partB = pd.read_csv('input.txt', header=None)
df_partB.head()

|   | 0 |
|---|---|
| 0 | xt36five77 |
| 1 | two8five6zfrtjj |
| 2 | eightthree8fiveqjgsdzgnnineeight |
| 3 | 7chmvlhnpfive |
| 4 | 1tcrgthmeight5mssseight |


In [31]:
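# Use a new regular expression that matches digit characters and the words 'one'-'nine'.
# The lookahead (?=...) consumes no characters, so overlapping words are all captured.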
df_partB['digit'] = df_partB[0].apply(lambda x: re.findall(r'(?=(\d|one|two|three|four|five|six|seven|eight|nine))', x))
print(df_partB.head())

                                  0                                 digit
0                        xt36five77                    [3, 6, five, 7, 7]
1                   two8five6zfrtjj                     [two, 8, five, 6]
2  eightthree8fiveqjgsdzgnnineeight  [eight, three, 8, five, nine, eight]
3                     7chmvlhnpfive                             [7, five]
4           1tcrgthmeight5mssseight                  [1, eight, 5, eight]


In [32]:
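# Now we need to convert each extracted token to a number
# The function handles both digit characters ('1'-'9' -> 1-9) and spelled-out words ('one'-'nine' -> 1-9)
# The 'zero' entry in the dictionary is never used, because the regular expression above does not match 'zero'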
def convert_to_number(array):
    word_to_num = {
        'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4,
        'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9
    }
    def convert(word):
        if word in word_to_num:
            return word_to_num[word]
        else:
            return int(word)
    return [convert(x) for x in array]

# Apply convert_to_number on every row of df dataframe
df_partB['digit_converted'] = df_partB['digit'].apply(convert_to_number)
df_partB.head()

|   | 0 | digit | digit_converted |
|---|---|-------|-----------------|
| 0 | xt36five77 | [3, 6, five, 7, 7] | [3, 6, 5, 7, 7] |
| 1 | two8five6zfrtjj | [two, 8, five, 6] | [2, 8, 5, 6] |
| 2 | eightthree8fiveqjgsdzgnnineeight | [eight, three, 8, five, nine, eight] | [8, 3, 8, 5, 9, 8] |
| 3 | 7chmvlhnpfive | [7, five] | [7, 5] |
| 4 | 1tcrgthmeight5mssseight | [1, eight, 5, eight] | [1, 8, 5, 8] |


In [33]:
# From the converted digits, form the calibration value from the first and last digit
# Edge cases:
# If the list is empty, calibration value = 0
# If the list has only 1 element, the digit is used twice, e.g. [8] -> 88
def calibrate(digit):
    if len(digit) == 0:
        return 0
    elif len(digit) == 1:
        return int(digit[0]) * 11
    else:
        return int(digit[0]) * 10 + int(digit[-1])
df_partB['calibrate'] = df_partB['digit_converted'].apply(calibrate)
print(df_partB.head())

                                  0                                 digit  \
0                        xt36five77                    [3, 6, five, 7, 7]   
1                   two8five6zfrtjj                     [two, 8, five, 6]   
2  eightthree8fiveqjgsdzgnnineeight  [eight, three, 8, five, nine, eight]   
3                     7chmvlhnpfive                             [7, five]   
4           1tcrgthmeight5mssseight                  [1, eight, 5, eight]   

      digit_converted  calibrate  
0     [3, 6, 5, 7, 7]         37  
1        [2, 8, 5, 6]         26  
2  [8, 3, 8, 5, 9, 8]         88  
3              [7, 5]         75  
4        [1, 8, 5, 8]         18  


In [34]:
# Calculate total of calibrated values
total = df_partB['calibrate'].sum()
print(total)

54591
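For comparison, the Part B steps (extract, convert, combine) can be condensed into one pure-Python function and checked against the five rows printed above. This is only a sketch under the same rules as the notebook; `WORD_TO_DIGIT`, `PATTERN`, and `calibration_value_b` are hypothetical names.

```python
import re

WORD_TO_DIGIT = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
# Lookahead so that overlapping words are all found, as in the notebook's pattern
PATTERN = re.compile(r"(?=(\d|" + "|".join(WORD_TO_DIGIT) + r"))")


def calibration_value_b(line: str) -> int:
    """Return first digit * 10 + last digit, counting spelled-out digits too."""
    tokens = PATTERN.findall(line)
    if not tokens:
        return 0  # same convention as calibrate(): no digits means 0

    def to_digit(token: str) -> int:
        # Words go through the lookup table; digit characters go through int()
        return WORD_TO_DIGIT[token] if token in WORD_TO_DIGIT else int(token)

    return to_digit(tokens[0]) * 10 + to_digit(tokens[-1])


rows = [
    "xt36five77",
    "two8five6zfrtjj",
    "eightthree8fiveqjgsdzgnnineeight",
    "7chmvlhnpfive",
    "1tcrgthmeight5mssseight",
]
results = [calibration_value_b(row) for row in rows]
print(results)  # [37, 26, 88, 75, 18]
```

The results agree with the `calibrate` column shown for the first five rows of `df_partB`.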
