# Advent of Code 2024

# Puzzle - part 1

**--- Day 1: Historian Hysteria ---**

The **Chief Historian** is always present for the big Christmas sleigh launch, but nobody has seen him in months! Last anyone heard, he was visiting locations that are historically significant to the North Pole; a group of Senior Historians has asked you to accompany them as they check the places they think he was most likely to visit.

As each location is checked, they will mark it on their list with a *star*. They figure the Chief Historian **must** be in one of the first fifty places they'll look, so in order to save Christmas, you need to help them get *fifty stars* on their list before Santa takes off on December 25th.

Collect stars by solving puzzles. Two puzzles will be made available on each day in the Advent calendar; the second puzzle is unlocked when you complete the first. Each puzzle grants *one star*. Good luck!

You haven't even left yet and the group of Elvish Senior Historians has already hit a problem: their list of locations to check is currently **empty**. Eventually, someone decides that the best place to check first would be the Chief Historian's office.

Upon pouring into the office, everyone confirms that the Chief Historian is indeed nowhere to be found. Instead, the Elves discover an assortment of notes and lists of historically significant locations! This seems to be the planning the Chief Historian was doing before he left. Perhaps these notes can be used to determine which locations to search?

Throughout the Chief's office, the historically significant locations are listed not by name but by a unique number called the location ID. To make sure they don't miss anything, The Historians split into two groups, each searching the office and trying to create their own complete list of location IDs.

There's just one problem: by holding the two lists up **side by side** (your puzzle input), it quickly becomes clear that the lists aren't very similar. Maybe you can help The Historians reconcile their lists?

For example:

|   |   |
|---|---|
| 3 | 4 |
| 4 | 3 |
| 2 | 5 |
| 1 | 3 |
| 3 | 9 |
| 3 | 3 |

Maybe the lists are only off by a small amount! To find out, pair up the numbers and measure how far apart they are. Pair up the **smallest number in the left list** with the **smallest number in the right list**, then the **second-smallest left number** with the **second-smallest right number**, and so on.

Within each pair, figure out **how far apart** the two numbers are; you'll need to **add up all of those distances**. For example, if you pair up a 3 from the left list with a 7 from the right list, the distance apart is 4; if you pair up a 9 with a 3, the distance apart is 6.

In the example list above, the pairs and distances would be as follows:

- The smallest number in the left list is `1`, and the smallest number in the right list is `3`. The distance between them is `2`.
- The second-smallest number in the left list is `2`, and the second-smallest number in the right list is another `3`. The distance between them is `1`.
- The third-smallest number in both lists is `3`, so the distance between them is `0`.
- The next numbers to pair up are `3` and `4`, a distance of `1`.
- The fifth-smallest numbers in each list are `3` and `5`, a distance of `2`.
- Finally, the largest number in the left list is `4`, while the largest number in the right list is `9`; these are a distance `5` apart.

To find the total distance between the left list and the right list, add up the distances between all of the pairs you found. In the example above, this is `2 + 1 + 0 + 1 + 2 + 5`, a total distance of `11`!

Your actual left and right lists contain many location IDs.
**What is the total distance between your lists?**
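Before touching the real input, the pairing rule can be checked against the example lists. This is only a minimal sketch: `total_distance`, `example_left` and `example_right` are names made up for this illustration, and it assumes both lists have the same length (`zip` silently stops at the shorter one).

```python
def total_distance(left, right):
    """Sum the gaps between the i-th smallest values of the two lists."""
    # Sorting both lists lines up smallest with smallest, and so on.
    return sum(abs(a - b) for a, b in zip(sorted(left), sorted(right)))


# The example lists from the puzzle text
example_left = [3, 4, 2, 1, 3, 3]
example_right = [4, 3, 5, 3, 9, 3]

print(total_distance(example_left, example_right))  # 11
```

The result matches the `11` worked out by hand in the puzzle description.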

## Input

In [1]:
# Load the input file

with open('input - Day 1.txt', 'r') as file:
    input = file.read()


print(input[:139])

13432   99527
85422   64009
79131   11256
27674   82211
65599   57936
12692   67107
29421   44641
48876   12545
62591   59319
16202   93012


## Input Formatting

Right now we just have a single string holding the values from both lists. What we want is two arrays, each containing the values from only one list. First, let's split the text into lines so that we can iterate over them later.

In [2]:
from pprint import pprint

input_lines = input.split('\n')
pprint(input_lines[:10])

print(f"\nThere are {len(input_lines)} lines")

['13432   99527',
 '85422   64009',
 '79131   11256',
 '27674   82211',
 '65599   57936',
 '12692   67107',
 '29421   44641',
 '48876   12545',
 '62591   59319',
 '16202   93012']

There are 1001 lines


Okay, so now we can iterate over each line, but there is still a problem.
Each line consists of values from both lists.

To separate them we can iterate over all lines and use **regex** to split each line on whitespace.
The reason I am using **regex** here is that, while the whitespace (the gap) between the numbers in each line seems consistent, I do not know that this is the case for every single line.
I could just use `.split("   ")` and then check that the length of each list matches the length of `input_lines`, but using **regex** is simpler and cleaner.
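To make the difference concrete, here is a small sketch comparing the three ways of splitting. The first sample line is taken from the input preview; the other two are hypothetical variations with a different gap, invented only for this comparison.

```python
import re

# First line is from the real input; the others are made-up variations.
samples = ["13432   99527", "13432 99527", "13432\t99527"]

regex_split = [re.split(r"\s+", line) for line in samples]
fixed_split = [line.split("   ") for line in samples]
plain_split = [line.split() for line in samples]

print(regex_split)  # every line yields two fields
print(fixed_split)  # only the first line yields two fields
print(plain_split)  # every line yields two fields

# Caveat: with leading whitespace, re.split produces an empty first field,
# while str.split() with no arguments drops it.
print(re.split(r"\s+", "  13432   99527"))  # ['', '13432', '99527']
print("  13432   99527".split())            # ['13432', '99527']
```

For input like this, `line.split()` with no arguments behaves like the regex and also tolerates leading or trailing whitespace, so it is a reasonable alternative if you would rather avoid the import.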

In [3]:
# First let's check the last line; if it's just an empty string (left over from the trailing \n), let's remove it
pprint(input_lines[-1])

del input_lines[-1]

print(len(input_lines))

''
1000
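The empty last element comes from the file ending in a newline: `split('\n')` keeps whatever follows the final separator, even if that is nothing. A short sketch of the behaviour, using a two-line string built from the first two input lines (the variable names are only for illustration):

```python
# Two lines from the input preview, ending in a newline like the file does
text = "13432   99527\n85422   64009\n"

by_split = text.split("\n")
by_splitlines = text.splitlines()

print(by_split)       # ['13432   99527', '85422   64009', '']
print(by_splitlines)  # ['13432   99527', '85422   64009']
```

Using `splitlines()` would avoid the clean-up step, though deleting the last element works just as well once you have confirmed it is empty.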


In [4]:
import re

id_list_1 = []
id_list_2 = []

for line in input_lines:
    ids = re.split(r"\s+", line)    

    id_list_1.append(ids[0])
    id_list_2.append(ids[1])
    

Let's check lengths to make sure we didn't lose anything.

In [14]:
assert len(id_list_1) == len(id_list_2),\
    f"The two lists are not of the same length: {len(id_list_1)} and {len(id_list_2)}"

assert len(id_list_1) == len(input_lines),\
    f"The lists are not of same length ({len(id_list_1)}) as the input ({len(input_lines)})"

Okay, so far so good.
Now we have two separate arrays/lists of... IDs, still stored as strings (I'll call them ID lists from now on).
Since we want to compare the smallest number from each list, then the second smallest, and so on, let's sort them.
This way, when we iterate over them, we will always get the two **i-th** smallest values from each list.
Then all that's left is to calculate the distance between them.

Okay, first let's convert the lists to numbers.
We could do everything in a single loop, but why over-complicate this?
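For comparison, a single-loop version might look roughly like the sketch below. It is not what this notebook does: it swaps the regex for `str.split()`, and `sample_lines`, `left_ids` and `right_ids` are hypothetical names, with the three lines copied from the input preview.

```python
# Three lines copied from the input preview
sample_lines = ["13432   99527", "85422   64009", "79131   11256"]

left_ids, right_ids = [], []
for line in sample_lines:
    # Unpacking into exactly two names raises ValueError
    # if a line does not contain exactly two fields.
    left, right = map(int, line.split())
    left_ids.append(left)
    right_ids.append(right)

left_ids.sort()
right_ids.sort()

print(left_ids)   # [13432, 79131, 85422]
print(right_ids)  # [11256, 64009, 99527]
```

It is shorter, but it also hides the intermediate results, which is why the step-by-step cells are easier to inspect.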

In [8]:
id_list_1 = [int(id) for id in id_list_1]
id_list_2 = [int(id) for id in id_list_2]

In [10]:
id_list_1.sort()
id_list_2.sort()

pprint(id_list_1[:10])
pprint(id_list_2[:10])

[10025, 10093, 10108, 10244, 10344, 10350, 10384, 10482, 10554, 10574]
[10038, 10217, 10231, 10264, 10370, 10384, 10384, 10384, 10384, 10384]


In [12]:
distance_between_ids = []

for id_1, id_2 in zip(id_list_1, id_list_2):
    
    # I assume we want only positive distances hence abs()
    gap = abs(id_1 - id_2)
    distance_between_ids.append(gap)

Now we just sum all the distances / gaps. 

In [13]:
print(f"The total distance is: {sum(distance_between_ids)}")

The total distance is: 1765812


# Puzzle - part 2

**--- Part Two ---**

Your analysis only confirmed what everyone feared: the two lists of location IDs are indeed very different.

Or are they?

The Historians can't agree on which group made the mistakes **or** how to read most of the Chief's handwriting, but in the commotion you notice an interesting detail: a lot of location IDs appear in both lists! Maybe the other numbers aren't location IDs at all but rather misinterpreted handwriting.

This time, you'll need to figure out exactly how often each number from the left list appears in the right list. Calculate a total **similarity score** by adding up each number in the left list after multiplying it by the number of times that number appears in the right list.

Here are the same example lists again:

|   |   |
|---|---|
| 3 | 4 |
| 4 | 3 |
| 2 | 5 |
| 1 | 3 |
| 3 | 9 |
| 3 | 3 |

For these example lists, here is the process of finding the similarity score:

- The first number in the left list is `3`. It appears in the right list three times, so the similarity score increases by `3 * 3 = 9`.
- The second number in the left list is `4`. It appears in the right list once, so the similarity score increases by `4 * 1 = 4`.
- The third number in the left list is `2`. It does not appear in the right list, so the similarity score does not increase `(2 * 0 = 0)`.
- The fourth number, `1`, also does not appear in the right list.
- The fifth number, `3`, appears in the right list three times; the similarity score increases by `9`.
- The last number, `3`, appears in the right list three times; the similarity score again increases by `9`.

So, for these example lists, the similarity score at the end of this process is **`31`** `(9 + 4 + 0 + 0 + 9 + 9)`.

Once again consider your left and right lists. What is their similarity score?
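The steps above can be reproduced directly on the example lists. This is a literal, unoptimized sketch (`list.count` rescans the right list for every left value), and the variable names are made up for the illustration.

```python
# The example lists from the puzzle text
example_left = [3, 4, 2, 1, 3, 3]
example_right = [4, 3, 5, 3, 9, 3]

# One contribution per left value: the value times its count on the right
contributions = [n * example_right.count(n) for n in example_left]

print(contributions)       # [9, 4, 0, 0, 9, 9]
print(sum(contributions))  # 31
```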

______________________________

Okay so we still have our `id_list_1` and `id_list_2`.
They are sorted, which doesn't help us, but they are already numbers so let's re-use them.

There are two ways to solve this really, a **good one** and a **bad one**.
The **bad one** is to loop over `id_list_1` and nest another loop over `id_list_2`, checking every value in the first list against all values of the second list. Obviously this is not great, as it has a complexity of **O(n^2)**.

We can be clever about this: what we actually want to know is the frequency of each number in `id_list_2`. Then, while iterating over `id_list_1`, we can look up the frequency of a particular `id`. Multiply it out, add it up and boom... you have your total similarity score, all at the low cost of two passes, **O(2n)**, which is simply **O(n)**.
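As a sanity check that both approaches agree, here is a sketch of each, run on the example lists from the puzzle. The function names are hypothetical, and the counted version uses `collections.Counter` as a stand-in for a hand-built frequency table.

```python
from collections import Counter


def similarity_nested(left, right):
    """Quadratic version: compare every left value with every right value."""
    total = 0
    for a in left:
        for b in right:
            if a == b:
                total += a
    return total


def similarity_counted(left, right):
    """Linear version: count the right list once, then look values up."""
    counts = Counter(right)  # missing keys read as 0
    return sum(a * counts[a] for a in left)


example_left = [3, 4, 2, 1, 3, 3]
example_right = [4, 3, 5, 3, 9, 3]

print(similarity_nested(example_left, example_right))   # 31
print(similarity_counted(example_left, example_right))  # 31
```

With 1000 lines the nested version is still fast enough in practice; the difference only starts to matter on much larger inputs.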

In [19]:
from collections import defaultdict 

# This will automatically initialize missing keys with 0
frequencies = defaultdict(int)

# Calculate frequencies of numbers in id_list_2
for id in id_list_2:
    frequencies[id] += 1

Let's check how we did. I only want to see the first few elements though (the loop below stops after 12).

In [27]:
for idx, (key, value) in enumerate(frequencies.items()):
    print(f"{key}:  {value}")

    if idx>10:
        break
    

10038:  1
10217:  1
10231:  1
10264:  1
10370:  1
10384:  10
10568:  1
10610:  1
10641:  1
10789:  1
10828:  1
10875:  1


Perfect, now we just need to select those frequencies whose numbers (keys) also show up in `id_list_1`.
While we are at it we might as well do the multiplication and add it up for the final answer.

No need to worry about a missing-key exception (`KeyError`) since we are using `defaultdict(int)`: it will automatically initialize missing keys to `0`.
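One side effect worth knowing: looking up a missing key in a `defaultdict` does not just return `0`, it also stores that key. The sketch below uses a throwaway dictionary (`frequencies_demo`, a name made up here) to show this, along with `.get()` as a lookup that leaves the dictionary untouched.

```python
from collections import defaultdict

frequencies_demo = defaultdict(int)
frequencies_demo[3] += 1

print(len(frequencies_demo))       # 1
print(frequencies_demo[7])         # 0, but key 7 is now stored
print(len(frequencies_demo))       # 2
print(frequencies_demo.get(9, 0))  # 0, and nothing is stored
print(len(frequencies_demo))       # 2
```

For this puzzle the extra keys are harmless, since they all hold `0` and do not change the sum. It only matters if the dictionary is inspected or reused afterwards.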

In [31]:
total_similarity_score = 0

for id in id_list_1:

    similarity = id * frequencies[id]
    total_similarity_score += similarity

print(f"Total similarity score: {total_similarity_score}")

Total similarity score: 20520794
