In [1]:
from aocd import get_data

Pair up the smallest number in the left list with the smallest number in the right list, then the second-smallest left number with the second-smallest right number, and so on.

Within each pair, figure out how far apart the two numbers are; you'll need to add up all of those distances. 

For example, if you pair up a 3 from the left list with a 7 from the right list, the distance apart is 4; if you pair up a 9 with a 3, the distance apart is 6.

To find the total distance between the left list and the right list, add up the distances between all of the pairs you found.
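
As a quick sanity check of the pairing rule, here is a minimal plain-Python sketch applied to the six-row example used later in this notebook. The names `example_left`, `example_right`, `pairs`, and `distances` are introduced here for illustration only.

```python
# The two columns of the six-row example, written out as plain lists.
example_left = [3, 4, 2, 1, 3, 3]
example_right = [4, 3, 5, 3, 9, 3]

# Pair smallest with smallest, second-smallest with second-smallest, and so on.
pairs = list(zip(sorted(example_left), sorted(example_right)))

# Distance within each pair, then the total over all pairs.
distances = [abs(a - b) for a, b in pairs]
total = sum(distances)

print(pairs)      # [(1, 3), (2, 3), (3, 3), (3, 4), (3, 5), (4, 9)]
print(distances)  # [2, 1, 0, 1, 2, 5]
print(total)      # 11
```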

In [2]:
import numpy as np

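def calc_sum_list_distances(list1, list2):
    """Sum the absolute differences between the sorted lists, paired by rank."""
    if len(list1) != len(list2):
        raise ValueError("Lists must be of the same length to calculate distance.")

    # Sort both lists so the i-th smallest values line up.
    arr1 = np.sort(np.array(list1))
    arr2 = np.sort(np.array(list2))

    # Element-wise distance within each pair, then the total.
    total_distance = np.sum(np.abs(arr1 - arr2))

    return total_distance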

In [3]:
data = np.genfromtxt(get_data(day=1, year=2024).splitlines(), dtype=int)

In [4]:
data

array([[64430, 75582],
       [87936, 20843],
       [98310, 72035],
       ...,
       [47390, 75651],
       [94550, 80760],
       [61539, 20843]])

In [5]:
left_column = data[:, 0]
right_column = data[:, 1]

In [6]:
calc_sum_list_distances(left_column, right_column)

np.int64(2057374)

---

Now figure out exactly how often each number from the left list appears in the right list. Calculate a total similarity score by adding up each number in the left list after multiplying it by the number of times that number appears in the right list.

We create a boolean matrix in which each row corresponds to a value in `left_column` and each column corresponds to a value in `right_column`.

An element in the matrix is `True` if the `left_column` value for that row matches the `right_column` value for that column.
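
Written out as formulas, the matrix approach amounts to the following. The symbols $L$, $R$, $C$, $o$, and $S$ are notation introduced here for illustration: $L_i$ is the $i$-th value of `left_column` and $R_j$ is the $j$-th value of `right_column`.

$$
C_{ij} =
\begin{cases}
1 & \text{if } L_i = R_j \\
0 & \text{otherwise}
\end{cases}
\qquad
o_i = \sum_{j} C_{ij}
\qquad
S = \sum_{i} L_i \, o_i
$$

Here $o_i$ is the number of times $L_i$ appears in the right list, and $S$ is the similarity score.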

---

First, apply my approach to the example so I can see it's doing what I want:

In [7]:
example_string = """
3   4
4   3
2   5
1   3
3   9
3   3
"""

In [8]:
example_data = np.genfromtxt(example_string.splitlines(), dtype=int)

In [9]:
example_data

array([[3, 4],
       [4, 3],
       [2, 5],
       [1, 3],
       [3, 9],
       [3, 3]])

In [10]:
left = example_data[:, 0]
right = example_data[:, 1]

In [11]:
right

array([4, 3, 5, 3, 9, 3])

In [12]:
comparison = left[:, None] == right[None, :]; comparison

array([[False,  True, False,  True, False,  True],
       [ True, False, False, False, False, False],
       [False, False, False, False, False, False],
       [False, False, False, False, False, False],
       [False,  True, False,  True, False,  True],
       [False,  True, False,  True, False,  True]])

Now we compute the similarity score: summing the `True` values in each row gives the occurrence count for that row's left value, and the dot product of the left values with those counts multiplies each value by its count and adds up the results.

In [13]:
occurrence = np.sum(comparison, axis=1); occurrence

array([3, 1, 0, 0, 3, 3])

In [14]:
similarity = np.dot(left, occurrence); similarity

np.int64(31)

---

Now apply it to my actual data:

In [15]:
comparison_matrix = left_column[:, None] == right_column[None, :]; comparison_matrix

array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])

In [16]:
occurrence_counts = np.sum(comparison_matrix, axis=1)
similarity_score = np.dot(left_column, occurrence_counts)

In [17]:
similarity_score

np.int64(23177084)

... and yet this answer is wrong.
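
One way to narrow this down is to recompute the score with an independent method and compare the two results. The sketch below uses `collections.Counter` instead of the boolean matrix; `similarity_by_counting` is a hypothetical helper written for this check, not part of the original approach, and it is shown here on the example lists only.

```python
from collections import Counter


def similarity_by_counting(left_values, right_values):
    """Hypothetical cross-check: similarity score without the boolean matrix."""
    # Count how often each value appears in the right list.
    right_counts = Counter(int(v) for v in right_values)
    # A Counter returns 0 for missing keys, so absent values contribute nothing.
    return sum(int(v) * right_counts[int(v)] for v in left_values)


example_left = [3, 4, 2, 1, 3, 3]
example_right = [4, 3, 5, 3, 9, 3]
print(similarity_by_counting(example_left, example_right))  # 31
```

Calling it on `left_column` and `right_column` and comparing the result with `similarity_score` would indicate whether the matrix computation or the input data is the place to look.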