# Day 8

## Getting Started

Navigate to [Advent of Code Day 8](https://adventofcode.com/2025/day/8)

Save the problem input (I have saved it as a text file called `AOC25_8_in.txt`).

## Understanding the problem

Today's problem leads to a classic computer science data structure and algorithm used for bringing sets together efficiently: **[disjoint-set union (DSU, also known as union-find)](https://en.wikipedia.org/wiki/Disjoint-set_data_structure)**.

DSU is quite simple in operation. It maintains a list of 'parents' and a list of 'sizes' such that when a point is merged with another point:
- we **find** the 'ultimate parent' of each point (the parent of the parent of the parent etc)
- we merge (form a **union** of) the two 'ultimate parents' by joining the set of smaller size into the set of bigger size, so that the 'ultimate parent' belonging to the set of bigger size becomes the ultimate parent of everything in both sets
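To make the two operations concrete, here is a minimal sketch of a DSU over five points. The class name `ToyDSU` and its attribute names are made up for this illustration (the notebook itself uses two plain lists and two functions), and path compression is left out to keep the trace easy to follow:

```python
# a toy DSU over a handful of points, wrapped in a hypothetical class purely for illustration
class ToyDSU:
    def __init__(self, n):
        self.parent = list(range(n))  # each point starts as its own parent
        self.size = [1] * n           # each point starts in a set of size 1

    def find(self, a):
        # walk up until we reach a point that is its own parent
        while a != self.parent[a]:
            a = self.parent[a]
        return a

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        # merge the smaller set into the bigger one
        if self.size[a] > self.size[b]:
            a, b = b, a
        self.parent[a] = b
        self.size[b] += self.size[a]

toy = ToyDSU(5)
toy.union(0, 1)  # sets: {0, 1} {2} {3} {4}
toy.union(2, 3)  # sets: {0, 1} {2, 3} {4}
toy.union(1, 3)  # sets: {0, 1, 2, 3} {4}
print(toy.find(0), toy.size[toy.find(0)])  # 3 4
```

After the three unions, points 0 to 3 share the ultimate parent 3, whose set has size 4, while point 4 is still on its own.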

Part 1 requires us to:
- find the 1000 pairs of points that are closest in terms of [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance). The distance itself is usually not an integer, but its square always is (the coordinates are integers), so we compare squared distances instead. This avoids floating-point rounding and gives the same ordering, since a greater distance always has a greater squared distance
- connect those points: the DSU structure does this very efficiently (close to constant time per operation, amortised), ignoring any pairs that are already connected and merging the sets of those that are not
- find the three largest set sizes after these 1000 connections have been made (the DSU maintains a list of the set sizes, so this is trivial)
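As a quick sanity check of the squared-distance idea, the sketch below ranks a few made-up points by their distance from the origin, once using squared distances and once using `math.dist` (Python 3.8+). The points and the helper name `sq` are illustrative only:

```python
import math

# made-up sample points; the first one acts as the reference point
pts = [(0, 0, 0), (1, 2, 2), (3, 0, 4), (1, 1, 1)]
origin = pts[0]

# squared Euclidean distance: always an integer for integer coordinates
def sq(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

# rank the other points by squared distance and by true distance
by_square = sorted(pts[1:], key=lambda q: sq(origin, q))
by_dist = sorted(pts[1:], key=lambda q: math.dist(origin, q))

print(by_square == by_dist)  # True
```

Both rankings agree, so nothing is lost by skipping the square root.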

Part 2 is very similar, except that we don't stop at 1000 connections. We plough on until all the points form one giant set. At each iteration we can test whether this has happened by forming the union and then finding the ultimate parent of the two points just joined. If that ultimate parent's set has size $N$, where $N$ is our number of points, we've finished, and the answer is the product of the x-coordinates of the last two points joined.

## Part 1

Read in the inputs (see day 1 for explanation):

In [1]:
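# read in the co-ordinates, one "x,y,z" triple per line
# (the 'if' skips any blank line, e.g. one left by a trailing newline at the end of the file)
with open("AOC25_8_in.txt") as f:
    points = [list(map(int, line.split(','))) for line in f.read().splitlines() if line.strip()]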

Now we need to define our functions.

#### Square of Euclidean distance function

This will be used to find the points closest together:

In [2]:
# define a function that calculates the square distance between two points
def sq_dist(p1, p2):
    x1, y1, z1 = p1
    x2, y2, z2 = p2
    return (x2 - x1)**2 + (y2 - y1)**2 + (z2 - z1)**2

#### Find function

This is one of the two key functions of our DSU data structure, and it returns the 'ultimate parent' of a point (or node, in graph terms):

In [3]:
# find the ultimate parent of point a
def find(a):
    
    # the ultimate parent is found when its parent is itself
    if a != p[a]:
        p[a] = find(p[a])
    
    return p[a]
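The recursive version above is compact, and merging by size keeps the chains of parents short. If recursion depth is ever a concern, the same idea can be written with two loops instead. This is only a sketch of an alternative, not the function used in the rest of this notebook; the name `find_iterative` and the explicit `parent` argument are assumptions made so that the example stands alone:

```python
# iterative find with path compression (an alternative sketch, not the notebook's own function)
def find_iterative(parent, a):
    # first pass: walk up to the ultimate parent
    root = a
    while root != parent[root]:
        root = parent[root]

    # second pass: point every node on the path directly at the ultimate parent
    while parent[a] != root:
        parent[a], a = root, parent[a]

    return root

# a chain 0 -> 1 -> 2 -> 3, where 3 is its own parent
chain = [1, 2, 3, 3]
print(find_iterative(chain, 0))  # 3
print(chain)                     # [3, 3, 3, 3]
```

After one call, every point on the path refers directly to the ultimate parent, which is exactly what the recursive assignment `p[a] = find(p[a])` achieves.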

#### Union function

This is the other key function of our DSU data structure, and it merges two sets of points together (if necessary):

In [4]:
# join points a and b
def union(a, b):
    
    # we're actually joining the ultimate parents of a and b
    a, b = find(a), find(b)
    
    # if they have the same ultimate parent, these sets are already joined, so we end here
    if a == b:
        return
    
    # we want to merge the smaller set into the bigger set, to ensure fewer updates are required
    if size[a] > size[b]:
        a, b = b, a
    
    # make b the ultimate parent of a, which also makes b the ultimate parent of all of a's children
    p[a] = b
    
    # increase the size of b's set, and make a's size 0 to remove a from consideration as a circuit
    size[b] += size[a]
    size[a] = 0
    
    return

Let's now define $N$, the number of points we have:

In [5]:
# N is the number of points we have
N = len(points)

#### Initialise DSU data structure

We are now ready to initialise our DSU data structure, which is very simple and takes the form of two lists: the parents and the sizes of the points (nodes):

In [6]:
# initially define each point's parent as itself, and put it in a set of size 1 on its own
p = [i for i in range(N)]
size = [1]*N

Now create a list called `order` which will contain the squares of the Euclidean distances between each pair of points $(i, j)$, as well as the IDs of the points themselves, in the format $(distance(i, j)^2, i, j)$. We can do this using a list comprehension and the function we defined earlier. We'll then sort this list by descending distance, so the smallest distances appear at the end:

In [7]:
# create a list called order to store the squared Euclidean distances
order = [(sq_dist(points[i], points[j]), i, j) for i in range(N) for j in range(i + 1, N)]

# sort order by descending square distance so that the shortest distance is at the end
order.sort(reverse = True)
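For part 1 alone, where only the closest pairs are needed, a full sort is not strictly necessary. As a hedged alternative (not what this notebook does), the standard library's `heapq.nsmallest` can pick out the shortest pairs from a generator. The toy points and the helper name `toy_sq_dist` below are made up for the illustration:

```python
import heapq
from itertools import combinations

# made-up toy points for illustration
toy_points = [(0, 0, 0), (1, 0, 0), (5, 5, 5), (5, 6, 5), (9, 9, 9)]

# squared Euclidean distance between two points
def toy_sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

# generate (squared distance, i, j) for every pair without building the full list
pairs = (
    (toy_sq_dist(toy_points[i], toy_points[j]), i, j)
    for i, j in combinations(range(len(toy_points)), 2)
)

# keep only the 2 closest pairs, returned in ascending order of squared distance
closest = heapq.nsmallest(2, pairs)
print(closest)  # [(1, 0, 1), (1, 2, 3)]
```

Note that the result comes back in ascending order, so it would be iterated from the front rather than popped from the end. Part 2 needs pairs well beyond the first 1000, which is one reason to keep the full sorted list.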

Now define our number of iterations, and iteratively take the pair of points closest together and join them using our `union` function:

In [8]:
# set the maximum number of iterations
max_num = 1000

# perform max_num iterations
for _ in range(max_num):
    
    # take the two points with the shortest remaining distance
    d, i, j = order.pop()
    
    # join them (if necessary)
    union(i, j)

Now sort the sizes of each set in descending order and multiply the top 3 together:

In [9]:
# sort the size array in descending order
size.sort(reverse = True)

# multiply the three greatest sizes together
print(size[0] * size[1] * size[2])

52668


And that's our answer to part 1.
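The same product can also be computed without reordering the list of sizes, which keeps each size lined up with its point ID. A minimal sketch using `heapq.nlargest` and `math.prod` (Python 3.8+), on a made-up list of sizes:

```python
import heapq
import math

# made-up set sizes: zeros mark points that were merged into another set
toy_sizes = [0, 4, 0, 1, 3, 0, 2]

# pick the three largest sizes without sorting the list in place
top_three = heapq.nlargest(3, toy_sizes)

# multiply them together
product = math.prod(top_three)
print(top_three, product)  # [4, 3, 2] 24
```

Here `toy_sizes` is left untouched, whereas `size.sort(reverse = True)` rearranges `size` itself.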

## Part 2

Fortunately, we've set up part 1 in such a way that part 2 requires very little additional effort. We do need to reinitialise our DSU first, though: part 1 sorted `size` in place, so its entries no longer line up with the point IDs, and it also popped the 1000 shortest pairs from `order`. So we rebuild all three lists:

In [10]:
# initially define each point's parent as itself, and put it in a set of size 1 on its own
p = [i for i in range(N)]
size = [1]*N

# create a list called order to store the squared Euclidean distances
order = [(sq_dist(points[i], points[j]), i, j) for i in range(N) for j in range(i + 1, N)]

# sort order by descending square distance so that the shortest distance is at the end
order.sort(reverse = True)

Now, instead of performing `max_num` iterations, we'll simply continue until all sets are joined together, as described above:

In [11]:
# perform a loop that continues until we tell it otherwise (using 'break'), taking the shortest distance from the end each time
while True:
    
    # take the two points with the shortest remaining distance
    d, i, j = order.pop()
    
    # join them (if necessary)
    union(i, j)

    # i and j now have the same ultimate parent - if this ultimate parent's set size is N, then we have joined all points together, so we print the answer and stop
    if size[find(i)] == N:
        print(points[i][0] * points[j][0]) # multiply x-coordinates of i and j
        break

1474050600


And that's our answer to part 2.
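Another way to detect the finishing condition, sketched here as an assumption rather than as the method used above, is to count how many separate sets remain and stop when the count reaches 1. The function name `last_connecting_pair` and the toy points are made up, and merging by size is omitted for brevity:

```python
from itertools import combinations

# return the IDs of the pair whose connection joins everything into one set
def last_connecting_pair(pts):
    n = len(pts)
    parent = list(range(n))

    def find(a):
        # path halving: point each visited node at its grandparent on the way up
        while a != parent[a]:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # all pairs as (squared distance, i, j), shortest first
    edges = sorted(
        (sum((a - b) ** 2 for a, b in zip(pts[i], pts[j])), i, j)
        for i, j in combinations(range(n), 2)
    )

    components = n
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1  # every successful merge removes one set
            if components == 1:
                return i, j
    return None  # fewer than two points: nothing to connect

toy = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0)]
i, j = last_connecting_pair(toy)
print(i, j, toy[i][0] * toy[j][0])  # 1 2 10
```

In the toy data, the two close pairs are joined first, and the pair (1, 2) is the one that finally links everything together.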