# Day 5

## Getting Started

Navigate to [Advent of Code Day 5](https://adventofcode.com/2025/day/5)

Save the problem input (I have saved mine as a text file called `AOC25_5_in.txt`).

## Understanding the problem

In part 1, we could do something very simple here: for each ingredient, we could loop over all the fresh ranges and see if it sits in any of them. The inputs we are given will allow this, but once again it's not best practice. Let's think about why:

- Suppose instead of approx. 200 ranges and approx. 1000 ingredients, we actually had 100,000 ranges and 100,000 ingredients. A naive solution would not scale well to those inputs, requiring in the region of 10 billion checks (each of which requires several operations). The complexity of this solution is $O(N * M)$ where $N$ is the number of ranges and $M$ is the number of ingredients.
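For reference, the naive approach might look something like the sketch below. The function name `count_fresh_naive` and the toy ranges and ingredients are made up for illustration; they are not taken from the puzzle input.

```python
def count_fresh_naive(ranges, ingredients):
    """Count ingredients that fall inside at least one inclusive range."""
    count = 0
    for ingredient in ingredients:
        # any() stops at the first matching range, but in the worst case
        # (no match) it still checks every one of the N ranges
        if any(lo <= ingredient <= hi for lo, hi in ranges):
            count += 1
    return count


# Hypothetical toy data, small enough to check by hand
toy_ranges = [(3, 5), (10, 14), (16, 20), (12, 18)]
toy_ingredients = [1, 5, 8, 11, 17, 32]

print(count_fresh_naive(toy_ranges, toy_ingredients))  # 3 (the fresh ones are 5, 11 and 17)
```

This is perfectly fine for a few hundred ranges, but the nested loop is exactly where the $O(N * M)$ cost comes from.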

We can make a significant improvement. Suppose we create an efficient set of ranges, by doing two things:
- sorting our ranges in order
- merging overlapping ranges into a single range

How does this help? Once the ranges are sorted and merged, no two of them overlap, so any given ingredient can sit in at most one of them. That means we can use a technique called [binary search](https://en.wikipedia.org/wiki/Binary_search) (Python and most other languages have inbuilt functions to do this for us) to jump straight to that one candidate interval in $O(log(N))$ time, rather than checking every range. If the ingredient lies inside the candidate interval it is fresh; if not, no other interval can contain it.

Now, instead of $O(N * M)$, our solution is $O(N * log(N))$ for the initial sorting of our array (the merge itself is a single $O(N)$ pass), and $O(M * log(N))$ for processing the ingredients. For large inputs this is much faster.
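To put rough numbers on that, here is a back-of-envelope calculation for the hypothetical 100,000 by 100,000 case mentioned earlier. It ignores constant factors, so treat the figures as orders of magnitude rather than exact operation counts.

```python
import math

n_ranges = 100_000       # N, the hypothetical number of ranges
n_ingredients = 100_000  # M, the hypothetical number of ingredients

# Naive approach: every ingredient is checked against every range
naive_checks = n_ranges * n_ingredients

# Sorted approach: roughly N * log2(N) for the sort plus M * log2(N) for the searches
log_n = math.ceil(math.log2(n_ranges))  # 17
sorted_checks = (n_ranges + n_ingredients) * log_n

print(naive_checks)   # 10000000000
print(sorted_checks)  # 3400000
```

That is a few million steps instead of ten billion.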

### Implementing the sort and merge

Sorting the initial set of ranges takes a single call to Python's inbuilt `sort`. Merging them is more complex. What we will do is iterate from the start of our set of ranges to the end, and while the next range overlaps with our current set of ranges, we'll note the current min and max values of the collection of ranges and keep moving through.

When we get to a range that does not overlap with our current set, we will add the current merged interval to a fresh list of ranges, and then start again at the next range, repeating the process until all ranges are incorporated into our new list.
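Before working on the real input, it can help to see the idea on a toy example. The sketch below is a compact variant of the process just described: rather than tracking two indices, it extends the last range in the output list whenever the next range overlaps it. The function name `merge_ranges` and the toy data are made up for illustration.

```python
def merge_ranges(ranges):
    """Return a sorted list of non-overlapping inclusive ranges covering the same values."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            # Overlaps the most recent merged range: extend its upper end if needed
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            # No overlap: this range starts a new merged range
            merged.append([lo, hi])
    return merged


# Hypothetical toy data: [10, 14], [12, 18] and [16, 20] chain together
print(merge_ranges([[10, 14], [3, 5], [16, 20], [12, 18]]))  # [[3, 5], [10, 20]]
```

Note that ranges which merely touch, such as `[1, 2]` and `[3, 4]`, are left separate here. That is harmless for our purposes, since both the lookup and the counting work on any sorted, non-overlapping list.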

## Part 1

Read in the inputs (see day 1 for explanation):

In [1]:
# import the binary search helper we'll use later
from bisect import bisect_left

# open the file and read in the inputs - this is more intricate today; firstly we need to separate the two parts of the input, using .split('\n\n')
# then we parse the two different parts
# .strip() removes any trailing newline, which would otherwise leave an empty string behind when we split by '\n'
with open("AOC25_5_in.txt") as f:
    inp1, inp2 = f.read().strip().split('\n\n')

# Let's process inp1 - the fresh ranges as strings - into a list of ranges [x, y] where x and y are integers
# To do that, we split by '\n' to get each range as a string, then split by '-' to get x and y, mapping those to integers as we did yesterday
fresh_raw = [list(map(int,rng.split('-'))) for rng in inp1.split('\n')]

# Processing the available ingredients is simpler: split again by '\n', then map to integers
available = list(map(int, inp2.split('\n')))

#### Doing the sort

Sort the raw list of fresh ranges, and create a new empty list to hold the merged intervals:

In [2]:
# Sort the ranges of fresh ingredients - this will sort them by x first, then by y
fresh_raw.sort()

# Create a new list into which we'll insert our merged ranges
fresh_processed = []

#### Doing the merge

Iterate left to right through our sorted list, and add new merged intervals to the new list, as described above:

In [3]:
# Start at the beginning of the array, with index i = 0, and iterate while i remains within the bounds of the array
# i will be our left-most range of those to be merged
# fresh_raw[i][0] - the lower end of this range - will also be the low value of our merged range, since we have sorted by first index already
i = 0
while i < len(fresh_raw):
    
    # Initially the right-most range to be merged is the starting range
    j = i
    
    # Initially the low and high values of this range are the low and high values of the merged range
    curr_min, curr_max = fresh_raw[i]
    
    # iterate over j - while the next range contains any values which intersect with our current range, we include this range
    while j + 1 < len(fresh_raw) and fresh_raw[j + 1][0] <= curr_max:
        j += 1
        
        # update the high end of our merged range if necessary
        curr_max = max(curr_max, fresh_raw[j][1])
    
    # add the final merged range to our new, clean array
    fresh_processed.append([curr_min, curr_max])
    
    # start again at the range immediately after the final one we merged
    i = j + 1

Now let's set ourselves up to do the binary search. When we search for the ingredient with number $X$, we'll search for the index where `[X, BIG_NUMBER]` would sit. That index is guaranteed to be one place to the right of the only interval that might contain $X$.

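Why? Let's call the candidate range that might contain $X$ $[A, B]$, and suppose the interval immediately to its right is $[C, D]$. We know that $A \le X$ and $C > X$, so $[X, Y]$ always sorts before $[C, D]$. What remains is to make sure it sorts after $[A, B]$ when we search using the interval $[X, Y]$: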
- if $A < X$ then any range beginning with $X$ will appear to the right regardless of the value of $Y$.
- if $A = X$ then the position of $[X, Y]$ will depend on which is bigger, $B$ or $Y$. We can force the issue by ensuring that $Y > B$. So we set $Y$ to be a 'big number'.
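The two cases above can be checked on a small example. This is only a sketch: the merged list is made-up toy data, and the helper `is_fresh` (with its optional `y` parameter) is a hypothetical name introduced purely to show what happens when $Y$ is too small.

```python
from bisect import bisect_left

# Toy merged list (made-up values): sorted and non-overlapping
merged = [[3, 5], [10, 20]]
BIG = 10**18  # plays the role of the 'big number'


def is_fresh(merged, x, y=BIG):
    """Return True if x lies in one of the merged inclusive ranges.

    y is the second element of the search key [x, y]; it is exposed only
    so we can see what goes wrong when it is too small.
    """
    # Lists compare element by element, so [x, y] is ordered by x first, then y
    idx = bisect_left(merged, [x, y]) - 1
    return idx >= 0 and merged[idx][0] <= x <= merged[idx][1]


for x in (1, 3, 5, 8, 10, 20, 21):
    print(x, is_fresh(merged, x))
# 1 False, 3 True, 5 True, 8 False, 10 True, 20 True, 21 False

# With a small y, the key [3, 0] sorts before [3, 5], so we look one slot too far left
print(is_fresh(merged, 3, y=0))  # False, even though 3 is in [3, 5]
```

The last line is the $A = X$ case: only a large enough $Y$ pushes the search key to the right of $[A, B]$.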

In [4]:
# This big number will have us bisect our array effectively
BIG_NUMBER = 10**18

# This variable will store our answer - for each fresh ingredient, we'll add 1
ans = 0

Now we loop over the ingredients, and run our binary search for each one:

In [5]:
for ingredient in available:
    # Is this ingredient fresh? Let's bisect the processed list using [ingredient, 10**18] and see if ingredient already sits in the range at that index
    idx = bisect_left(fresh_processed, [ingredient, BIG_NUMBER]) - 1
    if idx >= 0 and fresh_processed[idx][0] <= ingredient <= fresh_processed[idx][1]:
        ans += 1

print(ans)

558


And there's our answer to part 1! Simple.

## Part 2

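Today, part 2 is very easy for those who put the effort in to do part 1 the clever way! To count the fresh ingredient IDs, we just add up the sizes of all our merged intervals in the processed list we've already created. Both ends of each range are inclusive, so a range from `lo` to `hi` contains `hi - lo + 1` IDs. Because the merged intervals don't overlap, no ID is counted twice.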
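As a quick sanity check on toy data (made-up values, not the puzzle input), we can compare the sum of widths against a brute-force count that builds the full set of IDs. The brute-force version is only feasible for tiny ranges, which is exactly why the merged list is useful.

```python
# Hypothetical raw ranges and the merged list they reduce to
toy_raw = [[3, 5], [10, 14], [16, 20], [12, 18]]
toy_merged = [[3, 5], [10, 20]]

# Sum of widths over the merged, non-overlapping ranges (inclusive ends, hence the + 1)
width_total = sum(hi - lo + 1 for lo, hi in toy_merged)

# Brute force: collect every ID covered by any raw range into a set
brute_total = len({v for lo, hi in toy_raw for v in range(lo, hi + 1)})

print(width_total, brute_total)  # 14 14
```

Summing widths over the raw ranges instead would give 3 + 5 + 5 + 7 = 20, because the overlapping IDs would be counted more than once.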

In [6]:
# Create an answer variable
ans = 0

# We can now just use our fresh_processed array to easily sum the sizes of the ranges
for lo, hi in fresh_processed:
    ans += hi - lo + 1
  
print(ans)

344813017450467


And part 2 is solved. That was nice and easy - all thanks to the work we did upfront in part 1.