# Day 2

## Getting started

Navigate to [Advent of Code Day 2](https://adventofcode.com/2025/day/2)

Save the problem input (I saved mine as a text file called `AOC25_2_in.txt`).

## Understanding the problem

This problem is asking us to identify numbers with repeating patterns within a certain range. In order to identify such numbers, we must first consider how such a number is constructed.

In part 1, a number is 'invalid' if 'the first half' of the number matches 'the second half'. A very simple way to check a given number is to write it as a string of text and see if the first half of the string matches the second half.

In part 2, a number is 'invalid' if it is made up entirely of some block of digits repeated two or more times. We can extend the same idea: for every divisor of the string length (except the full length itself, because 'repeating' only once does not make a number invalid), check whether the block of that length, repeated, reproduces the whole string.
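To make the two definitions concrete, here is a minimal sketch using plain string comparison. The helper names `repeats_twice` and `repeats_any` are mine, chosen for illustration only; they are not the functions used later in this notebook.

```python
def repeats_twice(num):
    """Part 1 rule: the first half of the digits equals the second half."""
    s = str(num)
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]


def repeats_any(num):
    """Part 2 rule: some block of digits, repeated at least twice, spells the whole number."""
    s = str(num)
    for size in range(1, len(s) // 2 + 1):
        # only block sizes that divide the length can tile the string exactly
        if len(s) % size == 0 and s[:size] * (len(s) // size) == s:
            return True
    return False


print(repeats_twice(1212), repeats_twice(121212))  # True False
print(repeats_any(121212), repeats_any(1234))      # True False
```

Note that `121212` is fine under the part 1 rule but invalid under the part 2 rule, because `12` repeats three times.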

### Naive approach (slow)

A naive approach to this problem, therefore, is simply to iterate through each number within each of our ranges, and check whether it is invalid. The organisers have been kind today, and the ranges will allow for such a solution to run in time.

But what if we had a range of 1 to 10 trillion? We'd have to check every single number in that range. Ordinary computers manage approximately 1 billion simple operations per second, so even at one operation per number this would take about 10,000 seconds (nearly three hours). In reality, the runtime would be considerably worse than this, because we're doing quite a few operations for each number.
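The estimate is simple arithmetic, sketched below. The figure of 1 billion operations per second is the rough assumption from the paragraph above, and one operation per number is a deliberately optimistic lower bound.

```python
numbers_to_check = 10 ** 13   # every number from 1 to 10 trillion
ops_per_second = 10 ** 9      # rough assumption, not a measurement

seconds = numbers_to_check / ops_per_second
hours = seconds / 3600

print(f"{seconds:.0f} seconds, about {hours:.1f} hours")  # 10000 seconds, about 2.8 hours
```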

### Clever approach (fast)

There is a clever approach too! We can **analyse** the range and mathematically construct all possible invalid numbers without visiting each one of them, and we can do it very quickly.

I will discuss both solutions here, and both are uploaded in this repository. To compare runtimes, I will use the library `time`.


In [12]:
import time

## Part 1

### Naive approach (slow)

As in day 1, we will read in our inputs, and create an array (this time called `rngs`) containing our ranges.

In [3]:
# read in the file, and save the information in an array which we call 'rngs'
# the 'split' function breaks the input up, and the ',' argument tells it where to break (in this case, at every comma)
# opening the file with 'with' makes sure it is closed again once we have read it
with open("AOC25_2_in.txt") as f:
    rngs = f.read().split(',')

In the simple solution, I will iterate over every number in the range and perform a check to see if it is invalid. To do this, I require a function to check whether a number is invalid.

I can do this very easily. If I treat the number as a string of text, I just check that the first half matches the second half.

In [4]:
def is_invalid(num):
    
    # get length of number
    num_length = len(str(num))
    
    # if the number has odd length, we immediately know it is not invalid
    if num_length % 2 == 1:
        return False
    
    # if a number is invalid, then the 'first half of the number' equals 'the second half of the number', so I'll check that the characters in the first half match those in the second half, and return 'True' if so
    return str(num)[ : num_length // 2] == str(num)[num_length // 2 : ]

Now all I need to do is create a variable called `ans` to add each invalid number to, and iterate over the range to find them all.

In [14]:
T = time.time()

ans = 0

for rng in rngs:
    lo, hi = rng.split('-')
    lo = int(lo)
    hi = int(hi)
    
    # the input looks pretty simple, but we should always prepare for the most difficult scenario
    # we must count all invalid numbers x, such that lo <= x <= hi
    
    for num in range(lo, hi + 1):
        if is_invalid(num):
            ans += num

print(ans)
print("Runtime =", round(time.time() - T, 2), "seconds")

53420042388
Runtime = 3.08 seconds


And it's that simple - I've solved part 1 (the easy but slow way). But it took a few seconds (3.08s at time of writing) - try running this with an input range of 1 to 10 trillion and you will soon find that it's not an ideal way!

### Clever approach (fast)

In my clever approach, I'm going to break the problem up and consider numbers of different lengths. For a given possible length (which must be even in part 1, otherwise we cannot have a pattern that repeats twice), I will find the smallest possible invalid number, and the largest possible invalid number.

I'll then take advantage of a trick in maths for summing all numbers over a range:

$$\sum_{i=1}^n i = {n \over 2} (n + 1) $$

Using this formula, I can deduce that

$$\sum_{i=m}^n i = {n \over 2} (n + 1) - {m \over 2} (m - 1) $$
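As a quick sanity check of the second formula, the sketch below compares it with a direct sum for a few ranges. The name `closed_form` is just for this illustration.

```python
def closed_form(m, n):
    # n(n + 1)/2 - m(m - 1)/2; both products are even, so integer division is exact
    return n * (n + 1) // 2 - m * (m - 1) // 2


for m, n in [(1, 10), (12, 78), (100, 999)]:
    assert closed_form(m, n) == sum(range(m, n + 1))

print(closed_form(12, 78))  # 3015
```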

Example: suppose I want to find all invalid numbers in the range `1161` to `7913`.

I can identify the lowest such invalid number as `1212`, and the highest such invalid number as `7878`. All numbers with repeating patterns `12, 13, ..., 78` will be invalid. I can calculate the sum of these numbers by calculating the sum of the numbers from `12` to `78` using the above formula, and then multiplying my answer by `101`.
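The example can be checked directly. The sketch below computes the total both ways for this one range; it relies on every number between `1161` and `7913` having four digits, so it is not a general solution.

```python
lo, hi = 1161, 7913

# closed form: every invalid number here is a two-digit half 12..78 written twice,
# i.e. half * 100 + half = half * 101
fast = 101 * sum(range(12, 79))

# brute force over the whole range, for comparison (all numbers here have 4 digits)
slow = sum(n for n in range(lo, hi + 1) if str(n)[:2] == str(n)[2:])

print(fast, slow)  # 304515 304515
```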

I can repeat this logic across all possible lengths, and deduce all possible invalid numbers and their sum for a given range in almost no time, without checking any of the numbers individually.

So let's define my functions to calculate the range sum first:

In [7]:
def sum_first_n(n):
    return n * (n + 1) // 2

def sum_range(min_, max_):
    return sum_first_n(max_) - sum_first_n(min_ - 1)

Now let's define my function to calculate the sum of all invalid numbers of a specified length, such that they are also within our required range:

In [9]:
def sum_(lo, hi, length):
    
    # discount any odd length numbers, as we know all invalid numbers have even length
    if length % 2 == 1:
        return 0
    
    # For ease, let's call the smallest half-number lo_half and the largest half number hi_half
    # This is because each half-number in this range accounts for exactly one possible invalid number, so 1234 could be 12341234, an invalid number
    lo_half = 10 ** (length // 2 - 1)
    hi_half = 10 * lo_half - 1
    
    # what's the smallest invalid number that's >= lo and has this length?
    # if the length is bigger than the length of the bottom number of our range, then our smallest number is simply a power of 10 (e.g. 10, 100, 1000, etc)
    if length > len(str(lo)):
        min_ = lo_half
    # otherwise, our smallest number is the smallest repeating pattern number >= lo
    else:
        min_ = lo // (lo_half * 10)
        if (min_ * (10 * lo_half + 1)) < lo:
            min_ += 1
    
    # what's the largest invalid number that's <= hi and has this length?
    # if the length is smaller than the length of the top number of our range, then our largest number is simply a sequence of repeating 9s (e.g. 99, 999, 9999, etc)
    if length < len(str(hi)):
        max_ = hi_half
    # otherwise, our largest number is the largest repeating pattern number <= hi
    else:
        max_ = hi // (lo_half * 10)
        if (max_ * (10 * lo_half + 1)) > hi:
            max_ -= 1
    
    # So now I know the range of invalid half numbers, I must sum the numbers in that range and then multiply by (10 * lo_half + 1) 
    return sum_range(min_, max_) * (10 * lo_half + 1)

Now we iterate over all our ranges, and within each range we iterate over all possible lengths of invalid numbers:

In [20]:
T = time.time()
ans = 0

for rng in rngs:
    lo, hi = rng.split('-')
    lo = int(lo)
    hi = int(hi)
    
    # the input looks pretty simple, but we should always prepare for the most difficult scenario
    # we must count all invalid numbers x, such that lo <= x <= hi
    # we will start by considering numbers with the same length as lo, and iterate until we reach numbers of the same length as hi
    
    for length in range(len(str(lo)), len(str(hi)) + 1):
        ans += sum_(lo, hi, length)

print(ans)
print("Runtime =", round(time.time() - T, 3), "seconds")

53420042388
Runtime = 0.003 seconds


Our answer agrees, and we got there much faster! 0.003s at the time of writing. This is because we used some mathematical tricks, instead of 'brute force' programming.
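One way to gain confidence in a closed-form solution is to compare it with brute force on many small random ranges. The sketch below is self-contained and uses a slightly different formulation from my `sum_` function above: it finds the allowed halves by dividing `lo` and `hi` by the multiplier directly. The names `brute` and `fast` and the range sizes are assumptions made for this illustration.

```python
import random


def brute(lo, hi):
    """Sum every number in [lo, hi] whose first half of digits equals its second half."""
    total = 0
    for n in range(lo, hi + 1):
        s = str(n)
        if len(s) % 2 == 0 and s[: len(s) // 2] == s[len(s) // 2 :]:
            total += n
    return total


def fast(lo, hi):
    """Same sum, built from arithmetic series over the 'half' values."""
    total = 0
    for length in range(2, len(str(hi)) + 1, 2):
        half_digits = length // 2
        mult = 10 ** half_digits + 1         # half -> half written twice
        first = 10 ** (half_digits - 1)      # smallest half with no leading zero
        last = 10 ** half_digits - 1         # largest half
        first = max(first, -(-lo // mult))   # ceil(lo / mult)
        last = min(last, hi // mult)         # floor(hi / mult)
        if first <= last:
            # arithmetic series: (first + last) * count / 2, scaled by the multiplier
            total += mult * (first + last) * (last - first + 1) // 2
    return total


random.seed(0)  # fixed seed so the check is repeatable
for _ in range(200):
    lo = random.randint(1, 200_000)
    hi = lo + random.randint(0, 5_000)
    assert fast(lo, hi) == brute(lo, hi), (lo, hi)
print("all ranges agree")
```

If the two ever disagree, the failing `(lo, hi)` pair is reported by the assertion, which gives a small case to debug by hand.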

## Part 2

### Naive approach (slow)

We can fairly easily extend our slow approach from Part 1. This time, when checking each number, we simply check repetitions of all possible lengths, and divide our string up into the corresponding number of chunks. We now need a new function which checks whether a number has a repeating pattern of a specified length:

In [22]:
def is_invalid_len(num, rep):
    
    num_length = len(str(num))
    
    # if the pattern length doesn't divide the number length, this pattern length does not fit
    if num_length % rep > 0:
        return False
    
    # let's define num_reps as the number of times we need the pattern to recur
    num_reps = num_length // rep
    
    # then every 'rep' digits must be the same
    return str(num) == num_reps * str(num % (10 ** rep))

This replaces the logic specific to cutting the string in half with some generic logic that works for any number of chunks (note that the same rule applies - if the pattern length is not a factor of the number length, we skip it).
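For comparison, the same check can be written purely with string slicing, which avoids having to think about leading zeros in `num % (10 ** rep)`. This is an alternative sketch, not the function used in this notebook, and the name `is_invalid_len_str` is hypothetical.

```python
def is_invalid_len_str(num, rep):
    s = str(num)
    # the pattern length must divide the number length
    if len(s) % rep != 0:
        return False
    # take the first 'rep' digits and repeat them to fill the whole length
    return s == s[:rep] * (len(s) // rep)


print(is_invalid_len_str(123123, 3))  # True
print(is_invalid_len_str(123123, 2))  # False
print(is_invalid_len_str(1111, 1))    # True
```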

Now we modify our function which checks a particular number to check all possible valid lengths:

In [23]:
def is_invalid(num):
    
    # Let's try to determine whether the number has a repeating pattern of length i, for any i from 1 up to half the length of our number
    # (a longer pattern could not fit twice). We start from half the length and work backwards to 1
    
    for i in range(len(str(num)) // 2, 0, -1):
        if is_invalid_len(num, i):
            return True
    
    return False

Finally, we iterate over all ranges, and all numbers within each range:

In [26]:
T = time.time()
ans = 0

for rng in rngs:
    lo, hi = rng.split('-')
    lo = int(lo)
    hi = int(hi)
    
    # the input looks pretty simple, but we should always prepare for the most difficult scenario
    # we must count all invalid numbers x, such that lo <= x <= hi
    
    for num in range(lo, hi + 1):
        if is_invalid(num):
            ans += num

print(ans)
print("Runtime =", round(time.time() - T, 2), "seconds")

69553832684
Runtime = 8.67 seconds


Well, looks good. It's passed the test. But it took over 8 seconds (at time of writing) this time! Not great - especially if we make our inputs more complex.

### Clever approach (fast)

The clever approach needs some modification this time. I can still easily count all invalid numbers of a specified length by generalising the logic from part 1, so that it handles any number of repeated chunks rather than just two halves. The problem is that a number such as `1111` has a repeating pattern of length 1 and also a repeating pattern of length 2! I must be careful not to double count it.

To get round this, I save the total of invalid numbers for each pattern length in an array called `rep_counts`. Then, for each pattern length, I subtract the totals already attributed to any shorter pattern lengths which are a factor of that length, so that each invalid number is counted exactly once (against its shortest repeating pattern). This will save me from double counting!
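To see the overlap concretely, the sketch below builds the sets of six-digit numbers with a repeating pattern of length 1, 2 and 3 and compares their sizes. The helper `repeated` is a hypothetical name used only here. It enumerates the numbers explicitly, which the fast solution avoids.

```python
def repeated(pattern_len, total_len):
    """All numbers of total_len digits made by repeating a pattern_len-digit block."""
    first = 10 ** (pattern_len - 1)   # smallest block without a leading zero
    last = 10 ** pattern_len - 1      # largest block
    reps = total_len // pattern_len
    return {int(str(p) * reps) for p in range(first, last + 1)}


ones, twos, threes = repeated(1, 6), repeated(2, 6), repeated(3, 6)

print(len(ones), len(twos), len(threes))     # 9 90 900
print(len(twos | threes))                    # 981
print(len(twos) + len(threes) - len(ones))   # 981
```

Numbers such as `111111` sit in all three sets, so adding the three totals together would count them three times. Subtracting the shared part gives the size of the union.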

In [27]:
def sum_rep(lo, hi, length, rep):
    
    # discount any repetition lengths which are not a factor of length
    if length % rep != 0:
        return 0
    
    # from here on, 'rep' holds the number of repetitions rather than the pattern length
    rep = length // rep
    
    # For ease, let's call the smallest repeating pattern lo_rep and the largest repeating pattern hi_rep
    # This is because each pattern in this range accounts for exactly one possible invalid number, so if rep = 3, 1234 could be 123412341234, an invalid number
    lo_rep = 10 ** (length // rep - 1)
    hi_rep = 10 * lo_rep - 1
    
    # what's the smallest invalid number that's >= lo and has this length and this degree of repetition?
    if length > len(str(lo)):
        min_ = lo_rep
    else:
        min_ = lo
        min_ //= (lo_rep * 10) ** (rep - 1)
        # try this number, and increment by 1 if too low
        tmp = min_
        for i in range(rep - 1):
            tmp = tmp * (10 * lo_rep) + min_
        if tmp < lo:
            min_ += 1
    
    # what's the largest invalid number that's <= hi and has this length and this degree of repetition?
    if length < len(str(hi)):
        max_ = hi_rep
    else:
        max_ = hi
        max_ //= (lo_rep * 10) ** (rep - 1)
        tmp = max_
        for i in range(rep - 1):
            tmp = tmp * (10 * lo_rep) + max_
        if tmp > hi:
            max_ -= 1
    
    # So now I know the range of invalid partial numbers, I must sum that range and then multiply by
    # 1 + (10 * lo_rep) + (10 * lo_rep)**2 + ... (one term per repetition), which the loop below builds up step by step
    ret = sum_range(min_, max_)
    adder = ret
    for i in range(rep - 1):
        ret = ret * (10 * lo_rep) + adder
    return ret

Above we've modified our previous function to add all invalid numbers which have a repeating pattern of length `rep`.
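The final loop in `sum_rep` works because writing a `k`-digit pattern several times is the same as multiplying it by `1 + 10**k + 10**(2k) + ...`. A minimal sketch of that multiplier, with the hypothetical name `repeat_multiplier`:

```python
def repeat_multiplier(pattern_len, reps):
    """Return 1 + 10**k + 10**(2k) + ... with 'reps' terms, where k = pattern_len."""
    mult = 0
    for _ in range(reps):
        mult = mult * 10 ** pattern_len + 1
    return mult


print(repeat_multiplier(2, 2))  # 101
print(repeat_multiplier(2, 3))  # 10101
print(123 * repeat_multiplier(3, 4) == int("123" * 4))  # True
```

Because the multiplier is the same for every pattern of a given length, the sum of the invalid numbers is just the sum of the patterns times this multiplier.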

Now I'll iterate over all possible pattern lengths, store the sums, and do my trick to avoid double counting:

In [28]:
def sum_(lo, hi, length):
    
    # define an array to store the sum of invalid numbers which have a repeating pattern of each length between 1 and length // 2 inclusive
    rep_counts = [0] * (length // 2 + 1)
    
    # iterate over all possible repetition lengths
    for l in range(1, length // 2 + 1):
        rep_counts[l] = sum_rep(lo, hi, length, l)
    
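    # define a variable to return the sum of invalid numbers of this length between lo and hi inclusive
    ret = 0
    for l in range(1, length // 2 + 1):
        if length % l > 0:
            continue
        
        # delete double counts - remove the totals for any shorter repeating pattern that is a factor of l
        # (those entries have already been reduced in the same way, so afterwards rep_counts[l] covers
        # only numbers whose shortest repeating pattern has length exactly l; e.g. 11111111 is then
        # counted once under l = 1, and not again under l = 2 or l = 4)
        for x in range(1, l):
            if l % x == 0:
                rep_counts[l] -= rep_counts[x]
        
        # add the deduplicated total for this pattern length
        ret += rep_counts[l]

    return ret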

Now let's run the code again - hopefully it will still be very fast.

In [35]:
T = time.time()
ans = 0

for rng in rngs:
    lo, hi = rng.split('-')
    lo = int(lo)
    hi = int(hi)
    
    # the input looks pretty simple, but we should always prepare for the most difficult scenario
    # we must count all invalid numbers x, such that lo <= x <= hi
    # we will start by considering numbers with the same length as lo, and iterate until we reach numbers of the same length as hi
    
    for length in range(len(str(lo)), len(str(hi)) + 1):
        ans += sum_(lo, hi, length)

print(ans)
print("Runtime =", round(time.time() - T, 3), "seconds")

69553832684
Runtime = 0.001 seconds


My answer is the same as before, but this time only took 0.001s! Pretty good result.

## Complexity analysis

Why is faster code important? Let's consider briefly the two approaches just for part 1. In the naive approach, we considered every single number in each range. Suppose our range was 10 trillion long - then we'd have to perform our string comparison on 10 trillion different numbers.

Using the mathematical techniques, we perform just a few operations for every possible length of number. Between 1 and 10 trillion, there are 14 different possible lengths, meaning our number of operations is likely only a few hundred (down from many trillions).
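The count of lengths is easy to confirm. The sketch below only counts how many lengths the fast approach would loop over for this hypothetical range; it does not solve the puzzle.

```python
upper = 10 ** 13  # 10 trillion

lengths = len(str(upper))      # digit lengths 1..14 all occur between 1 and upper
even_lengths = lengths // 2    # only even lengths can hold a part 1 invalid number

print(lengths, even_lengths)        # 14 7
print(f"naive checks: {upper:,}")   # naive checks: 10,000,000,000,000
```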