# Longest Collatz sequence

<blockquote>
<p>The following iterative sequence is defined for the set of positive integers:</p>
<p style="margin-left:50px;"><var>n</var> → <var>n</var>/2 (<var>n</var> is even)<br /><var>n</var> → 3<var>n</var> + 1 (<var>n</var> is odd)</p>
<p>Using the rule above and starting with 13, we generate the following sequence:</p>
<div style="text-align:center;">13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1</div>
<p>It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.</p>
<p>Which starting number, under one million, produces the longest chain?</p>
<p class="note"><b>NOTE:</b> Once the chain starts the terms are allowed to go above one million.</p>
</blockquote>

Since the Collatz chain for a given starting number can be arbitrarily long, we’d like to avoid computing the same value twice. We’ll use a dictionary to store the results we’ve already computed. A pre-allocated array would be more trouble since we don’t know how large the terms will be, even though we are only interested in the results for starting numbers below one million.

Once a term in a chain is a power of 2, the chain quickly reduces to 1 as we apply the $n\div 2$ rule repeatedly. In fact, the Collatz chain starting at $2^k$ contains exactly $k+1$ terms ($2^k, 2^{k-1}, \dots, 2, 1$), since the starting number and the final 1 both count. We can seed our dictionary with these values.
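As a quick sanity check on the seed values, here is a minimal brute-force sketch that counts terms by direct iteration, with no caching. The names `collatz_step` and `chain_length` are hypothetical helpers introduced only for this check. Because the count includes both the starting number and the final 1, a start of $2^k$ yields $k+1$ terms.

```julia
# One Collatz step, as defined in the problem statement.
collatz_step(n) = iseven(n) ? n ÷ 2 : 3n + 1

# Count the terms in the chain from n down to 1 (both endpoints included).
function chain_length(n)
    len = 1
    while n != 1
        n = collatz_step(n)
        len += 1
    end
    return len
end

chain_length(13)                                # 10, matching the example in the problem
all(chain_length(2^k) == k + 1 for k in 0:19)   # true
```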

Another saving in computation comes from noticing that, once we’ve found the length of the chain for a number, we also know the length for every other number in that chain. Our algorithm pushes Collatz terms onto a stack until it reaches one with a known length. It then pops them off one at a time, adding 1 to the known length for each term popped and saving the result.
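To make the stack bookkeeping concrete, here is a small sketch that resolves a single starting value against a cache seeded with a few powers of 2. The helper name `fill_chain!` is hypothetical and exists only for this illustration; it follows the same push-then-pop pattern described above.

```julia
# Hypothetical helper: find the chain length of n using the cache C,
# recording the length of every intermediate term along the way.
function fill_chain!(C::Dict{Int,Int}, n::Int)
    S = [n]
    while !haskey(C, S[end])          # push terms until one has a known length
        t = S[end]
        push!(S, iseven(t) ? t ÷ 2 : 3t + 1)
    end
    len = C[pop!(S)]                  # known length at the top of the stack
    while !isempty(S)                 # each earlier term is one step longer
        len += 1
        C[pop!(S)] = len
    end
    return C[n]
end

C = Dict{Int,Int}(2^k => k + 1 for k in 0:4)   # seed: 1, 2, 4, 8, 16
fill_chain!(C, 3)       # 8, since the chain is 3 → 10 → 5 → 16 → 8 → 4 → 2 → 1
(C[5], C[10], C[3])     # (6, 7, 8), all filled in by the single call
```

Starting from 3, the stack grows to `[3, 10, 5, 16]` before hitting the cached value for 16 (length 5); popping then assigns 6 to 5, 7 to 10, and 8 to 3.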

In [2]:
collatz(n) = iseven(n) ? div(n, 2) : 3n+1

collatz (generic function with 1 method)

In [3]:
let m = 1_000_000
    C = Dict{Int,Int}()  # starting value => chain length (concrete types avoid boxing)
    
    # seed with powers of 2
    n, l = one(m), one(m)
    while n < m
        C[n] = l
        n *= 2
        l += 1
    end
    
    # compute the chain lengths for other values less than m
    for n in 3:m-1
        haskey(C, n) && continue
        S = [n]
        while !haskey(C, S[end])
            push!(S, collatz(S[end]))
        end
        l = C[pop!(S)]
        while length(S) > 0
            l += 1
            C[pop!(S)] = l
        end
    end

    # find the number with the longest chain
    println("longest chain: $(maximum(values(C)))")
    println("number with longest chain: $(argmax(C))")
    println("largest term: $(maximum(keys(C)))")
    println("number of terms: $(length(C))")
    a, b = 0, 0
    for n in 1:m-1
        if C[n] > b
            a, b = n, C[n]
        end
    end
    a, b
end

    

longest chain: 525
number with longest chain: 837799
largest term: 56991483520
number of terms: 2168611


(837799, 525)

## Results

The number less than one million with the longest chain is 837799, which has a chain length of 525. Sorting the results would be a mistake: we don't need to put them all in order, we just need to make a single pass to find the largest one.

Applied to a dictionary, the `argmax` function returns the key with the largest value. Here it searches every cached term, including those above one million, so I considered whether one of those larger numbers might have a longer chain. It cannot: the only way those numbers get calculated is as part of a chain for a number less than one million, and that starting number's chain is strictly longer. So using `argmax` is sufficient.
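The same reasoning can be checked on a smaller instance. The sketch below rebuilds the cache for a limit of 1000 (an arbitrary value chosen to keep the run short) and compares `argmax` over every cached term with a search restricted to starting values below the limit. The helper name `collatz_cache` is hypothetical, and for simplicity it seeds only the value 1 rather than all powers of 2.

```julia
# Hypothetical helper: cache chain lengths for every starting value below m,
# along with every intermediate term visited on the way.
function collatz_cache(m::Int)
    C = Dict{Int,Int}(1 => 1)
    for n in 2:m-1
        S = [n]
        while !haskey(C, S[end])
            t = S[end]
            push!(S, iseven(t) ? t ÷ 2 : 3t + 1)
        end
        len = C[pop!(S)]
        while !isempty(S)
            len += 1
            C[pop!(S)] = len
        end
    end
    return C
end

m = 1000
C = collatz_cache(m)

best_overall = argmax(C)                          # searches every cached key
best_below_m = argmax([C[n] for n in 1:m-1])      # index n of the longest chain below m

best_overall < m                        # true: no term at or above m can win
C[best_overall] == C[best_below_m]      # true: both searches find the same length
```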

It’s interesting to see that we end up computing the chain length for over 2 million terms in order to get every term less than one million. Amazingly, we have to compute the chain length for a number that is nearly 57 billion in the process.