Skip to content

Latest commit

 

History

History
44 lines (32 loc) · 3.27 KB

README.md

File metadata and controls

44 lines (32 loc) · 3.27 KB

Project Euler problem 14 solutions

About

A collection of solution optimizations for Project Euler problem 14 regarding delay records for testing the Collatz conjecture in Python, C++ and x64 Assembly. I originally solved the problem in 2010 but recently used it to practice different optimization techniques and languages.

Background

The procedure is very straightforward. Starting with some number n, follow a simple iterative procedure:

  • if even, divide by 2
  • if odd, multiply by 3 and add 1
  • if 1, finish

These programs try all starting numbers below a specified limit and find the longest chain. There's no way to predict the length of a chain given the starting value. You can cache results, but that requires O(n) space. Chain length grows very slowly—the longest chain below 10^10 has 1133 steps—but that's still almost 19 gigabytes with lengths stored as 16-bit ints!

Simply, there is no practical early termination condition.1

Therefore I focused first on shrinking search space and second on using faster languages and libraries. Note that unsigned 64-bit ints can handle the intermediate values in chains for any limit up to 10^10.

1 Of course there's finishing when you reach a power of 2, but the savings there are minimal, especially when using bit shifts and count trailing zeros.

Examples of improvements

  • naive solution

  • search space narrowing: n mod 96 must be in [1, 7, 9, 15, 25, 27, 31, 33, 39, 43, 57, 63, 73, 75, 79, 91] - only checking 16/96 or 1/6 of the numbers2

  • Python Numba library JIT compiler

  • Python Multiprocessing package

  • efficient wheel implementation of search space scan

  • bit packing the wheel

  • compiler intrinsics (specifically bit scan forward/count trailing zeros)

  • pure x64 assembly implementation

    2 there are several known records produced by numbers that fall outside this set (including even numbers, eg. 6, 18, 54, 31466382). However, most with limit 10^N fall inside it. See http://www.ericr.nl/wondrous/delrecs.html for a list of known records.

Benchmark results are included in eul14 benchmarks.txt and all ran on my over-the-hill Core i7-920 @ 2.67 GHz

Code usage

  • Each source file is self-contained and should run in the interpreter or compile as appropriate
  • C++ files require C++11 for timing via <chrono>
  • Preprocessor uses either GCC or MSVC++ intrinsics
  • Assembly code uses MASM syntax and was tested with ML64 packaged with Visual Studio 2017

Future plans

  • C++ multithreading!
  • GPU would be nice but this problem may not be a good candidate due to extreme variability and unpredictability of chain lengths

Special thanks to Eric Roosendaal's 3x + 1 Problem compendium for ideas on narrowing search space