multiplication-table-problem

Single-processor C++ implementation of an efficient algorithm for computing $M(n)$, the number of distinct entries in the $n \times n$ multiplication table.

The Problem

$$M(n) = |{ij : 1 \le i, j \le n}|$$

Despite its elementary definition, $M(n)$ grows slower than $n^2$. Erdős (1955) showed $M(n) = o(n^2)$, and Ford (2008) established

$$M(n) = \Theta \left(\frac{n^2}{\Phi(n)}\right), \quad \Phi(n) = (\log n)^c (\log\log n)^{3/2}, \quad c = 1 - \frac{1 + \log\log 2}{\log 2} \approx 0.086$$

Equivalently, most integers less than $n^2$ are not products of two integers both at most $n$.

Algorithm

The implementation is based on Algorithm 4 of:

R. Brent, C. Pomerance, D. Purdum, J. Webster, Algorithms for the Multiplication Table Problem, arXiv:1908.04251 (2021).

This implementation goes beyond the above and incorporates the ability to correct shapes from unit shifts of Algorithm 4.

The key identity is

$$M(n) = \sum_{k=1}^{n} (k - \delta(k)),$$

where $\delta(k)$ counts multiples of $k$ already in the $(k-1) \times (k-1)$ table.

Unit-shift strategy. For a fixed multiplier $m$, the divisor shape for $mk$ changes predictably as $k$ increases by 1. When $k$ is prime the shape shifts exactly (no correction needed); when $k$ is composite a small correction accounts for extra divisors of $mk$ not in $M \cup kM$.

Variable-smoothness multipliers. A multiplier $m$ is included when $m \cdot P^+(m) \le n$, where $P^+(m)$ is the largest prime factor of $m$. Multipliers are processed in decreasing order of $\tau(m) = |{d : d \mid m}|$ so that high-divisor-count multipliers claim the most composite integers early, minimising redundant work.

Complexity. The runtime is conjectured to be $O(n^{5/3} \log n)$, which is superlinear — scaling from $n = 10^6$ to $n = 10^7$ increases the running time by a factor of roughly $10^{5/3} \approx 46$, not 10.

Build

Requires a C++17 compiler. No external dependencies.

make

To enable SIMD acceleration on your platform, edit CXXFLAGS in the Makefile:

Platform	Flag(s)
ARM NEON	`-DUSE_NEON_SIMD`
AVX-512	`-DUSE_AVX512_SIMD -mavx512f`

Usage

./mtable <n> [--bound <b>] [--output <file>] [--warm <file>]

Argument	Description
`n`	Compute $M(n)$. Must be $\le 2^{32}-1$.
`--bound <b>`	Include multipliers $m \le b$ (default: $\lfloor n^{2/3} \rfloor$).
`--output <file>`	Write `k M(k)` for $k = 1 \ldots n$, one per line, to a text file.
`--warm <file>`	Warm-start from a binary file of `uint32_t[n+1]` where entry $k$ holds $k - \delta(k)$.
`--save-warm <file>`	Write $k - \delta(k)$ for $k = 0 \ldots n$ as a binary `uint32_t[n+1]` file that can be passed to `--warm` in a later run.

Examples

./mtable 1000000

M(1000000): unit-shift algorithm (Alg. 4, var-smooth multipliers)
Multiplier bound : m <= 10000

Multipliers: 5345  [3 ms]

Exact fills    : 978668
Corrected fills: 21332
Skipped (cached): 0
Fallback (Alg.2): 0  [0 ms]
Sweep time     : 18313 ms

M(1000000) = 198878423611  [18315 ms total]

Write all values $M(1), M(2), \ldots, M(10000)$ to a file:

./mtable 10000 --output values.txt

Incremental computation — compute $M(10^4)$, save a warm-start file, then extend to $M(10^6)$ without redoing the work already done:

./mtable 10000 --save-warm warm_10k.bin
./mtable 1000000 --warm warm_10k.bin

Known values

$n$	$M(n)$
$10^3$	$248{,}083$
$10^6$	$198{,}878{,}423{,}611$
$2^{30}-1$	$204{,}505{,}763{,}483{,}830{,}092$
$2^{32}-1$	$3{,}215{,}709{,}724{,}700{,}470{,}901$

The value for $2^{30}-1$ is from Brent et al. (2021); the value for $2^{32}-1$ is new.

Memory

The bit-vector of $n$ bits is read and written with random-access patterns throughout the entire multiplier sweep; performance degrades once it exceeds the L3 cache. The uint32_t array of $k - \delta(k)$ values is written with strided access (multiples of each multiplier $m$) during the sweep and read sequentially once at the end to form the sum $M(n)$.

$n$	bit-vector	`uint32_t` array
$10^6$	125 KB	4 MB
$10^8$	12.5 MB	400 MB
$10^9$	125 MB	4 GB

For larger $n$

mtable performs all steps in a single pass on a single processor, which becomes impractical beyond roughly $n \sim 10^8$. Scaling to $n = 2^{32}$ and beyond requires several architectural changes:

Decouple the pipeline steps. The single-pass structure of mtable couples multiplier generation, the unit-shift sweep, corrections, and accumulation. At large scale these become separate programs communicating through files, allowing each stage to be tuned, restarted, or distributed independently.

Segment the bit-vector. Due to its random-access pattern, the bit-vector must fit in L3 cache to avoid severe performance degradation from cache misses. For $n = 2^{32}$ the full bit-vector is 512 MB — far exceeding a typical L3 cache. Segmenting the sweep so that each pass works on a contiguous chunk that fits in cache significantly improves throughput.

Distribute multipliers across processors. Each multiplier is largely independent: it owns a disjoint set of composite integers and sweeps its own portion of the bit-vector. Multipliers can therefore be partitioned across nodes in a cluster, each node writing its share of the $k - \delta(k)$ values to a memory-mapped file. An ownership bit-vector prevents two nodes from counting the same composite twice.

Replace the val array with a memory-mapped file. When multipliers are distributed across nodes, each node must write its $k - \delta(k)$ values into a shared result array. Memory-mapping a single file on shared storage allows all nodes to write to disjoint regions concurrently without explicit coordination, and the completed file serves directly as input to the final summation step.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
algo4_unit_v3.cpp		algo4_unit_v3.cpp
algo4_unit_v3.hpp		algo4_unit_v3.hpp
bitvector.hpp		bitvector.hpp
compile_flags.txt		compile_flags.txt
factorize.hpp		factorize.hpp
mtable.cpp		mtable.cpp
rollsieve.cpp		rollsieve.cpp
rollsieve.h		rollsieve.h
udl.hpp		udl.hpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

multiplication-table-problem

The Problem

Algorithm

Build

Usage

Examples

Known values

Memory

For larger $n$

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

multiplication-table-problem

The Problem

Algorithm

Build

Usage

Examples

Known values

Memory

For larger $n$

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages