In [2]:
import pandas as pd
import numpy as np

from itertools import product
from functools import reduce

Some interesting things I noticed:
1. For $k \geq 2$, you can never have more than $k-2$ ones.
    * If you had all $k$ ones, then the product is $1$ and the sum is $k \geq 2$.
    * If you had $k-1$ ones and the remaining number is $x$, then the product is $x$ and the sum is $x + (k-1) \geq x+1$.
    * This means prime numbers will never be product-sum numbers for $k \geq 2$, since the only way to write a prime $p$ as a product is $p$ times ones.

2. If $N = xy$ with $x, y \geq 2$, then the number of ones needed is $A = xy - (x + y)$ and the set has size $k = 2 + A$. In general, if $N = \prod_{i=1}^{n} x_i$ with $x_i \geq 2$ for each $i \leq n$, then the number of additional ones needed is $A = \prod_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i$ and the set has size $k = n + A$.
    * $A$ is never negative here: for positive integers, $x + y > xy$ only if $x = 1$ or $y = 1$.
    * This means the smallest set needed for a particular number is the one where $k = n + A$ is minimized, meaning that we want to minimize the difference between the product and the sum, while simultaneously using as few divisors as possible.

3. For any $k$, the set $\{k, 2, (k-2) \text{ ones}\}$ is a product-sum set ($2k = k + k = k + 2 + (k - 2)$). Its product is $2k$, so the minimal product-sum number for $k$ is at most $2k$, and we only need to check numbers up to $2k$.
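The observations above can be checked numerically. The following is only a minimal sketch: the helper names `ones_needed` and `set_size` are made up for illustration and are not used elsewhere in this notebook.

```python
from math import prod

def ones_needed(factors):
    """A in the notes: how many ones to add so that product == sum."""
    return prod(factors) - sum(factors)

def set_size(factors):
    """k in the notes: the factors themselves plus A ones."""
    return len(factors) + ones_needed(factors)

# Observation 2: each factorization into numbers >= 2 fixes one value of k.
for factors in [(2, 2), (2, 3), (2, 4), (2, 2, 2)]:
    padded = list(factors) + [1] * ones_needed(factors)
    assert prod(padded) == sum(padded) == prod(factors)
    print(factors, "-> k =", set_size(factors))

# Observation 3: the pair (2, k) always gives a set of size k,
# so 2k is an upper bound on the minimal product-sum number for k.
assert all(set_size((2, k)) == k for k in range(2, 1000))
```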


In [3]:
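def sieve(n):
    # Odd-only sieve of Eratosthenes: arr[i] is 1 if i is prime, else 0.
    # Start with 2 and every odd number >= 3 marked as a candidate,
    # then trim to indices 0..n so nothing above n is left unsieved.
    arr = ([0,0,1] + [1,0]*(n//2 + 1))[:n + 1]
    i = 3
    while i*i <= n:
        if arr[i]:
            # cross off the odd multiples of i, starting at i*i
            arr[i*i::2*i] = [0]*len(arr[i*i::2*i])
        i += 2

    # collect the indices still marked as prime
    ret = []
    for (i, p) in enumerate(arr):
        if p:
            ret.append(i)

    return arr, ret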

pbs, ps = sieve(10**5)

### My approach

0. Initialize two dictionaries, `min_ks` and `factor_cache`, to store results. `min_ks` gets a key for every $2 \leq k \leq 12000$, with each value set to $25000$, which is larger than the largest number we need to check (in general $2 \cdot \max(k)$, here $24000$).

1. Factorize every number, $n$, into every possible combination of factors. We cache these results so larger numbers can reuse the results from their factors. So for example, for $36$ we would have
    $$
    \text{factorize}(36) = [(36), (4,9), (2,2,9), (2,18), (3,3,4), (3,12), (2,3,6), (6,6), (2,2,3,3)]
    $$
    which may reuse the cached results for smaller numbers such as $2$, $6$, and $18$.

2. For each tuple, $t$, we compute its $k_t$, which is (as said above)
    $$
    k_t = \text{len}(t) + \text{prod}(t) - \text{sum}(t)
    $$

3. We see if `n < min_ks[k_t]`. If it is, then we update the dictionary value with $n$. If not, then we continue.

This algorithm is obviously not the fastest (~10 sec if not pre-cached). But it gets the job done.
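To make steps 1 and 2 concrete for $36$, here is a small standalone sketch. It is not the cached routine used in this notebook: it enumerates the factor tuples directly in non-decreasing order (so no sorting or de-duplication is needed), and the names `multiplicative_partitions` and `k_of` are made up for illustration.

```python
from math import prod

def multiplicative_partitions(n, smallest=2):
    """Yield each non-decreasing tuple of factors >= smallest with product n."""
    yield (n,)
    d = smallest
    while d * d <= n:
        if n % d == 0:
            # d is the smallest factor; the rest must be >= d
            for rest in multiplicative_partitions(n // d, d):
                yield (d,) + rest
        d += 1

def k_of(t):
    """Set size k_t = len(t) + prod(t) - sum(t) from step 2."""
    return len(t) + prod(t) - sum(t)

parts = sorted(multiplicative_partitions(36))
for t in parts:
    print(t, "-> k_t =", k_of(t))
```

The trivial tuple `(36,)` gives $k_t = 1$, which is why the real code only keeps values with $k_t \geq 2$.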

In [15]:
min_ks = {i: 25000 for i in range(12001)}  # 25000 is a sentinel above 2*12000
factor_cache = {}
def factorize(n, prime_bins, limit):
    # 1 and primes only have the trivial factorization
    if n == 1 or prime_bins[n]:
        return [(n,)]

    if n not in factor_cache:
        factorization = {(n,)}
        i = 2
        while i*i <= n:
            if n % i == 0:
                # put in all possible factorizations
                left_factors = factorize(i, prime_bins, limit)
                right_factors = factorize(n // i, prime_bins, limit)

                for left, right in product(left_factors, right_factors):
                    # sort so the same multiset of factors is stored only once
                    factorization.add(tuple(sorted(left + right)))

            i += 1

        # each factorization f of n gives a set of size len(f) + prod(f) - sum(f)
        for f in factorization:
            pot_k = len(f) + reduce(lambda a,b: a*b, f, 1) - sum(f)
            if 2 <= pot_k <= limit:
                min_ks[pot_k] = min(min_ks[pot_k], n)

        factor_cache[n] = tuple(factorization)

    return factor_cache[n]

In [12]:
klim = 12000
for x in range(2*klim, 1, -1):
    if pbs[x]: continue
    factorize(x, pbs, klim)

In [13]:
s = set()
for x in range(2, 12001):
    s.add(min_ks[x])

print(sum(s))

7587457
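As a sanity check on small cases, the minimal product-sum numbers for $k \leq 6$ can be verified by hand ($4 = 2 \cdot 2$, $6 = 1 \cdot 2 \cdot 3$, $8 = 1 \cdot 1 \cdot 2 \cdot 4$, $8 = 1 \cdot 1 \cdot 2 \cdot 2 \cdot 2$, $12 = 1 \cdot 1 \cdot 1 \cdot 1 \cdot 2 \cdot 6$). The sketch below recomputes them with a different technique from the cells above: a direct depth-first search over non-decreasing factor tuples instead of cached factorizations. The function name `min_product_sums` is made up for illustration.

```python
def min_product_sums(klim):
    """Return {k: minimal product-sum number} for 2 <= k <= klim."""
    limit = 2 * klim  # the set {2, k, ones} shows 2k always works
    best = {k: limit for k in range(2, klim + 1)}

    def extend(product, total, count, smallest):
        # Append one more factor f >= smallest to the current tuple.
        f = smallest
        while product * f <= limit:
            p, s, c = product * f, total + f, count + 1
            k = c + p - s  # set size after padding with p - s ones
            if c >= 2 and k <= klim:
                best[k] = min(best[k], p)
            extend(p, s, c, f)
            f += 1

    extend(1, 0, 0, 2)
    return best

small = min_product_sums(12)
print([small[k] for k in range(2, 7)])  # [4, 6, 8, 8, 12]
print(sum(set(small.values())))         # 61
```

Calling `min_product_sums(12000)` and summing the distinct values should give an independent comparison for the total printed above.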
