### The "Adds up to 100" puzzle:
- The question is, more or less: "Given a vector $X$ of positive numbers, which might be quite large, how many pairs of numbers add up to 100?"
- First, answer the question with a question: What kind of numbers? Integers? Over what range -- are all of the numbers $x_j < 100$?
- Here, we will provide reasonably general solutions for both the continuum and integer cases.
- Note that fundamentally, this is an $O(N^2)$ problem, but we want to reduce that as much as possible.
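
As a reference point for the $O(N^2)$ cost, here is a minimal brute-force sketch that checks every unordered pair once. The function name `count_pairs_brute_force` and the `target` parameter are illustrative and not part of the original notebook; exact equality is assumed, so this only makes sense for integer-like input.

```python
from itertools import combinations


def count_pairs_brute_force(values, target=100):
    """Count unordered pairs (a, b), taken from different positions, with a + b == target."""
    # combinations(values, 2) visits each of the N*(N-1)/2 index pairs exactly once
    return sum(1 for a, b in combinations(values, 2) if a + b == target)


sample = [1, 99, 50, 50, 100, 0]
print(count_pairs_brute_force(sample))
```

Any faster method discussed below should agree with this count on small inputs, which makes it a useful check.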

In [18]:
%load_ext autoreload
%autoreload 2
%matplotlib inline





In [19]:
import matplotlib
import pylab as plt
import numpy
import scipy
import random


### Make a sequence of numbers

In [20]:
N = 1000
R = random.Random()

x_min = 0.
x_max = 100.

# search range:
x_lo = 50.
x_hi = 60.

X = [x_min + R.random()*(x_max - x_min) for _ in range(N)]


### Continuum case:
- First, we might assume based on how the question was phrased that we can ignore this case. Continuum values never really equal one another, so we'll modify the question to say something like: find the pairs of numbers whose sum satisfies $x_{lo} \le x_1 + x_2 \le x_{hi}$.
- We don't have too much to work with; this problem is still fundamentally (as much as) $O(N^2)$, but:
    - We can use sorting to reduce the scope a bit
    - Obviously, since all values are positive, we can exclude any numbers $x > x_{hi}$
    - ... etc.
- So this gets us approximately $O(N \log N + \alpha N^2)$, where $\alpha$ scales with the square of the fraction of values $x < x_{hi}$. For uniformly distributed values it is roughly half that square, because the inner loop also stops as soon as $x_1 + x_2 > x_{hi}$.
- So we observe significant reductions in the complexity, even for $x_{hi} \approx x_{max}$. The next step might be to figure out a witty indexing or modulus-skipping system to optimize the search for small $x$.


In [21]:
# now, find all pairs where x_lo <= (x_j + x_k) <= x_hi
#
# make a sorted copy of X:
X_prime = sorted(X)
pairs = []
nits = 0    # number of inner-loop iterations actually executed
#
# we should maybe (or maybe not) use itertools, but first, let's just block it out.
# note: j and k both run over the whole list, so each pair is found in both
# orders, (j,k) and (k,j), and self-pairs (j == k) are included in the count.
for j, x1 in enumerate(X_prime):
    # is this a valid value for x1? if we've already exceeded our maximum value, we can stop spinning...
    if x1 > x_hi:
        break
    #
    for k, x2 in enumerate(X_prime):
        nits += 1
        x = x1 + x2
        # X_prime is sorted, so once the sum exceeds x_hi no later x2 can qualify
        if x > x_hi:
            break
        if x >= x_lo:
            pairs.append([j, k, x1, x2, x])
#
print("Number of distinct pairs: {}".format(len(pairs)))
print('nits: {}/{} ({:.3f})'.format(nits, N*N, nits/(N*N)))

Number of distinct pairs: 53695
nits: 169074/1000000 (0.169)
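
If only the number of pairs is needed, and not the list of pairs itself, one possible refinement is to replace the inner loop with two binary searches on the sorted list. This is a sketch of a swapped-in technique (the standard-library `bisect` module), not the method used in the cell above; the function name `count_pairs_in_range` is hypothetical. It counts each unordered pair once and excludes self-pairs, so its result will differ from the ordered count printed above.

```python
import bisect


def count_pairs_in_range(values, lo, hi):
    """Count unordered pairs (j < k) with lo <= values[j] + values[k] <= hi."""
    xs = sorted(values)                      # O(N log N)
    count = 0
    for j, x in enumerate(xs):
        # partners must satisfy lo - x <= xs[k] <= hi - x, restricted to k > j
        left = bisect.bisect_left(xs, lo - x, j + 1)
        right = bisect.bisect_right(xs, hi - x, j + 1)
        count += right - left                # two O(log N) searches per element
    return count


print(count_pairs_in_range([10.0, 45.0, 52.0, 5.0], 50.0, 60.0))
```

Counting this way costs $O(N \log N)$ overall. Listing the pairs explicitly cannot be cheaper than the number of pairs found, so the quadratic term only disappears when the count alone is enough.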


#### Variable Types: The Integer trick

- So that's the sort of problem we would probably encounter in physical science -- continuum values.
- But in CS, or just for the sake of puzzles, let's consider the case where $X$ is restricted to positive integers. This problem then becomes approximately $O(N)$ for large $N$: the number of distinct values that can take part in a sum of 100 is small (at most 101, the integers 0 through 100).
- We solve the problem in two steps: index and filter the input, then match each value with its complement:
    - Exclude all values $x>100$; count and index the number of each value $x \le 100$
    - Now, for each entry $x_j$, compute the complement $x_k = 100 - x_j$, and record the counts $n_j$ and $n_k$. Their sum, $n = n_j + n_k$, is the number of entries that take part in such a pair; the number of pairs they form is the product $n_j n_k$ (or $n_j (n_j - 1)/2$ when $x_j = x_k = 50$).
    - Note that we return the pairs, $x_j + x_k = 100$, and their counts. Spinning over all values would be $2 \times$ degenerate, in that we get two redundant pairs, $(x_j, x_k) = (x_k, x_j)$, so if we want the total count, add these up and divide by two.
    - ...or equivalently, add up half of them. So we only need to spin over $0 \le x_1 \le 50$.
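
The steps above can be sketched compactly with `collections.Counter`. This is a minimal illustration under the assumption that the input holds non-negative integers; the function name `count_pairs_summing_to` and the `target` parameter are hypothetical. Unlike the per-value totals $n_j + n_k$, it returns the number of pairs, using the product of the two counts.

```python
from collections import Counter


def count_pairs_summing_to(values, target=100):
    """Count unordered pairs of entries (different positions) that sum to target."""
    # index step: drop values that cannot take part, count the rest (one O(N) pass)
    counts = Counter(v for v in values if 0 <= v <= target)
    total = 0
    for x, n in counts.items():
        complement = target - x
        if x < complement:
            # each of the n copies of x pairs with each copy of the complement
            total += n * counts.get(complement, 0)
        elif x == complement:
            # a value that is its own complement pairs only with its other copies
            total += n * (n - 1) // 2
        # x > complement is skipped: that pair was counted from the other side
    return total


print(count_pairs_summing_to([1, 99, 99, 50, 50, 50, 100, 0, 150]))
```

The loop runs over at most `target + 1` distinct values, so the cost is dominated by the single counting pass over the input.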

In [22]:
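N = 10000
R = random.Random()
#
x_min = 0.
x_max = 200.
#
X = [int(x_min + R.random()*(x_max - x_min)) for _ in range(N)]
#
# count the occurrences of each value 0 <= x <= 100; larger values cannot be part of a pair
X_index = {x:0 for x in range(int(x_min), 101)}
for x in X:
    if x > 100:
        continue
    #
    X_index[x] += 1
#
pairs = []
#
# spin over the lower half only; the complement covers the upper half
for x, n in X_index.items():
    if x > 50: continue
    x_prime = 100 - x
    n_prime = X_index[x_prime]
    #
    # note: x = 50 is its own complement, so n + n_prime counts those entries twice
    pairs.append([(x, x_prime), (n, n_prime), n + n_prime])
#
print('pairs: ')
for p in pairs: print(p)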


pairs: 
[(0, 100), (56, 62), 118]
[(1, 99), (43, 56), 99]
[(2, 98), (57, 63), 120]
[(3, 97), (60, 40), 100]
[(4, 96), (53, 63), 116]
[(5, 95), (48, 48), 96]
[(6, 94), (43, 39), 82]
[(7, 93), (42, 58), 100]
[(8, 92), (58, 46), 104]
[(9, 91), (46, 55), 101]
[(10, 90), (49, 68), 117]
[(11, 89), (47, 57), 104]
[(12, 88), (50, 44), 94]
[(13, 87), (47, 55), 102]
[(14, 86), (50, 45), 95]
[(15, 85), (57, 44), 101]
[(16, 84), (49, 62), 111]
[(17, 83), (50, 55), 105]
[(18, 82), (54, 49), 103]
[(19, 81), (47, 51), 98]
[(20, 80), (41, 48), 89]
[(21, 79), (48, 46), 94]
[(22, 78), (65, 40), 105]
[(23, 77), (58, 47), 105]
[(24, 76), (56, 39), 95]
[(25, 75), (42, 48), 90]
[(26, 74), (60, 47), 107]
[(27, 73), (42, 51), 93]
[(28, 72), (46, 51), 97]
[(29, 71), (42, 67), 109]
[(30, 70), (43, 55), 98]
[(31, 69), (37, 47), 84]
[(32, 68), (54, 44), 98]
[(33, 67), (39, 59), 98]
[(34, 66), (54, 32), 86]
[(35, 65), (51, 45), 96]
[(36, 64), (60, 54), 114]
[(37, 63), (54, 40), 94]
[(38, 62), (41, 57), 98]
[(39, 6

### Comments: 
- Note that we can hybridize these techniques.
- If we have inherently continuum value data, can we approximate (round or truncate) those values, so that we can index them?
    - $x = 5.1$, $dx = .1$, $j=51$
    - $j_k = \mathrm{int} \left( \frac{x_k}{dx} \right)$
- It gets a little bit trickier when we then evaluate a range of sums rather than a single equality, using our indexed system, but it's not too bad; we simply replace our $x_c = 100 - x$ with a framework where we evaluate all $(x_{lo}-x) < x_c < (x_{hi}-x)$. How exactly we do this may depend on the properties of the data.
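
One way this hybrid could look is sketched below, assuming non-negative values and a bin width `dx` chosen by hand. Values are grouped by the bin index $j = \mathrm{int}(x/dx)$, only bins whose index sum could land in the target window are compared, and the exact sum is still checked for each candidate pair. The function name `count_pairs_binned` and the safety margins on the bin range are assumptions of this sketch; the margins are deliberately generous because floating-point truncation can shift a value into a neighbouring bin.

```python
from collections import defaultdict
from itertools import combinations


def count_pairs_binned(values, lo, hi, dx):
    """Count unordered pairs with lo <= a + b <= hi, using bins of width dx."""
    # index step: group the values by bin index j = int(x / dx)
    bins = defaultdict(list)
    for x in values:
        bins[int(x / dx)].append(x)

    # bins j and k can only hold a valid pair if j + k lies in this padded window
    s_min = int(lo / dx) - 3
    s_max = int(hi / dx) + 1

    count = 0
    for j, xs_j in bins.items():
        # only k >= j, so each pair of bins is visited once
        for k in range(max(j, s_min - j), s_max - j + 1):
            xs_k = bins.get(k)
            if xs_k is None:
                continue
            if k == j:
                # pairs inside one bin: every unordered pair once
                count += sum(1 for a, b in combinations(xs_j, 2) if lo <= a + b <= hi)
            else:
                count += sum(1 for a in xs_j for b in xs_k if lo <= a + b <= hi)
    return count


print(count_pairs_binned([10.0, 45.0, 52.0, 5.0], 50.0, 60.0, 1.0))
```

The bins only prune the search; the final comparison uses the original values, so the choice of `dx` affects speed but not the result. How much this helps depends on the data: if most values fall into a few bins, the work inside those bins is still quadratic.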
