# Problem 101

### Optimum polynomial

If we are presented with the first k terms of a sequence it is impossible to say with certainty the value of the next term, as there are infinitely many polynomial functions that can model the sequence.

As an example, let us consider the sequence of cube numbers. This is defined by the generating function, 
$$u_n = n^3: 1, 8, 27, 64, 125, 216, ...$$

Suppose we were only given the first two terms of this sequence. Working on the principle that "simple is best" we should assume a linear relationship and predict the next term to be 15 (common difference 7). Even if we were presented with the first three terms, by the same principle of simplicity, a quadratic relationship should be assumed.

We shall define **OP(k, n)** to be the nth term of the optimum polynomial generating function for the first k terms of a sequence. It should be clear that OP(k, n) will accurately generate the terms of the sequence for n ≤ k, and potentially the first incorrect term (**FIT**) will be OP(k, k+1); in which case we shall call it a bad OP (**BOP**).

As a basis, if we were only given the first term of a sequence, it would be most sensible to assume constancy; that is, for n ≥ 2, OP(1, n) = u1.

Hence we obtain the following OPs for the cubic sequence:

    OP(1, n) = 1                  1, **1**, 1, 1, ...
    OP(2, n) = 7n − 6             1, 8, **15**, ...
    OP(3, n) = 6n^2 − 11n + 6     1, 8, 27, **58**, ...
    OP(4, n) = n^3                1, 8, 27, 64, 125, ...
    
Clearly no BOPs exist for k ≥ 4.

By considering the sum of FITs generated by the BOPs (highlighted above), we obtain 1 + 15 + 58 = 74.
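
The figure of 74 can be checked with a minimal sketch that evaluates the interpolating polynomial directly by Lagrange interpolation in exact rational arithmetic. The helper name `op_next_term` is hypothetical and is not part of the problem statement or of the solution that follows.

```python
from fractions import Fraction

def op_next_term(terms):
    """Return OP(k, k+1): the value at n = k + 1 of the lowest-degree
    polynomial through (1, terms[0]), ..., (k, terms[k-1])."""
    k = len(terms)
    x = k + 1
    total = Fraction(0)
    for i in range(1, k + 1):
        # Lagrange basis polynomial for node i, evaluated at x
        weight = Fraction(1)
        for j in range(1, k + 1):
            if j != i:
                weight *= Fraction(x - j, i - j)
        total += terms[i - 1] * weight
    return int(total)

cubes = [n**3 for n in range(1, 5)]
fits = [op_next_term(cubes[:k]) for k in range(1, 4)]
print(fits)       # [1, 15, 58]
print(sum(fits))  # 74
```

Because every intermediate value is a `Fraction`, no rounding step is needed here; the price is that this approach never produces the polynomial's coefficients, only its value at one point.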

Consider the following tenth degree polynomial generating function:

$$ u_n = 1 - n + n^2 - n^3 + n^4 - n^5 + n^6 - n^7 + n^8 - n^9 + n^{10} $$

Find the sum of FITs for the BOPs.

### Solution

We can write the polynomial as:

$$ u_n = \sum_{k=0}^{10} (-1)^k n^k $$

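For the first $h$ terms, the optimum polynomial is the unique polynomial of degree at most $h-1$ that passes through all $h$ points. Its $h$ coefficients satisfy a square linear system, so they can be solved for exactly rather than fitted by least squares. Evaluating the resulting polynomial at $n = h+1$ then gives the prediction for the next term of the sequence.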
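
Written out, the square system takes the following form. This is a sketch in notation that the original does not use: $h$ denotes the number of given terms and $a_0, \dots, a_{h-1}$ the unknown coefficients, with the columns ordered by decreasing power.

$$
\begin{pmatrix}
1^{h-1} & \cdots & 1 & 1 \\
2^{h-1} & \cdots & 2 & 1 \\
\vdots  &        & \vdots & \vdots \\
h^{h-1} & \cdots & h & 1
\end{pmatrix}
\begin{pmatrix} a_{h-1} \\ \vdots \\ a_1 \\ a_0 \end{pmatrix}
=
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_h \end{pmatrix}
$$

The candidate FIT is then $\mathrm{OP}(h, h+1) = \sum_{j=0}^{h-1} a_j (h+1)^j$.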

In [1]:
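import numpy as np

def u(n):
    # generating function: u_n = sum_{k=0}^{10} (-1)^k n^k
    return sum((-1)**k * n**k for k in range(0, 11))

# sample points n = 1..12 and their exact (Python integer) values
X, Y = zip(*[(n, u(n)) for n in range(1, 13)])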

Build the full matrix `A_max`, whose rows hold the powers of each sample point from the highest power down to the constant term:

In [2]:
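H_max = 11
# int64 is required: the largest entry, 11**10, does not fit in 32 bits
A_max = np.empty(shape=(H_max, H_max), dtype=np.int64)
A_max[:, -1] = 1

# row idxv holds the powers of X[idxv], highest power first
for idxv, v in enumerate(X[:H_max]):
    for hh in range(0, H_max):
        A_max[idxv, H_max - hh - 1] = v**hh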
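
As a cross-check on the loop above, the same matrix can be produced by `np.vander`, which returns columns in decreasing powers by default. The following is a minimal, self-contained sketch; the names `A_loop` and `A_vander` are illustrative only, and the input is cast to `int64` on the assumption that the platform's default integer type might be too narrow for 11**10.

```python
import numpy as np

H_max = 11
X = list(range(1, H_max + 1))

# Loop construction: column j holds the power H_max - 1 - j of each point.
A_loop = np.empty((H_max, H_max), dtype=np.int64)
for row, x in enumerate(X):
    for power in range(H_max):
        A_loop[row, H_max - power - 1] = x**power

# np.vander builds the same layout (highest power in the first column).
A_vander = np.vander(np.array(X, dtype=np.int64), H_max)

print(np.array_equal(A_loop, A_vander))  # True
```

The explicit loop makes the column ordering visible, while `np.vander` is shorter; either can feed the sub-matrix selection used in the next cell.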

In [3]:
sum_fits = 0

for h in range(1, H_max + 1):

    # select the sub-matrix for the first h points and the h lowest powers
    A = A_max[:h, -h:]

    # solve the linear system; the exact coefficients are integers,
    # so round away the solver's floating-point error
    v = np.linalg.solve(A, Y[:h])
    v = np.round(v).astype(np.int64)
    print('Coefficients: {}'.format(v))

    # compute the next prediction
    p = np.poly1d(v)
    next_pred = p(X[h])
    # OP(11, n) reproduces u_n exactly, so only h < H_max yields a FIT
    if h < H_max:
        sum_fits += next_pred

    print('Next true number / Predicted number: {} {}'.format(Y[h], next_pred))
    print('')


print('')
print('Sum of FITs: {}'.format(sum_fits))

Coefficients: [1]
Next true number / Predicted number: 683 1

Coefficients: [ 682 -681]
Next true number / Predicted number: 44287 1365

Coefficients: [ 21461 -63701  42241]
Next true number / Predicted number: 838861 130813

Coefficients: [ 118008 -686587 1234387 -665807]
Next true number / Predicted number: 8138021 3092453

Coefficients: [  210232 -1984312  6671533 -9277213  4379761]
Next true number / Predicted number: 51828151 32740951

Coefficients: [   159060  -2175668  11535788 -29116967  34305227 -14707439]
Next true number / Predicted number: 247165843 205015603

Coefficients: [    58542  -1070322   8069182 -31492582  65955241 -68962861  27442801]
Next true number / Predicted number: 954437177 898165577

Coefficients: [    11165   -254078   2524808 -13814218  44083303 -80663539  76941359
 -28828799]
Next true number / Predicted number: 3138105961 3093310441

Coefficients: [     1111    -28831    352528  -2514688  11126621 -30669221  50572225
 -44806465  15966721]
Next true num