# Shapley kernel validation

Here we prove that the Shapley kernel we found computationally is correct for all functions of up to 10 features. The proof is also computational, which is why we only show it for functions with few enough inputs that we can enumerate them fully.

The classic Shapley value of feature $i$, for a model with $M$ input features collected in the set $S_{all}$, is

\begin{align}
\phi_i(f,x) &= \sum_{S \subseteq S_{all} \setminus \{i\}} \frac{f_x(S \cup \{i\}) - f_x(S)}{{M \choose |S|}(M-|S|)} \\
&= \sum_{S \subseteq S_{all} \setminus \{i\}} \frac{f_x(S \cup \{i\})}{{M \choose |S|}(M-|S|)} - \sum_{S \subseteq S_{all} \setminus \{i\}} \frac{f_x(S)}{{M \choose |S|}(M-|S|)}
\end{align}

where the sums run over the subsets that do not contain $i$, and since

\begin{align}
\sum_{S \subseteq S_{all} \setminus \{i\}} \frac{1}{{M \choose |S|}(M-|S|)} = 1
\end{align}

we see that the classic Shapley value computation is a difference between two weighted means. This means that $\phi_i(f,x)$ is a linear function of the vector of all function output values $f_x(S)$. So any linear estimator that produces the correct answer on a basis of the output vector space must produce the correct answer for all possible function outputs. Since the Shapley kernel estimation method is a weighted linear regression, it is also linear in the function output vector. We use this property below to show that the Shapley kernel regression method produces the correct Shapley values for all possible functions up to a given size (how large a size we can verify is determined by our computational resources). Given the perfect agreement for all functions of the sizes we verify, we expect the agreement to continue for problem sizes we cannot enumerate, although the enumeration itself does not prove this.
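The normalization of the weights can be checked by grouping the subsets by size. The following is a short derivation sketch; it assumes the sum ranges over the subsets that exclude feature $i$ (as the `classic_shapley` code below does via `setdiff(1:M, ind)`) and writes $s = |S|$, a shorthand introduced here:

\begin{align}
\sum_{S \subseteq S_{all} \setminus \{i\}} \frac{1}{{M \choose |S|}(M-|S|)}
&= \sum_{s=0}^{M-1} {M-1 \choose s} \frac{s!\,(M-s-1)!}{M!} \\
&= \sum_{s=0}^{M-1} \frac{(M-1)!}{M!} = \sum_{s=0}^{M-1} \frac{1}{M} = 1
\end{align}

There are ${M-1 \choose s}$ subsets of size $s$ that exclude $i$, and each carries the weight $\frac{1}{{M \choose s}(M-s)} = \frac{s!\,(M-s-1)!}{M!}$, which is the factorial form used in the code.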

As an additional example (but not a proof) of accuracy, we also generate random functions with up to 18 inputs and check that the two methods agree to within $10^{-12}$.

## Validate that the Shapley kernel is correct for all functions of a given input dimension

In [2]:
# import the Julia libraries we will use
using Iterators
using ProgressMeter

In [3]:
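# Exact Shapley value of feature `ind` for the sample `x`, using the columns of
# `X` as background data for the features that are not in the current subset.
function classic_shapley(x, f, X, ind)
    M = length(x)
    val = 0.0
    sumw = 0.0
    for s in subsets(setdiff(1:M, ind))  # every subset that excludes `ind`
        S = length(s)
        w = factorial(S)*factorial(M - S - 1)/factorial(M)  # classic Shapley weight
        tmp = copy(X)
        for i in 1:size(X)[2]
            tmp[s,i] = x[s]  # set the features in `s` to their values in `x`
        end
        y1 = mean(f(tmp))    # mean output without feature `ind`
        tmp[ind,:] = x[ind]
        y2 = mean(f(tmp))    # mean output with feature `ind` added
        val += w*(y2-y1)
        sumw += w
    end
    @assert abs(sumw - 1.0) < 1e-6  # the weights must sum to one
    val
end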

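# Kernel weight for the subset `x` of `M` features.
function shapley_kernel_weight(M, x)
    s = length(x)
    if s == M || s == 0
        return 1e9  # effectively infinite weight for the empty and full subsets
    end
    (M-1)/(binomial(M,s)*s*(M-s))
end

# Shapley values of all features from a weighted linear regression over all 2^M
# subsets. The `X` argument is unused: it is overwritten by the binary mask matrix.
function kernel_shapley(x, f, X)
    M = length(x)
    X = zeros(M,2^M)          # one binary mask per column
    w = zeros(2^M)            # one kernel weight per mask
    fnull = f(zeros(M,1))[1]  # output with no features present
    fx = f(x)[1]              # output with all features present
    count = 1
    for subset in subsets(1:M)
        X[subset,count] = 1
        w[count] = shapley_kernel_weight(M, subset)
        count += 1
    end
    # eliminate the last coefficient using sum(phi) = fx - fnull
    cX = X[1:M-1,:] .- X[M:M,:]
    y = f(X) - fnull - (fx - fnull)*X[M,:]
    
    # solve the weighted least squares problem for the first M-1 coefficients
    tmp = cX'.*w
    b = inv(tmp'*cX')*tmp'*y
    [b; (fx-fnull) - sum(b)]
end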

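# Model that outputs 1 for every column of `x` equal to `mask` and 0 otherwise,
# i.e. the indicator function of a single input pattern.
function single_point_model(mask)
    x->[convert(Float64, all(x[:,i] .== mask)) for i in 1:size(x)[2]]
end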

single_point_model (generic function with 1 method)
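The regression inside `kernel_shapley` can be read as a constrained weighted least squares problem. The following is a sketch of that reading rather than a derivation taken from the notebook; the symbols $z_j(S)$ (1 if feature $j$ is in $S$, otherwise 0), $w(S)$ and $\Delta = f_x(S_{all}) - f_x(\varnothing)$ are notation introduced here for illustration, and the weight formula applies for $0 < |S| < M$ (the code gives the empty and full subsets the large constant weight `1e9`):

\begin{align}
\min_{\phi} \; & \sum_{S \subseteq S_{all}} w(S) \Big( f_x(S) - f_x(\varnothing) - \sum_{j=1}^{M} \phi_j z_j(S) \Big)^2
\quad \text{subject to} \quad \sum_{j=1}^{M} \phi_j = \Delta, \\
& w(S) = \frac{M-1}{{M \choose |S|}\,|S|\,(M-|S|)}
\end{align}

Substituting $\phi_M = \Delta - \sum_{j<M} \phi_j$ removes the constraint and leaves an ordinary weighted regression in the first $M-1$ coefficients:

\begin{align}
f_x(S) - f_x(\varnothing) - \Delta\, z_M(S) \approx \sum_{j=1}^{M-1} \phi_j \big( z_j(S) - z_M(S) \big)
\end{align}

The left-hand side corresponds to `y` and the differences $z_j - z_M$ to the rows of `cX`; the last coefficient is then recovered as `(fx-fnull) - sum(b)`.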

In [4]:
# for each input dimension, check every single-point model (one per subset of features)
for M in 1:10
    mask = zeros(M)
    X = zeros(M, 1)
    x = ones(M, 1)
    masks = zeros(M, 2^M)
    count = 1
    @showprogress for s in subsets(1:M)
        mask[:] = 0
        mask[s] = 1
        masks[:,count] = mask
        count += 1
        f = single_point_model(mask)
        classic_vals = [classic_shapley(x, f, X, i) for i in 1:M]
        kernel_vals = kernel_shapley(x, f, X)
        if norm(kernel_vals - classic_vals) > 1e-12
            error("Mismatch!")
        end
    end
    println("All functions of input dimension $M are correct!")
end

Progress: 100%|█████████████████████████████████████████| Time: 0:00:03
All functions of input dimension 1 are correct!
All functions of input dimension 2 are correct!
All functions of input dimension 3 are correct!
All functions of input dimension 4 are correct!
All functions of input dimension 5 are correct!
Progress: 100%|█████████████████████████████████████████| Time: 0:00:00
All functions of input dimension 6 are correct!
Progress: 100%|█████████████████████████████████████████| Time: 0:00:01
All functions of input dimension 7 are correct!
Progress: 100%|█████████████████████████████████████████| Time: 0:00:05
All functions of input dimension 8 are correct!
Progress: 100%|█████████████████████████████████████████| Time: 0:00:25
All functions of input dimension 9 are correct!
Progress: 100%|█████████████████████████████████████████| Time: 0:01:44
All functions of input dimension 10 are correct!
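The loop above only evaluates the $2^M$ single-point models, which is enough because they form a basis for all functions on $2^M$ input patterns. A minimal sketch of that argument for $M = 3$ follows. It is written for Julia 1.x (the notebook cells use pre-1.0 functions such as `srand` and `find`), stores a function as a plain vector of its $2^M$ outputs, and uses the helper names `classic_shapley_vec` and `kernel_shapley_vec`, which are hypothetical and not part of the notebook.

```julia
# Sketch for Julia 1.x; only the LinearAlgebra standard library is used.
using LinearAlgebra

# A set function on M features is stored as a vector v of length 2^M, where
# v[mask + 1] is the output for the subset whose members are the set bits of mask.

# Exact Shapley values from the classic weighted sum (hypothetical helper).
function classic_shapley_vec(v, M)
    phi = zeros(M)
    for i in 1:M
        bit = 1 << (i - 1)
        for mask in 0:(2^M - 1)
            (mask & bit) != 0 && continue   # keep only subsets that exclude i
            s = count_ones(mask)
            w = factorial(s) * factorial(M - s - 1) / factorial(M)
            phi[i] += w * (v[(mask | bit) + 1] - v[mask + 1])
        end
    end
    phi
end

# Shapley values from the kernel-weighted regression (hypothetical helper).
# The empty and full subsets are handled through the constraint
# sum(phi) = v[end] - v[1] instead of a very large weight.
function kernel_shapley_vec(v, M)
    fnull, fx = v[1], v[end]
    n = 2^M - 2                       # non-empty proper subsets: masks 1..n
    Z = zeros(n, M)
    w = zeros(n)
    for mask in 1:n
        s = count_ones(mask)
        for j in 1:M
            Z[mask, j] = (mask >> (j - 1)) & 1
        end
        w[mask] = (M - 1) / (binomial(M, s) * s * (M - s))
    end
    A = Z[:, 1:M-1] .- Z[:, M]        # eliminate the last coefficient
    t = v[2:end-1] .- fnull .- (fx - fnull) .* Z[:, M]
    W = Diagonal(w)
    b = (A' * W * A) \ (A' * W * t)
    [b; (fx - fnull) - sum(b)]
end

M = 3
# Columns of C and K hold the Shapley values of the 2^M single-point models.
basis = Matrix{Float64}(I, 2^M, 2^M)
C = hcat([classic_shapley_vec(basis[:, k], M) for k in 1:2^M]...)
K = hcat([kernel_shapley_vec(basis[:, k], M) for k in 1:2^M]...)
basis_gap = norm(C - K)

# Any other function is a linear combination of the basis, so both methods
# must agree on it as well.
v = randn(2^M)
random_gap = norm(classic_shapley_vec(v, M) - kernel_shapley_vec(v, M))
linear_gap = norm(C * v - classic_shapley_vec(v, M))
println((basis_gap, random_gap, linear_gap))
```

All three printed gaps should be at the level of floating-point round-off. The matrix `C` makes the linearity explicit: the Shapley values of any function `v` are `C * v`, so agreement on the basis columns carries over to every `v`.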


## Verify that several random models of higher input dimensions are also correct

In [5]:
function gen_random_model(M)
    model = Dict()
    for k in subsets(collect(1:M))
        model[k] = randn()
    end
    model
end

gen_random_model (generic function with 1 method)

In [9]:
@showprogress for M in 1:18
    srand(M)
    model = gen_random_model(M)
    X = zeros(M, 1)
    x = randn(M, 1)
    f = x->[model[find(x[:,i])] for i in 1:size(x)[2]]
    v = norm([classic_shapley(x, f, X, i) for i in 1:M] - kernel_shapley(x, f, X))
    assert(v < 1e-12)
end

Progress: 100%|█████████████████████████████████████████| Time: 0:00:54
