# Sampling strategy for plotting

When we draw partial dependence plots, how should we sample across the dimensions that
are not plotted?

We want to plot NxN pixels. We do this by dividing each dimension into N parts and
averaging over MxM cells before we plot. We do this for O dimensions, and the total
number of sampled points is P. What is the probability that at least one cell in one of
the required plots is empty?

## Number of plots, number of sub-areas

We have O 1D plots with N parts each, and (O^2-O)/2 = O(O-1)/2 2D plots (one per pair of
dimensions) with N^2 parts each.
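As a small sketch, the counts above can be computed directly. The function name `count_parts` is hypothetical and not from the original. With N = 20 and O = 10, the 18000 2D parts are the factor N^2(O^2-O)/2 that multiplies the per-part probability in the notebook cells further down.

```python
def count_parts(n_parts: int, n_dims: int) -> dict:
    """Count plots and plotted parts for O = n_dims dimensions, N = n_parts per axis."""
    n_1d_plots = n_dims                     # one 1D plot per dimension
    n_2d_plots = (n_dims**2 - n_dims) // 2  # one 2D plot per unordered pair of dimensions
    return {
        "1d_plots": n_1d_plots,
        "1d_parts": n_1d_plots * n_parts,     # N parts per 1D plot
        "2d_plots": n_2d_plots,
        "2d_parts": n_2d_plots * n_parts**2,  # N^2 parts per 2D plot
    }


print(count_parts(20, 10))
# {'1d_plots': 10, '1d_parts': 200, '2d_plots': 45, '2d_parts': 18000}
```

O^2-O is always even, so integer division gives an exact plot count.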

## Completely random samples

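For the 1D plots, the probability that one sample is not in a given part is 1-1/N. The
probability that no sample is in a given part is (1-1/N)^P. Treating the MxM parts that
the average is taken over as independent, the probability that none of them contains a
sample is ((1-1/N)^P)^(M^2) = (1-1/N)^(PM^2). Since the parts are disjoint, the exact
value is (1-M^2/N)^P. By Bernoulli's inequality the approximation is never smaller, so
it errs on the pessimistic side.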
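The sketch below compares the independence approximation with the exact value for one averaged block in a 1D plot. As in the text, it assumes the average covers M^2 of the N parts, so M^2 <= N. The function name and the small example values are hypothetical.

```python
def p_block_empty_1d(n_parts: int, n_samples: int, m: int) -> tuple[float, float]:
    """Return (approximate, exact) probability that no sample lands in the M^2 averaged parts."""
    if m**2 > n_parts:
        raise ValueError("the averaged block cannot cover more than all N parts")
    # Independence approximation used in the text: (1 - 1/N)^(P*M^2)
    approx = (1 - 1 / n_parts) ** (n_samples * m**2)
    # Exact: each sample misses the union of M^2 disjoint parts with probability 1 - M^2/N
    exact = (1 - m**2 / n_parts) ** n_samples
    return approx, exact


approx, exact = p_block_empty_1d(n_parts=20, n_samples=50, m=2)
print(f"approx={approx:.3g} exact={exact:.3g}")
```

A small P is used here so that both values stay well above floating-point underflow.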

The probability that at least one sample falls within the MxM parts that the average is
taken over is 1-(1-1/N)^(PM^2). If we also treat the N parts of a 1D plot as independent,
the probability that all of them have samples is roughly (1-(1-1/N)^(PM^2))^N. The
probability that this holds for all O plots is (1-(1-1/N)^(PM^2))^(ON).
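The 1D estimate can be evaluated the same way. This sketch uses a hypothetical function name. It applies the independence approximation to each block and also treats all O*N blocks as independent, which is why the text says "roughly".

```python
def p_all_1d_covered(n_parts: int, n_samples: int, m: int, n_dims: int) -> float:
    """Approximate probability that every averaged block of all O 1D plots has a sample."""
    q = (1 - 1 / n_parts) ** (n_samples * m**2)  # approx. P(a given block is empty)
    return (1 - q) ** (n_dims * n_parts)          # all O*N blocks non-empty, assumed independent


print(p_all_1d_covered(n_parts=20, n_samples=50, m=2, n_dims=10))
```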

For the 2D plots, the probability that one sample is not in a given part is 1-1/N^2. The
probability that no sample is in a given part is (1-1/N^2)^P. Treating the MxM parts that
the average is taken over as independent, the probability that none of them contains a
sample is ((1-1/N^2)^P)^(M^2) = (1-1/N^2)^(PM^2). The exact value for the disjoint parts
is (1-M^2/N^2)^P, and the approximation again errs on the pessimistic side.
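The next sketch makes the same comparison for a 2D block. It first uses the parameters from the notebook cells below (N = 20, M = 2, P = 20^2 * 10). It then adds a small Monte Carlo check with parameters of our own choosing (N = 10, M = 2, P = 50), picked so that an empty block is common enough to observe. The function names are hypothetical. The Monte Carlo draws uniform samples in the unit square and watches one fixed MxM block in the corner of the grid.

```python
import random


def p_block_empty_2d(n_parts: int, n_samples: int, m: int) -> tuple[float, float]:
    """Return (approximate, exact) probability that an M x M block of the N x N grid is empty."""
    approx = (1 - 1 / n_parts**2) ** (n_samples * m**2)  # independence approximation
    exact = (1 - m**2 / n_parts**2) ** n_samples           # same as (1 - 1/(N/M)^2)^P
    return approx, exact


def mc_block_empty_2d(n_parts: int, n_samples: int, m: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability with uniform samples in the unit square."""
    rng = random.Random(seed)
    empty = 0
    for _ in range(trials):
        for _ in range(n_samples):
            # Part index of the sample along each axis, in 0..N-1
            i = int(rng.random() * n_parts)
            j = int(rng.random() * n_parts)
            if i < m and j < m:  # the sample falls in the corner M x M block
                break
        else:  # no sample hit the block in this trial
            empty += 1
    return empty / trials


print(p_block_empty_2d(n_parts=20, n_samples=20**2 * 10, m=2))
# Small case where an empty block is not rare: N=10, M=2, P=50
print(p_block_empty_2d(n_parts=10, n_samples=50, m=2))
print(mc_block_empty_2d(n_parts=10, n_samples=50, m=2, trials=10_000))
```

The Monte Carlo estimate should land near the exact value. It should not land systematically at the slightly larger approximate one.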

The probability that at least one sample falls within the MxM parts that the average is
taken over is 1-(1-1/N^2)^(PM^2). The probability that all N^2 parts of the 2D plot have
samples is roughly (1-(1-1/N^2)^(PM^2))^(N^2). The probability that this holds for all
(O^2-O)/2 plots is (1-(1-1/N^2)^(PM^2))^(N^2(O^2-O)/2). The probability that at least one
cell is empty is 1 minus this. Since 1-(1-x)^K ≈ Kx for small x, for small
(1-1/N^2)^(PM^2) this is approximately (1-1/N^2)^(PM^2)*(N^2(O^2-O)/2). This first-order
value is also an upper bound (the union bound), whether or not the parts are independent.
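Evaluating 1-(1-q)^K directly fails in floating point when q is this small, because 1-q rounds to exactly 1.0. The sketch below avoids that with `math.log1p` and `math.expm1`, which compute log(1+x) and e^x-1 accurately for tiny x. The function and variable names are hypothetical. The result shows that the first-order approximation qK agrees with the independence-based value to many digits here.

```python
import math


def p_any_empty(q: float, k: int) -> tuple[float, float]:
    """P(at least one of K blocks is empty) if each block is empty with probability q."""
    # 1 - (1 - q)^K assuming independent blocks, computed without catastrophic cancellation
    independent = -math.expm1(k * math.log1p(-q))
    union_bound = q * k  # first-order approximation; an upper bound even without independence
    return independent, union_bound


n, m, o = 20, 2, 10
p = n**2 * 10                      # 10 samples per 2D part on average
q = (1 - 1 / n**2) ** (p * m**2)   # per-block empty probability (independence approximation)
k = n**2 * (o**2 - o) // 2         # number of parts across all 2D plots
independent, union = p_any_empty(q, k)
naive = 1 - (1 - q) ** k           # 1 - q == 1.0 in double precision, so this gives 0.0
print(independent, union, naive)
```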

If each part of the 2D plot should have R samples on average, we need P = RN^2 samples.
The expression then becomes (1-1/N^2)^(RN^2M^2)*(N^2(O^2-O)/2)
= ((1-1/N^2)^(N^2))^(RM^2)*(N^2(O^2-O)/2). Since (1-1/N^2)^(N^2) ≈ e^(-1) for large N,
this is approximately e^(-RM^2)*(N^2(O^2-O)/2). RM^2 is the expected number of samples in
one averaged MxM block, so the per-block term is basically e^(-number of samples averaged
over). We probably want to average over at least 100 samples to reduce sampling noise.
Since e^(-100) ≈ 3.7e-44, even after multiplying by the number of parts the risk of a cell
having nothing to plot is negligible.
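The large-N step can be checked numerically. This sketch uses a hypothetical function name. It evaluates both sides first for the notebook's values (R = 10, M = 2, N = 20, O = 10) and then for R*M^2 = 100 samples per averaged block. (1-1/N^2)^(N^2) is slightly below e^(-1), so the e-based form slightly overestimates the probability and errs on the safe side.

```python
import math


def p_any_empty_from_density(r: float, m: int, n: int, o: int) -> tuple[float, float]:
    """Union-bound estimate with P = R*N^2 samples: (finite-N form, e^(-R*M^2) form)."""
    n_parts = n**2 * (o**2 - o) // 2                        # parts across all 2D plots
    finite_n = (1 - 1 / n**2) ** (r * n**2 * m**2) * n_parts
    large_n = math.exp(-r * m**2) * n_parts                 # uses (1 - 1/N^2)^(N^2) ~ e^(-1)
    return finite_n, large_n


print(p_any_empty_from_density(r=10, m=2, n=20, o=10))  # R*M^2 = 40 samples per block
print(p_any_empty_from_density(r=25, m=2, n=20, o=10))  # R*M^2 = 100 samples per block
```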

In [66]:
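```python
N = 20             # parts per dimension
oversampling = 10  # R: average number of samples per 2D part
M = 2              # the average is taken over M x M parts
O = 10             # number of dimensions
P = N**2*oversampling
# Union-bound estimate: independence approximation times the number of 2D parts
(1-1/N**2)**(P*M**2)*(N**2*(O**2-O)/2)
```

```
7.273479944572843e-14
```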

In [56]:
```python
# Exact per-block probability (1 - M^2/N^2)^P times the number of 2D parts
(1-1/(N/M)**2)**(P)*(N**2*(O**2-O)/2)
```

```
6.252459615354868e-14
```

In [2]:
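```python
# Per-block empty probability, independence approximation (1 - 1/N^2)^(P*M^2).
# Note: this cell ran before In [66] (execution count 2), so its output below
# was produced with different parameter values than those set in In [66].
(1-1/N**2)**(P*M**2)
```

```
6.048519205098133e-13
```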

In [3]:
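```python
# Exact per-block empty probability (1 - M^2/N^2)^P.
# Note: this cell ran before In [66] (execution count 3), so its output below
# was produced with different parameter values than those set in In [66].
(1-1/(N/M)**2)**(P)
```

```
5.636181281797957e-13
```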

In [58]:
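```python
import math
# Large-N form: (1 - 1/N^2)^(N^2) ~ e^(-1), so the estimate becomes
# e^(-R*M^2) times the number of 2D parts
math.e**(-oversampling*M**2)*(N**2*(O**2-O)/2)
```

```
7.647037659524876e-14
```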

In [60]:
```python
import math
# e^(-R*M^2) for R = 10, M = 2: the per-block term on its own
math.e**-40
```

```
4.248354255291598e-18
```