# Random Number Statistics

## Probability of single non-apocalyptic number

Consider a base-10 random number of $l$ digits, where each digit is selected independently and uniformly from $\{0, \dots, 9\}$.

The probability that a specific series of $n$ digits appears at a given position is $10^{-n}$. There are $l-n+1$ positions at which a match can start.

The probability of *not matching* at a given position is $1-10^{-n}$. Treating the positions as independent (an approximation, since overlapping windows share digits), the probability of not matching anywhere in this number is

$$(1-10^{-n})^{l-n+1}$$

Generalising this to base-b:

- Probability of a specific $n$-digit series at a given position is $b^{-n}$
- Probability of not matching at a given position is $1-b^{-n}$
- Number of opportunities to match is $l-n+1$
- Probability of not matching anywhere is

$$(1-b^{-n})^{l-n+1}$$

with the probability taken to be $1$ for $l<n$, since a number shorter than $n$ digits cannot contain the series.
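The base-$b$ expression can be sketched as a small Julia function. The name `non_match_prob` and its argument order are assumptions made for illustration, and the function inherits the independence approximation used in the formula.

```julia
# Hypothetical helper: probability that an l-digit base-b number contains
# no match for one specific n-digit series, assuming independent positions.
function non_match_prob(l::Integer, n::Integer, b::Integer)
    l < n && return 1.0                       # too short to contain the series
    return (1.0 - float(b)^(-n))^(l - n + 1)  # one factor per start position
end

println(non_match_prob(10, 3, 10))  # decimal, 3-digit target, 10-digit number
println(non_match_prob(10, 3, 2))   # same lengths in binary
```

Smaller bases make a match far more likely for the same lengths, because each position matches with probability $b^{-n}$.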

## Expected number of non-apocalyptic numbers

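The standard result for an infinite geometric sum, valid for $|\epsilon| < 1$ (here $n$ is only a summation index, not the match length), is: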

$$ \sum_{n=1}^{\infty} \epsilon^n = - \frac{\epsilon}{\epsilon-1} $$

Setting $\epsilon = 1-b^{-n}$ and summing the non-match probability over every digit length $l \in \mathbb{N}$ gives the expected number of non-apocalyptic numbers in a random sequence of numbers in base $b$ that contains a single number of each digit length $l$:

$$ m(n,b) = n-1 + \frac{1-b^{-n}}{b^{-n}} $$
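The intermediate steps behind this closed form can be written out as follows, splitting the sum at $l=n$ and substituting $k = l-n+1$ and $\epsilon = 1-b^{-n}$ (so that $1-\epsilon = b^{-n}$):

$$ m(n,b) = \sum_{l=1}^{n-1} 1 + \sum_{l=n}^{\infty} (1-b^{-n})^{l-n+1} = (n-1) + \sum_{k=1}^{\infty} \epsilon^{k} = n-1 + \frac{\epsilon}{1-\epsilon} = n-1 + \frac{1-b^{-n}}{b^{-n}} $$

The first sum counts the $n-1$ lengths that are too short to match, each contributing probability $1$.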

### Example

For 3 digit matches in base 10:

$$m(3,10) = 2 + \frac{0.999}{0.001} = 1001$$
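As a cross-check of this example, the closed form can be compared with a direct sum over digit lengths. This is a minimal sketch: the function names and the cut-off `lmax` are assumptions chosen for illustration.

```julia
# Closed form from the text: m(n, b) = n - 1 + (1 - b^-n) / b^-n
expected_non_matches(n, b) = n - 1 + (1 - float(b)^(-n)) / float(b)^(-n)

# Direct summation over digit lengths l = 1..lmax as a cross-check.
# lmax is an arbitrary cut-off; the tail beyond it is negligible here.
function summed_non_matches(n, b; lmax=100_000)
    r = 1 - float(b)^(-n)   # per-position non-match probability
    return sum(l < n ? 1.0 : r^(l - n + 1) for l in 1:lmax)
end

println(expected_non_matches(3, 10))
println(summed_non_matches(3, 10))
```

Both values should agree with $m(3,10) = 1001$ up to floating-point rounding.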

### Correction for $p^n$ numbers

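When the numbers are the powers $p^n$ for $n \in \mathbb{N}$, the sequence no longer contains exactly one number of each digit length. Since $p^n$ has about $n \log p / \log b$ digits in base $b$, each digit length occurs about $\log b / \log p$ times on average; e.g., there are several powers of 2 for each decimal digit length $l$.

The correction factor is therefore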

$$ c(p, b) =  \frac{\log b }{\log p} $$

e.g., for the case of $2^n$ represented in base 10

$$ c(2, 10) = \frac{\log 10}{\log 2} \approx 3.32193 $$

Then, scaling $m(n,b)$ by this factor for powers of $p$:

$$ m'(n,b,p) = \left( n-1 + \frac{1-b^{-n}}{b^{-n}} \right) \frac{\log b }{\log p} $$

### (Non-)Apocalyptic Numbers

For the case of $p=2$, $b=10$ and $n=3$:

$$m'(n=3,b=10,p=2) \approx 3325.25$$
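The value above can be reproduced numerically. The function names `correction` and `corrected_non_matches` are assumptions introduced for this sketch; the formulas are the ones given for $c(p,b)$ and $m'(n,b,p)$.

```julia
# Correction factor c(p, b) = log(b) / log(p)
correction(p, b) = log(b) / log(p)

# m'(n, b, p): the random-sequence expectation scaled by the correction factor
corrected_non_matches(n, b, p) =
    (n - 1 + (1 - float(b)^(-n)) / float(b)^(-n)) * correction(p, b)

println(round(correction(2, 10), digits=5))
println(round(corrected_non_matches(3, 10, 2), digits=2))
```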

![Non-apocalyptic matches 2^n, base 10, 3-digit matches](non-apocalyptic-matches-v2-base-10-power-2-seq-3-n1-63992.png)

$2^n$ does **not** result in a random sequence, but its digits are close enough to random (for all except the $NNN$ matches) that the statistically expected number of non-matches comes close to the actual count of non-apocalyptic numbers.
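One way to obtain actual counts for comparison is to test the decimal digits of each power of 2 directly. This is a sketch only: the function name, the target `"666"`, and the exponent range are assumptions chosen for illustration, not the procedure used to produce the figure.

```julia
# Count exponents n in `exponents` for which the decimal digits of 2^n
# do NOT contain `target`. BigInt is needed because 2^n overflows Int64.
function count_non_matching(target::AbstractString, exponents)
    return count(n -> !occursin(target, string(big(2)^n)), exponents)
end

println(count_non_matching("666", 1:2000))
```

Extending the exponent range until the count stops growing gives a figure that can be set against $m'(n,b,p)$.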

## Numerical Calculations (not using the known summation)

Take the specific example of a 3-digit match. Then the probability of a number of $l$ digits not matching is:

In [6]:
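# Probability that a random l-digit decimal number contains no match
# for a specific n-digit series (positions treated as independent).
function non_match(l; n=3)
    (l < n) && return 1.0            # too short to contain the series
    (1.0 - 10.0^(-n))^(l+1-n)        # one factor per possible start position
end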

non_match (generic function with 1 method)

In [7]:
for l in [1, 3, 10, 50, 100, 500, 1_000, 5_000, 10_000, 50_000, 100_000, 100_001, 500_000, 1_000_000]
    println("p($l) -> $(non_match(l))")
end

p(1) -> 1.0
p(3) -> 0.999
p(10) -> 0.992027944069944
p(50) -> 0.9531108968798943
p(100) -> 0.9066044494080757
p(500) -> 0.607593524316293
p(1000) -> 0.36843192017940235
p(5000) -> 0.006734574374039293
p(10000) -> 4.5263828369959795e-5
p(50000) -> 1.8848653148880386e-22
p(100000) -> 3.5456153734747036e-44
p(100001) -> 3.542069758101229e-44
p(500000) -> 5.558812362627309e-218
p(1000000) -> 0.0


Note that $p(1\,000\,000)$ underflows to `0.0`: the true value, roughly $3 \times 10^{-435}$, is smaller than the smallest positive `Float64` (about $4.9 \times 10^{-324}$).
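Because the probability underflows for large $l$, one option is to work with its logarithm instead. The sketch below assumes the same 3-digit decimal setting; the names `log_non_match` and `log10_non_match` are hypothetical.

```julia
# Natural log of the non-match probability. log1p(-x) computes log(1 - x)
# accurately for small x, and the product never underflows.
log_non_match(l; n=3) = l < n ? 0.0 : (l + 1 - n) * log1p(-10.0^(-n))

# The same quantity as a base-10 exponent, which is easier to read
log10_non_match(l; n=3) = log_non_match(l; n=n) / log(10)

println(log10_non_match(1_000_000))  # exponent of p(1_000_000)
```

Exponentiating the result recovers the ordinary probability whenever it is representable, e.g. `exp(log_non_match(1000))`.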

Now, what is the probability that, above some $l_{min}$, every random number *does* contain a match for a particular string?

Here we can use the partial sum formula:

$$ \sum_{n=1}^{k} \epsilon^n = \frac{\epsilon (\epsilon^k - 1)}{\epsilon-1} $$

Hence, subtracting the first $l_{min}-1$ terms from the infinite sum:

$$ \sum_{n=l_{min}}^{\infty} \epsilon^n =  - \frac{\epsilon}{\epsilon-1} - \frac{\epsilon (\epsilon^{l_{min}-1} - 1)}{\epsilon-1} $$
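The right-hand side collapses to a single term. A short worked simplification, placing both fractions over the common denominator $\epsilon-1$:

$$ - \frac{\epsilon}{\epsilon-1} - \frac{\epsilon (\epsilon^{l_{min}-1} - 1)}{\epsilon-1} = \frac{-\epsilon - \epsilon^{l_{min}} + \epsilon}{\epsilon-1} = \frac{\epsilon^{l_{min}}}{1-\epsilon} $$

For $0 < \epsilon < 1$ this tail decays geometrically as $l_{min}$ grows.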