Let's try some examples, to try to clarify what is happening.

If we draw a random matrix, without any priors, then there are $2^{N K^+}$ possible arrangements of 1s and 0s, given we know $K^+$.
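As a quick sanity check on that count, here is a minimal sketch that enumerates every binary matrix of a small, fixed size. The function name `count_binary_matrices` and the sizes used are illustrative assumptions, not something taken from the tutorial.

```python
from itertools import product

def count_binary_matrices(n_rows, n_cols):
    """Count all n_rows x n_cols binary matrices by brute-force enumeration."""
    n_cells = n_rows * n_cols
    # Each cell is independently 0 or 1, so we enumerate all 0/1 assignments.
    return sum(1 for _ in product((0, 1), repeat=n_cells))

# Hypothetical sizes: N = 3 customers, K^+ = 2 features.
n_customers, n_features = 3, 2
print(count_binary_matrices(n_customers, n_features), 2 ** (n_customers * n_features))
```

Brute force is only feasible for tiny matrices, but that is enough to confirm the $2^{N K^+}$ count.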

But in fact when we draw a random matrix, with the infinite prior, there is no restriction on $K^+$, and we need to consider the probability of a matrix compared with any possible matrix drawn, including the ones where $K^+$ is really large, albeit with very tiny probability.

Comparing with the earlier equation for the infinite limit of $P(\mathbf{Z})$, it looks like the tutorial is asserting that:

$$
P(\mathbf{Z})_{IBP} =
\frac
 {\prod_{h=1}^{2^N-1}K_h!}
  {\prod_{i=1}^N K_1^{(i)}!}
P([\mathbf{Z}])
$$

Let's denote the original $P(\mathbf{Z})$, the one drawn from binomial features, conditionally independent given $\pi_k$, as $P(\mathbf{Z})_{BF}$, and the probability of matrices drawn using the Indian Buffet Process as $P(\mathbf{Z})_{IBP}$.  So, we have:

$$
P(\mathbf{Z})_{IBP} =
\frac
 {\prod_{h=1}^{2^N-1}K_h!}
  {\prod_{i=1}^N K_1^{(i)}!}
P([\mathbf{Z}])_{BF}
$$

From earlier, we have:

$$P([\mathbf{Z}])_{BF} = \frac{K!}{\prod_{h=0}^{2^{N}-1}K_h!} P(\mathbf{Z})_{BF}$$

So, this means that:

$$
P(\mathbf{Z})_{IBP} =
\frac{K!}
 {K_0!\prod_{i=1}^N K_1^{(i)}!}
P(\mathbf{Z})_{BF}
$$
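The algebra of this step (as opposed to where the IBP factor itself comes from) is a single cancellation. Splitting the $h=0$ factor off the denominator of the equivalence-class count, the remaining product over $h \ge 1$ cancels against the numerator of the first ratio:

$$
\frac
 {\prod_{h=1}^{2^N-1}K_h!}
  {\prod_{i=1}^N K_1^{(i)}!}
\cdot
\frac{K!}
 {K_0!\prod_{h=1}^{2^N-1}K_h!}
=
\frac{K!}
 {K_0!\prod_{i=1}^N K_1^{(i)}!}
$$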

... but I can't quite see how this relationship arises.  Perhaps the result that the tutorial gives for $P(\mathbf{Z})_{IBP}$ is derived from first principles?  Let's try that.

For the first customer, we draw Poisson$(\alpha)$ features.  Using the $K_1^{(i)}$ notation, the probability of the first draw is, from the Poisson distribution:

$$
\frac{\alpha^{K_1^{(1)}}\exp(-\alpha)}{K_1^{(1)}!}
$$

For the second customer, we have the probability of drawing 1 and 0 for each of the existing dishes, and then the probability of drawing the next new dishes.  The number of existing dishes is $K_1^{(1)}$.  And the probability of drawing... well... we need to know which features the customer drew 1 for, and which ones they drew 0 for.  And looking at the final expression, we only have $K_1^{(i)}$ and $m_k$ to work with.  But we can calculate the draw for each feature using $m_k$.  Using $z_{2,k}$ for the draw of each feature for customer 2 we have:

$$z_{2,k} = m_{\dots} \dots$$

... wait, that won't work, because what we'd really need is the partial history of each feature.  But what if we simply calculate the probability of the history of each feature, after the time it was first drawn?  For this we just need to know:

- when that feature was first drawn, so we know how long the history is, and
- how many 1s in that history, which is $m_k$

For the features drawn by customer 1, the length of the history, excluding the first 1, is $N - 1$.  And there are exactly $m_k - 1$ 1s in the history after the first customer, and exactly $N - m_k$ 0s.  So the probability of drawing the rest of the history, given the first customer drew a 1, is a binomial distribution.  We need some notation for the feature vector following customer 1, but we can just use the full history vector, conditional on customer 1 drawing a 1.  And let's use $\mathbf{z}_{*,k}$ to denote the full history of feature $k$.  Then we have:

$$
P(\mathbf{z}_{*,k} \mid z_{1,k} = 1)
= \mathrm{Binomial}(\mathbf{z}_{*,k}; \dots)
$$

... hmmm.... Still won't work, because the probability in the Binomial distribution is not fixed, e.g. $\alpha$, but changes over time, i.e. $\frac{m_{k_{i-1}} }{ i }$.

... but it seems like there's not really any other way of figuring out the "probability of any particular matrix being produced by this process", so let's continue.  Let's introduce some new notation: $m_{i,k}$ is the number of times feature $k$ has been chosen by customers up to, and including, customer $i$.  Looking at the probability of the choice of dishes by customer 2, that were already chosen by customer 1, first let's write down the probability of choosing each dish.  It is:

$$
\frac
  {m_{i-1,k}}
  {i}
$$
$$
= \frac
  {m_{1,k}}
  {2}
$$

The probability of the feature vector for customer 2, for $k \le K_1^{(1)}$, is

$$
P(\mathbf{z}_{2,k \le K_1^{(1)}}) =
\prod_{j=1}^{K_1^{(1)}} 
  \left(
    \frac
      {m_{1,j}}
      {2}
   \right)^{z_{2,j}}
  \left(
    1 - 
    \frac
      {m_{1,j}}
      {2}
   \right)^{1 - z_{2,j}}
$$

Or, for all customers, just the probability of each customer sampling existing dishes, not including the Poisson terms yet, the probability is:

$$
P(\mathbf{z}_{*,k \le K_{1}^{(i-1)}}) =
\prod_{i=2}^N
\prod_{p=1}^{i-1}
\prod_{j=1}^{K_1^{(p)}} 
  \left(
    \frac
      {m_{i-1,j}}
      {i}
   \right)^{z_{i,j}}
  \left(
    1 - 
    \frac
      {m_{i - 1,j}}
      {i}
   \right)^{1 - z_{i,j}}
$$

Note that there is no prior to add for any of the parameters.  In the original formulation, we were sampling from a Bernoulli distribution, parameterized on $\pi_k$, which had a prior with hyperparameter $\alpha$.  But here we are directly sampling based on $\alpha$, and don't need to integrate over $\pi_k$.  So this is the complete formulation for the probability of a matrix.  I think/hope.  Except, we do need to add in the Poisson draw probabilities.  The Poisson draw probability for customer $i$ is:

$$
\mathrm{Poisson}\left(
  K_1^{(i)}; \frac{\alpha}{i}
\right) =
\frac{(\alpha/i)^{K_1^{(i)}}\exp(-\alpha/i)}{K_1^{(i)}!}
$$
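To make the Poisson term concrete, here is a minimal sketch of the probability that customer $i$ samples a given number of new dishes. The function name `new_dish_pmf` and the value of `alpha` are illustrative assumptions.

```python
import math

def new_dish_pmf(k_new, alpha, i):
    """Poisson(k_new; alpha / i): probability that customer i samples k_new new dishes."""
    rate = alpha / i
    return rate ** k_new * math.exp(-rate) / math.factorial(k_new)

alpha = 2.0  # illustrative value, not from the tutorial
for i in (1, 2, 3):
    # Later customers have a smaller rate, so they tend to add fewer new dishes.
    print(i, [round(new_dish_pmf(k, alpha, i), 4) for k in range(4)])
```

The rate $\alpha/i$ shrinks with $i$, so the mass shifts towards zero new dishes for later customers.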

So the full probability is:

$$
P(\mathbf{Z})_{\mathrm{IBP}} =
\prod_{i=1}^N 
  \frac{(\alpha/i)^{K_1^{(i)}}\exp(-\alpha/i)}{K_1^{(i)}!}
\cdot
\prod_{i=2}^N
\prod_{p=1}^{i-1}
\prod_{j=1}^{K_1^{(p)}} 
  \left(
    \frac
      {m_{i-1,j}}
      {i}
   \right)^{z_{i,j}}
  \left(
    1 - 
    \frac
      {m_{i - 1,j}}
      {i}
   \right)^{1 - z_{i,j}}
$$
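One way to check this expression is to evaluate it directly by walking through the customers in order. The sketch below does that in log space, under some assumptions of mine: `Z` is a list of rows of 0s and 1s with no all-zero columns, a column counts as "new" for the first customer who has a 1 in it, and the name `ibp_log_prob_sequential` is hypothetical.

```python
import math

def ibp_log_prob_sequential(Z, alpha):
    """Log-probability of binary matrix Z under the sequential buffet process."""
    n = len(Z)
    n_features = len(Z[0])
    # first[k]: 1-based index of the customer who first chose feature k.
    first = []
    for k in range(n_features):
        rows = [i for i in range(n) if Z[i][k] == 1]
        if not rows:
            raise ValueError("all-zero columns are not produced by the process")
        first.append(rows[0] + 1)

    log_p = 0.0
    counts = [0] * n_features  # m_{i-1,k}: times feature k was chosen so far
    for i in range(1, n + 1):
        # Poisson(K_1^(i); alpha / i) for the new dishes of customer i.
        rate = alpha / i
        k_new = sum(1 for f in first if f == i)
        log_p += k_new * math.log(rate) - rate - math.lgamma(k_new + 1)
        # Bernoulli(m_{i-1,k} / i) for every dish that already exists.
        for k in range(n_features):
            if first[k] < i:
                p = counts[k] / i
                log_p += math.log(p) if Z[i - 1][k] == 1 else math.log(1.0 - p)
        for k in range(n_features):
            counts[k] += Z[i - 1][k]
    return log_p

# Two customers, one dish, both take it: alpha * exp(-1.5 * alpha) / 2.
print(ibp_log_prob_sequential([[1], [1]], alpha=1.0))
```

Working in log space avoids underflow for larger matrices; `math.lgamma(k + 1)` is used in place of $\log k!$.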

Looking at the left hand term, ie:

$$
\prod_{i=1}^N 
  \frac{(\alpha/i)^{K_1^{(i)}}\exp(-\alpha/i)}{K_1^{(i)}!}
$$


... we can factorize this out, since it's just products of things, and also bearing in mind that:

$$
\left( \frac{a}{b} \right)^n = \frac{a^n}{b^n}
$$

into:

$$
\prod_{i=1}^N \alpha^{K_1^{(i)}}
\cdot
\prod_{i=1}^N \frac{1}
             {i^{K_1^{(i)}}}
\cdot
\prod_{i=1}^N \exp(-\alpha/i)
\cdot
\prod_{i=1}^N
  \frac{1}
     {K_1^{(i)}!}
$$
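This factorization can be sanity-checked numerically. The sketch below compares the direct product of Poisson terms with the four separate products; the list `k_new` of per-customer new-dish counts and the value of `alpha` are made-up illustrative inputs.

```python
import math

def poisson_product(k_new, alpha):
    """Direct product over customers of Poisson(K_1^(i); alpha / i)."""
    result = 1.0
    for i, k in enumerate(k_new, start=1):
        rate = alpha / i
        result *= rate ** k * math.exp(-rate) / math.factorial(k)
    return result

def factorized_product(k_new, alpha):
    """The same quantity written as four separate products."""
    n = len(k_new)
    alpha_term = math.prod(alpha ** k for k in k_new)
    index_term = math.prod(1.0 / i ** k for i, k in enumerate(k_new, start=1))
    exp_term = math.prod(math.exp(-alpha / i) for i in range(1, n + 1))
    factorial_term = math.prod(1.0 / math.factorial(k) for k in k_new)
    return alpha_term * index_term * exp_term * factorial_term

k_new = [3, 1, 0, 2]  # hypothetical K_1^(i) for N = 4 customers
alpha = 1.7           # illustrative value
print(poisson_product(k_new, alpha), factorized_product(k_new, alpha))
```

The two printed values should agree up to floating-point rounding.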

Going through each of these terms in turn:

$$
\prod_{i=1}^N \alpha^{K_1^{(i)}}
$$
$$
=\alpha^{\sum_{i=1}^N K_1^{(i)}}
$$
$$
=\alpha^{K^+}
$$

This term figures in the expression in the tutorial, so this seems like a good start.  Let's look at the next term:

$$
\prod_{i=1}^N \frac{1}
             {i^{K_1^{(i)}}}
$$

Not sure, but the fourth term can be combined with the first term to give:

$$
\frac{\alpha^{K^+}}
  {\prod_{i=1}^N K_1^{(i)}!}
$$

... which is the first term in the tutorial expression.

The third term, ie:

$$
\prod_{i=1}^N \exp(-\alpha/i)
$$

Can be written as:
$$
\exp(-\sum_{i=1}^N \alpha/i)
$$
$$
=\exp \left(-\alpha\sum_{i=1}^N \frac{1}{i} \right)
$$
... which is the second term in the tutorial expression.  So, currently we have:

$$
\frac{\alpha^{K^+}}
  {\prod_{i=1}^N K_1^{(i)}!}
\cdot
\exp \left(-\alpha\sum_{i=1}^N \frac{1}{i} \right)
\cdot
\prod_{i=1}^N \frac{1}
             {i^{K_1^{(i)}}}
\cdot
\prod_{i=2}^N
\prod_{p=1}^{i-1}
\prod_{j=1}^{K_1^{(p)}} 
  \left(
    \frac
      {m_{i-1,j}}
      {i}
   \right)^{z_{i,j}}
  \left(
    1 - 
    \frac
      {m_{i - 1,j}}
      {i}
   \right)^{1 - z_{i,j}}
$$
Somehow, these last two terms should combine presumably into a product over $k$ of factorial terms.

Let's look at the second part of the expression for the probability of the draws from the IBP:

$$
\prod_{i=2}^N
\prod_{p=1}^{i-1}
\prod_{j=1}^{K_1^{(p)}} 
  \left(
    \frac
      {m_{i-1,j}}
      {i}
   \right)^{z_{i,j}}
  \left(
    1 - 
    \frac
      {m_{i - 1,j}}
      {i}
   \right)^{1 - z_{i,j}}
$$

We should be able to combine the individual products over features, into something that uses $m_k$ or similar.  $m_k$ is over an entire feature, for all $1 \le i \le N$, though, whereas currently the expression is over all $k$, for specific values of $i$.  So we somehow need to rotate/pivot this product over products of products.

Let's try factorizing over $k$.  All the products here lie within the support $1 \le k \le K^+$, so we can simply consider over this support:

$$
=\prod_{k=1}^{K^+}
\prod_{i=2}^N
\prod_{p=1}^{i-1}
\prod_{j=1}^{K_1^{(p)}} 
  \left(
    \frac
      {m_{i-1,j}}
      {i}
   \right)^{z_{i,j}}
  \left(
    1 - 
    \frac
      {m_{i - 1,j}}
      {i}
   \right)^{1 - z_{i,j}}
$$

Let's first factorize it into two parts, for simplicity.  We can combine them later:

$$
\prod_{i=2}^N
\prod_{p=1}^{i-1}
\prod_{j=1}^{K_1^{(p)}} 
  \left(
    \frac
      {m_{i-1,j}}
      {i}
   \right)^{z_{i,j}}
\cdot
\prod_{i=2}^N
\prod_{p=1}^{i-1}
\prod_{j=1}^{K_1^{(p)}} 
  \left(
    1 - 
    \frac
      {m_{i - 1,j}}
      {i}
   \right)^{1 - z_{i,j}}
$$

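And let's move the $1/i$ outside.  Since exactly one of $z_{i,j}$ and $1 - z_{i,j}$ is 1, each factor contributes exactly one $1/i$:

$$
\prod_{i=2}^N
\prod_{p=1}^{i-1}
\prod_{j=1}^{K_1^{(p)}} 
  \frac{1}{i}
  \left(
    m_{i-1,j}
  \right)^{z_{i,j}}
  \left(
    i - m_{i-1,j}
  \right)^{1 - z_{i,j}}
$$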

Let's look first at the first term of the two-part factorization.  Note that $p$ doesn't appear in the factors at all, while $j$ appears in both the exponent $z_{i,j}$ and the base $m_{i-1,j}$.  Let's turn it into the exponential of a log, so that the products become sums and the notation simplifies somewhat:

$$
= \exp \left(
\sum_{i=2}^N
\sum_{p=1}^{i-1}
\sum_{j=1}^{K_1^{(p)}} 
\log\left(
  \left(
    \frac
      {m_{i-1,j}}
      {i}
   \right)^{z_{i,j}}
\right)
\right)
$$

$$
= \exp \left(
\sum_{i=2}^N
\sum_{p=1}^{i-1}
\sum_{j=1}^{K_1^{(p)}} 
\left(
z_{i,j}
\log
\frac
      {m_{i-1,j}}
      {i}
\right)
\right)
$$

$$
= \exp \left(
\sum_{i=2}^N
\sum_{p=1}^{i-1}
\sum_{j=1}^{K_1^{(p)}} 
z_{i,j}
\left(
\log
  m_{i-1,j}
- \log i
\right)
\right)
$$


Since we need everything factorized over $k$ essentially, let's start from that approach?  We'll have a Poisson draw for each customer, of new features, but for each of the features chosen at a certain depth, $i$, we can try to calculate a probability just for one of those individual features, given the fact that it was first added at depth $i_{k}$, that there are $N$ customers in total, the $m_k$ total, and $\alpha$.

If feature $k$ was chosen at depth $i_k$, that means:

- number of 1s, excluding the first 1, is $m_k - 1$
- number of 0s, after the first 1, is $N - i_k - m_k + 1$

(check, eg:

- 5 customers
- feature chosen first by customer 3
- $m_k$ = 2, let's say
- then number of 1s, after customer 3, = $m_k - 1$ = 1
- number of 0s after being chosen = 1 = $5 - 3 - 2 + 1$

)
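The same check can be done mechanically for any feature history. This is a small sketch; the function name `history_counts` and the representation of a feature column as a list of 0s and 1s are assumptions made for illustration.

```python
def history_counts(column):
    """Return (i_k, m_k, ones_after, zeros_after) for one feature column."""
    n = len(column)
    i_k = column.index(1) + 1   # 1-based customer who first chose the feature
    m_k = sum(column)           # total number of customers who chose it
    tail = column[i_k:]         # choices of customers i_k + 1, ..., N
    ones_after = sum(tail)
    zeros_after = len(tail) - ones_after
    # The two formulas from the text.
    assert ones_after == m_k - 1
    assert zeros_after == n - i_k - m_k + 1
    return i_k, m_k, ones_after, zeros_after

# The example from the text: 5 customers, first chosen by customer 3, m_k = 2.
print(history_counts([0, 0, 1, 0, 1]))
```

The internal assertions fail loudly if either count formula is wrong for the given column.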

The probability of this feature $k$, given it was chosen by customer at depth $i_k$ first will be:

$$
\prod_{i=i_k + 1}^N
  \left(
    \frac
      {m_{i - 1,k}}
      {i}
   \right)^{z_{i,k}}
  \left(
    1 - 
    \frac
      {m_{i - 1,k}}
      {i}
   \right)^{1 - z_{i,k}}
$$
...where $m_{i-1,k}$ is the number of times the feature was chosen, up to and including customer $i-1$.  Let's take out the factor of $1/i$.  Exactly one of the two exponents is 1 for each value of $i$, so each customer after $i_k$ contributes exactly one factor of $1/i$, which can be moved outside the loop:

$$
=
\prod_{i=i_k + 1}^N
  \frac{1}
    {i}
\cdot
\prod_{i=i_k + 1}^N
  \left(
      m_{i - 1,k}
   \right)^{z_{i,k}}
  \left(
    i - 
      m_{i - 1,k}
   \right)^{1 - z_{i,k}}
$$

$$
=
  \frac{1}
    {\prod_{i=i_k + 1}^N i}
\cdot
\prod_{i=i_k + 1}^N
  \left(
      m_{i - 1,k}
   \right)^{z_{i,k}}
  \left(
    i - 
      m_{i - 1,k}
   \right)^{1 - z_{i,k}}
$$

$$
=
  \frac{i_k!}
    {N!}
\cdot
\prod_{i=i_k + 1}^N
  \left(
      m_{i - 1,k}
   \right)^{z_{i,k}}
  \left(
    i - 
      m_{i - 1,k}
   \right)^{1 - z_{i,k}}
$$

Let's try factorizing out the two products:

$$
=
  \frac{i_k!}
    {N!}
\cdot
\prod_{i=i_k + 1}^N
  \left(
      m_{i - 1,k}
   \right)^{z_{i,k}}
\cdot
\prod_{i=i_k + 1}^N
  \left(
    i - 
      m_{i - 1,k}
   \right)^{1 - z_{i,k}}
$$

For the middle term, although the $m$ and $z$ terms contain $i$, note that:

- there will be exactly $m_k - 1$ occurrences where $z$ is 1
- the $m$ values will follow the sequence $1, \dots, m_k - 1$

Therefore the middle term is:

$$
\prod_{i=1}^{m_k - 1} i
$$

$$
=(m_k - 1)!
$$
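Both the $i_k!/N!$ prefactor and the $(m_k - 1)!$ result can be checked numerically for a single feature. The sketch below is a rough check, assuming a feature column is given as a list of 0s and 1s; the function name `per_feature_terms` and the example column are hypothetical.

```python
import math

def per_feature_terms(column):
    """Return (index_term, ones_term) for one feature column.

    index_term: product of 1/i over customers i = i_k + 1, ..., N.
    ones_term:  product of m_{i-1,k} over later customers who chose the feature.
    """
    n = len(column)
    i_k = column.index(1) + 1  # 1-based customer who first chose the feature
    index_term = 1.0
    ones_term = 1
    m_prev = 1  # after customer i_k, the feature has been chosen exactly once
    for i in range(i_k + 1, n + 1):
        index_term /= i
        if column[i - 1] == 1:
            ones_term *= m_prev
            m_prev += 1
    return index_term, ones_term

column = [0, 1, 1, 0, 1, 1]  # made-up history: N = 6, i_k = 2, m_k = 4
index_term, ones_term = per_feature_terms(column)
print(index_term, math.factorial(2) / math.factorial(6))
print(ones_term, math.factorial(4 - 1))
```

Each printed pair should match: the first compares against $i_k!/N!$, the second against $(m_k - 1)!$.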
