# Posterior derivation for dice throwing

## I. Likelihood function for dice and multinomial distribution

The main difference between coin flipping and dice throwing is the number of outcomes. Therefore, we need to fix a probability for each of the $m$ potential outcomes:

\begin{align}
p[i]=p_i 
\end{align}

such that

\begin{align}
p_1+\cdots+p_m=1\enspace.
\end{align}

It is easy to assign a probability to the configuration where we first get $k_1$ ones, then $k_2$ twos, and so on:

\begin{align}
\Pr[1,\ldots,1,2,\ldots,2,\ldots,m,\ldots,m]= p_1^{k_1}p_2^{k_2}\cdots p_m^{k_m}\enspace.
\end{align}

Note that any sequence of observations that has $k_1$ ones, $k_2$ twos, and so on has the same probability. Therefore

\begin{align}
p[k_1,k_2,\ldots,k_m|p_1,p_2,\ldots,p_m]&= \text{number of configurations}(k_1,k_2,\ldots,k_m)\cdot p_1^{k_1}\cdot p_2^{k_2} \cdots p_m^{k_m}\\
&\propto  p_1^{k_1}\cdots p_m^{k_m}\enspace,
\end{align}

where the exact number of configurations is not relevant, as it is fixed by the observations and does not change when we vary the unknown parameters $p_1,\ldots,p_m$.
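The following minimal sketch illustrates why the number of configurations can be dropped. It assumes that this number is the multinomial coefficient $n!/(k_1!\cdots k_m!)$ with $n=k_1+\cdots+k_m$; the function names and the example counts are hypothetical and chosen only for illustration.

```python
from math import factorial, isclose, log


def number_of_configurations(counts):
    """Multinomial coefficient n! / (k_1! ... k_m!) (illustrative helper)."""
    result = factorial(sum(counts))
    for k in counts:
        result //= factorial(k)  # each intermediate division is exact
    return result


def log_kernel(counts, probs):
    """Logarithm of p_1^{k_1} ... p_m^{k_m}; zero counts contribute nothing."""
    return sum(k * log(p) for k, p in zip(counts, probs) if k > 0)


def log_likelihood(counts, probs):
    """Full log-likelihood, including the number of configurations."""
    return log(number_of_configurations(counts)) + log_kernel(counts, probs)


# Hypothetical observations: how often each face of a six-sided die came up.
counts = [3, 1, 0, 2, 1, 3]
fair = [1 / 6] * 6
loaded = [0.3, 0.1, 0.05, 0.2, 0.1, 0.25]

# Comparing two parameter settings: the configuration count cancels out.
full_diff = log_likelihood(counts, fair) - log_likelihood(counts, loaded)
kernel_diff = log_kernel(counts, fair) - log_kernel(counts, loaded)
print(isclose(full_diff, kernel_diff))
```

Because the configuration count depends only on the observations, it contributes the same additive constant to the log-likelihood of every parameter setting, so comparisons between settings are unaffected.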




## II. Maximum likelihood estimation

Let us again consider an uninformative prior that does not prefer any parameter configuration:

\begin{align}
p[p_1,p_2,\ldots,p_m]\propto 1
\end{align}

Then the corresponding uninformed posterior is
\begin{align}
p[p_1,p_2,\ldots,p_m|k_1,k_2,\ldots,k_m]&\propto  p_1^{k_1}\cdots p_m^{k_m}\enspace.
\end{align}

To find the parameter assignment that maximises the posterior, we need to solve the optimisation task

\begin{align}
F = p_1^{k_1}\cdots p_m^{k_m}\to \max
\end{align}

subject to 

\begin{align}
p_1+p_2+\cdots+p_m=1\enspace.
\end{align}

Again, we can solve the simpler task

\begin{align}
\log F = k_1\log(p_1)+ \cdots+k_{m-1}\log(p_{m-1})+k_m\log(1-p_1-\cdots- p_{m-1})\to \max
\end{align}

instead, where the constraint

\begin{align}
p_1+p_2+\cdots+p_m=1
\end{align}

has been used to eliminate $p_m$. Taking partial derivatives and equating them with zero leads to the equations

\begin{align}
\frac{\partial \log F}{\partial p_i} = \frac{k_i}{p_i} -  \frac{k_m}{1-p_1-\cdots- p_{m-1}}
=\frac{k_i(1-p_1-\cdots- p_{m-1})- k_mp_i}{p_i(1-p_1-\cdots- p_{m-1})}=0\enspace.
\end{align}

This is a system of linear equations

\begin{align}
k_i = k_i(p_1+\cdots+ p_{m-1})+ k_mp_i,\qquad i=1,\ldots, m-1
\end{align}

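which can be solved directly: each equation states that $k_i p_m=k_m p_i$, where $p_m=1-p_1-\cdots-p_{m-1}$, so every $p_i$ is proportional to $k_i$, and the normalisation constraint gives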

\begin{align}
p_i=\frac{k_i}{k_1+\cdots+k_m}
\end{align}

that coincides with the classical probability formula: the observed relative frequency of outcome $i$.
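As a sanity check, the maximiser can be compared numerically against random points on the probability simplex. This is only a sketch under stated assumptions: the counts are made up, the helper names are hypothetical, and random search is used for illustration rather than as a serious optimisation method.

```python
import math
import random


def log_kernel(counts, probs):
    """log F = k_1 log(p_1) + ... + k_m log(p_m)."""
    return sum(k * math.log(p) for k, p in zip(counts, probs))


def random_simplex_point(m, rng):
    """Random probability vector: normalised independent exponential draws."""
    weights = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical counts for a six-sided die.
counts = [3, 1, 4, 2, 1, 3]
n = sum(counts)
estimate = [k / n for k in counts]  # p_i = k_i / (k_1 + ... + k_m)

rng = random.Random(0)  # fixed seed so the check is reproducible
best_random = max(
    log_kernel(counts, random_simplex_point(len(counts), rng))
    for _ in range(10_000)
)

# No randomly drawn parameter vector should beat the closed-form estimate.
print(log_kernel(counts, estimate) >= best_random)
```

The check does not prove optimality, but a random point scoring higher than the closed-form estimate would signal a mistake in the derivation.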

The same formula can be derived through an unconstrained optimisation task that combines the log-likelihood trick with the method of Lagrange multipliers:

\begin{align}
F^{*}(p_1,\ldots,p_m,\lambda)= \log F(p_1,\ldots, p_m)+\lambda (p_1+\cdots+ p_m-1)\to \max\enspace.
\end{align}
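
For completeness, the stationarity conditions of $F^{*}$ can be worked out as follows. This is a sketch that assumes all counts $k_i$ are positive, so that every $p_i$ is nonzero at the optimum:

\begin{align}
\frac{\partial F^{*}}{\partial p_i} &= \frac{k_i}{p_i}+\lambda = 0
\quad\Longrightarrow\quad p_i=-\frac{k_i}{\lambda}\enspace,\\
\frac{\partial F^{*}}{\partial \lambda} &= p_1+\cdots+p_m-1 = 0
\quad\Longrightarrow\quad \lambda=-(k_1+\cdots+k_m)\enspace,\\
p_i&=\frac{k_i}{k_1+\cdots+k_m}\enspace.
\end{align}

The second line follows by substituting $p_i=-k_i/\lambda$ into the constraint and solving for $\lambda$.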

## III. Reduction to coinflipping

We can always turn the die into a coin by declaring $\text{Heads}=[\text{Dice}=i]$ for a fixed face $i$. By construction,

\begin{align}
\Pr[\text{Heads}]=p_i
\end{align}

After that, we can reuse all the derivations for coin flipping and get formulae for the informed and uninformed posteriors. In particular, we get that the maximum a posteriori estimate for an uninformed person is

\begin{align}
p_i=\frac{k_i}{k_1+\cdots+ k_m}\enspace.
\end{align}

However, it turns out that this reasoning is incorrect, as the uninformed prior for coin flipping is uniform while the corresponding marginal prior for the dice is different. In particular, there is only one way to obtain $p_i=1$ (all other probabilities must be zero), while there are many ways to obtain $p_i=0$: we just have to satisfy $p_1+\cdots+p_{i-1}+p_{i+1}+\cdots + p_m=1$. As a result, the uninformed marginal prior for $p_i$ is not constant:

\begin{align}
p[p_i=0]&>0\\
p[p_i=1]&=0
\end{align}
and our derivation through coin flipping is incorrect. To be precise, the resulting posterior is a valid posterior, but it does not correspond to the conclusions of an uninformed person; rather, it corresponds to a person who prefers the solution $p_i=1$, $p_j=0$ for $j\neq i$ to other parameter combinations.
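
The non-uniform marginal prior can be made visible with a small simulation. The sketch below assumes a six-sided die and samples parameter vectors uniformly from the probability simplex by normalising independent exponential draws, which is a standard way to sample the flat Dirichlet distribution. The helper name and sample size are illustrative choices.

```python
import random


def random_simplex_point(m, rng):
    """Uniform draw from the simplex: normalised independent exponentials."""
    weights = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]


m = 6  # assumed number of faces
rng = random.Random(1)  # fixed seed for reproducibility

# Keep only the first coordinate p_1 of each uniformly drawn parameter vector.
samples = [random_simplex_point(m, rng)[0] for _ in range(100_000)]

# Under a uniform marginal prior, both fractions would be close to 0.1.
frac_small = sum(p < 0.1 for p in samples) / len(samples)
frac_large = sum(p > 0.9 for p in samples) / len(samples)
print(frac_small, frac_large)
```

Values of $p_1$ near zero should turn out to be far more common than values near one, which is the asymmetry described above: the marginal density is positive at $p_i=0$ and vanishes at $p_i=1$.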
