### Overview and notation

Given a mixture distribution $H$ with global mean $\mu$ and global variance $\sigma^2$, we wish to find a parameterization $(w_i, \mu_i, \sigma_i)$, $1\leq i \leq k$, of the $k$ component distributions, where $w_i$, $\mu_i$, and $\sigma_i$ are the weight, mean, and standard deviation of the $i^{\text{th}}$ component distribution. For the parameterization to be consistent with the global mean and variance of $H$, the following two properties must hold:

$$\tag{1}\mu=\operatorname{E}[X]=\sum_{i=1}^kw_i\mu_i$$

$$\tag{2}\sigma^2=\operatorname{E}[(X-\mu)^2]=\sum_{i=1}^kw_i(\mu_i^2+\sigma^2_i)-\mu^2$$

where $X$ is a random variable drawn from $H$.
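As a quick numerical sanity check of (1) and (2), the sketch below computes the global mean and variance from the component parameters and compares them with empirical moments of samples drawn from the mixture. The Gaussian components, the example weights, and the helper names `mixture_moments` and `sample_mixture` are illustrative assumptions, not part of the method described here.

```python
import random
import statistics

def mixture_moments(weights, means, sds):
    """Global mean and variance of a mixture, following equations (1) and (2)."""
    mu = sum(w * m for w, m in zip(weights, means))
    var = sum(w * (m ** 2 + s ** 2) for w, m, s in zip(weights, means, sds)) - mu ** 2
    return mu, var

def sample_mixture(weights, means, sds, n, rng):
    """Draw n samples: pick a component by weight, then sample it (Gaussian assumed)."""
    idx = rng.choices(range(len(weights)), weights=weights, k=n)
    return [rng.gauss(means[i], sds[i]) for i in idx]

# Hypothetical example parameters.
weights = [0.5, 0.3, 0.2]
means = [-1.0, 0.5, 3.0]
sds = [0.5, 1.0, 2.0]

mu, var = mixture_moments(weights, means, sds)

rng = random.Random(0)
xs = sample_mixture(weights, means, sds, 100_000, rng)
emp_mu = statistics.fmean(xs)       # empirical estimate of E[X]
emp_var = statistics.pvariance(xs)  # empirical estimate of E[(X - mu)^2]
```

The empirical moments should agree with `mu` and `var` up to Monte Carlo error, whatever component family is used, since (1) and (2) depend only on the component means and variances.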

#### Assumption of equally weighted components

In our problem, we assume that each game state has an associated distribution over the reward accumulated by playing the game from that state until the end of the game. We assume that rewards are sampled from this distribution as follows.

Given the current state, we choose one of the state's $k$ children uniformly at random. This is repeated from the chosen child until a terminal state in the tree is reached. A contrasting method would be to count the leaf nodes $n_i$ under each of the $k$ children and sample one of the leaf nodes in the tree uniformly at random, so that child $i$ is chosen as the next move with probability $\frac{n_i}{\sum_{j=1}^k{n_j}}$. Because in most cases we have no way of calculating the number of leaf nodes under each of the $k$ children, we use the first sampling method.

This means that each of the $k$ component distributions has a weight of $\frac{1}{k}$, that is, $w_i = \frac{1}{k}\text{ }\forall i$.
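The difference between the two sampling schemes can be illustrated on a toy tree. In this sketch the nested-list tree, its reward values, and the helpers `rollout` and `count_leaves` are hypothetical; it only shows that uniform child selection visits each child with frequency close to $\frac{1}{k}$, while leaf-uniform sampling would weight the children by their leaf counts.

```python
import random
from collections import Counter

# Hypothetical toy game tree: an inner node is a list of children,
# a leaf is a terminal reward.
tree = [
    [1.0, 2.0, 3.0, 4.0],  # child 0: four leaves
    [10.0],                # child 1: one leaf
    [[5.0, 6.0], 7.0],     # child 2: three leaves
]

def rollout(node, rng):
    """Descend by choosing a child uniformly at random until a leaf is reached."""
    while isinstance(node, list):
        node = rng.choice(node)
    return node

def count_leaves(node):
    """Number of terminal states under a node (usually intractable in real games)."""
    if not isinstance(node, list):
        return 1
    return sum(count_leaves(child) for child in node)

rng = random.Random(0)
n = 30_000
first_moves = Counter()
rewards = []
for _ in range(n):
    i = rng.randrange(len(tree))          # uniform choice among the k children
    rewards.append(rollout(tree[i], rng)) # continue uniformly to a terminal state
    first_moves[i] += 1

child_freq = [first_moves[i] / n for i in range(len(tree))]    # close to 1/k each
leaf_counts = [count_leaves(child) for child in tree]          # n_i
leaf_weights = [c / sum(leaf_counts) for c in leaf_counts]     # n_i / sum_j n_j
```

Here `child_freq` approximates the equal weights used in the rest of this section, whereas `leaf_weights` shows what the weights would be under leaf-uniform sampling.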

### Reparameterization

##### Reparameterization 1

Our original parameterization of $H$ was $\mathbf{\theta} = (\mu,\sigma,w_1,\ldots,w_k,\mu_1,\ldots,\mu_k,\sigma_1,\ldots,\sigma_k)$. Since $w_i = \frac{1}{k}\text{ }\forall i$, we can replace the $k$ weights with a single weight $w = \frac{1}{k}$ and use $(\mu,\sigma,w,\mu_1,\ldots,\mu_k,\sigma_1,\ldots,\sigma_k)$. We can further use the reparameterization suggested in Kamary et al. [1]:

$$\theta = (\mu,\sigma,w,\alpha_1,\ldots,\alpha_k,\tau_1,\ldots,\tau_k)$$

With this parameterization, $\mu_i = \mu + \sigma\alpha_i$ and $\sigma_i = \tau_i\sigma$, where $\tau_i>0$ and $\alpha_i\in\mathbb{R}$. In this way, $\alpha_i$ shifts the mean of each component away from the global mean by a multiple of $\sigma$, and $\tau_i$ expresses $\sigma_i$ as a multiple of the global standard deviation. Substituting into (1) and (2), we find that $\alpha_i$ and $\tau_i$ are constrained by:

$$\tag{3}\sum_{i=1}^kw\alpha_i=0$$
$$\tag{4}\sum_{i=1}^kw\tau_i^2+\sum_{i=1}^kw\alpha_i^2=1$$
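
For completeness, here is a sketch of how the two constraints follow from (1) and (2), taking $\operatorname{E}[X]=\mu$ and $\operatorname{E}[(X-\mu)^2]=\sigma^2$, and using $\sum_{i=1}^k w = 1$ and $\sigma>0$. Substituting $\mu_i=\mu+\sigma\alpha_i$ into (1):

$$\mu=\sum_{i=1}^kw(\mu+\sigma\alpha_i)=\mu+\sigma\sum_{i=1}^kw\alpha_i\quad\Longrightarrow\quad\sum_{i=1}^kw\alpha_i=0$$

Substituting $\mu_i=\mu+\sigma\alpha_i$ and $\sigma_i=\tau_i\sigma$ into (2), the cross term vanishes by the result above:

$$\begin{aligned}\sigma^2&=\sum_{i=1}^kw\left[(\mu+\sigma\alpha_i)^2+\tau_i^2\sigma^2\right]-\mu^2\\&=\mu^2+2\mu\sigma\sum_{i=1}^kw\alpha_i+\sigma^2\sum_{i=1}^kw\alpha_i^2+\sigma^2\sum_{i=1}^kw\tau_i^2-\mu^2\\&=\sigma^2\left(\sum_{i=1}^kw\alpha_i^2+\sum_{i=1}^kw\tau_i^2\right)\end{aligned}$$

Dividing both sides by $\sigma^2$ gives the second constraint.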

##### Reparameterization 2

If we set $\alpha_i=\frac{\gamma_i}{\sqrt{w}}$ and $\tau_i=\frac{\eta_i}{\sqrt{w}}$, so that $w\alpha_i^2=\gamma_i^2$ and $w\tau_i^2=\eta_i^2$, constraints (3) and (4) become:

$$\tag{5}\sum_{i=1}^k\gamma_i=0$$
$$\tag{6}\sum_{i=1}^k\left(\gamma_i^2+\eta_i^2\right)=1$$

By introducing a supplementary radius parameter, $\varphi$, we can split (6) into two spherical constraints, one on $\vec{\gamma}$ and one on $\vec{\eta}$:

$$\tag{7}\sum_{i=1}^k\gamma_i^2=\varphi^2$$
and
$$\tag{8}\sum_{i=1}^k\eta_i^2=1-\varphi^2$$

After this reparameterization, sampling proceeds in three steps. First, $\varphi^2$ needs to be drawn from a distribution such that $0\leq\varphi^2\leq 1$. For this, we can use a beta distribution, which has support on $[0,1]$. Second, with $\varphi^2$ drawn, we need to randomly draw a point on a hypersphere of radius $\sqrt{1-\varphi^2}$. This point is $\vec{\eta}$; because $\tau_i>0$, each of its coordinates must be positive. Third, we need to draw a point on a hypersphere of radius $\varphi$ which also lies on the hyperplane given by $\sum_{i=1}^k\gamma_i=0$. This point is $\vec{\gamma}$.
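
The three draws can be sketched as follows. Several choices here are assumptions not fixed by the text: the Beta shape values `a` and `b`, drawing uniformly on each sphere by normalizing Gaussian vectors, folding $\vec{\eta}$ into the positive orthant so that $\tau_i>0$, and the scaling $\tau_i=\frac{\eta_i}{\sqrt{w}}$ (the scaling under which (4) reduces to (6)). The function names are hypothetical, and $k\geq 2$ is required for the hyperplane step.

```python
import math
import random

def sample_sphere_params(k, rng, a=1.0, b=1.0):
    """Draw (phi2, gamma, eta) satisfying constraints (5), (7) and (8).

    Assumed, not taken from the text: the Beta(a, b) shape values and
    uniform draws on each sphere via normalized Gaussian vectors.
    """
    phi2 = rng.betavariate(a, b)  # phi^2 in [0, 1]

    # eta: random direction folded into the positive orthant (tau_i > 0),
    # rescaled to radius sqrt(1 - phi^2).
    z = [abs(rng.gauss(0.0, 1.0)) for _ in range(k)]
    norm = math.sqrt(sum(v * v for v in z))
    eta = [math.sqrt(1.0 - phi2) * v / norm for v in z]

    # gamma: project a Gaussian vector onto the hyperplane sum(gamma) = 0
    # by removing its mean, then rescale to radius phi.
    g = [rng.gauss(0.0, 1.0) for _ in range(k)]
    mean_g = sum(g) / k
    g = [v - mean_g for v in g]
    norm = math.sqrt(sum(v * v for v in g))
    gamma = [math.sqrt(phi2) * v / norm for v in g]
    return phi2, gamma, eta

def to_components(mu, sigma, gamma, eta):
    """Map (gamma, eta) back to component means and standard deviations."""
    w = 1.0 / len(gamma)
    alpha = [gi / math.sqrt(w) for gi in gamma]
    tau = [ei / math.sqrt(w) for ei in eta]
    means = [mu + sigma * ai for ai in alpha]
    sds = [sigma * ti for ti in tau]
    return means, sds

rng = random.Random(0)
k, mu, sigma = 4, 2.0, 1.5  # hypothetical global parameters
phi2, gamma, eta = sample_sphere_params(k, rng)
means, sds = to_components(mu, sigma, gamma, eta)

# Recompute the global moments from the components via (1) and (2).
w = 1.0 / k
mix_mean = sum(w * m for m in means)
mix_var = sum(w * (m * m + s * s) for m, s in zip(means, sds)) - mix_mean ** 2
```

Whatever values are drawn, `mix_mean` and `mix_var` should recover `mu` and `sigma ** 2` up to floating-point error, which is the point of the constrained parameterization.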

