## A Simple Variational Approximation for the Shape Parameter of a Gamma Distribution

Let
$$
y_i \sim \mathcal{G}(y_i\mid\alpha,1),\quad i = 1,\dots,n,\hspace{1cm}\alpha \sim \mathcal{G}(\alpha \mid a, b)
$$
such that
$$
f(y,\alpha) \propto \prod_{i = 1}^n\mathcal{G}(y_i\mid\alpha,1)\times\mathcal{G}(\alpha\mid a,b)
$$
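and assume a variational family on the log scale: with $\lambda = \log(\alpha)$,
$$
\lambda \sim \mathcal{N}(\lambda\mid\mu,\sigma)
$$
where $\sigma$ is the standard deviation. Changing variables from $\alpha$ to $\lambda$ contributes the Jacobian $\left|d\alpha/d\lambda\right| = e^{\lambda}$, which turns the prior factor $\alpha^{a - 1}$ into $e^{a\lambda}$, so the joint density in terms of $\lambda$ is
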
$$
\begin{aligned}
f(y,\lambda) &= \prod_{i = 1}^n \left[\frac{1^{e^{\lambda}}}{\Gamma(e^{\lambda})}y_i^{e^{\lambda} - 1}e^{-y_i}\right]\times \frac{b^a}{\Gamma(a)}e^{\lambda a}e^{-be^{\lambda}}\\
&= \frac{1}{\Gamma^n(e^{\lambda})}\left(\prod_{i = 1}^n y_i\right)^{e^{\lambda} - 1}e^{-\sum_{i = 1}^n y_i}\frac{b^a}{\Gamma(a)}e^{a\lambda}e^{-be^{\lambda}}\\
\log f(y,\lambda) &= -n\log\Gamma(e^{\lambda}) + (e^{\lambda} - 1)\sum_{i = 1}^n\log(y_i) - \sum_{i = 1}^n y_i + a\log(b) - \log\Gamma(a) + a\lambda - be^{\lambda}\\
&= -n\log\Gamma(e^{\lambda}) + \left(\sum_{i = 1}^n\log(y_i)\right)(e^{\lambda} - 1) - \sum_{i = 1}^n y_i + a\log(b) - \log\Gamma(a) + a\lambda - be^{\lambda}\\
&= -n\log\Gamma(e^{\lambda}) + \left(\sum_{i = 1}^n\log(y_i) - b\right)e^{\lambda} + a\lambda + C_{f}
\end{aligned}
$$
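
As a quick numerical sanity check on this algebra, the sketch below compares the full log joint (built from `scipy.stats.gamma.logpdf` plus the log-Jacobian $\lambda$) with the $\lambda$-dependent terms $-n\log\Gamma(e^{\lambda}) + \left(\sum_i\log(y_i) - b\right)e^{\lambda} + a\lambda$ that follow from the model definition. The function names, the seed, and the hyperparameter values `a = 2`, `b = 1` are illustrative assumptions, not values from the text. If the reduction is right, the difference between the two should not depend on $\lambda$.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma

def log_joint_full(lam, y, a, b):
    """log f(y, lambda): Gamma(alpha, 1) likelihood, Gamma(a, rate b) prior, log-Jacobian."""
    alpha = np.exp(lam)
    log_lik = gamma.logpdf(y, a=alpha, scale=1.0).sum()
    log_prior = gamma.logpdf(alpha, a=a, scale=1.0 / b)
    return log_lik + log_prior + lam  # + lam is log|d alpha / d lambda|

def log_joint_kernel(lam, y, a, b):
    """Only the terms of log f(y, lambda) that depend on lambda."""
    alpha = np.exp(lam)
    return -y.size * gammaln(alpha) + (np.log(y).sum() - b) * alpha + a * lam

rng = np.random.default_rng(0)
y = rng.gamma(shape=3.5, scale=1.0, size=100)
a, b = 2.0, 1.0  # assumed hyperparameters, for illustration only

lams = np.linspace(-1.0, 2.0, 7)
diff = np.array([log_joint_full(l, y, a, b) - log_joint_kernel(l, y, a, b)
                 for l in lams])
spread = diff.max() - diff.min()  # ~0 if the two differ only by a constant
print(spread)
```

The printed spread should be at the level of floating-point round-off; a spread that grows with $\lambda$ would point to a sign or coefficient error in the reduced expression.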

where $C_f$ collects the terms that are constant with respect to $\lambda$. To remove the positivity constraint on $\sigma$, let $\tau = \log\sigma$, and expand $\log q_{\mu,\tau}(\lambda)$ in the same way:

$$
\begin{aligned}
q_{\mu,\tau}(\lambda) &= \frac{1}{\sqrt{2\pi}e^{\tau}}\exp\left\lbrace-\frac{1}{2e^{2\tau}}(\lambda - \mu)^2\right\rbrace\\
\log q_{\mu,\tau}(\lambda) &= -\frac{1}{2}\log(2\pi) - \tau - \frac{1}{2}e^{-2\tau}(\lambda - \mu)^2\\
&= - \tau - \frac{1}{2}e^{-2\tau}(\lambda - \mu)^2 + C_q
\end{aligned}
$$
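
The expression for $\log q_{\mu,\tau}(\lambda)$ can be checked against a library implementation. This is a minimal sketch; the helper name `log_q` and the test values of `mu` and `tau` are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

def log_q(lam, mu, tau):
    """Gaussian log density with mean mu and standard deviation exp(tau)."""
    return (-0.5 * np.log(2.0 * np.pi) - tau
            - 0.5 * np.exp(-2.0 * tau) * (lam - mu) ** 2)

lam = np.linspace(-2.0, 3.0, 11)
mu, tau = 0.7, -0.4  # arbitrary test values

reference = norm.logpdf(lam, loc=mu, scale=np.exp(tau))
max_abs_err = np.max(np.abs(log_q(lam, mu, tau) - reference))
print(max_abs_err)
```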

So our objective function, $h_{\mu,\tau}(\lambda)$, is $\log f(y,\lambda) - \log q_{\mu,\tau}(\lambda)$:

$$
h_{\mu,\tau}(\lambda) = -n\log\Gamma(e^{\lambda}) + \left(\sum_{i = 1}^n\log(y_i) - b\right)e^{\lambda} + a\lambda + \tau + \frac{1}{2}e^{-2\tau}(\lambda - \mu)^2 + C
$$
Then, taking the derivative with respect to $\lambda$ and using $\frac{d}{d\lambda}\log\Gamma(e^{\lambda}) = e^{\lambda}\Psi(e^{\lambda})$, where $\Psi$ is the digamma function, we get:
$$
\Delta_{\lambda}h_{\mu,\tau}(\lambda) = -ne^{\lambda}\Psi(e^{\lambda}) + \left(\sum_{i = 1}^n\log(y_i) - b\right)e^{\lambda} + a + e^{-2\tau}(\lambda - \mu)
$$
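
A finite-difference comparison is a cheap way to catch sign errors in a derivative like this one. The sketch below codes $h_{\mu,\tau}(\lambda) = \log f(y,\lambda) - \log q_{\mu,\tau}(\lambda)$ (dropping constants) and its derivative as they follow from the model definition, then compares the analytic derivative with a central difference. All numeric values (`n`, `sum_log_y`, `a`, `b`, `mu`, `tau`) are made-up test inputs, and the function names are hypothetical.

```python
import numpy as np
from scipy.special import gammaln, digamma

def h(lam, mu, tau, n, sum_log_y, a, b):
    """log f(y, lambda) - log q(lambda), dropping terms constant in lambda."""
    alpha = np.exp(lam)
    return (-n * gammaln(alpha) + (sum_log_y - b) * alpha + a * lam
            + tau + 0.5 * np.exp(-2.0 * tau) * (lam - mu) ** 2)

def dh_dlam(lam, mu, tau, n, sum_log_y, a, b):
    """Analytic derivative of h with respect to lambda."""
    alpha = np.exp(lam)
    return (-n * alpha * digamma(alpha) + (sum_log_y - b) * alpha + a
            + np.exp(-2.0 * tau) * (lam - mu))

# Made-up inputs for the check
params = dict(mu=1.0, tau=-1.0, n=100, sum_log_y=110.0, a=2.0, b=1.0)
lam = np.linspace(0.0, 2.0, 9)
eps = 1e-5

numeric = (h(lam + eps, **params) - h(lam - eps, **params)) / (2.0 * eps)
analytic = dh_dlam(lam, **params)
rel_err = np.max(np.abs(numeric - analytic) / (1.0 + np.abs(analytic)))
print(rel_err)
```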

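Finally, letting $\lambda = g(\mu,\tau,\varepsilon) = \mu + e^{\tau}\varepsilon$ with $\varepsilon \sim \mathcal{N}(0,1)$, so that $\Delta_{\mu,\tau}g(\mu,\tau,\varepsilon) = (1, e^{\tau}\varepsilon)$, the reparameterization gradient of the variational objective $\mathcal{L}(\mu,\tau)$ is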

$$
\Delta_{\mu,\tau} \mathcal{L}(\mu,\tau) = \text{E}_{\varepsilon}\left[\Delta_{\mu,\tau}g(\mu,\tau,\varepsilon)^T\Delta_{\lambda}h_{\mu,\tau}(\lambda)\right]
$$
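
In practice the expectation over $\varepsilon$ is replaced by an average over draws. The sketch below is one way to write that average, assuming $g(\mu,\tau,\varepsilon) = \mu + e^{\tau}\varepsilon$; the function name `reparam_grad` and its arguments are hypothetical. As a sanity check it is applied to the $q$-dependent part of $\Delta_{\lambda}h$ alone, $e^{-2\tau}(\lambda - \mu)$, for which the estimate should approach the gradient of the Gaussian entropy $\tau + \text{const}$, namely $(0, 1)$.

```python
import numpy as np

def reparam_grad(dh_dlam, mu, tau, n_draws, rng):
    """Monte Carlo average of grad_g * dh/dlambda over draws of epsilon."""
    eps = rng.standard_normal(n_draws)
    lam = mu + np.exp(tau) * eps  # lambda = g(mu, tau, eps)
    # One row per draw: (dg/dmu, dg/dtau) = (1, exp(tau) * eps)
    grad_g = np.column_stack((np.ones(n_draws), np.exp(tau) * eps))
    return (grad_g * dh_dlam(lam)[:, None]).mean(axis=0)

mu, tau = 0.3, -0.5  # arbitrary test values
rng = np.random.default_rng(0)

def entropy_part(lam):
    # d/dlambda of 0.5 * exp(-2 tau) * (lambda - mu)^2
    return np.exp(-2.0 * tau) * (lam - mu)

est = reparam_grad(entropy_part, mu, tau, n_draws=200_000, rng=rng)
print(est)  # Monte Carlo estimate; expected to be near (0, 1) up to sampling noise
```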

### Implementation

Setup

import numpy as np
import scipy as sp
import sklearn as skl

from scipy.special import digamma

true_alpha = 3.5
sample_size = 100

# data generation
Y = np.random.gamma(shape = true_alpha, scale = 1, size = sample_size)
sum_log_y = np.log(Y).sum()

Gradient

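def delta_h(epsilon, mu, tau, a, b):
    # derivative of h with respect to lambda, evaluated at each draw of epsilon
    lam = np.exp(tau) * epsilon + mu
    alp = np.exp(lam)

    out = np.zeros(epsilon.shape[0])
    out -= sample_size * alp * digamma(alp)
    out += (sum_log_y - b) * alp
    out += a
    out += np.exp(-2 * tau) * (lam - mu)
    return out.reshape((-1,1))

def delta_g(epsilon, mu, tau):
    # gradient of lambda = mu + exp(tau) * epsilon with respect to (mu, tau),
    # one row per draw: [1, exp(tau) * epsilon]
    return np.column_stack((np.ones_like(epsilon), np.exp(tau) * epsilon))

def objective(args, a = 1.0, b = 1.0, n_draws = 20):
    # Monte Carlo estimate of the gradient of L with respect to (mu, tau).
    # The defaults for a and b (prior hyperparameters) are placeholder
    # assumptions; the text does not fix their values.
    epsilon = np.random.normal(size = n_draws)
    mu, tau = args
    out = delta_g(epsilon, mu, tau) * delta_h(epsilon, mu, tau, a, b)
    return out.mean(axis = 0)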
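
To show how such a gradient estimate might be used, here is a minimal, self-contained sketch of plain stochastic gradient ascent on $(\mu,\tau)$. It is not a tuned optimizer: the prior hyperparameters `a = b = 1`, the fixed step size, the number of draws, the iteration count, the starting point, and the name `grad_estimate` are all assumptions made for illustration. The gradient uses the signs that follow from the model definition, $\Delta_{\lambda}h = -ne^{\lambda}\Psi(e^{\lambda}) + \left(\sum_i\log(y_i) - b\right)e^{\lambda} + a + e^{-2\tau}(\lambda - \mu)$.

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
y = rng.gamma(shape=3.5, scale=1.0, size=100)  # simulated data, true alpha = 3.5
n, sum_log_y = y.size, np.log(y).sum()
a, b = 1.0, 1.0  # assumed prior hyperparameters

def grad_estimate(mu, tau, n_draws=50):
    """Monte Carlo reparameterization gradient with respect to (mu, tau)."""
    eps = rng.standard_normal(n_draws)
    lam = mu + np.exp(tau) * eps
    alpha = np.exp(lam)
    dh = (-n * alpha * digamma(alpha) + (sum_log_y - b) * alpha + a
          + np.exp(-2.0 * tau) * (lam - mu))
    grad_g = np.column_stack((np.ones(n_draws), np.exp(tau) * eps))
    return (grad_g * dh[:, None]).mean(axis=0)

# Start near the data scale (the mean of a Gamma(alpha, 1) is alpha), narrow q
mu, tau = np.log(y.mean()), -2.0
step = 1e-3  # assumed fixed step size, small because the gradients scale with n
for _ in range(3000):
    g_mu, g_tau = grad_estimate(mu, tau)
    mu, tau = mu + step * g_mu, tau + step * g_tau

alpha_hat = np.exp(mu)   # exp of the variational mean of log(alpha)
sigma_hat = np.exp(tau)  # variational standard deviation of log(alpha)
print(alpha_hat, sigma_hat)
```

With 100 observations the approximate posterior for $\log\alpha$ should be fairly concentrated, so `alpha_hat` is expected to land in the neighbourhood of the value used to simulate the data. A decaying step size or an adaptive optimizer would be a natural refinement.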