# Usage

Here we describe how to use the `gmmfun` package, which estimates population parameters via the generalized method of moments.  The moments we use are generated by the moment generating function (MGF).  We take advantage of automatic differentiation, which makes computing these moments trivial.  A more detailed description can be found in the README.  Below, we provide a brief overview of automatic differentiation before diving into an example.

## Automatic differentiation

We use [Jax](https://jax.readthedocs.io/en/latest/index.html) for automatic differentiation (AD).  It is straightforward to use.  Below, we define the moment generating function and cumulant generating function of a normal distribution with mean and standard deviation $\theta = (\mu, \sigma)$.

We can then compute the gradient of these functions using `grad`.  By default, `grad` differentiates with respect to the first argument, but you can specify a different argument with the `argnums` keyword.

We compute the first and second cumulants (the mean and the variance) by differentiating the CGF, and then evaluate them at a given $(\mu, \sigma)$ to confirm that this is working as intended.

In [2]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = [8, 3]

In [3]:
import jax.numpy as jnp
from jax import grad
from gmmfun.utils import moment_functions

In [4]:
def mgf_norm(t, theta):
    return jnp.exp(t * theta[0] + 0.5 * jnp.square(theta[1] * t))

def cgf_norm(t, theta):
    return t * theta[0] + 0.5 * jnp.square(theta[1] * t)

cgf1 = grad(cgf_norm)
cgf2 = grad(grad(cgf_norm))

In [5]:
theta0 = jnp.array([1, 2.0])
cgf1(0.0, theta0), cgf2(0.0, theta0)

(Array(1., dtype=float32, weak_type=True),
 Array(4., dtype=float32, weak_type=True))
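As a minimal sketch of choosing which argument to differentiate, `grad` accepts an `argnums` keyword.  Here we differentiate the normal CGF with respect to `theta` (argument index 1) rather than `t`.  The evaluation point `t = 0.5` is arbitrary and chosen only for illustration.

```python
import jax.numpy as jnp
from jax import grad

def cgf_norm(t, theta):
    return t * theta[0] + 0.5 * jnp.square(theta[1] * t)

# Differentiate with respect to theta (argument index 1) instead of t.
dcgf_dtheta = grad(cgf_norm, argnums=1)

theta = jnp.array([1.0, 2.0])
g = dcgf_dtheta(0.5, theta)
# Analytically: d/dmu = t = 0.5 and d/dsigma = sigma * t**2 = 2 * 0.25 = 0.5
```

The result has the same shape as `theta`, one partial derivative per parameter.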

Using automatic differentiation, it is trivial to compute any moment, though higher-order moments may take a while to compute.  Below we employ the utility function `moment_functions` from `gmmfun` to compute the first 6 moments of a standard normal distribution.

In [11]:
[fun(jnp.array([0.0, 1.0])) for fun in  moment_functions(mgf_norm, 6)]

[Array(0., dtype=float32, weak_type=True),
 Array(1., dtype=float32, weak_type=True),
 Array(0., dtype=float32, weak_type=True),
 Array(3., dtype=float32, weak_type=True),
 Array(0., dtype=float32, weak_type=True),
 Array(15., dtype=float32, weak_type=True)]
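The idea behind these moment functions is that the $k$-th raw moment is the $k$-th derivative of the MGF at $t = 0$.  The sketch below shows one way this could be done with repeated calls to `grad`.  The helper `raw_moment_functions_sketch` is a hypothetical name used for illustration and is not necessarily how `moment_functions` is implemented in `gmmfun`.

```python
import jax.numpy as jnp
from jax import grad

def mgf_norm(t, theta):
    return jnp.exp(t * theta[0] + 0.5 * jnp.square(theta[1] * t))

def raw_moment_functions_sketch(mgf, n):
    """Return n functions of theta; the k-th gives E[X^k] = d^k M / dt^k at t = 0."""
    funs = []
    d = mgf
    for _ in range(n):
        d = grad(d)  # one more derivative with respect to t
        # Bind the current derivative via a default argument to avoid late binding.
        funs.append(lambda theta, d=d: d(0.0, theta))
    return funs

theta_std = jnp.array([0.0, 1.0])
moments = [float(f(theta_std)) for f in raw_moment_functions_sketch(mgf_norm, 4)]
# For a standard normal the first four raw moments are 0, 1, 0, 3.
```

Each extra moment nests one more `grad`, which is why higher orders become slow.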

## Code

The basics of the GMM can be found in the README.  Part of the appeal of the approach is its elegance and succinctness, and the same is true of the code that implements it, which can be found [here](https://github.com/jwindle/gmmfun).  In the abstract class `GmmBase` (which can be found at `src/gmmfun/gmm_base.py`), you will find a close correspondence between the mathematics described in the README and the implementation.  To use the class, one only needs to define the functions representing the moment conditions, which we call $g_i$ in the README.  The derived classes `GmmMgf4` and `GmmCgf4` do just that, using either the first 4 moments or the first 4 cumulants to construct moment conditions.  The [source](https://github.com/jwindle/gmmfun/blob/main/src/gmmfun/gmm_mgf_4.py) for `GmmMgf4`, for example, makes it clear how simple this is.

We follow the same pattern to create several other classes, like `GmmMgf`, which works for an arbitrary number of moments, and `GmmMgfCenScl`, which effectively works with an arbitrary number of the standardized moments.

## Example - estimating parameters of a normal distribution

Let's estimate population parameters under a known distribution.  We use the classes `GmmMgf4` and `GmmCgf4` to estimate the population parameter using the GMM.  We have to provide the moment generating function (or cumulant generating function) and the bounds on the parameters, which in this case are the mean and standard deviation $\theta = (\mu, \sigma)$.

In [13]:
from gmmfun import GmmMgf4, GmmCgf4
from gmmfun.utils import sample_moments
from scipy.stats import norm, gamma, t, chi2, describe
from scipy.linalg import solve
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(12345)

In [15]:
N = 100
theta0 = jnp.array([2.0, 1.0])
x = rng.normal(loc=theta0[0], scale=theta0[1], size=N)

In [16]:
np.mean(x), np.std(x)

(1.9717922168272406, 0.9403885689428003)

In [17]:
lower_bounds = jnp.array([-jnp.inf, 0.0])
upper_bounds = jnp.array([ jnp.inf, jnp.inf])
bounds_norm = (lower_bounds, upper_bounds)

In [18]:
mgf_4 = GmmMgf4(mgf_norm, bounds_norm, x)
cgf_4 = GmmCgf4(cgf_norm, bounds_norm, x)

We initialize $\theta$ to the sample versions of the parameters and then initialize GMM using $W = I$.  You can see that the results are not identical, in terms of either $\theta$ or the $p$-value computed using the $J$-statistic, which is not surprising, since we are using two different sets of moment conditions.

In [20]:
theta_init = jnp.array([jnp.mean(x), jnp.std(x)])
mgf_4.opt_I(theta_init).params, cgf_4.opt_I(theta_init).params

(Array([1.9664618 , 0.95683515], dtype=float32),
 Array([2.0083153 , 0.91235435], dtype=float32))

In [21]:
mgf_4.pval, cgf_4.pval

(0.9923799848021424, 0.7975243714377289)
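To make the first-step fit with $W = I$ concrete, here is a minimal NumPy sketch of the objective $\bar g(\theta)^\top \bar g(\theta)$.  It assumes moment conditions of the form $g_i(\theta) = x_i^k - E_\theta[X^k]$ for $k = 1, \dots, 4$ and uses the closed-form raw moments of the normal distribution instead of automatic differentiation.  The function names are hypothetical, and the exact form of $g_i$ in `gmmfun` may differ.

```python
import numpy as np

def normal_raw_moments(theta):
    """First four raw moments of N(mu, sigma^2) in closed form."""
    mu, sigma = theta
    s2 = sigma ** 2
    return np.array([
        mu,
        mu**2 + s2,
        mu**3 + 3 * mu * s2,
        mu**4 + 6 * mu**2 * s2 + 3 * s2**2,
    ])

def gbar(theta, x):
    """Sample average of g_i(theta) = x_i^k - E[X^k], k = 1..4."""
    sample = np.array([np.mean(x**k) for k in range(1, 5)])
    return sample - normal_raw_moments(theta)

def objective_identity(theta, x):
    """GMM objective with identity weight matrix: gbar' gbar."""
    g = gbar(theta, x)
    return float(g @ g)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)

# The objective is small near the sample estimates and large far from them.
q_near = objective_identity((np.mean(x), np.std(x)), x)
q_far = objective_identity((0.0, 1.0), x)
```

An optimizer would then minimize this objective over $\theta$ subject to the bounds.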

Following the work in the README, we update $W$, update $\theta$, and then repeat that several times using the `update_both` method.  We also compute the condition number of the new weight matrices $W$ below.  The weight matrix from the MGF has a much higher condition number than the one from the CGF.  This is because the CGF is using something akin to centered moments whereas the MGF is using uncentered moments.  Later, we will modify the MGF approach to try to fix that, since large condition numbers make inverting matrices difficult.

In [33]:
W_new_mgf = mgf_4.update_W()
np.linalg.cond(W_new_mgf)

2811684.8

In [34]:
W_new_cgf = cgf_4.update_W()
np.linalg.cond(W_new_cgf)

254.71574
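The effect of centering on conditioning can be illustrated without the package.  The sketch below compares the covariance matrix of the raw powers $(x, x^2, x^3, x^4)$ with that of the powers of a centered and scaled copy of the data.  Standardizing the data directly is done here only for illustration and is not what `gmmfun` does.  Since a matrix and its inverse share the same condition number, this also describes the corresponding weight matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=5000)

def power_cov(v, n=4):
    """Sample covariance matrix of (v, v^2, ..., v^n)."""
    P = np.column_stack([v**k for k in range(1, n + 1)])
    return np.cov(P, rowvar=False)

# Centered and scaled copy of the data (illustration only).
z = (x - x.mean()) / x.std()

cond_raw = np.linalg.cond(power_cov(x))  # uncentered powers
cond_std = np.linalg.cond(power_cov(z))  # centered, scaled powers
```

With a nonzero mean, the raw powers are strongly correlated with one another, which inflates `cond_raw` relative to `cond_std`.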

In [35]:
foo = mgf_4.update_theta() # this returns the output of the optimization
foo = cgf_4.update_theta()

In [36]:
trace_mgf_4 = mgf_4.update_both(5)
trace_cgf_4 = cgf_4.update_both(5)

After updating both several times, we see that $\hat \theta$ and $p$-values are quite similar.  Even though you can use any weight matrix in the GMM and end up with a consistent estimator, the efficiency will be much improved using an iterative estimator.  Further, it is necessary to iterate to have an estimator that corresponds to the asymptotic distribution used when computing the p-value.

In [37]:
mgf_4.theta, cgf_4.theta

(Array([2.0386884 , 0.89161164], dtype=float32),
 Array([2.0363216, 0.8911411], dtype=float32))

In [38]:
mgf_4.pval, cgf_4.pval

(0.2941754122929443, 0.2941088599195417)
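For reference, here is a rough sketch of how a $J$-statistic and its $p$-value can be computed, assuming `pval` follows the standard Hansen $J$-test with $J = N \bar g^\top S^{-1} \bar g$ and a chi-square reference distribution with (number of conditions minus number of parameters) degrees of freedom.  The function names are hypothetical, the centered covariance is one common choice for $S$, and the closed-form survival function below only covers even degrees of freedom.

```python
import math
import numpy as np

def j_statistic(G):
    """J = N * gbar' S^{-1} gbar for an (N, k) array of moment conditions."""
    n = G.shape[0]
    gbar = G.mean(axis=0)
    S = np.cov(G, rowvar=False, bias=True)  # centered sample covariance
    return float(n * gbar @ np.linalg.solve(S, gbar))

def chi2_sf_even(x, df):
    """Chi-square survival function, closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

def j_pvalue(J, n_conditions, n_params):
    """p-value of the J-test with df = n_conditions - n_params."""
    return chi2_sf_even(J, n_conditions - n_params)

# Illustrative value of J with 4 moment conditions and 2 parameters (df = 2).
p = j_pvalue(2.447, n_conditions=4, n_params=2)
```

With 4 conditions and 2 parameters the degrees of freedom equal 2, so the $p$-value reduces to $\exp(-J/2)$.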

Lastly, just to confirm that the large condition number using the MGF above was not due to being at the start of the iterative process, we recompute it.  It is still relatively large using the MGF approach and relatively small using the CGF approach.

In [41]:
np.linalg.cond(mgf_4.W)

3029604.0

In [42]:
np.linalg.cond(cgf_4.W)

234.89946

## Scaling the moments for better numerical stability

One option for improving the numerical stability is to change the moment functions slightly, by considering the moment generating function for the centered and standardized random variable.  Here we are centering and standardizing using the moment generating function, not the data!  You can see we end up with $p$-values very similar to those above.

In [65]:
from gmmfun import GmmMgf, GmmMgfCenScl

mgf_arb = GmmMgfCenScl(mgf_norm, bounds_norm, 4, x)
mgf_arb.opt_I(theta_init).params

Array([2.0098555 , 0.91206646], dtype=float32)

In [44]:
trace_mgf_arb = mgf_arb.update_both(5)
[out.params for out in trace_mgf_arb]

[Array([2.0384412, 0.8917396], dtype=float32),
 Array([2.0380747, 0.8943017], dtype=float32),
 Array([2.0386457 , 0.89389896], dtype=float32),
 Array([2.0386457 , 0.89389896], dtype=float32),
 Array([2.0386457 , 0.89389896], dtype=float32)]

In [45]:
mgf_arb.theta, mgf_arb.pval

(Array([2.0386457 , 0.89389896], dtype=float32), 0.29381463915692074)

Here we can see that the variance components for the scaled moment conditions are far better behaved than above.

In [46]:
np.linalg.cond(mgf_arb.W)

367.90897
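The centering and scaling described at the start of this section can be sketched directly on the MGF.  If $Z = (X - \mu)/\sigma$, then $M_Z(t) = e^{-\mu t / \sigma} M_X(t / \sigma)$, where $\mu = M_X'(0)$ and $\sigma^2 = M_X''(0) - M_X'(0)^2$.  The helper `standardize_mgf` below is a hypothetical illustration of that identity, and `GmmMgfCenScl` may be implemented differently.

```python
import jax.numpy as jnp
from jax import grad

def mgf_norm(t, theta):
    return jnp.exp(t * theta[0] + 0.5 * jnp.square(theta[1] * t))

def standardize_mgf(mgf):
    """Return the MGF of (X - mu) / sigma, with mu and sigma derived from mgf."""
    d1 = grad(mgf)
    d2 = grad(d1)

    def mgf_z(t, theta):
        mu = d1(0.0, theta)                       # E[X]
        sigma = jnp.sqrt(d2(0.0, theta) - mu**2)  # sqrt(E[X^2] - E[X]^2)
        return jnp.exp(-mu * t / sigma) * mgf(t / sigma, theta)

    return mgf_z

mgf_z = standardize_mgf(mgf_norm)
theta = jnp.array([2.0, 3.0])

z1 = grad(mgf_z)(0.0, theta)        # first moment of Z, should be close to 0
z2 = grad(grad(mgf_z))(0.0, theta)  # second moment of Z, should be close to 1
```

Because the standardized variable has mean 0 and variance 1 for every $\theta$, the informative moment conditions are the ones of order 3 and above.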

## Arbitrary number of moments

The classes `GmmMgf` and `GmmMgfCenScl` work for an arbitrary number of moments, though as you increase the number of moments, you can still encounter the stability issues mentioned above, which can impact the results as seen below.

In [72]:
mgf_arb = GmmMgfCenScl(mgf_norm, bounds_norm, 3, x)
mgf_arb.opt_I(theta_init).params, mgf_arb.pval

(Array([2.0098965, 0.9416739], dtype=float32), 0.6696440877221782)

In [73]:
trace_mgf_arb = mgf_arb.update_both(3)
[out.params for out in trace_mgf_arb]

[Array([1.9871845, 0.9240068], dtype=float32),
 Array([1.9830548, 0.9231035], dtype=float32),
 Array([1.9830475, 0.9230921], dtype=float32)]

In [74]:
mgf_arb.pval

0.5248617310063135

Let's repeat that using 5 moments.  A problem arises...

In [75]:
mgf_arb = GmmMgfCenScl(mgf_norm, bounds_norm, 5, x)
mgf_arb.opt_I(theta_init).params

Array([2.11102  , 0.9201831], dtype=float32)

In [76]:
trace_mgf_arb = mgf_arb.update_both(10)
[out.params for out in trace_mgf_arb]

[Array([2.219602  , 0.90107083], dtype=float32),
 Array([1.8846518, 1.0812597], dtype=float32),
 Array([2.2313654, 0.9653946], dtype=float32),
 Array([2.2713282, 0.9253201], dtype=float32),
 Array([1.8849238, 1.0928484], dtype=float32),
 Array([2.2362204, 0.9705579], dtype=float32),
 Array([2.2756574 , 0.92712903], dtype=float32),
 Array([1.8854389, 1.0935588], dtype=float32),
 Array([2.2363436, 0.9705907], dtype=float32),
 Array([2.2748668, 0.9272811], dtype=float32)]

It doesn't look like our iterations have converged yet: the estimates appear to cycle among three values.  Further, the $p$-value is much lower than we'd expect, given that we are simulating from the null.  We show in `gmmfun-example-02-asympototics` that the sample variance when using higher order moments is problematic.  Fortunately, there is a solution to that.

In [77]:
mgf_arb.pval

0.00016693623341590413
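One simple way to flag the non-convergence seen in the 5-moment trace is to compare the last two parameter vectors.  The helper `has_converged` and its tolerance are hypothetical and not part of `gmmfun`.  The numbers below are copied from the 4-moment and 5-moment traces printed above.

```python
import numpy as np

def has_converged(trace, tol=1e-5):
    """True if the last two parameter vectors differ by less than tol."""
    if len(trace) < 2:
        return False
    last, prev = np.asarray(trace[-1]), np.asarray(trace[-2])
    return bool(np.max(np.abs(last - prev)) < tol)

# Last iterates of the 4-moment run (settles down).
trace_4 = [
    [2.0380747, 0.8943017],
    [2.0386457, 0.89389896],
    [2.0386457, 0.89389896],
]

# Last iterates of the 5-moment run (keeps cycling).
trace_5 = [
    [1.8854389, 1.0935588],
    [2.2363436, 0.9705907],
    [2.2748668, 0.9272811],
]

converged_4 = has_converged(trace_4)
converged_5 = has_converged(trace_5)
```

A check like this only looks at the final step, so a longer window would be needed to detect a cycle explicitly.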

## Known asymptotic variance

It is worth reiterating at this point that all of this is enabled by automatic differentiation, which makes computing derivatives, and hence moments, trivial.  We can use that same approach to compute the asymptotic variance exactly, which we use in `GmmMgfAvar`.  Of course, there is a cost associated with computing the asymptotic variances, which requires `2 * n_moments` derivatives, so it will take a moment for the code below to run.  But, ultimately, we now have sensible results using 5 moments!

In [84]:
from gmmfun import GmmMgfAvar

In [79]:
n_moments = 5
mgf_avar = GmmMgfAvar(mgf_norm, bounds_norm, n_moments, x)
mgf_avar.opt_I(theta_init).params

Array([1.9627025, 0.9605625], dtype=float32)

In [80]:
mgf_avar.update_W()

array([[ 3.55013072e+01, -1.66753310e+01, -1.46738932e+00,
         1.91380467e+00, -2.44883074e-01],
       [-1.66753310e+01,  3.13467481e+01, -1.69480679e+01,
         3.61172448e+00, -2.65059779e-01],
       [-1.46738932e+00, -1.69480679e+01,  1.44454145e+01,
        -4.03223179e+00,  3.64891731e-01],
       [ 1.91380467e+00,  3.61172448e+00, -4.03223179e+00,
         1.25702675e+00, -1.22232939e-01],
       [-2.44883074e-01, -2.65059779e-01,  3.64891731e-01,
        -1.22232939e-01,  1.24555741e-02]])

In [81]:
trace_mgf_avar = mgf_avar.update_both(10)
[out.params for out in trace_mgf_avar]

[Array([1.9697026 , 0.94349414], dtype=float32),
 Array([1.9714528, 0.9409395], dtype=float32),
 Array([1.9716105, 0.9404282], dtype=float32),
 Array([1.9716105, 0.9404282], dtype=float32),
 Array([1.9716105, 0.9404282], dtype=float32),
 Array([1.9716105, 0.9404282], dtype=float32),
 Array([1.9716105, 0.9404282], dtype=float32),
 Array([1.9716105, 0.9404282], dtype=float32),
 Array([1.9716105, 0.9404282], dtype=float32),
 Array([1.9716105, 0.9404282], dtype=float32)]

In [91]:
mgf_avar.pval

0.7549822325918822
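The reason `2 * n_moments` derivatives are needed is that the covariance between powers of $X$ is $\mathrm{Cov}(X^j, X^k) = E[X^{j+k}] - E[X^j] E[X^k]$, so the covariance matrix of the first $n$ powers involves raw moments up to order $2n$.  The sketch below computes that matrix exactly from the MGF for $n = 2$.  The function names are hypothetical, and this is an illustration of the identity rather than the code in `GmmMgfAvar`.

```python
import jax.numpy as jnp
from jax import grad

def mgf_norm(t, theta):
    return jnp.exp(t * theta[0] + 0.5 * jnp.square(theta[1] * t))

def raw_moments(mgf, theta, order):
    """Raw moments E[X^1], ..., E[X^order] from derivatives of the MGF at 0."""
    out, d = [], mgf
    for _ in range(order):
        d = grad(d)
        out.append(d(0.0, theta))
    return jnp.stack(out)

def exact_power_cov(mgf, theta, n):
    """Exact covariance of (X, ..., X^n): m_{j+k} - m_j * m_k."""
    m = raw_moments(mgf, theta, 2 * n)  # m[k - 1] holds E[X^k]
    rows = [
        jnp.stack([m[j + k - 1] - m[j - 1] * m[k - 1] for k in range(1, n + 1)])
        for j in range(1, n + 1)
    ]
    return jnp.stack(rows)

# Standard normal: Var(X) = 1, Var(X^2) = 2, Cov(X, X^2) = 0.
S = exact_power_cov(mgf_norm, jnp.array([0.0, 1.0]), 2)
```

Using the exact matrix at the current $\theta$ avoids the noisy sample estimates of high-order moments.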

## Testing the alternative

Everything above tested the case where the null is true: the data are normal and we fit a normal MGF.  What happens when we fit the gamma MGF to the same normal data?  We get a strong rejection.

In [88]:
def mgf_gamma(t, theta):
    return jnp.exp(-theta[0] * jnp.log(1 - theta[1] * t))

lower_bounds = jnp.array([0.0, 0.0])
upper_bounds = jnp.array([ jnp.inf, jnp.inf])
bounds_gamma = (lower_bounds, upper_bounds)

In [89]:
mgf_arb = GmmMgfCenScl(mgf_gamma, bounds_gamma, 4, x)
mgf_arb.opt_I(theta_init).params

Array([4.3639526 , 0.89897245], dtype=float32)

In [90]:
mgf_arb.update_both(5)
mgf_arb.theta, mgf_arb.pval

(Array([5.1356893, 0.6931895], dtype=float32), 0.0)
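As a quick check on the parameterization, the gamma MGF above corresponds to $\theta = (\text{shape}, \text{scale})$, so its mean is $\theta_0 \theta_1$ and its variance is $\theta_0 \theta_1^2$.  The sketch below confirms this with `grad` and also shows method-of-moments starting values, which could be used in place of `theta_init`.  The helper `gamma_start_from_sample` is hypothetical and not part of `gmmfun`.

```python
import jax.numpy as jnp
from jax import grad

def mgf_gamma(t, theta):
    return jnp.exp(-theta[0] * jnp.log(1 - theta[1] * t))

theta = jnp.array([4.0, 0.5])  # shape = 4, scale = 0.5

m1 = grad(mgf_gamma)(0.0, theta)        # mean = shape * scale = 2
m2 = grad(grad(mgf_gamma))(0.0, theta)  # second raw moment = 5
var = m2 - m1**2                        # variance = shape * scale^2 = 1

def gamma_start_from_sample(mean, variance):
    """Method-of-moments start: shape = mean^2 / var, scale = var / mean."""
    return jnp.array([mean**2 / variance, variance / mean])

start = gamma_start_from_sample(2.0, 1.0)
```

Applied to a sample mean and variance, this gives a starting point that already matches the first two moments of the gamma model.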

# Conclusion

The underlying code here is very minimal.  Using the power of automatic differentiation and the elegance of GMM, we were able to easily generate estimators of population parameters and measures of goodness of fit.  Be sure to check out the source code at <https://github.com/jwindle/gmmfun>.