ADVI, NUTS and Metropolis produce significantly different results #1163

@kayhan-batmanghelich

Description

Dear PyMC developers,

I am trying to build a simple mixed-effects model using pymc3 (see the code below). I have tried NUTS, ADVI, and Metropolis for inference. Aside from the variations in speed for this model (time: ADVI ≈ Metropolis << NUTS), the results are significantly different. To be more specific, I am interested in estimating $h^2$, which is the proportion of variance explained by the random effect.
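For clarity, the model in the code below can be written as (my notation, matching the variable names in the snippet):

$$
y = \beta_0 + L z + X w + \varepsilon,\qquad
z \sim \mathcal{N}\!\left(0,\; h^2 \sigma^2 I_N\right),\qquad
\varepsilon \sim \mathcal{N}\!\left(0,\; (1-h^2)\,\sigma^2 I_N\right),
$$

so that, conditional on $\beta_0$ and $w$, $\operatorname{Var}(y) = \sigma^2\left(h^2 L L^\top + (1-h^2) I_N\right)$ and $h^2$ is the fraction of the total variance $\sigma^2$ attributed to the random effect.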

This is the code:

import numpy as np
import pymc3 as pm
import theano.tensor as T

# N: number of samples, M: number of fixed effects,
# L: (N, N) factor of the random-effect covariance,
# X: (N, M) design matrix, y: (N, 1) observations -- all assumed already defined.
with pm.Model() as mixedEffect_model:
    ### hyperpriors
    tau = pm.Gamma('tau', 1.e-3, 1.e-3)
    sigma2 = 1.0 / tau
    h2 = pm.Uniform('h2')
    beta_0 = pm.Uniform('beta_0', lower=-1000, upper=1000)  # a replacement for an improper prior
    w = pm.Uniform('w', lower=-1000, upper=1000, shape=(M, 1))  # a replacement for an improper prior

    # random effect: z ~ N(0, h2 * sigma2), mapped through L
    z = pm.Normal('z', mu=0., sd=(h2 * sigma2) ** 0.5, shape=(N, 1))
    g = T.dot(L, z)

    # likelihood; named y_obs so it does not shadow the data array y
    y_obs = pm.Normal('y', mu=beta_0 + g + T.dot(X, w),
                      sd=((1 - h2) * sigma2) ** 0.5,
                      shape=(N, 1), observed=y)

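The snippet above assumes `N`, `M`, `L`, `X`, and `y` are already defined from the data. For a self-contained run, synthetic inputs could be generated along these lines (the shapes, the relatedness-style matrix `K`, and the true parameter values here are purely illustrative, not from my actual data):

```python
import numpy as np

rng = np.random.RandomState(0)
N, M = 200, 3                      # illustrative sample size and number of fixed effects

A = rng.randn(N, N)
K = A @ A.T / N                    # a positive-definite "relatedness" covariance
L = np.linalg.cholesky(K)          # factor so that L @ L.T == K

X = rng.randn(N, M)                # fixed-effect covariates
w_true = rng.randn(M, 1)
h2_true, sigma2_true = 0.5, 1.0

g = L @ rng.randn(N, 1) * np.sqrt(h2_true * sigma2_true)      # random effect
eps = rng.randn(N, 1) * np.sqrt((1 - h2_true) * sigma2_true)  # residual noise
y = 1.0 + X @ w_true + g + eps                                # intercept beta_0 = 1
```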
ADVI:

with mixedEffect_model:
    v_params = pm.variational.advi(n=5000)

with mixedEffect_model:
    trace = pm.variational.sample_vp(v_params, draws=5000)

import matplotlib.pyplot as plt
_ = plt.hist(trace['h2'], bins=100)

[figure: histogram of trace['h2'] under ADVI]

Metropolis:

with mixedEffect_model:
    start = pm.find_MAP()
    step = pm.Metropolis()
    trace = pm.sample(5000, step, start=start)

_ = plt.hist(trace['h2'], bins=100)

[figure: histogram of trace['h2'] under Metropolis]

NUTS (slow):

with mixedEffect_model:
    start = pm.find_MAP()
    step = pm.NUTS()
    trace = pm.sample(5000, step, start=start)

_ = plt.hist(trace['h2'], bins=100)

[figure: histogram of trace['h2'] under NUTS]

Given that w and beta_0 are treated as parameters with (approximately) improper priors, I expect the results to be very close to the maximum likelihood estimate. Using maximum likelihood (more specifically, Restricted Maximum Likelihood, REML), I obtain values around 0.5 for $h^2$ with a standard error of about 0.07. The results above are very different, both from the ML estimate and across the different inference engines. NUTS is closest to ML, but given that the results are so drastically different, I don't know how to explain it.
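For a concrete comparison with the REML numbers, the posterior summary I am looking at can be computed from the samples like this (the array below is a stand-in; in the real run one would use `h2_samples = trace['h2']`):

```python
import numpy as np

# stand-in for trace['h2']; in practice: h2_samples = trace['h2']
h2_samples = np.random.RandomState(1).beta(20, 20, size=5000)

post_mean = h2_samples.mean()
post_sd = h2_samples.std()
ci_low, ci_high = np.percentile(h2_samples, [2.5, 97.5])

print("posterior mean %.3f, sd %.3f, 95%% interval [%.3f, %.3f]"
      % (post_mean, post_sd, ci_low, ci_high))
```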

Any idea?
