model.logp is nan #2066

Closed
denadai2 opened this issue Apr 21, 2017 · 26 comments

@denadai2 (Contributor) commented Apr 21, 2017

There are two strange behaviours in this model:

  • tau and p get stuck (they don't fluctuate), and as a consequence mu_phi is stuck as well
  • logp is nan: why?

I am on the latest version (master) of PyMC3.

import theano.tensor as tt
import pandas as pd
import numpy as np
import pickle
import pymc3 as pm
import theano.tensor as tt
from theano import scan
import theano
floatX = "float32"

from pymc3.distributions import continuous
from pymc3.distributions import distribution

class CAR(distribution.Continuous):
	"""
	Conditional Autoregressive (CAR) distribution. Ref paper: https://www.ncbi.nlm.nih.gov/pubmed/12926715
	"""

	def __init__(self, n_neigh, a, tau, lambda_par, *args, **kwargs):
		super(CAR, self).__init__(*args, **kwargs)
		self.a = tt.as_tensor_variable(a)
		self.n_neigh = tt.as_tensor_variable(n_neigh)
		self.tau = tau
		self.lambda_par = lambda_par
		self.mode = 0.

	def logp(self, x):
		denom = 1 - self.lambda_par + self.lambda_par * self.n_neigh
		tau = self.tau * denom
		num = self.lambda_par / denom

		mu_w = num * tt.sum(x * self.a, axis=1)  # tt.sum(x*a, axis=1)/tt.sum(w, axis=1)
		return tt.sum(continuous.Normal.dist(mu=mu_w, sd=tau).logp(x))


#LOAD DATA
with open('bug.pkl', "rb") as input_file:
    weights_df = pickle.load(input_file)
    X = pickle.load(input_file)
    y = pickle.load(input_file)
    
N = weights_df.values.shape[0]
amat = weights_df.values

n_neighbours = amat.sum(axis=1)


with pm.Model() as unpooled_model:
      
    # define priors, use Normal for Ridge (sd=100, weakly informative)
    b0 = pm.Normal('intercept', mu=5.4, tau=0.5)
    b1 = pm.Normal('b1_mdist_daily', mu=1, tau=1.0E-6)
    
    # random effect precision parameter
    tau = pm.Normal('tau', mu=0, tau=1.0E-6)#.Gamma('tau', alpha=0.001, beta=0.001)##
    # strength of spatial correlation
    p = pm.Uniform('p', lower=0.0, upper=1.0, testval=0)
    
    phi = CAR('mu_phi', n_neigh=n_neighbours, a=amat, tau=tau, lambda_par=p, shape=N)
    
    # define linear model
    mu = tt.exp( b0 + b1 * X['mdist_daily'] + phi) # 
    #a = pm.HalfCauchy(name='alpha', beta=10, testval=1.) 
    a = pm.Normal('alpha', mu=5.2, sd=0.5)
    home_points = pm.NegativeBinomial('home_points', mu=mu, observed=y, alpha=a)
    
    unpooled_trace = pm.sample(1000, njobs=1, step=pm.Metropolis()) #(scaling=C) , , 

Logp:

[unpooled_model.logp(pt) for pt in unpooled_trace[200:]]

[array(nan),
 array(nan),
 array(nan),
 array(nan),
 array(nan),

etc

chains:

[trace plot of the chains omitted]

pickle: bug.pkl.zip

@aseyboldt (Member)

This might be (part of) the problem:

p = pm.Uniform('p', lower=0.0, upper=1.0, testval=0)

The testval is not valid; it has to be strictly between 0 and 1. If your master is older than a couple of hours, you might want to pull again. #2046 was merged very recently.
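For example, any starting value strictly inside the support is fine (testval=0.5 here is just an illustrative choice):

p = pm.Uniform('p', lower=0.0, upper=1.0, testval=0.5)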

@aseyboldt (Member)

Also, tau should be positive, right? Maybe pm.HalfCauchy('tau', beta=2) or similar.

@denadai2 (Contributor, Author) commented Apr 21, 2017

No, I really do have the latest version ;D

Even so, if I replace the Uniform with

BoundedNormal = pm.Bound(pm.Normal, lower=0, upper=1)
p = BoundedNormal('p', mu=0.9, sd=0.09)

I get -inf as the logp. That shouldn't be normal, should it?

With pm.HalfCauchy it works, and the logp is even positive (around 40K). Is that something strange?

@aseyboldt (Member)

The bounded normal on its own works fine for me. Can you try Uniform('p', lower=0, upper=1, testval=0.9)?

@twiecki (Member) commented Apr 21, 2017

@junpenglao posted a CAR implementation in a different issue; I recommend finding that.

@denadai2 (Contributor, Author)

@aseyboldt Same result.
@twiecki This is similar, but based on a different paper :) I have to implement this one... Do you think it is normal to get nan as the logp without a warning? Maybe it is, but I don't understand why.

thx

@aseyboldt (Member)

NUTS prints warnings if there are nans in the non-tuning part of the trace (it calls them divergences); Metropolis doesn't. We have been thinking about printing messages that are more helpful, but it is tricky to do that without a huge performance hit.
You can use NanGuardMode from Theano to find out where the first nan or inf comes from. Something like this should do it:

from theano.compile.nanguardmode import NanGuardMode
mode = NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True)
with model:
    step = pm.Metropolis(mode=mode)
    trace = pm.sample(1000, step=step)
    # Or if you want to use NUTS
    trace = pm.sample(2000, tune=1000, nuts_kwargs={'mode': mode})

@aseyboldt (Member)

Something else that can help is to introduce Deterministics. In this case I'd try adding something like mu = Deterministic('mu', mu) after your definition of mu. That way the values of mu end up in the trace and you can check whether they are reasonable (maybe NegativeBinomial blows up if mu is too large).
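A minimal, self-contained sketch of that idea (the toy model and variable names here are illustrative, not from the model above):

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as toy_model:
    b0 = pm.Normal('b0', mu=0.0, sd=1.0)
    raw_mu = tt.exp(b0)
    # Deterministic stores raw_mu's values in the trace so they can be inspected
    mu = pm.Deterministic('mu', raw_mu)
    y = pm.Poisson('y', mu=mu, observed=[1, 2, 3])
    trace = pm.sample(500, tune=500)

# look for nan/inf or implausibly large values
print(trace['mu'].min(), trace['mu'].max())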

@junpenglao (Member)

Is there a particular reason you used Metropolis? Using NUTS with ADVI init seems to run normally.
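A hedged sketch of what that looks like, assuming the unpooled_model from the original post:

with unpooled_model:
    # init='advi' uses ADVI to find a starting point and scaling for NUTS
    trace = pm.sample(2000, tune=1000, init='advi', njobs=1)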

@denadai2 (Contributor, Author)

@junpenglao This is just a toy example; my real problem is bigger (more variables). In the bigger problem NUTS runs at ~1.5 iterations per second, which is far too slow compared to the 250 it/s of Metropolis. That is the only reason: I am trying out more models...
BTW, I get "FloatingPointError: NaN occurred in ADVI optimization." in the example I posted :)

@aseyboldt I don't have any warning. Am I doing something wrong?
Deterministic mu:

[screenshot of the Deterministic mu values omitted]

thx

@aseyboldt (Member) commented Apr 22, 2017

Do you mean that you got the FloatingPointError in the original model with the invalid testval for p? That would be expected.
Don't switch to Metropolis when NUTS seems slow. The Metropolis sampler just doesn't work in high dimensions; you get huge autocorrelations. You can't compare raw samples per second like that; you'd have to look at effective samples per second.
There aren't any nans or infs in mu? Then it looks to me like you should be fine. Do you still get errors or warnings with NUTS?
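A small sketch of that comparison (nuts_trace and metropolis_trace are hypothetical multi-chain traces from otherwise identical runs of the two samplers):

# effective sample size per parameter; divide by each run's wall-clock time
# to compare effective samples per second across samplers
ess_nuts = pm.effective_n(nuts_trace)
ess_metropolis = pm.effective_n(metropolis_trace)
print(ess_nuts, ess_metropolis)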

@twiecki (Member) commented Apr 22, 2017

@aseyboldt I've seen that mistake made all the time. I wonder if we should add a warning when someone uses Metropolis or Slice on a continuous model, telling them they should really use NUTS.

@denadai2 (Contributor, Author)

@twiecki :) noob mistakes. Consider it! :))

@aseyboldt sorry, my bad: I forgot to put a valid testval for p :)

With:

import theano.tensor as tt
import pandas as pd
import numpy as np
import pickle
import pymc3 as pm
%matplotlib inline
import matplotlib.pyplot as plt
import theano.tensor as tt
from theano import scan
import theano
floatX = "float32"

from pymc3.distributions import continuous
from pymc3.distributions import distribution

class CAR(distribution.Continuous):
	"""
	Conditional Autoregressive (CAR) distribution. Ref paper: https://www.ncbi.nlm.nih.gov/pubmed/12926715
	"""

	def __init__(self, n_neigh, a, tau, lambda_par, *args, **kwargs):
		super(CAR, self).__init__(*args, **kwargs)
		self.a = tt.as_tensor_variable(a)
		self.n_neigh = tt.as_tensor_variable(n_neigh)
		self.tau = tau
		self.lambda_par = lambda_par
		self.mode = 0.

	def logp(self, x):
		denom = 1 - self.lambda_par + self.lambda_par * self.n_neigh
		tau = self.tau * denom
		num = self.lambda_par / denom

		mu_w = num * tt.sum(x * self.a, axis=1)  # tt.sum(x*a, axis=1)/tt.sum(w, axis=1)
		return tt.sum(continuous.Normal.dist(mu=mu_w, sd=tau).logp(x))


#LOAD DATA
with open('bug.pkl', "rb") as input_file:
    weights_df = pickle.load(input_file)
    X = pickle.load(input_file)
    y = pickle.load(input_file)
    
N = weights_df.values.shape[0]
amat = weights_df.values

n_neighbours = amat.sum(axis=1)

from theano.compile.nanguardmode import NanGuardMode
mode = NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True)
with pm.Model() as unpooled_model:
      
    # define priors, use Normal for Ridge (sd=100, weakly informative)
    b0 = pm.Normal('intercept', mu=5.4, tau=0.5)
    b1 = pm.Normal('b1_mdist_daily', mu=1, tau=1.0E-6)
    
    # random effect precision parameter
    tau = pm.HalfCauchy('tau', beta=2)#pm.Gamma('tau', alpha=0.001, beta=0.001)##
    # strength of spatial correlation
    p = pm.Uniform('p', lower=0.0, upper=1.0)
    
    phi = CAR('mu_phi', n_neigh=n_neighbours, a=amat, tau=tau, lambda_par=p, shape=N)
    
    # define linear model
    mu = tt.exp( b0 + b1 * X['mdist_daily'] + phi) # 
    #mus = pm.Deterministic('mu', mu)

    a = pm.HalfCauchy(name='alpha', beta=10, testval=1.) 
    home_points = pm.NegativeBinomial('home_points', mu=mu, observed=y, alpha=a)
    #step = pm.Metropolis(mode=mode)
    
    unpooled_trace = pm.sample(2000, njobs=1, tune=1000, nuts_kwargs={'mode': mode}) #(scaling=C), step=step

With this code I get:

AssertionError: Inf detected
Big value detected
NanGuardMode found an error in an input of the graph.

And when I uncomment mus = pm.Deterministic('mu', mu), I get:

AttributeError                            Traceback (most recent call last)
<ipython-input-28-1bf6c57dd774> in <module>()
     70     #step = pm.Metropolis(mode=mode)
     71 
---> 72     unpooled_trace = pm.sample(2000, njobs=1, tune=1000, nuts_kwargs={'mode': mode}) #(scaling=C), step=step

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain, njobs, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, **kwargs)
    256         sample_func = _sample
    257 
--> 258     return sample_func(**sample_args)
    259 
    260 

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed, live_plot, **kwargs)
    271     try:
    272         strace = None
--> 273         for it, strace in enumerate(sampling):
    274             if live_plot:
    275                 if it >= skip_first:

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tqdm/_tqdm.py in __iter__(self)
    831 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    832 
--> 833             for obj in iterable:
    834                 yield obj
    835                 # Update and print the progressbar.

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    351         _update_start_vals(start, strace.point(-1), model)
    352     else:
--> 353         _update_start_vals(start, model.test_point, model)
    354 
    355     try:

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in _update_start_vals(a, b, model)
    466         for tname in b:
    467             if tname.startswith(name) and tname!=name:
--> 468                 transform_func = [d.transformation for d in model.deterministics if d.name==name][0]
    469                 b[tname] = transform_func.forward(a[name]).eval()
    470 

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in <listcomp>(.0)
    466         for tname in b:
    467             if tname.startswith(name) and tname!=name:
--> 468                 transform_func = [d.transformation for d in model.deterministics if d.name==name][0]
    469                 b[tname] = transform_func.forward(a[name]).eval()
    470 

AttributeError: 'TensorVariable' object has no attribute 'transformation'

@junpenglao (Member)

Re-parameterization should always be the first reaction when your model is slow :)
For example, you can try to model p as the inverse logit of a normally distributed variable; the sampler will run about 10 times faster.
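A minimal sketch of that reparameterization (the variable names are illustrative):

with pm.Model():
    # sample on the unconstrained logit scale, then map back into (0, 1)
    p_raw = pm.Normal('p_raw', mu=0.0, sd=1.0)
    p = pm.Deterministic('p', pm.math.invlogit(p_raw))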

@fonnesbeck (Member)

I'm not sure we should warn people off of Metropolis or other samplers, unless it is absolutely not appropriate to do so (e.g. NUTS for a discrete variable). I frequently get continuous models that perform poorly with NUTS and I need to drop back to Metropolis, so I would not want to be seeing warnings in that situation. If users are unsure about sampler choice they should use auto-assignment.
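For reference, auto-assignment just means not passing a step method, e.g. with a model context like the ones above:

with unpooled_model:
    # with no `step` argument PyMC3 picks samplers itself
    # (NUTS for continuous variables, Metropolis for discrete ones)
    trace = pm.sample(1000)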

@aseyboldt (Member)

@denadai2 Your CAR implementation seems a bit strange. I just gave the version at http://mc-stan.org/documentation/case-studies/mbjoseph-CARStan.html a shot, and it runs much faster (100 it/s) and seems to converge. But the model can't really tell what the spatial correlation should be. Maybe that is because the data doesn't provide enough information somehow, or there is a bug in the logp of CAR.

import numpy as np
import scipy.linalg
import scipy.sparse
import theano.sparse
import theano.tensor as tt
import pymc3 as pm
from pymc3.distributions import Continuous
from pymc3.distributions.dist_math import bound

class CAR(Continuous):
    def __init__(self, alpha, adjacency, *args, **kwargs):
        if not isinstance(adjacency, np.ndarray):
            raise ValueError("Adjacency matrix is not an ndarray.")
        n, m = adjacency.shape
        if n != m or np.any(adjacency != adjacency.T):
            raise ValueError('Adjacency matrix must be symmetric.')
        if 'shape' in kwargs and kwargs['shape'] != n:
            raise ValueError('Invalid shape: Must match matrix dimension.')
        kwargs['shape'] = n
        super(CAR, self).__init__(*args, **kwargs)
        self.n = n
        self.alpha = tt.as_tensor_variable(alpha)
        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        self.neighbors = tt.as_tensor_variable(adjacency.sum(0))
        self.mean = tt.zeros(n)
        self.median = self.mean
        adj = adjacency.astype('d').copy()
        sqrt_neighbors = 1 / np.sqrt(adjacency.sum(0))
        adj[:] *= sqrt_neighbors[:, None]
        adj[:] *= sqrt_neighbors[None, :]
        self.eigs = scipy.linalg.eigvalsh(adj)

    def logp(self, x):
        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        tau_dot_x = self.neighbors * x - self.alpha * Wx.ravel()
        logdet = tt.log(1 - self.alpha * self.eigs).sum()
        logp = 0.5 * (logdet - tt.dot(x, tau_dot_x))
        return bound(logp, self.alpha > 0, self.alpha < 1)


with pm.Model() as model:
    b0 = pm.Normal('intercept', mu=5.4, sd=2)
    b1 = pm.Cauchy('b1_mdist_daily', alpha=0, beta=2)

    # random effect precision parameter
    sd = pm.HalfCauchy('sd', beta=2)
    # strength of spatial correlation
    p = pm.Uniform('p', lower=0, upper=1)
    phi = CAR('mu_phi', alpha=p, adjacency=amat)
    
    mu = tt.exp(b0 + b1 * X['mdist_daily'] + sd * phi)
    alpha = pm.HalfCauchy(name='alpha', beta=2)
    home_points = pm.NegativeBinomial('home_points', mu=mu, alpha=alpha, observed=y)
    
    trace = pm.sample(2000, tune=1000, njobs=4)

@aseyboldt (Member) commented Apr 22, 2017

@fonnesbeck @twiecki How about printing a warning if the number of dimensions is large? But I'm not sure what a reasonable cutoff would be. I vaguely remember someone saying that it usually doesn't work well if d > 20, but I don't even know where I heard that.

@junpenglao (Member)

Nice implementation, @aseyboldt! I am writing a doc on porting models from WinBUGS/JAGS/Stan to PyMC3, with some additional tips and heuristics. Can I use some of your code as an example?

@aseyboldt (Member)

@junpenglao sure. But keep in mind that I haven't tested this thoroughly.

@junpenglao (Member)

@aseyboldt I will compare it with the Stan result. Will keep you posted!

@denadai2 (Contributor, Author)

@aseyboldt Wow, thanks for your code. That implementation is especially good when you have multiple CAR variables (MCAR), and it is fast. I could use it if I fail with my own code :D
My code is now improved thanks to you; it is based on another CAR formulation (Leroux / BYM):

import scipy.sparse
import theano.sparse
import theano.tensor as tt
from pymc3.distributions import continuous, distribution
from pymc3.distributions.dist_math import bound

class CAR(distribution.Continuous):
    """
    Conditional Autoregressive (CAR) distribution. Ref paper: https://www.ncbi.nlm.nih.gov/pubmed/12926715
    """
    
    def __init__(self, adjacency, tau, rho, *args, **kwargs):
        super(CAR, self).__init__(*args, **kwargs)
        n, m = adjacency.shape
        self.n = n
        
        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        
        self.neighbors = tt.as_tensor_variable(adjacency.sum(0))
        self.mean = tt.zeros(n)
        self.median = self.mean
        self.tau = tau
        self.rho = rho

    def logp(self, x):
        priorvardenom = 1 - self.rho + self.rho * self.neighbors
        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        mu_w = self.rho / priorvardenom * tt.sum(Wx, axis=1)
        tau = self.tau * priorvardenom
        return bound(tt.sum(continuous.Normal.dist(mu=mu_w, tau=tau).logp(x)), self.rho >= 0, self.rho < 1)

Still slow, but it converges.
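For reference, a hypothetical usage sketch of this Leroux CAR in a model like the earlier ones (assuming amat, X, and y from the original post):

with pm.Model() as leroux_model:
    b0 = pm.Normal('intercept', mu=5.4, sd=2)
    b1 = pm.Cauchy('b1_mdist_daily', alpha=0, beta=2)
    tau = pm.HalfCauchy('tau', beta=2)          # precision of the CAR prior
    rho = pm.Uniform('rho', lower=0, upper=1)   # strength of spatial correlation
    phi = CAR('mu_phi', adjacency=amat, tau=tau, rho=rho, shape=amat.shape[0])
    mu = tt.exp(b0 + b1 * X['mdist_daily'] + phi)
    alpha = pm.HalfCauchy('alpha', beta=2)
    home_points = pm.NegativeBinomial('home_points', mu=mu, alpha=alpha, observed=y)
    trace = pm.sample(2000, tune=1000)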

@junpenglao I am trying to reparameterize it.

@fonnesbeck (Member)

Closed accidentally. Reopening.

fonnesbeck reopened this Apr 30, 2017
@fonnesbeck (Member)

I would be fine with a "report" at the end of sampling that includes warnings and recommendations, perhaps written to disk. Along the lines of:

Run complete. There were 4 warnings and 1 recommendation. Please open report.txt to view. 

@junpenglao (Member)

@fonnesbeck I second the "report" idea. Right now the number of warnings can be a bit too much at times.

@aseyboldt (Member) commented May 1, 2017

@fonnesbeck @junpenglao I don't think we should overwrite a file without being asked explicitly. But how about adding a trace.report? The traces are a bit of a mess as it is, but from an API point of view that feels much better.
Also, I think we could improve this a lot by formatting the warnings better, whether in a report or otherwise.

@junpenglao (Member)

Many of the original issues are fixed. Closing.
