model.logp is nan #2066

Closed
denadai2 opened this issue Apr 21, 2017 · 26 comments

@denadai2 (Contributor) commented Apr 21, 2017

There are two strange behaviours in this model:

  • tau and p get stuck (they don't fluctuate), and as a consequence mu_phi is stuck as well
  • logp is nan: why?

I am on the latest version (master) of PyMC3.

import theano.tensor as tt
import pandas as pd
import numpy as np
import pickle
import pymc3 as pm
import theano.tensor as tt
from theano import scan
import theano
floatX = "float32"

from pymc3.distributions import continuous
from pymc3.distributions import distribution

class CAR(distribution.Continuous):
	"""
	Conditional Autoregressive (CAR) distribution. Ref paper: https://www.ncbi.nlm.nih.gov/pubmed/12926715
	"""

	def __init__(self, n_neigh, a, tau, lambda_par, *args, **kwargs):
		super(CAR, self).__init__(*args, **kwargs)
		self.a = tt.as_tensor_variable(a)
		self.n_neigh = tt.as_tensor_variable(n_neigh)
		self.tau = tau
		self.lambda_par = lambda_par
		self.mode = 0.

	def logp(self, x):
		denom = 1 - self.lambda_par + self.lambda_par * self.n_neigh
		tau = self.tau * denom
		num = self.lambda_par / denom

		mu_w = num * tt.sum(x * self.a, axis=1)  # tt.sum(x*a, axis=1)/tt.sum(w, axis=1)
		return tt.sum(continuous.Normal.dist(mu=mu_w, sd=tau).logp(x))


#LOAD DATA
with open('bug.pkl', "rb") as input_file:
    weights_df = pickle.load(input_file)
    X = pickle.load(input_file)
    y = pickle.load(input_file)
    
N = weights_df.values.shape[0]
amat = weights_df.values

n_neighbours = amat.sum(axis=1)


with pm.Model() as unpooled_model:
      
    # define priors, use Normal for Ridge (sd=100, weakly informative)
    b0 = pm.Normal('intercept', mu=5.4, tau=0.5)
    b1 = pm.Normal('b1_mdist_daily', mu=1, tau=1.0E-6)
    
    # random effect precision parameter
    tau = pm.Normal('tau', mu=0, tau=1.0E-6)#.Gamma('tau', alpha=0.001, beta=0.001)##
    # strength of spatial correlation
    p = pm.Uniform('p', lower=0.0, upper=1.0, testval=0)
    
    phi = CAR('mu_phi', n_neigh=n_neighbours, a=amat, tau=tau, lambda_par=p, shape=N)
    
    # define linear model
    mu = tt.exp( b0 + b1 * X['mdist_daily'] + phi) # 
    #a = pm.HalfCauchy(name='alpha', beta=10, testval=1.) 
    a = pm.Normal('alpha', mu=5.2, sd=0.5)
    home_points = pm.NegativeBinomial('home_points', mu=mu, observed=y, alpha=a)
    
    unpooled_trace = pm.sample(1000, njobs=1, step=pm.Metropolis()) #(scaling=C) , , 

Logp:

[unpooled_model.logp(pt) for pt in unpooled_trace[200:]]

[array(nan),
 array(nan),
 array(nan),
 array(nan),
 array(nan),

etc

chains:

[trace plot of the chains omitted]

pickle: bug.pkl.zip

@aseyboldt (Member)

This might be (part of) the problem:

p = pm.Uniform('p', lower=0.0, upper=1.0, testval=0)

The testval is not valid; it has to be strictly between 0 and 1. If your master is older than a couple of hours, you might want to pull again. #2046 was merged very recently.
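For example, any starting value strictly inside the support is fine (testval=0.5 here is just an illustrative choice):

p = pm.Uniform('p', lower=0.0, upper=1.0, testval=0.5)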

@aseyboldt (Member)

Also, tau should be positive, right? Maybe pm.HalfCauchy('tau', beta=2) or similar.

@denadai2 (Contributor, Author) commented Apr 21, 2017

No, I really do have the latest version ;D

Even so, if I replace the Uniform with

BoundedNormal = pm.Bound(pm.Normal, lower=0, upper=1)
p = BoundedNormal('p', mu=0.9, sd=0.09)

I get -inf as the logp. That shouldn't be normal, should it?

With pm.HalfCauchy it works, and the logp is even positive (around 40K). Is that something strange?

@aseyboldt (Member)

The bounded normal on its own works fine for me. Can you try Uniform('p', lower=0, upper=1, testval=0.9)?

@twiecki (Member) commented Apr 21, 2017

@junpenglao posted a CAR implementation in a different issue; I recommend finding that.

@denadai2 (Contributor, Author)

@aseyboldt Same result.
@twiecki This is similar, but based on a different paper :) I have to implement this one... Do you think it is normal to get nan as the logp without a warning? Maybe it is, but I don't understand why.

thx

@aseyboldt (Member)

NUTS prints warnings if there are nans in the non-tuning part of the trace (it calls them divergences); Metropolis doesn't. We have been thinking about printing messages that are more helpful, but it is tricky to do that without a huge performance hit.
You can use NanGuardMode from Theano to find out where the first nan or inf comes from. Something like this should do it:

from theano.compile.nanguardmode import NanGuardMode
mode = NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True)
with model:
    step = pm.Metropolis(mode=mode)
    trace = pm.sample(1000, step=step)
    # Or if you want to use NUTS
    trace = pm.sample(2000, tune=1000, nuts_kwargs={'mode': mode})

@aseyboldt (Member)

Something else that can help is to introduce Deterministics. In this case I'd try adding something like mu = Deterministic('mu', mu) after your definition of mu. That way the values of mu end up in the trace and you can check whether they are reasonable (maybe NegativeBinomial blows up if mu is too large).
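A minimal, self-contained sketch of that idea (the toy model and variable names here are illustrative, not from the model above):

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as toy_model:
    b0 = pm.Normal('b0', mu=0.0, sd=1.0)
    raw_mu = tt.exp(b0)
    # Deterministic stores raw_mu's values in the trace so they can be inspected
    mu = pm.Deterministic('mu', raw_mu)
    y = pm.Poisson('y', mu=mu, observed=[1, 2, 3])
    trace = pm.sample(500, tune=500)

# look for nan/inf or implausibly large values
print(trace['mu'].min(), trace['mu'].max())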

@junpenglao (Member)

Is there a particular reason you used Metropolis? Using NUTS with ADVI init seems to run normally.
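A hedged sketch of what that looks like, assuming the unpooled_model from the original post:

with unpooled_model:
    # init='advi' uses ADVI to find a starting point and scaling for NUTS
    trace = pm.sample(2000, tune=1000, init='advi', njobs=1)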

@denadai2 (Contributor, Author)

@junpenglao This is just a toy example; my real problem is bigger (more variables). In the bigger problem NUTS runs at ~1.5 iterations per second, which is far too slow compared to the 250 it/s of Metropolis. That is the only reason: I am trying out more models...
BTW, I get "FloatingPointError: NaN occurred in ADVI optimization." in the example I posted :)

@aseyboldt I don't have any warning. Am I doing something wrong?
Deterministic mu:

[screenshot of the Deterministic mu values omitted]

thx

@aseyboldt (Member) commented Apr 22, 2017

Do you mean that you got the FloatingPointError in the original model with the invalid testval for p? That would be expected.
Don't switch to Metropolis when NUTS seems slow. The Metropolis sampler just doesn't work in high dimensions; you get huge autocorrelations. You can't compare raw samples per second like that; you'd have to look at effective samples per second.
There aren't any nans or infs in mu? Then it looks to me like you should be fine. Do you still get errors or warnings with NUTS?
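A small sketch of that comparison (nuts_trace and metropolis_trace are hypothetical multi-chain traces from otherwise identical runs of the two samplers):

# effective sample size per parameter; divide by each run's wall-clock time
# to compare effective samples per second across samplers
ess_nuts = pm.effective_n(nuts_trace)
ess_metropolis = pm.effective_n(metropolis_trace)
print(ess_nuts, ess_metropolis)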

@twiecki (Member) commented Apr 22, 2017

@aseyboldt I've seen that mistake made all the time. I wonder if we should add a warning when someone uses Metropolis or Slice on a continuous model, telling them they should really use NUTS.

@denadai2 (Contributor, Author)

@twiecki :) noob mistakes. Consider it! :))

@aseyboldt sorry, my bad: I forgot to put a valid testval for p :)

With:

import theano.tensor as tt
import pandas as pd
import numpy as np
import pickle
import pymc3 as pm
%matplotlib inline
import matplotlib.pyplot as plt
import theano.tensor as tt
from theano import scan
import theano
floatX = "float32"

from pymc3.distributions import continuous
from pymc3.distributions import distribution

class CAR(distribution.Continuous):
	"""
	Conditional Autoregressive (CAR) distribution. Ref paper: https://www.ncbi.nlm.nih.gov/pubmed/12926715
	"""

	def __init__(self, n_neigh, a, tau, lambda_par, *args, **kwargs):
		super(CAR, self).__init__(*args, **kwargs)
		self.a = tt.as_tensor_variable(a)
		self.n_neigh = tt.as_tensor_variable(n_neigh)
		self.tau = tau
		self.lambda_par = lambda_par
		self.mode = 0.

	def logp(self, x):
		denom = 1 - self.lambda_par + self.lambda_par * self.n_neigh
		tau = self.tau * denom
		num = self.lambda_par / denom

		mu_w = num * tt.sum(x * self.a, axis=1)  # tt.sum(x*a, axis=1)/tt.sum(w, axis=1)
		return tt.sum(continuous.Normal.dist(mu=mu_w, sd=tau).logp(x))


#LOAD DATA
with open('bug.pkl', "rb") as input_file:
    weights_df = pickle.load(input_file)
    X = pickle.load(input_file)
    y = pickle.load(input_file)
    
N = weights_df.values.shape[0]
amat = weights_df.values

n_neighbours = amat.sum(axis=1)

from theano.compile.nanguardmode import NanGuardMode
mode = NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True)
with pm.Model() as unpooled_model:
      
    # define priors, use Normal for Ridge (sd=100, weakly informative)
    b0 = pm.Normal('intercept', mu=5.4, tau=0.5)
    b1 = pm.Normal('b1_mdist_daily', mu=1, tau=1.0E-6)
    
    # random effect precision parameter
    tau = pm.HalfCauchy('tau', beta=2)#pm.Gamma('tau', alpha=0.001, beta=0.001)##
    # strength of spatial correlation
    p = pm.Uniform('p', lower=0.0, upper=1.0)
    
    phi = CAR('mu_phi', n_neigh=n_neighbours, a=amat, tau=tau, lambda_par=p, shape=N)
    
    # define linear model
    mu = tt.exp( b0 + b1 * X['mdist_daily'] + phi) # 
    #mus = pm.Deterministic('mu', mu)

    a = pm.HalfCauchy(name='alpha', beta=10, testval=1.) 
    home_points = pm.NegativeBinomial('home_points', mu=mu, observed=y, alpha=a)
    #step = pm.Metropolis(mode=mode)
    
    unpooled_trace = pm.sample(2000, njobs=1, tune=1000, nuts_kwargs={'mode': mode}) #(scaling=C), step=step

With this code I get:

AssertionError: Inf detected
Big value detected
NanGuardMode found an error in an input of the graph.

And when I uncomment mus = pm.Deterministic('mu', mu), I get:

AttributeError                            Traceback (most recent call last)
<ipython-input-28-1bf6c57dd774> in <module>()
     70     #step = pm.Metropolis(mode=mode)
     71 
---> 72     unpooled_trace = pm.sample(2000, njobs=1, tune=1000, nuts_kwargs={'mode': mode}) #(scaling=C), step=step

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain, njobs, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, **kwargs)
    256         sample_func = _sample
    257 
--> 258     return sample_func(**sample_args)
    259 
    260 

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed, live_plot, **kwargs)
    271     try:
    272         strace = None
--> 273         for it, strace in enumerate(sampling):
    274             if live_plot:
    275                 if it >= skip_first:

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tqdm/_tqdm.py in __iter__(self)
    831 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    832 
--> 833             for obj in iterable:
    834                 yield obj
    835                 # Update and print the progressbar.

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    351         _update_start_vals(start, strace.point(-1), model)
    352     else:
--> 353         _update_start_vals(start, model.test_point, model)
    354 
    355     try:

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in _update_start_vals(a, b, model)
    466         for tname in b:
    467             if tname.startswith(name) and tname!=name:
--> 468                 transform_func = [d.transformation for d in model.deterministics if d.name==name][0]
    469                 b[tname] = transform_func.forward(a[name]).eval()
    470 

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in <listcomp>(.0)
    466         for tname in b:
    467             if tname.startswith(name) and tname!=name:
--> 468                 transform_func = [d.transformation for d in model.deterministics if d.name==name][0]
    469                 b[tname] = transform_func.forward(a[name]).eval()
    470 

AttributeError: 'TensorVariable' object has no attribute 'transformation'

@junpenglao (Member)

Re-parameterization should always be the first reaction when your model is slow :)
For example, you can try to model p as the inverse logit of a normally distributed variable; the sampler will run about 10 times faster.
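A minimal sketch of that reparameterization (the variable names are illustrative):

with pm.Model():
    # sample on the unconstrained logit scale, then map back into (0, 1)
    p_raw = pm.Normal('p_raw', mu=0.0, sd=1.0)
    p = pm.Deterministic('p', pm.math.invlogit(p_raw))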

@fonnesbeck (Member)

I'm not sure we should warn people off of Metropolis or other samplers, unless it is absolutely not appropriate to do so (e.g. NUTS for a discrete variable). I frequently get continuous models that perform poorly with NUTS and I need to drop back to Metropolis, so I would not want to be seeing warnings in that situation. If users are unsure about sampler choice they should use auto-assignment.
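For reference, auto-assignment just means not passing a step method, e.g. with a model context like the ones above:

with unpooled_model:
    # with no `step` argument PyMC3 picks samplers itself
    # (NUTS for continuous variables, Metropolis for discrete ones)
    trace = pm.sample(1000)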

@aseyboldt (Member)

@denadai2 Your CAR implementation seems a bit strange. I just gave the version at http://mc-stan.org/documentation/case-studies/mbjoseph-CARStan.html a shot, and it runs much faster (100 it/s) and seems to converge. But the model can't really tell what the spatial correlation should be. Maybe that is because the data doesn't provide enough information somehow, or there is a bug in the logp of CAR.

import numpy as np
import scipy.linalg
import scipy.sparse
import theano.sparse
import theano.tensor as tt
import pymc3 as pm
from pymc3.distributions import Continuous
from pymc3.distributions.dist_math import bound

class CAR(Continuous):
    def __init__(self, alpha, adjacency, *args, **kwargs):
        if not isinstance(adjacency, np.ndarray):
            raise ValueError("Adjacency matrix is not an ndarray.")
        n, m = adjacency.shape
        if n != m or np.any(adjacency != adjacency.T):
            raise ValueError('Adjacency matrix must be symmetric.')
        if 'shape' in kwargs and kwargs['shape'] != n:
            raise ValueError('Invalid shape: Must match matrix dimension.')
        kwargs['shape'] = n
        super(CAR, self).__init__(*args, **kwargs)
        self.n = n
        self.alpha = tt.as_tensor_variable(alpha)
        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        self.neighbors = tt.as_tensor_variable(adjacency.sum(0))
        self.mean = tt.zeros(n)
        self.median = self.mean
        adj = adjacency.astype('d').copy()
        sqrt_neighbors = 1 / np.sqrt(adjacency.sum(0))
        adj[:] *= sqrt_neighbors[:, None]
        adj[:] *= sqrt_neighbors[None, :]
        self.eigs = scipy.linalg.eigvalsh(adj)

    def logp(self, x):
        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        tau_dot_x = self.neighbors * x - self.alpha * Wx.ravel()
        logdet = tt.log(1 - self.alpha * self.eigs).sum()
        logp = 0.5 * (logdet - tt.dot(x, tau_dot_x))
        return bound(logp, self.alpha > 0, self.alpha < 1)


with pm.Model() as model:
    b0 = pm.Normal('intercept', mu=5.4, sd=2)
    b1 = pm.Cauchy('b1_mdist_daily', alpha=0, beta=2)

    # random effect precision parameter
    sd = pm.HalfCauchy('sd', beta=2)
    # strength of spatial correlation
    p = pm.Uniform('p', lower=0, upper=1)
    phi = CAR('mu_phi', alpha=p, adjacency=amat)
    
    mu = tt.exp(b0 + b1 * X['mdist_daily'] + sd * phi)
    alpha = pm.HalfCauchy(name='alpha', beta=2)
    home_points = pm.NegativeBinomial('home_points', mu=mu, alpha=alpha, observed=y)
    
    trace = pm.sample(2000, tune=1000, njobs=4)

@aseyboldt (Member) commented Apr 22, 2017

@fonnesbeck @twiecki How about printing a warning if the number of dimensions is large? But I'm not sure what a reasonable cutoff would be. I vaguely remember someone saying that it usually doesn't work well if d > 20, but I don't even know where I heard that.

@junpenglao (Member)

Nice implementation, @aseyboldt! I am writing a doc on porting models from WinBUGS/JAGS/Stan to PyMC3, with some additional tips and heuristics. Can I use some of your code as an example?

@aseyboldt (Member)

@junpenglao sure. But keep in mind that I haven't tested this thoroughly.

@junpenglao (Member)

@aseyboldt I will compare it with the Stan result. Will keep you posted!

@denadai2 (Contributor, Author)

@aseyboldt Wow, thanks for your code. That implementation is especially good when you have multiple CAR variables (MCAR), and it is fast. I could use it if I fail with my own code :D
My code is now improved thanks to you; it is based on another CAR formulation (Leroux / BYM):

import scipy.sparse
import theano.sparse
import theano.tensor as tt
from pymc3.distributions import continuous, distribution
from pymc3.distributions.dist_math import bound

class CAR(distribution.Continuous):
    """
    Conditional Autoregressive (CAR) distribution. Ref paper: https://www.ncbi.nlm.nih.gov/pubmed/12926715
    """
    
    def __init__(self, adjacency, tau, rho, *args, **kwargs):
        super(CAR, self).__init__(*args, **kwargs)
        n, m = adjacency.shape
        self.n = n
        
        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        
        self.neighbors = tt.as_tensor_variable(adjacency.sum(0))
        self.mean = tt.zeros(n)
        self.median = self.mean
        self.tau = tau
        self.rho = rho

    def logp(self, x):
        priorvardenom = 1 - self.rho + self.rho * self.neighbors
        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        mu_w = self.rho / priorvardenom * tt.sum(Wx, axis=1)
        tau = self.tau * priorvardenom
        return bound(tt.sum(continuous.Normal.dist(mu=mu_w, tau=tau).logp(x)), self.rho >= 0, self.rho < 1)

Still slow, but it converges.
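For reference, a hypothetical usage sketch of this Leroux CAR in a model like the earlier ones (assuming amat, X, and y from the original post):

with pm.Model() as leroux_model:
    b0 = pm.Normal('intercept', mu=5.4, sd=2)
    b1 = pm.Cauchy('b1_mdist_daily', alpha=0, beta=2)
    tau = pm.HalfCauchy('tau', beta=2)          # precision of the CAR prior
    rho = pm.Uniform('rho', lower=0, upper=1)   # strength of spatial correlation
    phi = CAR('mu_phi', adjacency=amat, tau=tau, rho=rho, shape=amat.shape[0])
    mu = tt.exp(b0 + b1 * X['mdist_daily'] + phi)
    alpha = pm.HalfCauchy('alpha', beta=2)
    home_points = pm.NegativeBinomial('home_points', mu=mu, alpha=alpha, observed=y)
    trace = pm.sample(2000, tune=1000)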

@junpenglao I am trying to reparameterize it.

@fonnesbeck (Member)

Closed accidentally. Reopening.

fonnesbeck reopened this Apr 30, 2017
@fonnesbeck (Member)

I would be fine with a "report" at the end of sampling that includes warnings and recommendations, perhaps written to disk. Along the lines of:

Run complete. There were 4 warnings and 1 recommendation. Please open report.txt to view. 

@junpenglao (Member)

@fonnesbeck I second the "report" idea. Right now the number of warnings can be a bit too much at times.

@aseyboldt (Member) commented May 1, 2017

@fonnesbeck @junpenglao I don't think we should overwrite a file without being asked explicitly. But how about adding a trace.report? The traces are a bit of a mess as it is, but from an API point of view that feels much better.
Also, I think we could improve this a lot by formatting the warnings better, whether in a report or otherwise.

@junpenglao (Member)

Many of the original issues are fixed. Closing.
