New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

model.logp is nan #2066

Closed
denadai2 opened this Issue Apr 21, 2017 · 26 comments

Comments

Projects
None yet
5 participants
@denadai2
Contributor

denadai2 commented Apr 21, 2017

There are two strange behaviours in this model:

  • tau and p get stuck (they don't fluctuate), so in consequence mu_phi is stuck
  • logp is nan: why??

I have the last version (master) of pymc3

import theano.tensor as tt
import pandas as pd
import numpy as np
import pickle
import pymc3 as pm
import theano.tensor as tt
from theano import scan
import theano
floatX = "float32"

from pymc3.distributions import continuous
from pymc3.distributions import distribution

class CAR(distribution.Continuous):
	"""
	Conditional Autoregressive (CAR) distribution. Ref paper: https://www.ncbi.nlm.nih.gov/pubmed/12926715
	"""

	def __init__(self, n_neigh, a, tau, lambda_par, *args, **kwargs):
		super(CAR, self).__init__(*args, **kwargs)
		self.a = tt.as_tensor_variable(a)
		self.n_neigh = tt.as_tensor_variable(n_neigh)
		self.tau = tau
		self.lambda_par = lambda_par
		self.mode = 0.

	def logp(self, x):
		denom = 1 - self.lambda_par + self.lambda_par * self.n_neigh
		tau = self.tau * denom
		num = self.lambda_par / denom

		mu_w = num * tt.sum(x * self.a, axis=1)  # tt.sum(x*a, axis=1)/tt.sum(w, axis=1)
		return tt.sum(continuous.Normal.dist(mu=mu_w, sd=tau).logp(x))


#LOAD DATA
with open('bug.pkl', "rb") as input_file:
    weights_df = pickle.load(input_file)
    X = pickle.load(input_file)
    y = pickle.load(input_file)
    
N = weights_df.values.shape[0]
amat = weights_df.values

n_neighbours = amat.sum(axis=1)


with pm.Model() as unpooled_model:
      
    # define priors, use Normal for Ridge (sd=100, weakly informative)
    b0 = pm.Normal('intercept', mu=5.4, tau=0.5)
    b1 = pm.Normal('b1_mdist_daily', mu=1, tau=1.0E-6)
    
    # random effect precision parameter
    tau = pm.Normal('tau', mu=0, tau=1.0E-6)#.Gamma('tau', alpha=0.001, beta=0.001)##
    # strength of spatial correlation
    p = pm.Uniform('p', lower=0.0, upper=1.0, testval=0)
    
    phi = CAR('mu_phi', n_neigh=n_neighbours, a=amat, tau=tau, lambda_par=p, shape=N)
    
    # define linear model
    mu = tt.exp( b0 + b1 * X['mdist_daily'] + phi) # 
    #a = pm.HalfCauchy(name='alpha', beta=10, testval=1.) 
    a = pm.Normal('alpha', mu=5.2, sd=0.5)
    home_points = pm.NegativeBinomial('home_points', mu=mu, observed=y, alpha=a)
    
    unpooled_trace = pm.sample(1000, njobs=1, step=pm.Metropolis()) #(scaling=C) , , 

Logp:

[unpooled_model.logp(pt) for pt in unpooled_trace[200:]]

[array(nan),
 array(nan),
 array(nan),
 array(nan),
 array(nan),

etc

chains:

index

pickle
bug.pkl.zip

@aseyboldt

This comment has been minimized.

Show comment
Hide comment
@aseyboldt

aseyboldt Apr 21, 2017

Member

This might be (part of) the problem:

p = pm.Uniform('p', lower=0.0, upper=1.0, testval=0)

The testval is not valid, it has to be between 0 and 1. If your master is older than a couple of hours, you might want to pull again. #2046 was merged very recently.

Member

aseyboldt commented Apr 21, 2017

This might be (part of) the problem:

p = pm.Uniform('p', lower=0.0, upper=1.0, testval=0)

The testval is not valid, it has to be between 0 and 1. If your master is older than a couple of hours, you might want to pull again. #2046 was merged very recently.

@aseyboldt

This comment has been minimized.

Show comment
Hide comment
@aseyboldt

aseyboldt Apr 21, 2017

Member

Also, tau should be positive, right? Maybe pm.HalfCauchy('tau', beta=2) or similar.

Member

aseyboldt commented Apr 21, 2017

Also, tau should be positive, right? Maybe pm.HalfCauchy('tau', beta=2) or similar.

@denadai2

This comment has been minimized.

Show comment
Hide comment
@denadai2

denadai2 Apr 21, 2017

Contributor

No, I really have the last version ;D

even so, If i replace uniform with

BoundedNormal = pm.Bound(pm.Normal, lower=0, upper=1)
p = BoundedNormal('p', mu=0.9, sd=0.09)

I have -inf as logp -.-. This shouldn't be normal, should it?

With pm.HalfCauchy it works, and the logp is even positive XD (40K) something strange?

Contributor

denadai2 commented Apr 21, 2017

No, I really have the last version ;D

even so, If i replace uniform with

BoundedNormal = pm.Bound(pm.Normal, lower=0, upper=1)
p = BoundedNormal('p', mu=0.9, sd=0.09)

I have -inf as logp -.-. This shouldn't be normal, should it?

With pm.HalfCauchy it works, and the logp is even positive XD (40K) something strange?

@aseyboldt

This comment has been minimized.

Show comment
Hide comment
@aseyboldt

aseyboldt Apr 21, 2017

Member

The bounded normal on its own works fine for me. Can you try Uniform('p', lower=0, upper=1, testval=0.9)?

Member

aseyboldt commented Apr 21, 2017

The bounded normal on its own works fine for me. Can you try Uniform('p', lower=0, upper=1, testval=0.9)?

@twiecki

This comment has been minimized.

Show comment
Hide comment
@twiecki

twiecki Apr 21, 2017

Member

@junpenglao had a CAR implementation posted in a different issue, I recommend finding that.

Member

twiecki commented Apr 21, 2017

@junpenglao had a CAR implementation posted in a different issue, I recommend finding that.

@denadai2

This comment has been minimized.

Show comment
Hide comment
@denadai2

denadai2 Apr 21, 2017

Contributor

@aseyboldt same
@twiecki this is similar, but a different paper :) I have to implement this... Do you think it is normal to have nan as logp without a warning? :/ maybe yes but I don't understand

thx

Contributor

denadai2 commented Apr 21, 2017

@aseyboldt same
@twiecki this is similar, but a different paper :) I have to implement this... Do you think it is normal to have nan as logp without a warning? :/ maybe yes but I don't understand

thx

@aseyboldt

This comment has been minimized.

Show comment
Hide comment
@aseyboldt

aseyboldt Apr 21, 2017

Member

NUTS prints warnings if there are nans in the non-tuning part of the trace (calls them divergences), Metropolis doesn't. We have been thinking about printing Messages that are more helpful, but it is a bit tricky to do that without a huge performance hit.
You can use the nanguardmode from theano to try to find out where the first nan or inf comes from. Something like this should do it:

from theano.compile.nanguardmode import NanGuardMode
mode = NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True)
with model:
    step = pm.Metropolis(mode=mode)
    trace = pm.sample(1000, step=step)
    # Or if you want to use NUTS
    trace = pm.sample(2000, tune=1000, nuts_kwargs={'mode': mode})
Member

aseyboldt commented Apr 21, 2017

NUTS prints warnings if there are nans in the non-tuning part of the trace (calls them divergences), Metropolis doesn't. We have been thinking about printing Messages that are more helpful, but it is a bit tricky to do that without a huge performance hit.
You can use the nanguardmode from theano to try to find out where the first nan or inf comes from. Something like this should do it:

from theano.compile.nanguardmode import NanGuardMode
mode = NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True)
with model:
    step = pm.Metropolis(mode=mode)
    trace = pm.sample(1000, step=step)
    # Or if you want to use NUTS
    trace = pm.sample(2000, tune=1000, nuts_kwargs={'mode': mode})
@aseyboldt

This comment has been minimized.

Show comment
Hide comment
@aseyboldt

aseyboldt Apr 21, 2017

Member

Something else that can help is to introduce Deterministics. In this case I'd try to add something like mu = Deterministic('mu', mu) after your definition of mu. With that the values of mu end up in the trace and you can check if they are reasonable (maybe NegativeBinomial blows up if mu is too large).

Member

aseyboldt commented Apr 21, 2017

Something else that can help is to introduce Deterministics. In this case I'd try to add something like mu = Deterministic('mu', mu) after your definition of mu. With that the values of mu end up in the trace and you can check if they are reasonable (maybe NegativeBinomial blows up if mu is too large).

@junpenglao

This comment has been minimized.

Show comment
Hide comment
@junpenglao

junpenglao Apr 21, 2017

Member

Is there a particular reason you used Metropolis? Using NUTS with ADVI init seems to run normally.

Member

junpenglao commented Apr 21, 2017

Is there a particular reason you used Metropolis? Using NUTS with ADVI init seems to run normally.

@denadai2

This comment has been minimized.

Show comment
Hide comment
@denadai2

denadai2 Apr 22, 2017

Contributor

@junpenglao this is just a toy example. My real problem is bigger (more variables). Thus, in the bigger problem NUTS goes to ~1.5 iterations per second, and this is tooooo slow compared to 250 it/s of Metropolis. This is the only reason: trying more models...
BTW I have "FloatingPointError: NaN occurred in ADVI optimization." in the example I posted :)

@aseyboldt I don't have any warning. Am I doing something wrong?
Deterministic mu:

screen shot 2017-04-22 at 10 36 19

thx

Contributor

denadai2 commented Apr 22, 2017

@junpenglao this is just a toy example. My real problem is bigger (more variables). Thus, in the bigger problem NUTS goes to ~1.5 iterations per second, and this is tooooo slow compared to 250 it/s of Metropolis. This is the only reason: trying more models...
BTW I have "FloatingPointError: NaN occurred in ADVI optimization." in the example I posted :)

@aseyboldt I don't have any warning. Am I doing something wrong?
Deterministic mu:

screen shot 2017-04-22 at 10 36 19

thx

@aseyboldt

This comment has been minimized.

Show comment
Hide comment
@aseyboldt

aseyboldt Apr 22, 2017

Member

Do you mean that you got the FloatingPointError in the original model with the invalid testval for p? That would be expected.
Don't switch to Metropolis when NUTS seems slow. The metropolis sampler just doesn't work in high dimensions, you get crazy autocorrelations. You can't compare samples per s like that, you'd have to look at effective samples per second.
There aren't any nans or infs in mu? Then it looks to me like you should be fine. Do you still get errors or warnings with NUTS?

Member

aseyboldt commented Apr 22, 2017

Do you mean that you got the FloatingPointError in the original model with the invalid testval for p? That would be expected.
Don't switch to Metropolis when NUTS seems slow. The metropolis sampler just doesn't work in high dimensions, you get crazy autocorrelations. You can't compare samples per s like that, you'd have to look at effective samples per second.
There aren't any nans or infs in mu? Then it looks to me like you should be fine. Do you still get errors or warnings with NUTS?

@twiecki

This comment has been minimized.

Show comment
Hide comment
@twiecki

twiecki Apr 22, 2017

Member

@aseyboldt I've seen that mistake being made all the time. I wonder if we should add a warning if someone uses Metropolis or Slice on a continuous model that they should really use NUTS.

Member

twiecki commented Apr 22, 2017

@aseyboldt I've seen that mistake being made all the time. I wonder if we should add a warning if someone uses Metropolis or Slice on a continuous model that they should really use NUTS.

@denadai2

This comment has been minimized.

Show comment
Hide comment
@denadai2

denadai2 Apr 22, 2017

Contributor

@twiecki :) noob mistakes. Consider it! :))

@aseyboldt sorry, my bad: I forgot to put a valid testval for p :)

With:

import theano.tensor as tt
import pandas as pd
import numpy as np
import pickle
import pymc3 as pm
%matplotlib inline
import matplotlib.pyplot as plt
import theano.tensor as tt
from theano import scan
import theano
floatX = "float32"

from pymc3.distributions import continuous
from pymc3.distributions import distribution

class CAR(distribution.Continuous):
	"""
	Conditional Autoregressive (CAR) distribution. Ref paper: https://www.ncbi.nlm.nih.gov/pubmed/12926715
	"""

	def __init__(self, n_neigh, a, tau, lambda_par, *args, **kwargs):
		super(CAR, self).__init__(*args, **kwargs)
		self.a = tt.as_tensor_variable(a)
		self.n_neigh = tt.as_tensor_variable(n_neigh)
		self.tau = tau
		self.lambda_par = lambda_par
		self.mode = 0.

	def logp(self, x):
		denom = 1 - self.lambda_par + self.lambda_par * self.n_neigh
		tau = self.tau * denom
		num = self.lambda_par / denom

		mu_w = num * tt.sum(x * self.a, axis=1)  # tt.sum(x*a, axis=1)/tt.sum(w, axis=1)
		return tt.sum(continuous.Normal.dist(mu=mu_w, sd=tau).logp(x))


#LOAD DATA
with open('bug.pkl', "rb") as input_file:
    weights_df = pickle.load(input_file)
    X = pickle.load(input_file)
    y = pickle.load(input_file)
    
N = weights_df.values.shape[0]
amat = weights_df.values

n_neighbours = amat.sum(axis=1)

from theano.compile.nanguardmode import NanGuardMode
mode = NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True)
with pm.Model() as unpooled_model:
      
    # define priors, use Normal for Ridge (sd=100, weakly informative)
    b0 = pm.Normal('intercept', mu=5.4, tau=0.5)
    b1 = pm.Normal('b1_mdist_daily', mu=1, tau=1.0E-6)
    
    # random effect precision parameter
    tau = pm.HalfCauchy('tau', beta=2)#pm.Gamma('tau', alpha=0.001, beta=0.001)##
    # strength of spatial correlation
    p = pm.Uniform('p', lower=0.0, upper=1.0)
    
    phi = CAR('mu_phi', n_neigh=n_neighbours, a=amat, tau=tau, lambda_par=p, shape=N)
    
    # define linear model
    mu = tt.exp( b0 + b1 * X['mdist_daily'] + phi) # 
    #mus = pm.Deterministic('mu', mu)

    a = pm.HalfCauchy(name='alpha', beta=10, testval=1.) 
    home_points = pm.NegativeBinomial('home_points', mu=mu, observed=y, alpha=a)
    #step = pm.Metropolis(mode=mode)
    
    unpooled_trace = pm.sample(2000, njobs=1, tune=1000, nuts_kwargs={'mode': mode}) #(scaling=C), step=step

With this code I have

AssertionError: Inf detected
Big value detected
NanGuardMode found an error in an input of the graph.

And de-commenting mus = pm.Deterministic('mu', mu), I have:

AttributeError                            Traceback (most recent call last)
<ipython-input-28-1bf6c57dd774> in <module>()
     70     #step = pm.Metropolis(mode=mode)
     71 
---> 72     unpooled_trace = pm.sample(2000, njobs=1, tune=1000, nuts_kwargs={'mode': mode}) #(scaling=C), step=step

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain, njobs, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, **kwargs)
    256         sample_func = _sample
    257 
--> 258     return sample_func(**sample_args)
    259 
    260 

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed, live_plot, **kwargs)
    271     try:
    272         strace = None
--> 273         for it, strace in enumerate(sampling):
    274             if live_plot:
    275                 if it >= skip_first:

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tqdm/_tqdm.py in __iter__(self)
    831 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    832 
--> 833             for obj in iterable:
    834                 yield obj
    835                 # Update and print the progressbar.

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    351         _update_start_vals(start, strace.point(-1), model)
    352     else:
--> 353         _update_start_vals(start, model.test_point, model)
    354 
    355     try:

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in _update_start_vals(a, b, model)
    466         for tname in b:
    467             if tname.startswith(name) and tname!=name:
--> 468                 transform_func = [d.transformation for d in model.deterministics if d.name==name][0]
    469                 b[tname] = transform_func.forward(a[name]).eval()
    470 

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in <listcomp>(.0)
    466         for tname in b:
    467             if tname.startswith(name) and tname!=name:
--> 468                 transform_func = [d.transformation for d in model.deterministics if d.name==name][0]
    469                 b[tname] = transform_func.forward(a[name]).eval()
    470 

AttributeError: 'TensorVariable' object has no attribute 'transformation'
Contributor

denadai2 commented Apr 22, 2017

@twiecki :) noob mistakes. Consider it! :))

@aseyboldt sorry, my bad: I forgot to put a valid testval for p :)

With:

import theano.tensor as tt
import pandas as pd
import numpy as np
import pickle
import pymc3 as pm
%matplotlib inline
import matplotlib.pyplot as plt
import theano.tensor as tt
from theano import scan
import theano
floatX = "float32"

from pymc3.distributions import continuous
from pymc3.distributions import distribution

class CAR(distribution.Continuous):
	"""
	Conditional Autoregressive (CAR) distribution. Ref paper: https://www.ncbi.nlm.nih.gov/pubmed/12926715
	"""

	def __init__(self, n_neigh, a, tau, lambda_par, *args, **kwargs):
		super(CAR, self).__init__(*args, **kwargs)
		self.a = tt.as_tensor_variable(a)
		self.n_neigh = tt.as_tensor_variable(n_neigh)
		self.tau = tau
		self.lambda_par = lambda_par
		self.mode = 0.

	def logp(self, x):
		denom = 1 - self.lambda_par + self.lambda_par * self.n_neigh
		tau = self.tau * denom
		num = self.lambda_par / denom

		mu_w = num * tt.sum(x * self.a, axis=1)  # tt.sum(x*a, axis=1)/tt.sum(w, axis=1)
		return tt.sum(continuous.Normal.dist(mu=mu_w, sd=tau).logp(x))


#LOAD DATA
with open('bug.pkl', "rb") as input_file:
    weights_df = pickle.load(input_file)
    X = pickle.load(input_file)
    y = pickle.load(input_file)
    
N = weights_df.values.shape[0]
amat = weights_df.values

n_neighbours = amat.sum(axis=1)

from theano.compile.nanguardmode import NanGuardMode
mode = NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True)
with pm.Model() as unpooled_model:
      
    # define priors, use Normal for Ridge (sd=100, weakly informative)
    b0 = pm.Normal('intercept', mu=5.4, tau=0.5)
    b1 = pm.Normal('b1_mdist_daily', mu=1, tau=1.0E-6)
    
    # random effect precision parameter
    tau = pm.HalfCauchy('tau', beta=2)#pm.Gamma('tau', alpha=0.001, beta=0.001)##
    # strength of spatial correlation
    p = pm.Uniform('p', lower=0.0, upper=1.0)
    
    phi = CAR('mu_phi', n_neigh=n_neighbours, a=amat, tau=tau, lambda_par=p, shape=N)
    
    # define linear model
    mu = tt.exp( b0 + b1 * X['mdist_daily'] + phi) # 
    #mus = pm.Deterministic('mu', mu)

    a = pm.HalfCauchy(name='alpha', beta=10, testval=1.) 
    home_points = pm.NegativeBinomial('home_points', mu=mu, observed=y, alpha=a)
    #step = pm.Metropolis(mode=mode)
    
    unpooled_trace = pm.sample(2000, njobs=1, tune=1000, nuts_kwargs={'mode': mode}) #(scaling=C), step=step

With this code I have

AssertionError: Inf detected
Big value detected
NanGuardMode found an error in an input of the graph.

And de-commenting mus = pm.Deterministic('mu', mu), I have:

AttributeError                            Traceback (most recent call last)
<ipython-input-28-1bf6c57dd774> in <module>()
     70     #step = pm.Metropolis(mode=mode)
     71 
---> 72     unpooled_trace = pm.sample(2000, njobs=1, tune=1000, nuts_kwargs={'mode': mode}) #(scaling=C), step=step

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain, njobs, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, **kwargs)
    256         sample_func = _sample
    257 
--> 258     return sample_func(**sample_args)
    259 
    260 

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in _sample(draws, step, start, trace, chain, tune, progressbar, model, random_seed, live_plot, **kwargs)
    271     try:
    272         strace = None
--> 273         for it, strace in enumerate(sampling):
    274             if live_plot:
    275                 if it >= skip_first:

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/tqdm/_tqdm.py in __iter__(self)
    831 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    832 
--> 833             for obj in iterable:
    834                 yield obj
    835                 # Update and print the progressbar.

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    351         _update_start_vals(start, strace.point(-1), model)
    352     else:
--> 353         _update_start_vals(start, model.test_point, model)
    354 
    355     try:

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in _update_start_vals(a, b, model)
    466         for tname in b:
    467             if tname.startswith(name) and tname!=name:
--> 468                 transform_func = [d.transformation for d in model.deterministics if d.name==name][0]
    469                 b[tname] = transform_func.forward(a[name]).eval()
    470 

/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pymc3/sampling.py in <listcomp>(.0)
    466         for tname in b:
    467             if tname.startswith(name) and tname!=name:
--> 468                 transform_func = [d.transformation for d in model.deterministics if d.name==name][0]
    469                 b[tname] = transform_func.forward(a[name]).eval()
    470 

AttributeError: 'TensorVariable' object has no attribute 'transformation'
@junpenglao

This comment has been minimized.

Show comment
Hide comment
@junpenglao

junpenglao Apr 22, 2017

Member

Re-parameterization should always be the first reaction when your model is slow :)
For example, you can try to model p as a inverse logit of normal distribution, the sampler will run 10 times faster.

Member

junpenglao commented Apr 22, 2017

Re-parameterization should always be the first reaction when your model is slow :)
For example, you can try to model p as a inverse logit of normal distribution, the sampler will run 10 times faster.

@fonnesbeck

This comment has been minimized.

Show comment
Hide comment
@fonnesbeck

fonnesbeck Apr 22, 2017

Member

I'm not sure we should warn people off of Metropolis or other samplers, unless it is absolutely not appropriate to do so (e.g. NUTS for a discrete variable). I frequently get continuous models that perform poorly with NUTS and I need to drop back to Metropolis, so I would not want to be seeing warnings in that situation. If users are unsure about sampler choice they should use auto-assignment.

Member

fonnesbeck commented Apr 22, 2017

I'm not sure we should warn people off of Metropolis or other samplers, unless it is absolutely not appropriate to do so (e.g. NUTS for a discrete variable). I frequently get continuous models that perform poorly with NUTS and I need to drop back to Metropolis, so I would not want to be seeing warnings in that situation. If users are unsure about sampler choice they should use auto-assignment.

@aseyboldt

This comment has been minimized.

Show comment
Hide comment
@aseyboldt

aseyboldt Apr 22, 2017

Member

@denadai2 Your CAR implementation seems a bit strange. I just gave the version at http://mc-stan.org/documentation/case-studies/mbjoseph-CARStan.html a shot, and it runs much faster (100it/s) and seems to converge. But the model can't really tell what the spacial correlation should be. Maybe that is because the data doesn't provide enough information somehow, or there is a bug in the logp of CAR.

import theano.sparse
import scipy.sparse

class CAR(Continuous):
    def __init__(self, alpha, adjacency, *args, **kwargs):
        if not isinstance(adjacency, np.ndarray):
            raise ValueError("Adjacency matrix is not an ndarray.")
        n, m = adjacency.shape
        if n != m or np.any(adjacency != adjacency.T):
            raise ValueError('Adjacency matrix must be symmetric.')
        if 'shape' in kwargs and kwargs['shape'] == n:
            raise ValueError('Invalid shape: Must match matrix dimension.')
        kwargs['shape'] = n
        super(CAR, self).__init__(*args, **kwargs)
        self.n = n
        self.alpha = tt.as_tensor_variable(alpha)
        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        self.neighbors = tt.as_tensor_variable(adjacency.sum(0))
        self.mean = tt.zeros(n)
        self.median = self.mean
        adj = adjacency.astype('d').copy()
        sqrt_neighbors = 1 / np.sqrt(adjacency.sum(0))
        adj[:] *= sqrt_neighbors[:, None]
        adj[:] *= sqrt_neighbors[None, :]
        self.eigs = scipy.linalg.eigvalsh(adj)

    def logp(self, x):
        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        tau_dot_x = self.neighbors * x - self.alpha * Wx.ravel()
        logdet = tt.log(1 - self.alpha * self.eigs).sum()
        logp = 0.5 * (logdet - tt.dot(x, tau_dot_x))
        return bound(logp, self.alpha > 0, self.alpha < 1)


with pm.Model() as model:
    b0 = pm.Normal('intercept', mu=5.4, sd=2)
    b1 = pm.Cauchy('b1_mdist_daily', alpha=0, beta=2)

    # random effect precision parameter
    sd = pm.HalfCauchy('sd', beta=2)
    # strength of spatial correlation
    p = pm.Uniform('p', lower=0, upper=1)
    phi = pm.distributions.CAR('mu_phi', alpha=p, adjacency=amat)
    
    mu = tt.exp(b0 + b1 * X['mdist_daily'] + sd * phi)
    alpha = pm.HalfCauchy(name='alpha', beta=2)
    home_points = pm.NegativeBinomial('home_points', mu=mu, alpha=alpha, observed=y)
    
    trace = pm.sample(2000, tune=1000, njobs=4)
Member

aseyboldt commented Apr 22, 2017

@denadai2 Your CAR implementation seems a bit strange. I just gave the version at http://mc-stan.org/documentation/case-studies/mbjoseph-CARStan.html a shot, and it runs much faster (100it/s) and seems to converge. But the model can't really tell what the spacial correlation should be. Maybe that is because the data doesn't provide enough information somehow, or there is a bug in the logp of CAR.

import theano.sparse
import scipy.sparse

class CAR(Continuous):
    def __init__(self, alpha, adjacency, *args, **kwargs):
        if not isinstance(adjacency, np.ndarray):
            raise ValueError("Adjacency matrix is not an ndarray.")
        n, m = adjacency.shape
        if n != m or np.any(adjacency != adjacency.T):
            raise ValueError('Adjacency matrix must be symmetric.')
        if 'shape' in kwargs and kwargs['shape'] == n:
            raise ValueError('Invalid shape: Must match matrix dimension.')
        kwargs['shape'] = n
        super(CAR, self).__init__(*args, **kwargs)
        self.n = n
        self.alpha = tt.as_tensor_variable(alpha)
        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        self.neighbors = tt.as_tensor_variable(adjacency.sum(0))
        self.mean = tt.zeros(n)
        self.median = self.mean
        adj = adjacency.astype('d').copy()
        sqrt_neighbors = 1 / np.sqrt(adjacency.sum(0))
        adj[:] *= sqrt_neighbors[:, None]
        adj[:] *= sqrt_neighbors[None, :]
        self.eigs = scipy.linalg.eigvalsh(adj)

    def logp(self, x):
        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        tau_dot_x = self.neighbors * x - self.alpha * Wx.ravel()
        logdet = tt.log(1 - self.alpha * self.eigs).sum()
        logp = 0.5 * (logdet - tt.dot(x, tau_dot_x))
        return bound(logp, self.alpha > 0, self.alpha < 1)


with pm.Model() as model:
    b0 = pm.Normal('intercept', mu=5.4, sd=2)
    b1 = pm.Cauchy('b1_mdist_daily', alpha=0, beta=2)

    # random effect precision parameter
    sd = pm.HalfCauchy('sd', beta=2)
    # strength of spatial correlation
    p = pm.Uniform('p', lower=0, upper=1)
    phi = pm.distributions.CAR('mu_phi', alpha=p, adjacency=amat)
    
    mu = tt.exp(b0 + b1 * X['mdist_daily'] + sd * phi)
    alpha = pm.HalfCauchy(name='alpha', beta=2)
    home_points = pm.NegativeBinomial('home_points', mu=mu, alpha=alpha, observed=y)
    
    trace = pm.sample(2000, tune=1000, njobs=4)
@aseyboldt

This comment has been minimized.

Show comment
Hide comment
@aseyboldt

aseyboldt Apr 22, 2017

Member

@fonnesbeck @twiecki How about printing a warning if the number of dimensions is large? But I'm not sure what a reasonable cutoff would be. I vaguely remember someone saying that it usually doesn't work well if d>20, but I don't even know where I head that.

Member

aseyboldt commented Apr 22, 2017

@fonnesbeck @twiecki How about printing a warning if the number of dimensions is large? But I'm not sure what a reasonable cutoff would be. I vaguely remember someone saying that it usually doesn't work well if d>20, but I don't even know where I head that.

@junpenglao

This comment has been minimized.

Show comment
Hide comment
@junpenglao

junpenglao Apr 22, 2017

Member

nice implementation @aseyboldt ! I am writing a doc on porting models in WinBugs/JAGS/STAN to pymc3, with some additional tips and heuristic - can I use some of your code as an example?

Member

junpenglao commented Apr 22, 2017

nice implementation @aseyboldt ! I am writing a doc on porting models in WinBugs/JAGS/STAN to pymc3, with some additional tips and heuristic - can I use some of your code as an example?

@aseyboldt

This comment has been minimized.

Show comment
Hide comment
@aseyboldt

aseyboldt Apr 22, 2017

Member

@junpenglao sure. But keep in mind that I haven't tested this thoroughly.

Member

aseyboldt commented Apr 22, 2017

@junpenglao sure. But keep in mind that I haven't tested this thoroughly.

@junpenglao

This comment has been minimized.

Show comment
Hide comment
@junpenglao

junpenglao Apr 23, 2017

Member

@aseyboldt I will compare it with the STAN result. Will keep you posted!

Member

junpenglao commented Apr 23, 2017

@aseyboldt I will compare it with the STAN result. Will keep you posted!

@denadai2

This comment has been minimized.

Show comment
Hide comment
@denadai2

denadai2 Apr 25, 2017

Contributor

@aseyboldt wow thanks for your code. That implementation is especially good when you have multiple CAR variables (MCAR) and it is fast. I could use it if I fail with my code :D
My code is improved thanks to you and it is based on another CAR (Leroux / BYM)

class CAR(distribution.Continuous):
    """
    Conditional Autoregressive (CAR) distribution. Ref paper: https://www.ncbi.nlm.nih.gov/pubmed/12926715
    """
    
    def __init__(self, adjacency, tau, rho, *args, **kwargs):
        super(CAR, self).__init__(*args, **kwargs)
        n, m = adjacency.shape
        self.n = n
        
        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        
        self.neighbors = tt.as_tensor_variable(adjacency.sum(0))
        self.mean = tt.zeros(n)
        self.median = self.mean
        self.tau = tau
        self.rho = rho

    def logp(self, x):
        priorvardenom = 1 - self.rho + self.rho * self.neighbors
        priorvar = self.tau * priorvardenom
        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        mu_w = self.rho / priorvardenom * tt.sum(Wx, axis=1)
        tau = self.tau * priorvardenom
        return bound(tt.sum(continuous.Normal.dist(mu=mu_w, tau=tau).logp(x)), self.rho >= 0, self.rho < 1)

Still slow, but it converges.

@junpenglao trying to reparametrize

Contributor

denadai2 commented Apr 25, 2017

@aseyboldt wow thanks for your code. That implementation is especially good when you have multiple CAR variables (MCAR) and it is fast. I could use it if I fail with my code :D
My code is improved thanks to you and it is based on another CAR (Leroux / BYM)

class CAR(distribution.Continuous):
    """
    Conditional Autoregressive (CAR) distribution. Ref paper: https://www.ncbi.nlm.nih.gov/pubmed/12926715
    """
    
    def __init__(self, adjacency, tau, rho, *args, **kwargs):
        super(CAR, self).__init__(*args, **kwargs)
        n, m = adjacency.shape
        self.n = n
        
        adjacency_sparse = scipy.sparse.csr_matrix(adjacency)
        self.adjacency = theano.sparse.as_sparse_variable(adjacency_sparse)
        
        self.neighbors = tt.as_tensor_variable(adjacency.sum(0))
        self.mean = tt.zeros(n)
        self.median = self.mean
        self.tau = tau
        self.rho = rho

    def logp(self, x):
        priorvardenom = 1 - self.rho + self.rho * self.neighbors
        priorvar = self.tau * priorvardenom
        Wx = theano.sparse.dot(self.adjacency, x.reshape((self.n, 1)))
        mu_w = self.rho / priorvardenom * tt.sum(Wx, axis=1)
        tau = self.tau * priorvardenom
        return bound(tt.sum(continuous.Normal.dist(mu=mu_w, tau=tau).logp(x)), self.rho >= 0, self.rho < 1)

Still slow, but it converges.

@junpenglao trying to reparametrize

@fonnesbeck

This comment has been minimized.

Show comment
Hide comment
@fonnesbeck

fonnesbeck Apr 30, 2017

Member

Closed accidentally. Reopening.

Member

fonnesbeck commented Apr 30, 2017

Closed accidentally. Reopening.

@fonnesbeck fonnesbeck reopened this Apr 30, 2017

@fonnesbeck

This comment has been minimized.

Show comment
Hide comment
@fonnesbeck

fonnesbeck Apr 30, 2017

Member

I would be fine with a "report" at the end of sampling that includes warnings and recommendations, perhaps written to disk. Along the lines of:

Run complete. There were 4 warnings and 1 recommendation. Please open report.txt to view. 
Member

fonnesbeck commented Apr 30, 2017

I would be fine with a "report" at the end of sampling that includes warnings and recommendations, perhaps written to disk. Along the lines of:

Run complete. There were 4 warnings and 1 recommendation. Please open report.txt to view. 
@junpenglao

This comment has been minimized.

Show comment
Hide comment
@junpenglao

junpenglao May 1, 2017

Member

@fonnesbeck I seconded the "report" idea. Now it is a bit too much of all the warnings at times.

Member

junpenglao commented May 1, 2017

@fonnesbeck I seconded the "report" idea. Now it is a bit too much of all the warnings at times.

@aseyboldt

This comment has been minimized.

Show comment
Hide comment
@aseyboldt

aseyboldt May 1, 2017

Member

@fonnesbeck @junpenglao I don't think we should overwrite a file without being asked explicitly. But how about adding a trace.report? The traces are a bit of a mess as it is, but from an api point of view that feels much better.
Also, i think we could improve this a lot by better formatting of the warnings – whether in a report or otherwise.

Member

aseyboldt commented May 1, 2017

@fonnesbeck @junpenglao I don't think we should overwrite a file without being asked explicitly. But how about adding a trace.report? The traces are a bit of a mess as it is, but from an api point of view that feels much better.
Also, i think we could improve this a lot by better formatting of the warnings – whether in a report or otherwise.

@junpenglao

This comment has been minimized.

Show comment
Hide comment
@junpenglao

junpenglao Mar 8, 2018

Member

Many of the original issues are fixed. Closing.

Member

junpenglao commented Mar 8, 2018

Many of the original issues are fixed. Closing.

@junpenglao junpenglao closed this Mar 8, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment