
Added evidence-based Gibbs sampling to Bayesian Networks #650

Closed
wants to merge 2 commits into from

Conversation

NamesAreAPain

Although the code could be made more efficient by leveraging numpy, this basic implementation of Gibbs sampling works adequately for many purposes.

@jmschrei
Owner

Thanks for the contribution. I'm trying to understand the code now. I'll get back to you soon about this PR.

@jmschrei
Owner

It would also be helpful if you could add some unit tests to ensure that sampling (with a fixed seed) gives the expected behavior.

@pascal-schetelat
Contributor

Hi,

Some colleagues and I are really interested in this and we are willing to contribute. I tried the pull request and noticed breaking issues that need correcting and testing.

Since it was proposed several months ago, we would like to open a new pull request. Is that fine with everyone?

@jmschrei
Owner

jmschrei commented May 7, 2020

That's fine with me. If you could add unit tests and appropriate documentation on your additions as well that would be great.

@pascal-schetelat
Contributor

Ok, I have a first implementation using rejection sampling in order to take evidence on non-marginal nodes into account.

I resorted to using networkx's topological sort to order the nodes. I didn't manage to find it elsewhere in pomegranate; did I miss something?

I'll make a pull request once tests are in place.

As for the API, this is what I have for now; let me know if this departs too much from what you had in mind:

```python
	def sample(self, n=1, evidences=[{}], min_prob=0.01):
		"""Sample the network, optionally given some evidence.

		Uses rejection sampling to condition on non-marginal nodes.

		Parameters
		----------
		n : int, optional
			The number of samples to generate. Defaults to 1.
		evidences : list of dict, optional
			Evidence to hold constant while samples are generated.
		min_prob : float, optional
			Stop iterating when sum P(X | evidence) < min_prob; the samples
			generated for that evidence will then be incomplete (fewer than n).

		Returns
		-------
		samples : nested list of sampled states
		"""
```
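For illustration, the condition-by-rejection idea behind this API can be sketched on a toy standalone network (a hypothetical example with made-up probabilities, not pomegranate's actual implementation):

```python
import random

# Toy Bayesian network: Rain -> WetGrass.
# P(Rain=1) = 0.3; P(WetGrass=1 | Rain) = 0.9 if Rain else 0.1.
def forward_sample(rng):
    rain = 1 if rng.random() < 0.3 else 0
    wet = 1 if rng.random() < (0.9 if rain else 0.1) else 0
    return {"Rain": rain, "WetGrass": wet}

def rejection_sample(n, evidence, rng=None, max_tries=100000):
    """Draw up to n samples consistent with `evidence` by discarding
    forward samples that contradict it."""
    rng = rng or random.Random(0)
    samples, tries = [], 0
    while len(samples) < n and tries < max_tries:
        tries += 1
        s = forward_sample(rng)
        if all(s[k] == v for k, v in evidence.items()):
            samples.append(s)
    return samples

# Conditioning on the non-root node WetGrass = 1.
draws = rejection_sample(2000, {"WetGrass": 1})
p_rain = sum(s["Rain"] for s in draws) / len(draws)
# Exact posterior: P(Rain=1 | Wet=1) = 0.27 / (0.27 + 0.07) ≈ 0.794
```

This also shows where the slowness comes from: every sample inconsistent with the evidence is thrown away, so the rejection rate grows as evidence is placed deeper in the network.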

@pascal-schetelat
Contributor

pascal-schetelat commented May 12, 2020

Ok, so rejection sampling was easy.

Gibbs sampling, on the other hand, requires a little more work to build the transition kernel of the Markov chain.

Nothing impossible, but I stumbled upon an issue when the data does not contain all possible combinations. This breaks distribution.joint() and marginal().

The issue seems to be addressed by #680

Do you know if/when this could be merged?

Edit: never mind, using model.graph.state.distribution instead of model.state.distribution did the trick without having to re-estimate marginals.
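The transition kernel amounts to resampling each unobserved node from its conditional given its Markov blanket. A minimal sketch on the same kind of toy network (hypothetical probabilities, standalone code rather than the PR's implementation):

```python
import random

# Toy network: Rain -> WetGrass, with WetGrass = 1 observed.
P_RAIN = 0.3
P_WET_GIVEN_RAIN = {1: 0.9, 0: 0.1}  # P(WetGrass=1 | Rain)

def gibbs_rain_given_wet(n_steps, wet=1, seed=0):
    """Gibbs sampling: repeatedly resample the unobserved node (Rain)
    from its conditional given its Markov blanket (here just WetGrass)."""
    rng = random.Random(seed)
    rain = 0  # arbitrary initial state
    trace = []
    for _ in range(n_steps):
        # Unnormalized conditional weights for Rain = 1 and Rain = 0.
        w1 = P_RAIN * (P_WET_GIVEN_RAIN[1] if wet else 1 - P_WET_GIVEN_RAIN[1])
        w0 = (1 - P_RAIN) * (P_WET_GIVEN_RAIN[0] if wet else 1 - P_WET_GIVEN_RAIN[0])
        rain = 1 if rng.random() < w1 / (w1 + w0) else 0
        trace.append(rain)
    return trace

trace = gibbs_rain_given_wet(5000)
p_rain = sum(trace) / len(trace)  # converges to the posterior ≈ 0.794
```

With a single unobserved node this degenerates to direct posterior sampling; in a larger network the sampler cycles over all unknown nodes, and each node's weights involve its own CPT plus the CPTs of its children, which is where the per-edge cost of the Markov blanket comes in.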

@jmschrei
Owner

What is the status of this? Are you looking into adding Gibbs sampling or do you want to move forward with this addition? If so, can you add unit tests?

@pascal-schetelat
Contributor

I started with rejection sampling, which is easy to implement but quite slow as soon as evidence is specified on non-root nodes, due to a high rejection rate. It has been working for two weeks in my fork, but tests still need to be implemented.

If you want to look: https://github.com/cstb/pomegranate

I also started to implement Gibbs sampling, which is a bit more involved than rejection sampling. I tried a couple of unsuccessful approaches, but I am happy to report that I am finally getting there, and I expect to submit a pull request in the next couple of weeks.
I'm currently comparing results with the rejection sampler to make sure everything is ok.
I still need to work on seed management, as I don't rely on numpy for weighted choice but on a Cython function.

On a 12-node network with an average of 2-3 parents per node, the Gibbs sampler takes 150 ms to draw 1000 samples without evidence. The more evidence, the faster it goes. The algorithm's execution time is roughly linear in

  • the number of samples
  • the number of unknown nodes
  • the number of edges in the Markov blanket of the unknown nodes

So far, on the networks I tried, I did not observe the need to discard burn-in samples, but it is a bit early to be definitive.
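On seed management for weighted draws, threading a single seeded generator through the sampler makes runs reproducible regardless of the backend. A minimal standard-library sketch (not the PR's Cython function):

```python
import bisect
import itertools
import random

def weighted_choice(keys, weights, rng):
    """Draw one key with probability proportional to its weight, using
    the supplied seeded generator so runs are reproducible."""
    cumulative = list(itertools.accumulate(weights))
    x = rng.random() * cumulative[-1]
    return keys[bisect.bisect_right(cumulative, x)]

rng = random.Random(42)
draws = [weighted_choice(["a", "b", "c"], [0.7, 0.2, 0.1], rng)
         for _ in range(10000)]
# With a fixed seed the sequence is identical across runs, and the
# frequencies approach the weights: 'a' ≈ 70%, 'b' ≈ 20%, 'c' ≈ 10%.
```

Fixing the seed this way is also what makes the unit tests requested above feasible: the same seed must always yield the same sample sequence.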

@pascal-schetelat
Contributor

By the way, I probably should not have relied on this assumption, but is there a good reason why distribution.keys() and the keys in distribution.keymap are sometimes ordered the same and sometimes not?

@ekosman
Contributor

ekosman commented Jun 14, 2020

Hi @pascal-schetelat
Firstly, many thanks for this contribution.

Does the Gibbs sampler for BNs have an implementation yet?

@pascal-schetelat
Contributor

I just made the PR here : #778

@ekosman
Contributor

ekosman commented Jun 17, 2020

Great! @pascal-schetelat

Looking forward to using it :)

@jmschrei
Owner

Thank you for your contribution. However, pomegranate has recently been rewritten from the ground up to use PyTorch instead of Cython, and so this PR is unfortunately out of date and will be closed.

@jmschrei jmschrei closed this Apr 16, 2023