
Added evidence-based Gibbs sampling to Bayesian Networks #650

Closed
wants to merge 2 commits into from

Conversation

NamesAreAPain

Although the code could be made more efficient by leveraging numpy, this basic implementation of Gibbs sampling works adequately for many purposes.

@jmschrei
Owner

Thanks for the contribution. I'm trying to understand the code now. I'll get back to you soon about this PR.

@jmschrei
Owner

It would also be helpful if you could add some unit tests to ensure that sampling (with a fixed seed) gives the expected behavior.

@pascal-schetelat
Contributor

Hi,

Some colleagues and I are really interested in this and we are willing to contribute. I tried the pull request and noticed breaking issues that need correcting and testing.

Since it was proposed several months ago, we would like to open a new pull request. Is that fine with everyone?

@jmschrei
Owner

jmschrei commented May 7, 2020

That's fine with me. If you could add unit tests and appropriate documentation on your additions as well that would be great.

@pascal-schetelat
Contributor

Ok, I have a first implementation using rejection sampling in order to take evidence on non-marginal nodes into account.

I resorted to using networkx's topological sort to order the nodes. I didn't manage to find it elsewhere in pomegranate; did I miss something?

I'll make a pull request once tests are in place.

As for the API, this is what I have for now; let me know if this departs too much from what you had in mind:

```python
	def sample(self, n=1, evidences=[{}], min_prob=0.01):
		"""Sample the network, optionally given some evidence.

		Uses rejection sampling to condition on non-marginal nodes.

		Parameters
		----------
		n : int, optional
			The number of samples to generate. Defaults to 1.
		evidences : list of dict, optional
			Evidence to hold constant while samples are generated.
		min_prob : float, optional
			Stop iterating when sum P(X | evidence) < min_prob; the samples
			generated for that evidence will then be incomplete (fewer than n).

		Returns
		-------
		samples : nested list of sampled states
		"""
```
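For illustration, the condition-by-rejection idea behind this API can be sketched on a toy standalone network (a hypothetical example with made-up probabilities, not pomegranate's actual implementation):

```python
import random

# Toy Bayesian network: Rain -> WetGrass.
# P(Rain=1) = 0.3; P(WetGrass=1 | Rain) = 0.9 if Rain else 0.1.
def forward_sample(rng):
    rain = 1 if rng.random() < 0.3 else 0
    wet = 1 if rng.random() < (0.9 if rain else 0.1) else 0
    return {"Rain": rain, "WetGrass": wet}

def rejection_sample(n, evidence, rng=None, max_tries=100000):
    """Draw up to n samples consistent with `evidence` by discarding
    forward samples that contradict it."""
    rng = rng or random.Random(0)
    samples, tries = [], 0
    while len(samples) < n and tries < max_tries:
        tries += 1
        s = forward_sample(rng)
        if all(s[k] == v for k, v in evidence.items()):
            samples.append(s)
    return samples

# Conditioning on the non-root node WetGrass = 1.
draws = rejection_sample(2000, {"WetGrass": 1})
p_rain = sum(s["Rain"] for s in draws) / len(draws)
# Exact posterior: P(Rain=1 | Wet=1) = 0.27 / (0.27 + 0.07) ≈ 0.794
```

This also shows where the slowness comes from: every sample inconsistent with the evidence is thrown away, so the rejection rate grows as evidence is placed deeper in the network.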

@pascal-schetelat
Contributor

pascal-schetelat commented May 12, 2020

Ok, so rejection sampling was easy.

Gibbs sampling, on the other hand, requires a little more work to build the transition kernel of the Markov chain.

Nothing impossible, but I stumbled upon an issue when the data does not contain all possible combinations. This breaks distribution.joint() and marginal().

The issue seems to be addressed by #680

Do you know if/when this could be merged?

Edit: never mind, using model.graph.state.distribution instead of model.state.distribution did the trick without having to re-estimate marginals.
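The transition kernel amounts to resampling each unobserved node from its conditional given its Markov blanket. A minimal sketch on the same kind of toy network (hypothetical probabilities, standalone code rather than the PR's implementation):

```python
import random

# Toy network: Rain -> WetGrass, with WetGrass = 1 observed.
P_RAIN = 0.3
P_WET_GIVEN_RAIN = {1: 0.9, 0: 0.1}  # P(WetGrass=1 | Rain)

def gibbs_rain_given_wet(n_steps, wet=1, seed=0):
    """Gibbs sampling: repeatedly resample the unobserved node (Rain)
    from its conditional given its Markov blanket (here just WetGrass)."""
    rng = random.Random(seed)
    rain = 0  # arbitrary initial state
    trace = []
    for _ in range(n_steps):
        # Unnormalized conditional weights for Rain = 1 and Rain = 0.
        w1 = P_RAIN * (P_WET_GIVEN_RAIN[1] if wet else 1 - P_WET_GIVEN_RAIN[1])
        w0 = (1 - P_RAIN) * (P_WET_GIVEN_RAIN[0] if wet else 1 - P_WET_GIVEN_RAIN[0])
        rain = 1 if rng.random() < w1 / (w1 + w0) else 0
        trace.append(rain)
    return trace

trace = gibbs_rain_given_wet(5000)
p_rain = sum(trace) / len(trace)  # converges to the posterior ≈ 0.794
```

With a single unobserved node this degenerates to direct posterior sampling; in a larger network the sampler cycles over all unknown nodes, and each node's weights involve its own CPT plus the CPTs of its children, which is where the per-edge cost of the Markov blanket comes in.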

@jmschrei
Owner

What is the status of this? Are you looking into adding Gibbs sampling or do you want to move forward with this addition? If so, can you add unit tests?

@pascal-schetelat
Contributor

I started with rejection sampling, which is easy to implement but quite slow as soon as evidence is specified on non-root nodes, due to a high rejection rate. It has been working for two weeks in my fork, but tests still need to be implemented.

If you want to look: https://github.com/cstb/pomegranate

I also started to implement Gibbs sampling, which is a bit more involved than rejection sampling. I tried a couple of unsuccessful approaches, but I am happy to report that I am finally getting there, and I expect to submit a pull request in the next couple of weeks.
I'm currently comparing results with the rejection sampler to make sure everything is ok.
I still need to work on seed management, as I don't rely on numpy for weighted choice but on a Cython function.

On a 12-node network with an average of 2-3 parents per node, the Gibbs sampler takes 150 ms to draw 1000 samples without evidence. The more evidence, the faster it goes. The algorithm's execution time is roughly linear in

  • the number of samples
  • the number of unknown nodes
  • the number of edges in the Markov blanket of the unknown nodes

So far, on the networks I tried, I did not observe the need to discard burn-in samples, but it is a bit early to be definitive.
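On seed management for weighted draws, threading a single seeded generator through the sampler makes runs reproducible regardless of the backend. A minimal standard-library sketch (not the PR's Cython function):

```python
import bisect
import itertools
import random

def weighted_choice(keys, weights, rng):
    """Draw one key with probability proportional to its weight, using
    the supplied seeded generator so runs are reproducible."""
    cumulative = list(itertools.accumulate(weights))
    x = rng.random() * cumulative[-1]
    return keys[bisect.bisect_right(cumulative, x)]

rng = random.Random(42)
draws = [weighted_choice(["a", "b", "c"], [0.7, 0.2, 0.1], rng)
         for _ in range(10000)]
# With a fixed seed the sequence is identical across runs, and the
# frequencies approach the weights: 'a' ≈ 70%, 'b' ≈ 20%, 'c' ≈ 10%.
```

Fixing the seed this way is also what makes the unit tests requested above feasible: the same seed must always yield the same sample sequence.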

@pascal-schetelat
Contributor

By the way, I probably should not have relied on this assumption, but is there a good reason why distribution.keys() and the keys in distribution.keymap are sometimes ordered the same and sometimes not?

@ekosman
Contributor

ekosman commented Jun 14, 2020

Hi @pascal-schetelat
Firstly, many thanks for this contribution.

Does the Gibbs sampler for BNs have an implementation yet?

@pascal-schetelat
Contributor

I just made the PR here : #778

@ekosman
Contributor

ekosman commented Jun 17, 2020

Great! @pascal-schetelat

Looking forward to using it :)

@jmschrei
Owner

Thank you for your contribution. However, pomegranate has recently been rewritten from the ground up to use PyTorch instead of Cython, and so this PR is unfortunately out of date and will be closed.

@jmschrei jmschrei closed this Apr 16, 2023