-
Notifications
You must be signed in to change notification settings - Fork 584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[QUESTION] predict_proba too slow on a Bayesian Network #811
Comments
Howdy Unfortunately, loopy belief propogation is a pretty slow algorithm due, in part, to its iterative nature. It may not be the right inference algorithm for a model that is as large as the one you're defining. Further, currently, using additional jobs does not speed up this process for a single example, but rather parallelizes the processing of examples (e.g. if you had 32 examples, one thread would process each one in parallel). However, I think that limiting the number of incoming edges per node might significantly speed things up. The size of a CPT is equal to n**m * p where p is the number of values that variable can take, n is the number of values each parent can take (assuming they're all the same, prod(n) more generally when n is a vector of the number of values each parent can take), and m is the number of parents. If each of your variables can take 4 values this means you have at least one CPT with a million values in it. I'd see whether there are any instances where you can reduce this size. |
You were definitely right: I have aggressively reduced the maximum fan-in per node down to 6, and the inference time has drastically reduced to 5-7 seconds. Fighting with O(4^m) (where m is the incoming edge per node) complexity is definitely a tough thing! |
Hello! I have a custom-made Bayesian Network with the following specs:
I am trying to get the marginal probabilities of every node without any evidence by the following code:
However, the code is running very slow: It didn't finish after 30 minutes on a 32-core machine with 128GB of RAM.
Is there any way to speed up my computation, or is it impossible due to the sheer size of my network?
The text was updated successfully, but these errors were encountered: