Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pm.model_graph.model_to_graphviz hangs (powerset computation) #3458

Closed
rpgoldman opened this issue May 1, 2019 · 4 comments
Closed

pm.model_graph.model_to_graphviz hangs (powerset computation) #3458

rpgoldman opened this issue May 1, 2019 · 4 comments

Comments

@rpgoldman
Copy link
Contributor

rpgoldman commented May 1, 2019

Description of the problem

Trying to generate a graphviz graph of a model containing a normal mixture variable with a lot of parents, ModelGraph._get_ancestors() hangs exploring the powerset of upstream.

Here's the big upstream from my mixture model (from pdb break in powerset):

(Pdb) p iterable
{OR Output_interval__, temperature influences, OR Output, err_sd, OR DO_interval__, modality, od vector, devp, od influences, ideal out, OR IO_interval__, unimodal out, fixed influences, deviation out, medium vector, err_sd_log__, Unimodal or bimodal, OR DO, Unimodal or bimodal_logodds__, medium influences, p(unimodal), p(unimodal)_logodds__, OR IO, temp vector}

Is there a way to do this graphing without searching the powerset? It seems like this won't scale. I don't fully understand what's going on here, but I think the code is trying to calculate a Markov blanket and minimize the size of the graph of the parents of a deterministic variable, but in my case the mixture has three components, each of which is computed by a fairly complex deterministic function.

If there is any interest in addressing this, I will assemble the bits of my model in an executable form, but I believe the issue can be seen by examining the code, without the need for such a case.

Versions and main components

  • PyMC3 Version: 3.6
  • Theano Version: 1.0.4
  • Python Version: 3.7
  • Operating system: MacOS 10.13.6
  • How did you install PyMC3: (conda/pip) pip
@lucianopaz
Copy link
Contributor

@rpgoldman, yes, the code is trying to find the set of ancestors that are "one link" away. By "one link" I don't mean that they are owner.inputs of a given node in theano, but that the intermediate theano apply nodes are not either RVs or deterministics of the Model. Doing a depth first search instead of a full ancestry tree, and then finding the set of "one link" away nodes using the powerset, could be implemented. In fact, I had done something like that for an unmerged PR that aimed to address a completely different problem. The graph ownership transversal and the conditional dependence discovery that had been implemented there could be recycled into a fix for this issue. If you are willing to try submitting a PR for this, it would be great!

@rpgoldman
Copy link
Contributor Author

@lucianopaz Thank you for the clarification. I will see if I can figure this out, but I don't have a good working knowledge of Theano. Here are some conjectures that I hope will inform this PR. If you could sanity check them, it would be very helpful.

The powerset is used to find a set of variables that are ancestors where ancestors are defined with respect to the logpt function.

The advantage of the current technique is that it avoids having to traverse the Theano graph explicitly -- instead we just walk the powerset and check each element (this is a neat example of an NP algorithm: finding the right element of the powerset is not tractable, but checking an element to see if it's a blanket is relatively easy).

We can replace this by using the stack_search() function from theano.graph.gof, in breadth first mode (so we visit all the nodes in distance order).

When we visit a node, we add its parents to the queue unless:

  1. The node is already in the set of blockers or
  2. The node is a variable or deterministic of the model (in which case we add it to the set of blockers).

When the search terminates, we return the set of blockers.

If this sounds right, I will get it built and submitted. Any suggestions for adding tests to the suite would be helpful; I don't really understand the structure of the test suite.

@lucianopaz
Copy link
Contributor

@rpgoldman, sorry for the late reply.

The powerset is used to find a set of variables that are ancestors where ancestors are defined with respect to the logpt function.

Yes, but the term ancestor in theano is broader than what is being used in ModelGraph. In theano, a certain node is an ancestor of a root node, if it is anywhere in the compute graph of the root node. This means that not only the nodes that are members of root's owner.inputs list are ancestors (these would be ancestors that are one step away in the hierarchy), but also the ancestors of ancestors are included recursively. What the powerset tries to do is to get the ancestors that are one step away only.

The advantage of the current technique is that it avoids having to traverse the Theano graph explicitly -- instead we just walk the powerset and check each element (this is a neat example of an NP algorithm: finding the right element of the powerset is not tractable, but checking an element to see if it's a blanket is relatively easy).

The function theano.gof.graph.ancestors transverses the entire graph already. The powerset changes the blockers to find the set of nodes that are "one step" away from the root.

We can replace this by using the stack_search() function from theano.graph.gof, in breadth first mode (so we visit all the nodes in distance order).

When we visit a node, we add its parents to the queue unless:

  1. The node is already in the set of blockers or
  2. The node is a variable or deterministic of the model (in which case we add it to the set of blockers).

When the search terminates, we return the set of blockers.

I think this is perfect.

@lucianopaz
Copy link
Contributor

Closed via #3460

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants