Skip to content

C++ getParents and conditionally independent sets#1094

Merged
paciorek merged 11 commits into
develfrom
getParents
Mar 20, 2021
Merged

C++ getParents and conditionally independent sets#1094
paciorek merged 11 commits into
develfrom
getParents

Conversation

@perrydv
Copy link
Copy Markdown
Contributor

@perrydv perrydv commented Feb 3, 2021

This PR has two distinct components:

  1. A C++ version of getParents. This is behind a nimble option use_C_getParents which is turned on so that testing will be interesting. Note that the uncompiled version of getParents returned nodes in undefined order. The new version returns nodes in sorted order. That means the new version will not match the old version in order, and it does not seem worth the effort to make it do so, since the orders result from different algorithms that make sense in the different languages. A model interface to getParents might want some options similar to getDependencies.

2a. A new model structure algorithm called getConditionallyIndependentSets. (Ideas for a shorter name or do we like how explicit it is?). Documentation is drafted in comments above the function in BUGS_model.R. Basic use: getConditionallyIndependentSets(model) will default to finding conditionally independent sets of stochastic latent nodes given top nodes and data nodes.

2b. A test function testConditionallyIndependentSets that checks whether sets of nodes are conditionally independent by holding one constant while simulating all the others and checking that the logProbs of the held-constant set does not change, then doing that for each set. See comments above the function.

Items 2a and 2b are not exported and not used. They are for upcoming development.

@perrydv
Copy link
Copy Markdown
Contributor Author

perrydv commented Feb 5, 2021

Comment for the record: Test failures are of two kinds:

  1. Different ordering of the same nodes.
  2. Cases with split (LHSinferred) nodes, where it looks like the new (C++) version is actually the correct one. Based on dev team discussion, we will draft this out to a slightly more fully featured method and then update testing.

@danielturek
Copy link
Copy Markdown
Member

I was confused by the type argument to getConditionallyIndependentSets, even after reading the preceding documentation. Does this argument change what is returned? Or merely how it's found?

About posterior predictive nodes, I guess I'm fine to have the default being to exclude (not include them) in the results, but how about via an option includePosteriorPredictiveNodes, with default value of FALSE ?

I'm ok with the long name getConditionallyIndependentSets, it's lengthy, but conveys the functionality well. That said, if anyone has better (more concise) ideas, I'd be interested.

I imagine that eventually we'd make these model class functions, is that the idea?

  • model$getParents(...)
  • model$getConditionallyIndependentSets(..)

@perrydv
Copy link
Copy Markdown
Contributor Author

perrydv commented Feb 11, 2021

I pushed some changes before finished by accident, so this branch is in an intermediate unintended state.

@perrydv
Copy link
Copy Markdown
Contributor Author

perrydv commented Feb 12, 2021

This PR is now substantially changed. The two main components are now:

  1. model$getParents, with prototype:
getParents = function(nodes, omit = character(), self = FALSE,
                      determOnly = FALSE, stochOnly = TRUE,
                      includeData = TRUE, dataOnly = FALSE,
                      includeRHSonly = FALSE, upstream = FALSE,
                      returnType = 'names', returnScalarComponents = FALSE)

Note that arguments closely follow getDependencies but have different defaults. By default, only stochastic nodes, omitting self, are returned.

There is also a model$getParentsList similar to the model$getDependenciesList for access to more raw internals.

An intermediate step, described when this PR was opened, was getParentNodesC, a C++ version of getParentNodes (a function, not a model method). What happened was (i) the new version correctly traced through split (LHSinferred) nodes where the old one didn't and (ii) the new version returned nodes in topological order where the old one didn't. So the new version seemed better, but broke testing. Currently , the old version is turned on (by an option), so code that currently uses getParentNodes should work and can be migrated to model$getParents in separate PRs.

  1. model$getConditionallyIndependentSets with prototype
getConditionallyIndependentSets = function( nodes,
                                            givenNodes,
                                            omit = integer(),
                                            type = c("both", "fromTop", "fromBottom"),
                                            stochOnly = TRUE,
                                            returnType = 'names',
                                           returnScalarComponents = FALSE)

We could provide this feature not as a model method if there is concern about clogging up models.

Here is a copy of the draft roxygen method text:

Get a list of conditionally independent sets of nodes in a nimble model.

Conditionally independent sets of nodes are typically groups of latent states whose joint [conditional] probability (density) will not change even if any other non-fixed node is changed. Default fixed nodes are data nodes and parameter nodes ([nodes] with no parent nodes), but this can be controlled.

model: A nimble model object (uncompiled or compiled).

nodes: A vector of node names or their graph IDs that are the starting nodes from which conditionally independent sets of nodes should be found. If omitted, the default will be all latent nodes, defined as stochastic nodes that are not data and have at least one stochastic parent node (possible with determinstic nodes in between). Note that this will omit latent states that have no hyperparameters. An example is the first latent state in some state-space (time-series) models, which is sometimes declared with known prior. See type because it relates to the interpretation of nodes.

givenNodes: A vector of node names or their graph IDs that should be considered as fixed and hence can be conditioned on. If omitted, the default will be all data nodes and all parameter nodes, the latter defined as nodes with no stochastic parent nodes (skipping over deterministic parent nodes).

omit: A vector of node names or their graph IDs that should be omitted and should block further graph exploration.

type: Type of graph exploration (upstream through parent nodes, downstream through dependent nodes, or both). For ""both"", the input nodes are interpreted as latent states, from which both downstream and upstream exploration should be done to find nodes in the same set (nodes that are not conditionally independent from each other). For ""fromTop"", the input nodes are interpreted as parameters, so graph exploration begins from the top (input) downstream. For ""fromBottom"", the input nodes are interpreted and data nodes, so graph exploration begins from the bottom (input) upstream.

stochOnly: Logical for whether only stochastic nodes should be returned (default = TRUE). If FALSE, both deterministic and stochastic nodes are returned.

returnType: Either ""names"" for returned nodes to be node names or ""ids"" for returned nodes to be graph IDs.

returnScalarComponents: If FALSE (default), multivariate nodes are returned as full names (e.g. ""x[1:3]""). If TRUE, they are returned as scalar elements (e.g. ""x[1]"", ""x[2]"", ""x[3]"").

Details: This function returns sets of conditionally independent nodes. Multiple input nodes might be in the same set or different sets, and other nodes (not in codes) will be included.

There is a non-exported function nimble:::testConditionallyIndependentSets(model, sets, initialize = TRUE)} that tests whether the conditional independence of sets is valid. It should be the case that nimble:::testConditionallyIndependentSets(model, getConditionallyIndependentSets(model), initialize = TRUE) returns TRUE.

Return value: List of nodes that are in conditionally independent sets. Within each set, nodes are returned in topologically sorted order. The sets themselves are returned in topologically sorted order of their first nodes.

There are some tests in test-getDependencies. These are features that invite creative models and invocations, potentially including unwise/unintended invocations, so there is room for more tests.

@paciorek
Copy link
Copy Markdown
Contributor

Comments from 2021-02-18 meeting.

For getParents model method: Defaults to be self=FALSE, stochOnly=FALSE. I think we also agreed to have an option for only immediate parents (but not to do immediate dependencies in getDependencies) as part of the method.

Once Perry does final development work, Chris can merge in and can update the getParenNodes function to use getParents method if a nimbleOption is turned on (which it will be by default). No change in use of getParentNodes in BNP or posterior predictive stuff should be needed. Testing will need to be updated as new approach returns results in sorted order.

@perrydv
Copy link
Copy Markdown
Contributor Author

perrydv commented Mar 4, 2021

I have updated this PR as follows:

  1. model$getParents type argument is replaced with inputType, taking values latent, param or data, with associated change in documentation.
  2. model$getParents default for stochOnly changed to FALSE.
  3. model$getConditionallyIndependentSets has new addition to default behavior for givenNodes (whether they are provided or take their default values). Their deterministic dependencies are now automatically ensured to be included with them. This gives the behavior one will normally want. (See pump example at the end of test-getDependencies.R.) This can be turned off by nimbleOptions(groupDetermWithGivenInCondIndSets = FALSE). Specifically, say A -> B -> C -> D and also B -> E -> F; A, D, and F are givenNodes (param and data); C and E are latent nodes; and B is deterministic. Previously C and E would be in a set together because both depend on B. Since B is deterministic, it is now automatically treated as given since A is given. The allows C and E to be in different sets.
  4. New option oneStep for getParents, defaulting to FALSE. If TRUE, only immediate (one-step) parent nodes are returned.
  5. A bug in handling cases where there are empty sets during C++ processing of getConditionallyIndependentSets was fixed.

There is no special handling of posterior predictive nodes. An early comment in code raised this issue, but they end up treated as any other nodes right now.

@danielturek
Copy link
Copy Markdown
Member

@perrydv Do we definitely like the new name oneStep better than immediateOnly ? I guess you've given this plenty of thought, but something involving the word "immediate" seemed more intuitive to me.

And thanks for explaining (by way of your A, B, C, D, .... example) the new functionality for grouping deterministic nodes along with given nodes. By way of your example, that makes a lot of sense, and I agree sounds like the functionality that we want.

@paciorek
Copy link
Copy Markdown
Contributor

paciorek commented Mar 5, 2021

minor two cents - I agree with Daniel about 'immediate' seeming more intuitive to me.

@perrydv
Copy link
Copy Markdown
Contributor Author

perrydv commented Mar 19, 2021

I changed oneStep to immediateOnly.

I think this is ready to include. Note that I have not put anything in the user manual. on getParents or getConditionallyIndependentSets. We could keep both low profile for now, or we could declare getParents more stable (and document it) but hold off on getConditionallyIndependentSets in case its syntax needs to evolve. Up to you @paciorek. Both do have roxygen method entries.

@paciorek
Copy link
Copy Markdown
Contributor

I added a brief mention of getParents in manual. Let's hold off on any further doc'n of getConditionallyIndependentSets.

@paciorek paciorek merged commit 66b2202 into devel Mar 20, 2021
@paciorek paciorek deleted the getParents branch March 20, 2021 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants