Improve AD system, clear memory explosions, add dmnormAD and PDinverse_logdet#1574
Conversation
|
I would think that this doesn't change our previous thinking that we would generally just extract the known derivative info based on the MVN rather than using AD to get it. |
|
It's worth comparing because in this case (that the log prior Hessian is constant, or constant for purposes of inner optimization), CppAD with some of the changes is capable of more or less recording it as a constant operation and returning that known Hessian very fast. |
|
Using This breaks our conjugate updating in some cases as we use the full The most minimal fix is, I think, to modify |
|
I am working through lifting PDinverse_logdet. It's a bit of a hassle as the dimension of the lifted node is 1-d of length |
|
I am setting up a set of tests in Also there is a bug in transposing in |
|
Ok, on this branch I have:
In a separate PR with branch Also for not good reasons, I am fixing the |
|
Thanks @paciorek. Let me know when you want to go over this stuff. |
|
@paciorek A question for you: How important is it to support the case that the user provide the precision rather than the covariance with dmnormAD? I presume we should support it for completeness but just asking because, for example, they should not invert the covariance in the model just because they can, or it will result in less efficient AD. If we're going to support it, should I just try to jump in on the nimble and nimbleQuad branches you've also worked on at this point, or wait for a merge and then go from there? |
|
@perrydv I think it matters for future sparsity computation we might include as the precision matrix for spatial problems will often be sparse when the covariance is not. See "Bayesian Spatial Modelling with R-INLA" So I think important for nimbleQuad, and even important before we have sparsity if we want to support these spatial models.
|
|
Yeah, I agree with @paul-vdb 's point. |
Comment out seg-faulting PDinverse_logdet test.
|
@perrydv here is what I am seeing in terms of test failures:
|
* Automatically use dmnormAD if building model derivs. Handle conjugacy and PG with dmnormAD declarations. * Fix dmnormAD conjugacy. * Fix getting full prec for dmnormAD. * Add testing for dmnormAD. * Trap use of cholesky with dmnormAD and fix length of lifted node. * Make minor edits to test-ADdmnorm.R. * Add option to avoid using dmnormAD. * Fix first PDinverse test.
|
As far as the seg-faulting, it seems this may be related to our long-standing issue of having seg faults in testing when running tests one after another. If I re-run the first test after running it once, it seg faults. If I run the second test alone, it runs, and the only failures are because the Jacobian from the {0,1} order derivs is now equal but not identical to the Jacobian from the {0,1,2} order derivs, plus a minor out-of-tolerance issue. I presume this is due to Perry's changes to how forward and reverse mode are being used. At least for now I am checking Running the second test first and then the first test seems to succeed, so for now I am going to reverse the order. |
|
Ok, @perrydv I think I've got this down to the only test failures coming from R and C deriv values not matching in Can you take a look and see what you think? As I mentioned above, it looks like the first test failure where you can dig into it is the second test in |
|
Thanks @paciorek for the work to isolate where I should look. It appears that part of the new pathways we use in some cases for derivs ( Most of our use of conditionals are for the
The number of operations is small and may not be worse in complexity than the conditional on the AD tape. The other CppAD conditional uses are for boundaries such as in That is my current thought. Open to other ideas. |
|
I have made the changes outlined. There are no longer any CppAD conditional calls. Instead of e.g. I have seen this fixes many of the test failures, so let's see what else comes up next. I have branched off of this branch to |
|
@perrydv Ok, I think we are close. I think all tests are passing except for this very weird problem: In As far as I can tell the inputs into Any thoughts? I am stuck. |
|
@paciorek yes I'm pretty sure I know what is going on. Look at the new |
|
All tests failed and a bunch of them showed "No internet connection" in the outputs, so I suspect there was an infrastructure problem and restarted them. |

There is a lot in the PR, and there are also some loose ends as I put this up.
Summary of changes
PDinverse_logdet(mat, prec_param)that returns a flattened vector with the upper triangular elements of the inverse ofmat(ifprec_paramisFALSE, somatis a covariance), withlog det(mat)appended as a final element. This results in length n*(n+1)/2 + 1, wherematis n x n. There is also an atomic for AD forPDinverse_logdet. The reason to do this is that for both the inverse and log det, working from the Cholesky is useful, but we don't want to put Cholesky on an AD tape because it is inefficient. And the forward and reverse mode AD needs for log det can use the inverse, so it helps to have it in the same operation.dmnormADthat takes the result of PDinverse_logdet as an input. This allows a more efficient AD tape to be recorded because CppAD is good at achieving efficiency for a quadratic form (i.e. not doing Cholesky solves as we would for only the value).outIndsis analogous towrtbut for the the output or "y" dimensions. If one hasmoutput values and wants derivatives only for some of them,outIndscan control that, potentially saving work.inDirallows a single input direction (weights for linear combination of x directions) to be used in Forward mode for order 1. (Forward mode is not used for order 2.) This reduces work in some cases, although not currently used but built alongsideoutDirbecause they go hand in hand.outDirallows a single output direction (derivatives will be taken of the inner product ofoutDir[i.e. weights] and y). For some needs in Laplace, this can save a large amount of computing.Performance:
nimbleQuad. One is a spatial model and the other a crossed random effects GLMM. Along with changes I will PR onnimbleQuad, use of the changes here makes the memory explosions go away and the run times way faster.Loose ends
PDinverse_logdet, the atomic forprec_param=TRUEIS NOT IMPLEMENTED. This is a real gap and needs attention.test-ADPDinverse_logdetwith a test of the function on its own and in a model withdmnormAD. There is room for more thorough testing, e.g. ofdmnormADon its own.outDir,inDir, andoutIndsundocumented for purposes of the next release in order not to delay further.dmnormADdoes not work. I expect any of us could get to the bottom of this. @paciorek and I started to discuss whetherdmnormADshould really have a separate name fromdmnormor instead because another alternative parameterization.Naming
PDinverse_logdet: PD is for "positive definite". It seems verbose but is accurate...?dmnormAD: This may or may not work well as a name. The "AD" indicates "recommended for AD", however it is not the case that it only works with AD (which is how it might sound). Also see last "loose end".