Skip to content

Improve AD system, clear memory explosions, add dmnormAD and PDinverse_logdet#1574

Merged
paciorek merged 40 commits into
develfrom
fix-cppad-memory-explosion
Aug 7, 2025
Merged

Improve AD system, clear memory explosions, add dmnormAD and PDinverse_logdet#1574
paciorek merged 40 commits into
develfrom
fix-cppad-memory-explosion

Conversation

@perrydv
Copy link
Copy Markdown
Contributor

@perrydv perrydv commented Jul 13, 2025

There is a lot in the PR, and there are also some loose ends as I put this up.

Summary of changes

  • New function PDinverse_logdet(mat, prec_param) that returns a flattened vector with the upper triangular elements of the inverse of mat (if prec_param is FALSE, so mat is a covariance), with log det(mat) appended as a final element. This results in length n*(n+1)/2 + 1, where mat is n x n. There is also an atomic for AD for PDinverse_logdet. The reason to do this is that for both the inverse and log det, working from the Cholesky is useful, but we don't want to put Cholesky on an AD tape because it is inefficient. And the forward and reverse mode AD needs for log det can use the inverse, so it helps to have it in the same operation.
  • New distribution dmnormAD that takes the result of PDinverse_logdet as an input. This allows a more efficient AD tape to be recorded because CppAD is good at achieving efficiency for a quadratic form (i.e. not doing Cholesky solves as we would for only the value).
  • Changes to CppAD to make its "dynamic" (what we call "update") tape more efficient for no-ops like +0 or *1. CppAD had this kind of efficiency for the primary "variable" tape but not for the "dynamic" tape, and we use that heavily, so we were getting burned there. I am also doing a PR to get these changes into CppAD, but that is delayed while I work on it for our package(s) here.
  • Changes to execution of forward and reverse mode steps in our C++ core "getDerivs" function, which in some cases were wasteful.
  • Extension of the AD system to allow additional arguments "outInds", "inDir" and "outDir".
    • outInds is analogous to wrt but for the the output or "y" dimensions. If one has m output values and wants derivatives only for some of them, outInds can control that, potentially saving work.
    • inDir allows a single input direction (weights for linear combination of x directions) to be used in Forward mode for order 1. (Forward mode is not used for order 2.) This reduces work in some cases, although not currently used but built alongside outDir because they go hand in hand.
    • outDir allows a single output direction (derivatives will be taken of the inner product of outDir [i.e. weights] and y). For some needs in Laplace, this can save a large amount of computing.
  • Probably some other things that I'm forgetting.

Performance:

  • We have had two working cases of memory explosion and long run times with Laplace in the development package nimbleQuad. One is a spatial model and the other a crossed random effects GLMM. Along with changes I will PR on nimbleQuad, use of the changes here makes the memory explosions go away and the run times way faster.

Loose ends

  • For PDinverse_logdet, the atomic for prec_param=TRUE IS NOT IMPLEMENTED. This is a real gap and needs attention.
  • I added a single test file test-ADPDinverse_logdet with a test of the function on its own and in a model with dmnormAD. There is room for more thorough testing, e.g. of dmnormAD on its own.
  • Methods of the PDinverse_logdet atomic are still inlined in the .h file, easier for development, and could be moved over to the .cpp (with inline removed).
  • I have not modified documentation at all. I would be comfortable leaving outDir, inDir, and outInds undocumented for purposes of the next release in order not to delay further.
  • Lifting of alternative parmeterizations of dmnormAD does not work. I expect any of us could get to the bottom of this. @paciorek and I started to discuss whether dmnormAD should really have a separate name from dmnorm or instead because another alternative parameterization.

Naming

  • PDinverse_logdet: PD is for "positive definite". It seems verbose but is accurate...?
  • dmnormAD: This may or may not work well as a name. The "AD" indicates "recommended for AD", however it is not the case that it only works with AD (which is how it might sound). Also see last "loose end".

@paul-vdb
Copy link
Copy Markdown
Contributor

@perrydv @paciorek For a prior on the random effects that uses this new dmnormAD, will we want to do the manually known gradient and hessian on the prior distribution or leave it for the tape?

@paciorek
Copy link
Copy Markdown
Contributor

I would think that this doesn't change our previous thinking that we would generally just extract the known derivative info based on the MVN rather than using AD to get it.

@perrydv
Copy link
Copy Markdown
Contributor Author

perrydv commented Jul 15, 2025

It's worth comparing because in this case (that the log prior Hessian is constant, or constant for purposes of inner optimization), CppAD with some of the changes is capable of more or less recording it as a constant operation and returning that known Hessian very fast.

@paciorek
Copy link
Copy Markdown
Contributor

paciorek commented Jul 16, 2025

Using getParam with the precision for dmnormAD when dmnormAD is provided with the cov returns an upper-triangular matrix rather than the full precision matrix.

This breaks our conjugate updating in some cases as we use the full prec. And could break other things or user code.

The most minimal fix is, I think, to modify calc_dmnorm_prec_ldet_AltParams to ensure the full precision is returned (leaving PDinverse_logdet as an internal function that returns only the upper triangle by design). I will plan to do that, at least for now to continue with testing, but will want @perrydv input

@paciorek
Copy link
Copy Markdown
Contributor

I am working through lifting PDinverse_logdet. It's a bit of a hassle as the dimension of the lifted node is 1-d of length $n^2+1$, which doesn't fit the pattern of how we generally handle dimensionality of lifted nodes.

@paciorek
Copy link
Copy Markdown
Contributor

I am setting up a set of tests in test-ADPDinverse_logdet.R (renamed as test-ADdmnorm.R) on my own branch automate_dmnormAD, where I am also changing all model use of dmnorm to dmnormAD when buildDerivs=TRUE.

Also there is a bug in transposing in calc_dmnorm_prec_ldet_AltParams() that I will fix on fix-cppad-memory-explosion.

@paciorek
Copy link
Copy Markdown
Contributor

Ok, on this branch I have:

  • fixed a linear algebra bug in calc_dmnorm_prec_ldet_AltParams
  • set things up to lift PDinverse_logdet
  • fix an argument error in calling the new rmnorm from R

In a separate PR with branch automate_dmnormAD, I will discuss changes I've made to use dmnormAD whenever model derivs are enabled, with a bunch of testing added that requires the automation change.

Also for not good reasons, I am fixing the getParam issue of returning only the upper triangle of the prec cholesky on the automate_dmnormAD branch.

@perrydv
Copy link
Copy Markdown
Contributor Author

perrydv commented Jul 23, 2025

Thanks @paciorek. Let me know when you want to go over this stuff.

@perrydv
Copy link
Copy Markdown
Contributor Author

perrydv commented Jul 23, 2025

@paciorek A question for you: How important is it to support the case that the user provide the precision rather than the covariance with dmnormAD? I presume we should support it for completeness but just asking because, for example, they should not invert the covariance in the model just because they can, or it will result in less efficient AD. If we're going to support it, should I just try to jump in on the nimble and nimbleQuad branches you've also worked on at this point, or wait for a merge and then go from there?

@paul-vdb
Copy link
Copy Markdown
Contributor

@perrydv I think it matters for future sparsity computation we might include as the precision matrix for spatial problems will often be sparse when the covariance is not. See "Bayesian Spatial Modelling with R-INLA" So I think important for nimbleQuad, and even important before we have sparsity if we want to support these spatial models.

image

@paciorek
Copy link
Copy Markdown
Contributor

Yeah, I agree with @paul-vdb 's point.

@paciorek
Copy link
Copy Markdown
Contributor

@perrydv here is what I am seeing in terms of test failures:

  1. test-ADPDinverse_logdet.R had its first test fail because cov was not defined. I fixed that.
  2. test-ADPDinverse_logdet.R had its second test fail because of a segfault that I am hoping you'll look at. It's in the complicated test_ADmodelCalculate stuff, but hopefully it won't be too hard to get into.
    I commented that test out to allow later tests to run but didn't actually re-run testing, so not sure what would happen. All later tests in the batch are not AD-related, so hopefully ok.
  3. test-ADdists.R is failing with numerical comparisons, but it seems to mainly (hopefully only) relate to 2nd derivs and to CderivsJacF1 having all zeros. The first case where you can dig into it is the second (Dirichlet) test.

* Automatically use dmnormAD if building model derivs.
Handle conjugacy and PG with dmnormAD declarations.

* Fix dmnormAD conjugacy.

* Fix getting full prec for dmnormAD.

* Add testing for dmnormAD.

* Trap use of cholesky with dmnormAD and fix length of lifted node.

* Make minor edits to test-ADdmnorm.R.

* Add option to avoid using dmnormAD.

* Fix first PDinverse test.
@paciorek
Copy link
Copy Markdown
Contributor

test-ADdmnorm.R passes tests locally (apart from the commented-out second test). I'm going to re-push to trigger testing again, as I don't see what could cause the weird failures seen in the CI output.

As far as the seg-faulting, it seems this may be related to our long-standing issue of having seg faults in testing when running tests one after another. If I re-run the first test after running it once, it seg faults. If I run the second test alone, it runs, and the only failures are because the Jacobian from the {0,1} order derivs is now equal but not identical to the Jacobian from the {0,1,2} order derivs, plus a minor out-of-tolerance issue. I presume this is due to Perry's changes to how forward and reverse mode are being used. At least for now I am checking AD_test_utils.R to check for equality not identical.

Running the second test first and then the first test seems to succeed, so for now I am going to reverse the order.

@paciorek
Copy link
Copy Markdown
Contributor

Ok, @perrydv I think I've got this down to the only test failures coming from R and C deriv values not matching in test-ADdists.R, mainly for order 2 derivs, and in cases where the compiled deriv is exactly equal to 0.

Can you take a look and see what you think? As I mentioned above, it looks like the first test failure where you can dig into it is the second test in test-ADdists.R, which is a test involving the Dirichlet distribution.

@perrydv
Copy link
Copy Markdown
Contributor Author

perrydv commented Aug 1, 2025

Thanks @paciorek for the work to isolate where I should look.

It appears that part of the new pathways we use in some cases for derivs (subgraph_jac_rev) is not compatible with the combination of double-taping and CppAD conditionals. It is not specific to ddirch but that is just the first test that uses a non-fixed log argument version. This is a glitch but I think we can work around it.

Most of our use of conditionals are for the log argument of distributions. We could replace

dens = CppAD::CondExpEq(give_log, Type(1), dens, exp(dens));
with
dense = give_log * dens + (1-give_log)*exp(dens)
or with an atomic.

The number of operations is small and may not be worse in complexity than the conditional on the AD tape.

The other CppAD conditional uses are for boundaries such as in dunif or dhalfflat. We could handle those with an atomic, which might be as good or better anyway.

That is my current thought. Open to other ideas.

@perrydv
Copy link
Copy Markdown
Contributor Author

perrydv commented Aug 3, 2025

I have made the changes outlined. There are no longer any CppAD conditional calls. Instead of e.g. CppAD::CondExpGt there is nimDerivs_CondExpGt which are all built on an underlying atomic for the Heaviside step function (aka nimStep).

I have seen this fixes many of the test failures, so let's see what else comes up next.

I have branched off of this branch to rework-dmnormAD to make dmnormAD support precision parameterization. This involves changes to the parameterizations of the input list, where I am currently stuck at the moment. That is work in progress and not in a PR.

@paciorek
Copy link
Copy Markdown
Contributor

paciorek commented Aug 6, 2025

@perrydv Ok, I think we are close. I think all tests are passing except for this very weird problem:

In test_ADdists.R, the test test_AD2(distn_tests2_short[[32]]) is failing on rework_dmnormAD, but not on fix-cppad-memory-explosion. That test concerns the Weibull dist, so nothing to do with dmnorm. Some of the 0th-order C derivs are totally different (and presumably incorrect) from the other C derivs and from the R derivs:

Browse[1]> resRecord[[1]]
$Rrun
[1] -1.8175306 -0.8830347 -0.9908013 -5.3980914 -1.5652969 -1.3064737 -0.7815425

$Crun
[1] -1.8175306 -0.8830347 -0.9908013 -5.3980914 -1.5652969 -1.3064737 -0.7815425

$RderivsRun
[1] -1.8175306 -0.8830347 -0.9908013 -5.3980914 -1.5652969 -1.3064737 -0.7815425

$CderivsRun
[1] 0.16242636 0.41352610 0.37127907 0.00452521 0.20902594 0.27077320 0.45769945

$Cvalue
[1] 0.16242636 0.41352610 0.37127907 0.00452521 0.20902594 0.27077320 0.45769945

$CderivsValue
[1] 0.16242636 0.41352610 0.37127907 0.00452521 0.20902594 0.27077320 0.45769945

As far as I can tell the inputs into Cfun in do_one_set and the C++ code use by Cfun are the same in rework_dmnormAD and fix-cppad-memory-explosion, as one would expect since rework_dmnormAD only changes stuff related to dmnormAD handling.

Any thoughts? I am stuck.

@perrydv
Copy link
Copy Markdown
Contributor Author

perrydv commented Aug 6, 2025

@paciorek yes I'm pretty sure I know what is going on. Look at the new log_or_exp function in nimDerivs_dists.h. This puts in one place the common logic in distributions about what to return in a new way that avoids CppAD conditionals. But Weibull actually has that logic flipped (calculating on non-log scale and then optionally logging it). When I first made the changes on cppad-fix-memory-explosion, I didn't catch that Weibull needed it the other way, saw that it was wrong, and fixed it with a custom line. But I guess I branched rework-dmnormAD off of cppad-fix-memory-explosion before that, so it still had the mistake. I've just pushed what I think is the correction to rework-dmnormAD. I haven't tested it, just changed the line that I remember is where the issue is.

@perrydv
Copy link
Copy Markdown
Contributor Author

perrydv commented Aug 7, 2025

All tests failed and a bunch of them showed "No internet connection" in the outputs, so I suspect there was an infrastructure problem and restarted them.

@paciorek paciorek merged commit 8cb143c into devel Aug 7, 2025
8 checks passed
@paciorek paciorek deleted the fix-cppad-memory-explosion branch August 7, 2025 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants