Improve AD system, clear memory explosions, add dmnormAD and PDinverse_logdet by perrydv · Pull Request #1574 · nimble-dev/nimble

perrydv · 2025-07-13T21:48:59Z

There is a lot in the PR, and there are also some loose ends as I put this up.

Summary of changes

New function PDinverse_logdet(mat, prec_param) that returns a flattened vector with the upper triangular elements of the inverse of mat (if prec_param is FALSE, so mat is a covariance), with log det(mat) appended as a final element. This results in length n*(n+1)/2 + 1, where mat is n x n. There is also an atomic for AD for PDinverse_logdet. The reason to do this is that for both the inverse and log det, working from the Cholesky is useful, but we don't want to put Cholesky on an AD tape because it is inefficient. And the forward and reverse mode AD needs for log det can use the inverse, so it helps to have it in the same operation.
New distribution dmnormAD that takes the result of PDinverse_logdet as an input. This allows a more efficient AD tape to be recorded because CppAD is good at achieving efficiency for a quadratic form (i.e. not doing Cholesky solves as we would for only the value).
Changes to CppAD to make its "dynamic" (what we call "update") tape more efficient for no-ops like +0 or *1. CppAD had this kind of efficiency for the primary "variable" tape but not for the "dynamic" tape, and we use that heavily, so we were getting burned there. I am also doing a PR to get these changes into CppAD, but that is delayed while I work on it for our package(s) here.
Changes to execution of forward and reverse mode steps in our C++ core "getDerivs" function, which in some cases were wasteful.
Extension of the AD system to allow additional arguments "outInds", "inDir" and "outDir".
- outInds is analogous to wrt but for the output or "y" dimensions. If one has m output values and wants derivatives only for some of them, outInds can control that, potentially saving work.
- inDir allows a single input direction (weights for a linear combination of x directions) to be used in forward mode for order 1. (Forward mode is not used for order 2.) This reduces work in some cases. It is not currently used, but it was built alongside outDir because the two go hand in hand.
- outDir allows a single output direction (derivatives will be taken of the inner product of outDir [i.e. weights] and y). For some needs in Laplace, this can save a large amount of computing.
Probably some other things that I'm forgetting.
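
As an illustration of the output layout described for PDinverse_logdet, here is a minimal value-only sketch in plain C++. It is not the nimble implementation and has no AD atomic. The function name `pd_inverse_logdet_sketch` is hypothetical, it covers only the prec_param = FALSE case, and the column-by-column packing order of the upper triangle is an assumption, since the description above does not state the order.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical stand-in for PDinverse_logdet(mat, prec_param = FALSE).
// Input: symmetric positive-definite n x n matrix.
// Output: upper triangle of inverse(mat), packed column by column
// (assumed order), followed by log det(mat). Length is n*(n+1)/2 + 1.
std::vector<double> pd_inverse_logdet_sketch(const Matrix& mat) {
    const std::size_t n = mat.size();
    for (const auto& row : mat) assert(row.size() == n);

    // Cholesky factor L (lower triangular) with mat = L * L^T.
    Matrix L(n, std::vector<double>(n, 0.0));
    double logdet = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        double d = mat[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= L[j][k] * L[j][k];
        assert(d > 0.0);  // fails if mat is not positive definite
        L[j][j] = std::sqrt(d);
        logdet += 2.0 * std::log(L[j][j]);  // log det = 2 * sum(log diag(L))
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = mat[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }

    // Invert L by forward substitution; Linv is lower triangular.
    Matrix Linv(n, std::vector<double>(n, 0.0));
    for (std::size_t c = 0; c < n; ++c) {
        Linv[c][c] = 1.0 / L[c][c];
        for (std::size_t i = c + 1; i < n; ++i) {
            double s = 0.0;
            for (std::size_t k = c; k < i; ++k) s -= L[i][k] * Linv[k][c];
            Linv[i][c] = s / L[i][i];
        }
    }

    // inverse(mat) = Linv^T * Linv; keep only entries with row <= col.
    std::vector<double> out;
    out.reserve(n * (n + 1) / 2 + 1);
    for (std::size_t c = 0; c < n; ++c) {
        for (std::size_t r = 0; r <= c; ++r) {
            double s = 0.0;
            for (std::size_t k = c; k < n; ++k) s += Linv[k][r] * Linv[k][c];
            out.push_back(s);
        }
    }
    out.push_back(logdet);  // log det appended as the final element
    return out;
}
```

For the 2 x 2 covariance {{4, 2}, {2, 3}}, the sketch returns four values: the packed upper triangle of the inverse (0.375, -0.25, 0.5) followed by log(8).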

Performance:

We have had two working cases of memory explosion and long run times with Laplace in the development package nimbleQuad. One is a spatial model and the other a crossed random effects GLMM. Along with changes I will PR on nimbleQuad, use of the changes here makes the memory explosions go away and the run times way faster.

Loose ends

For PDinverse_logdet, the atomic for prec_param=TRUE IS NOT IMPLEMENTED. This is a real gap and needs attention.
I added a single test file test-ADPDinverse_logdet with a test of the function on its own and in a model with dmnormAD. There is room for more thorough testing, e.g. of dmnormAD on its own.
Methods of the PDinverse_logdet atomic are still inlined in the .h file, easier for development, and could be moved over to the .cpp (with inline removed).
I have not modified documentation at all. I would be comfortable leaving outDir, inDir, and outInds undocumented for purposes of the next release in order not to delay further.
Lifting of alternative parameterizations of dmnormAD does not work. I expect any of us could get to the bottom of this. @paciorek and I started to discuss whether dmnormAD should really have a separate name from dmnorm or instead become another alternative parameterization.

Naming

PDinverse_logdet: PD is for "positive definite". It seems verbose but is accurate...?
dmnormAD: This may or may not work well as a name. The "AD" indicates "recommended for AD", however it is not the case that it only works with AD (which is how it might sound). Also see last "loose end".

paul-vdb · 2025-07-15T16:19:59Z

@perrydv @paciorek For a prior on the random effects that uses this new dmnormAD, will we want to do the manually known gradient and hessian on the prior distribution or leave it for the tape?

paciorek · 2025-07-15T17:09:13Z

I would think that this doesn't change our previous thinking that we would generally just extract the known derivative info based on the MVN rather than using AD to get it.

perrydv · 2025-07-15T17:15:44Z

It's worth comparing because in this case (that the log prior Hessian is constant, or constant for purposes of inner optimization), CppAD with some of the changes is capable of more or less recording it as a constant operation and returning that known Hessian very fast.
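
For reference, the known derivative information discussed here follows directly from the multivariate normal log density. The notation is assumed for this note and does not come from the thread: x is the n-vector of random effects, \mu the mean, and Q the precision matrix.

```latex
\begin{aligned}
\log p(x \mid \mu, Q) &= -\tfrac{n}{2}\log(2\pi) + \tfrac{1}{2}\log\det Q
                         - \tfrac{1}{2}(x-\mu)^\top Q\,(x-\mu), \\
\nabla_x \log p(x \mid \mu, Q) &= -Q\,(x-\mu), \\
\nabla_x^2 \log p(x \mid \mu, Q) &= -Q .
\end{aligned}
```

The Hessian with respect to x does not depend on x, which is the sense in which it is constant during inner optimization when Q is held fixed.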

paciorek · 2025-07-16T16:06:35Z

Using getParam with the precision for dmnormAD when dmnormAD is provided with the cov returns an upper-triangular matrix rather than the full precision matrix.

This breaks our conjugate updating in some cases, as we use the full prec, and it could break other things or user code.

The most minimal fix is, I think, to modify calc_dmnorm_prec_ldet_AltParams to ensure the full precision is returned (leaving PDinverse_logdet as an internal function that returns only the upper triangle by design). I plan to do that, at least for now to continue with testing, but will want @perrydv's input.
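
To illustrate why an upper-triangle-only result matters to downstream code, here is a small plain C++ sketch. The helper names `symmetrize_from_upper` and `quad_form` are hypothetical and are not nimble functions. The sketch shows that a quadratic form computed from a matrix whose lower triangle is left at zero differs from the one computed from the full symmetric matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: copy the upper triangle onto the lower triangle
// so that callers receive the full symmetric matrix.
Matrix symmetrize_from_upper(Matrix m) {
    const std::size_t n = m.size();
    for (const auto& row : m) assert(row.size() == n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            m[j][i] = m[i][j];
    return m;
}

// Quadratic form x^T q x, using every entry of q as given.
double quad_form(const Matrix& q, const std::vector<double>& x) {
    assert(q.size() == x.size());
    double total = 0.0;
    for (std::size_t i = 0; i < q.size(); ++i) {
        assert(q[i].size() == x.size());
        for (std::size_t j = 0; j < x.size(); ++j)
            total += x[i] * q[i][j] * x[j];
    }
    return total;
}
```

With the upper-triangular input {{2, -1}, {0, 3}} and x = (1, 1), `quad_form` gives 4 on the raw input and 3 after `symmetrize_from_upper`, so code that assumes a full precision matrix would silently compute a different value.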

paciorek · 2025-07-16T17:42:49Z

I am working through lifting PDinverse_logdet. It's a bit of a hassle as the dimension of the lifted node is 1-d of length $n^2+1$, which doesn't fit the pattern of how we generally handle dimensionality of lifted nodes.

paciorek · 2025-07-17T01:39:09Z

I am setting up a set of tests in test-ADPDinverse_logdet.R (renamed as test-ADdmnorm.R) on my own branch automate_dmnormAD, where I am also changing all model use of dmnorm to dmnormAD when buildDerivs=TRUE.

Also there is a bug in transposing in calc_dmnorm_prec_ldet_AltParams() that I will fix on fix-cppad-memory-explosion.

paciorek · 2025-07-18T01:01:42Z

Ok, on this branch I have:

- fixed a linear algebra bug in calc_dmnorm_prec_ldet_AltParams
- set things up to lift PDinverse_logdet
- fixed an argument error in calling the new rmnorm from R

In a separate PR with branch automate_dmnormAD, I will discuss changes I've made to use dmnormAD whenever model derivs are enabled, with a bunch of testing added that requires the automation change.

Also, for no good reason, I am fixing the getParam issue of returning only the upper triangle of the prec cholesky on the automate_dmnormAD branch.

perrydv · 2025-07-23T11:01:21Z

Thanks @paciorek. Let me know when you want to go over this stuff.

perrydv · 2025-07-23T15:20:11Z

@paciorek A question for you: How important is it to support the case where the user provides the precision rather than the covariance with dmnormAD? I presume we should support it for completeness, but I am asking because, for example, they should not invert the covariance in the model just because they can, or it will result in less efficient AD. If we're going to support it, should I just try to jump in on the nimble and nimbleQuad branches you've also worked on at this point, or wait for a merge and then go from there?

paul-vdb · 2025-07-23T16:36:18Z

@perrydv I think it matters for future sparsity computation we might include, as the precision matrix for spatial problems will often be sparse when the covariance is not. See "Bayesian Spatial Modelling with R-INLA". So I think it is important for nimbleQuad, and even important before we have sparsity if we want to support these spatial models.

paciorek · 2025-07-25T19:45:01Z

Yeah, I agree with @paul-vdb 's point.

paciorek · 2025-07-28T20:37:20Z

@perrydv here is what I am seeing in terms of test failures:

test-ADPDinverse_logdet.R had its first test fail because cov was not defined. I fixed that.
test-ADPDinverse_logdet.R had its second test fail because of a segfault that I am hoping you'll look at. It's in the complicated test_ADmodelCalculate stuff, but hopefully it won't be too hard to get into.
I commented that test out to allow later tests to run but didn't actually re-run testing, so not sure what would happen. All later tests in the batch are not AD-related, so hopefully ok.
test-ADdists.R is failing with numerical comparisons, but it seems to mainly (hopefully only) relate to 2nd derivs and to CderivsJacF1 having all zeros. The first case where you can dig into it is the second (Dirichlet) test.

* Automatically use dmnormAD if building model derivs. Handle conjugacy and PG with dmnormAD declarations.
* Fix dmnormAD conjugacy.
* Fix getting full prec for dmnormAD.
* Add testing for dmnormAD.
* Trap use of cholesky with dmnormAD and fix length of lifted node.
* Make minor edits to test-ADdmnorm.R.
* Add option to avoid using dmnormAD.
* Fix first PDinverse test.

paciorek · 2025-07-30T01:32:16Z

test-ADdmnorm.R passes tests locally (apart from the commented-out second test). I'm going to re-push to trigger testing again, as I don't see what could cause the weird failures seen in the CI output.

As for the seg-faulting, it seems this may be related to our long-standing issue of having seg faults in testing when running tests one after another. If I re-run the first test after running it once, it seg faults. If I run the second test alone, it runs, and the only failures are because the Jacobian from the {0,1} order derivs is now equal but not identical to the Jacobian from the {0,1,2} order derivs, plus a minor out-of-tolerance issue. I presume this is due to Perry's changes to how forward and reverse mode are being used. At least for now I am modifying AD_test_utils.R to check for equality rather than identical values.
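
The difference between "identical" and "equal" can be sketched in plain C++ as an exact comparison versus a tolerance-based one. The function names and the default tolerances below are assumptions chosen for illustration. They are not taken from AD_test_utils.R.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact comparison: every element must compare equal with ==.
bool exactly_equal(const std::vector<double>& a, const std::vector<double>& b) {
    return a == b;
}

// Tolerance-based comparison (assumed tolerances): elements may differ
// by a small absolute amount plus a small fraction of their magnitude.
bool all_close(const std::vector<double>& a, const std::vector<double>& b,
               double rel_tol = 1e-8, double abs_tol = 1e-12) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = std::fabs(a[i] - b[i]);
        const double scale = std::max(std::fabs(a[i]), std::fabs(b[i]));
        // Written so that a NaN difference is rejected rather than accepted.
        if (!(diff <= abs_tol + rel_tol * scale)) return false;
    }
    return true;
}
```

Two code paths that compute the same Jacobian with operations in a different order can differ in the last bits, as 0.1 + 0.2 and 0.3 do. Such values fail `exactly_equal` but pass `all_close`.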

Running the second test first and then the first test seems to succeed, so for now I am going to reverse the order.

paciorek · 2025-07-30T23:18:23Z

Ok, @perrydv I think I've got this down to the only test failures coming from R and C deriv values not matching in test-ADdists.R, mainly for order 2 derivs, and in cases where the compiled deriv is exactly equal to 0.

Can you take a look and see what you think? As I mentioned above, it looks like the first test failure where you can dig into it is the second test in test-ADdists.R, which is a test involving the Dirichlet distribution.

perrydv · 2025-08-01T11:46:12Z

Thanks @paciorek for the work to isolate where I should look.

It appears that part of the new pathways we use in some cases for derivs (subgraph_jac_rev) is not compatible with the combination of double-taping and CppAD conditionals. It is not specific to ddirch but that is just the first test that uses a non-fixed log argument version. This is a glitch but I think we can work around it.

Most of our uses of conditionals are for the log argument of distributions. We could replace

dens = CppAD::CondExpEq(give_log, Type(1), dens, exp(dens));
with
dens = give_log * dens + (1 - give_log) * exp(dens);
or with an atomic.

The number of operations is small and may not be worse in complexity than the conditional on the AD tape.
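
A minimal sketch of the arithmetic replacement, written as a C++ template so the same expression could be instantiated with a plain double or with an AD scalar type. The function name `blend_log_or_exp` is hypothetical, and give_log is assumed to be exactly 0 or 1.

```cpp
#include <cmath>

// Hypothetical helper: select between a log-scale value and its
// exponential without a conditional, assuming give_log is 0 or 1.
template <class T>
T blend_log_or_exp(const T& log_dens, const T& give_log) {
    using std::exp;  // allows argument-dependent lookup of exp for AD types
    return give_log * log_dens + (T(1) - give_log) * exp(log_dens);
}
```

Calling `blend_log_or_exp(-2.0, 1.0)` returns -2.0 and `blend_log_or_exp(-2.0, 0.0)` returns exp(-2.0). One caveat of this sketch: in IEEE arithmetic 0 * (-inf) is NaN, so a log density of -inf with give_log equal to 0 would need separate care. How nimble handles that case is not covered in this thread.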

The other CppAD conditional uses are for boundaries such as in dunif or dhalfflat. We could handle those with an atomic, which might be as good or better anyway.

That is my current thought. Open to other ideas.

perrydv · 2025-08-03T16:55:35Z

I have made the changes outlined. There are no longer any CppAD conditional calls. Instead of, e.g., CppAD::CondExpGt, there is nimDerivs_CondExpGt. These replacements are all built on an underlying atomic for the Heaviside step function (aka nimStep).
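
The idea of building a conditional on a step function can be sketched with plain doubles as follows. The names `step_sketch` and `cond_exp_gt_sketch` are hypothetical, and the convention that the step is 1 at zero is an assumption. The actual nimDerivs_CondExpGt and its atomic are not reproduced here.

```cpp
// Plain-double stand-in for a Heaviside step: 1 when x >= 0, else 0
// (the value at zero is an assumed convention).
inline double step_sketch(double x) { return x >= 0.0 ? 1.0 : 0.0; }

// Returns if_true when left > right, otherwise if_false, expressed as
// a weighted sum of both branches instead of a recorded conditional.
inline double cond_exp_gt_sketch(double left, double right,
                                 double if_true, double if_false) {
    const double gt = 1.0 - step_sketch(right - left);  // 1 only if left > right
    return gt * if_true + (1.0 - gt) * if_false;
}
```

For example, `cond_exp_gt_sketch(2.0, 1.0, 10.0, 20.0)` gives 10, while equal arguments such as `cond_exp_gt_sketch(1.0, 1.0, 10.0, 20.0)` give 20 because the comparison is strict.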

I have seen this fixes many of the test failures, so let's see what else comes up next.

I have branched off of this branch to rework-dmnormAD to make dmnormAD support precision parameterization. This involves changes to the parameterizations of the input list, where I am currently stuck. That is work in progress and not in a PR.

paciorek · 2025-08-06T19:13:23Z

@perrydv Ok, I think we are close. I think all tests are passing except for this very weird problem:

In test-ADdists.R, the test test_AD2(distn_tests2_short[[32]]) is failing on rework_dmnormAD, but not on fix-cppad-memory-explosion. That test concerns the Weibull dist, so nothing to do with dmnorm. Some of the 0th-order C derivs are totally different (and presumably incorrect) from the other C derivs and from the R derivs:

Browse[1]> resRecord[[1]]
$Rrun
[1] -1.8175306 -0.8830347 -0.9908013 -5.3980914 -1.5652969 -1.3064737 -0.7815425

$Crun
[1] -1.8175306 -0.8830347 -0.9908013 -5.3980914 -1.5652969 -1.3064737 -0.7815425

$RderivsRun
[1] -1.8175306 -0.8830347 -0.9908013 -5.3980914 -1.5652969 -1.3064737 -0.7815425

$CderivsRun
[1] 0.16242636 0.41352610 0.37127907 0.00452521 0.20902594 0.27077320 0.45769945

$Cvalue
[1] 0.16242636 0.41352610 0.37127907 0.00452521 0.20902594 0.27077320 0.45769945

$CderivsValue
[1] 0.16242636 0.41352610 0.37127907 0.00452521 0.20902594 0.27077320 0.45769945

As far as I can tell the inputs into Cfun in do_one_set and the C++ code used by Cfun are the same in rework_dmnormAD and fix-cppad-memory-explosion, as one would expect since rework_dmnormAD only changes stuff related to dmnormAD handling.

Any thoughts? I am stuck.

perrydv · 2025-08-06T19:41:13Z

@paciorek yes I'm pretty sure I know what is going on. Look at the new log_or_exp function in nimDerivs_dists.h. This puts the common logic in distributions about what to return in one place, in a new way that avoids CppAD conditionals. But Weibull actually has that logic flipped (calculating on the non-log scale and then optionally logging it). When I first made the changes on fix-cppad-memory-explosion, I didn't catch that Weibull needed it the other way, saw that it was wrong, and fixed it with a custom line. But I guess I branched rework-dmnormAD off of fix-cppad-memory-explosion before that, so it still had the mistake. I've just pushed what I think is the correction to rework-dmnormAD. I haven't tested it, just changed the line that I remember is where the issue is.
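
The two conventions can be sketched with plain doubles. The function names are hypothetical and a plain bool stands in for the log argument. The real log_or_exp in nimDerivs_dists.h avoids conditionals and is not reproduced here.

```cpp
#include <cmath>

// Common pattern (as described): the density is computed on the log
// scale and exponentiated unless the log value is requested.
inline double from_log_scale(double log_dens, bool give_log) {
    return give_log ? log_dens : std::exp(log_dens);
}

// Flipped pattern described for Weibull: the density is computed on
// the natural scale and logged only when the log value is requested.
inline double from_natural_scale(double dens, bool give_log) {
    return give_log ? std::log(dens) : dens;
}
```

The values reported earlier in the thread are consistent with a scale mix-up of this kind: exp(-1.8175306) is approximately 0.16242636, so the mismatched C results are the exponentials of the R results. One way to get that outcome, offered only as a plausible reading, is to pass a natural-scale density to the log-scale helper with the log value requested, which returns the density unchanged.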

perrydv · 2025-08-07T11:34:27Z

All tests failed and a bunch of them showed "No internet connection" in the outputs, so I suspect there was an infrastructure problem and restarted them.

perrydv added 10 commits June 5, 2025 09:52

improve CppAD new_dynamic by catching IdenticalZero and IdenticalOne cases

36c3caf

Ad chol_PDlogdet functions in C++, with GitHub copilot support

b5ac955

progress towards a version of dmnorm for AD using prec_ldet setup

aec408d

ignore .vscode

722b49a

Remove .vscode (after adding to .gitignore)

c16c527

working on dmnormAD

93203e7

Updates for 2nd order reverse and clean up

18c9cf7

Updates to PDinverse_logDet and getDerivs_internal to use subgraph_rev_jac sometimes.

e1076c1

add inDir, outDir, and outInds to nimDerivs. Also PDinverse_logdet

4cb7428

Add test-ADPDinverse_logdet.R

56fcc19

Export dmnormAD stuff and do not pass n to C code.

dd09382

Lift PDinverse_logdet.

f106ed2

paciorek added 2 commits July 17, 2025 11:15

Fix transpose bug in calc_dmnorm_prec_ldet_AltParams.

0ad31ff

Merge branch 'devel' into fix-cppad-memory-explosion

0e2c3f3

paciorek added 3 commits July 28, 2025 11:08

Add missing input for PDinverse_logdet test; Comment out seg-faulting PDinverse_logdet test.

f5129a3

Fix seeming bug in error reporting in test_AD2_oneCall.

6a441ae

Fix first PDinverse test.

c958715

paciorek added 2 commits July 29, 2025 18:28

Monkey with test-ADdmnorm.R to address finicky test failures.

3fdbb21

Check for equality not identical in cOutput01$jac vs. cOutput012$jac testing.

f9310dd

paciorek added 3 commits July 30, 2025 11:50

Tweak tests in light of AD changes.

63c298e

Fix tweak to AD_test_utils.

99af509

Tweak verbosity and test ordering in test-ADdmnorm.R to deal with finickiness.

9888258

perrydv added 3 commits August 3, 2025 16:34

write nimDerivs_ versions of CppAD conditionals based on atomic step function.

cefe32e

one more log_or_exp

3936c74

initial steps to make dmnormAD take prec or cov parameterization

bfef83e

perrydv and others added 7 commits August 3, 2025 21:45

fix Weibull log_or_exp. fix dhalfflat use of new AD conditional.

63d72dc

Fix up handling of prec for dmnormAD.

9847635

fix updated dmnorm_inv_pd including renaming and adding tests

a142ce2

Make minor testing comment change.

53b2377

Fix dmnormAD alt param calc based on inv_ld having inverse.

214c200

Refine calc_dmnorm_inv_ld_AltParams.

f717b0b

Fix dwish-dmnormAD conjugacy checking and add tests.

174da49

change order of res and log(res) in nimDerivs weibull

dfd2c24

paciorek added 4 commits August 6, 2025 15:51

Fix minor merge conflict.

738556a

Remove stray browsers in test file.

39eb2fa

Remove another browser.

2ebf3cd

Fix dumb mistake in checking in cc_otherParamsCheck.

a0e1eb6

Fix dumb mistake #2 in checking in cc_otherParamsCheck.

a90fef9

paciorek merged commit 8cb143c into devel Aug 7, 2025
8 checks passed

paciorek deleted the fix-cppad-memory-explosion branch August 7, 2025 16:49
