Reweighting utilities #654
Conversation
…ential aside: nice way to express MBAR
Co-authored-by: Matt Wittmann <mwittmann@relaytx.com>
Applying suggestion from #654 (comment) Co-Authored-By: Matt Wittmann <mcwitt@gmail.com>
step through sign flips, log conversion, ignored constant prefactor, and implicit rank-promotion (initially inspired by avoiding a mypy complaint about a transpose #654 (comment) )
Addressing #654 (comment)
Did a first pass of the tests - will resume adding more comments on the actual implementation after dinner.
```python
f_ref, g_ref = value_and_grad(analytical_delta_f)(trial_params)

onp.testing.assert_allclose(f_hat, f_ref, atol=atol)
onp.testing.assert_allclose(g_hat, g_ref, atol=atol)
```
How should we think about precision of `dDeltaF/dParams`, both for the gaussian case and in general? For the gaussian case, I'd imagine that `dDeltaF/dlog_sigma` might require more precision than `dDeltaF/dmean`. Presumably, an acceptable level of error is proportional to the internal learning rate one might take when optimizing the parameters, so maybe `rtol` is more appropriate here?
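For concreteness, here's a hedged sketch of the gaussian case, assuming the parameterization is `(mean, log_sigma)` (this `analytical_delta_f` is my reconstruction, not necessarily the one in the tests). Since the gaussian partition function satisfies Z ∝ sigma, the exact `DeltaF` depends only on `log_sigma`:

```python
import jax.numpy as jnp
from jax import grad

def analytical_delta_f(params, ref_log_sigma=0.0):
    """Exact DeltaF between 1D gaussians, assuming params = (mean, log_sigma).

    For u(x) = (x - mean)**2 / (2 * sigma**2), Z = sigma * sqrt(2 * pi), so
    f = -log Z = -log_sigma + const, and DeltaF = -(log_sigma - ref_log_sigma).
    """
    mean, log_sigma = params
    return -(log_sigma - ref_log_sigma)

print(grad(analytical_delta_f)(jnp.array([0.0, 0.0])))  # ~ [0., -1.]
```

Under that assumption `dDeltaF/dmean` is exactly 0 and `dDeltaF/dlog_sigma` is exactly -1, consistent with the `g_ref = [0., -1.]` that appears in a test failure quoted later in this thread.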
Will need to think about this.
> For the gaussian case, I'd imagine that `dDeltaF/dlog_sigma` might require more precision than `dDeltaF/dmean`.

I'd imagine so. In a documentation notebook from #404 (cell [46]), note that estimates of the different gradient components have different variance.
> How should we think about precision of `dDeltaF/dParams`, both for the gaussian case and in general?

Preferred interpretation: the exact gradient of a randomized approximation of `DeltaF(params)`:

```python
delta_f_approx_fxn = construct_approximation(ref_params, random_seed)
```
> Presumably, an acceptable level of error is proportional to the internal learning rate one might take when optimizing the parameters

In the case that you created a fresh random approximation every optimization step,

```python
delta_f_approx_fxn_0 = construct_approximation(params_0, seed_0)
params_1 = params_0 + stepsize * grad(delta_f_approx_fxn_0)(params_0)

delta_f_approx_fxn_1 = construct_approximation(params_1, seed_1)
params_2 = params_1 + stepsize * grad(delta_f_approx_fxn_1)(params_1)
...
```

then the appropriate step size would be a function of both the geometry of `delta_f_exact_fxn` and the precision of the gradient estimate.
Instead, I think we would want to perform iterates like

```python
delta_f_approx_fxn_0 = construct_approximation(params_0, seed_0)
params_1 = local_optimize(fun=delta_f_approx_fxn_0, x0=params_0, ...)

delta_f_approx_fxn_1 = construct_approximation(params_1, seed_1)
params_2 = local_optimize(fun=delta_f_approx_fxn_1, x0=params_1, ...)
...
```
where `local_optimize(fun, x0, ...)` returns the local optimum of `fun` (+ some regularization as a function of `x0`), optionally restricted to some trust region around `x0`. (Glossing over how that trust region is defined.) Then whatever step sizes etc. are used inside `local_optimize` do NOT depend on the precision of the estimate, only on "the geometry of the problem." The amount of progress you make in each outer-loop step (construct an approximation, call `local_optimize`) would be influenced by how that trust region is defined.
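As a hedged sketch of what `local_optimize` could look like (the proximal-style regularizer and the plain gradient-descent inner loop are my choices for illustration, not a spec):

```python
import jax.numpy as jnp
from jax import grad

def local_optimize(fun, x0, reg=1.0, stepsize=1e-2, n_steps=1000):
    """Minimize fun(x) + (reg / 2) * ||x - x0||**2 by gradient descent.

    The quadratic term acts as a soft trust region around x0; a real
    implementation might instead use L-BFGS or an explicit trust-region
    method, and would choose the regularization strength more carefully.
    """
    g = grad(lambda x: fun(x) + 0.5 * reg * jnp.sum((x - x0) ** 2))
    x = x0
    for _ in range(n_steps):
        x = x - stepsize * g(x)
    return x
```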
(There's a similar way to view gradient descent: each step of SGD is exactly optimizing a random local approximation of f + some regularization.)
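To unpack that parenthetical (a sketch; the names and `eta` are illustrative): an SGD step with learning rate `eta` is the exact minimizer of a linearized random objective plus a quadratic proximity term.

```python
from jax import grad

def sgd_step(f_approx, x0, eta):
    """One SGD step, viewed as exact local optimization.

    x0 - eta * grad(f_approx)(x0) is the unique minimizer of
        f_approx(x0) + g @ (x - x0) + (1 / (2 * eta)) * ||x - x0||**2,
    i.e. a random linear approximation of f plus regularization toward x0.
    """
    g = grad(f_approx)(x0)
    return x0 - eta * g
```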
Sorry, missed replying to this part:

> so maybe rtol is more appropriate here?

I don't think so. `isclose(estimated_gradient, exact_gradient, rtol=...)` might be appropriate if the noise magnitude of each component `estimated_gradient[i]` were proportional to the magnitude of the corresponding gradient component `abs(exact_gradient[i])`.
(Counterexample: in the 1D Lennard-Jones test system, the exact gradient w.r.t. the `[sigma, epsilon]` parameters is ~= `[-0.15, -0.3]` (cell [43]), but estimates of these gradient components have marginal variance ~= `[0.1, 0.01]` (cell [41]).)
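A quick numeric check using those cited values (a sketch; the arrays are copied from the notebook cells referenced above):

```python
import numpy as onp

exact_gradient = onp.array([-0.15, -0.3])   # cell [43]
marginal_variance = onp.array([0.1, 0.01])  # cell [41]

# Noise scale relative to component magnitude differs by ~6x across
# components, so no single rtol is a natural fit for both.
print(onp.sqrt(marginal_variance) / onp.abs(exact_gradient))  # ~ [2.11, 0.33]
```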
I'm going to leave this "unresolved" so I can find the comments above more easily. But this mostly addresses my concerns.
Opened #667 for continued discussion after this PR
```python
batched_u_0_fxn: BatchedReducedPotentialFxn,
batched_u_1_fxn: BatchedReducedPotentialFxn,
```
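For intuition (hedged: the type alias `BatchedReducedPotentialFxn` is from this diff, but this helper is my illustration), a batched reduced-potential function maps `(samples, params)` to an array of reduced potentials, and could be obtained from a per-sample `u_fxn` with `jax.vmap`:

```python
from jax import vmap

def batch(u_fxn):
    """Lift u_fxn(x, params) -> float into batched_u_fxn(xs, params) -> array,
    vectorizing over the leading samples axis only."""
    return vmap(u_fxn, in_axes=(0, None))
```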
We use the `batched_` prefix pretty extensively in other parts of the code too.
```python
def endpoint_correction_0(params) -> float:
    """estimate f(ref, 0) -> f(params, 0) by reweighting"""
    delta_us = batched_u_0_fxn(samples_0, params) - ref_u_0
```
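(The diff excerpt is truncated here. For context, a one-sided exponential-averaging (Zwanzig) estimate would complete the calculation roughly as below; this is a hedged sketch of the standard estimator, not necessarily the exact body in this PR.)

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp

def exp_reweighting_estimate(delta_us):
    """Zwanzig estimate: delta_f = -log mean(exp(-delta_us)),
    computed stably in log space via logsumexp."""
    return -(logsumexp(-delta_us) - jnp.log(len(delta_us)))
```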
We'd probably need to pin samples to a `DeviceBuffer` or something to truly see any meaningful performance gains here.
Addressing #654 (comment)
Addressing #654 (comment)
Previous change to make ref_params the same for both 1D tests (475a399) causes:

```
>       onp.testing.assert_allclose(g_hat, g_ref, atol=atol)
E       AssertionError:
E       Not equal to tolerance rtol=1e-07, atol=0.001
E
E       Mismatched elements: 1 / 2 (50%)
E       Max absolute difference: 0.0011654208166751
E       Max relative difference: 0.0011654208166751
E        x: array([-2.282977e-04, -9.988346e-01])
E        y: array([ 0., -1.])
```
This PR LGTM, especially after some in-person discussions last week.
Looks great! The docstrings, comments, and type annotations are all very useful for understanding.
Adds functions for two kinds of differentiable reweighting (`construct_endpoint_reweighting_estimator`, `construct_mixture_reweighting_estimator`). These are tested for correctness on a 1D system (where comparisons to exact free energies and gradients are possible). Less stringent tests confirm that the same implementation is compatible with custom_ops in an absolute hydration free energy test system.

Notes to reviewers:
- … `reweighting.py` (most of the total line count is in tests, and most of the lines in `reweighting.py` are documentation / white space)
- … `batched_u_fxn`, rather than a `u_fxn` which can be transformed by `jax.vmap`), and has a more generic interface.
- … `log_weights` required by one of these approaches can be computed from a collection of sampled states using MBAR via `interpret_as_mixture_potential`.
- … `u_kn` matrix in hand, there would be no reason to prefer, say, TI's estimate of `f_k` over `mbar.f_k`).
- … `log_weights`, that would not require the full `u_kn` matrix.
- … `log_weights`.
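A hedged usage sketch (the constructor name is from this PR, but its argument list here is guessed from identifiers appearing in the diff and may not match the real signature):

```python
from jax import value_and_grad

# Hypothetical wiring: samples_0/samples_1, ref_u_0/ref_u_1, and the batched
# potential functions appear in the diff; the exact constructor signature is
# an assumption.
estimate_delta_f = construct_endpoint_reweighting_estimator(
    samples_0, samples_1,               # samples from the two reference endpoints
    ref_u_0, ref_u_1,                   # reduced potentials of those samples
    batched_u_0_fxn, batched_u_1_fxn,   # batched reduced-potential functions
    ref_delta_f,                        # free energy difference at ref_params
)

# The estimator is differentiable, so value and gradient come from one call.
f_hat, g_hat = value_and_grad(estimate_delta_f)(trial_params)
```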