
Facilitate streaming (row-by-row update) in ESMDA #162

Merged: 9 commits into equinor:main from refactor-D on Oct 23, 2023

Conversation

@tommyod (Collaborator) commented Oct 19, 2023

To support row-by-row (parameter-by-parameter) computation, whether literally updating a single parameter at a time or updating in batches, we should avoid doing duplicate work.
Consider the equation:

$$C_{MD} (C_{DD} + \alpha C_D)^{-1} (D - Y)=X Y^T /(N-1) (C_{DD} + \alpha C_D)^{-1} (D - Y)$$

If we update row by row and introduce a transition matrix $K := Y^T /(N-1) (C_{DD} + \alpha C_D)^{-1} (D - Y)$,
then $[x_1 | x_2 | \ldots]^T K = [x_1 K | x_2 K | \ldots ]^T$. In other words, we can pre-compute $K$, since it is independent of $X$, and apply it row by row or group by group instead of re-computing it for each group.
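A minimal NumPy sketch of this identity (all names and shapes here are illustrative assumptions, not the library's actual API): pre-computing $K$ once and streaming the rows of $X$ through it gives the same result as the full matrix product.

```python
import numpy as np

# Illustrative sketch; names and shapes are assumptions, not the library's API.
rng = np.random.default_rng(42)
num_params, num_obs, N = 200, 10, 25  # N is the ensemble size
alpha = 1.0

X = rng.standard_normal((num_params, N))  # parameter ensemble
Y = rng.standard_normal((num_obs, N))     # response ensemble
D = rng.standard_normal((num_obs, N))     # perturbed observations
C_D = np.eye(num_obs)                     # observation error covariance

# Center Y to form anomalies; C_DD = Y_c @ Y_c.T / (N - 1)
Y_c = Y - Y.mean(axis=1, keepdims=True)
C_DD = Y_c @ Y_c.T / (N - 1)

# Pre-compute K := Y_c^T / (N - 1) (C_DD + alpha C_D)^{-1} (D - Y).
# K has shape (N, N) and is independent of X.
K = Y_c.T @ np.linalg.solve(C_DD + alpha * C_D, D - Y) / (N - 1)

# The full update in one product ...
update_full = X @ K

# ... equals streaming the rows of X through K one at a time.
update_rows = np.vstack([row @ K for row in X])
assert np.allclose(update_full, update_rows)
```

Since $K$ is only $(N, N)$, it is cheap to hold in memory even when the number of parameters is huge, which is what makes the streaming update attractive.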


This PR proposes two methods for this:

  • compute_transition_matrix, which computes the transition matrix $K$
  • perturb_observations, called by both compute_transition_matrix and assimilate, to avoid duplicating code

Note: this PR assumes #161 will be merged.
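For context on the second bullet, here is a hedged sketch of what an observation-perturbation helper does in ES-MDA; the function name and signature below are assumptions for illustration, not the library's actual interface. Each assimilation step perturbs the observed values with noise inflated by $\sqrt{\alpha}$.

```python
import numpy as np

def perturb_observations(d, C_D, alpha, N, rng):
    """Hypothetical sketch: return D of shape (len(d), N), the observations
    plus noise drawn from N(0, C_D) and inflated by sqrt(alpha)."""
    E = rng.multivariate_normal(np.zeros_like(d), C_D, size=N).T
    return d[:, None] + np.sqrt(alpha) * E

rng = np.random.default_rng(1)
d = np.array([1.0, 2.0, 3.0])         # observed values
C_D = np.diag([0.1, 0.2, 0.3])        # observation error covariance
D = perturb_observations(d, C_D, alpha=2.0, N=100, rng=rng)
assert D.shape == (3, 100)
```

Factoring this step out lets both `compute_transition_matrix` and `assimilate` draw the same kind of perturbed observation matrix without duplicating code.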


I also realized that we don't have to center both $X$ and $Y$! So I changed the code to center only the smaller matrix $Y$, saving both memory and time. See this comment. This is also documented in the code with a few lines.
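A quick numerical check of this claim (a sketch, not library code): because each row of the centered anomaly matrix $Y_c$ sums to zero, the mean of $X$ drops out of the cross product, so centering $X$ changes nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))  # illustrative parameter ensemble
Y = rng.standard_normal((5, 8))   # illustrative response ensemble

X_c = X - X.mean(axis=1, keepdims=True)
Y_c = Y - Y.mean(axis=1, keepdims=True)

# X @ Y_c.T == X_c @ Y_c.T, since the subtracted-mean term multiplies
# the row sums of Y_c, which are exactly zero.
assert np.allclose(X @ Y_c.T, X_c @ Y_c.T)
```

This is why only the smaller matrix $Y$ needs to be centered: $X$, which is typically far larger, can be used as-is.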

@tommyod tommyod changed the title Create method get_D for ESMDA Facilitate streaming (row-by-row update) in ESMDA Oct 19, 2023
@tommyod tommyod marked this pull request as ready for review October 19, 2023 08:57
@tommyod tommyod requested a review from Blunde1 October 19, 2023 08:58
```python
ans = function(alpha=alpha, C_D=C_D, D=D, Y=Y, X=X)
K = function(alpha=alpha, C_D=C_D, D=D, Y=Y, X=None, return_K=True)
```

@tommyod (Collaborator, Author) commented on the line

```python
X - np.mean(X, axis=1, keepdims=True)
```

Remove this

Suggested change:

```diff
-X - np.mean(X, axis=1, keepdims=True)
```
@Blunde1 (Contributor) left a comment
LGTM.
This is very nice! In particular, see the comparison between the low-level and high-level APIs in the tests. Good job!
Note for later: the names of the inversion functions are slightly misleading; they all involve inversion, but their purpose is either to return $K$ or to multiply $X$ with $K$. Perhaps we can come up with better names. As discussed, they are not part of the public API, so we can think about it later.

@tommyod tommyod merged commit b526c49 into equinor:main Oct 23, 2023
9 checks passed
@tommyod tommyod deleted the refactor-D branch October 23, 2023 09:34