
Facilitate streaming (row-by-row update) in ESMDA #162

Merged: 9 commits into equinor:main from refactor-D on Oct 23, 2023

Conversation

@tommyod (Collaborator) commented Oct 19, 2023

To support row-by-row (parameter-by-parameter) computation, whether literally updating a single parameter at a time or updating in batches, we should avoid doing duplicate work.
Consider the equation:

$$C_{MD} (C_{DD} + \alpha C_D)^{-1} (D - Y)=X Y^T /(N-1) (C_{DD} + \alpha C_D)^{-1} (D - Y)$$

If we update row by row and introduce a transition matrix $K := Y^T /(N-1) (C_{DD} + \alpha C_D)^{-1} (D - Y)$,
then $[x_1 | x_2 | \ldots]^T K = [x_1 K | x_2 K | \ldots ]^T$. In other words, we can pre-compute $K$, since it is independent of $X$, and apply it row by row or group by group instead of re-computing it for each group.
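A minimal NumPy sketch of this identity (all names and shapes here are illustrative assumptions, not the library's actual API): pre-computing $K$ once and streaming the rows of $X$ through it gives the same result as the full matrix product.

```python
import numpy as np

# Illustrative sketch; names and shapes are assumptions, not the library's API.
rng = np.random.default_rng(42)
num_params, num_obs, N = 200, 10, 25  # N is the ensemble size
alpha = 1.0

X = rng.standard_normal((num_params, N))  # parameter ensemble
Y = rng.standard_normal((num_obs, N))     # response ensemble
D = rng.standard_normal((num_obs, N))     # perturbed observations
C_D = np.eye(num_obs)                     # observation error covariance

# Center Y to form anomalies; C_DD = Y_c @ Y_c.T / (N - 1)
Y_c = Y - Y.mean(axis=1, keepdims=True)
C_DD = Y_c @ Y_c.T / (N - 1)

# Pre-compute K := Y_c^T / (N - 1) (C_DD + alpha C_D)^{-1} (D - Y).
# K has shape (N, N) and is independent of X.
K = Y_c.T @ np.linalg.solve(C_DD + alpha * C_D, D - Y) / (N - 1)

# The full update in one product ...
update_full = X @ K

# ... equals streaming the rows of X through K one at a time.
update_rows = np.vstack([row @ K for row in X])
assert np.allclose(update_full, update_rows)
```

Since $K$ is only $(N, N)$, it is cheap to hold in memory even when the number of parameters is huge, which is what makes the streaming update attractive.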


This PR proposes two methods for this:

  • compute_transition_matrix, which computes the transition matrix $K$
  • perturb_observations, called by both compute_transition_matrix and assimilate, to avoid duplicating code

Note: this PR assumes #161 will be merged.
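For context on the second bullet, here is a hedged sketch of what an observation-perturbation helper does in ES-MDA; the function name and signature below are assumptions for illustration, not the library's actual interface. Each assimilation step perturbs the observed values with noise inflated by $\sqrt{\alpha}$.

```python
import numpy as np

def perturb_observations(d, C_D, alpha, N, rng):
    """Hypothetical sketch: return D of shape (len(d), N), the observations
    plus noise drawn from N(0, C_D) and inflated by sqrt(alpha)."""
    E = rng.multivariate_normal(np.zeros_like(d), C_D, size=N).T
    return d[:, None] + np.sqrt(alpha) * E

rng = np.random.default_rng(1)
d = np.array([1.0, 2.0, 3.0])         # observed values
C_D = np.diag([0.1, 0.2, 0.3])        # observation error covariance
D = perturb_observations(d, C_D, alpha=2.0, N=100, rng=rng)
assert D.shape == (3, 100)
```

Factoring this step out lets both `compute_transition_matrix` and `assimilate` draw the same kind of perturbed observation matrix without duplicating code.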


I also realized that we don't have to center both $X$ and $Y$! So I changed the code to center only the smaller matrix $Y$, saving both memory and time. See this comment. This is also documented in the code with a few lines.
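A quick numerical check of this claim (a sketch, not library code): because each row of the centered anomaly matrix $Y_c$ sums to zero, the mean of $X$ drops out of the cross product, so centering $X$ changes nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))  # illustrative parameter ensemble
Y = rng.standard_normal((5, 8))   # illustrative response ensemble

X_c = X - X.mean(axis=1, keepdims=True)
Y_c = Y - Y.mean(axis=1, keepdims=True)

# X @ Y_c.T == X_c @ Y_c.T, since the subtracted-mean term multiplies
# the row sums of Y_c, which are exactly zero.
assert np.allclose(X @ Y_c.T, X_c @ Y_c.T)
```

This is why only the smaller matrix $Y$ needs to be centered: $X$, which is typically far larger, can be used as-is.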

@tommyod tommyod changed the title Create method get_D for ESMDA Facilitate streaming (row-by-row update) in ESMDA Oct 19, 2023
@tommyod tommyod marked this pull request as ready for review October 19, 2023 08:57
@tommyod tommyod requested a review from Blunde1 October 19, 2023 08:58
```python
ans = function(alpha=alpha, C_D=C_D, D=D, Y=Y, X=X)
K = function(alpha=alpha, C_D=C_D, D=D, Y=Y, X=None, return_K=True)
```

@tommyod (Collaborator, Author) commented on the line

```python
X - np.mean(X, axis=1, keepdims=True)
```

Remove this

Suggested change:

```diff
-X - np.mean(X, axis=1, keepdims=True)
```
@Blunde1 (Contributor) left a comment
LGTM.
This is very nice! In particular, see the comparison between the low-level and high-level APIs in the tests. Good job!
Note for later: the names of the inversion functions are slightly misleading; they all involve inversion, but their purpose is either to return $K$ or to multiply $X$ with $K$. Perhaps we can come up with better names. As discussed, they are not part of the public API, so we can think about it later.

@tommyod tommyod merged commit b526c49 into equinor:main Oct 23, 2023
9 checks passed
@tommyod tommyod deleted the refactor-D branch October 23, 2023 09:34