added z-scoring method for structured data (e.g., time series) #597

rdgao · 2022-01-24T18:34:52Z

standardizing_net() and standardizing_transform() now both have an option to z-score structured data, i.e., the mean and std are first computed for each sample, and their averages over the batch are then used to z-score the whole batch (instead of z-scoring each dimension independently):

x_mean = torch.mean(x)
x_std = torch.mean(torch.std(x, dim=1))
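
A minimal sketch of the difference between the two modes, written in plain Python rather than torch so it runs without dependencies. The helper names are illustrative only and are not part of sbi; `statistics.stdev` uses the same n-1 estimator that `torch.std` uses by default.

```python
from statistics import mean, stdev


def structured_stats(batch):
    """Return one (mean, std) pair shared by all dimensions of the batch."""
    # Global mean over every entry, analogous to torch.mean(x).
    x_mean = mean(v for sample in batch for v in sample)
    # Std within each sample, averaged over samples,
    # analogous to torch.mean(torch.std(x, dim=1)).
    x_std = mean(stdev(sample) for sample in batch)
    return x_mean, x_std


def independent_stats(batch):
    """Return per-dimension means and stds, computed across the batch."""
    columns = list(zip(*batch))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def z_score_structured(batch):
    """Z-score every entry with the shared structured statistics."""
    x_mean, x_std = structured_stats(batch)
    return [[(v - x_mean) / x_std for v in sample] for sample in batch]


def z_score_independent(batch):
    """Z-score each dimension with its own mean and std."""
    means, stds = independent_stats(batch)
    return [
        [(v - m) / s for v, m, s in zip(sample, means, stds)]
        for sample in batch
    ]


# Two toy "time series" with the same ramp shape but different offsets.
batch = [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
structured = z_score_structured(batch)
independent = z_score_independent(batch)
```

On this toy batch, independent z-scoring flattens each sample to a constant (the ramp within a sample disappears), whereas structured z-scoring only shifts and rescales the batch, so the shape of each series is preserved.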

Re: #570

michaeldeistler · 2022-01-24T18:38:08Z

Is this ready for review?

rdgao · 2022-01-24T18:41:19Z

oops, I should've written it here: no, not yet:

- it still needs tests
- I'm not sure what to do with sbi.analysis.sensitivity_analysis.Destandardize

otherwise yes

michaeldeistler · 2022-01-24T18:50:40Z

Regarding sbi.analysis.sensitivity_analysis.Destandardize: please make sure that it still runs and throws no error (I think it should be fine?).

You do not have to apply the changes made in this PR to sbi.analysis.sensitivity_analysis.Destandardize. It only accepts scalar x so there's nothing to change.

rdgao · 2022-01-26T19:11:35Z

good to review now @michaeldeistler

michaeldeistler

Great, thanks a lot! Left minor nitpicks regarding docstrings in the comments.

One more request: could you modify one of the snpe tests to use "structured" z-scoring? E.g. this one. And make sure to put a comment like "Test whether SNPE works properly with structured z-scoring".

michaeldeistler · 2022-01-27T07:02:33Z

sbi/neural_nets/classifier.py

+            - `none`, None: do not z-score
+            - `independent`: z-score each dimension independently
+            - `structured`: treat dimensions as related, therefore compute mean and std
+            over the entire batch, instead of per-dimension.


Maybe add something like: "Should be used when the data are a time series or an image."

michaeldeistler · 2022-01-27T07:03:43Z

sbi/neural_nets/classifier.py

+            - `independent`: z-score each dimension independently
+            - `structured`: treat dimensions as related, therefore compute mean and std
+            over the entire batch, instead of per-dimension.
+        z_score_y: Whether to z-score ys passing into the network, same as z_score_x.


same options as z_score_x

michaeldeistler · 2022-01-27T07:05:10Z

sbi/utils/get_nn_models.py

+            - `none`, None: do not z-score
+            - `independent`: z-score each dimension independently
+            - `structured`: treat dimensions as related, therefore compute mean and std
+            over the entire batch, instead of per-dimension.


The docstring comments i made above especially apply to these user-facing methods

michaeldeistler · 2022-01-27T07:07:17Z

sbi/utils/sbiutils.py

+
+    Args:
+        z_score_flag: str flag for z-scoring method stating whether the data
+        dimensions are "structured" or "independent", or does not require z-scoring


indent second and third line

michaeldeistler · 2022-01-27T07:08:17Z

sbi/utils/sbiutils.py

+    if type(z_score_flag) is bool:
+        # Raise warning if boolean was passed.
+        warnings.warn(
+            """Boolean flag for z-scoring is accepted for backwards compatibility only.


Boolean flag for z-scoring is deprecated as of sbi v0.18.0. It will be removed in a future release. Use 'none', 'independent', or 'structured' to indicate z-scoring option.

michaeldeistler · 2022-01-27T07:09:04Z

sbi/utils/sbiutils.py

+        structured_data = True if z_score_flag == "structured" else False
+
+    else:
+        # Return warning due to invalid option, defaults to not z-scoring.


Please raise a ValueError here.
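
Taken together, the two review comments on this function (deprecation warning for booleans, ValueError for invalid options) suggest flag handling along the following lines. This is only a sketch: the function name, the returned tuple, and the mapping of `True` to the old per-dimension behaviour are assumptions for illustration, not the code that was merged.

```python
import warnings


def parse_z_score_flag(z_score_flag):
    """Hypothetical helper: map a z-scoring flag to (z_score, structured)."""
    if isinstance(z_score_flag, bool):
        # Booleans are accepted for backwards compatibility only.
        warnings.warn(
            "Boolean flag for z-scoring is deprecated. Use 'none', "
            "'independent', or 'structured' to indicate the z-scoring option.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: True keeps the old behaviour, i.e. 'independent'.
        return z_score_flag, False

    if z_score_flag is None or z_score_flag == "none":
        return False, False

    if z_score_flag in ("independent", "structured"):
        return True, z_score_flag == "structured"

    # Invalid option: fail loudly instead of silently skipping z-scoring.
    raise ValueError(
        f"Invalid z-scoring option {z_score_flag!r}. Use 'none', "
        "'independent', or 'structured'."
    )
```

Raising here, rather than warning and falling back to no z-scoring, means a typo such as `"strutured"` surfaces immediately instead of quietly changing how the network is trained.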

rdgao · 2022-01-27T10:31:16Z

tests/linearGaussian_snpe_test.py

@@ -416,7 +416,7 @@ def simulator(theta):
        else:
            return linear_gaussian(theta, -likelihood_shift, likelihood_cov)

-    net = utils.posterior_nn("maf", hidden_features=20)
+    net = utils.posterior_nn("maf", z_score_x="structured", hidden_features=20)


@michaeldeistler I changed this one to use structured z-scoring, is this sufficient?

hm the sample_conditional might not be the perfect place for that. Maybe use some other function that actually tests the posterior (not the conditional posterior)

sounds good, i put it in def test_c2st_multi_round_snpe_on_linearGaussian() because it already has a posterior_nn call, instead of test_c2st_snpe_on_linearGaussian_different_dims() which calls SNPE_C() directly

rdgao · 2022-01-27T11:05:37Z

just realized: when you call SNPE_C directly, for example, it uses the default z-scoring option, which is now 'independent' (same behaviour as before), and there is no way to specify 'structured'. Is that fine?

codecov-commenter · 2022-01-27T11:21:36Z

Codecov Report

Merging #597 (0a0ac3d) into main (b8551d0) will increase coverage by 1.60%.
The diff coverage is 92.85%.

@@            Coverage Diff             @@
##             main     #597      +/-   ##
==========================================
+ Coverage   66.77%   68.37%   +1.60%     
==========================================
  Files          67       67              
  Lines        4199     4285      +86     
==========================================
+ Hits         2804     2930     +126     
+ Misses       1395     1355      -40

Flag	Coverage Δ
unittests	`68.37% <92.85%> (+1.60%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
sbi/utils/__init__.py	`100.00% <ø> (ø)`
sbi/utils/plot.py	`47.21% <0.00%> (ø)`
sbi/utils/restriction_estimator.py	`14.42% <0.00%> (-0.07%)`	⬇️
sbi/utils/sbiutils.py	`78.16% <96.55%> (+4.38%)`	⬆️
sbi/neural_nets/classifier.py	`100.00% <100.00%> (+13.95%)`	⬆️
sbi/neural_nets/flow.py	`86.07% <100.00%> (+44.98%)`	⬆️
sbi/neural_nets/mdn.py	`100.00% <100.00%> (ø)`
sbi/utils/get_nn_models.py	`85.41% <100.00%> (+8.33%)`	⬆️
sbi/inference/posteriors/vi_posterior.py	`66.66% <0.00%> (-15.95%)`	⬇️
sbi/inference/posteriors/rejection_posterior.py	`37.50% <0.00%> (-8.16%)`	⬇️
... and 21 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b8551d0...0a0ac3d.

michaeldeistler · 2022-01-27T17:33:45Z

Yes that's fine

michaeldeistler requested changes Jan 27, 2022

rdgao added 10 commits January 27, 2022 18:50

sbiutils.standardizing_net now takes flag whether to z-score data wit…

e551947

…h structured or independent dimensions

added util function to parse z-score flag, and updated all classifier…

444618e

… calls to use the new options

changed standardizing_transformed to also accept structured z-scoring…

c5249e0

… optioin

updated all flow and mdn NN calls to incorporate structured z-scoring…

b037265

… option

updated restriction estimator to use structured z-scoring

cc972fb

fixed utils import now tests run

d81e77e

implemented tests for z-scoring options

20e3b97

updated docstrings

3ba0662

added structured z-score test to c2st for snpe

b4e0137

fixing merge conflicts again

8c19049

rdgao force-pushed the structured_y branch from 07ab847 to 8c19049 Compare January 27, 2022 17:52

michaeldeistler linked an issue Jan 27, 2022 that may be closed by this pull request

Our z-scoring can really mess things up for structured data e.g. time series. #570

Closed

fixing merge conflicts again

ae9942b

michaeldeistler merged commit a7195e1 into main Jan 28, 2022

michaeldeistler deleted the structured_y branch January 28, 2022 09:55
