Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added z-scoring method for structured data (e.g., time series) #597

Merged
merged 11 commits into from
Jan 28, 2022

Conversation

rdgao
Copy link
Contributor

@rdgao rdgao commented Jan 24, 2022

standardizing_net() and standardizing_transform() now both have options to perform z-scoring for structured data, i.e., to compute mean and std for each sample first, then taking the global mean to be used for z-scoring the batch (instead of z-scoring each dimension independently):

x_mean = torch.mean(x)
x_std = torch.mean(torch.std(x, dim=1))

Re: #570

@michaeldeistler
Copy link
Contributor

Is this ready for review?

@rdgao
Copy link
Contributor Author

rdgao commented Jan 24, 2022

oops i should've written here:

no not yet:

  • it still needs tests
  • I'm not sure what to do with sbi.analysis.sensitivity_analysis.Destandardize

otherwise yes

@michaeldeistler
Copy link
Contributor

Regarding sbi.analysis.sensitivity_analysis.Destandardize. Please make sure that it still runs and throws no error (I think it should be fine?).

You do not have to apply the changes made in this PR to sbi.analysis.sensitivity_analysis.Destandardize. It only accepts scalar x so there's nothing to change.

@rdgao
Copy link
Contributor Author

rdgao commented Jan 26, 2022

good to review now @michaeldeistler

Copy link
Contributor

@michaeldeistler michaeldeistler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks a lot! Left minor nitpicks regarding docstrings in the comments.

One more request: could you modify one of the snpe tests to use "structured" z-scoring? E.g. this one. And make sure to put a comment like "Test whether SNPE works properly with structured z-scoring".

- `none`, None: do not z-score
- `independent`: z-score each dimension independently
- `structured`: treat dimensions as related, therefore compute mean and std
over the entire batch, instead of per-dimension.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add something like. Should be used when the data are time series or an image

- `independent`: z-score each dimension independently
- `structured`: treat dimensions as related, therefore compute mean and std
over the entire batch, instead of per-dimension.
z_score_y: Whether to z-score ys passing into the network, same as z_score_x.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same options as z_score_x

- `none`, None: do not z-score
- `independent`: z-score each dimension independently
- `structured`: treat dimensions as related, therefore compute mean and std
over the entire batch, instead of per-dimension.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring comments i made above especially apply to these user-facing methods


Args:
z_score_flag: str flag for z-scoring method stating whether the data
dimensions are "structured" or "independent", or does not require z-scoring
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent second and third line

if type(z_score_flag) is bool:
# Raise warning if boolean was passed.
warnings.warn(
"""Boolean flag for z-scoring is accepted for backwards compatibility only.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boolean flag for z-scoring is deprecated as of sbi v0.18.0. It will be removed in a future release. Use 'none', 'independent', or 'structured' to indicate z-scoring option.

structured_data = True if z_score_flag == "structured" else False

else:
# Return warning due to invalid option, defaults to not z-scoring.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please raise a ValueError here.

@@ -416,7 +416,7 @@ def simulator(theta):
else:
return linear_gaussian(theta, -likelihood_shift, likelihood_cov)

net = utils.posterior_nn("maf", hidden_features=20)
net = utils.posterior_nn("maf", z_score_x="structured", hidden_features=20)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaeldeistler I changed this one to use structured z-scoring, is this sufficient?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm the sample_conditional might not be the perfect place for that. Maybe use some other function that actually tests the posterior (not the conditional posterior)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, i put it in def test_c2st_multi_round_snpe_on_linearGaussian() because it already has a posterior_nn call, instead of test_c2st_snpe_on_linearGaussian_different_dims() which calls SNPE_C() directly

@rdgao
Copy link
Contributor Author

rdgao commented Jan 27, 2022

just realized, when you call SNPE_C directly, for example, it takes the default value for z-scoring, which is now 'independent' but does the same as before, but there's no way to specify 'structured'. Is that fine?

@codecov-commenter
Copy link

Codecov Report

Merging #597 (0a0ac3d) into main (b8551d0) will increase coverage by 1.60%.
The diff coverage is 92.85%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #597      +/-   ##
==========================================
+ Coverage   66.77%   68.37%   +1.60%     
==========================================
  Files          67       67              
  Lines        4199     4285      +86     
==========================================
+ Hits         2804     2930     +126     
+ Misses       1395     1355      -40     
Flag Coverage Δ
unittests 68.37% <92.85%> (+1.60%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sbi/utils/__init__.py 100.00% <ø> (ø)
sbi/utils/plot.py 47.21% <0.00%> (ø)
sbi/utils/restriction_estimator.py 14.42% <0.00%> (-0.07%) ⬇️
sbi/utils/sbiutils.py 78.16% <96.55%> (+4.38%) ⬆️
sbi/neural_nets/classifier.py 100.00% <100.00%> (+13.95%) ⬆️
sbi/neural_nets/flow.py 86.07% <100.00%> (+44.98%) ⬆️
sbi/neural_nets/mdn.py 100.00% <100.00%> (ø)
sbi/utils/get_nn_models.py 85.41% <100.00%> (+8.33%) ⬆️
sbi/inference/posteriors/vi_posterior.py 66.66% <0.00%> (-15.95%) ⬇️
sbi/inference/posteriors/rejection_posterior.py 37.50% <0.00%> (-8.16%) ⬇️
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b8551d0...0a0ac3d. Read the comment docs.

@michaeldeistler
Copy link
Contributor

michaeldeistler commented Jan 27, 2022

Yes that's fine

@michaeldeistler michaeldeistler merged commit a7195e1 into main Jan 28, 2022
@michaeldeistler michaeldeistler deleted the structured_y branch January 28, 2022 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Our z-scoring can really mess things up for structured data e.g. time series.
3 participants