Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test for valid use of (S)NPE API #24

Merged
merged 6 commits into from Nov 12, 2021

Conversation

psteinb
Copy link
Contributor

@psteinb psteinb commented Nov 5, 2021

This PR attempts a solution to #23 without (yet) introducing test categories a la @pytest.mark.slow (see https://github.com/mackelab/sbi/blob/86256e02c1080965795e65062c4ab9d3a19015d2/tests/linearGaussian_snpe_test.py#L196)

@psteinb
Copy link
Contributor Author

psteinb commented Nov 5, 2021

btw, when run, this unit tests results in:

        # For torch < 1.8 log_abs_det_jacobian is returned for each dimension.
        if vals.ndim > 1 and vals.shape[1] == dim_parameters:
            vals = vals.sum(-1)
    
>       assert (
            vals.numel == batch_size
        ), "Mismatch in batch size, took sum over whole batch?"
E       AssertionError: Mismatch in batch size, took sum over whole batch?

../../sbibm/utils/torch.py:105: AssertionError

which is related to #15 I guess

@jan-matthis
Copy link
Contributor

which is related to #15 I guess

Yes, this is due to #15. @janfb will fix this beginning of next week, that will iron it out

[
(task_name, num_observation)
for task_name in ["gaussian_linear", "slcp",]
for num_observation in random.sample(range(1, 11), 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer keeping the tests deterministic, i.e., let's just run them on observation 1 and 2 only if 10 is too slow. How long does this test take on your machine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 2 observations and 2.500 simulated samples per task, all 6 tests in total run for 220s on my box. That is still quite long and mostly due to NPE running epochs. In this case, this is a hen-and-egg problem: we balance simulation budget versus expressiveness of the predictor.
I'd propose to ditch the SLCP task or at least one gaussian task. Limiting the number of epochs would be most ideal - but impossible I guess at the moment.

Copy link
Contributor Author

@psteinb psteinb Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I boiled it down to 2 tasks and about 140s to complete (9e91818). Still quite long. I guess the central question is, what we want to achieve with this test. If it is really only about the interface, then one task with one observation is enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we only want to do interface testing as part of sbibm. I have just merged a PR (#33) that adds an argument to be able to set max_num_epochs explicitly. How about running the test for a low maximum number of epochs and only checking that the shape of the return is correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, these tests are now going at 30s on my box. That is still long, but at least c2st appears to return something meaningful. e642d77

tests/algorithms/test_snpe_posterior.py Outdated Show resolved Hide resolved
@jan-matthis jan-matthis changed the title SNPE tests for sbi in order to check valid use of API Test for valid use of (S)NPE API Nov 12, 2021
@jan-matthis jan-matthis merged commit a9c20c8 into sbi-benchmark:main Nov 12, 2021
@jan-matthis
Copy link
Contributor

Cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants