New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test for valid use of (S)NPE API #24
Conversation
btw, when run, this unit tests results in:
which is related to #15 I guess |
[ | ||
(task_name, num_observation) | ||
for task_name in ["gaussian_linear", "slcp",] | ||
for num_observation in random.sample(range(1, 11), 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer keeping the tests deterministic, i.e., let's just run them on observation 1 and 2 only if 10 is too slow. How long does this test take on your machine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With 2 observations and 2.500 simulated samples per task, all 6 tests in total run for 220s on my box. That is still quite long and mostly due to NPE running epochs. In this case, this is a hen-and-egg problem: we balance simulation budget versus expressiveness of the predictor.
I'd propose to ditch the SLCP task or at least one gaussian task. Limiting the number of epochs would be most ideal - but impossible I guess at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I boiled it down to 2 tasks and about 140s to complete (9e91818). Still quite long. I guess the central question is, what we want to achieve with this test. If it is really only about the interface, then one task with one observation is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we only want to do interface testing as part of sbibm
. I have just merged a PR (#33) that adds an argument to be able to set max_num_epochs
explicitly. How about running the test for a low maximum number of epochs and only checking that the shape of the return is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, these tests are now going at 30s on my box. That is still long, but at least c2st
appears to return something meaningful. e642d77
8b0196b
to
667092b
Compare
- ditch slcp task (posterior would need more samples and epochs to train) - 2000 sim samples and 100 observations appear to be valid choices - runs in ~140s on my box
6688571
to
e642d77
Compare
Cheers! |
This PR attempts a solution to #23 without (yet) introducing test categories a la
@pytest.mark.slow
(see https://github.com/mackelab/sbi/blob/86256e02c1080965795e65062c4ab9d3a19015d2/tests/linearGaussian_snpe_test.py#L196)