sampler testing framework #318

Open
5 tasks
bob-carpenter opened this issue Oct 23, 2013 · 6 comments

Comments

@bob-carpenter
Contributor

We can't get new patches into samplers because there aren't any reliable tests.

We need tests for the samplers for

  • accuracy on means
  • accuracy on variances
  • speed regression tests

We also want to test things that Michael has suggested for HMC like

  • step size * 2 ^ tree_depth is in a range --- how often and what range?
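
One way the check above could be phrased, as a minimal sketch: compute step size * 2 ^ tree_depth for each iteration and report the fraction of iterations that land inside a target range. The function name, the (step_size, tree_depth) input layout, and the bounds are hypothetical; which range to use and how often it should hold remain the open questions.

```python
def fraction_in_range(draws, lower, upper):
    """Fraction of iterations with lower <= step_size * 2**tree_depth <= upper.

    `draws` is an iterable of (step_size, tree_depth) pairs, one per iteration.
    The layout and the bounds are illustrative assumptions, not Stan output.
    """
    lengths = [step_size * 2 ** tree_depth for step_size, tree_depth in draws]
    if not lengths:
        raise ValueError("no draws supplied")
    inside = sum(1 for length in lengths if lower <= length <= upper)
    return inside / len(lengths)


# Made-up diagnostics for four iterations.
example = [(0.1, 3), (0.1, 4), (0.1, 5), (0.1, 10)]
print(fraction_in_range(example, lower=0.5, upper=5.0))
```

A test built on this would still need a tolerance on the fraction itself, since the per-iteration values are random.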

We have to make all these sensitive to the fact that we have MCMC.

  • component testing for mcmc
@ghost ghost assigned syclik Oct 23, 2013
@betanalpha
Contributor

> We can't get new patches into samplers because there aren't any reliable tests.
>
> We need tests for the samplers for
>
> • accuracy on means
> • accuracy on variances
> • speed regression tests
>
> We also want to test things that Michael has suggested for HMC like
>
> • step size * 2 ^ tree_depth is in a range --- how often and what range?
>
> We have to make all these sensitive to the fact that we have MCMC.

We have to be careful because, by construction, MCMC is stochastic and not exactly amenable to unit tests
as they are usually defined.

Mean/variance estimation:

Assuming a Monte Carlo CLT, we'll still have to worry about the expected randomness. Running an ensemble
of tests and only requiring that the expected number pass would help, but would also make the tests much more demanding.

That said, iid gaussian and a correlated gaussian are natural first tests.
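
The ensemble idea can be sketched roughly as follows. This is only an illustration under stated assumptions: independent standard normal draws stand in for sampler output, the Monte Carlo standard error uses the raw draw count where real MCMC output would need an effective sample size, and all function names and the two alpha levels are hypothetical choices. The same structure would apply to variances by testing the mean of squared deviations.

```python
import math
import random
import statistics


def z_score(draws, true_mean):
    """Standardized error of the sample mean, treating draws as independent.

    For real MCMC output the denominator would need an effective sample
    size in place of len(draws); that estimator is omitted here.
    """
    mcse = statistics.stdev(draws) / math.sqrt(len(draws))
    return (statistics.fmean(draws) - true_mean) / mcse


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def ensemble_passes(z_scores, per_test_alpha=0.05, ensemble_alpha=0.001):
    """Accept unless the number of individual failures is implausibly large."""
    z_crit = statistics.NormalDist().inv_cdf(1 - per_test_alpha / 2)
    failures = sum(1 for z in z_scores if abs(z) > z_crit)
    return binomial_upper_tail(len(z_scores), failures, per_test_alpha) >= ensemble_alpha


rng = random.Random(318)  # fixed seed so the check is reproducible
runs = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(100)]
print(ensemble_passes([z_score(run, true_mean=0.0) for run in runs]))
```

Individual runs are allowed to fail at the per-test rate; only the count of failures across the ensemble is tested, which is what makes the check more demanding to run.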

Adaptation:

Some distributions undercut the usual optimization criteria that we use for adaptation. Hierarchical models like
the funnel are a big example that we might want to test.

The interaction between the distributions and adaptation would require sampler-specific tests, not happy generic tests.
There are some exceptions -- the gaussians mentioned above are "linear" and about as easy to adapt to as possible.

Speed regression tests:

Depends on the machine running the tests, so we can't just define definite thresholds. Is it possible to build up the
testing framework to run examples using two different tags for comparison?
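
A relative comparison of that kind might look like the sketch below: time the same workload for a baseline and a candidate on the same machine and test the ratio instead of an absolute number of seconds. The helper names, the repeat count, and the 1.25 tolerance are hypothetical, and the lambdas are stand-ins for running one example model against builds of two tags.

```python
import statistics
import time


def median_seconds(workload, repeats=5):
    """Median wall-clock time of `workload()` over several repeats."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def slowdown_ratio(baseline_seconds, candidate_seconds):
    """Candidate time relative to baseline; values above 1 mean slower."""
    if baseline_seconds <= 0:
        raise ValueError("baseline time must be positive")
    return candidate_seconds / baseline_seconds


def is_regression(baseline_seconds, candidate_seconds, tolerance=1.25):
    """Flag the candidate only if it is slower by more than `tolerance`."""
    return slowdown_ratio(baseline_seconds, candidate_seconds) > tolerance


# Stand-in workloads; in practice each would run the same example model
# against a build of a different tag.
baseline = median_seconds(lambda: sum(i * i for i in range(20_000)))
candidate = median_seconds(lambda: sum(i * i for i in range(20_000)))
print(slowdown_ratio(baseline, candidate))
```

Because both timings come from the same machine in the same session, the threshold is a ratio and does not have to be retuned per machine.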

@bob-carpenter
Contributor Author

On 10/23/13 4:45 PM, Michael Betancourt wrote:

> We can't get new patches into samplers because there aren't any reliable tests.
>
> We need tests for the samplers for
>
> • accuracy on means
> • accuracy on variances
> • speed regression tests
>
> We also want to test things that Michael has suggested for HMC like
>
> • step size * 2 ^ tree_depth is in a range --- how often and what range?
>
> We have to make all these sensitive to the fact that we have MCMC.
>
> We have to be careful because, by construction, MCMC is stochastic and not exactly amenable to unit tests
> as they are usually defined.

Right. That's why, for example, the RNG tests that Peter
wrote do a very large number of samples and then use a very
liberal threshold for a chi-square test. We have a classical
multiple testing problem where we want to control the false positive
rate.
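
To make the pattern concrete, here is a rough sketch of a chi-square goodness-of-fit check with a deliberately tiny alpha, so that a correct generator almost never fails. It is not the code from the RNG tests mentioned above: Python's `random` module stands in for the generator under test, the critical value comes from the Wilson-Hilferty approximation to avoid a dependency, and the function names, bin count, and alpha are hypothetical.

```python
import random
import statistics


def chi_square_statistic(observed, expected):
    """Pearson's statistic: sum over bins of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_critical(dof, alpha):
    """Approximate upper-alpha critical value (Wilson-Hilferty approximation)."""
    z = statistics.NormalDist().inv_cdf(1 - alpha)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def uniform_bin_counts(draws, num_bins):
    """Count draws from [0, 1) falling into equal-width bins."""
    counts = [0] * num_bins
    for u in draws:
        counts[min(int(u * num_bins), num_bins - 1)] += 1
    return counts


rng = random.Random(318)  # fixed seed so the check is reproducible
num_draws, num_bins = 100_000, 20
counts = uniform_bin_counts((rng.random() for _ in range(num_draws)), num_bins)
statistic = chi_square_statistic(counts, [num_draws / num_bins] * num_bins)
# alpha is deliberately tiny so that a correct generator almost never fails.
print(statistic <= chi_square_critical(num_bins - 1, alpha=1e-6))
```

With many such tests in one suite, the per-test alpha has to be small enough that the suite-wide false positive rate stays acceptable.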

This is similar to what Andrew calls the "Cook-Gelman-Rubin" approach.

> Mean/variance estimation:
>
> Assuming a Monte Carlo CLT, we'll still have to worry about the expected randomness. Running an ensemble
> of tests and only requiring that the expected number pass would help, but would also make the tests much more demanding.

Right. That's what we're doing for the RNGs, but those
are much simpler to run multiple times.

> That said, iid gaussian and a correlated gaussian are natural first tests.

We mostly want to have tests in place to make sure we didn't mess
anything up badly. Finer-grained performance testing can't be part
of our "unit testing" framework. (Though I do believe Jenkins currently
reports total time for all the tests in a browsable way, not that I've
ever browsed it.)

> Adaptation:
>
> Some distributions undercut the usual optimization criteria that we use for adaptation. Hierarchical models like
> the funnel are a big example that we might want to test.
>
> The interaction between the distributions and adaptation would require sampler-specific tests, not happy generic tests.
> There are some exceptions -- the gaussians mentioned above are "linear" and about as easy to adapt to as possible.

We already have tests that vary configuration (e.g., for number
of iterations) for different models.

> Speed regression tests:
>
> Depends on the machine running the tests, so we can't just define definite thresholds. Is it possible to build up the
> testing framework to run examples using two different tags for comparison?

I don't see why not. Daniel's a wizard with Jenkins.

For the foreseeable future, the machine running the tests will
be the Jenkins Windows box. Our latest grant proposal applied
for some more hardware for ongoing testing.

And we can test on our own machines.

- Bob

@betanalpha
Contributor

It's not a matter of varying the parameters but figuring out how they need to be varied. Just warning that because of these
interactions it will be hard to have generic "sampler" tests instead of individually-tuned tests for each sampler.

On Oct 23, 2013, at 10:22 PM, Bob Carpenter notifications@github.com wrote:

> We already have tests that vary configuration (e.g., for number
> of iterations) for different models.

@syclik
Member

syclik commented Oct 24, 2013

I'm ok with individually-tuned tests for each sampler.

@syclik syclik modified the milestones: Future, v2.3.0 May 15, 2014
@syclik syclik removed the C++ API label Sep 19, 2014
@betanalpha
Contributor

Testing framework proposed in https://github.com/stan-dev/stan/tree/feature/stat_valid_test -- currently needs to be updated so that the tests can be run without depending on CmdStan.

@syclik
Member

syclik commented Nov 30, 2016

@bob-carpenter, this is what we were talking about doing. This will depend on #1751, so I'll branch from there as I start working.

3 participants