Feature/2569 Analysis API: compute effective sample size #2575

roualdes · 2018-07-09T19:15:36Z

Submission Checklist

Run unit tests: ./runTests.py src/test/unit
Run cpplint: make cpplint
Declare copyright holder and open-source license: see below

Ran tests mentioned below and ./runTests.py src/test/unit/mcmc

Summary

Create new route for unified calculation of effective sample size. Discussion on Discourse and in issue #2569.

Add src/stan/analyze/mcmc/compute_ess.hpp, which defines the function compute_effective_sample_size, along with tests.

Add a method to the chains class in src/stan/mcmc/chains.hpp to minimize the changes needed for adoption in CmdStan.
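To make the proposed API concrete, here is a minimal sketch of a free function with the `(std::vector<const double*>, std::vector<size_t>)` shape discussed in this thread. It is not the code in this PR: it computes a naive O(n^2) autocovariance instead of using an FFT, truncates all chains to the shortest length, combines chains through a pooled variance estimate, and then applies Geyer's initial positive and initial monotone sequence rules to pairs of autocorrelations. The helper `autocovariance_naive` is hypothetical, and the sketch has no guards for constant or strongly antithetic chains.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: biased (divide-by-n) autocovariance of one chain at
// lags 0..n-1, computed directly in O(n^2) time for clarity.
std::vector<double> autocovariance_naive(const double* x, int n) {
  double mean = 0.0;
  for (int i = 0; i < n; ++i) mean += x[i];
  mean /= n;
  std::vector<double> acov(n, 0.0);
  for (int lag = 0; lag < n; ++lag) {
    double s = 0.0;
    for (int i = 0; i + lag < n; ++i)
      s += (x[i] - mean) * (x[i + lag] - mean);
    acov[lag] = s / n;
  }
  return acov;
}

// Sketch of a multi-chain effective sample size estimate (not the PR's code).
double compute_effective_sample_size(std::vector<const double*> draws,
                                     std::vector<std::size_t> sizes) {
  assert(!draws.empty() && draws.size() == sizes.size());
  const int num_chains = static_cast<int>(draws.size());
  // Truncate every chain to the shortest length so lags line up.
  const int n = static_cast<int>(*std::min_element(sizes.begin(), sizes.end()));
  assert(n > 1);

  std::vector<std::vector<double>> acov(num_chains);
  std::vector<double> chain_mean(num_chains);
  double mean_var = 0.0;  // average within-chain variance (W)
  for (int c = 0; c < num_chains; ++c) {
    acov[c] = autocovariance_naive(draws[c], n);
    double m = 0.0;
    for (int i = 0; i < n; ++i) m += draws[c][i];
    chain_mean[c] = m / n;
    mean_var += acov[c][0] * n / (n - 1.0) / num_chains;
  }

  // Pooled variance estimate: W * (n - 1) / n + B / n, where B / n is the
  // sample variance of the chain means.
  double var_plus = mean_var * (n - 1.0) / n;
  if (num_chains > 1) {
    double grand_mean = 0.0;
    for (double m : chain_mean) grand_mean += m / num_chains;
    double sum_sq = 0.0;
    for (double m : chain_mean) sum_sq += (m - grand_mean) * (m - grand_mean);
    var_plus += sum_sq / (num_chains - 1);
  }

  // Combined autocorrelation estimate at lag t (defined as 1 at lag 0).
  auto rho_hat = [&](int t) {
    if (t == 0) return 1.0;
    double acov_t = 0.0;
    for (int c = 0; c < num_chains; ++c) acov_t += acov[c][t] / num_chains;
    return 1.0 - (mean_var - acov_t) / var_plus;
  };

  // Geyer: add pairs P_k = rho[2k] + rho[2k+1] while they stay positive
  // (initial positive sequence), forcing them to be non-increasing
  // (initial monotone sequence). tau = -1 + 2 * sum_k P_k.
  double tau = -1.0;
  double prev_pair = std::numeric_limits<double>::infinity();
  for (int k = 0; 2 * k + 1 < n; ++k) {
    double pair = rho_hat(2 * k) + rho_hat(2 * k + 1);
    if (pair <= 0.0) break;
    pair = std::min(pair, prev_pair);
    tau += 2.0 * pair;
    prev_pair = pair;
  }
  return num_chains * n / tau;
}
```

Passing raw pointers plus lengths keeps a function like this independent of how each interface stores its draws, which is the motivation for a unified API.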

Intended Effect

Add unified API for interfaces to compute effective sample size, and maintain backwards compatibility during the transition to the new route.

How to Verify

Tests live in src/test/unit/analyze/mcmc/compute_ess_test.cpp. Run

make clean && ./runTests.py -j2 src/test/unit/analyze/

Side Effects

None intended.

Documentation

Inline, via doxygen.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): California State University, Chico

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

roualdes · 2018-07-09T19:27:22Z

Please note my use of a function template for compute_effective_sample_size. With the template, no interface should need to copy the draws; without it, CmdStan will need to make a copy. My understanding is that CmdStan and RStan will have no problem instantiating the templates they need:

CmdStan can instantiate compute_effective_sample_size(std::vector<const double*>, std::vector<size_t> sizes) and thereby reuse its Eigen data structures without copying.

RStan can instantiate either the signature above or the one below, as I don't think Rcpp is as strict about const as Cython is.

Is PyStan able to instantiate compute_effective_sample_size(std::vector<double*>, std::vector<size_t>)? Or could we get away with Stan instantiating it for PyStan, somehow?

@ariddell, would you please weigh in on whether PyStan can work with my use of a function template?
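A minimal sketch of the template idea: because the body only reads through the pointers, the same template instantiates for both `std::vector<const double*>` and `std::vector<double*>`. The function name `pooled_mean` and its body (a pooled mean of all draws) are hypothetical stand-ins for the real computation, and the arguments are taken by const reference rather than by value; the template mechanics are the point.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// T is deduced as either `const double*` or `double*`. The body only reads
// draws[c][i], so both instantiations compile and behave identically.
// Hypothetical stand-in for the real computation: pooled mean of all draws.
template <typename T>
double pooled_mean(const std::vector<T>& draws,
                   const std::vector<std::size_t>& sizes) {
  assert(draws.size() == sizes.size());
  double sum = 0.0;
  std::size_t total = 0;
  for (std::size_t c = 0; c < draws.size(); ++c) {
    for (std::size_t i = 0; i < sizes[c]; ++i) sum += draws[c][i];
    total += sizes[c];
  }
  return total > 0 ? sum / total : 0.0;
}
```

The trade-off raised in review follows directly: a template is flexible for callers, but it adds a type parameter that needs `@tparam` documentation, whereas a single `const double*` signature accepts `double*` arguments only after the caller builds a vector of the const type.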

riddell-stan · 2018-07-09T20:01:31Z

That signature looks fine to me. I don't think there should be any problem calling that function with Cython from PyStan.

roualdes · 2018-07-11T16:49:11Z

10 of 11 Travis checks passed. The one failure only says that no output was received in the last 10 minutes, and I can't find more details about it. Let me know what I can do to help.

bob-carpenter · 2018-07-12T22:59:44Z

Thanks for the heads up. I restarted the job.

If you don't have credentials for kicking Travis, @seantalts should be able to set you up.

syclik

The biggest thing to change here is the implementation of chains. It shouldn't have picked up a new method. Instead, the old method should have dispatched to your new function.

syclik · 2018-07-24T14:08:21Z

src/stan/analyze/mcmc/compute_ess.hpp

+     * @param std::vector stores sizes of chains
+     * @return effective sample size for the specified parameter
+     */
+    template<typename T>


Why is this templated? Based on the issue and the implementation, it looks like the template can be removed and the input type can be double *. I don't know whether it needs to be templated for an interface like RStan to use it well, but if so, please document that.

If leaving the template parameter in, please add the template parameter documentation using @tparam. See: https://github.com/stan-dev/stan/wiki/How-to-Write-Doxygen-Doc-Comments#function-doc

Thanks for asking, as I was hoping for feedback on this. I explained my reasoning for templating this function in a comment immediately following this PR's initial comment. I've since learned that you'd prefer such details and requests for feedback in the PR's initial comment.

The short story is that CmdStan won't have to make any copies if it can use compute_effective_sample_size(std::vector<const double*>, std::vector<size_t> sizes), but everyone had agreed on compute_effective_sample_size(std::vector<double*>, std::vector<size_t> sizes), which would require CmdStan to make a copy. To satisfy all the requests from our previous discussions, I templated this function.

If RStan and PyStan can both call either signature, then I'd recommend we change the type of the first argument from std::vector<double*> to std::vector<const double*> and drop the template.

What do you think?

Thanks for bringing this up! We should use const double *. I think it should actually be const double * const. The pointers don't change and we never mutate the elements.

CmdStan should be able to handle it. We'll figure out how to make RStan and PyStan work with that signature, even if it means casting the consts away.

The const double * const is a good thought and I agree with the reasoning behind it, but I don't know how to work with that type here. Wouldn't it require initialization at declaration, since it doesn't allow modification later? For instance, I don't see a reasonable way to get the rvalues from line 616 of src/stan/mcmc/chains.hpp into the (would-be) container std::vector<const double * const> draws.
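A small sketch of the practical difficulty, assuming the draws are stored column by column (plain `std::vector<double>` columns stand in here for the Eigen storage in chains.hpp). A `std::vector<const double*>` can be filled one pointer at a time with `push_back`, and passing it by `const&` already stops the callee from reseating the pointers. By contrast, `std::vector<const double* const>` has a const element type, which is not a portable container element (the standard allocator requirements assume a non-const value type), so it cannot be built up incrementally. `collect_column_pointers` and `first_draw_sum` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical: gather one read-only pointer per chain, one at a time.
// This works because the element type `const double*` is itself assignable;
// only the pointed-to doubles are const.
std::vector<const double*> collect_column_pointers(
    const std::vector<std::vector<double>>& columns) {
  std::vector<const double*> ptrs;
  ptrs.reserve(columns.size());
  for (const auto& col : columns) {
    assert(!col.empty());  // each chain must have at least one draw
    ptrs.push_back(col.data());
  }
  return ptrs;
}

// Taking the vector by const reference means the callee can neither reseat
// the pointers nor (through `const double*`) modify the draws.
double first_draw_sum(const std::vector<const double*>& draws) {
  double sum = 0.0;
  for (const double* chain : draws) sum += chain[0];
  return sum;
}
```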

syclik · 2018-07-24T14:12:05Z

src/stan/analyze/mcmc/compute_ess.hpp

+      Eigen::VectorXd chain_mean(num_chains);
+      Eigen::VectorXd chain_var(num_chains);
+      for (size_t chain = 0; chain < num_chains; ++chain) {
+        Eigen::Map<const Eigen::Matrix<double, Eigen::Dynamic, 1> >


With C++11, we don't need to have the space between the two right angle brackets (> >). Please remove the space between the two.

syclik · 2018-07-24T14:17:10Z

src/stan/analyze/mcmc/compute_ess.hpp

@@ -0,0 +1,100 @@
+#ifndef STAN_ANALYZE_ESS_HPP


Please rename the file to match the function name: compute_effective_sample_size.hpp.

Please update the header guard to match the file structure exactly: STAN_ANALYZE_MCMC_COMPUTE_EFFECTIVE_SAMPLE_SIZE_HPP.

syclik · 2018-07-24T14:21:48Z

src/stan/analyze/mcmc/compute_ess.hpp

+    template<typename T>
+    double compute_effective_sample_size(std::vector<T> draws, size_t size) {
+      size_t num_chains = draws.size();
+      std::vector<size_t> sizes(num_chains);


This is a lot easier with C++11! This can just be:

std::vector<size_t> sizes(num_chains, size);

and you can remove the loop!
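Putting the suggestion together, the convenience overload that takes a single chain length might look like the sketch below. It assumes an overload taking one size per chain exists; `total_draws` is a hypothetical stand-in for that overload so the example compiles on its own. Note that the `(count, value)` fill constructor of `std::vector` predates C++11; it is simply the idiomatic way to avoid the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the overload that takes one size per chain;
// the real function would compute the effective sample size instead.
std::size_t total_draws(const std::vector<const double*>& draws,
                        const std::vector<std::size_t>& sizes) {
  assert(draws.size() == sizes.size());
  std::size_t total = 0;
  for (std::size_t s : sizes) total += s;
  return total;
}

// Convenience overload: every chain has the same length `size`.
// The fill constructor builds draws.size() copies of `size`, no loop needed.
std::size_t total_draws(const std::vector<const double*>& draws,
                        std::size_t size) {
  std::vector<std::size_t> sizes(draws.size(), size);
  return total_draws(draws, sizes);
}
```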

syclik · 2018-07-24T14:25:10Z

src/stan/mcmc/chains.hpp

@@ -701,6 +702,20 @@ namespace stan {
        return effective_sample_size(index(name));
      }

+      double compute_effective_sample_size(const int index) const {


So... this isn't what should happen. Remove this function signature. Replace the existing class method, effective_sample_size() (line 230), with a dispatch to the function you've written.

We'll replace each part of this class in that fashion where all the work is done outside. Then remove the class itself later.
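The requested pattern, where the class method becomes a thin wrapper that validates its argument, gathers read-only pointers to the stored draws, and forwards to a free function, might look roughly like this. `toy_chains`, its nested-vector storage, and `analyze_sum` are all hypothetical; the real `stan::mcmc::chains` keeps its draws in Eigen structures and would forward to `compute_effective_sample_size` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical free function standing in for the analysis routine:
// it only reads the draws through the pointers it is given.
double analyze_sum(const std::vector<const double*>& draws,
                   const std::vector<std::size_t>& sizes) {
  assert(draws.size() == sizes.size());
  double sum = 0.0;
  for (std::size_t c = 0; c < draws.size(); ++c)
    for (std::size_t i = 0; i < sizes[c]; ++i) sum += draws[c][i];
  return sum;
}

// Hypothetical container: samples[chain][param] holds one parameter's draws.
class toy_chains {
 public:
  explicit toy_chains(std::vector<std::vector<std::vector<double>>> samples)
      : samples_(std::move(samples)) {}

  int num_params() const {
    return samples_.empty() ? 0 : static_cast<int>(samples_[0].size());
  }

  // The method does no statistics itself: it checks the index, collects
  // read-only pointers plus lengths, and dispatches to the free function.
  double summarize(int index) const {
    if (index < 0 || index >= num_params())
      throw std::out_of_range("summarize: index out of range");
    std::vector<const double*> draws;
    std::vector<std::size_t> sizes;
    for (const auto& chain : samples_) {
      draws.push_back(chain[index].data());
      sizes.push_back(chain[index].size());
    }
    return analyze_sum(draws, sizes);
  }

 private:
  std::vector<std::vector<std::vector<double>>> samples_;
};
```

Keeping the validation in the wrapper means the existing error-condition tests for the class method can stay, while value checks move to the free function's tests.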

I cleaned up chains.hpp as requested.

Should I also remove the effective sample size tests in src/test/unit/mcmc/chain_test.cpp, so as to rely on the tests in src/test/unit/analyze/mcmc/compute_effective_sample_size.cpp?

That would be ideal! Thanks! The test in chains should just check that it's callable, not really check for the validity of the value that's returned.

If you don't want to do that, no prob.

Re: checking that it's callable. Is something like this satisfactory within src/test/unit/mcmc/chains_test.cpp?

EXPECT_NO_THROW(chains.effective_sample_size(1.0)) << "calling chain.effective_sample_size(index = 1.0).";

roualdes · 2018-07-25T12:52:42Z

Thanks for the feedback. Please give me some time to address your comments as I don’t have access to a computer until Tuesday.

syclik · 2018-07-25T13:05:42Z

No problem! Sorry for the delay on getting it reviewed. Hopefully we can keep up with the queue a little faster.


syclik · 2018-07-31T22:19:58Z

Yup! Except that should be the index 1 instead of 1.0? If there's an error condition that the call should trigger (index < 0 or greater than the max) and that's currently tested, that test should be kept around. If it's not there, it's OK not to add it.


roualdes · 2018-08-02T03:16:03Z

I'm trying to debug this error from Jenkins, but without success. The error message on Jenkins is enough for me to locate the failure, which happens somewhere after

bin/stansummary src/test/interface/matrix_output.csv

but not enough to go further. I would like to rerun the above command against this branch, but when I execute the following, the branch isn't available.

git clone --recursive https://github.com/stan-dev/cmdstan.git
cd cmdstan/stan
# git checkout feature/issue-2569-analysis-api-ess # branch not available

Is this branch not available because it's coming from my fork of stan-dev/stan? Are my attempts at git off base? Is this idea possible?

@syclik Any other tips or suggestions you might have to help me move forward would be much appreciated. Thanks.

syclik · 2018-08-02T03:19:20Z

After a series of clicks (which I can’t remember), I got to this: http://d1m1s1b1.stat.columbia.edu:8080/blue/organizations/jenkins/CmdStan/detail/downstream_tests/47/pipeline Does that help?


seantalts · 2018-08-02T03:33:09Z

When you click on the failing build from github, you get to this page. On that page you can see that the upstream tests failed. At the bottom of the page is a link to the failing upstream tests, awkwardly named "downstream tests" (the red X should draw your attention). Clicking that brings you here, which is where Daniel's link goes. Looks like the interface tests failed on all platforms (we should maybe stage those so they aren't all executing in parallel, haha).

roualdes · 2018-08-02T14:54:55Z

Thanks to you both for your quick responses.

I (previously) made it as far as you both suggest.

The error comes when the test, named CommandStansummary.functional_test__issue_342, issues the following command: bin/stansummary src/test/interface/matrix_output.csv. I don't understand why this command is failing.

As I see it, either my code isn't building correctly and thus CmdStan can't find the function stan::analyze::compute_effective_sample_size, or something is wrong with my code.

In an effort to learn why this fails, I'd like to run the same CmdStan command on my personal machine: bin/stansummary src/test/interface/matrix_output.csv. My last comment lists the git commands that did not work for me. Is there any git magic I can use to build CmdStan from this (forked) branch of stan-dev/stan?

If this thought process is off target, please don't hesitate to say so.

seantalts · 2018-08-02T15:01:17Z

I see, sorry. There are failing tests and that's why the command quit with an error.

re: testing CmdStan with your branch: CmdStan has Stan and Math as submodules. Clone your fork of CmdStan with --recursive, then cd stan; git checkout <my_branch>. If you want to work from the stan-dev CmdStan repository but your fork of Stan, you can add your fork as a remote in the stan submodule.

roualdes · 2018-08-02T15:58:15Z

That did it. Thanks @seantalts for the tip about remotes. I'll report back when I have something to offer about this error.

roualdes · 2018-08-02T21:20:37Z

The error: I had previously used size_t frequently, instead of int. This caused problems for the code chunk following the comment "Geyer's initial monotone sequence" in src/stan/analyze/mcmc/compute_effective_sample_size.hpp. According to the Google style guide, even though size_t is not disallowed, int is much preferred.

The fix: replace size_t with int in most places.

Note that neither stan-dev/stan:develop nor this PR's tests caught this. Is it worth adding a test?
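One plausible way `size_t` can break a monotone-sequence step is unsigned wraparound in a loop bound such as `max_s - 3`. The sketch below is a hypothetical illustration of that failure mode, not the code from this PR: with an `int` bound, `max_s - 3` is simply negative when the initial positive sequence stops early, so the loop never runs; with a `size_t` bound, the same expression wraps to a value near `SIZE_MAX`, and a loop using it would index far past the end of the buffer. A test with a very short or quickly decorrelating chain is the kind of input that would exercise this path. `enforce_monotone` and `unsigned_loop_bound` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Geyer's initial monotone sequence over lags [0, max_s]: if a pair
// rho[s+1] + rho[s+2] exceeds the preceding pair rho[s-1] + rho[s],
// replace both entries by half of the preceding pair's sum.
// With a signed bound, max_s - 3 is negative for small max_s,
// so the loop body never runs.
void enforce_monotone(std::vector<double>& rho, int max_s) {
  assert(max_s >= 0 && max_s < static_cast<int>(rho.size()));
  for (int s = 1; s <= max_s - 3; s += 2) {
    const double prev = rho[s - 1] + rho[s];
    if (rho[s + 1] + rho[s + 2] > prev) {
      rho[s + 1] = prev / 2;
      rho[s + 2] = prev / 2;
    }
  }
}

// The same bound in unsigned arithmetic wraps around for max_s < 3.
// A loop using it would run out of range, so here we only return the bound.
std::size_t unsigned_loop_bound(std::size_t max_s) { return max_s - 3; }
```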

@syclik Before this gets final approval, did you see my comment about the types const double * const and const double*?

syclik · 2018-08-03T14:11:24Z

Sorry -- I missed it the first time around. Here's the last comment:

The const double * const is a good thought and I agree with the reasoning behind it, but I don't know how to work with that type here. Wouldn't it require initialization at declaration, since it doesn't allow modification later? For instance, I don't see a reasonable way to get the rvalues from line 616 of src/stan/mcmc/chains.hpp into the (would-be) container std::vector<const double * const> draws.

Please ignore the const double * const suggestion. I think it's going to be practically difficult for reasons you've seen.

Thanks!

roualdes · 2018-08-03T16:19:59Z

Thanks all for your help on this. I especially appreciate your patience with me as I learn and gain experience in Stan and C++11.

bob-carpenter · 2018-08-04T10:33:31Z


We try to prioritize helping new devs get across the first merged PR hurdle. After that, you're in the same boat as the rest of us trying to keep up with the moving C++ standard and best practices; we all need to help each other on that.

first attempt compute n_eff; template based might not be desirable

a739c36

roualdes mentioned this pull request Jul 9, 2018

Analysis API: compute effective sample size #2569

Closed

syclik requested changes Jul 24, 2018


roualdes added 4 commits July 31, 2018 12:38

change file and header guard; clean up chains.hpp

329fc95

Merge branch 'develop' of https://github.com/roualdes/stan into feature/issue-2569-analysis-api-ess

fb0e982

remove old files

7c4f34e

Merge branch 'develop' of https://github.com/stan-dev/stan into feature/issue-2569-analysis-api-ess

ea5d79d

roualdes added 2 commits July 31, 2018 17:28

drop template in favor of std::vector<const double*>; merge from upstream rather than my fork

a1e80b2

fix typo in error message; mimic validation checks on overloaded chain's method effective_sample_size

a4e2f12

replace size_t with int, save when warned about comparison issues.

59ba66e

syclik approved these changes Aug 3, 2018


syclik merged commit 11d5b7b into stan-dev:develop Aug 3, 2018

roualdes mentioned this pull request Aug 9, 2018

effective_sample_size() to POD types for RStan/PyStan usage #2462

Closed

avehtari mentioned this pull request Apr 21, 2019

rhat and ess updates in analyze #2752

Open


roualdes mentioned this pull request Aug 6, 2019

Feature/issue 2752 update rhat and ess (part II) #2794

Merged

