
Don't do inverse metric decomposition every draw #2894

Open · wants to merge 26 commits into develop

Conversation

@bbbales2 (Member) commented Mar 6, 2020:

Fix for Issue #2881: the inverse metric decomposition is now computed once each time the inverse metric is set, instead of every sample.

This involved switching to setters/getters for interfacing with dense_e_point, so I made the same change for diag_e_point as well.

I also renamed all the set_metric verbiage to set_inv_metric.

I wasn't sure what tests needed to be added for this. Let me know.
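
As an editorial sketch of the approach (not the actual Stan classes; the member and accessor names below, e.g. inv_e_metric_llt_, are assumptions), the setter factors the inverse metric once and per-draw code reuses the cached factor:

```cpp
#include <Eigen/Dense>

// Sketch of a dense Euclidean point that caches the Cholesky factor of its
// inverse metric. set_inv_metric() pays the factorization once; draws then
// reuse inv_metric_llt() instead of re-decomposing every iteration.
class dense_e_point_sketch {
 public:
  explicit dense_e_point_sketch(int n)
      : inv_e_metric_(Eigen::MatrixXd::Identity(n, n)),
        inv_e_metric_llt_(inv_e_metric_) {}

  void set_inv_metric(const Eigen::MatrixXd& inv_metric) {
    inv_e_metric_ = inv_metric;
    inv_e_metric_llt_.compute(inv_e_metric_);  // factor once per metric update
  }

  const Eigen::MatrixXd& inv_metric() const { return inv_e_metric_; }

  // Per-draw code (e.g. momentum sampling) reads the cached factor.
  const Eigen::LLT<Eigen::MatrixXd>& inv_metric_llt() const {
    return inv_e_metric_llt_;
  }

 private:
  Eigen::MatrixXd inv_e_metric_;
  Eigen::LLT<Eigen::MatrixXd> inv_e_metric_llt_;
};
```

Whether the cached factor is worth the extra $\mathcal{O}(N^{2})$ storage is exactly the tradeoff discussed below.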

Submission Checklist

  • Run unit tests: ./runTests.py src/test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Columbia University

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@SteveBronder (Collaborator) left a comment:

Couple quick suggestions and one question!

Review threads (outdated, resolved):

  • src/stan/mcmc/hmc/hamiltonians/dense_e_point.hpp (3 threads)
  • src/stan/mcmc/hmc/hamiltonians/diag_e_metric.hpp
  • src/stan/mcmc/hmc/hamiltonians/diag_e_point.hpp
  • src/stan/mcmc/hmc/nuts/adapt_dense_e_nuts.hpp
@betanalpha (Contributor) commented:

The current design is intentional.

The Cholesky is needed only once per transition, which is a relatively small cost compared to the many gradient evaluations needed within each transition. Saving the Cholesky decomposition introduces an additional $\mathcal{O}(N^{2})$ memory burden, which is much more than the $\mathcal{O}(N)$ burden everywhere else, and becomes problematic for sufficiently large models. Without any explicit profiling demonstrating that the Cholesky is a substantial cost for typical models I don't see any strong motivation for the change. I will add this point to the original issue.

At the same time, the members of ps_point and its derived classes are intentionally left public to avoid any code overhead for accessors and mutators. Because these classes are not exposed through the algorithm API and everything is public, we decided that there isn't any need for encapsulation. If encapsulation were added, then the accessors and mutators would need to be unit tested as is done elsewhere in the library.
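
For a rough sense of the scale involved (an editorial aside, not a figure from the thread): a dense $N \times N$ matrix of doubles occupies $8N^{2}$ bytes, so at $N = 10{,}000$ parameters the inverse metric alone is about 800 MB, and caching its Cholesky factor adds roughly another 800 MB, whereas each $\mathcal{O}(N)$ quantity stored elsewhere is on the order of 80 kB.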

@SteveBronder (Collaborator) left a comment:

LGTM! Though I'd like to wait to merge until @betanalpha agrees the test example is satisfying.

@stan-buildbot (Contributor) commented:

| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.19 | 3.21 | 0.99 | -0.58% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.0 | -0.23% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.09 | 1.27 | 21.05% faster |
| gp_regr/gp_regr.stan | 0.17 | 0.18 | 0.99 | -1.31% slower |
| irt_2pl/irt_2pl.stan | 5.69 | 5.8 | 0.98 | -1.89% slower |
| performance.compilation | 91.35 | 88.51 | 1.03 | 3.11% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.46 | 9.02 | 0.94 | -6.69% slower |
| pkpd/one_comp_mm_elim_abs.stan | 29.44 | 29.83 | 0.99 | -1.33% slower |
| sir/sir.stan | 133.71 | 134.61 | 0.99 | -0.67% slower |
| gp_regr/gen_gp_data.stan | 0.04 | 0.05 | 0.98 | -2.21% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.96 | 3.08 | 0.96 | -3.97% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.41 | 0.97 | -3.18% slower |
| arK/arK.stan | 1.81 | 2.67 | 0.68 | -47.53% slower |
| arma/arma.stan | 0.63 | 0.86 | 0.74 | -36.05% slower |
| garch/garch.stan | 0.7 | 0.65 | 1.07 | 6.43% faster |

Mean result: 0.971182373984

Jenkins Console Log · Blue Ocean · Commit hash: a67e9e5

Machine information: Mac OS X 10.11.6 (build 15G22010)

CPU: Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++/Clang: Apple LLVM version 7.0.2 (clang-700.1.81); configured with --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1; target x86_64-apple-darwin15.6.0; thread model posix

@stan-buildbot (Contributor) commented:

| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.12 | 3.17 | 0.98 | -1.65% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.0 | 0.31% faster |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 0.99 | -1.13% slower |
| gp_regr/gp_regr.stan | 0.17 | 0.17 | 1.01 | 0.54% faster |
| irt_2pl/irt_2pl.stan | 5.7 | 5.7 | 1.0 | -0.0% slower |
| performance.compilation | 90.54 | 88.68 | 1.02 | 2.06% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.45 | 8.42 | 1.0 | 0.27% faster |
| pkpd/one_comp_mm_elim_abs.stan | 30.54 | 28.12 | 1.09 | 7.91% faster |
| sir/sir.stan | 128.64 | 131.17 | 0.98 | -1.97% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.04 | 1.07 | 6.27% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.96 | 2.95 | 1.0 | 0.43% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.37 | 1.08 | 7.34% faster |
| arK/arK.stan | 1.8 | 1.78 | 1.01 | 1.07% faster |
| arma/arma.stan | 0.61 | 0.74 | 0.83 | -20.48% slower |
| garch/garch.stan | 0.67 | 0.55 | 1.22 | 17.77% faster |

Mean result: 1.01859530072

Jenkins Console Log · Blue Ocean · Commit hash: a8e7500

@SteveBronder (Collaborator) left a comment:

Couple little comments

Review threads (outdated, resolved):

  • src/stan/mcmc/hmc/hamiltonians/dense_e_point.hpp
  • src/stan/mcmc/hmc/hamiltonians/diag_e_point.hpp
  • src/stan/mcmc/hmc/hamiltonians/diag_e_metric.hpp
```diff
@@ -31,10 +31,15 @@ class adapt_diag_e_static_uniform
     this->stepsize_adaptation_.learn_stepsize(this->nom_epsilon_,
                                               s.accept_stat());

     bool update = this->var_adaptation_.learn_variance(this->z_.inv_e_metric_,
                                                         this->z_.q);
     Eigen::VectorXd inv_metric;
```
Collaborator (on the new Eigen::VectorXd inv_metric; line):

Do you know the size of this when you create it?

Member (Author):

I could, but it gets resized in learn_variance. Should I still set the size when declaring it?

Collaborator:

If you declare the size there, then the resize is a no-op since it's already the correct size. But I don't think it's that big of a deal either way.

Member (Author):

I just left it off. It's kind of a weird thing happening here anyway (learn_variance is effectively returning two things).
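
For reference, a minimal editorial sketch of the resize point above (learn_variance_sketch is a hypothetical stand-in, not the Stan function): Eigen's resize() does not reallocate when the vector already has the requested size, so pre-sizing the declaration only saves a single allocation.

```cpp
#include <Eigen/Dense>
#include <iostream>

// Hypothetical stand-in for var_adaptation_.learn_variance(): it resizes the
// output vector itself, so pre-sizing by the caller is optional.
bool learn_variance_sketch(Eigen::VectorXd& inv_metric, int n) {
  inv_metric.resize(n);  // no reallocation if inv_metric.size() == n already
  inv_metric.setOnes();  // placeholder for the learned variances
  return true;           // pretend the adaptation window just ended
}

int main() {
  Eigen::VectorXd inv_metric;  // declared empty, as in the diff above
  bool update = learn_variance_sketch(inv_metric, 10);
  if (update) {
    std::cout << "learned inv metric of size " << inv_metric.size() << "\n";
  }
  return 0;
}
```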

@stan-buildbot (Contributor) commented:

| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.54 | 3.62 | 0.98 | -2.16% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.05 | 4.57% faster |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 1.0 | 0.18% faster |
| gp_regr/gp_regr.stan | 0.17 | 0.17 | 0.98 | -1.9% slower |
| irt_2pl/irt_2pl.stan | 5.8 | 5.74 | 1.01 | 0.99% faster |
| performance.compilation | 86.89 | 86.1 | 1.01 | 0.91% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.55 | 8.44 | 1.01 | 1.28% faster |
| pkpd/one_comp_mm_elim_abs.stan | 30.82 | 29.5 | 1.04 | 4.27% faster |
| sir/sir.stan | 129.2 | 135.34 | 0.95 | -4.75% slower |
| gp_regr/gen_gp_data.stan | 0.05 | 0.04 | 1.03 | 3.27% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.0 | 2.93 | 1.02 | 2.41% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.37 | 0.4 | 0.91 | -9.98% slower |
| arK/arK.stan | 2.47 | 1.8 | 1.37 | 27.04% faster |
| arma/arma.stan | 0.6 | 0.6 | 1.0 | -0.41% slower |
| garch/garch.stan | 0.76 | 0.66 | 1.16 | 13.63% faster |

Mean result: 1.03555431436

Jenkins Console Log · Blue Ocean · Commit hash: c4f30f6

@stan-buildbot (Contributor) commented:

| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.46 | 3.41 | 1.01 | 1.43% faster |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.99 | -1.26% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 1.0 | -0.5% slower |
| gp_regr/gp_regr.stan | 0.15 | 0.15 | 1.01 | 1.42% faster |
| irt_2pl/irt_2pl.stan | 5.24 | 5.29 | 0.99 | -1.09% slower |
| performance.compilation | 89.84 | 89.14 | 1.01 | 0.79% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 7.81 | 7.78 | 1.0 | 0.36% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.7 | 28.98 | 1.02 | 2.43% faster |
| sir/sir.stan | 129.8 | 131.11 | 0.99 | -1.01% slower |
| gp_regr/gen_gp_data.stan | 0.04 | 0.04 | 0.98 | -2.28% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.01 | 3.0 | 1.0 | 0.27% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.39 | 0.39 | 1.0 | 0.15% faster |
| arK/arK.stan | 2.48 | 1.8 | 1.38 | 27.42% faster |
| arma/arma.stan | 0.72 | 0.61 | 1.17 | 14.77% faster |
| garch/garch.stan | 0.65 | 0.59 | 1.09 | 8.49% faster |

Mean result: 1.04354280135

Jenkins Console Log · Blue Ocean · Commit hash: c4f30f6

@SteveBronder (Collaborator) commented:

> The Cholesky is needed only once per transition, which is a relatively small cost compared to the many gradient evaluations needed within each transition. Saving the Cholesky decomposition introduces an additional $\mathcal{O}(N^{2})$ memory burden, which is much more than the $\mathcal{O}(N)$ burden everywhere else, and becomes problematic for sufficiently large models. Without any explicit profiling demonstrating that the Cholesky is a substantial cost for typical models I don't see any strong motivation for the change. I will add this point to the original issue.

Thinking about this again: don't you still pay the O(N^2) memory burden when computing the Cholesky, it's just temporary data? If a program failed because N is too large, it would fail either way, wouldn't it?

@bbbales2 (Member, Author) commented Mar 9, 2021:

> don't you pay the O(N^2) memory burden still

Oh yeah, the temporary would still be O(N^2) since you compute the Cholesky at one point or another. Good point. I'll add the forwarding real quick.
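
My reading of "the forwarding" (an assumption on my part, not spelled out in the thread) is a forwarding setter, so the caller's matrix can be moved into the point rather than copied before the one-time factorization; a minimal sketch extending the earlier one:

```cpp
#include <Eigen/Dense>
#include <utility>

// Editorial sketch only; names are illustrative, not the Stan API.
class dense_e_point_sketch {
 public:
  // Perfect-forwarding setter: moves when given an rvalue, copies otherwise,
  // then factors the inverse metric once instead of once per draw.
  template <typename EigMat>
  void set_inv_metric(EigMat&& inv_metric) {
    inv_e_metric_ = std::forward<EigMat>(inv_metric);
    inv_e_metric_llt_.compute(inv_e_metric_);
  }

 private:
  Eigen::MatrixXd inv_e_metric_;
  Eigen::LLT<Eigen::MatrixXd> inv_e_metric_llt_;
};

// Usage: a temporary built during adaptation is moved in, not copied.
// dense_e_point_sketch z;
// Eigen::MatrixXd m = Eigen::MatrixXd::Identity(4, 4);
// z.set_inv_metric(std::move(m));
```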

@stan-buildbot (Contributor) commented:

| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.44 | 3.44 | 1.0 | -0.11% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.97 | -2.88% slower |
| eight_schools/eight_schools.stan | 0.11 | 0.11 | 1.01 | 0.6% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 0.98 | -2.41% slower |
| irt_2pl/irt_2pl.stan | 5.23 | 5.34 | 0.98 | -2.11% slower |
| performance.compilation | 90.25 | 88.92 | 1.01 | 1.47% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.92 | 8.86 | 1.01 | 0.64% faster |
| pkpd/one_comp_mm_elim_abs.stan | 29.33 | 30.56 | 0.96 | -4.18% slower |
| sir/sir.stan | 130.71 | 130.01 | 1.01 | 0.54% faster |
| gp_regr/gen_gp_data.stan | 0.03 | 0.04 | 0.91 | -10.4% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.12 | 3.12 | 1.0 | -0.03% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.37 | 0.39 | 0.96 | -4.28% slower |
| arK/arK.stan | 1.9 | 1.88 | 1.01 | 0.88% faster |
| arma/arma.stan | 0.63 | 0.95 | 0.67 | -49.87% slower |
| garch/garch.stan | 0.53 | 0.64 | 0.83 | -21.14% slower |

Mean result: 0.952369574099

Jenkins Console Log · Blue Ocean · Commit hash: 7072224

@stan-buildbot (Contributor) commented:

| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
| --- | --- | --- | --- | --- |
| gp_pois_regr/gp_pois_regr.stan | 3.36 | 3.38 | 1.0 | -0.49% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.99 | -1.25% slower |
| eight_schools/eight_schools.stan | 0.12 | 0.12 | 1.01 | 1.32% faster |
| gp_regr/gp_regr.stan | 0.16 | 0.16 | 1.01 | 0.54% faster |
| irt_2pl/irt_2pl.stan | 6.03 | 5.95 | 1.01 | 1.22% faster |
| performance.compilation | 91.54 | 89.45 | 1.02 | 2.28% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 9.53 | 8.6 | 1.11 | 9.67% faster |
| pkpd/one_comp_mm_elim_abs.stan | 28.75 | 30.51 | 0.94 | -6.13% slower |
| sir/sir.stan | 122.72 | 125.39 | 0.98 | -2.18% slower |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.0 | -0.44% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.03 | 3.05 | 1.0 | -0.48% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.4 | 0.38 | 1.03 | 2.62% faster |
| arK/arK.stan | 1.87 | 2.52 | 0.74 | -34.96% slower |
| arma/arma.stan | 0.87 | 0.78 | 1.11 | 9.59% faster |
| garch/garch.stan | 0.62 | 0.73 | 0.85 | -17.14% slower |

Mean result: 0.985575872844

Jenkins Console Log · Blue Ocean · Commit hash: 9093b0f
