Fix size-zero mass matrix #3071

nhuurre · 2021-10-20T14:35:27Z

Submission Checklist

Run unit tests: ./runTests.py src/test/unit
Run cpplint: make cpplint
Declare copyright holder and open-source license: see below

Summary

Fixes a couple of problems with size-zero dense or diagonal mass matrix in (nonadaptive) HMC samplers. (The unit_e metric was already fine.)
~~Does not fix step size adaptation so zero-parameter models still cannot use adaptive samplers.~~ Adaptive samplers also work. Step size adaptation simply sets the step size to NaN; the step size has no effect anyway.

Intended Effect

Allow HMC to run even if a model has no parameters.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Niko Huurre

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

stan-buildbot · 2021-10-20T15:23:13Z

Name	Old Result	New Result	Ratio	Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan	3.58	3.54	1.01	1.09% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan	0.02	0.02	0.97	-2.71% slower
eight_schools/eight_schools.stan	0.09	0.09	0.99	-1.29% slower
gp_regr/gp_regr.stan	0.15	0.14	1.04	3.53% faster
irt_2pl/irt_2pl.stan	5.09	5.15	0.99	-1.24% slower
performance.compilation	91.7	90.53	1.01	1.27% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan	8.7	8.05	1.08	7.45% faster
pkpd/one_comp_mm_elim_abs.stan	30.59	29.83	1.03	2.5% faster
sir/sir.stan	119.65	118.55	1.01	0.92% faster
gp_regr/gen_gp_data.stan	0.03	0.03	1.04	4.13% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan	2.97	2.97	1.0	0.19% faster
pkpd/sim_one_comp_mm_elim_abs.stan	0.38	0.37	1.02	1.73% faster
arK/arK.stan	2.8	2.09	1.34	25.33% faster
arma/arma.stan	0.27	0.23	1.21	17.05% faster
garch/garch.stan	0.65	0.57	1.13	11.33% faster
Mean result: 1.05732252726

Commit hash: e138b71

Machine information

ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

betanalpha · 2021-11-01T16:11:44Z

As discussed in stan-dev/cmdstan#1054 the zero-dimensional boundary is not well-defined. The changes here are fine if assuming an empty-set boundary behavior but not a single element or multiple finite element set boundary behavior (in which case there would be one element to the metric in each case with a somewhat arbitrary value). My strong preference is to avoid this ambiguity by defining the HMC implementations only for N > 0.

nhuurre · 2021-11-01T17:14:58Z

I think you mean "as discussed in stan-dev/cmdstan#1030 (comment)"

Anyway, the only assumption you need to make is what is manifest in the code: the point in the unconstrained parameter space is represented as Eigen::VectorXd. That means the only possible zero-dimensional space is the one with a single point. And the current sampler code handles that correctly (i.e. an expensive no-op).

rok-cesnovar · 2021-11-09T11:11:45Z

How do we resolve this one? To me, the changes look good, but I am not confident in my knowledge of the MCMC code.

With the no-parameters hack in CmdStan reversed and without this PR, models like this one https://github.com/stan-dev/stat_comp_benchmarks/blob/master/benchmarks/gp_regr/gen_gp_data.stan error in an ugly way if run with the default CmdStan arguments (this is a model we also run in our test suite). The error is:

https://github.com/stan-dev/stat_comp_benchmarks/blob/master/benchmarks/gp_regr/gen_gp_data.stan
Iteration:  300 / 2000 [ 15%]  (Warmup)
Iteration:  400 / 2000 [ 20%]  (Warmup)
Iteration:  500 / 2000 [ 25%]  (Warmup)
Iteration:  600 / 2000 [ 30%]  (Warmup)
Iteration:  700 / 2000 [ 35%]  (Warmup)
Iteration:  800 / 2000 [ 40%]  (Warmup)
Iteration:  900 / 2000 [ 45%]  (Warmup)
Iteration: 1000 / 2000 [ 50%]  (Warmup)

Assertion failed: (index >= 0 && index < size()), function operator(), file stan/lib/stan_math/lib/eigen_3.3.9/Eigen/src/Core/DenseCoeffsBase.h, line 425.

We all agree the current CmdStan auto switch to fixed_param is not great, but this error is also bad.

I guess the two options are:
a) this PR
b) stop CmdStan in case of no parameters and HMC samplers?

nhuurre · 2021-11-09T11:39:42Z

How do we resolve this one?

Ideally, we'll wait for @betanalpha to figure it out. There's still plenty of time before the next release, I'm sure he can do it.

b) stop CmdStan in case of no parameters and HMC samplers?

That's what the behaviour used to be. It was annoying (stan-dev/cmdstan#953) for IMHO no good reason.

We also have the option
c) Change the services to invoke the fixed_param sampler through services::util::run_sampler(), like it does all the other samplers. Then the output should be uniform enough for the interfaces to parse even if the sampler info is technically wrong.

betanalpha · 2021-11-10T18:04:08Z

I think you mean "as discussed in stan-dev/cmdstan#1030 (comment)"

Yes, thanks.

Anyway, the only assumption you need to make is what is manifest in the code: the point in the unconstrained parameter space is represented as Eigen::VectorXd. That means the only possible zero-dimensional space is the one with a single point. And the current sampler code handles that correctly (i.e. an expensive no-op).

That’s inconsistent with the code. As mentioned in the other thread there are multiple ways to try to define R^{0}, for example as _any_ countable subset of R^{1}; that includes a single point but also multiple points. Regardless all of those definitions would be represented by an Eigen::VectorXd with one element not zero elements. The difference in going from R^{1} to these definitions of R^{0} is not the size of the Eigen::VectorXd but rather the values that the lone element can take. All of these definitions of R^{0} do not define manifolds so there’s also no way to well-pose a metric, let alone an inverse metric; as I have said in the other thread the Hamiltonian Monte Carlo implementation isn’t well-defined in this limit. That said if one defined the limit to ensure that the adaptation procedure is well-defined in the limit, and the inverse metric is set to the empirical variance, then the inverse metric would still take values in R and hence be defined by a one-element array. In the particular case where R^{0} is defined as a single point the empirical variance would always equal that (arbitrarily) chosen point which should still be printed. Defining R^{0} as a single point is one possible definition of many, but that definition would not result in empty inverse metrics.

betanalpha · 2021-11-10T18:06:01Z

b) stop CmdStan in case of no parameters and HMC samplers? That's what the behaviour used to be. It was annoying stan-dev/cmdstan#953 for IMHO no good reason.

I continue to strongly prefer this original behavior. It’s not annoying when code doesn’t run in undefined circumstances; it’s safer.

nhuurre · 2021-11-10T22:16:12Z

As mentioned in the other thread there are multiple ways to try to define R^{0}, for example as any countable subset of R^{1}

As mentioned in the other thread R^N is defined (up to isomorphism) as the N-dimensional real vector space.
General N-dimensional topological manifolds are usually defined to be topological spaces that are locally homeomorphic to R^N so you need to know what R^N is (up to homeomorphism) before you can even talk about arbitrary N-manifolds.

Finite subsets of R^1 (or R^2, or R^10) certainly are 0-dimensional spaces but arbitrary countable subsets are not. For instance, Q is a countable subset of R that, while totally disconnected, is not discrete.
Or if you're going to ignore the induced subspace topology and just impose the discrete topology then why limit it to countable subsets?
Or maybe your idea of "zero dimensional" means Hausdorff dimension? Not a topological invariant but ok...

Regardless all of those definitions would be represented by an Eigen::VectorXd with one element not zero elements.

It feels like you're arguing against not the actual code changes here but some convoluted misfeature you've invented in your head. There is no attempt to add one-element representation for empty models.

The unconstrained parameters are realized as an Eigen::VectorXd with model.num_params_r() elements.

stan/src/stan/mcmc/hmc/base_hmc.hpp

Lines 31 to 33 in 233d9bf

    
base_hmc(const Model& model, BaseRNG& rng)
    : base_mcmc(),
      z_(model.num_params_r()),

stan/src/stan/mcmc/hmc/hamiltonians/ps_point.hpp

Lines 17 to 21 in 233d9bf

    
class ps_point {
 public:
  explicit ps_point(int n) : q(n), p(n), g(n) {}
  Eigen::VectorXd q;

The sampler has always worked correctly when model.num_params_r() >= 1 and no changes are proposed for that case.
But the Stan language allows you to define a model with model.num_params_r() == 0.
Examples:

parameters {
}

parameters {
  vector[0] x;
}

parameters {
  simplex[1] x;
}

These models require the fixed_param sampler because the HMC code assumes that the unconstrained parameter vector has at least one element.
This PR only removes that assumption. The change affects only models with model.num_params_r() == 0. These models still have size zero unconstrained parameters. The first two models also produce size-zero output, i.e. only internal sampler parameters are printed. The third model produces a single output parameter x[1] whose value is defined by the simplex_constrain transform.
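
To make the simplex[1] case concrete, here is a minimal C++ sketch of a stick-breaking map from K-1 unconstrained reals to a point on the K-simplex. It is an illustration under stated assumptions only: the name simplex_from_unconstrained is hypothetical, std::vector stands in for Eigen::VectorXd, and this is not claimed to be the exact code behind simplex_constrain.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stick-breaking sketch: maps K-1 unconstrained reals to a point
// on the K-simplex. Illustrative only; not Stan's actual transform code.
std::vector<double> simplex_from_unconstrained(const std::vector<double>& y) {
  const std::size_t km1 = y.size();  // number of unconstrained coordinates
  std::vector<double> x(km1 + 1);    // constrained output has one more entry
  double stick = 1.0;                // probability mass not yet assigned
  for (std::size_t k = 0; k < km1; ++k) {
    // The log offset makes y == 0 map to the uniform simplex.
    const double shift = std::log(static_cast<double>(km1 - k));
    const double z = 1.0 / (1.0 + std::exp(-(y[k] - shift)));
    x[k] = stick * z;
    stick -= x[k];
  }
  x[km1] = stick;  // the last component takes whatever mass is left
  return x;
}
```

With an empty input the loop never runs and the result is the one-element vector {1.0}: zero unconstrained coordinates, one constrained output value.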

Another "zero-dimensional" model you may want to consider is

parameters {
  unit_vector[1] x;
}

The constrained parameter space is zero-dimensional but, because the unit_vector transform is many-to-one, this model actually has num_params_r() == 1 and consequently the PR makes no difference for this model. Nevertheless, it may be reassuring to know that sampling fails:

Exception: Found dimension size one in unit vector declaration. One-dimensional unit vector is discrete
but the target distribution must be continuous. variable=x; dimension size expression=1
(in 'unit_vec.stan', line 2, column 14 to column 15)
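
The many-to-one nature of the unit_vector transform can be sketched in a few lines of C++. This assumes the transform simply normalises the unconstrained vector; the function name unit_vector_from_unconstrained is hypothetical and std::vector stands in for Eigen::VectorXd.

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch: project an unconstrained vector onto the unit sphere
// by dividing by its Euclidean norm. The map is undefined at y == 0.
std::vector<double> unit_vector_from_unconstrained(const std::vector<double>& y) {
  double norm_sq = 0.0;
  for (double v : y) norm_sq += v * v;
  const double norm = std::sqrt(norm_sq);
  std::vector<double> x;
  x.reserve(y.size());
  for (double v : y) x.push_back(v / norm);
  return x;
}
```

In one dimension the input still has one coordinate, but every positive input lands on +1 and every negative input on -1, so the constrained space is the discrete two-point set that the error message complains about.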

nhuurre · 2021-12-16T11:47:12Z

@betanalpha
I'd like to get this resolved before the next release, January 17th. If you don't have time to review this can you recommend someone else?

Also, if you insist that stan-dev/cmdstan#992 should be reverted instead then I have another question for you.
That PR removed an error message that said

Must use algorithm=fixed_param for model that has no parameters.

Do you think this message is adequate? As far as I can tell, your argument about alternative topologies for R^0 is a problem not only for HMC but also for sampling with fixed_param. It then seems to me that unconditional advice to use fixed_param is not appropriate. How should the user verify that fixed_param sampler works as desired for their model?

mitzimorris

lgtm!

betanalpha · 2021-12-21T16:45:52Z

@betanalpha I'd like to get this resolved before the next release, January 17th. If you don't have time to review this can you recommend someone else?

Apologies for the delay and thanks for your patience in my reply. As I said before the code itself is fine given the choice of behavior, but I still disagree with the behavior.

Right now there are lots of interacting aspects of the sampler, not only in terms of transitions but also adaptation and output. All of these are currently defined consistently when there is at least one variable defined in the `parameters` block and the model configuration space becomes R^{N} for N >= 1. This includes

- Adapting the step size to the Hamiltonian error through the heuristic “acceptance” statistic. Because proper log probability density functions cannot be constant on R^{N} this is well-defined unless the target is not normalizable, hence the current warnings.
- Adapting the inverse metric elements to the component variances of the model configuration space.
- Transitioning with discretized Hamiltonian dynamics.
- Outputting the current state after each iteration.

I don’t disagree that each of these aspects can be extended to R^{0} and the empty set in various ways, but I do disagree that there is any canonical/non-confusing extension.

For example what is the inverse metric for R^{0}? Modeling R^{0} as a single point the metric becomes trivial, but in the sense that it can take on _any_ value, not no value. In particular the variance is still well-defined and so that adaptation heuristic can still be used. At the same time since there is still a state (an arbitrary point in R) it should be output after every iteration; in order to implement this a default definition/parameterization/name of R^{0} would be required.

In my opinion having no metric or state is compatible not with the modeling configuration space being R^{0} but rather the empty set. Indeed I think that most people interpret a program with an empty parameter block as defining an empty model configuration space.

Moreover the step size adaptation can be conditionally modified to work with R^{N} for N >= 0, but then the adaptation algorithm has an edge case which complicates expectations for its behavior. In my opinion a much more robust design is to have a method that behaves uniformly for all acceptable inputs and appropriately errors out for non-acceptable inputs. This very much goes against R idioms, but it’s also those idioms that cause so much chaos.

To summarize I think that there are three questions here:

1. What is the mathematical interpretation of an empty parameter block?
2. Can the standard HMC samplers be extended to that circumstance?
3. Should the standard HMC samplers be extended to that circumstance?

I think that much of this discussion is due to there being multiple answers to (1) depending on one’s preferred formalisms or even more casual intuitions. Because of that ambiguity I think that choosing any one definition will be confusing to at least some people.

I believe that the answer to (2) is yes for just about every possible definition of (1), but I also think that the answer to (3) should be no. Firstly there’s the ambiguity in (1) which would translate to ambiguity in the implementation of the sampler. Secondly there’s the resulting conditional logic which would make the API more complete and should also require a bunch more testing of all of the sampler components in those edge cases.

In my mind by far the more robust option is to just restrict the use of HMC samplers to programs with at least one variable in the parameters block where all of these ambiguities vanish, and require users to use another method if they want to run programs with an empty parameters block.
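
For readers following the variance-adaptation point, the two representations under discussion can be compared with a small C++ sketch. Everything here is illustrative: componentwise_variance is a hypothetical helper, std::vector stands in for Eigen::VectorXd, and the code takes no side on which representation is the right one.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: componentwise sample variance over a set of draws.
// Each draw is one point of the configuration space; all draws share a size.
std::vector<double> componentwise_variance(
    const std::vector<std::vector<double>>& draws) {
  if (draws.size() < 2) {
    throw std::invalid_argument("need at least two draws");
  }
  const std::size_t dim = draws.front().size();
  const double n = static_cast<double>(draws.size());
  std::vector<double> mean(dim, 0.0);
  std::vector<double> var(dim, 0.0);
  for (const auto& d : draws) {
    for (std::size_t i = 0; i < dim; ++i) mean[i] += d[i] / n;
  }
  for (const auto& d : draws) {
    for (std::size_t i = 0; i < dim; ++i) {
      var[i] += (d[i] - mean[i]) * (d[i] - mean[i]) / (n - 1.0);
    }
  }
  return var;
}
```

If each draw is an empty vector, the result is an empty vector (no metric elements). If each draw is a one-element vector holding the same fixed label, the result is a one-element vector whose entry is 0.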

Also, if you insist that stan-dev/cmdstan#992 should be reverted instead then I have another question for you. That PR removed an error message that said Must use algorithm=fixed_param for model that has no parameters. Do you think this message is adequate? As far as I can tell, your argument about alternative topologies for R^0 is a problem not only for HMC but also for sampling with fixed_param. It then seems to me that unconditional advice to use fixed_param is not appropriate. How should the user verify that fixed_param sampler works as desired for their model?

I very much agree that all our warnings could use more nuance and precision! Something like “Hamiltonian Monte Carlo has not been implemented for programs with no parameters. The ‘fixed_param’ sampler will accept this program, evaluating all other blocks.” would be much better.

nhuurre · 2021-12-22T14:16:04Z

“Hamiltonian Monte Carlo has not been implemented for programs with no parameters.
The ‘fixed_param’ sampler will accept this program, evaluating all other blocks.”

There's a potentially confusing difference between "no constrained parameters" and "no unconstrained parameters". A model with simplex[1] in the parameters block arguably has a parameter (and fixed_param sampler does evaluate the block) but has no unconstrained parameters and therefore would not implement HMC.

For example what is the inverse metric for R^{0}?

It's a 0-by-0 matrix, ie. the matrix with no elements. Indeed, this PR is titled "Fix size-zero mass matrix" because originally the only change was to fix reading and printing a metric with no elements; and currently the only other change is that (in zero dimensions) stepsize initialization blows up the stepsize immediately without erroring. (The sampler still tries to adapt the stepsize as usual but that of course has no effect anymore. In positive dimensions, undefined stepsize would make every transition an instant divergence but in zero dimensions the integrator doesn't even notice.)
No changes to the inference algorithm are necessary; the expected behaviour follows naturally from Eigen's definition of zero-dimensional vector arithmetic and I take that to mean this extension to zero-dimensional HMC is reasonably "canonical."

since there is still a state (an arbitrary point in R) it should be output after every iteration; in order to implement this a default definition/parameterization/name of R^{0} would be required.

I think this is the crux of your confusion. You're thinking of R^0 as a subspace of R^1 and that does lead to ambiguity. But a similar ambiguity afflicts R^1 if you think of it only as an arbitrary subspace of R^2, and yet obviously that ambiguity is irrelevant. I'm saying that R^0, like R^1, can just be its own thing without the need to be embedded in some larger space.
The state/output after every iteration is empty but ... see the next point:

In my opinion having no metric or state is compatible not with the modeling configuration space being R^{0} but rather the empty set. Indeed I think that most people interpret a program with an empty parameter block as defining an empty model configuration space.

There's an important difference between

1. the configuration space is the empty set (ie contains nothing)
2. the configuration space contains the empty set (as a member, not a subset)

The configuration space is the set of all possible draws; a single draw from a model is a finite "set" of numbers. (a singleton for R^1, a pair for R^2, a triple for R^3, and, yes, the empty set for R^0)

In case (1) the configuration space has no members and so no draw is allowed. If we try to write a prior sampling model it must be

generated quantities {
    reject("No valid sample can exist.");
}

In case (2) the empty set is a valid draw and sampling from the prior is possible:

generated quantities {
    row_vector[0] x = []; // a sample exists, but is empty
}

The situation with an empty block is identical to (2), not (1). (If it was (1) then any model without a generated quantities block would be ill-formed.)

Alternatively, let's note that concatenating two models creates a configuration space that is the Cartesian product of the configuration spaces of the (sub)models.
The product of empty set with any set is always empty; and every set is isomorphic to its product with a singleton set. As theorems about configuration spaces these translate to observations about models: a single reject() anywhere in the model rejects the whole model; but adding an empty list of statements does not change the model.
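
The two product facts can be checked on toy finite sets with a short C++ sketch. Real configuration spaces are of course not finite lists; the aliases Draw and ConfigSpace and the function product are hypothetical names used only for illustration.

```cpp
#include <vector>

using Draw = std::vector<double>;       // one draw: a tuple of numbers
using ConfigSpace = std::vector<Draw>;  // a toy, finite set of draws

// Cartesian product of two finite configuration spaces: every draw of `a`
// concatenated with every draw of `b`.
ConfigSpace product(const ConfigSpace& a, const ConfigSpace& b) {
  ConfigSpace out;
  for (const Draw& x : a) {
    for (const Draw& y : b) {
      Draw xy = x;
      xy.insert(xy.end(), y.begin(), y.end());
      out.push_back(xy);
    }
  }
  return out;
}
```

Taking the product with a space that has no draws yields no draws at all, while taking the product with the space whose only draw is the empty tuple returns the other space unchanged.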

betanalpha · 2021-12-24T17:25:13Z

There's a potentially confusing difference between "no constrained parameters" and "no unconstrained parameters". A model with simplex[1] in the parameters block arguably has a parameter (and fixed_param sampler does evaluate the block) but has no unconstrained parameters and therefore would not implement HMC.

Yeah, although the simplex and correlation matrix types are the only ones with this property, right?

It's a 0-by-0 matrix, ie. the matrix with no elements. Indeed, this PR is titled "Fix size-zero mass matrix" because originally the only change was to fix reading and printing a metric with no elements; and currently the only other change is that (in zero dimensions) stepsize initialization blows up the stepsize immediately without erroring.

This is one place where I disagree depending on the interpretation of “an empty parameter block”. Interpreting “an empty parameter block” as R^{0} with a single value implies that one still needs a variable to hold that single value. A well-defined metric would map two copies of that variable to 0, and the matrix representation would reduce to a scalar real-valued variable that maps the singleton to zero under multiplication, i.e. a 1x1 matrix whose sole element is always zero. The corresponding inverse metric would be a 1x1 matrix whose sole element is 1/0; this clashes with the variance heuristic but again this is due to the actual HMC implementation being well-defined for only R^{N >= 1}. It’s the same logic as a Nx1 array representing both a vector and a matrix. I agree that this behavior is appropriate when interpreting “an empty parameter block” as the empty set.

(The sampler still tries to adapt the stepsize as usual but that of course has no effect anymore. In positive dimensions, undefined stepsize would make every transition an instant divergence but in zero dimensions the integrator doesn't even notice.)

Right, although this adaptation behavior is itself due to the motivating heuristic (constant densities are non-normalizable) that holds for only R^{N >= 1}. This is one of the issues I have when trying to extend an HMC implementation beyond R^{N >= 1}. Each part can be extended in different ways, leading to multiple extensions where the various parts of the implementation (adaptation, transition, input/output, etc) are no longer related to each other in the same way that they once were. Those are, in my mind, different sampler implementations entirely.

since there is still a state (an arbitrary point in R) it should be output after every iteration; in order to implement this a default definition/parameterization/name of R^{0} would be required. I think this is the crux of your confusion. You're thinking of R^0 as a subspace of R^1 and that does lead to ambiguity. But a similar ambiguity afflicts R^1 if you think of it only as an arbitrary subspace of R^2, and yet obviously that ambiguity is irrelevant. I'm saying that R^0, like R^1, can just be its own thing without the need to be embedded in some larger space.

I don’t think that ambiguity is irrelevant for N >= 1 at all; it’s the main source of confusion people have with the real numbers. If the real space R^{N} is interpreted as the equivalence class of all N-dimensional real spaces up to isomorphisms then there is no canonical projection from R^{N} to R^{N - 1}. Instead it’s a particular parameterization/coordinate system that defines a particular real space, and a projection has to be defined between particular parameterizations of R^{N} and R^{N - 1}. Alternatively if we want to think about single spaces and not equivalence classes then we have to accept that there is not a unique R^{N} but rather an infinite number of them.

There's an important difference between the configuration space is the empty set (ie contains nothing) the configuration space contains the empty set (as a member, not a subset) In case (1) the configuration space has no members and so no draw is allowed. If we try to write a prior sampling model it must be generated quantities { reject("No valid sample can exist."); } In case (2) the empty set is a valid draw and sampling from the prior is possible: generated quantities { row_vector[0] x = []; // a sample exists, but is empty } The situation with an empty block is identical to (2), not (1). (If it was (1) then any model without a generated quantities block would be ill-formed.) Alternatively, let's note that concatenating two models creates a configuration space that is the Cartesian product the of configuration spaces of the (sub)models. The product of empty set with any set is always empty; and every set is isomorphic to its product with a singleton set. As theorems about configuration spaces these translate to observations about models: a single reject() anywhere in the model rejects the whole model; but adding an empty list of statements does not change the model.

Yes, important point.

The configuration space is the set of all possible draws; a single draw from a model is a finite "set" of numbers. (a singleton for R^1, a pair for R^2, a triple for R^3, and, yes, the empty set for R^0)

This is where we disagree. My issue is that multiple interpretations for R^{0} have been thrown around: is it a set that contains only the empty set, or a set that contains only a single real number, or a set that contains a real number and the empty set? To me this ambiguity makes any extension of the current sampler to R^{0} too confusing to be worthwhile, and I strongly prefer avoiding it entirely. The entire motivation for the fixed parameter sampler was to provide a sampler that doesn’t care about the structure of the parameters block and avoid confusions like this in the first place.

nhuurre · 2021-12-24T17:54:13Z

Yeah, although the simplex and correlation matrix types are the only ones with this property, right?

Yes. The other weird one is unit_vector[1] but that fails in a different way.

I don’t think that ambiguity is irrelevant for N >= 1 at all; it’s the main source of confusion people have with the real numbers.

Irrelevant in the sense that we don't argue about whether HMC can be extended to R^1 in a canonical way.

My issue is that multiple interpretations for R^{0} have been thrown around

Thrown around only by you. As I think I've already said this PR (and stan-dev/cmdstan#1054) only touches the case where the unconstrained parameter vector has no elements i.e. the model configuration space contains the empty vector (or the "empty set") and nothing else.
Also, there's no change to how Stan models are transpiled; a model without a parameter block has always resulted in the same configuration space.

betanalpha · 2022-01-12T22:32:21Z

I don’t think that ambiguity is irrelevant for N >= 1 at all; it’s the main source of confusion people have with the real numbers. Irrelevant in the sense that we don't argue about whether HMC can be extended to R^1 in a canonical way.

Except that there is no canonical implementation of HMC on R^{1} — the standard coordinate-based implementation of HMC on R^{N >= 1} is unambiguous only after a parameterization has been fixed.

My issue is that multiple interpretations for R^{0} have been thrown around Thrown around only by you. As I think I've already said this PR (and stan-dev/cmdstan#1054) only touches the case where the unconstrained parameter vector has no elements i.e. the model configuration space contains the empty vector (or the "empty set") and nothing else.

Let me try to describe my objections again based on the interpretations of R^{0} that you have given:

Interpretation: R^{0} is a singleton
Complication: There are still an infinite number of singleton subsets of R^{1}, i.e. each point. This is the same ambiguity that arises when selecting an R^{N - 1} from an R^{N}; resolving the ambiguity requires specifying a parameterization for both spaces. Equivalently a single point can be labeled with a different value depending on the coordinate system assumed.

Interpretation: R^{0} is the empty set
Complication: The empty set is a singleton, and a common model for a zero-dimensional vector space, but the association of a real space with a vector space depends on a parameterization. Once again the parameterization/coordinate system freedom raises an ambiguity about to which point/label the zero vector corresponds.

I’m being very careful about the difference between a real space and its associated Euclidean vector space because that difference is fundamental to differential geometry on which the most general Hamiltonian Monte Carlo algorithm is built. Because a non-empty Stan program fixes a parameterization the Eigen::VectorXd’s in the implementation of HMC can be overloaded, and they are in Stan’s implementation. They represent both components of a vector (the momenta being interpreted as elements of the local cotangent space) _and_ the coordinates of a space (the momenta being interpreted as points on a cotangent fiber).

In the zero-dimensional limit the cotangent spaces reduce to zero vectors, and empty vectors are appropriate, but the cotangent fibers would reduce to single points with an infinite number of possible labels. Because HMC is defined over the entire cotangent bundle and not just a single cotangent space at a time, the coordinate interpretation is to me prioritized over the vector interpretation.

In the zero-dimensional limit we would still need a non-empty Eigen::VectorXd to hold the lone coordinate label (whatever convention might be chosen for what that value might be). This is also important because it would propagate to the output. For example in the diagnostic output empty vectors will read out nothing whereas constant, single-element vectors would read out that arbitrary limiting coordinate value. Again there’s an ambiguity to the interpretation of that output (are they vector components or manifold coordinates?).

In the end I don’t think it matters which interpretation one might want to prioritize (I certainly don’t expect everyone to agree with my more geometric perspective!) so much as there are multiple interpretations. To me the least confusing step we can take is to not try to define an HMC implementation for empty model configuration spaces and err out, forcing users to run methods that are unambiguously defined in that setting.

Also, there's no change to how Stan models are transpiled; a model without a parameter block has always resulted in the same configuration space.

Sure; the disagreement is whether the HMC implementation (and really the entire MCMC toolchain including output) applies to this particular model configuration space.

nhuurre · 2022-01-15T12:31:52Z

Interpretation: R^{0} is a singleton

Interpretation: R^{0} is the empty set
Complication: The empty set is a singleton,

If the second interpretation is that R^{0} is a singleton then maybe the first interpretation should be described as "R^{0} contains a single real number." I would also rather say that "R^{0} is a singleton set that contains the empty set" instead of "R^{0} is the empty set". Programming language terminology and set theory disagree when it comes to the exact meaning of "singleton": set theorists say that a set is a singleton if it contains a single element, programmers say that an object is a singleton if it is the unique object of its type; in other words, to a mathematician "singleton" is the type but to a programmer "singleton" is the instance of that type. I've already discussed the importance of "configuration space is the empty set" vs "configuration space contains the empty set".

I’m being very careful about the difference between a real space and its associated Euclidean vector space because that difference is fundamental to differential geometry on which the most general Hamiltonian Monte Carlo algorithm is built.

I'm not entirely sure what you mean by a "real space" here.
The most basic notion of an N-dimensional "real space" I can think of is the Cartesian product of N copies of the real numbers. Each member of this set is an N-tuple, an ordered list of N real numbers. (A C++ programmer might call these "length-N vectors" but they're not vectors in the mathematical sense.)
Let's call this basic notion the "N-tuple space." The other spaces are constructed from this N-tuple space by associating it with a group of structure-preserving automorphisms. The most common examples:

the real vector space, i.e. the tuples augmented with vector addition and scalar multiplication. The automorphism group is the general linear group GL(N)
the real affine space, basically a vector space without origin. The automorphisms form the affine group Aff(N)
the Euclidean metric space. The automorphism group is the group of Euclidean isometries E(N)
R^N as a smooth manifold. The automorphism group is the diffeomorphism group Diff(R^N)

For N=0 the tuple space is a singleton set that contains only the empty tuple, and all these automorphism groups are trivial.
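For reference, the N=0 case can be written out explicitly. This is only a sketch of the usual empty-product convention; the function-space notation is my own shorthand and does not appear elsewhere in this thread.

```latex
\mathbb{R}^{N} = \{\, f : \{1, \dots, N\} \to \mathbb{R} \,\},
\qquad
\mathbb{R}^{0} = \{\, f : \emptyset \to \mathbb{R} \,\} = \{\, () \,\},
\qquad
\lvert \mathbb{R}^{0} \rvert = 1 .

% The only bijection of a one-element set is the identity, so
\mathrm{GL}(0) = \mathrm{Aff}(0) = \mathrm{E}(0) = \mathrm{Diff}(\mathbb{R}^{0}) = \{ \mathrm{id} \} .
```

Here () denotes the empty tuple, which is the single element of the set; the set itself is not empty.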

They represent both components of a vector (the momenta being interpreted as elements of the local cotangent space) and the coordinates of a space (the momenta being interpreted as points on a cotangent fiber).

I don't see the distinction you're making. A cotangent fiber is the inverse image of the projection map, and thus a subspace of the cotangent bundle, which in turn is the (disjoint) union of all local cotangent spaces (with an appropriate topology). The fiber at a point is the local vector space. You could take an isomorphic vector space that's not part of the bundle but then what makes it the cotangent space?
"Components of a vector" is a coordinate system on the vector space, no?

In the zero-dimensional limit the cotangent spaces reduce to zero vectors, and empty vectors are appropriate, but the cotangent fibers would reduce to single points with an infinite number of possible labels.

So, your objection to this pull req is that by allowing an empty parameter vector it promotes the "cotangent vector" interpretation and rejects the "fiber coordinate" interpretation? And the only practical difference between these interpretations is that the latter requires a redundant coordinate in zero dimensional space?

For N>=1, a coordinate chart on an N-dimensional manifold is an injective function from the N-tuple space to the manifold.
I take it you're saying that a coordinate chart on a zero-dimensional manifold is an injective partial function from the real numbers (or the 1-tuple space) to the manifold, with all the ambiguity that entails.
I'd say the natural notion of a coordinate chart in zero dimensions is a function from the 0-tuple space to the manifold. When the manifold is connected there's only one such function--the function that maps the empty tuple to the single point on the manifold.

Given the model

parameters {
  vector[2] x;
}

the diagnostic output CSV contains, in addition to the internal sampler parameters, the following columns

x.1, x.2, p_x.1, p_x.2, g_x.1, g_x.2

If I understand

the cotangent fibers would reduce to single points with an infinite number of possible labels.

In the zero-dimensional limit we would still need a non-empty Eigen::VectorXd to hold the lone coordinate label

For example, in the diagnostic output empty vectors will read out nothing, whereas constant, single-element vectors would read out that arbitrary limiting coordinate value.

then you think the model

parameters {
  vector[0] x;
}

should generate diagnostic output with the column

p_x.1

and this column records the coordinate label on the cotangent fiber. (it's not clear to me if you expect the columns x.1 and g_x.1 to also be present.)
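To make the two readings concrete, here is a minimal C++ sketch of how the per-parameter columns would be generated under each one. Both functions are hypothetical helpers written for this comment, not Stan's actual output code, and the second one encodes only my understanding of the "fiber coordinate" reading, with just p_x.1 emitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not Stan's writer code): column names for one vector
// parameter, in the order x.*, p_x.*, g_x.* used in the example above.
std::vector<std::string> diagnostic_columns(const std::string& name,
                                            std::size_t size) {
  assert(!name.empty());  // a parameter needs a name to label its columns
  std::vector<std::string> columns;
  const std::string prefixes[] = {"", "p_", "g_"};
  for (const std::string& prefix : prefixes) {
    for (std::size_t i = 1; i <= size; ++i) {
      columns.push_back(prefix + name + "." + std::to_string(i));
    }
  }
  return columns;
}

// Assumed alternative reading: a size-zero parameter still reports one
// momentum column that holds the arbitrary label on the cotangent fiber.
std::vector<std::string> fiber_label_columns(const std::string& name,
                                             std::size_t size) {
  if (size == 0) {
    return {"p_" + name + ".1"};
  }
  return diagnostic_columns(name, size);
}
```

In this sketch `diagnostic_columns("x", 2)` yields the six columns listed above and `diagnostic_columns("x", 0)` yields none, while `fiber_label_columns("x", 0)` yields the single column p_x.1.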

This does not make sense to me; compare

parameters {
  vector[0] x;
  vector[1] y;
}

which does not need p_x.1. But why does the presence of y matter here? You said

This is the same ambiguity that arises when selecting a R^{N - 1} from a R^{N}; resolving the ambiguity requires specifying a parameterization for both spaces.

and if you interpret this model as selecting R^1 subspace from R^2 then you must specify the coordinate p_x.1, no?

betanalpha · 2022-03-01T22:27:21Z

If the second interpretation is that R^{0} is a singleton then maybe the first interpretation should be described as "R^{0} contains a single real number." I would also rather say that "R^{0} is a singleton set that contains the empty set" instead of "R^{0} is the empty set". Programming language terminology and set theory disagree when it comes to the exact meaning of "singleton": set theorists say that a set is a singleton if it contains a single element, programmers say that an object is a singleton if it is the unique object of its type; in other words, to a mathematician "singleton" is the type but to a programmer "singleton" is the instance of that type. I've already discussed the importance of "configuration space is the empty set" vs "configuration space contains the empty set".

Sure, I’m happy with “R^{0} is a singleton set” with the ambiguity about what single element that set contains.

I'm not entirely sure what you mean by a "real space" here. The most basic notion of an N-dimensional "real space" I can think of is the Cartesian product of N copies of the real numbers. Each member of this set is an N-tuple, an ordered list of N real numbers. (A C++ programmer might call these "length-N vectors" but they're not vectors in the mathematical sense.) Let's call this basic notion the "N-tuple space." The other spaces are constructed from this N-tuple space by associating it with a group of structure-preserving automorphisms. The most common examples: • the real vector space, i.e. the tuples augmented with vector addition and scalar multiplication. The automorphism group is the general linear group GL(N) • the real affine space, basically a vector space without origin. The automorphisms form the affine group Aff(N) • the Euclidean metric space. The automorphism group is the group of Euclidean isometries E(N) • R^N as a smooth manifold. The automorphism group is the diffeomorphism group Diff(R^N) For N=0 the tuple space is a singleton set that contains only the empty tuple, and all these automorphism groups are trivial.

For N >= 1 I have no disagreements about these constructions (by default I take the smooth manifold perspective). At N=1 the construction is initialized with a definition of the real line; N>1 begins with the Cartesian product and is then filled out by applying the appropriate transforms to generate either other spaces or different “parameterizations” of the initial space. Applying an appropriate transformation to the initial real line first defines the same spaces so everything commutes.

This construction also motivates the projective structure. An N-dimensional real space projects down to N different (N - 1)-dimensional real spaces that are related by appropriate automorphisms. At the same time we can apply an automorphism before projecting to give different (N - 1)-dimensional spaces, which formalizes the previous discussion about the ambiguity in how real subspaces “fit” into higher-dimensional spaces.

My disagreement is with N=0 because I believe that it breaks with the projective structure for N >= 1. If all of the automorphisms are trivial then all projections from R^{1} to R^{0} must be equivalent, which means that there is no longer any ambiguity in how the single point in R^{0} “fits” into R^{1}. In other words your tuple construction wouldn’t agree with a “projective” construction of R^{0} from R^{1}. I think that this all comes down to the fact that while the definitions of R^{N>=1} are all consistent, the definitions for R^{0} aren’t completely consistent. Because of these inconsistencies I don’t think it’s worth implementing behavior that prioritizes one particular definition.

They represent both components of a vector (the momenta being interpreted as elements of the local cotangent space) and the coordinates of a space (the momenta being interpreted as points on a cotangent fiber). I don't see the distinction you're making. A cotangent fiber is the inverse image of the projection map, and thus a subspace of the cotangent bundle, which in turn is the (disjoint) union of all local cotangent spaces (with an appropriate topology). The fiber at a point is the local vector space. You could take an isomorphic vector space that's not part of the bundle but then what makes it the cotangent space? "Components of a vector" is a coordinate system on the vector space, no?

The distinction is probably clearer from the projective picture discussed above: how is the base space interpreted as a subspace of the cotangent bundle? Applying the canonical projection operator collapses the fibers down to a single point in the base space, but the location of that point _relative_ to the full cotangent bundle will define a single point in each fiber. Applying different diffeomorphisms to the cotangent bundle before applying the canonical projection operator traces out different subspaces and hence different points along the fibers to which the base space appears to be “attached”.

So, your objection to this pull req is that by allowing an empty parameter vector it promotes the "cotangent vector" interpretation and rejects the "fiber coordinate" interpretation? And the only practical difference between these interpretations is that the latter requires a redundant coordinate in zero dimensional space?

Ultimately my objection is that the two perspectives suggest different implementations, an empty vector or a one-dimensional vector fixed to a nominal value, and because of that ambiguity I don’t think we should be forcing an implementation at all. As I have said before, I am much happier saying that our implementation of HMC is defined for real spaces with dimension greater than or equal to 1, so that these subtleties about R^{0} and compatible implementations aren’t relevant.

For N>=1, a coordinate chart on an N-dimensional manifold is an injective function from the N-tuple space to the manifold. I take it you're saying that a coordinate chart on a zero-dimensional manifold is an injective partial function from the real numbers (or the 1-tuple space) to the manifold, with all the ambiguity that entails. I'd say the natural notion of a coordinate chart in zero dimensions is a function from the 0-tuple space to the manifold. When the manifold is connected there's only one such function--the function that maps the empty tuple to the single point on the manifold.

As discussed above I think the ultimate source of the disagreement is whether or not one considers how these zero-dimensional charts are configured relative to one-dimensional charts.

then you think the model `parameters { vector[0] x; }` should generate diagnostic output with the column p_x.1 and this column records the coordinate label on the cotangent fiber. (it's not clear to me if you expect the columns x.1 and g_x.1 to also be present.)

In my opinion if we interpret R^{0} as an arbitrary point from R^{1} then we would need x.1 to denote that value along with p_x.1. I believe that I also mentioned this in my discussion on the `fixed_param` PR. I’m not sure about g_x.1. Is the derivative undefined or are there an infinite number of valid derivatives each of which would correspond to a different value for g_x.1?

This does not make sense to me; compare `parameters { vector[0] x; vector[1] y; }` which does not need p_x.1. But why does the presence of y matter here? You said "This is the same ambiguity that arises when selecting a R^{N - 1} from a R^{N}; resolving the ambiguity requires specifying a parameterization for both spaces." and if you interpret this model as selecting R^1 subspace from R^2 then you must specify the coordinate p_x.1, no?

I think that we all agree if we interpret R^{0} as a singleton set containing the empty set then Y = R^{0} \cup X = { \emptyset } \cup X = X so we can ignore R^{0} whenever X is nontrivial, which is why Stan can treat

```
parameters {
  vector[0] x;
  vector[1] y;
}
```

and

```
parameters {
  vector[1] y;
}
```

the same without any observable consequences.

Treating R^{0} as a singleton set with a single point from R^{1} we’d have instead Y = R^{0} \cup X = { x0 } \cup X, with x0 denoting how exactly X fits into Y as a subspace. One could (awkwardly) make this explicit in a Stan program as

```
parameters {
  vector[1] x;
  vector[1] y;
}
model {
  if (x[1] != x0) reject;
  ...
}
```

When X is nontrivial we can ignore that information to give a well-defined ambient space Y = X and acquire the same equivalences as above.

When we don’t have a nontrivial X, however, then in the first case Y is just { \emptyset }. Y isn’t the empty set but a singleton set with a lone element that is no longer negligible. The same holds in the second case where Y = { x0 }. There we can’t get rid of the subspace information because it’s all that we have left. In other words, no matter what element R^{0} contains, it contains _something_. When the ambient space is just R^{0} why wouldn’t we print out that value?

One way to rephrase the conversation is: how exactly do we interpret Eigen::VectorXd x(0)? Do we interpret it as the empty set because the data structure contains no information? In which case wouldn’t it be inappropriate for modeling R^{0} = { \emptyset } != \emptyset? Or is Eigen::VectorXd x(0) meant to define the zero vector denoted with some nontrivial label? If this is the case then why does `std::cout << x << std::endl;` print nothing instead of the nontrivial label?

Fix size-zero mass matrix (e138b71)

nhuurre mentioned this pull request Oct 27, 2021: Don't enforce fixed_param sampler when model has no parameters stan-dev/cmdstan#1054 (Open, 2 tasks)

rok-cesnovar requested a review from SteveBronder October 31, 2021

skip stepsize adaptation in zero dimensions (522a34f)

mitzimorris approved these changes Dec 16, 2021


