Add sum_to_zero constraint, free, and check #3099
Conversation
@WardBrian would it perhaps be better if we implement the transform from https://discourse.mc-stan.org/t/test-soft-vs-hard-sum-to-zero-constrain-choosing-the-right-prior-for-soft-constrain/3884/31?u=spinkney? From the post it's unclear whether this beats what is in this PR, but it does give the math to get the correct uniform marginals.
@spinkney -- this just implements the transform @bob-carpenter gave me; I'm not qualified to assess it versus an alternative.
A few small comments, but generally everything looks good.
```cpp
inline plain_type_t<Vec> sum_to_zero_constrain(const Vec& y) {
  const auto Km1 = y.size();
  plain_type_t<Vec> x(Km1 + 1);
```
Check for size 0 here
```cpp
void check_sum_to_zero(const char* function, const char* name, const T& theta) {
  using std::fabs;
  auto&& theta_ref = to_ref(value_of_rec(theta));
  if (!(fabs(theta_ref.sum()) <= CONSTRAINT_TOLERANCE)) {
```
Suggested change:

```diff
- if (!(fabs(theta_ref.sum()) <= CONSTRAINT_TOLERANCE)) {
+ if (unlikely(!(fabs(theta_ref.sum()) <= CONSTRAINT_TOLERANCE))) {
```
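For readers outside the codebase: `unlikely` is a branch-prediction hint marking the error branch as cold. A standalone sketch of the pattern (assuming the common GCC/Clang-style definition; the function name below is made up for illustration):

```cpp
#include <cmath>
#include <stdexcept>

// Common GCC/Clang idiom; stan-math ships its own version of this macro.
#define unlikely(x) __builtin_expect(!!(x), 0)

// Hypothetical standalone check mirroring the shape of the code above.
void check_near_zero(double sum, double tolerance) {
  // Note the negated <= comparison (rather than >): fabs(NaN) <= tolerance
  // is false, so this form also throws when the sum is NaN.
  if (unlikely(!(std::fabs(sum) <= tolerance))) {
    throw std::domain_error("vector does not sum to zero");
  }
}
```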
```cpp
    arena_y.adj().array() -= arena_x.adj_op()(N);
    arena_y.adj() += arena_x.adj_op().head(N);
  });
  return ret_type(arena_x);
```
Suggested change:

```diff
- return ret_type(arena_x);
+ return arena_x;
```
```cpp
  const auto N = y.size();
  if (unlikely(N == 0)) {
    return ret_type(Eigen::VectorXd{{0}});
```
Suggested change:

```diff
- return ret_type(Eigen::VectorXd{{0}});
+ return arena_t<ret_type>(Eigen::VectorXd{{0}});
```
```cpp
 * @return Zero-sum vector of dimensionality K.
 */
template <typename T, require_rev_col_vector_t<T>* = nullptr>
auto sum_to_zero_constrain(const T& y, scalar_type_t<T>& lp) {
```
Suggested change:

```diff
- auto sum_to_zero_constrain(const T& y, scalar_type_t<T>& lp) {
+ inline auto sum_to_zero_constrain(const T& y, scalar_type_t<T>& lp) {
```
@SteveBronder all addressed. I'd like to wait on the mathematical discussion above before actually merging.
I commented on the stanc3 issue and will repeat myself here: using the QR- or SVD-based decompositions breaks our current abstraction, which treats the constraining and unconstraining functions as standalones that are not data-dependent. It wouldn't be absolutely impossible to shoehorn something with data into our current framework, but it'd be a lot of work.
We don't need to have a data input. The only thing that needs to be known is the size of the vector, which we already get. This is similar to what is done in the […]. Is there an issue with making a […]?
Looking at the forum thread again, the thing being computed is relatively cheap. The memory overhead (2*N for autodiff, compared to this method) would probably be the bigger concern with such a scheme. The specific transform used is technically an implementation detail, so nothing holds us to one choice going forward if we want to proceed with the feature using this transform for now.
I believe this is what I'm suggesting. It's a bit concerning that summing all the values isn't exactly 0, but I believe that's a finite-precision thing.

```stan
functions {
vector inv_ilr_sum_to_zero_constrain_lp(vector y) {
int N = rows(y) + 1;
vector[N - 1] ns = linspaced_vector(N - 1, 1, N - 1);
vector[N - 1] w = y ./ sqrt(ns .* (ns + 1));
vector[N] z = append_row(reverse(cumulative_sum(reverse(w))), 0) - append_row(0, ns .* w);
return z;
}
}
data {
int<lower=0> N;
}
parameters {
vector[N - 1] y;
}
transformed parameters {
vector[N] x = inv_ilr_sum_to_zero_constrain_lp(y);
}
model {
x ~ normal(0, 10);
}
generated quantities {
real sum_x = sum(x);
vector[N] alpha = softmax(x);
}
```
@spinkney when freeing, we only assume a tolerance of 1e-8, so being around 1e-18 is definitely "close enough" to zero if it's just floating-point issues. Perhaps a naive question, but shouldn't the transform require a Jacobian adjustment?
Thanks so much, @spinkney. No worries---you're right at double machine precision there: 1e-18. Brian was asking about a Jacobian adjustment, but we don't need one here---this is a linear transform on this scale (the log Jacobian is constant, so it can be dropped). We can write out the constraining transform as a function in the math library in C++ so that we can write custom autodiff for it to make it efficient. In C++, it can be written without allocating the linearly spaced vector---it'll be an expression template or loop.
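To illustrate that last point, a minimal sketch of the loop form in plain C++ with Eigen (doubles only, made-up name; the PR's templated, autodiff-aware implementation is the real thing):

```cpp
#include <Eigen/Dense>
#include <cmath>

// Maps unconstrained y (size N) to a zero-sum z (size N + 1) in one backward
// sweep, accumulating the reversed cumulative sum on the fly instead of
// materializing the linspaced ns and scaled w vectors.
Eigen::VectorXd sum_to_zero_constrain_sketch(const Eigen::VectorXd& y) {
  const Eigen::Index N = y.size();
  Eigen::VectorXd z = Eigen::VectorXd::Zero(N + 1);
  double sum_w = 0;  // running suffix sum w_i + ... + w_N
  for (Eigen::Index i = N; i >= 1; --i) {
    double n = static_cast<double>(i);
    double w = y(i - 1) / std::sqrt(n * (n + 1));
    sum_w += w;
    z(i - 1) += sum_w;  // the reversed-cumulative-sum term u_i
    z(i) -= n * w;      // the n * w_n term on the next element
  }
  return z;
}
```

Note that the N == 0 case falls out for free as the one-element vector [0], matching the guard in the diff hunks above.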
I initially read that the wrong way. We have […]
Yes! There is one potential issue, and that is that in the forum post they put the prior on the unconstrained values. You're right that we can mimic this, like in the following:

```stan
functions {
vector inv_ilr_sum_to_zero_constrain_lp(vector y) {
int N = rows(y) + 1;
vector[N - 1] ns = linspaced_vector(N - 1, 1, N - 1);
vector[N - 1] w = y .* inv_sqrt(ns .* (ns + 1));
vector[N] z = append_row(reverse(cumulative_sum(reverse(w))), 0) - append_row(0, ns .* w);
target += normal_lpdf(y | 0, inv_sqrt(1 - inv(N)));
return z;
}
}
data {
int<lower=0> N;
}
parameters {
vector[N - 1] y;
real z;
}
transformed parameters {
vector[N] x = inv_ilr_sum_to_zero_constrain_lp(y);
}
model {
z ~ std_normal();
}
generated quantities {
real sum_x = sum(x);
}
```

We see that the margins of `x` match the standard normal margins of `z`.
The issue with the above is that a standard normal is fairly constrained. I think it would be better if we did something like a normal(0, 10), and the user can then put a more restrictive prior on `x` if they want. We can make this more flexible by incrementing the target with the quadratic form directly, where the `(N - 1) / N` factor accounts for the reduced marginal variance on the zero-sum scale. Since this is just a normal distribution, the multiplier divides that expression by the variance of the normal that one wants, i.e., `val^2`:

```stan
functions {
vector inv_ilr_sum_to_zero_constrain_lp(vector y, real val) {
int N = rows(y) + 1;
vector[N - 1] ns = linspaced_vector(N - 1, 1, N - 1);
vector[N - 1] w = y .* inv_sqrt(ns .* (ns + 1));
vector[N] z = append_row(reverse(cumulative_sum(reverse(w))), 0) - append_row(0, ns .* w);
target += -0.5 * dot_self(z) * (N - 1) * inv(N) / val^2; // dot_self(z) = sum of squares; elementwise z^2 is not valid Stan for a vector
return z;
}
}
data {
int<lower=0> N;
real<lower=0> val;
}
parameters {
vector[N - 1] y;
real z;
}
transformed parameters {
vector[N] x = inv_ilr_sum_to_zero_constrain_lp(y, val);
}
model {
// x ~ normal(0, val);
z ~ normal(0, val);
}
generated quantities {
real sum_x = sum(x);
}
```

where `val` is the scale of the normal you want on the margins. For example, with `val = 4`:

```
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 x[1] -0.00615 -0.0118 4.01 4.00 -6.59 6.60 1.00 505206. 148873.
2 x[2] 0.00602 0.0153 3.99 3.99 -6.55 6.55 1.00 493809. 150968.
3 x[3] -0.00767 -0.0136 4.01 4.01 -6.59 6.59 1.00 489828. 145973.
4 x[4] 0.00270 0.0117 4.00 4.01 -6.58 6.59 1.00 495086. 146852.
5 x[5] 0.00980 0.00630 4.00 3.98 -6.58 6.58 1.00 471650. 148247.
6 x[6] 0.00886 0.00170 4.00 4.00 -6.57 6.58 1.00 482777. 147873.
7 x[7] -0.000581 -0.00169 4.00 4.00 -6.59 6.58 1.00 496548. 149560.
8 x[8] -0.00186 -0.00476 3.99 4.00 -6.54 6.56 1.00 485656. 147287.
9 x[9] -0.00414 -0.00847 4.00 3.99 -6.58 6.57 1.00 493152. 146167.
10 x[10] -0.00545 -0.0167 3.98 3.97 -6.55 6.57 1.00 490037. 146549.
11 x[11] 0.00150 -0.00995 4.01 4.01 -6.57 6.60 1.00 469046. 145016.
12 x[12] 0.00110 -0.0101 4.00 4.00 -6.58 6.56 1.00 510886. 147406.
13 x[13] -0.00696 -0.00231 4.00 3.99 -6.60 6.60 1.00 484046. 148396.
14 x[14] -0.000128 -0.00496 4.00 4.01 -6.57 6.61 1.00 488574. 150718.
15 x[15] -0.00184 0.000714 4.01 4.01 -6.62 6.62 1.00 493949. 149995.
16 x[16] 0.00480 0.0135 3.99 3.98 -6.57 6.58 1.00 486966. 146357.
17 z -0.000141 0.00174 4.00 4.00 -6.58 6.57 1.00 471868. 144767.
```
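To spell out why the margins land exactly at `val` (a sketch of the algebra; the matrix $A$ is my notation, not anything in the code): write the transform as $z = Ay$, where the columns of the $N \times (N-1)$ matrix $A$ form an orthonormal, Helmert-style basis of the sum-to-zero subspace, which you can check directly from the $w_n = y_n / \sqrt{n(n+1)}$ scaling.

```latex
% Orthonormal columns make the transform an isometry:
\[
  A^\top A = I_{N-1}
  \quad\Longrightarrow\quad
  z^\top z = y^\top A^\top A\, y = y^\top y ,
\]
% so the increment  -\tfrac{1}{2} z^\top z \, (N-1)/(N\,\mathrm{val}^2)
% is a normal(0, val * sqrt(N/(N-1))) prior on each y_i.  Since
\[
  A A^\top = I_N - \tfrac{1}{N}\mathbf{1}\mathbf{1}^\top
\]
% has diagonal entries (N-1)/N, each margin of z gets
\[
  \operatorname{Var}(z_i)
    = \mathrm{val}^2 \,\frac{N}{N-1} \cdot \frac{N-1}{N}
    = \mathrm{val}^2 .
\]
```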
I find @spinkney's constraining transform easier to understand with `N` set to the size of the unconstrained input:

```stan
vector inv_ilr_sum_to_zero_constrain_lp(vector y) {
int N = rows(y);
vector[N] ns = linspaced_vector(N, 1, N);
vector[N] w = y ./ sqrt(ns .* (ns + 1));
vector[N] u = reverse(cumulative_sum(reverse(w)));
vector[N] v = ns .* w;
vector[N + 1] z = append_row(u, 0) - append_row(0, v);
return z;
}
```

Then we need to derive the unconstraining transform from $z$ (size $N + 1$) back to $y$ (size $N$). Writing $w_n = y_n / \sqrt{n(n+1)}$ and $u_n = \sum_{k=n}^{N} w_k$ for the reversed cumulative sums, the code above computes

$$z_1 = u_1, \qquad z_n = u_n - (n - 1)\,w_{n-1} \quad (2 \le n \le N), \qquad z_{N+1} = -N\,w_N.$$

Going in the reverse direction, we know $z$. Starting with the last element,

$$z_{N+1} = -N\,w_N,$$

and thus

$$w_N = -\frac{z_{N+1}}{N}, \qquad u_N = w_N, \qquad y_N = w_N\,\sqrt{N(N+1)}.$$

For the inductive case, suppose we have already recovered $w_n, \ldots, w_N$, where the running suffix sum is $u_n = w_n + u_{n+1}$, and thus for $2 \le n \le N$,

$$z_n = u_n - (n - 1)\,w_{n-1},$$

and hence

$$w_{n-1} = \frac{u_n - z_n}{n - 1}, \qquad u_{n-1} = u_n + w_{n-1}, \qquad y_{n-1} = w_{n-1}\,\sqrt{(n-1)\,n}.$$

Whew. That's still not code, though---the code will have to efficiently keep the running suffix sum $u_n$ as it sweeps backward through $z$.
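To make that concrete, a minimal sketch of the backward sweep in plain C++ with Eigen (doubles only; the name and lack of templating are mine, not the PR's):

```cpp
#include <Eigen/Dense>
#include <cmath>

// Frees a zero-sum z (size N + 1) back to the unconstrained y (size N),
// keeping the running suffix sum u = w_n + ... + w_N as it sweeps backward.
Eigen::VectorXd sum_to_zero_free_sketch(const Eigen::VectorXd& z) {
  const Eigen::Index N = z.size() - 1;
  Eigen::VectorXd y(N);
  if (N == 0) {
    return y;  // the size-1 zero-sum vector frees to an empty vector
  }
  double w = -z(N) / static_cast<double>(N);  // w_N from z_{N+1} = -N w_N
  double u = w;                               // u_N = w_N
  y(N - 1) = w * std::sqrt(static_cast<double>(N) * (N + 1));
  for (Eigen::Index n = N - 1; n >= 1; --n) {
    w = (u - z(n)) / static_cast<double>(n);  // w_n = (u_{n+1} - z_{n+1}) / n
    u += w;                                   // u_n = u_{n+1} + w_n
    y(n - 1) = w * std::sqrt(static_cast<double>(n) * (n + 1));
  }
  return y;
}
```

Round-tripping this against the constraining transform above should recover the input up to floating-point error.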
@bob-carpenter I think there's either a mistake in your code or I'm missing something in translation. The final element seems to be too big by a factor of […].
You need […]
Summary

This adds the functions necessary to implement a `sum_to_zero` constraint. This is a vector with `sum(x) == 0`. The constraining transform is to add an element set to `-sum(previous elements)`, and the freeing transform just takes the first N-1 elements out. Because this is linear, there is no Jacobian adjustment.
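As a concrete illustration of that description (a sketch in plain Eigen with doubles and made-up names; the actual templated implementations are in the diff above):

```cpp
#include <Eigen/Dense>

// Constrain: append one element that forces the sum to zero.
Eigen::VectorXd simple_sum_to_zero_constrain(const Eigen::VectorXd& y) {
  Eigen::VectorXd x(y.size() + 1);
  x.head(y.size()) = y;
  x(y.size()) = -y.sum();  // the appended element cancels the rest
  return x;
}

// Free: keep the first N - 1 elements; the dropped one is redundant.
Eigen::VectorXd simple_sum_to_zero_free(const Eigen::VectorXd& x) {
  return x.head(x.size() - 1);
}
```

Because the map is linear in `y`, its log Jacobian is constant, which is why no adjustment is needed.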
Tests

Added tests in the err and constraint folders, based on the existing tests for `simplex`.

Side Effects

None
Release notes

Added `sum_to_zero_constrain`, `sum_to_zero_free`, and `check_sum_to_zero`.

Checklist
Copyright holder: Simons Foundation
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
- the basic tests are passing
  - unit tests pass (to run, use: `./runTests.py test/unit`)
  - header checks pass (`make test-headers`)
  - dependency checks pass (`make test-math-dependencies`)
  - the docs build (`make doxygen`)
  - code passes the lint checks (`make cpplint`)
- the code is written in idiomatic C++ and changes are documented in the doxygen
- the new changes are tested