
parametrization of softmax-augmented #43

Open · mjhajharia opened this issue Jul 23, 2022 · 5 comments

@mjhajharia (Owner)

@sethaxen if I remember correctly, you suggested using $p = 1/N$ for the augmented softmax. With that choice, the RMSE plots come out as near-straight lines or odd curves in some parametrizations and look fine in others; the error isn't especially high, but the shapes are off. They come out similar to the rest when I take $p = 0.5$ or so.

[RMSE plots for $p = 1/N$]

In contrast, this is what I get for $p = 0.5$. Do you have any thoughts about which value of $p$ we should go for in the actual paper?

[RMSE plot for $p = 0.5$]

@sethaxen (Collaborator)

How is RMSE computed here?

The reason behind the choice of $p=1/N$ is that it empirically decorrelates the $y_i$ values. What I didn't look at, however, is the effect it has on the position and variance of the marginals. The choice of $p$ doesn't seem to impact the marginal variance, but it shifts the mean by a lot, which is probably making adaptation hard.
[figure: softmax_aug_pcomp_n100]
Given this, I'm not surprised it's failing for large $N$.
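For context (my reading of the setup, not something established above): the softmax is shift-invariant, $\operatorname{softmax}(y + c\mathbf{1}) = \operatorname{softmax}(y)$ for all $c$, so the augmented version carries one extra degree of freedom along $\mathbf{1}$, and $p$ presumably enters through the density placed on that direction; that would explain why it moves the marginal means of the $y_i$ without changing the implied simplex.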

The choice of $p=1$ seems to always center the draws around the origin regardless of $N$. In fact, increasing $N$ leaves the marginal distribution of $y_i$ completely unchanged:
[figure: softmax_aug_p1_ncomp]

I'm trying to work out a more principled choice of $p$ using some of the ideas in #9 (comment).

@sethaxen (Collaborator)

I also plan to look into @spinkney's observation in #37, which is interesting.

@spinkney (Collaborator)

In fact, the augmented simplex and the ILR are very, very similar. If I remove the Helmert matrix, I get exactly this transform, except that it's parameterized nicely for HMC.
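For reference (the standard compositional-data definition, not code from this repo), the ILR is a fixed orthonormal change of basis applied to the log coordinates:

$$\operatorname{ilr}(x) = V^\top \log x, \qquad V \in \mathbb{R}^{N \times (N-1)}, \quad V^\top V = I, \quad V^\top \mathbf{1} = 0,$$

with $V$ typically built from the Helmert matrix, so it differs from an anchored log-ratio transform only by that fixed linear map.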

Here's the code. I'll make a PR for both and we can discuss what we want to do. Since the ILR is just a linear scaling of the input vector, I don't see how it is any different.

The Stan model below seems to work for all $N > 1$. The main thing is to fix the base (last) element to 0 and update the log-abs-determinant accordingly.

data {
  int<lower=0> N;            // number of simplex components
  vector<lower=0>[N] alpha;  // target density parameters (e.g. a Dirichlet)
}
transformed data {
  real half_logN = 0.5 * log(N);  // constant in y, so it has no effect on sampling
}
parameters {
  vector[N - 1] y;  // unconstrained; the Nth coordinate is pinned to 0
}
transformed parameters {
  // anchor the last coordinate at 0, then normalize on the log scale
  real<lower=0> logr = log_sum_exp(append_row(y, 0));
  simplex[N] x = exp(append_row(y, 0) - logr);
}
model {
  // log-abs-determinant of the anchored softmax: sum(y) - N * logr == sum(log(x))
  target += sum(y) - N * logr + half_logN;
  // target += target_density_lp(x, alpha);
}
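As a sanity check on that target increment (my own sketch, using the standard Jacobian of the softmax with one coordinate pinned to zero): since $\log x_i = y_i - \log r$ for $i < N$ and $\log x_N = -\log r$, the log-abs-determinant is

$$\sum_{i=1}^{N} \log x_i = \sum_{i=1}^{N-1} y_i - N \log r,$$

which is exactly the `sum(y) - N * logr` term; `half_logN` is constant in $y$ and only shifts the normalization.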

@mjhajharia (Owner, Author)

mjhajharia commented Jul 25, 2022 via email

@spinkney (Collaborator)

spinkney commented Jul 25, 2022

Actually, this is pretty funny: what I just did is the softmax parameterization, only with a more efficient log-abs-det calculation.

It is equivalent to the model below. Let me close that PR and make a new one that updates the softmax code.
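Spelled out (just the standard identity, nothing new):

$$\operatorname{softmax}(z) = \exp\big(z - \operatorname{log\_sum\_exp}(z)\big),$$

so with $z = \mathrm{append\_row}(y, 0)$ the two transformed-parameters blocks define the same $x$, and the two target increments agree term by term.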

data {
  int<lower=0> N;            // number of simplex components
  vector<lower=0>[N] alpha;  // target density parameters (e.g. a Dirichlet)
}
transformed data {
  real half_logN = 0.5 * log(N);  // constant in y; no effect on sampling
}
parameters {
  vector[N - 1] y;  // unconstrained; the Nth coordinate is pinned to 0
}
transformed parameters {
  simplex[N] x = softmax(append_row(y, 0));
}
model {
  // same log-abs-determinant as above, with logr written out as log_sum_exp
  target += sum(y) - N * log_sum_exp(append_row(y, 0)) + half_logN;
  // target += target_density_lp(x, alpha);
}
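If the target is a Dirichlet with parameter alpha (my assumption; `target_density_lp` is presumably the repo's shared target helper), the commented-out line would reduce to something like:

  target += dirichlet_lpdf(x | alpha);  // hypothetical: assumes a Dirichlet(alpha) target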
