Optimizing seems to get stuck at boundaries, dependent on seed #1928

Open
opened this Issue Jun 17, 2016 · 3 comments

Projects
None yet
3 participants

steven-murray commented Jun 17, 2016

 Hi I'm using pystan version 2.9.0. I have a model based on a truncated incomplete gamma distribution. When running optimizing(), the results depend very heavily on which seed I choose, and it seems to have something to do with the boundary values. That is, for one seed (121010), the optimization terminates after 2 iterations with all parameters at their boundaries (there are four parameters). I was firstly getting "Convergence detected: gradient norm is below tolerance", then when I set `tol_grad=0`, I got "relative gradient magnitude is below tolerance" and then when I set `tol_rel_grad=0` I got "absolute change in objective function was below tolerance". I fear setting anything more to zero will mean there's nothing to terminate the iterations. When I set the algorithm to "Newton" it works (doesn't seem to hit the boundaries). When setting to "BFGS" it has the same outcome as LBFGS. When using seed of 1234 (without any of those options specified), the result comes out nicely. I've verified that the lp__ values are correct at several of the iterations (modulo constant). The LBFGS method seems to work every time when using the function from scipy. I can list my stan code: ``````functions { /** * gammainc() is the upper incomplete gamma function (not regularized) * @param real a, shape parameter * @param real x, position > 0 * @returns the non-regularized incomplete gamma * NOTES: uses a recursion relation to calculate values for negative a. */ real gammainc(real a, real x){ int n; real ap1; real ssum; if(a>=0) return gamma_q(a,x) * tgamma(a); ap1 <- a+1; //Get floor(-a) n<-0; while(n<-a){ n <- n+1; } //Get summed part { vector[n] sums; for(i in 0:n-1) sums[i+1] <- pow(x,i)/tgamma(ap1+i); ssum <- sum(sums); } return tgamma(a)*(gamma_q(a+n,x)-pow(x,a)*exp(-x)*ssum); } /** * truncated_logGGD_log gives the log PDF of a variate whose exponential has a * lower-truncated generalised gamma distribution. * @param vector y, variate (should be log10(m/Hs) in the usual TGGD) * @param real ymin, truncation in y-space. * @param real alpha, power-law slope * @param real beta, cut-off parameter */ real truncated_logGGD_log(vector y, real ymin, real h, real alpha, real beta, real lnA){ vector[num_elements(y)] x; real xmin; real z; z <- (alpha+1)/beta; x <- exp(log10()*y*beta); xmin <- exp(log10()*ymin*beta); return sum(lnA + log(beta) + log(log10()) + log10()*h + log10()*y*(alpha+1) - x) - exp(lnA)*pow(10,h)*gammainc(z,xmin); } } data { int N; // number of halos vector[N] log_m; // measured halo masses real V; // Volume of the survey // CONTROLS FOR PARAMETER BOUNDS real hs_min; // Lower bound of logHs real hs_max; // Upper bound of logHs real alpha_min; // Lower bound of alpha real alpha_max; // Upper bound of alpha real beta_min; // Lower bound of beta real beta_max; // Upper bound of beta real lnA_min; // lower bound of lnA real lnA_max; // upper bound of lnA } transformed data { real log_mmin; log_mmin <- min(log_m); } parameters { real logHs; // Characteristic halo mass real alpha; // Power-law slope real beta; // Cut-off parameter real lnA; // Normalisation } transformed parameters { real raw_lnA; raw_lnA <- lnA + log(V); } model { vector[N] y; real ymin; y <- log_m-logHs; ymin <- log_mmin-logHs; y ~ truncated_logGGD(ymin, logHs, alpha, beta, raw_lnA); } `````` I am using data with ~2e5 variates: samples.txt from a distribution with Expected Values: 'logHs': 14.294849118059398, 'alpha': -1.868803779752156, 'beta': 0.6976327799172357, 'lnA': -42.58691124225525 I set the bounds as logHs -- (13.5,15.5) alpha -- (-1.99,-1.7) beta -- (0.4,1.2) lnA -- (-48,-37). and V is 45794539.9853329. I was previously setting the bounds to be wider on each parameter, but contracting the support has not solved the problem. I'm hoping I've provided enough information. Unfortunately I haven't tested if this happens for simpler problems -- so I don't know if it is problem-specific, or a more general problem within Stan. Since it doesn't always happen for my own problem, I would be hesitant to try to randomly find it in a simpler problem.
Contributor

bob-carpenter commented Jul 7, 2016 • edited

 Thanks for reporting. It doesn't look like anyone has responded. Did the comparison with the other L-BFGS implementation use the same transformations implied by the constrained parameters? I marked this a feature and a bug, at least until someone does some more investigating.

steven-murray commented Jul 7, 2016

 So for the other L-BFGS implementation, I believe there is no transformation performed. In that case, it seems to evaluate first p0 which you give it, and then explicitly evaluates the edges, then works its way in (sorry I'm not too well acquainted with the exact procedure). I would love to be more helpful in this (constructing a more minimal WE and doing a proper comparison), but I haven't really got the time at the moment, as is frustratingly often the case. I'm more than happy to trial things for you on my end though.
Contributor

bob-carpenter commented Jul 7, 2016

 You can try removing the bounds from the Stan program and initializing at a value in support and see if that works better. Maybe not if the problem was boundary values. It'd be interesting to see if the Stan model would be optimizable by the other program or if it's truly a problem with our L-BFGS implementation not being robust enough. Bob On Jul 6, 2016, at 9:45 PM, Steven Murray notifications@github.com wrote: So for the other L-BFGS implementation, I believe there is no transformation performed. In that case, it seems to evaluate first p0 which you give it, and then explicitly evaluates the edges, then works its way in (sorry I'm not too well acquainted with the exact procedure). I would love to be more helpful in this (constructing a more minimal WE and doing a proper comparison), but I haven't really got the time at the moment, as is frustratingly often the case. I'm more than happy to trial things for you on my end though. — You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.