
Is emcee converging? #11

Open
kponder opened this issue Mar 3, 2015 · 11 comments

kponder (Contributor) commented Mar 3, 2015

@kbarbary had some worries about this at one point, says @drphilmarshall… What do you think?
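
For reference, a minimal sketch of the kind of diagnostics that would answer this, assuming a recent emcee (3.x) API; the toy log-probability, chain lengths, and thresholds below are illustrative placeholders, not the project's actual setup:

```python
import numpy as np
import emcee

# Toy log-probability: an isotropic 3-d Gaussian stands in for the
# real light-curve / cosmology posterior (placeholder only).
def log_prob(theta):
    return -0.5 * np.sum(theta ** 2)

ndim, nwalkers, nsteps = 3, 32, 5000
p0 = np.random.randn(nwalkers, ndim)

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, nsteps)

# 1) Mean acceptance fraction: values far outside ~0.2-0.5 usually
#    signal a badly tuned or non-converging run.
print("mean acceptance fraction:", np.mean(sampler.acceptance_fraction))

# 2) Integrated autocorrelation time: the chain should be many
#    (say >50) autocorrelation times long before the samples are trusted.
tau = sampler.get_autocorr_time(tol=0)
print("autocorrelation time per parameter:", tau)
print("chain length in units of tau:", nsteps / np.max(tau))

# Discard burn-in and thin before using the samples downstream.
burn = int(2 * np.max(tau))
thin = max(1, int(np.max(tau) / 2))
flat_samples = sampler.get_chain(discard=burn, thin=thin, flat=True)
print("roughly independent samples:", flat_samples.shape[0])
```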

rbiswas4 (Owner) commented Mar 3, 2015

I assume this is the emcee run to get parameters, correct? Maybe you are already doing this or have this stage covered, but if not, I have a suggestion: let us create a branch where we replace the SN light curve parameter posteriors with Gaussians, so that we don't have to run light curve sampling. The first stage for a large number of SN then becomes trivial, and we can see how well emcee converges and how many samples are required for the importance sampling to work (along the lines of #6). I also see suggestions from @kbarbary and further discussion of Gaussians from @drphilmarshall on that issue thread.

I can outline this a little further later this week. I was thinking of doing this myself, but it would be good to know whether people already have clear opinions or objections.

wmwv commented Mar 3, 2015

I would suggest this be implemented as a test suite within the current main. This is generally an option we would like to have. Putting it as a branch implies we are just going to test it once.

rbiswas4 (Owner) commented Mar 3, 2015

> I would suggest this be implemented as a test suite within the current main. This is generally an option we would like to have. Putting it as a branch implies we are just going to test it once.

@wmwv : I meant creating a suitable repository branch to experiment on. Once we are convinced that this works as desired, hopefully everyone would agree on merging it.

wmwv commented Mar 4, 2015

Oh, of course a branch to develop this makes complete sense.

I guess I didn't understand what the design changes would be to allow this testing in general. I anticipate that will be worked out as part of this experimental work.

rbiswas4 (Owner) commented Mar 4, 2015

@wmwv : As far as I understand, there are currently three steps:

  1. gen_dataset: which gives us values of the true SN parameters
  2. sample_lcs: which gives us samples of the posterior distributions of the unknown parameters of the SN model
  3. Importance Sampling: which re-weights the samples of the light curve posterior distribution according to the values of mu corresponding to the cosmological parameters, to build up the cosmological posterior.

Step 2 is time-consuming (though parallelizable), with more time required as n_SNE and numSamples for each SNE increase. Also, this means that the final answer depends on both the performance of the light curve sampler and the method itself, as would be the case in real life.

The suggestion is to temporarily replace Step 2 by a stage where, for each SN, we write a file/array of samples (just as we would have done for posterior samples), but rather than posterior samples these are drawn from a multivariate normal distribution with a mean and covariance that we insert by hand. This will be fast, since we can use numpy to get these samples, and lets us study scaling with different numbers of SN and samples. It also poses a clearly defined problem for Step 3 alone, without depending on the performance of the sampler in Step 2. The big downside is deciding what means and covariances to put into the distribution for these temporary stages. To start with, I would be happy to simplify by using a mean equal to the true parameter values and add complexity later. For the covariance I am less sure: maybe we should sample light curve parameters for a few SN in each redshift bin, take the 'average' covariance in each redshift bin, and apply it to all 'SN' in that bin? This could be done from the SN for which we did get light curves.

That was roughly what I had in mind ....
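
A minimal sketch of that stand-in for Step 2, assuming numpy only; the parameter names (x0, x1, c), the single shared covariance, and the per-SN .npy output files are all hypothetical choices, not the repo's actual format:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical output of Step 1 (gen_dataset): true (x0, x1, c)
# parameters and a redshift for each SN.
n_sne, n_samples = 100, 2000
true_params = rng.normal(loc=[1e-5, 0.0, 0.0],
                         scale=[1e-6, 1.0, 0.1],
                         size=(n_sne, 3))
redshifts = rng.uniform(0.05, 1.0, size=n_sne)

# Hand-inserted covariance. Here one matrix is shared by every SN;
# per the suggestion above, this could instead be an 'average'
# covariance estimated in each redshift bin from a few real fits.
cov = np.diag([1e-12, 0.25, 0.01])

# Replacement for Step 2 (sample_lcs): draw fake 'posterior samples'
# for each SN from a multivariate normal centred on its true values,
# and write them out in the same layout that Step 3 already reads.
for i in range(n_sne):
    samples = rng.multivariate_normal(mean=true_params[i],
                                      cov=cov,
                                      size=n_samples)
    np.save(f"fake_posterior_sn{i:04d}.npy", samples)
```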

wmwv commented Mar 4, 2015

This sounds most reasonable and well-thought-out.

I think my main suggestion is at one design level up from this.
What's the wrapping going to be that allows Step 2 to be modular?
If there is no such wrapping, what will it mean to merge this back into master?

I'm concerned by the word "temporarily". It still makes me think that this is planned as a dead-end branch. Are we still going to be able to do these tests once you merge back into master? What will you be merging back in? As part of this added functionality, you (and others) should decide how it is going to fit back into the main functionality.

For perhaps some relevant context, I'm implicitly assuming that we'll be adopting something like a Github Flow development model:

https://guides.github.com/introduction/flow/

wmwv commented Mar 4, 2015

And, also, implicitly, that all features we think of will likely be useful and we want them in master.

drphilmarshall commented

Agreed on all workflow points. Being able to test emcee convergence and possibly importance sampling performance using a high-speed approximate PDF sounds like a good idea, and can be implemented as an option (so that it can be merged into master without affecting its default functionality).

@rbiswas4, I write "possibly" above because it's not yet clear to me that the simple Gaussian substitution you are proposing will still allow meaningful importance sampling testing. I think it would if the PGM is still followed, in stage 1 as well as stage 2, and we derive all the implications of the simple Gaussian sampling you want to perform. That is: there is math to be written down first before implementing anything! :-)
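
For reference, the generic identity underlying Step 3 is the usual importance-sampling estimate, with q the Gaussian stand-in the samples are drawn from and p the target per-SN posterior under given cosmological parameters; the project-specific forms of p and q are exactly the math being asked for here:

```latex
% Generic importance sampling: draw \theta_i \sim q(\theta), then
% re-weight to approximate expectations under the target p(\theta).
w_i \;=\; \frac{p(\theta_i)}{q(\theta_i)}, \qquad
\mathbb{E}_p\!\left[f(\theta)\right] \;\approx\;
\frac{\sum_i w_i\, f(\theta_i)}{\sum_i w_i}
```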

rbiswas4 (Owner) commented Mar 5, 2015

@wmwv : @kbarbary and @kponder can correct me, but I think at present steps 1, 2, and 3 are actually separate scripts, and 3 works by reading in files from disk created by 2. If this is true, there is not too much of a design question. Of course, we would want to do things better later, and this will be relevant then, but 2 will have to be easily parallelizable for the real thing.

@drphilmarshall I don't understand your point completely, and did not understand your previous comment at all when I read it the last time, but after thinking about it today I may have a hint of what you are worried about. I need to think about it a little more to see whether what I was trying to suggest actually works. Once there, I could try to see if adding to the math document helps in phrasing what could be done.

kbarbary (Collaborator) commented Mar 9, 2015

Just now catching up on issues after being gone last week.

Regarding workflow and design decisions: Personally, I'm still operating under the assumption that this repo is just a sandbox and that we want to implement the "real thing" in sncosmo. (In sncosmo we follow the GitHub Flow development model for any contributions, as @wmwv suggested.) It sounds like people want to start working on the "real thing" rather than this toy repo. After I review and merge #9, I'll write down a concrete proposal for how to move this work into sncosmo.

While I think it makes sense to move this to sncosmo, others may disagree... let me know if you do! I love the active discussion here and hope it can continue in sncosmo.

rbiswas4 (Owner) commented

@kbarbary I am not sure if you are suggesting moving this to SNCosmo as a branch now, or integrating it later. Maybe your proposal will convince me otherwise, but at this stage I think it is nicer to have this outside of SNCosmo. In fact, since it solves a general problem, it may be worth keeping it outside and importing it. On the other hand, not many groups have developed light curve samplers, so maybe it is not that important to make this as broadly useful as possible?
