
Refactor to enable different models to be run with the same importance_sampling code? #3

Open
drphilmarshall opened this issue Feb 9, 2015 · 6 comments

Comments

@drphilmarshall

Looks like the lnlike function in importance_sampling.py could be refactored to call out to any one of a number (currently two) of models' likelihoods. Let's do this refactoring so that we can explore different model choices without changing the way emcee is called at the hyper level. @kponder could do this safely from her MultiPop branch, before merging to her master and submitting a pull request.

The nice thing about this scheme is that we will then be able to define models by PGMs, and implement them as single, separated likelihood functions.
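A minimal sketch of what this could look like (all function names and likelihood forms here are hypothetical toy stand-ins, not the repo's actual models): each model contributes one self-contained log-likelihood, and a registry maps model names to those functions so the emcee driver never needs to change.

```python
import math

def lnlike_single_pop(theta, data):
    # Toy Gaussian log-likelihood standing in for the single-population model.
    mu, sigma = theta
    return sum(-0.5 * ((x - mu) / sigma) ** 2
               - math.log(sigma * math.sqrt(2.0 * math.pi)) for x in data)

def lnlike_multi_pop(theta, data):
    # Toy two-component mixture standing in for the multi-population model.
    mu1, mu2, sigma, frac = theta

    def comp(x, mu):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

    return sum(math.log(frac * comp(x, mu1) + (1.0 - frac) * comp(x, mu2))
               for x in data)

# One entry per model; the sampling code just looks up the function by name.
MODELS = {"SinglePop": lnlike_single_pop, "MultiPop": lnlike_multi_pop}
```

With a registry like this, adding a third model is one new function and one new dictionary entry.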

@drphilmarshall
Author

Looks like there are other pieces of code that need refactoring in the same way, eg gen_dataset.py. Let's use a self-consistent naming scheme for models throughout the code and docs.

@kbarbary
Collaborator

@drphilmarshall Passing in a custom lnlike function sounds like a good level of abstraction. If @kponder could do this refactoring as you suggest, that would be great. Eventually I think we'll want to merge a lot of this work into sncosmo and refactoring now will make that easier.
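One way to picture the level of abstraction being suggested (the driver function and toy likelihood below are hypothetical illustrations, with a simple grid search standing in for the real emcee call): the driver accepts any lnlike callable with a `(theta, data)` signature, so swapping models never touches the driver itself.

```python
def fit_hyperparams(lnlike, data, theta_grid):
    # Model-agnostic driver: a toy grid search standing in for the emcee
    # call. The real code would pass lnlike to emcee.EnsembleSampler instead.
    return max(theta_grid, key=lambda theta: lnlike(theta, data))

def lnlike_toy(theta, data):
    # Any model's log-likelihood with the same signature plugs in here.
    return -sum((x - theta) ** 2 for x in data)

best = fit_hyperparams(lnlike_toy, [1.0, 2.0, 3.0], [0.0, 2.0, 4.0])
# best == 2.0, the grid point maximizing the toy log-likelihood
```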

@kponder
Contributor

kponder commented Feb 11, 2015

I wanted some input on the structure of the code.

I have a Python module that contains all the information for the current two models. It creates a class for each type of population. Within each class it:

  1. Defines a set of initialized parameters. For the single population: nsne, alpha, beta, x0_0, dim, initial positions, labels (for triangle plots), and global bounds.
  2. Defines a naming scheme appended to the end of all SN files. Also, with the way the code is structured, it creates different testdata and samples directories for each combination of populations and number of supernovae.
  3. Defines a function that returns arrays of z_true, x1_true, c_true, x0_true for use in gen_dataset.py.
  4. Defines the log-likelihood function.

Any preference on naming scheme? I currently implemented one that tacks on a two-letter description of the population plus the number of SNe in the model, e.g. "testdata_SP_200" for Single Population with 200 total SNe.
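A sketch of what one such population class might look like (every name, default value, and distribution below is a hypothetical stand-in for illustration, not the actual implementation): it bundles the initialized parameters, the naming tag, a true-parameter generator, and the log-likelihood in one place.

```python
import math
import random

class SinglePop:
    tag = "SP"  # two-letter tag used in names like "testdata_SP_200"

    def __init__(self, nsne=200, alpha=0.13, beta=3.1, x0_0=1e-5):
        # Illustrative initialized parameters for the single-population model.
        self.nsne, self.alpha, self.beta, self.x0_0 = nsne, alpha, beta, x0_0

    def dirname(self, prefix="testdata"):
        # Naming scheme: prefix + population tag + number of SNe.
        return "{}_{}_{}".format(prefix, self.tag, self.nsne)

    def generate_true(self, seed=0):
        # Toy stand-in for generating the z_true, x1_true, c_true arrays.
        rng = random.Random(seed)
        z = [rng.uniform(0.01, 1.0) for _ in range(self.nsne)]
        x1 = [rng.gauss(0.0, 1.0) for _ in range(self.nsne)]
        c = [rng.gauss(0.0, 0.1) for _ in range(self.nsne)]
        return z, x1, c

    def lnlike(self, theta, data):
        # Toy Gaussian log-likelihood; the real one lives in this class too.
        mu, sigma = theta
        return sum(-0.5 * ((x - mu) / sigma) ** 2
                   - math.log(sigma * math.sqrt(2.0 * math.pi)) for x in data)
```

A MultiPop class with the same methods and `tag = "MP"` could then be used interchangeably by the rest of the pipeline.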

How do you want gen_dataset, sample_lcs (for naming only), importance_sampling, and plot_global to handle having multiple choices? I'm curious what kind of interface is most useful right now.

There are a few options.

  1. I can turn them each into modules and create another file to just run it all at once.
     --- probably this one is more useful in the future, after it's been developed.
  2. Use command-line inputs. For example, type "python importance_sampling.py MultiPop" and it will run with the right likelihood model.
     --- you would have to watch carefully what you're doing when mixing generated datasets and fitting for hyperparameters from different models.
  3. Or change the files manually through a definition at the top of the file.
     --- I guess this is probably best for right now.
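For reference, option 2 could be sketched with argparse (model names and help text here are illustrative assumptions): `python importance_sampling.py MultiPop` would select the matching likelihood, and invalid model names would be rejected automatically.

```python
import argparse

def pick_model(argv=None):
    # Parse the model name from the command line, restricted to known models.
    parser = argparse.ArgumentParser(description="Run importance sampling")
    parser.add_argument("model", choices=["SinglePop", "MultiPop"],
                        help="which population model's likelihood to use")
    args = parser.parse_args(argv)
    return args.model

# e.g. pick_model(["MultiPop"]) returns "MultiPop"
```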

@drphilmarshall
Author

In answer to your last question:

I usually keep classes and functions in separate files, with matching
names, and then keep them all in a folder named after the repo:

mkdir snpgm
mv *.py snpgm

Then, you can turn this into an importable module by

touch snpgm/__init__.py

and editing this file so that it contains a list of import statements like:

from .importance_sampler import *

When you do an

import snpgm

from python, you then get access to all the classes and functions in all
the files listed in the __init__.py file, as for example

sampler = snpgm.ImportanceSampler()

(if ImportanceSampler was a class defined in importance_sampler.py)

Most of my python repos have this structure, if you want to browse around.


@kbarbary
Collaborator

I would probably just go with the third option: setting a few parameters at the top of the main script. It's not a permanent solution, but that's OK: this is still just a sandbox!

The naming scheme for directories sounds fine.

When you feel like it, throw your code up on GitHub and we can comment in a bit more detail on particulars, before merging. One of the nice things about GitHub is that you can update a previously created Pull Request in response to comments, simply by making more commits on your branch and pushing the branch to GitHub again.

@kbarbary
Collaborator

BTW, @rbiswas4 and I discussed a more permanent API (that might go into sncosmo) yesterday here in Berkeley. We ran into a difficulty that I'll describe in a separate Issue.
