
Refactor to enable different models to be run with the same importance_sampling code? #3

Open
drphilmarshall opened this issue Feb 9, 2015 · 6 comments

Comments

@drphilmarshall

Looks like the lnlike function in importance_sampling.py could be refactored to call out to any one of a number (currently two) of models' likelihoods. Let's do this refactoring so that we can explore different model choices without changing the way emcee is called at the hyper level. @kponder could do this safely from her MultiPop branch, before merging to her master and submitting a pull request.

The nice thing about this scheme is that we will then be able to define models by PGMs, and implement them as single, separated likelihood functions.
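A minimal sketch of what this could look like (all function names and likelihood forms here are hypothetical toy stand-ins, not the repo's actual models): each model contributes one self-contained log-likelihood, and a registry maps model names to those functions so the emcee driver never needs to change.

```python
import math

def lnlike_single_pop(theta, data):
    # Toy Gaussian log-likelihood standing in for the single-population model.
    mu, sigma = theta
    return sum(-0.5 * ((x - mu) / sigma) ** 2
               - math.log(sigma * math.sqrt(2.0 * math.pi)) for x in data)

def lnlike_multi_pop(theta, data):
    # Toy two-component mixture standing in for the multi-population model.
    mu1, mu2, sigma, frac = theta

    def comp(x, mu):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

    return sum(math.log(frac * comp(x, mu1) + (1.0 - frac) * comp(x, mu2))
               for x in data)

# One entry per model; the sampling code just looks up the function by name.
MODELS = {"SinglePop": lnlike_single_pop, "MultiPop": lnlike_multi_pop}
```

With a registry like this, adding a third model is one new function and one new dictionary entry.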

@drphilmarshall
Author

Looks like there are other pieces of code that need refactoring in the same way, eg gen_dataset.py. Let's use a self-consistent naming scheme for models throughout the code and docs.

@kbarbary
Collaborator

@drphilmarshall Passing in a custom lnlike function sounds like a good level of abstraction. If @kponder could do this refactoring as you suggest, that would be great. Eventually I think we'll want to merge a lot of this work into sncosmo and refactoring now will make that easier.
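One way to picture the level of abstraction being suggested (the driver function and toy likelihood below are hypothetical illustrations, with a simple grid search standing in for the real emcee call): the driver accepts any lnlike callable with a `(theta, data)` signature, so swapping models never touches the driver itself.

```python
def fit_hyperparams(lnlike, data, theta_grid):
    # Model-agnostic driver: a toy grid search standing in for the emcee
    # call. The real code would pass lnlike to emcee.EnsembleSampler instead.
    return max(theta_grid, key=lambda theta: lnlike(theta, data))

def lnlike_toy(theta, data):
    # Any model's log-likelihood with the same signature plugs in here.
    return -sum((x - theta) ** 2 for x in data)

best = fit_hyperparams(lnlike_toy, [1.0, 2.0, 3.0], [0.0, 2.0, 4.0])
# best == 2.0, the grid point maximizing the toy log-likelihood
```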

@kponder
Contributor

kponder commented Feb 11, 2015

I wanted some input on the structure of the code.

I have a Python module that contains all the information for the current two models. It creates a class for each type of population. Within each class it:

  1. Defines a set of initialized parameters. For the single population: nsne, alpha, beta, x0_0, dim, initial positions, labels (for triangle plots), and global bounds.
  2. Defines a naming scheme appended to the end of all SN files. Also, with the way the code is structured, it creates different testdata and samples directories for each combination of populations and number of supernovae.
  3. Defines a function that returns arrays of z_true, x1_true, c_true, x0_true for use in gen_dataset.py.
  4. Defines the log-likelihood function.

Any preference on naming scheme? I currently implemented one that tacks on a two-letter description of the population plus the number of SNe in the model, e.g. "testdata_SP_200" for Single Population with 200 total SNe.
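A sketch of what one such population class might look like (every name, default value, and distribution below is a hypothetical stand-in for illustration, not the actual implementation): it bundles the initialized parameters, the naming tag, a true-parameter generator, and the log-likelihood in one place.

```python
import math
import random

class SinglePop:
    tag = "SP"  # two-letter tag used in names like "testdata_SP_200"

    def __init__(self, nsne=200, alpha=0.13, beta=3.1, x0_0=1e-5):
        # Illustrative initialized parameters for the single-population model.
        self.nsne, self.alpha, self.beta, self.x0_0 = nsne, alpha, beta, x0_0

    def dirname(self, prefix="testdata"):
        # Naming scheme: prefix + population tag + number of SNe.
        return "{}_{}_{}".format(prefix, self.tag, self.nsne)

    def generate_true(self, seed=0):
        # Toy stand-in for generating the z_true, x1_true, c_true arrays.
        rng = random.Random(seed)
        z = [rng.uniform(0.01, 1.0) for _ in range(self.nsne)]
        x1 = [rng.gauss(0.0, 1.0) for _ in range(self.nsne)]
        c = [rng.gauss(0.0, 0.1) for _ in range(self.nsne)]
        return z, x1, c

    def lnlike(self, theta, data):
        # Toy Gaussian log-likelihood; the real one lives in this class too.
        mu, sigma = theta
        return sum(-0.5 * ((x - mu) / sigma) ** 2
                   - math.log(sigma * math.sqrt(2.0 * math.pi)) for x in data)
```

A MultiPop class with the same methods and `tag = "MP"` could then be used interchangeably by the rest of the pipeline.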

How do you want gen_dataset, sample_lcs (for naming only), importance_sampling, and plot_global to handle having multiple choices? I'm curious what kind of interface is most useful right now.

There are a few options.

  1. I can turn them each into modules and create another file to just run it all at once.
     --- probably this one is more useful in the future, after it's been developed.
  2. Use command-line inputs. For example, type "python importance_sampling.py MultiPop" and it will run with the right likelihood model.
     --- you would have to watch carefully what you're doing when mixing generated datasets and fitting for hyperparameters from different models.
  3. Or change the files manually through a definition at the top of the file.
     --- I guess this is probably best for right now.
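For reference, option 2 could be sketched with argparse (model names and help text here are illustrative assumptions): `python importance_sampling.py MultiPop` would select the matching likelihood, and invalid model names would be rejected automatically.

```python
import argparse

def pick_model(argv=None):
    # Parse the model name from the command line, restricted to known models.
    parser = argparse.ArgumentParser(description="Run importance sampling")
    parser.add_argument("model", choices=["SinglePop", "MultiPop"],
                        help="which population model's likelihood to use")
    args = parser.parse_args(argv)
    return args.model

# e.g. pick_model(["MultiPop"]) returns "MultiPop"
```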

@drphilmarshall
Author

In answer to your last question:

I usually keep classes and functions in separate files, with matching
names, and then keep them all in a folder named after the repo:

mkdir snpgm
mv *.py snpgm

Then, you can turn this into an importable module by

touch snpgm/__init__.py

and editing this file so that it contains a list of import statements like:

from .importance_sampler import *

When you do an

import snpgm

from python, you then get access to all the classes and functions in all
the files listed in the __init__.py file, as for example

sampler = snpgm.ImportanceSampler()

(if ImportanceSampler was a class defined in importance_sampler.py)

Most of my python repos have this structure, if you want to browse around.


@kbarbary
Collaborator

I would probably just go with the third option: setting a few parameters at the top of the main script. It's not a permanent solution, but that's OK: this is still just a sandbox!

The naming scheme for directories sounds fine.

When you feel like it, throw your code up on GitHub and we can comment in a bit more detail on particulars, before merging. One of the nice things about GitHub is that you can update a previously created Pull Request in response to comments, simply by making more commits on your branch and pushing the branch to GitHub again.

@kbarbary
Collaborator

BTW, @rbiswas4 and I discussed a more permanent API (that might go into sncosmo) yesterday here in Berkeley. We ran into a difficulty that I'll describe in a separate Issue.
