Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ATMCMC2 #1569

Closed
wants to merge 20 commits into from
Closed

ATMCMC2 #1569

wants to merge 20 commits into from

Conversation

hvasbath
Copy link
Contributor

@hvasbath hvasbath commented Dec 1, 2016

finally, updated ATMCMC

follow up to
#1264
#1163

includes test and example

Copy link
Contributor

@springcoil springcoil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Can you fix that merge conflict? Needs a rebase

return x[(n_steps - 1)::n_steps]


def two_gaussians(x):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use the new NormalMixture here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it just now, but I cannot get it running. Could please someone update the help of the shape of the arrays that are expected? My two mu1 and mu2 are shape (1 x n) how is the total mu matrix expected to be? 2 x n? 1 x 2*n? How are the weights and taus expected?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g.:

    pm.NormalMixture('mixture', 
                     w=np.array([.8, .2]), 
                     mu=np.array([-3., 5.]), 
                     sd=np.array([1., 1.]))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you insist on using that? ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not working?

@hvasbath
Copy link
Contributor Author

hvasbath commented Dec 2, 2016

Did the rebase ...

@hvasbath
Copy link
Contributor Author

hvasbath commented Dec 2, 2016

How can I start the test the way Travis does? To test if it is running?

Did manually:
nosetests --with-coverage test_atmcmc.py

Was working and showed:
Ran 1 test in 64.455s

Ok Travis is now crashing because per default no cPickle - shall I use only Pickle? Will be slower ...
I thought cPickle is almost a system package meanwhile? Was using 6 different machines lately and didnt have to install it extra.

@twiecki
Copy link
Member

twiecki commented Dec 9, 2016

======================================================================
ERROR: Failure: ImportError (No module named 'cPickle')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/minconda3/envs/testenv/lib/python3.5/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/travis/minconda3/envs/testenv/lib/python3.5/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/travis/minconda3/envs/testenv/lib/python3.5/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/travis/minconda3/envs/testenv/lib/python3.5/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/travis/minconda3/envs/testenv/lib/python3.5/imp.py", line 244, in load_module
    return load_package(name, filename)
  File "/home/travis/minconda3/envs/testenv/lib/python3.5/imp.py", line 216, in load_package
    return _load(spec)
  File "<frozen importlib._bootstrap>", line 693, in _load
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/travis/build/pymc-devs/pymc3/pymc3/__init__.py", line 8, in <module>
    from .sampling import *
  File "/home/travis/build/pymc-devs/pymc3/pymc3/sampling.py", line 11, in <module>
    from .step_methods import (NUTS, HamiltonianMC, Metropolis, BinaryMetropolis,
  File "/home/travis/build/pymc-devs/pymc3/pymc3/step_methods/__init__.py", line 21, in <module>
    from .atmcmc import ATMCMC
  File "/home/travis/build/pymc-devs/pymc3/pymc3/step_methods/atmcmc.py", line 32, in <module>
    from ..backends import atmcmc_text as atext
  File "/home/travis/build/pymc-devs/pymc3/pymc3/backends/atmcmc_text.py", line 22, in <module>
    import cPickle as pickle
ImportError: No module named 'cPickle'

@hvasbath Seems like tests are still failing.

@hvasbath
Copy link
Contributor Author

hvasbath commented Dec 9, 2016

If you would be so kind to answer my questions I could fix the test ;) .

@springcoil
Copy link
Contributor

You could try running this with Docker - that'll replicate running the tests with travis.

@hvasbath
Copy link
Contributor Author

hvasbath commented Dec 10, 2016

Maybe I wasnt clear. I can run the test no problem.
My question is if I should downgrade to Pickle instead of using cPickle? Because, definitely everyone has Pickle.
Or do you have to change something in the Travis setup that it has cPickle as well?

...just changed it to pickle

@hvasbath
Copy link
Contributor Author

hvasbath commented Dec 10, 2016

Damn now no Queue module ... I need courses in python 3 ...

@hvasbath
Copy link
Contributor Author

hvasbath commented Dec 10, 2016

Uff I have no clue why this is failing there. If I run it locally with nose its working. Guess will have to try this docker thing ... but next week I will have to prepare for an interview so can be that this will take some time until I find the time ...

@ferrine
Copy link
Member

ferrine commented Dec 10, 2016

@hvasbath do you use conda or pure python?

@ColCarroll
Copy link
Member

Wait, this is pretty strange -- looks like your tests are failing because of some python3 things that changed from python2. For example, cPickle as renamed to just pickle in python3 because there weren't good reasons to keep using the old pickle library.

What is extra confusing is that the python2 tests are failing as well: in this stack I see

$ python --version
Python 2.7.12

but then the stack trace looks like it is using python3.5

  File "/home/travis/minconda3/envs/testenv/lib/python3.5/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)

@ColCarroll
Copy link
Member

In particular, it may be that the setup script is no longer grabbing the python version from the environment correctly. I will take care of that, then at least half your tests will pass!

@hvasbath
Copy link
Contributor Author

I am using pure python 2.7. no conda.

@hvasbath
Copy link
Contributor Author

hvasbath commented Dec 11, 2016

Ok 2.7 passing, so thats clearly a python 3 related thing. Anyone any comment? Where it fails- it creates a list of booleans to hand to the sampling to determine whether to show the progressbar or not.
Somehow creating this list does not work as intended. line. 794-801 in atmcmc.py. I really have no clue. Sorry for that!

Copy link
Member

@ColCarroll ColCarroll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a really good contribution! Still not sure where that IndexError on travis is coming from -- let me know if you're out of time and I can run the tests locally to figure it out.

More generally, I'm most worried about the large functions being hard for anyone except you to refactor/maintain/test. On the other hand, it will not break any existing stuffYou did a good job keeping it all decoupled enough from everything else I think it is ok being merged. I suspect we could then work on sharing some of this machinery with the rest of the library so it is a little more manageable and well tested.


def setUp(self):
super(TestATMCMC, self).setUp()
self.test_folder = mkdtemp(prefix='ATMIP_TEST')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should have a matching tearDown method that deletes the test_folder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right!

Containing all the information that is unique for each Markov Chain
i.e. [:class:'ATMCMC', chain_number(int),
sampling index(int), start_point(dictionary)]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a doc for pshared

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k


pm._log.info('Sampling ...')

pshared = dict(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this could be a namedtuple, so you can make sure all the fields are filled in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will have to look into that. I didnt know that that exists ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can put whatever is meant not to be forked, so this differs for each function that is supposed to be parallelised, so if I understood correct this wont work with a named tuple ...

work = [(step, chain, idx, step.population[step.resampling_indexes[chain]])
for chain, idx in zip(chains, idxs)]

for chain in tqdm(atext.parimap(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

underscore indicates that you aren't using the variable (for _ in tqdm(...))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k

step.n_chains / n_jobs has to be an integer number!
tune : int
Number of iterations to tune, if applicable (defaults to None)
trace : string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this just be called homepath?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! will do!


return a_list

def srmap(self, tarray):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is used in some of my models. As I converted the code from the beat package to here, I missed to remove that. I will kill it.

array[slc] = list_arrays[list_ind].ravel()
return array

def f3map(self, list_arrays):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will kill as well ..

More efficient Queueing compared to Pool in multiprocessing.
Adapted from the pyrocko package. http://www.pyrocko.org
"""
assert all(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertions can be turned off when a library is run -- you might be better off manually raising an AssertionError or TypeError (or custom error!) instead.

@hvasbath
Copy link
Contributor Author

Thanks a lot @ColCarroll for all the comments! And I am happy that you like it. It took a good part of my last year to put that together, so I would be very happy if it becomes more useful.
Unfortunatley, I wont have time to work on your comments until next year I guess, as I have to prepare for an interview and then is christmas ... Maybe some evening...
Yes @twiecki and I have been discussing quite some time about that being seperate from the core code and how to integrate it more, but that will evolve naturally prbably. Yes, I totally agree there need to be much more tests, but so far I had no time to implement them. I am however constantly running it with models on my computer so it is defenitely tested. Just not in a suite. If you have ideas or notice particular things of how to design a test I am happy to take the your suggestions!

@twiecki
Copy link
Member

twiecki commented Dec 12, 2016

Need to rebase after #1592.

@hvasbath
Copy link
Contributor Author

@twiecki or someone else can you please post the github command line, I am still not so familiar with many of the functions...

@twiecki
Copy link
Member

twiecki commented Dec 15, 2016

@hvasbath http://stackoverflow.com/questions/7244321/how-do-i-update-a-github-forked-repository (in the last line you would need to do that from your PR branch instead of master).

@twiecki
Copy link
Member

twiecki commented Dec 19, 2016

@hvasbath Any luck?

@springcoil
Copy link
Contributor

@hvasbath Any luck on updating this. I think all you have to do is follow the instructions that Thomas directed you to.

@hvasbath
Copy link
Contributor Author

Sorry guys for being so inactive here- was swamped with my main job. Will look into incoorporating things right now ...

@twiecki
Copy link
Member

twiecki commented Jan 21, 2017

Can you also look at #1689?

@hvasbath
Copy link
Contributor Author

hvasbath commented Jan 21, 2017

Did that @twiecki . However, I wont have time to read the paper. I dont know about emcee so far. Looks like this would create an additional dependency emcee, which I am personally not a big fan of. So I would support @fonnesbeck s comment.
Do you want me to look at something in particular?

@twiecki
Copy link
Member

twiecki commented Jan 21, 2017 via email

@hvasbath
Copy link
Contributor Author

Yes in terms of backend it seems quite elegant! However, it is only a numpy array backend and keeps everything in memory. I wonder how feasible that is if the chains get long with many variables? But again this is rather related to the algorithm-maybe it doesnt need to run a long time?

In terms of using it for ATMCMC I think it is not straight forward, as it still relies as well on rerunning the theano logp model function (iter_fn). Which is fine if you only want to record the input variables. But in ATMCMC we need to safe the model likelihoods to evaluate the coefficient of variation in the transition step, which would result in recalculating the model twice.

@hvasbath
Copy link
Contributor Author

If the python3 version still fails I am going to need help that someone executes it as @ColCarroll suggested ;)
It works locally on my python2.7 version run with nosetest.

@hvasbath
Copy link
Contributor Author

As expected so I am going to need help here. The python 2 test is running through ...

======================================================================

ERROR: test_sample (pymc3.tests.test_atmcmc.TestATMCMC)


Traceback (most recent call last):

File "/home/travis/build/pymc-devs/pymc3/pymc3/tests/test_atmcmc.py", line 72, in test_sample

rm_flag=False)

File "/home/travis/build/pymc-devs/pymc3/pymc3/step_methods/atmcmc.py", line 635, in ATMIP_sample

_iter_parallel_chains(**sample_args)

File "/home/travis/build/pymc-devs/pymc3/pymc3/step_methods/atmcmc.py", line 820, in _iter_parallel_chains

total=len(chains)):

File "/home/travis/miniconda3/envs/testenv/lib/python3.6/site-packages/tqdm/_tqdm.py", line 830, in iter

for obj in iterable:

File "/home/travis/build/pymc-devs/pymc3/pymc3/backends/atmcmc_text.py", line 107, in parimap

yield function(*args, **kwargs)

File "/home/travis/build/pymc-devs/pymc3/pymc3/step_methods/atmcmc.py", line 766, in work_chain

progressbar = progressbars[idx]

IndexError: list index out of range

-------------------- >> begin captured logging << --------------------

pymc3: INFO: Init new trace!

pymc3: INFO: Sample initial stage: ...

pymc3: INFO: Beta: 0.000000 Stage: 0

pymc3: INFO: Initialising chain traces ...

pymc3: INFO: Sampling ...

--------------------- >> end captured logging << ---------------------

@hvasbath
Copy link
Contributor Author

Finally, I found a reference that you statisticians @twiecki @ColCarroll might know better. The algorithm here is actually a sequential monte carlo algorithm. DelMoral et al. 2006 Sequential Monte Carlo Samplers J. R. Statist. Soc. B. I guess I will rename it this way...

@springcoil
Copy link
Contributor

I'm all for you renaming this to Sequential Monte Carlo. Is there anything you need help with to get this merged?

@hvasbath
Copy link
Contributor Author

Created a new PR here:
#1923

@hvasbath hvasbath closed this Mar 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants