# SMC #1923

Closed
hvasbath wants to merge 14 commits

## Conversation

9 participants
Contributor

### hvasbath commented Mar 20, 2017

 Sequential Monte Carlo - renamed from ATMCMC here: #1569 Sampling speed improvements, rebased on master ...

### hvasbath added some commits Mar 5, 2017

``` renaming to smc and improved paralell sampling, random seed ```
``` d2736fa ```
``` changed also init ```
``` bb68b82 ```
``` added back list bijections ```
``` 8a677cf ```
``` fixed example and proposal distribution ```
``` 0009ca8 ```

Contributor

### hvasbath commented Mar 20, 2017

One test is failing and I have no clue why. What is different about it compared to the others?

### twiecki reviewed Mar 20, 2017

        return x[(n_steps - 1)::n_steps]


    def two_gaussians(x):

#### twiecki Mar 20, 2017

Member

This can be simplified to:

```
pm.NormalMixture('two_gaussians', mu=[mu1, mu2], sigma=[dsigma, isigma], w=[w1, w2])
```

#### hvasbath Mar 20, 2017

Contributor

Will try to do that. Last time I tried, it was absolutely not straightforward, as mu is n-dimensional.

#### twiecki Mar 20, 2017

Member

Oh, I missed that. So mu1 and mu2 are not scalars? In that case it might be too much trouble. @AustinRochford Do you think that should be possible?

#### AustinRochford Mar 20, 2017

Member

@twiecki yes, right now `Mixture` only supports 1d mixtures. There are some subtleties in supporting multidimensional distributions and weights that vary with observed data points (necessary for DDR, for example) that I am thinking about but have not resolved.

Member

### twiecki commented Mar 20, 2017

 @hvasbath Can you convert the example to an IPython NB, with description of the what and why?
Member

### twiecki commented Mar 20, 2017

```
======================================================================
ERROR: test_sample (pymc3.tests.test_smc.TestSMC)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pymc-devs/pymc3/pymc3/tests/test_smc.py", line 72, in test_sample
    rm_flag=False)
  File "/home/travis/build/pymc-devs/pymc3/pymc3/step_methods/smc.py", line 691, in ATMIP_sample
    step.select_end_points(mtrace)
  File "/home/travis/build/pymc-devs/pymc3/pymc3/step_methods/smc.py", line 360, in select_end_points
    n_steps = len(mtrace)
  File "/home/travis/build/pymc-devs/pymc3/pymc3/backends/base.py", line 316, in __len__
    chain = self.chains[-1]
IndexError: list index out of range
```

Looks like a Python 3 issue. Perhaps an iterator is used where you expect a list?
Contributor

### hvasbath commented Mar 20, 2017

It looks like the sampling has not been executed, as the mtrace is empty. So maybe there is a problem with the atext.paripool function under Python 3? Can someone look at that function and check whether there is a Python 3 problem? I have absolutely no experience with notebooks. They cannot be opened with a text editor. What do I have to do to create a notebook? If it involves difficult, system-crashing installations of Python 3 etc., I will likely have no time for that...
Contributor

### hvasbath commented Mar 20, 2017

It could also be that the output folder defined in the example causes the problem, because it is created in the execution path. Maybe that crashes Travis? What do you think? Shall I also use temporary directories here? @twiecki

### hvasbath reviewed Mar 20, 2017

    import theano.tensor as tt
    from matplotlib import pyplot as plt

    test_folder = ('ATMIP_TEST')

#### hvasbath Mar 20, 2017

Contributor

This test folder here ...

Member

### twiecki commented Mar 20, 2017

> Shall I also use temporary directories here?

Definitely, there's a nice Python API for that, I think.
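
A minimal sketch of the temporary-directory approach suggested here, using only the standard library's `tempfile` and `shutil`. The helper names `run_in_temp_dir` and `write_stage_file`, the file name `stage_0.csv`, and its contents are hypothetical stand-ins for illustration; they are not taken from the PR's test code.

```python
import os
import shutil
import tempfile


def run_in_temp_dir(work, prefix='ATMIP_TEST_'):
    """Call work(folder) with a fresh scratch folder, then delete the folder."""
    # mkdtemp creates a uniquely named directory and exists on Python 2 and 3.
    folder = tempfile.mkdtemp(prefix=prefix)
    try:
        return work(folder)
    finally:
        # Clean up even if work() raises, so nothing stays in the execution path.
        shutil.rmtree(folder)


def write_stage_file(folder):
    """Hypothetical stand-in for a sampler that writes its results to disk."""
    path = os.path.join(folder, 'stage_0.csv')
    with open(path, 'w') as handle:
        handle.write('mu,like\n0.0,-1.0\n')
    return path


written_path = run_in_temp_dir(write_stage_file)
```

Because the folder is removed in the `finally` clause, a test written this way leaves nothing behind on the CI machine, whether it passes or fails.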
Contributor

### hvasbath commented Mar 20, 2017 • edited

 Ok created the notebook and removed the example.
``` added ipython notebook removed example ```
``` cdf977b ```
Member

### twiecki commented Mar 20, 2017

 Seems like NB did not run to completion.
``` notebook finished running ```
``` a4d8180 ```
Contributor

### hvasbath commented Mar 20, 2017 • edited

I don't know why the figure is not being displayed in the notebook. You used exactly the same syntax in the getting started case... Must be some local problem on my computer, I guess. OK, figured it out: it needs the `%matplotlib inline` command.
Contributor

### hvasbath commented Mar 20, 2017

OK, it definitely was not the temporary directory ... So we need a Python 3 expert here ...
``` traceplot shows inline ```
``` f959993 ```
Member

### twiecki commented Mar 21, 2017

@hvasbath The trick is to use `%matplotlib inline` as the first cell. Then you don't need the `plt.show()` calls.

### twiecki reviewed Mar 21, 2017

    "pm.traceplot(mtrace, transform=last_sample, combined=True);\n"
    "axs = pm.traceplot(mtrace, transform=last_sample, combined=True)\n",
    "\n",
    "plt.show()"

#### twiecki Mar 21, 2017

Member

shouldn't need this.

``` removed plt.show() ```
``` a553d51 ```
Contributor

### hvasbath commented Mar 21, 2017 • edited

OK, I did a Python 3 installation on another computer and was playing around. Somehow the work list that is being created in smc.py, lines 857-860, has a length of zero. I have no clue why; probably because of some changes to zip, or maybe to map, which influences how the progress bar lists are created. Then in the smc_text backend, the paripool function uses map, and apparently this function has changed a lot from Python 2 to 3. To conclude, it will be absolutely not straightforward to get it running for both Python versions simultaneously with the same code. For that I lack too much knowledge about Python 3, and I won't have time to look into it for the next months ...
Member

### twiecki commented Mar 21, 2017

`from six import zip`
Contributor

### hvasbath commented Mar 21, 2017

 cannot import name 'zip'
Member

### twiecki commented Mar 21, 2017

Sorry, `from six.moves import zip`. I think you can do the same with `map`.
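
The behaviour under discussion can be reproduced with the built-ins alone. On Python 3, `zip` and `map` return lazy iterators (`six.moves.zip` and `six.moves.map` give the same lazy behaviour on both versions), so they have no length and can be consumed only once. The sketch below uses a hypothetical `build_work` helper and made-up seeds; it is not the PR's work-list code, only an illustration of why a list built this way can appear empty.

```python
def build_work(chains, seeds):
    """Pair each chain index with a seed (hypothetical helper)."""
    return zip(chains, seeds)


lazy_work = build_work(range(3), [11, 22, 33])

# A lazy iterator has no length on Python 3, so len() raises TypeError.
try:
    len(lazy_work)
    has_len = True
except TypeError:
    has_len = False

first_pass = list(lazy_work)   # consumes the iterator
second_pass = list(lazy_work)  # nothing is left on the second pass

# Wrapping the call in list() restores the Python 2 semantics:
# the result has a length and can be traversed repeatedly.
work = list(build_work(range(3), [11, 22, 33]))
```

Any code that measures the work list (for example to size a progress bar) before iterating over it needs the `list(...)` form.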
Contributor

### hvasbath commented Mar 21, 2017 • edited

Thanks for the hints, @twiecki! Just tried it, also putting list() around the map. Now it runs with njobs=1, but still not for njobs>1. Somehow it doesn't run through the generator. It only does the first iteration and then leaves the loop.
``` python3 single core running, to be fixed mutlicore ```
``` 5d69327 ```
Contributor

### hvasbath commented Apr 6, 2017

Sorry for being unresponsive. I just returned from a vacation. I am glad it also runs for you, @ColCarroll, within reasonable times! Putting an EXPERIMENTAL label on it is perfectly fine for me. I am sorry for the amount of code; I would have been happy if it was less, and it probably could have been. But I have only been programming in Python for 1 1/2 years, so I guess there might be a lot of room for improvement in terms of code style and factorization. Is anyone here a "proper" programmer by profession? I am keen to get your remarks, @aloctavodia!
Member

### twiecki commented Apr 6, 2017

 We should also test how well SMC does. It should do pretty poorly in high dimensions, but perhaps for mixture models it works well?
Member

### aloctavodia commented Apr 6, 2017

@twiecki I guess it should perform the same as Metropolis, except that it should do better for multimodal posteriors, right? One possible extension of SMC could be to use NUTS instead of Metropolis. @hvasbath I am not a "proper" programmer, but I would like to help, from improving the code to testing the performance/suitability for different scenarios. How should I proceed to help? Should I wait for the merge under "experimental conditions"?
Member

### ColCarroll commented Apr 6, 2017

 I can make a PR today on @hvasbath's repo adding warnings (other option is to develop on a branch, which might make sense here since it is quite decoupled). Any good references to read up on this?
Contributor

### hvasbath commented Apr 6, 2017

@twiecki No, it does very well so far under any circumstances; so far I have had no problem with any model. It does especially well in high dimensions, that's the whole point of it. It is just a matter of the number of chains you use to sample. The higher the dimension, the higher the number of chains should be. There are two references given in the code, and another one, probably the most original one: http://www.stats.ox.ac.uk/~doucet/delmoral_doucet_jasra_sequentialmontecarlosamplersJRSSB.pdf
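
For readers new to the method, here is a minimal sketch of a tempered SMC sampler on a toy bimodal target, written with the standard library only (Python 3.6+ for `random.choices`). It is not the implementation in this PR. The one-dimensional target, the flat prior on [-10, 10], the fixed linear temperature schedule, the multinomial resampling, and all names (`log_likelihood`, `smc_sample`) are assumptions chosen for illustration.

```python
import math
import random


def log_likelihood(x):
    """Toy target: two Gaussian modes at -3 and +3, each with sd 0.5."""
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def smc_sample(n_particles=2000, n_stages=10, n_steps=5, seed=0):
    rng = random.Random(seed)
    # Stage 0 (beta = 0): draw every particle (chain) from the flat prior.
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    betas = [i / float(n_stages) for i in range(n_stages + 1)]
    for beta_old, beta_new in zip(betas[:-1], betas[1:]):
        # Importance weights for moving from beta_old to beta_new.
        log_w = [(beta_new - beta_old) * log_likelihood(x) for x in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        # Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Short Metropolis run per particle at the new temperature.
        for i in range(n_particles):
            x = particles[i]
            for _ in range(n_steps):
                proposal = x + rng.gauss(0.0, 0.5)
                if abs(proposal) > 10.0:
                    continue  # outside the prior support, reject
                log_ratio = beta_new * (log_likelihood(proposal) - log_likelihood(x))
                if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                    x = proposal
            particles[i] = x
    # Only the final state of each chain is kept as a posterior sample.
    return particles


samples = smc_sample()
fraction_positive = sum(1 for x in samples if x > 0) / float(len(samples))
```

Because the particles start spread over the whole prior and are reweighted gradually, both modes stay populated, which is the multimodal behaviour discussed in this thread. A single Metropolis chain with the same step size would usually stay in one mode.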
Member

### aseyboldt commented Apr 6, 2017

 You could also use the fixtures in `sampler_fixtures.py` to test if posterior samples for a couple of distributions are consistent with what we expect. This helped me a lot when I was working on NUTS. The tests seemed to detect even relatively small biases. For this you need to add a new class in `sampler_fixtures` like `BaseSampler` and add a couple of tests that use this in `test_posteriors.py`
Member

### fonnesbeck commented Apr 9, 2017

Perhaps rather than an EXPERIMENTAL tag, we could adopt a `pymc3.sandbox` submodule as a staging area for algorithms that we would like merged but are not yet ready for use in production.
Member

### twiecki commented Apr 10, 2017

 @fonnesbeck I like the idea in principle but worry about name-space issues. This PR, for example, adds code in various parts, including the backends, would those live in the sandbox too?
Member

### ferrine commented Apr 10, 2017 • edited

I like the sandbox idea; all changes outside it should just be reviewed and approved by the pymc team. That points to our internal API or architecture being imperfect and sometimes not extensible.
Member

### fonnesbeck commented Apr 10, 2017 • edited

 The sandbox would be more for methodologically experimental code, rather than for infrastructure like backends, I should think. More complicated changes would have to remain in their own PR until they are ready.
Contributor

### hvasbath commented Apr 13, 2017

 I like the sandbox idea! Somehow I am lost, is there anything that I should do next?
Member

### aloctavodia commented Apr 13, 2017

I have found another reference that could be useful to read. I just gave it a quick look, but it seems like an improved version of Del Moral's paper. https://arxiv.org/abs/1504.05753, http://ieeexplore.ieee.org/document/7339702/ BTW, I have been confused all this time thinking ATMCMC/SMC was a variant of Replica Exchange/Parallel Tempering, but they are really different algorithms!
Member

### twiecki commented Apr 13, 2017

 @hvasbath Seems like there are still failing tests, have you looked at those?
Contributor

### hvasbath commented Apr 13, 2017

They are all running. As @ColCarroll stated, somehow in Travis the tests take much longer than when run locally. So that's a problem in the Travis setup, which I cannot take care of. @aloctavodia thanks for the reference, I will check it out!
Contributor

### hvasbath commented Apr 13, 2017

Oh, that paper seems really cool. I always thought what a waste it is to keep only the last samples! They mainly improved on that. That's great! However, I won't be able to implement that soon, as I have to focus on writing my articles. But I will do it at some point, if no one else has taken care of it in the meantime ...
Member

### twiecki commented Apr 13, 2017 • edited

 I think this PR is close to being merged. I would add an experimental warning (we can always move it to a sandbox separately before 3.1). However, the timeouts in the unittest are a blocker, not sure how we can resolve that. Notably, it only happens in python 2.
Member

### ColCarroll commented Apr 13, 2017

Ah yeah, this was frustrating: it was suspiciously close to taking twice as long in python 2 as python 3. I'll take another look.
Member

### ColCarroll commented Apr 13, 2017

Super strange. I ran the file `run_test.py`:

```
import pytest

if __name__ == '__main__':
    pytest.main(['-vx', 'pymc3/tests/test_smc.py'])
```

Both `python run_test.py` and `python -m cProfile -o time.out run_test.py` take ~6 minutes on my machine, but `cprofilev run_test.py` takes ~80 seconds (cprofilev). Luckily, both give a profile output, and the big difference is:

normal run:

```
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 11577  300.409    0.026  300.409    0.026 {method 'acquire' of 'thread.lock' objects}
```

cprofilev run:

```
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   131    0.000    0.000    0.000    0.000 {method 'acquire' of 'thread.lock' objects}
```

I don't know what this means yet, but it accounts for all of the timing difference. Will check some more after work.
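
The kind of profile table quoted above can also be produced from inside Python with the standard `cProfile` and `pstats` modules. The sketch below profiles a hypothetical `busy_work` function and returns the rows sorted by `tottime`; the function and the `profile_report` helper are illustrative stand-ins, not part of the PR or its test suite.

```python
import cProfile
import io
import pstats


def busy_work(n=20000):
    """Hypothetical stand-in for the code under test."""
    return sum(i * i for i in range(n))


def profile_report(func, top=5):
    """Profile func() and return the `top` rows sorted by total time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # 'tottime' sorts by time spent in the function itself, which is how
    # a lock-acquire row with a large tottime would rise to the top.
    stats.sort_stats('tottime').print_stats(top)
    return buffer.getvalue()


report = profile_report(busy_work)
```

A file written with `python -m cProfile -o time.out run_test.py` can be inspected the same way by passing its path to `pstats.Stats('time.out')`.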
Contributor

### hvasbath commented Apr 13, 2017 • edited

Wow! Thanks a lot for investigating that! I thought we use no threading; as it is multiprocessing, it should be forking? If it were threading, I wouldn't be surprised. Is that the GIL it is waiting for, because it uses threading instead of forking? My version from before restructuring anything for Python 3 also has this short runtime ... Somehow joblib does something we don't want ;) ...
Member

### junpenglao commented Apr 16, 2017

Great work @hvasbath! Just wondering, what is the status on this? I have a model which doesn't work very well in NUTS (the geometry is not continuous), and using Metropolis returns broadcast error #1983. I am curious to try SMC on it.
Member

### ColCarroll commented Apr 16, 2017

Working on getting tests to pass- both he and I have an `smc` fork you could install and use. I hope to have this merged experimentally today...

Merged

Member

### ColCarroll commented Apr 17, 2017

 #2045 was merged with these changes -- thanks again @hvasbath !

Member

### twiecki commented Apr 17, 2017

Congrats @hvasbath!
Contributor

### hvasbath commented Apr 17, 2017

Great, finally done! Thanks for helping to finish including this in pymc3! Although it is kind of sad that this didn't increase my contribution stats and it all went to @ColCarrol. I know this is kind of nerdy, but I like these stats ;) .
Member

### twiecki commented Apr 18, 2017

Ah good catch, we need to amend that.
Member

### twiecki commented Apr 18, 2017

 @hvasbath OK, the cleanest solution was to revert and then re-commit as you: f4803ce Sorry about that.
Contributor

### hvasbath commented Apr 18, 2017

Wow, thanks @twiecki, I didn't know such a thing was possible. Cool!
Member

### twiecki commented Apr 18, 2017

 `git commit --amend --author='...'` ;)