Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strict_can_append #473

Merged
merged 14 commits into from
May 4, 2016
Merged

Conversation

dwhswenson
Copy link
Member

@dwhswenson dwhswenson commented Apr 26, 2016

Resolves #472. See that issue for a detailed description of the problem. In discussion below, I use the terminology for the can_append analog; the modifications for can_prepend are what you would expect.

Since the problem is that can_append accepts trajectories that are any subtrajectory, not just the start of a trajectory in the ensemble, the solution is to have new functions that require that the given trajectory actually be the start of a trajectory in the ensemble.

Several name options I’ve considered for these functions:

  • strict_can_append / strict_can_prepend
  • can_prefix / can_suffix
  • can_be_prefix / can_be_suffix

Currently, I’m using strict_*, because that seemed most natural to me. But if there’s a preference for a different name, that can be changed.

For many of the “atomic” ensembles we use to assemble more complicated ensembles (e.g., VolumeEnsembles, LengthEnsembles) this is exactly the same as can_append. For most other ensembles (Wrapped, Combination, etc), it behaves just like can_append, but with strict_can_append in place of can_append. However, it does require a separate function for the SequentialEnsemble. This new function will disallow the possibility of starting from anything other than the first subensemble of the sequence.

  • Implement strict_can_append/strict_can_prepend for ensembles other than sequential
  • Tests for strict_can_* in ensembles other than sequential
  • Implement SequentialEnsemble.strict_can_*
  • Tests for SequentialEnsemble.strict_can_*
  • Check speed improvement in analysis
  • Check coverage in ensemble.py

@dwhswenson dwhswenson self-assigned this Apr 26, 2016
@dwhswenson dwhswenson added this to the 1.0 milestone Apr 26, 2016
There was an error in the so-called "minus" ensemble used there, so the
results it would give did not match what we'd expect from the real minus
ensemble for can_prepend. The real minus ensemble is thoroughly tested in
another test class, so that's what we should be using. Namespace clashes
(two tests with same name) were causing us to not see that one of the tests
was wrong.

This also includes some starts on the proper strict_can_* for SeqEns.
Still need similar tests for some of the other test classes that build
atop SeqEns.
@dwhswenson
Copy link
Member Author

Going to do further checks, but just looking at a simple test, this change means that instead find_valid_slices calling its ensemble's can_append 5402 times, it calls the strict_can_append 299 times. Later in the call graph, this translates to AllInXEnsemble.can_append being called 631 times instead of 31838.

In other words, massive speed improvements for analysis.

@dwhswenson
Copy link
Member Author

Tested on the analysis notebook where I’d noticed that this was going very badly. Previous timings:

CPU times: user 8h 55min 59s, sys: 6min 55s, total: 9h 2min 54s
Wall time: 8h 58min 50s

New timings:

CPU times: user 1min 55s, sys: 876 ms, total: 1min 56s
Wall time: 1min 56s

So yeah, that’s a little bit faster. Worth the coding effort.

Note that these speed-ups are really only relevant when doing analysis using ensemble.split. It won't have an effect in the generation of trajectories, because the sequential ensemble caching mechanism should have had the same effect already.

Also added fname for debug function info in _generic_short_circuit
There were some buggy parts of the SeqEns caching. This is also worth
checking later that we're not taking a speed hit. But at least the answers
are right if you do them twice in a row, unlike before.
@jhprinz
Copy link
Contributor

jhprinz commented May 4, 2016

Phantastic. This seems really to be a huge improvement. I realized in the alanine example that this step was surprisingly slow (not as slow that it would have been a problem though) and this was only a single call to split.

@dwhswenson dwhswenson changed the title [WIP] strict_can_append strict_can_append May 4, 2016
@dwhswenson
Copy link
Member Author

Now that it passes tests (and even increases coverage!) I'll call this ready for review.

How big of an improvement really depends on how many frames (and how much the trajectory is like the worst-case), since this makes an algorithm that scales as $N^2$ worst-case into one that scales as $N$. The example I showed had trajectories of something like 10000 frames, so that makes a huge difference!

This also includes some cleanup stuff, and notes for further cleanup, in ensemble.py. We’ll need to complete the cleanup before release.

I'm still having frequent problems passing alanine.ipynb due to timing out. Specifically, cell 18 (generating the first trajectory) often takes a very long time. I tried in on my (5-year-old) laptop, and it ran quickly with no problems. When I checked the call graph from gprof2dot, I didn't see anything that looked particularly slow, so my guess is that the problem might be just that the processor resources on Travis are pretty weak. (I don't think my old laptop had GPU acceleration, but that might also account for it.)

@dwhswenson dwhswenson assigned jhprinz and unassigned dwhswenson May 4, 2016
@jhprinz
Copy link
Contributor

jhprinz commented May 4, 2016

Seems to have been more work than I thought. Very good. Merging...

@jhprinz
Copy link
Contributor

jhprinz commented May 4, 2016

Yes, alanine.ipynb not sure. it always runs on my machine without problems. We could just make it much smaller or use smaller interfaces. This examples might need an overhaul anyway.

@jhprinz jhprinz merged commit 70cf687 into openpathsampling:master May 4, 2016
@dwhswenson dwhswenson deleted the ensemble_speed branch January 12, 2017 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Faster analysis: new can_append-like functions
2 participants