strict_can_append #473

dwhswenson · 2016-04-26T11:50:37Z

Resolves #472. See that issue for a detailed description of the problem. In the discussion below, I use the terminology for the can_append case; the modifications for can_prepend are what you would expect.

The problem is that can_append accepts any subtrajectory of a trajectory in the ensemble, not just the start of one. The solution is to add new functions that require the given trajectory to actually be the start of a trajectory in the ensemble.

Several name options I’ve considered for these functions:

- strict_can_append / strict_can_prepend
- can_prefix / can_suffix
- can_be_prefix / can_be_suffix

Currently, I’m using strict_*, because that seemed most natural to me. But if there’s a preference for a different name, that can be changed.

For many of the “atomic” ensembles we use to assemble more complicated ensembles (e.g., VolumeEnsembles, LengthEnsembles) this is exactly the same as can_append. For most other ensembles (Wrapped, Combination, etc), it behaves just like can_append, but with strict_can_append in place of can_append. However, it does require a separate function for the SequentialEnsemble. This new function will disallow the possibility of starting from anything other than the first subensemble of the sequence.
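To make the distinction concrete, here is a minimal sketch under simplifying assumptions. It does not use the OpenPathSampling API: a trajectory is a string of single-character state labels, a sequential ensemble is an ordered string of distinct stage labels (each visited for one or more frames), and the names `toy_can_append`, `toy_strict_can_append`, and `_fits_from` are hypothetical.

```python
def _fits_from(labels, stages, start):
    """True if `labels` can be consumed by stages[start:], in order.

    The first frame must belong to stages[start]; the trajectory may stop
    before reaching the last stage (it could still be extended later).
    """
    if not labels:
        return True
    if labels[0] != stages[start]:
        return False
    i = start
    for label in labels:
        if label == stages[i]:
            continue  # still in the current stage
        if i + 1 < len(stages) and label == stages[i + 1]:
            i += 1    # moved on to the next stage
        else:
            return False
    return True


def toy_strict_can_append(labels, stages):
    # Strict version: the trajectory must start in the FIRST stage.
    return _fits_from(labels, stages, 0)


def toy_can_append(labels, stages):
    # Non-strict version: the trajectory may start in ANY stage, so any
    # subtrajectory of an ensemble member is accepted.
    return any(_fits_from(labels, stages, s) for s in range(len(stages)))


stages = "AXB"  # in A, then in X, then in B
print(toy_strict_can_append("AAX", stages))  # True
print(toy_can_append("XXB", stages))         # True
print(toy_strict_can_append("XXB", stages))  # False
```

In this toy model, "XXB" could be the middle of a valid trajectory, so the non-strict check accepts it, while the strict check rejects it because it does not start in the first stage.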

- Implement strict_can_append/strict_can_prepend for ensembles other than sequential
- Tests for strict_can_* in ensembles other than sequential
- Implement SequentialEnsemble.strict_can_*
- Tests for SequentialEnsemble.strict_can_*
- Check speed improvement in analysis
- Check coverage in ensemble.py

Name might change, but I can start adding the code.

There was an error in the so-called "minus" ensemble used there, so the results it would give did not match what we'd expect from the real minus ensemble for can_prepend. The real minus ensemble is thoroughly tested in another test class, so that's what we should be using. Namespace clashes (two tests with same name) were causing us to not see that one of the tests was wrong. This also includes some starts on the proper strict_can_* for SeqEns.
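One way a name clash can hide a broken test in Python is shown in the sketch below. The class and method names are hypothetical and are not taken from the test suite: when two methods in the same class share a name, the second definition silently replaces the first, so the test runner only ever sees one of them.

```python
class TestEnsembleSketch:
    def test_can_prepend(self):
        return "first definition"

    def test_can_prepend(self):  # same name: silently replaces the one above
        return "second definition"


# Only one attribute with this name survives in the class namespace.
test_names = [name for name in vars(TestEnsembleSketch)
              if name.startswith("test_")]
result = TestEnsembleSketch().test_can_prepend()

print(test_names)  # ['test_can_prepend']
print(result)      # second definition
```

The same thing happens with two test classes of the same name in one module. Giving each test a distinct name is the simplest way to make sure both are collected and run.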

Still need similar tests for some of the other test classes that build atop SeqEns.

dwhswenson · 2016-05-02T15:18:59Z

Going to do further checks, but just looking at a simple test, this change means that instead of find_valid_slices calling its ensemble's can_append 5402 times, it calls strict_can_append 299 times. Later in the call graph, this translates to AllInXEnsemble.can_append being called 631 times instead of 31838.

In other words, massive speed improvements for analysis.

dwhswenson · 2016-05-02T15:43:13Z

Tested on the analysis notebook where I’d noticed that this was going very badly. Previous timings:

CPU times: user 8h 55min 59s, sys: 6min 55s, total: 9h 2min 54s
Wall time: 8h 58min 50s

New timings:

CPU times: user 1min 55s, sys: 876 ms, total: 1min 56s
Wall time: 1min 56s

So yeah, that’s a little bit faster. Worth the coding effort.
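As a quick piece of arithmetic on the numbers quoted in this thread, the sketch below converts the wall times to seconds and computes the ratios. The helper `to_seconds` is just an illustration.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/min/s duration to seconds."""
    return 3600 * hours + 60 * minutes + seconds


old_wall = to_seconds(hours=8, minutes=58, seconds=50)  # 32330 s
new_wall = to_seconds(minutes=1, seconds=56)            # 116 s
wall_speedup = old_wall / new_wall

# Call counts from the simple test mentioned earlier in the thread.
find_valid_slices_ratio = 5402 / 299
all_in_x_ratio = 31838 / 631

print(f"wall-clock speed-up: about {wall_speedup:.0f}x")                 # about 279x
print(f"find_valid_slices checks: {find_valid_slices_ratio:.1f}x fewer")  # 18.1x fewer
print(f"AllInXEnsemble.can_append: {all_in_x_ratio:.1f}x fewer")          # 50.5x fewer
```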

Note that these speed-ups are really only relevant when doing analysis using ensemble.split. It won't have an effect in the generation of trajectories, because the sequential ensemble caching mechanism should have had the same effect already.

Also added fname for debug function info in _generic_short_circuit

Still need strict_can_*

There were some buggy parts of the SeqEns caching. It's also worth checking later that we're not taking a speed hit. But at least the answers are right if you do them twice in a row, unlike before.

jhprinz · 2016-05-04T09:45:18Z

Fantastic. This really seems to be a huge improvement. I noticed in the alanine example that this step was surprisingly slow (though not so slow that it would have been a problem), and that was with only a single call to split.

dwhswenson · 2016-05-04T12:22:09Z

Now that it passes tests (and even increases coverage!) I'll call this ready for review.

How big the improvement is really depends on the number of frames (and on how close the trajectory is to the worst case), since this turns an algorithm that scales as $N^2$ in the worst case into one that scales as $N$. The example I showed had trajectories of something like 10000 frames, so that makes a huge difference!

This also includes some cleanup stuff, and notes for further cleanup, in ensemble.py. We’ll need to complete the cleanup before release.

I'm still having frequent problems passing alanine.ipynb due to timing out. Specifically, cell 18 (generating the first trajectory) often takes a very long time. I tried it on my (5-year-old) laptop, and it ran quickly with no problems. When I checked the call graph from gprof2dot, I didn't see anything that looked particularly slow, so my guess is that the problem might be just that the processor resources on Travis are pretty weak. (I don't think my old laptop had GPU acceleration, but that might also account for it.)

jhprinz · 2016-05-04T15:12:59Z

Seems to have been more work than I thought. Very good. Merging...

jhprinz · 2016-05-04T15:14:34Z

Yes, I'm not sure about alanine.ipynb. It always runs on my machine without problems. We could just make it much smaller or use smaller interfaces. This example might need an overhaul anyway.

dwhswenson added 3 commits April 25, 2016 16:51

Start to adding strict_can_append

a54a4ba

Merge remote-tracking branch 'upstream/master' into ensemble_speed

b5c0002

The last of adding strict_can_append? Needs tests

c422ad6

dwhswenson self-assigned this Apr 26, 2016

dwhswenson added this to the 1.0 milestone Apr 26, 2016

dwhswenson added 7 commits April 30, 2016 23:28

Tests for (strict_)can* in PartOutX and InX

30567fe

can_append/strict_can_app on remaining VolumeEns

76d53cf

Drafts of SeqEns.strict_can_* (generalized orig)

3918e0b

Full tests for SeqEns.strict_can_*

993fccc

testOptionalEns.test_strict_can_*

556ce35

testMinusInterfaceEns.test_strict_can_*

703aa34

dwhswenson added 4 commits May 3, 2016 13:06

Generalized short circuit; cleanup of ensembles

0cbca8b

Tests for Prefix/Suffix strict_can_*

885ec43

EnsCombo of SeqEns tests for call, can_*.

311ce19

Working tests for ComboSeqEns.strict_can_*

3656cd8

dwhswenson changed the title ~~[WIP] strict_can_append~~ strict_can_append May 4, 2016

dwhswenson assigned jhprinz and unassigned dwhswenson May 4, 2016

jhprinz merged commit 70cf687 into openpathsampling:master May 4, 2016

dwhswenson deleted the ensemble_speed branch January 12, 2017 14:56
