ReactiveFlux preserve order of input sets #1005
… plots
* To preserve the order we use an ordered set implementation. Note that this also increases speed a bit.
* The first and the last state of F are always A and B, so we use that info in plotting.
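To illustrate why an order-preserving container matters here, the following is a minimal sketch in plain Python. It is not the implementation vendored in `pyemma/_ext/orderedset`; the helper name `ordered_difference` is hypothetical, and an insertion-ordered `dict` (Python 3.7+) stands in for an ordered set.

```python
def ordered_difference(states, excluded):
    """Return the states not in `excluded`, without duplicates,
    in the order in which they first appear in `states`.

    Hypothetical helper: a dict keeps insertion order, so it can act
    as a simple ordered set for this illustration.
    """
    excluded = set(excluded)  # fast membership test, order irrelevant here
    return list(dict.fromkeys(s for s in states if s not in excluded))


# The input order (8 before 6) survives; a builtin set would not guarantee that.
intermediates = ordered_difference([8, 6, 7, 6], excluded=[7])
print(intermediates)  # [8, 6]
```

A builtin `set` makes no promise about iteration order, which is one way the numbering of output sets can drift away from the input.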
Note that the input and the output sets can be different: if your
definition of A and B cuts through set boundaries, the sets must be split.
Thus, I don't know what "preserve order of input sets" means. In
general it is not possible to preserve the order, because not even the
sets themselves are preserved.
On 06/12/16 at 19:41, Martin K. Scherer wrote:
…
Fixes #1004
Commit Summary
* [reactive_flux] preserve order of intermediate sets and label A, B in plots
* [circle] use two omp threads
File Changes
* *M* circle.yml
<https://github.com/markovmodel/PyEMMA/pull/1005/files#diff-0> (1)
* *A* pyemma/_ext/orderedset/.gitignore
<https://github.com/markovmodel/PyEMMA/pull/1005/files#diff-1> (1)
* *A* pyemma/_ext/orderedset/LICENSE
<https://github.com/markovmodel/PyEMMA/pull/1005/files#diff-2> (84)
* *A* pyemma/_ext/orderedset/__init__.py
<https://github.com/markovmodel/PyEMMA/pull/1005/files#diff-3> (5)
* *A* pyemma/_ext/orderedset/_orderedset.pyx
<https://github.com/markovmodel/PyEMMA/pull/1005/files#diff-4> (512)
* *M* pyemma/msm/models/reactive_flux.py
<https://github.com/markovmodel/PyEMMA/pull/1005/files#diff-5> (2)
* *M* pyemma/plots/networks.py
<https://github.com/markovmodel/PyEMMA/pull/1005/files#diff-6> (9)
* *M* setup.py
<https://github.com/markovmodel/PyEMMA/pull/1005/files#diff-7> (7)
Patch Links:
* https://github.com/markovmodel/PyEMMA/pull/1005.patch
* https://github.com/markovmodel/PyEMMA/pull/1005.diff
--
----------------------------------------------
Prof. Dr. Frank Noe
Head of Computational Molecular Biology group
Freie Universitaet Berlin
Phone: (+49) (0)30 838 75354
Web: research.franknoe.de
Mail: Arnimallee 6, 14195 Berlin, Germany
----------------------------------------------
|
Current coverage is 89.37% (diff: 100%)
@@             devel   #1005   diff @@
==========================================
  Files          172     173     +1
  Lines        16589   16598     +9
  Methods          0       0
  Messages         0       0
  Branches         0       0
==========================================
+ Hits         14824   14834    +10
+ Misses        1765    1764     -1
  Partials         0       0
|
Thanks for addressing #1004! I understand that it's probably not possible to find a general solution for the order of output sets, since there are many potential use cases. However, I think the most important use case is using the PCCA sets for coarse-graining, either with A and B being macrostates or with A and B being microstates belonging to different PCCA states. (This is also what we show in the BPTI notebook.) In this case I think it helps to have a defined order of the output states which can easily be connected to the input sets, as well as having A and B clearly marked, since in the end the main interest is connecting the observed transition states to macrostates and their specific properties. |
If you have coarse-grained the flux to PCCA states and then use
microstates as A/B, you will have exactly the problem that I described.
For example, if we have the coarse-graining
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
and now define A=7 and B=4, we will get the sets:
[7] = A
[0, 1, 2]
[3, 5]
[6, 8]
[4] = B
So what's the right order here? And what does Martin's code do in this case?
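The example above can be reproduced with a small sketch. This is an illustration only, not PyEMMA's code: the helper name `split_sets` is hypothetical, and it assumes a single A set and a single B set, with A placed first and B last as in the listing above.

```python
def split_sets(user_sets, A, B):
    """Hypothetical helper: cut the A and B states out of every user set.

    Sets that become empty are dropped. A is placed first and B last,
    mirroring the listing in the comment above.
    """
    A, B = list(A), list(B)
    endpoints = set(A) | set(B)
    intermediates = []
    for s in user_sets:
        rest = [i for i in s if i not in endpoints]
        if rest:  # keep only non-empty remainders
            intermediates.append(rest)
    return [A] + intermediates + [B]


sets = split_sets([[0, 1, 2], [3, 4, 5], [6, 7, 8]], A=[7], B=[4])
print(sets)  # [[7], [0, 1, 2], [3, 5], [6, 8], [4]]
```

Note that the input had three sets and the output has five, so no ordering of the output can be "the same" as the input ordering.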
On 06/12/16 at 20:48, nsplattner wrote:
…
|
As far as I understand the order as indicated above was not preserved before. Also, states A and B were not marked. So what I expect the code to do (I have not tested it yet) is to mark A and B in the flux plot and preserve the order of states otherwise. This should make it easier to connect the transition states to the PCCA sets of the model. |
This makes sense, but A and B could consist of multiple sets if they had
been divided before. I think the easiest would be to mark the A and B
sets in an additional variable.
The part "preserve the order of states otherwise" is ill-defined, I
think: there is no general way of preserving order. But I think that is
also less important. The most important issue is that you can recover A
and B.
Note that if you have defined your sets a priori such that A and B are
clear-cut (i.e. they are either sets or assemblies of sets that contain
no other states than A/B states), the sets should be unchanged by TPT.
Then I think it also makes sense to keep the order (but that's the
nicest case). I'm not sure what the current code does. Probably it
places A at the beginning and B at the end.
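One way to picture the "additional variable" suggested above is a mapping from a label to the indices of the output sets that carry it. The sketch below is an assumption for illustration; the function name `label_sets` and the dictionary layout are hypothetical and are not part of the ReactiveFlux API.

```python
def label_sets(sets, A, B):
    """Hypothetical helper: record which output sets belong to A, B,
    or the intermediates (I), independent of where they sit in the list.

    A set counts as an A (or B) set if all of its states lie in A (or B).
    """
    A, B = set(A), set(B)
    labels = {"A": [], "B": [], "I": []}
    for idx, s in enumerate(sets):
        members = set(s)
        if members <= A:
            labels["A"].append(idx)
        elif members <= B:
            labels["B"].append(idx)
        else:
            labels["I"].append(idx)
    return labels


# A was divided into two sets, B is a single set.
out_sets = [[0], [3], [1, 2], [4, 5], [6, 7], [8]]
print(label_sets(out_sets, A=[0, 3], B=[8]))
# {'A': [0, 1], 'B': [5], 'I': [2, 3, 4]}
```

With such a variable, A and B can be recovered even when each consists of several sets and regardless of the ordering convention.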
On 06/12/16 at 21:03, nsplattner wrote:
…
Please correct me if this is wrong and comment if you think there is a
better way of solving this problem.
|
That's exactly what the current code does (placing A at the beginning and B at the end). This is fine if A and B are clearly marked, as in the code submitted here, but it's confusing in the current release when your input is a defined set of PCCA states and you get the same number of states as output, but with a different numbering compared to your input. I agree that preserving the order of states is not meaningful when states are divided. However, I think in these cases users should be aware that they are dealing with a new state definition/ordering anyway. |
OK, so how about the following:
- if A/B are chosen well, i.e. no states need to be split up, leave
order as is.
- if states are being split up, raise a Warning, notifying the user that
the state definition has changed and care needs to be taken when
indexing states.
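The two rules proposed above could look roughly like the following sketch. It is written under stated assumptions and is not the code merged in this PR: the function name `coarse_grain_sets` and the warning text are hypothetical, and split pieces are simply left at the position of the set they came from.

```python
import warnings


def coarse_grain_sets(user_sets, A, B):
    """Hypothetical helper: keep the input order if no set is cut by A or B,
    otherwise split the affected sets and warn about the new definition."""
    A, B = set(A), set(B)
    result, changed = [], False
    for s in user_sets:
        parts = [
            [i for i in s if i in A],                     # part inside A
            [i for i in s if i not in A and i not in B],  # intermediate part
            [i for i in s if i in B],                     # part inside B
        ]
        parts = [p for p in parts if p]
        if len(parts) > 1:
            changed = True  # this set was cut by A and/or B
        result.extend(parts)
    if changed:
        warnings.warn(
            "Input sets were split by A/B; the state definition has changed, "
            "take care when indexing states.",
            UserWarning,
        )
    return result


# Clear-cut A and B: the sets and their order are returned unchanged.
print(coarse_grain_sets([[0, 1, 2], [3, 4, 5], [6, 7, 8]], A=[0, 1, 2], B=[6, 7, 8]))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Calling the same function with `A=[7]` and `B=[4]` splits two of the three sets and emits the warning.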
On 06/12/16 at 21:51, nsplattner wrote:
…
|
I think that would be a good solution! |
ok, @marscher can you do that?
On 06/12/16 at 22:02, nsplattner wrote:
…
|
The changes in #1005 reflect exactly what has been discussed here:
1. The order is preserved if A and B are already disjoint from the sets
to coarse-grain to.
2. If the set definition changes, the user will get a warning that it
had to be changed.
3. Plotting the flux will label A and B.
|
OK, can you provide an additional class variable which indicates where A
and B are (not just in the visualization)?
On 07/12/16 at 14:56, Martin K. Scherer wrote:
…
|
On 12/07/2016 03:47 PM, Frank Noe wrote:
> ok, can you provide an additional class variable which indicates where A
> and B are (not just the visualization)?
Do you mean to provide these within the NetworkPlot class?
|
ReactiveFlux.
We are talking about the ReactiveFlux, which does TPT/coarse-graining, no?
On 07/12/16 at 15:50, Martin K. Scherer wrote:
…
|
On 12/07/2016 03:53 PM, Frank Noe wrote:
> ReactiveFlux.
> We are talking about the ReactiveFlux, which does TPT/coarse-graining, no?
I just asked because A, B and I (intermediates) are already properties of this class.
|
The question is whether this helps to identify how states were split up,
if they are split up. I don't remember right now, but I think A/B are the
input A/B, i.e. microstate sets.
On 07/12/16 at 15:55, Martin K. Scherer wrote:
…
|
The coarse_grain method returns a new ReactiveFlux instance, which will have the new A and B sets as properties.
|
Yeah, but does that show in a simple way how macrostates were split
up? If that's the same functionality as before, I guess it doesn't solve
Nuria's assignment problem. Nuria, please comment.
On 07/12/16 at 16:12, Martin K. Scherer wrote:
…
|
The main problem I have with the current release is that the numbering is confusing in case the macrostates are not split. This is solved by the changes above. For the case where macrostates are split, I'm not sure what information should be provided. On the one hand, this could be treated as a new macrostate definition. In this case the user would just need the indices of the microstates in each new macrostate. However, if the new macrostates can clearly be related to the input macrostates, then it would be good to have this information. But I'm not sure if this can be done in a generally applicable way. |
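For the case where macrostates are split, one possible way to relate the new sets to the input macrostates is sketched below. This is only an illustration under an assumption: every microstate belongs to exactly one input set. The function name `parent_macrostates` is hypothetical and not part of PyEMMA.

```python
def parent_macrostates(new_sets, input_sets):
    """Hypothetical helper: for each new set, list the indices of the
    input sets that its microstates came from.

    Assumes every microstate appears in exactly one input set.
    """
    parent_of = {}
    for k, s in enumerate(input_sets):
        for state in s:
            parent_of[state] = k
    return [sorted({parent_of[state] for state in s}) for s in new_sets]


input_sets = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
new_sets = [[7], [0, 1, 2], [3, 5], [6, 8], [4]]
print(parent_macrostates(new_sets, input_sets))
# [[2], [0], [1], [2], [1]]
```

In this example every new set has a single parent, so the relation to the input macrostates is unambiguous; a new set with several parents would show up as a list with more than one index.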
OK, then merge and try if this helps.
On 07/12/16 at 16:26, nsplattner wrote:
…
|