This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

[ITS] fix lag setter and avoid recomputation #1030

Merged

Conversation

marscher
Member

  • added a setter for lags
  • If the lags array is extended or a lag time is removed, we compute only those models which are needed. Upon removal, the computed arrays are truncated correctly. If the input data is changed, all models are discarded.
  • Fixed a bug in estimate_param_scan (estimators have to be cloned if they are models; otherwise everything is overwritten).

These changes make it possible to avoid heavy re-estimation when only one lag time is added. @gph82, I guess this is what you've desired. Please go ahead and test.
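The behavior described in the bullet points above can be sketched roughly as follows. This is a minimal illustration of the idea, not PyEMMA's actual implementation; `LaggedEstimator` and `estimate_model` are hypothetical names.

```python
# Minimal sketch of the lag-setter idea: estimate only lags that are new,
# drop models for lags that were removed. Illustrative only -- names do
# not correspond to PyEMMA internals.

calls = []  # records which lags were actually (re-)estimated


def estimate_model(lag):
    calls.append(lag)          # stand-in for an expensive estimation
    return {"lag": lag}


class LaggedEstimator:
    def __init__(self):
        self._lags = []
        self._models = {}      # lag -> estimated model

    @property
    def lags(self):
        return self._lags

    @lags.setter
    def lags(self, new_lags):
        new_lags = sorted(set(new_lags))
        for lag in new_lags:               # estimate only missing lags
            if lag not in self._models:
                self._models[lag] = estimate_model(lag)
        for lag in list(self._models):     # truncate removed lags
            if lag not in new_lags:
                del self._models[lag]
        self._lags = new_lags
```

With this scheme, extending the lags triggers estimation only for the new values, and removing a lag discards just its model.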

@gph82
Contributor

gph82 commented Jan 27, 2017

Thanks, will do

@marscher
Member Author

The tests passed locally. Don't know what's going on there ... will check on Monday.

The values are indeed wrong if the estimator is not cloned, because in the case of MLMSM, the RDL decomposition is cached between estimations. This seems to be a more serious bug.
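The cloning issue can be illustrated with a toy estimator that caches an internal result, loosely mimicking the cached RDL decomposition. This is a hypothetical sketch, not the MLMSM code.

```python
import copy


class CachingEstimator:
    """Toy estimator that caches an 'expensive' intermediate result.
    Loosely mimics the cached-decomposition problem described above."""

    def __init__(self):
        self._cache = None

    def estimate(self, data):
        if self._cache is None:   # stale cache survives across calls!
            self._cache = sum(data)
        return self._cache


datasets = [[1, 2, 3], [10, 20]]

# Re-using one instance: the stale cache leaks into the second result.
shared_est = CachingEstimator()
shared = [shared_est.estimate(d) for d in datasets]   # [6, 6] -- wrong

# Cloning an unfitted prototype per run keeps results independent.
proto = CachingEstimator()
cloned = [copy.deepcopy(proto).estimate(d) for d in datasets]  # [6, 30]
```

Cloning an unfitted prototype before each estimation, as in estimate_param_scan after the fix, prevents cached state from leaking between parameter settings.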
@codecov-io

codecov-io commented Jan 30, 2017

Codecov Report

Merging #1030 into devel will decrease coverage by 0.02%.

@@            Coverage Diff            @@
##           devel    #1030      +/-   ##
=========================================
- Coverage   84.5%   84.48%   -0.02%     
=========================================
  Files        189      189              
  Lines      18813    18913     +100     
=========================================
+ Hits       15898    15979      +81     
- Misses      2915     2934      +19
Impacted Files                                   Coverage Δ
pyemma/util/numeric.py                           66.66% <ø> (+50%)
pyemma/coordinates/clustering/interface.py       88.78% <100%> (-0.82%)
pyemma/coordinates/data/featurization/util.py    87.17% <100%> (-1.96%)
pyemma/msm/tests/test_its.py                     99.31% <100%> (+0.21%)
pyemma/_base/estimator.py                        66.4% <100%> (-0.8%)
pyemma/_ext/sklearn/base.py                      38.93% <50%> (+0.53%)
pyemma/msm/estimators/implied_timescales.py      71.64% <78.26%> (+6.4%)
pyemma/_base/parallel.py                         77.35% <80.95%> (-7.02%)
pyemma/msm/tests/test_its_oom.py                 97.93% <85.71%> (-0.97%)
... and 3 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4aa5201...497b889. Read the comment docs.

@gph82
Contributor

gph82 commented Jan 31, 2017

Thanks @marscher.

I think the intended usage is something like in the attached notebook, and it seems to do what it's supposed to. Also, I took a look at the test test_insert_lag_time; everything looks sane.

What would you say about adding a method like "add_lagtimes" that does both the adding and the estimating?
[attached screenshot: snap8]

@marscher
Member Author

Hi, thanks for the feedback. If we ensure the lags property is a list, we could also write something like this:

```python
its.lags.append(x)
its.estimate(dtrajs)
```

I'm not really sure if we want another method for this just to save one line of code.

@marscher
Member Author

Or its.lags.extend([1, 2, 3]), but you get the idea.

@gph82
Contributor

gph82 commented Jan 31, 2017

I like lags.extend more, because appending to the list does not sort them. But isn't extend a new method, still?
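For what it's worth, both `append` and `extend` are standard methods of Python's built-in `list` (so nothing new would be needed), and neither of them sorts:

```python
lags = [1, 5, 10]
lags.append(3)          # list is now [1, 5, 10, 3] -- append does not sort
lags.extend([2, 20])    # [1, 5, 10, 3, 2, 20] -- extend does not sort either
lags.sort()             # [1, 2, 3, 5, 10, 20] -- sorting is a separate step
```

So a setter that normalizes the lags (sorts and de-duplicates) on assignment would sidestep the ordering question entirely.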

@franknoe
Contributor

franknoe commented Jan 31, 2017 via email

@marscher
Member Author

marscher commented Jan 31, 2017

I've read the code of sklearn; they do not support partial estimation or avoiding re-estimation of the same thing in the GridSearch-based classes. If you want to change your hyper-parameters, you have to call fit again. Also, the estimators need to implement the scoring interface in order to select the best model to delegate the public predict/transform methods to. So this is really different from ITS, because GridSearch does not maintain the list of all estimators, but only the "best" one.

So we can think of having a general solution for this (e.g., take the parameters of the already estimated estimators, compute the set difference with what has been requested, and only estimate the missing ones).
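That set-difference idea could look roughly like this. It is an illustrative sketch with made-up names; parameter dicts are frozen so they are hashable.

```python
# Sketch: given a requested parameter grid and the parameter sets that
# already have estimated models, return only the ones still missing.

def params_to_estimate(requested, already_done):
    done = {frozenset(p.items()) for p in already_done}
    return [p for p in requested if frozenset(p.items()) not in done]


grid = [{"lag": l} for l in (1, 2, 5, 10)]
done = [{"lag": 1}, {"lag": 5}]
todo = params_to_estimate(grid, done)   # [{"lag": 2}, {"lag": 10}]
```

A general ParameterSearch estimator would then only run the estimators for `todo` and merge the results with the existing models.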

@franknoe
Contributor

franknoe commented Jan 31, 2017 via email

@marscher
Member Author

marscher commented Jan 31, 2017 via email

@franknoe
Contributor

franknoe commented Jan 31, 2017 via email

@gph82
Contributor

gph82 commented Jan 31, 2017

BTW, I was talking to @marscher about this; it's really not that important, just a shorthand that is more intuitive. In the end we're talking about two lines of code.

@marscher
Member Author

marscher commented Feb 2, 2017

I'm currently working on the concept of an abstract ParameterSearch estimator. In the meantime: can we merge this, since it also contains other useful fixes, and let ITS use the ParameterSearch class later on?

@franknoe
Contributor

franknoe commented Feb 2, 2017

Since this doesn't seem to change the existing API but only extend it, yes.

@franknoe franknoe merged commit 56a100b into markovmodel:devel Feb 2, 2017
@marscher marscher deleted the its_fix_lag_setter_avoid_recomputation branch February 2, 2017 13:43
@marscher
Member Author

marscher commented Feb 2, 2017

Thanks.
