Serialization add version interpolation #849
…del/PyEMMA into models_estimators_serializeable
That's a good idea. But this needs to be updated manually after a rename, right? I think we should clean up class attributes and their names before releasing this feature, so that attribute names are stable by the time it ships.
On 30.06.2016 at 10:53, Frank Noe wrote:
I agree that we should tidy things up beforehand to minimize the need to use this feature. Do you have any suggestions as to which modules need attention?
On 30/06/16 at 12:20, Martin K. Scherer wrote:
Generally, it may be useful to order variables into principal variables
Prof. Dr. Frank Noe Phone: (+49) (0)30 838 75354 Mail: Arnimallee 6, 14195 Berlin, Germany |
I agree that this would be easier to implement if we stuck to the sklearn pattern of storing everything as plain attributes. However, our classes are much more complex than a simple sklearn estimator, and we rely on some logic that requires private attributes. If we want a fully operational restored object, i.e. one that behaves exactly as it did before it was saved, I think we have to store these internal parameters too. For TRAM I'd suggest omitting very large arrays. The data format is at least space efficient, since it uses NumPy's npz format for arrays, but I'd guess these arrays are still too large to store. Maybe in this case we should instead store the code of a function that generates/restores the user data (which might be error prone)? I agree that we should discuss this in detail in a meeting.
I generally see two use cases:
On 30/06/16 at 14:34, Martin K. Scherer wrote:
I think this choice can be made with a simple parameter in Estimator.save, e.g. save_estimation_data=False. Then we introduce another field in the Estimator to declare which fields are "estimation data". Since most models are currently encapsulated within the Estimator (e.g. the plain model is not trivially usable), I do not see the need to distinguish between them. If we want to store the model separately, the user can simply invoke:
    tica = pyemma.coordinates.tica(data, ...)
    tica.model.save("tica_model")
However, this is more difficult to handle, since I'm not sure whether just setting the model in the Estimator will fully restore the desired state.
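To make the save_estimation_data proposal concrete, here is a minimal sketch of how such a flag could work. It is not PyEMMA's implementation: the class SketchEstimator, the field list _estimation_data_fields, the attribute names, and the use of plain JSON are all assumptions made for illustration.

```python
import json


class SketchEstimator:
    """Hypothetical stand-in for an estimator; not PyEMMA's actual class."""

    # Assumed convention: names of attributes that hold bulky estimation data.
    _estimation_data_fields = ("_covariances",)

    def __init__(self, lag=1):
        self.lag = lag
        self.eigenvalues = None
        self._covariances = None

    def estimate(self, data):
        # Placeholder "estimation": keep one large and one small result.
        self._covariances = [list(row) for row in data]
        self.eigenvalues = [max(row) for row in data]
        return self

    def save(self, file_name, save_estimation_data=False):
        # Copy the instance attributes, then drop the declared bulky fields
        # unless the caller explicitly asks to keep them.
        state = dict(vars(self))
        if not save_estimation_data:
            for name in self._estimation_data_fields:
                state.pop(name, None)
        with open(file_name, "w") as fh:
            json.dump(state, fh)

    @classmethod
    def load(cls, file_name):
        with open(file_name) as fh:
            state = json.load(fh)
        obj = cls.__new__(cls)
        # Fields that were omitted at save time come back as None.
        for name in cls._estimation_data_fields:
            setattr(obj, name, None)
        obj.__dict__.update(state)
        return obj
```

With this layout, SketchEstimator.load returns an object whose parameters and small results are restored, while the declared estimation data is None unless save was called with save_estimation_data=True. Whether an object restored this way is fully operational depends on which private fields the class logic needs, which is the open question raised above.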
To facilitate loading older class versions, which may have new, renamed, or deleted fields, we define an interpolation mapping like this:
The keys of the dict indicate the version number for which each mapping will be applied. So for version 2 of the class, mapping 1 will be applied.
Currently these operations work:
This way we can track (all?) changes made to classes via versioning. To track class rename refactorings, we need to provide a renaming dictionary in the load function to translate the class name to its actual location. This just requires a simple rename in the JSON string.
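The following sketch illustrates the general idea of a version-keyed field mapping combined with a class-rename dictionary. It does not reproduce the format used in this pull request: the operation names ('mv', 'rm', 'set'), the JSON layout with class, version and state keys, the package paths, and the reading that key v upgrades a state saved by version v to version v + 1 are all assumptions.

```python
import json

# Assumption: key v holds the steps that upgrade a state saved by class
# version v to version v + 1. Operation names are hypothetical.
INTERPOLATION_MAP = {
    1: [("mv", "old_name", "new_name")],            # rename a field
    2: [("rm", "obsolete"), ("set", "added", 0)],   # delete one, add one
}

# Assumption: old fully qualified class name -> current location.
CLASS_RENAMES = {"pkg.old_module.Model": "pkg.new_module.Model"}


def upgrade_state(state, saved_version, current_version, mapping):
    """Return a copy of `state` upgraded step by step to `current_version`."""
    state = dict(state)
    for version in range(saved_version, current_version):
        for op, *args in mapping.get(version, []):
            if op == "mv":
                old, new = args
                if old in state:
                    state[new] = state.pop(old)
            elif op == "rm":
                state.pop(args[0], None)
            elif op == "set":
                name, default = args
                state.setdefault(name, default)
            else:
                raise ValueError(f"unknown operation: {op!r}")
    return state


def load_state(json_string, current_version, mapping, renames):
    """Parse a saved document, translate the class name, upgrade the fields."""
    doc = json.loads(json_string)
    doc["class"] = renames.get(doc["class"], doc["class"])
    doc["state"] = upgrade_state(
        doc["state"], doc["version"], current_version, mapping
    )
    doc["version"] = current_version
    return doc
```

Applying the mappings one version at a time means each class change only has to describe the difference from its direct predecessor, so a file saved by version 1 can still be loaded by version 3 without a dedicated 1-to-3 mapping.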