New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON model import and export #469
Conversation
Add JSON model import and export capabilities. JSON is a text-serializable format, which is good for storing models within other objects, or for purposes like HTTP APIs.
Replaces pickling the model
Codecov Report
@@ Coverage Diff @@
## master #469 +/- ##
=========================================
Coverage ? 80.08%
=========================================
Files ? 101
Lines ? 10343
Branches ? 0
=========================================
Hits ? 8283
Misses ? 2060
Partials ? 0
Continue to review full report at Codecov.
|
This looks promising! One of the issues though, is that often, one of the main rationales behind pickling a PySB model is to save the result of having run bng.generate_equations on the model (which for certain models can be quite slow). The model can then be quickly reloaded and simulated without having to regenerate the reaction network. This is not immediately possible with the JSON approach, though perhaps could be optionally handled by encoding the equations in JSON as well. |
Applies to both JSON export and import. Useful for caching rxn net to avoid network re-generation.
@bgyori No problem, I've added an option for this: |
# restore the 'model' weakrefs on all components | ||
self.__dict__.update(state) | ||
# Set "tags" attribute for older, pickled models | ||
self.__dict__.setdefault('tags', ComponentSet()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the only thing we would have to patch to support all potential old pickles that might be out there in saved SimulationResult hdf5 files, or are there other attributes we've added since? We should either fix everything or fix nothing, but I fear fixing everything might get out of hand.
This won't address the use case that brought up #467 anyway because they are doing explicit pickling/unpickling not saving SimulationResults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only change required to unpickle models or load HDF5 files, at least as far back as PySB v1.6.0 where the HDF5 format was introduced.
For the standalone pickle cases, the following snippet works. Let me know if you think I should remove the underscore prefix and/or document this somehwere explicitly.
from pysb.simulator.base import _patch_model_setstate
with open('earm_1_0_pysb1.6.pkl', 'rb') as f:
with _patch_model_setstate():
pickle.load(f)
pysb/tests/test_exporters.py
Outdated
# Check network generation and single-step integrator | ||
if model.name not in ('pysb.examples.tutorial_b', | ||
'pysb.examples.tutorial_c'): | ||
ScipyOdeSimulator(m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just initializes the integrator. Based on the comment on line 112, did you mean to actually run it a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The integrator RHS is run for t=0
during initialization, at least when using Cython (line 219 of scipyode.py). But I'll change this to enforce Cython, using compiler='cython'
kwarg.
pysb/export/json.py
Outdated
|
||
|
||
class PySBJSONEncoder(json.JSONEncoder): | ||
FORMAT = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you write some documentation, perthaps on this class and/or the decoder, about what we need to think about as Model and the components gain new attributes in order to maintain proper compatibility here or bump the version number?
Also, to be a bit pedantic, is "format" the right name? I would think "version" or even "protocol" (to mimic pickle) would be more apt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've switched to protocol
, rather than format
.
I've added some documentation to cover the basics of the format. The protocol number should get bumped when new features are added which affect model simulation, or prevent it being loaded by the decoder. I don't think it's a bad thing if this happens, so long as the decoder maintains backwards compatibility.
pysb/tests/test_exporters.py
Outdated
# Check network generation and single-step integrator | ||
if model.name not in ('pysb.examples.tutorial_b', | ||
'pysb.examples.tutorial_c'): | ||
ScipyOdeSimulator(m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a reasonable smoke test to make sure the model isn't broken, but it doesn't demonstrate we got the same model back. We ought to actually check that the model is the same, but that's probably infeasible on every example unless we leverage some other exporter (and hope we don't end up with parallel bugs in both exporters). The first thing that comes to mind is to run each model and its re-imported version through the pysb_flat exporter and compare the strings. Of course the pysb_flat exporter doesn't have a round-trip fidelity test either, does it?
Or maybe create a minimal model that exercises every feature once, and explicitly check all the components before and after re-import? That might be a big job, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed - I used PySBFlatExporter
to manually check things were working as expected, but symbols within expressions don't appear in a deterministic order. I agree the best solution is probably to create a toy model that uses every feature once and check equality manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up using check_model_against_component_list
on all of the example models after round-tripping to JSON, which seems to do the job.
This is to avoid e.g. component names getting converted from strings to unicode. This is merely a test artifact, and doesn't seem worth fixing given we're dropping Python 2 support in under 2 months' time.
Component.__init__(self, name, _export) | ||
|
||
@property | ||
def value(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this new getter/setter for? I'm guessing round-trip testing exposed a code path where the normalization of .value
to a float was getting bypassed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in the earm_1_3.py model where .value
is used with integers, but these values were getting converted to floats after round-tripped to JSON.
@@ -39,7 +39,7 @@ | |||
|
|||
|
|||
Monomer('EGF', ['R']) | |||
Monomer('EGFR', ['L','CR1','Y1068'], {'Y1068':('U','P')}) | |||
Monomer('EGFR', ['L','CR1','Y1068'], {'Y1068': ['U', 'P']}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe Monomer should just normalize sites
and site_states.values()
to list
, but that can be done outside of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, we should normalize in core.py
.
pysb/tests/test_exporters.py
Outdated
@@ -112,7 +114,12 @@ def check_convert(model, format): | |||
# Check network generation and single-step integrator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you pointed out in your previous comment, we actually just need to evaluate the RHS once which is different from stepping the integrator. This comment would be more accurate as something like "Check network generation and force RHS evaluation".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment fixed as requested
Add JSON model import and export capabilities. JSON
is a text-serializable format, which is good for storing
models within other objects, or for purposes like HTTP
APIs.
SimulationResult.save() now saves models in JSON
rather than pickle.
See discussion on #467 which prompted the need for
a durable and serializable model storage format.