Experimental Storage fails at reloading #1108

Open
gianmarcolazzeri opened this issue Mar 22, 2022 · 3 comments


@gianmarcolazzeri

Hello! When restoring a "db" storage, the function tuple_keys_dict_from_dict in openpathsampling.experimental.dict_serialization_helpers doesn't properly handle an input dictionary whose keys are already strings when it tries to turn them into tuples. I don't know whether the input dictionary is supposed to look like this, but I found an easy fix by replacing:

    def tuple_keys_dict_from_dict(name, dct):
        keys = dct[name + "_tuple_keys"]
        values = dct[name + "_values"]
        return {tuple(k): v for k, v in zip(keys, values)}

with

    def tuple_keys_dict_from_dict(name, dct):
        keys = dct[name + "_tuple_keys"]
        values = dct[name + "_values"]
        return {(tuple(k) if type(k) is not str else k): v for k, v in zip(keys, values)}
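
For context, the reason a string key breaks the original version is that tuple() splits a string into its individual characters, which scrambles the dict. A quick illustration in plain Python:

    >>> tuple(("A", "B"))                  # a tuple (or list) key passes through as intended
    ('A', 'B')
    >>> tuple("transitions_values")[:3]    # a string key gets exploded character by character
    ('t', 'r', 'a')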
@dwhswenson
Member

Hi @gianmarcolazzeri, thanks for the report!

I'm curious about the situation where this is happening -- tuple_keys_dict_from_dict is a workaround for a few cases where OPS stores a dictionary that uses tuples for keys, so it shouldn't be used on dicts with strings as keys. I have a couple questions:

  1. Is this happening somewhere within normal usage of the core OPS library, or is this something that's popping up in custom code you've written?
  2. If it happens in the core library, then there's a bug somewhere that I need to fix; could you give more details on how it happens? If it is in custom code, this may mean that there's an easier way to accomplish what you're trying to do -- I'm happy to give advice on that if you'd like.

Some details on the tuple_keys_dict_*_dict methods (where * is from or to): the problem here is that, to convert to JSON (which we use to save to disk), we need our dictionaries to have strings as keys. In a small number of cases, OPS uses dictionaries that map tuple keys to some value (mostly mapping a pair of states to a transition, like {(stateA, stateB): transition_between_states}). For those specific cases, we use tuple_keys_dict_*_dict in order to save these objects. But this should only be used where it is explicitly requested -- I think all cases are in the monkey_patch_saving and monkey_patch_loading methods in openpathsampling.experimental.storage.monkey_patches, although over time this should move into the actual to_dict/from_dict for specific objects. I think analysis results, in particular, will also need something like this; for example, fluxes are returned as a dict with tuple keys like {(state, interface): flux_value}.
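
For intuition, here's a minimal sketch of the round-trip these helpers do (simplified -- in the real code the dict is embedded in the object's full to_dict output and the tuples become lists when they pass through JSON, which is why from_dict re-tuples the keys):

    def tuple_keys_dict_to_dict(dct, name):
        # split a tuple-keyed dict into parallel key/value lists (JSON-safe)
        return {name + "_tuple_keys": list(dct.keys()),
                name + "_values": list(dct.values())}

    def tuple_keys_dict_from_dict(name, dct):
        # rebuild the tuple-keyed dict (JSON turns the tuples into lists)
        keys = dct[name + "_tuple_keys"]
        values = dct[name + "_values"]
        return {tuple(k): v for k, v in zip(keys, values)}

    saved = tuple_keys_dict_to_dict({("stateA", "stateB"): "transition_AB"}, "transitions")
    # {'transitions_tuple_keys': [('stateA', 'stateB')], 'transitions_values': ['transition_AB']}
    restored = tuple_keys_dict_from_dict("transitions", saved)
    # {('stateA', 'stateB'): 'transition_AB'}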

@gianmarcolazzeri
Author

Hello David W.H. Swenson,

thank you for the fast reply.

I am a PhD student under the supervision of Roberto Covino, working on a method called AIMMD (basically, the selector in OpenPathSampling gets controlled by a NN).

Now I'm trying to switch to the new "db" storage with a 2d toy system on the OpenMM engine.
I'll try to isolate the relevant part of the code.

So here's the monkey-patched openpathsampling that I'm using:

# patched openpathsampling (new database)
import openpathsampling as paths
from openpathsampling.experimental.storage import monkey_patch_all
paths = monkey_patch_all(paths)
paths.InterfaceSet.simstore = True
from openpathsampling.experimental.storage.collective_variables import CoordinateFunctionCV, CollectiveVariable
from openpathsampling.experimental.storage import Storage
from openpathsampling.experimental.simstore import Processor, StorableFunctionConfig

And here are the collective variables defining the two state volumes, along with the descriptor of the system that I feed to the NN:

def circle(snapshot, center):
    import math
    return math.sqrt((snapshot.coordinates[0, 0]._value-center[0])**2
                     + (snapshot.coordinates[0, 1]._value-center[1])**2)

def descriptor_fn(snapshot):
    return snapshot.coordinates[0, :2]._value

descriptor = CoordinateFunctionCV(
    descriptor_fn,
).named('descriptor')  # otherwise not reloading correctly
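# xA, xB (used below) are the centers of states A and B, defined elsewhere in the script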
distA = CoordinateFunctionCV(func=circle, center=xA).named("distA")
distB = CoordinateFunctionCV(func=circle, center=xB).named("distB")
descriptor.mode = 'production'
distA.mode = 'production'
distB.mode = 'production'

stateA = paths.CVDefinedVolume(distA, 0.0, 0.05).named("A")
stateB = paths.CVDefinedVolume(distB, 0.0, 0.05).named("B")

I guess the problem arises when creating a dictionary for the "transitions" (paths.PathSampling initialisation).

# strategy
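# modifier, selector, and engine come from the AIMMD-specific setup (defined elsewhere)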
strategy = paths.strategies.TwoWayShootingStrategy(
    modifier=modifier,
    selector=selector,
    engine=engine,
    group='TwoWayShooting'
)

# scheme
scheme = paths.MoveScheme(
    network=paths.TPSNetwork.from_states_all_to_all([stateA, stateB])
)
scheme.append(strategy)
scheme.append(paths.strategies.OrganizeByMoveGroupStrategy())
scheme.build_move_decision_tree()

# sample set
sample_set = scheme.initial_conditions_from_trajectories([trajectory])

# storage
storage = Storage(f'{folder}/paths.db', 'w')

# sampler
sampler = paths.PathSampling(
    storage,
    sample_set=sample_set,
    move_scheme=scheme
)

In fact, two successive calls of tuple_keys_dict_to_dict produce the following output:

    {'transitions': {'transitions_tuple_keys': [(<openpathsampling.volume.CVDefinedVolume object at 0x7fe8d920a1c0>, <openpathsampling.volume.CVDefinedVolume object at 0x7fe8d9bb55b0>), (<openpathsampling.volume.CVDefinedVolume object at 0x7fe8d9bb55b0>, <openpathsampling.volume.CVDefinedVolume object at 0x7fe8d920a1c0>)],
                     'transitions_values': [<openpathsampling.high_level.transition.TPSTransition object at 0x7fe8b81222b0>, <openpathsampling.high_level.transition.TPSTransition object at 0x7fe8a86157f0>]},
     'x_sampling_transitions': [<openpathsampling.high_level.transition.TPSTransition object at 0x7fe8a86afa00>],
     'special_ensembles': {None: {}},
     'initial_states': [<openpathsampling.volume.CVDefinedVolume object at 0x7fe8d920a1c0>, <openpathsampling.volume.CVDefinedVolume object at 0x7fe8d9bb55b0>],
     'final_states': [<openpathsampling.volume.CVDefinedVolume object at 0x7fe8d920a1c0>, <openpathsampling.volume.CVDefinedVolume object at 0x7fe8d9bb55b0>]}

    {'transitions': {'transitions_tuple_keys': ['transitions_tuple_keys', 'transitions_values'],
                     'transitions_values': [[(<openpathsampling.volume.CVDefinedVolume object at 0x7fe8d920a1c0>, <openpathsampling.volume.CVDefinedVolume object at 0x7fe8d9bb55b0>), (<openpathsampling.volume.CVDefinedVolume object at 0x7fe8d9bb55b0>, <openpathsampling.volume.CVDefinedVolume object at 0x7fe8d920a1c0>)], [<openpathsampling.high_level.transition.TPSTransition object at 0x7fe8b81222b0>, <openpathsampling.high_level.transition.TPSTransition object at 0x7fe8a86157f0>]]},
     'x_sampling_transitions': [<openpathsampling.high_level.transition.TPSTransition object at 0x7fe8a86afa00>],
     'special_ensembles': {None: {}},
     'initial_states': [<openpathsampling.volume.CVDefinedVolume object at 0x7fe8d920a1c0>, <openpathsampling.volume.CVDefinedVolume object at 0x7fe8d9bb55b0>],
     'final_states': [<openpathsampling.volume.CVDefinedVolume object at 0x7fe8d920a1c0>, <openpathsampling.volume.CVDefinedVolume object at 0x7fe8d9bb55b0>]}

It's the latter that causes the error: when the storage is reloaded, tuple_keys_dict_from_dict tries to tuplify the string key "transitions_tuple_keys". The fix I proposed simply keeps "transitions_tuple_keys" intact, preserving the structure of the dictionary.

I hope what I wrote is understandable. There are other pieces of code involved in the AIMMD implementation, but I don't think they have any bearing on this issue.

@dwhswenson
Member

Thanks for the more detailed info. I haven't been able to reproduce this issue locally, but it's possible that you monkey-patched twice -- that could cause this problem. I think that's essentially what you observed when you called tuple_keys_to_dict twice -- internally, the monkey patch is adding a wrapper around an existing function, and if you add that wrapper twice, it would be like calling it a second time on the output of the original patch.
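
A toy illustration of that effect (plain Python, not the OPS code; same simplification as the sketch in my previous comment):

    def encode(dct, name):
        # the tuple-keys-to-lists step that the patched to_dict applies
        return {name + "_tuple_keys": list(dct.keys()),
                name + "_values": list(dct.values())}

    once = encode({("A", "B"): "t_AB"}, "transitions")
    twice = encode(once, "transitions")
    print(twice["transitions_tuple_keys"])
    # ['transitions_tuple_keys', 'transitions_values']  -- matches your second dump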

Calling the monkey patch twice is really easy in a notebook (I've had it happen many times when I re-ran the notebook without restarting the kernel), and in any case, it's easy to have the monkey patch listed twice. I've started a PR (#1109) to prevent that from causing problems in the future.

If it wasn't a matter of double-patching, then try just directly saving your network and then reloading it (in a different Python process): storage.save(scheme.network), network = storage.networks[0]. The problem is almost certainly in the process of saving the network; seeing what happens when you try to save it and reload it (without the changes you made to the code) can tell us a lot more.
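
In case it's useful, a minimal sketch of that test (assuming the objects from your snippets above; the filename is just an example):

    # process 1: save only the network
    storage = Storage('network_only.db', 'w')
    storage.save(scheme.network)
    storage.close()

    # process 2 (a fresh Python session, with the monkey patch applied once): reload it
    storage = Storage('network_only.db', 'r')
    network = storage.networks[0]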
