Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General cleanup of ObjectStore and Naming #409

Merged
merged 23 commits into from Feb 8, 2016

Conversation

jhprinz
Copy link
Contributor

@jhprinz jhprinz commented Jan 7, 2016

Overview

This PR includes PR #406 and #398 and simplifies the mess in the code. Especially in object.py.
All relevant changes are in files from netcdfplus. All other are just renames that are related and corrections to docstrings, etc. So for review you need to look at objects.py mostly.

Important changes

  • Instead of having a flag has_name in object store that sets, if .name should be saved you now use
    NamedObjectStore instead of ObjectStore. This has two important advantages: 1. The code is much cleaner since NamedObjectStore is not subclassed and only contains the code that handles naming. And 2. The code is faster since I removed unnecessary checks for handling names in objects that do not support naming.
  • There is a subclass UniqueNamedObjectStore that add on top of naming that names must be unique. This is probably useful for CVs etc.
  • There are two more store types DictStore and ImmutableDictStore. The first implements a dict like structure that can be used for tags, see PR A dictionary like store #406 and the second add that keys can only be set once.
  • There is 6th basic ObjectStore called VariableStore. See PR Use a generic object store for simple (enough) objects #398. This is a shortcut to create a store that does not store using JSON but creates tables in the netcdf and handles saving and storing of the object. This is very useful if you have a simple class there you just want to store member variables which are either native variables (bumpy array, str, int, etc) or storable objects. This reduces code redundancy and makes the code much better maintainable and it is MUCH easier for users to create their own Store class
  • Cleanup of handling of named objects.
  • Added lots of better error messages when something cannot be saved as wanted.
  • Docstring cleanup (using of sphinx :class: directive)
  • Add a large test section to test_netcdfplus.ipynb that checks all the different stores and acts at the same time as a limited guide through all the features.
  • Some little convenience improvements, nicer defaults for usage of netcdfplus files.

To Do

  • More docstring cleanup
  • Cleanup of test_netcdfplus

This should close #398, #406, #372.

This was referenced Jan 8, 2016
@jhprinz jhprinz changed the title [WIP] General cleanup of ObjectStore and Naming General cleanup of ObjectStore and Naming Jan 11, 2016
@jhprinz jhprinz added this to the 1.0 milestone Jan 11, 2016
@dwhswenson dwhswenson self-assigned this Jan 13, 2016
@dwhswenson
Copy link
Member

While you're cleaning up the related stuff, you might fix the docstrings for StorableObject and StorableNamedObject. They're currently identical.

@jhprinz
Copy link
Contributor Author

jhprinz commented Jan 20, 2016

Done.

@jhprinz
Copy link
Contributor Author

jhprinz commented Jan 25, 2016

undid a rename of a function that was not necessary and change 4 files.

@dwhswenson
Copy link
Member

This all looks good. Before merging, can you just summarize some of the behaviors in this thread (for documentation purposes)? We've discussed some of it in various threads, but a single place for what finally got merged in would be good. The kinds of things I'd like to have covered in that include:

  • what happens when things get saved with the same name (what comes out when loading from that) -- and how to access previous items with that name (since they're not actually overwritten, right?)
  • can you store the same object with two different names?
  • the scope of unique names when they're required to be unique (it's only within the individual store, right, not globally?)
  • how to name objects (we still support both object.name = new_name and object.named(new_name), right? But now there's also named_store[new_name] = object, as described in A dictionary like store #406?) and also how to load by name
  • how to override to do things that are generally not recommended (and why those things are not recommended)

Those are just a few that come to mind immediately. Anything else you can think of, it would be very useful to document it now, in this PR, so we can find it later for reference. (After doing that, please also mark this PR with the "ref docs" tag.)

Once that's done, I'll give it one more glance over, but I'm pretty sure it's good to merge.

@jhprinz jhprinz added the ref docs issues/PRs with discussion that might be added to docs label Feb 4, 2016
@jhprinz
Copy link
Contributor Author

jhprinz commented Feb 4, 2016

Summary

There are now several new version of the ObjectStore. Instead of setting flag for nameable or not we not use different stores that implement the desired behaviour

Currently there are

  • ObjectStore

    Implements the standard load/save for objects that are subclassed from StorableObject. Saving cannot have an index. .save will just append new objects to the store and return the used index. Loading works by integer index only. In essence an ObjectStore implements a stored list() for objects of a certain type.

  • NamedObjectStore

    Implements a load/save for objects subclasses from StorableNamedObject. These objects have a .name property that is automatically stored with the object although no name= parameter is used for init. These objects can be named and renamed until they are saved. These objects can be named by setting the .name attribute like obj.name = 'new_name'. You can used the .named('new_name') method which supports chaining like this my_obj = Object(...).named('xyz') or you renaming when saving (see below).

    Loading works as before by integer index but also by using a string. If an object with that name is present it will be returned. if multiple objects are present the LAST object will be returned. This way we support mutable object by replacing. In essense this implements an OrderedDict() for objects of certain type.

    Saving can have an additional parameter which is a string. if use like store.save(obj, new_name) the object will be renamed before storing. If renaming is not possible an error will be thrown. Saving can also we done using __setitem__ like store[new_name] = my_object. In this case the object will be renamed if necessary and stored.

    There is a dictionary that contains a list of all existing names .name_idx pointing to all integer indices. You can also use .find('name') to find the last object with a certain name and .find_indices('name') to get a list of all integer indices of objects with a certain name in order.

  • UniqueNamedObjectStore

    Works as NamedObjectStore but ensures that per store (e.g. ensembles, volumes) are only used once.

    Saving in addition checks if the optional new name is also taken and returns an appropriate error message. Loading works the same as before.

  • DictStore

    A DictStore should implement a Python dict for arbitrary content. Saving is done using __setitem__ like store[key] = value as you would with any other dictionary. Allow values is everything that can by turned into a JSON string by the dictifier which includes all storable objects, numpy arrays (although very inefficient), classes, etc. Functions do not work yet since the dictifier cannot autodetect if this is a function of another storable object. Loading works by index (which is not encouraged) and by name. If multiple objects are stored under the same name (which is not possible in a dict!) always the last stored is returned. This effectively mimics the behaviour of a dict. Still all previously stored values are present. In essence it implements a python dict for arbitrary objects (with versioning)

    You can the list of all names objects which is stored in a dictionary name_idx. Just use store.keys(). And you can use it in iterators like for names in store. It will iterate over all names, not over all objects!

    DictStores will not be referenced by other objects in the storage since the meaning of a name may change and hence the referenced objects is not immutable. This dicts are mostly to store information about a project or allow the user to reference certain object by names of his own liking.

  • ImmutableDictStore

    As the DictStore but you can only store objects once under a specific name. There cannot be two objects wth the same name.

Immutability

Changing immutable objects is highly not recommended if not forbidden by law :) The underlying netcdf of course allows you to change objects, but the storage itself does not provide highlevel functions to do so. The main reason is that objects use references to point to other already stored objects. If you were tl change an object this will affect all referencing objects at once. The big question always is, if this is intented or not. You could in principle allow chaning an object but it would be seen as if that object was always a different one. If you now load an object that references this object its constructor might create a different object which could lead to more changes in other objects, etc. To play devils advocat there is no way to predict what will happen. Or what the loaded objects will look like. Hence: Do no change immutable objects.

If you want to do that anyway. You can (in most stores) by using _save(obj, idx) and explicitely giving an index where to store and possible replacing an existing object. Note that you will also circumvent all the caching mechanism that ensures that stored objects will not be stored again. It will look to the store as if you would store a (deep) copy of the object. Consider the following example

vst = store.volumes
v = Volume()

vs._save(v, 0) # store the object at index #0

vst.save(v)
>>> 1 # returns 1 since it was stored again with index #1

v2 = Volume()

vs._save(v2, 1) # replace the just saved object at index #1

@dwhswenson
Copy link
Member

Thanks for adding the docs to this! Merging, and I'm sure we'll refer to this when writing official documentation!

dwhswenson added a commit that referenced this pull request Feb 8, 2016
General cleanup of ObjectStore and Naming
@dwhswenson dwhswenson merged commit e14ccea into openpathsampling:master Feb 8, 2016
@jhprinz jhprinz deleted the newnaming branch July 12, 2016 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ref docs issues/PRs with discussion that might be added to docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants