Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-13358: Initial composites implementation #12

Merged
merged 49 commits into from Feb 20, 2018
Merged

DM-13358: Initial composites implementation #12

merged 49 commits into from Feb 20, 2018

Conversation

timj
Copy link
Member

@timj timj commented Feb 5, 2018

No description provided.

elif cls._pytypeName == "dict":
pytype = dict
elif cls._pytypeName == "list":
pytype = list
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have these as special cases? If we really need that, can't we use builtins somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a special case because the else clause tries to import them and you can't import dict. I hadn't thought about using dir(builtins) and seeing if the item exists there and checking it's a type and not a builtin_function_or_method.

def __init__(self, name, bases, dct):
super().__init__(name, bases, dct)
if hasattr(self, "name"):
StorageClass.subclasses[self.name] = self

@property
def pytype(cls): # noqa N805
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I had misunderstood how numpydoc worked with properties. I hadn't realized that I put a mention in the class docs but then the actual docs in the property.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this is in the metaclass lazy loading so I wasn't sure how to document the metaclass at all.


Raises
------
e : `KeyError`
If a storage class has already been registered with
storageClassName.
"""
newtype = makeNewStorageClass(storageClassName, pytype, components)
newtype = makeNewStorageClass(storageClassName, **classAttr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that makeNewStorageClass already creates the class, I think I'd rather call it outside registerStorageClass and pass the newly created class to it, rather than forwarding its constructor arguments.

So having: def registerStorageClass(storageClass) (or perhaps the type, instead of the instance).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely the type, since the mappingFactory returns the instance.

# Fill in other items
scItems["components"] = components

self.storageClassFactory.registerStorageClass(name, scItems)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply pass **info? After swapping out the components. Perhaps renaming it storageClassKwargs or something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a real shame we aren't using underscores rather than camel case 😄

refType = scComps[comp].pytype

if refType and not isinstance(result, refType):
raise TypeError("Got type {} from formatter but expected {}".format(type(result), refType))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is necessary, but probably also insufficient. It fails if the component happens to be of the same type as the parent. How do we know if it actually read the component, rather than the parent?
One possible solution would be to do an (md5) hash or something, but this cannot be done in the Datastore unless it keeps a registry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if the parent has composites that are of the same class then this check will not be sufficient, but that is conceptually quite hard to spot.

requested.remove(c)
sc = storageClass.components[c]
if hasattr(self, c):
components[c] = (getattr(self, c), sc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can't make this dissasembler more general? Seems like extracting components based on attributes is probably almost always what we want.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. daf_persistence has a generic disassembler that checks for standard get methods as well as properties.

@@ -1,7 +1,7 @@
#
# LSST Data Management System
#
# Copyright 2008-2018 AURA/LSST.
# Copyright 2018 AURA/LSST.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the new default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new file written from scratch can't have a copyright year from before it existed.

self.assertEqualMetrics(metrics, metricsOut)

# Get a component even though it was written without
summary = datastore.get("{}#{}".format(uri, "summary"), storageClass=storageClass)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is ok I think.

with self.assertRaises(ValueError):
# invalid storage class
datastore.get(uri="file:///non_existing.fits", storageClass=object, parameters=None)
datastore.get(uri="file:///non_existing.json", storageClass=object, parameters=None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be careful that we don't start relying on file extensions anywhere (e.g. object stores don't have them).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not looking at file extensions anywhere at the moment. That get will fail regardless.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do now support file extensions inside formatters because some I/O libraries get upset if you don't have .fits (for example). The hint is treated as a hint and you get back a URI with the right extension.

@@ -28,11 +28,51 @@

class StorageClassMeta(type):

# Caches of imported code objects.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still actually need the metaclass and its registry of subclasses? If not, we should remove it, and move its methods to regular StorageClass class methods, to reduce complexity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need a metaclass to support lazy loading of class attributes. The subclasses stuff is not used at the moment but I have wondered if it might be useful (will we ever need to locate a storage class that hasn't been put in the storage class registry). Can remove it easily enough once the code settles down.

The python type is now materialized on demand rather than during
class creation. As part of this change the code is now more generic
to allow more simpler adding of additional class attributes
to StorageClass.
Added assembler and disassebler implemntations for a simple
test class. Put and get work.
Uses builtins to look up the string.
Rather than hand-crafting.
Rather than being test code, the assembler and disassembler
can now attempt to run based on knowing the component names
and checking the composite to see if it has associated getters
or setters.
Mainly so that we can access components in formatters.
The formatter should know the type that it is expected to create
when a component is being requested from a composite.
Remove the example version and create a generic one that can
attempt to create objects of the correct type.
This allows components of a composite to be written with paths
that include the same root and provides a predictable scheme
for constructing file names for components.
This allows components to be removed individually.
All the file-based formatters are doing the same thing so
refactor with base class that knows about files in general and
subclasses that know the specifics of serializing to FITS/JSON/Pickle.
Also allow for afw to be missing.
"""
requested = set(storageClass.components.keys())
expItems = requested & EXPOSURE_COMPONENTS
expInfoItems = requested & EXPOSURE_INFO_COMPONENTS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do a sanity check for requested components that are not in the union of the supported sets?

cls = storageClass.pytype

# Should we check that the storage class components match or are a superset
# of the items described in the supplied components?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the former I would say.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Has to be a superset though since you should be allowed to assemble an exposure from N of the M components if the pytype can support that.


if failed:
raise ValueError("Unhandled components during assembly ({})".format(failed))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check that the decomposition returns the components again? Might be expensive to check though. So perhaps better done at the unittest level for each (dis)assembler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think asking the assembler to call the disassembler to compare components is going to be more than we need and we should have unittests for each assembler/disassembler that does that.


extension = None
"""Default file extension to use for writing files. None means that no
modifications will be made to the supplied file extension."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document this in the class level attributes section?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return inMemoryDataset

def read(self, fileDescriptor):
"""Read a `Exposure` from a FITS file.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this specific to Exposure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because during the code refactoring I didn't notice that. Will fix.

"""Metaclass used by `StorageClass`.

Implements lazy loading of class attributes, allowing datastores to
delay loading of external code until it is needed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still don't understand why this can't be handled in StorageClass itself and has to be in the metaclass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a class attribute. You have to be able to lazy load the class type without instantiating a StorageClass instance. If you want it to be an instance attribute then I can switch it to being a standard @property lazy loader.

def disassembler(self):
"""Function object to use to disassembler a type into components."""
return type(self).disassembler

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more elegant if e.g. SourceCatalog and the like were instances of StorageClass.
Then we won't need the metaclass. Additionally we could have assemble and disassemble be instance methods on a separate object that is passed to the StorageClass constructor (as discussed above). Right now the structure feels a bit forced with too many layers of inheritance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are instances of StorageClass (that's what Factory does). If you don't need StorageClass.pytype to work then this can be simplified.

Object contents in the form of a dict with keys corresponding
to object attributes.
"""
return self.exportAsDict()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have both?

Copy link
Member Author

@timj timj Feb 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

History. Although if this was production code and not a test class, it might be useful to have the public interface and the private one.


def testConstructor(self):
datastore = PosixDatastore(config=self.configFile)
self.assertIsNotNone(datastore)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably test also that the factory function returns the proper PosixDatastore instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is your original test code so I haven't looked at it at all.

# Simple check of WCS
bbox = lsst.afw.geom.Box2I(lsst.afw.geom.Point2I(0, 0),
lsst.afw.geom.Extent2I(9, 9))
self.assertWcsAlmostEqualOverBBox(wcs, exposure.getWcs(), bbox)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test the actual component values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is testing the Wcs values. I don't necessarily know how to test every single Exposure component but I can default to seeing if == works.

None of the code is used because the tests currently use
the standard formatters.
Significantly simplifies formatters because they can now rely
on Assemblers doing the right thing for the given StorageClass.

Since monolithic files vs separate files is no longer known
to the YAML config file, for now use special Assembler classes
with disabled disassembler methods to decide whether to write
a composite as a single file.
This allows us to remove the metaclass
Most of the Assembler methods needed to know the storage class.
Therefore StorageClass now returns an assembler with an instance
of itself attached. This simplifies the interface.
@timj timj merged commit 75fb57f into master Feb 20, 2018
@ktlim ktlim deleted the tickets/DM-13358 branch August 25, 2018 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants