New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-13358: Initial composites implementation #12
Conversation
elif cls._pytypeName == "dict": | ||
pytype = dict | ||
elif cls._pytypeName == "list": | ||
pytype = list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have these as special cases? If we really need that, can't we use builtins
somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a special case because the else
clause tries to import them and you can't import dict
. I hadn't thought about using dir(builtins)
and seeing if the item exists there and checking it's a type and not a builtin_function_or_method.
def __init__(self, name, bases, dct): | ||
super().__init__(name, bases, dct) | ||
if hasattr(self, "name"): | ||
StorageClass.subclasses[self.name] = self | ||
|
||
@property | ||
def pytype(cls): # noqa N805 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I had misunderstood how numpydoc worked with properties. I hadn't realized that I put a mention in the class docs but then the actual docs in the property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this is in the metaclass lazy loading so I wasn't sure how to document the metaclass at all.
|
||
Raises | ||
------ | ||
e : `KeyError` | ||
If a storage class has already been registered with | ||
storageClassName. | ||
""" | ||
newtype = makeNewStorageClass(storageClassName, pytype, components) | ||
newtype = makeNewStorageClass(storageClassName, **classAttr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that makeNewStorageClass
already creates the class, I think I'd rather call it outside registerStorageClass
and pass the newly created class to it, rather than forwarding its constructor arguments.
So having: def registerStorageClass(storageClass)
(or perhaps the type, instead of the instance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely the type, since the mappingFactory returns the instance.
# Fill in other items | ||
scItems["components"] = components | ||
|
||
self.storageClassFactory.registerStorageClass(name, scItems) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply pass **info
? After swapping out the components. Perhaps renaming it storageClassKwargs
or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a real shame we aren't using underscores rather than camel case 😄
refType = scComps[comp].pytype | ||
|
||
if refType and not isinstance(result, refType): | ||
raise TypeError("Got type {} from formatter but expected {}".format(type(result), refType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is necessary, but probably also insufficient. It fails if the component happens to be of the same type as the parent. How do we know if it actually read the component, rather than the parent?
One possible solution would be to do an (md5
) hash or something, but this cannot be done in the Datastore
unless it keeps a registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if the parent has composites that are of the same class then this check will not be sufficient, but that is conceptually quite hard to spot.
tests/examplePythonTypes.py
Outdated
requested.remove(c) | ||
sc = storageClass.components[c] | ||
if hasattr(self, c): | ||
components[c] = (getattr(self, c), sc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we can't make this dissasembler more general? Seems like extracting components based on attributes is probably almost always what we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. daf_persistence
has a generic disassembler that checks for standard get methods as well as properties.
@@ -1,7 +1,7 @@ | |||
# | |||
# LSST Data Management System | |||
# | |||
# Copyright 2008-2018 AURA/LSST. | |||
# Copyright 2018 AURA/LSST. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the new default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new file written from scratch can't have a copyright year from before it existed.
tests/test_posixDatastore.py
Outdated
self.assertEqualMetrics(metrics, metricsOut) | ||
|
||
# Get a component even though it was written without | ||
summary = datastore.get("{}#{}".format(uri, "summary"), storageClass=storageClass) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it is ok I think.
with self.assertRaises(ValueError): | ||
# invalid storage class | ||
datastore.get(uri="file:///non_existing.fits", storageClass=object, parameters=None) | ||
datastore.get(uri="file:///non_existing.json", storageClass=object, parameters=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be careful that we don't start relying on file extensions anywhere (e.g. object stores don't have them).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not looking at file extensions anywhere at the moment. That get will fail regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do now support file extensions inside formatters because some I/O libraries get upset if you don't have .fits
(for example). The hint is treated as a hint and you get back a URI with the right extension.
@@ -28,11 +28,51 @@ | |||
|
|||
class StorageClassMeta(type): | |||
|
|||
# Caches of imported code objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still actually need the metaclass and its registry of subclasses? If not, we should remove it, and move its methods to regular StorageClass
class methods, to reduce complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need a metaclass to support lazy loading of class attributes. The subclasses
stuff is not used at the moment but I have wondered if it might be useful (will we ever need to locate a storage class that hasn't been put in the storage class registry). Can remove it easily enough once the code settles down.
eda4e70
to
fb1446b
Compare
The python type is now materialized on demand rather than during class creation. As part of this change the code is now more generic to allow more simpler adding of additional class attributes to StorageClass.
Added assembler and disassebler implemntations for a simple test class. Put and get work.
Uses builtins to look up the string.
Rather than hand-crafting.
Rather than being test code, the assembler and disassembler can now attempt to run based on knowing the component names and checking the composite to see if it has associated getters or setters.
Mainly so that we can access components in formatters.
The formatter should know the type that it is expected to create when a component is being requested from a composite.
Remove the example version and create a generic one that can attempt to create objects of the correct type.
This allows components of a composite to be written with paths that include the same root and provides a predictable scheme for constructing file names for components.
This allows components to be removed individually.
All the file-based formatters are doing the same thing so refactor with base class that knows about files in general and subclasses that know the specifics of serializing to FITS/JSON/Pickle.
Also allow for afw to be missing.
0ffd68d
to
6ec10d6
Compare
""" | ||
requested = set(storageClass.components.keys()) | ||
expItems = requested & EXPOSURE_COMPONENTS | ||
expInfoItems = requested & EXPOSURE_INFO_COMPONENTS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do a sanity check for requested components that are not in the union of the supported sets?
cls = storageClass.pytype | ||
|
||
# Should we check that the storage class components match or are a superset | ||
# of the items described in the supplied components? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the former I would say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Has to be a superset though since you should be allowed to assemble an exposure from N of the M components if the pytype can support that.
|
||
if failed: | ||
raise ValueError("Unhandled components during assembly ({})".format(failed)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check that the decomposition returns the components again? Might be expensive to check though. So perhaps better done at the unittest level for each (dis)assembler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think asking the assembler to call the disassembler to compare components is going to be more than we need and we should have unittests for each assembler/disassembler that does that.
|
||
extension = None | ||
"""Default file extension to use for writing files. None means that no | ||
modifications will be made to the supplied file extension.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document this in the class level attributes section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what this is (meant) to do: https://developer.lsst.io/docs/py_docs.html?highlight=numpydoc#placement-of-constant-and-class-attribute-docstrings
return inMemoryDataset | ||
|
||
def read(self, fileDescriptor): | ||
"""Read a `Exposure` from a FITS file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this specific to Exposure
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because during the code refactoring I didn't notice that. Will fix.
"""Metaclass used by `StorageClass`. | ||
|
||
Implements lazy loading of class attributes, allowing datastores to | ||
delay loading of external code until it is needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still don't understand why this can't be handled in StorageClass
itself and has to be in the metaclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a class attribute. You have to be able to lazy load the class type without instantiating a StorageClass instance. If you want it to be an instance attribute then I can switch it to being a standard @property
lazy loader.
def disassembler(self): | ||
"""Function object to use to disassembler a type into components.""" | ||
return type(self).disassembler | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be more elegant if e.g. SourceCatalog
and the like were instances of StorageClass
.
Then we won't need the metaclass. Additionally we could have assemble
and disassemble
be instance methods on a separate object that is passed to the StorageClass
constructor (as discussed above). Right now the structure feels a bit forced with too many layers of inheritance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are instances of StorageClass (that's what Factory does). If you don't need StorageClass.pytype
to work then this can be simplified.
Object contents in the form of a dict with keys corresponding | ||
to object attributes. | ||
""" | ||
return self.exportAsDict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
History. Although if this was production code and not a test class, it might be useful to have the public interface and the private one.
|
||
def testConstructor(self): | ||
datastore = PosixDatastore(config=self.configFile) | ||
self.assertIsNotNone(datastore) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably test also that the factory function returns the proper PosixDatastore
instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is your original test code so I haven't looked at it at all.
# Simple check of WCS | ||
bbox = lsst.afw.geom.Box2I(lsst.afw.geom.Point2I(0, 0), | ||
lsst.afw.geom.Extent2I(9, 9)) | ||
self.assertWcsAlmostEqualOverBBox(wcs, exposure.getWcs(), bbox) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test the actual component values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is testing the Wcs values. I don't necessarily know how to test every single Exposure component but I can default to seeing if ==
works.
None of the code is used because the tests currently use the standard formatters.
Significantly simplifies formatters because they can now rely on Assemblers doing the right thing for the given StorageClass. Since monolithic files vs separate files is no longer known to the YAML config file, for now use special Assembler classes with disabled disassembler methods to decide whether to write a composite as a single file.
This allows us to remove the metaclass
0995e37
to
89e6cbb
Compare
Most of the Assembler methods needed to know the storage class. Therefore StorageClass now returns an assembler with an instance of itself attached. This simplifies the interface.
No description provided.