New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-15377: More robust parameter handling #83
Conversation
timj
commented
Aug 30, 2018
- Parameters must be defined in a storage class
- Parameters for virtual composites are forwarded to the component
- Parameters are handled by InMemoryDatastore via assemblers.
- Assemblers are called for any parameters not supported by a formatter.
dbad9e1
to
81659ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Only non-trivial comments are about whether we could simplify the implementation a bit by being less flexible (and musing about what kinds of flexibility we actually we need).
if parameters: | ||
unusedParams = {k: v for k, v in parameters.items() if k not in usedParams} | ||
else: | ||
unusedParams = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know whether we should be splitting up responsibility for parameters like this, as opposed to:
- requiring the combination of all component getters to use all parameters
- passing even used parameters to the assembler, assuming they're no-ops when applied repeatedly.
Any of these approaches actually works for the only concrete parameters we have for images (because anything that handles bbox
should also handle origin
, so those are never split up). Row-slicing parameters for catalogs (which don't exist, but could) probably rule out applying the same parameters repeatedly, unless we have some xy0-like way of assigning absolute indices to rows. Column-selection parameters for catalogs (which also don't exist) could probably work with any of these approaches.
So, I don't have a reason not to take this approach. But the second (require all parameters to be used by at least one component) might be the safer approach (just because it's more restrictive) until/unless we have a use case for more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we pass all parameters to get()
then datastore can no longer verify against the list of known parameters for that component without every component having to know about the composites that it might at some point be part of. I don't think we want to require that components know all possible composites they could belong to. Passing all parameters to the assembler could work if we know for sure that applying them twice will do no harm (another option is to pass all in to the assembler but also provide the list of those already applied to components, allowing the assembler class to know which might be relevant). I also wondered whether the assembler itself should take parameters rather than doing the assembly and then applying the parameters.
The main problem we have is that parameters have to be applicable to concrete composites and virtual composites with or without formatters being involved. Therefore the assembler has to be able to handle all parameters. handleParameters
is called by InMemoryDatastore and for virtual composite assembly, and PosixDatastore calls it for parameters that have explicitly been rejected by the formatters (otherwise generic formatters such as YAMLFormatter have to know about parameters).
ie I think it's a tricky problem. For now would you be happy if I changed handleParameters so that it had an "already applied" argument that could be used for virtual composites?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually unhappy with how things are now; I just wanted to start a discussion about other possibilities. If you think the code as-is is still the best option, I'm happy to go with that.
I certainly agree that we don't want components to know about all of the composites they could be a part of. But if I understand it correctly, I think datastore validation of parameters is more of a "nice to have" than a necessity - that's just about failing earlier with a better error message, right, not correctness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the moment, only the Exposure formatter tries to validate parameters, but then it's really the only formatter that uses parameters. I'm happy to merge as is and let it evolve a bit. It's hard to get it completely right with only one formatter using parameters and only a couple of tests for them.
inMemoryDataset : `object` | ||
Updated form of supplied in-memory dataset, after parameters | ||
have been used. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe assert that parameters is None
or an empty dict?
def knownParameters(self): | ||
"""Return set of all parameters known to this `StorageClass` | ||
|
||
The set includes parameters understood by components of a composite. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably get away with requiring the parameters of a composite to be a superset of the parameters of its components, and hence make knownParameters
always the same as parameters
. At least I can't come up with a kind of parameter for which that would not be true with the StorageClasses and parameters I envision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For virtual composites the current behavior is exactly what you'd want (components can have their own parameters distinct from the parent). Given the constraint that concrete composite assemblers have to be able to deal with all the parameters you might be right. This automated collecting of child parameters seems like the right approach in general though.
tests/test_storageClass.py
Outdated
@@ -73,6 +76,22 @@ def testCreation(self): | |||
self.assertIsInstance(sc2.pytype(), StorageClassFactory) | |||
self.assertIn("butler.core", repr(sc2)) | |||
|
|||
def testParameters(self): | |||
"""Test the we can set parameters and validate them""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: the
-> that
Does not use them yet. Parameters understood by a StorageClass are now defined in the storageClasses.yaml config.
YAML, JSON and Pickle do not support any parameters.
Assemblers have to be able to process parameters not known to formatters and also parameters used for an in memory datastore.