DM-25691: Some documentation on storage classes and formatters #327
65f8349 to 7680665
There is a lot of good information here, and a lot to digest! I think a little more introduction and summary up top would be warranted for those coming at this fresh, and a smattering of questions/suggestions throughout.
I have read through the documentation update. Overall it looks fine to me, although I am sure there are many nuances that are basically lost on me (this is not a criticism, just a statement to place my feedback in context). One item that raised a question was in the section on File vs Bytes. Is there any construct that limits the use of Bytes (besides common sense)? I am asking because I suspect this is the place where many an unwary user can run into problems (that might affect others besides themselves).
@rgruendl the files vs bytes area is still a bit unexplored. If a formatter writer only ever reads and writes files, everything is fine. The bytes part is an optimization for S3 (and related) where the local form of the file is transient and the bytes have to be uploaded somewhere else. As I say in a note, we will need to finesse this a little.
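To illustrate the file versus bytes distinction discussed above, here is a minimal sketch, not the actual daf_butler interface. The class `JsonBytesFormatter`, its method names, and the dictionary standing in for an object store are all hypothetical. The file path is implemented on top of the bytes path, so a file-only datastore never needs to call the bytes methods directly.

```python
import json
import tempfile
from pathlib import Path


class JsonBytesFormatter:
    """Hypothetical formatter; names are illustrative, not the daf_butler API."""

    def to_bytes(self, dataset: dict) -> bytes:
        # Serialize directly to bytes, with no local file involved.
        return json.dumps(dataset).encode("utf-8")

    def from_bytes(self, serialized: bytes) -> dict:
        # Reconstruct the Python object from the serialized bytes.
        return json.loads(serialized.decode("utf-8"))

    def write_file(self, dataset: dict, path: Path) -> None:
        # The file form reuses the bytes form.
        path.write_bytes(self.to_bytes(dataset))

    def read_file(self, path: Path) -> dict:
        return self.from_bytes(path.read_bytes())


formatter = JsonBytesFormatter()
dataset = {"visit": 42, "detector": 7}

# File-based datastore: the serialized form lives in a local file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset.json"
    formatter.write_file(dataset, path)
    from_file = formatter.read_file(path)

# Object-store stand-in (assumption): a dict keyed by object name.
fake_object_store = {"dataset.json": formatter.to_bytes(dataset)}
from_store = formatter.from_bytes(fake_object_store["dataset.json"])
```

Both routes return an object equal to the original, which is the property a datastore would rely on whichever route it chooses.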
Not sure I understood much of it, but I left a few comments.
Looks good! I didn't do a super close read (I figure the readers who are newer to this stuff will be better judges of pedagogy anyway), but I left a few comments mostly on design issues the docs raised, since I gather that was one of your goals here as well.
doc/lsst.daf.butler/formatters.rst
image: ImageI
mask: MaskX

If this approach is used the `~lsst.daf.butler.StorageClass` Python class created by `~lsst.daf.butler.StorageClassFactory` will inherit from the specific parent class and not the generic `~lsst.daf.butler.StorageClass`.
StorageClass is a regular type, not a metaclass, so its instances are just regular instances, and can't inherit in a Python-type-system sense. I suspect that's not what you meant to imply, but I think that's how it reads.
I think I'm implying that they are subclasses. If you use inheritsFrom the storage class you get back is a subclass. For example the ImageI storage class has this inheritance:
| Method resolution order:
| StorageClassImageI
| StorageClassImage
| StorageClass
| builtins.object
If you create a StorageClass directly then it is not a subclass, but StorageClassFactory works a bit harder than that.
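One way such a method resolution order can arise is by creating subclasses dynamically with `type()`. The following is only a sketch under that assumption: the `StorageClass` stand-in and the helper `make_storage_class` are hypothetical and are not the real `StorageClassFactory` code.

```python
class StorageClass:
    """Stand-in for the generic storage class (not the real daf_butler class)."""

    def __init__(self, name, pytype=None):
        self.name = name
        self.pytype = pytype


def make_storage_class(name, parent=StorageClass):
    """Create a new class named StorageClass<name> that derives from parent."""
    return type(f"StorageClass{name}", (parent,), {})


# Build the parent first, then a child that inherits from it.
StorageClassImage = make_storage_class("Image")
StorageClassImageI = make_storage_class("ImageI", parent=StorageClassImage)

image_i = StorageClassImageI("ImageI")
mro_names = [cls.__name__ for cls in type(image_i).__mro__]
print(mro_names)
# ['StorageClassImageI', 'StorageClassImage', 'StorageClass', 'object']
```

An instance created by calling the stand-in `StorageClass` directly has `StorageClass` itself as its type, which matches the distinction drawn above.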
Formatters are responsible for serializing a Python type to a storage system and for reconstructing the Python type from the serialized form.
A formatter has to implement at minimum a `~lsst.daf.butler.Formatter.read()` method and a `~lsst.daf.butler.Formatter.write()` method.
The ``write()`` method takes a Python object and serializes it somewhere and the ``read()`` method is optionally given a component name and returns the matching Python object.
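The read/write contract described in this excerpt can be sketched as follows. This is an illustrative class only: `JsonDictFormatter` is hypothetical, does not derive from the real `lsst.daf.butler.Formatter`, and takes a plain file path in its constructor, which is an assumption made to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


class JsonDictFormatter:
    """Hypothetical formatter-like class; not derived from the real Formatter."""

    def __init__(self, path: Path):
        # Assumption: the formatter is told where to read and write.
        self.path = path

    def write(self, in_memory_dataset: dict) -> None:
        # Serialize the Python object to the storage location.
        self.path.write_text(json.dumps(in_memory_dataset))

    def read(self, component: Optional[str] = None) -> Any:
        # Reconstruct the full object, or only the requested component.
        data = json.loads(self.path.read_text())
        if component is None:
            return data
        return data[component]


with tempfile.TemporaryDirectory() as tmp:
    formatter = JsonDictFormatter(Path(tmp) / "dataset.json")
    formatter.write({"image": [1, 2, 3], "mask": [0, 0, 1]})
    full = formatter.read()
    mask = formatter.read(component="mask")
```

Here `full` holds the whole dictionary, while `mask` holds only the entry stored under the `"mask"` key.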
This is bigger than docs, of course, but I have always been a bit uncomfortable with Formatter as a single class with state being responsible for both reading and writing. The base class state (a FileDescriptor and a data ID) is totally reasonable for both cases, but any subclass-specific state has got to be different between read-mode and write-mode. I don't know if there's any concrete harm (other than just potential harm) from having a dual state object like that, and it's undeniably useful to have one name to refer to both, but it just doesn't feel right.
If we start having different formatter classes for reading and writing, isn't that going to be really easy to get out of sync? You also end up having to track twice as many classes and needing separate entries in the config for them.
.. warning::

   The formatter system has only been used to write datasets to files or to bytes that would be written to a file.
   The interface may evolve as other types of datastore become available and make use of the formatter system.
FWIW, I had actually imagined that Formatters would simply not be used by some datastores, or that they would have an analogous concept with a totally different interface.
Yes, early on we planned to experiment with different styles of formatters and so I always assumed that Formatter was a generic concept. That's why the base class is not quite right for files. The lack of prototyping in this area has caused this problem. Cleaning up the base Formatter to be an actual FileFormatter is something we can look at, but it should not be on this ticket.
9ba0160 to 03852a3
read-only is now derived
This is driven by the fact that the class no longer just does assembly but also handles derived components and read parameters.
Looks good to me.