DM-25691: Some documentation on storage classes and formatters #327

Merged
merged 11 commits into from Sep 10, 2020

timj
Member

@timj timj commented Jul 16, 2020

No description provided.

@timj timj force-pushed the tickets/DM-25691 branch 4 times, most recently from 65f8349 to 7680665 Compare July 16, 2020 22:45
@timj timj marked this pull request as ready for review July 16, 2020 22:45
Contributor

@erykoff erykoff left a comment

There is a lot of good information here, and a lot to digest! I think a little more introduction and summary up top would be warranted for those coming at this fresh, and a smattering of questions/suggestions throughout.

@rgruendl

I have read through the documentation update. Overall it looks fine to me, although I am sure there are many nuances that are basically lost on me (this is not a criticism, just a statement to place my feedback in context). One item that raised a question was the section on File vs Bytes. Is there any construct that limits the use of Bytes (besides common sense)? I am asking because I suspect this is the place where many an unwary user can run into problems (that might affect others, not just themselves).

@timj
Member Author

timj commented Jul 17, 2020

@rgruendl the files vs bytes area is still a bit unexplored. If a formatter writer only ever reads and writes files everything is fine. The bytes part is an optimization for S3 (and related) where the local form of the file is transient and the bytes have to be uploaded somewhere else. As I say in a note, we will need to finesse this a little.
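
To make the files vs bytes distinction concrete, here is a minimal sketch and not the `lsst.daf.butler` API: `TextFormatter`, `InMemoryObjectStore`, and all method names are hypothetical. It assumes the file methods are thin wrappers over the bytes methods, so a remote store (standing in for S3) can skip the transient local file entirely.

```python
import tempfile
from pathlib import Path


class TextFormatter:
    """Hypothetical formatter for str objects (illustration only)."""

    def to_bytes(self, obj: str) -> bytes:
        # Serialize without touching the file system.
        return obj.encode("utf-8")

    def from_bytes(self, data: bytes) -> str:
        return data.decode("utf-8")

    def write(self, obj: str, path: Path) -> None:
        # File-based path: reuse the bytes serialization.
        path.write_bytes(self.to_bytes(obj))

    def read(self, path: Path) -> str:
        return self.from_bytes(path.read_bytes())


class InMemoryObjectStore:
    """Hypothetical stand-in for a remote store that only accepts bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, formatter: TextFormatter, obj: str) -> None:
        # No local file is created; bytes go straight to the store.
        self._objects[key] = formatter.to_bytes(obj)

    def get(self, key: str, formatter: TextFormatter) -> str:
        return formatter.from_bytes(self._objects[key])


formatter = TextFormatter()

# Remote-style round trip through bytes only.
store = InMemoryObjectStore()
store.put("a/b.txt", formatter, "hello")
remote_result = store.get("a/b.txt", formatter)

# Local round trip through a real file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "b.txt"
    formatter.write("hello", path)
    local_result = formatter.read(path)
```

A formatter author who implements only the file methods still works in this scheme; the bytes methods are the optional optimization.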

Contributor

@andy-slac andy-slac left a comment

Not sure I understood much of it, but I left a few comments.

Member

@TallJimbo TallJimbo left a comment

Looks good! I didn't do a super close read (I figure the readers who are newer to this stuff will be better judges of pedagogy anyway), but I left a few comments mostly on design issues the docs raised, since I gather that was one of your goals here as well.

image: ImageI
mask: MaskX

If this approach is used the `~lsst.daf.butler.StorageClass` Python class created by `~lsst.daf.butler.StorageClassFactory` will inherit from the specific parent class and not the generic `~lsst.daf.butler.StorageClass`.
Member

StorageClass is a regular type, not a metaclass, so its instances are just regular instances, and can't inherit in a Python-type-system sense. I suspect that's not what you meant to imply, but I think that's how it reads.

Member Author

I think I'm implying that they are subclasses. If you use inheritsFrom, the storage class you get back is a subclass. For example, the ImageI storage class has this inheritance:

 |  Method resolution order:
 |      StorageClassImageI
 |      StorageClassImage
 |      StorageClass
 |      builtins.object

If you create a StorageClass directly then it is not a subclass, but StorageClassFactory works a bit harder than that.

Formatters are responsible for serializing a Python type to a storage system and for reconstructing the Python type from the serialized form.
A formatter has to implement at minimum a `~lsst.daf.butler.Formatter.read()` method and a `~lsst.daf.butler.Formatter.write()` method.
The ``write()`` method takes a Python object and serializes it somewhere and the ``read()`` method is optionally given a component name and returns the matching Python object.
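
As an illustration of the read/write contract described in the text above, here is a minimal sketch assuming a JSON file on disk. `JsonFormatter` is a hypothetical standalone class, not a subclass of `lsst.daf.butler.Formatter`, and its constructor signature is an assumption made for the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


class JsonFormatter:
    """Hypothetical formatter that stores a dict as a JSON file."""

    def __init__(self, path: Path) -> None:
        # Where the serialized dataset lives (assumed to be a local file).
        self.path = path

    def write(self, in_memory_dataset: dict) -> None:
        # Serialize the Python object to the storage location.
        self.path.write_text(json.dumps(in_memory_dataset))

    def read(self, component: Optional[str] = None) -> Any:
        # Reconstruct the full object, or only the requested component.
        data = json.loads(self.path.read_text())
        if component is None:
            return data
        return data[component]


with tempfile.TemporaryDirectory() as tmp:
    fmt = JsonFormatter(Path(tmp) / "exposure.json")
    fmt.write({"image": [1, 2, 3], "mask": [0, 0, 1]})
    full = fmt.read()
    mask = fmt.read(component="mask")
```

This naive version parses the whole file even when one component is requested; a real formatter would presumably read only the part it needs where the file format allows it.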
Member

This is bigger than docs, of course, but I have always been a bit uncomfortable with Formatter as a single class with state being responsible for both reading and writing. The base class state (a FileDescriptor and a data ID) is totally reasonable for both cases, but any subclass-specific state has got to be different between read-mode and write-mode. I don't know if there's any concrete harm (other than just potential harm) from having a dual state object like that, and it's undeniably useful to have one name to refer to both, but it just doesn't feel right.

Member Author

If we start having different formatter classes for reading and writing isn't that going to be really easy to get out of sync? You also end up with having to track twice as many classes and have to have separate entries in the config for them.

.. warning::

   The formatter system has only been used to write datasets to files or to bytes that would be written to a file.
   The interface may evolve as other types of datastore become available and make use of the formatter system.
Member

FWIW, I had actually imagined that Formatters would simply not be used by some datastores, or that they would have an analogous concept with a totally different interface.

Member Author

Yes, early on we planned to experiment with different styles of formatters and so I always assumed that Formatter was a generic concept. That's why the base class is not quite right for files. The lack of prototyping in this area has caused this problem. Cleaning up the base Formatter to be an actual FileFormatter is something we can look at, but it should not be on this ticket.

Contributor

@erykoff erykoff left a comment

Looks good to me.

6 participants