New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-42928: Add methods to get and set nested dictionaries consistently. #82
Conversation
These conform to a new typing.Protocol being added to lsst.pipe.base.
1515676
to
d7875cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay. I do wonder about round tripping of keys that contain "."
.
""" | ||
try: | ||
value = self.getScalar(key) | ||
except KeyError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is extremely slow on macOS ARM chips. Historically, most of the places where Apple is much slower on ARM than Intel involved loops where daf_base was being called in a loop and an exception was thrown from C++. We found that
if key in self:
value = self.getScalar(key)
was much faster (with the documented caveat about whether you want hierarchical keys to work here, since exists()
is also an option).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I hadn't realized this would be a C++ exception under the hood, but of course it is. Will fix.
value = self.getScalar(key) | ||
except KeyError: | ||
return {} | ||
return value.toDict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can presumably raise if key does not point to a PropertySet. Should that be mentioned in the docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, worth doing. The protocol can't specify that behavior, but I can at least say what happens in this case.
Parameters | ||
---------- | ||
key : `str` | ||
String key associated with the mapping. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is an implicit notion that key
can not include .
in any implementation. Should that be documented anywhere?
If you do pl.set_dict("a.b.c", some_dict)
does pl.get_dict("a.b.c")
work? I don't think it round trips with dots in there. If it doesn't round trip maybe it should raise early if there is a .
in the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll switch to exists
, and document in the protocol that keys with "." are not supported and behavior if they are present them is implementation-defined. I don't want to to look for .
at every possible level of the hierarchy when it'd be a waste of time in the usual case.
name: str | ||
for name in self.getOrderedNames(): | ||
levels = name.split(".") | ||
if levels[0] == key: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in theory you can break out the loop if you've matched in a previous loop but then next time around the key does not match (since the ordered names means that key
is guaranteed not to appear again). Not sure if it matters unless the FITS header is huge.
This does make me think that when we read HIERARCH headers we should have been replacing the spaces with .
.
Also, I was trying to see what happened if you added a PropertySet to a PropertyList twice. It's not pretty.
>>> from lsst.daf.base import PropertyList, PropertySet
>>> pl = PropertyList()
>>> pl["A"] = "B"
>>> ps = PropertySet.from_mapping({"B": "C", "D": "E"})
>>> ps2 = PropertySet.from_mapping({"X": "Y", "Z": "W"})
>>> pl.set("AA", ps)
>>> pl.set("AA", ps2)
>>> print(pl)
A = "B"
AA.D = "E"
AA.B = "C"
AA.Z = "W"
AA.X = "Y"
>>> pl.add("AA", ps)
>>> print(pl)
zsh: segmentation fault python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaking out of the loop because we expect the call to set_dict
to be atomic also makes it more directly thread-unsafe (rather than merely thread-unsafe because it's probably calling thread-unsafe things). I'm inclined to leave it as it is, since we're doing an O(N) operation in getting the full set of keys to iterate over already.
This is all making me slightly interested in writing a replacement for PropertyList
as our in-memory thing that maps to FITS headers and really uninterested in fixing PropertyList
itself. I think right now I just have to hold my nose and get back to other things.
SQUASH; DO NOT MERGE.
SQUASH; DO NOT MERGE.
These conform to a new typing.Protocol being added to lsst.pipe.base.