Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-42928: Add methods to get and set nested dictionaries consistently. #82

Merged
merged 4 commits into from Feb 22, 2024

Conversation

TallJimbo
Copy link
Member

These conform to a new typing.Protocol being added to lsst.pipe.base.

These conform to a new typing.Protocol being added to lsst.pipe.base.
Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay. I do wonder about round tripping of keys that contain ".".

"""
try:
value = self.getScalar(key)
except KeyError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is extremely slow on macOS ARM chips. Historically, most of the places where Apple is much slower on ARM than Intel involved loops where daf_base was being called in a loop and an exception was thrown from C++. We found that

if key in self:
    value = self.getScalar(key)

was much faster (with the documented caveat about whether you want hierarchical keys to work here, since exists() is also an option).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I hadn't realized this would be a C++ exception under the hood, but of course it is. Will fix.

value = self.getScalar(key)
except KeyError:
return {}
return value.toDict()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can presumably raise if key does not point to a PropertySet. Should that be mentioned in the docstring?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, worth doing. The protocol can't specify that behavior, but I can at least say what happens in this case.

Parameters
----------
key : `str`
String key associated with the mapping.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is an implicit notion that key can not include . in any implementation. Should that be documented anywhere?

If you do pl.set_dict("a.b.c", some_dict) does pl.get_dict("a.b.c") work? I don't think it round trips with dots in there. If it doesn't round trip maybe it should raise early if there is a . in the key.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch to exists, and document in the protocol that keys with "." are not supported and behavior if they are present them is implementation-defined. I don't want to to look for . at every possible level of the hierarchy when it'd be a waste of time in the usual case.

name: str
for name in self.getOrderedNames():
levels = name.split(".")
if levels[0] == key:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in theory you can break out the loop if you've matched in a previous loop but then next time around the key does not match (since the ordered names means that key is guaranteed not to appear again). Not sure if it matters unless the FITS header is huge.

This does make me think that when we read HIERARCH headers we should have been replacing the spaces with ..

Also, I was trying to see what happened if you added a PropertySet to a PropertyList twice. It's not pretty.

>>> from lsst.daf.base import PropertyList, PropertySet
>>> pl = PropertyList()
>>> pl["A"] = "B"
>>> ps = PropertySet.from_mapping({"B": "C", "D": "E"})
>>> ps2 = PropertySet.from_mapping({"X": "Y", "Z": "W"})
>>> pl.set("AA", ps)
>>> pl.set("AA", ps2)
>>> print(pl)
A = "B"
AA.D = "E"
AA.B = "C"
AA.Z = "W"
AA.X = "Y"

>>> pl.add("AA", ps)
>>> print(pl)
zsh: segmentation fault  python

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking out of the loop because we expect the call to set_dict to be atomic also makes it more directly thread-unsafe (rather than merely thread-unsafe because it's probably calling thread-unsafe things). I'm inclined to leave it as it is, since we're doing an O(N) operation in getting the full set of keys to iterate over already.

This is all making me slightly interested in writing a replacement for PropertyList as our in-memory thing that maps to FITS headers and really uninterested in fixing PropertyList itself. I think right now I just have to hold my nose and get back to other things.

@TallJimbo TallJimbo merged commit b0e2216 into main Feb 22, 2024
2 checks passed
@TallJimbo TallJimbo deleted the tickets/DM-42928 branch February 22, 2024 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants