New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-17981: Create type-safe heterogenous map #461
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! Extremely impressive work.
I've made a number of minor comments, the most substantial of which involve the question of whether it's worthwhile trying to support Storable
outside of shared_ptr
; I'm not opposed to keeping the unique_ptr
/Storable &
/clone
logic, but I'm not certain we need it, and removing it would definitely lead to some simplifications.
Your use of WrapperCollection
et al seemed fine to me, and I didn't spot any places where you missed obvious opportunities for code reuse, but if there was some duplication that particularly bothered you, it's quite possible I just didn't spot it.
I also don't have any great ideas for moving code outside of headers. You could try to split code that depends on K
only into helper functions that could be explicitly instantiated (I think we can safely enumerate the types we want for K
in advance, especially as we need to do so for Python anyway), but I'm by no means getting that code out of the headers is worth the loss in readability that would probably come from trying to split up the code along those lines.
Two other big-picture items we may want to consider before merging, or at least before others start to use these classes in earnest:
- the set of supported primitive types: I think we want to add at least
int64
; I don't know if that should replaceint
or not; - the class names: this will (hopefully) be sufficiently ubiquitous that just calling the interfaces
Mapping
andMutableMapping
or (Dict
andMutableDict
) might make more sense, especially for Python users for whom type heterogeneity is nothing special (and for whom "hetero" doesn't automatically imply type heterogeneity). I'm not in love with any of those names, but perhaps this is a question we could put to #dm-naming-things on Slack to see if there are other ideas?
return py::cast(value); | ||
} | ||
// HeteroMap::get returns the same reference, depend on RVP to avoid double-deletion | ||
py::object operator()(std::unique_ptr<Storable> const& pointer) { return py::cast(*pointer); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be another reason to just drop support for unique_ptr<Storable>
/Storable &
. Even with RVP protection, a Python caller could:
- retrieve and hold an element corresponding to a
Storable
held byunique_ptr
; - delete that element from the map via
erase
/__delitem__
, deallocating it; - attempt to use the (dangling) element retrieved earlier.
In other words, pybind11::reference_internal
only protects you from the case where the object is deleted because the parent object is deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... we can't use copy
because you should be able to modify mutable Storable
objects from Python and have the map reflect that. I don't like forcing C++ users to hold everything by shared_ptr
, especially because of potential aliasing problems, but that might be the least of the evils here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think it comes down to copy (which would actually need to invoke clone
, of course) vs only supporting shared_ptr
to Storable
. This is the kind of fundamental mismatch between Python and C++ that makes me really want to strongly encourage all Storables
to be immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, I'm not sure this is a problem. You say it's a mismatch between Python and C++, but you could follow the exact same steps in C++ (including for standard library containers, which return deletable elements by reference). If we can't defend against such code in C++, I'm not sure there's much point in worrying about it in Python, either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the difference is that undefined behavior is something C++ developers are familiar with and (should!) always guard against. Python developers have the expectation that the worst they can do is cause a runtime exception to be raised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ktlim, I suppose? I wasn't really familiar with that language either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ktlim, we'd like an exception to the rule
In this case, we're returning a non-const
reference to a map element (consistent with idiomatic C++), but we're worried that might lead to broken references if somebody calls the pybind11 getter, saves the result, and then deletes the element from the map. We wish to use the copy
automatic
policy instead.
We're not worried about being able to modify the values where this bug could come up (in particular, it could only happen if the element was originally added to the map from the C++ side).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, I'd claim that you are not giving write access to a data member: you are giving write access to a component of a data member and so formally no exception is required.
In any case, you are authorized to use a safer return value policy (I believe the current code does use copy
rather than automatic
). However, I think it would be good to document the differing behavior between Python and C++ wherever possible and reasonable, on both sides of the language divide.
And, just checking, this is primarily about get()
and __getitem__()
, not the original operator()
line that this is attached to, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the code in question got refactored out after @TallJimbo did his review, but the underlying issue is with how the getters behave.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've appended the following text to the (in-Python) type explanation for GenericMap
, MutableGenericMap
, and SimpleGenericMap
:
As a safety precaution,
~lsst.afw.typehandling.Storable
objects that are
added from C++ may be copied when you retrieve them from Python, making it
impossible to modify them in-place. This issue does not affect objects that
are added from Python, or objects that are always passed by
:cpp:class:shared_ptr
in C++.
Oops, I forgot to respond to your general comments:
Theoretically we should just support all the primitives. However,
I don't think |
Unit tests don't yell at me for adding Should the variant be |
I'd vote for |
Following discussion on |
The current interface includes only single-key operations, and does not (yet) support iteration.
While developing SimpleGenericMap, I found that letting GenericMap store type information internally (via Key) made the interface prohibitively complex, with dozens of methods that had to be implemented by each subclass. Therefore, GenericMap relies on RTTI, making it not entirely type-safe (though users shouldn't see any difference besides performance).
Given Storable's purpose, the wrapper is specifically designed to allow Python subclasses.
f60b134
to
f83bbd0
Compare
GenericMap and MutableGenericMap are wrapped as a Mapping and MutableMapping, respectively. Python unit tests check conformity with dict.
PolymorphicValue is a more specialized class that can be used by GenericMap as an ordinary value type.
f83bbd0
to
e9f231f
Compare
This pull request creates a new subpackage,
lsst.afw.typehandling
, containing theStorable
andHeteroMap
interfaces. A single reference implementation ofHeteroMap
,SimpleHeteroMap
, is provided as well.HeteroMap
is expected to be used to simplify the persistence of complex compound types such asExposureInfo
.(Mutable)HeteroMap
has all the capabilities of a Python(Mutable)Mapping
, aside from some restrictions on key and value types. The C++ class is more limited; in particular, until DM-19467 there is no way to iterate over or persist aHeteroMap
in C++.