Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-17981: Create type-safe heterogenous map #461

Merged
merged 9 commits into from May 8, 2019
Merged

Conversation

kfindeisen
Copy link
Member

@kfindeisen kfindeisen commented Apr 27, 2019

This pull request creates a new subpackage, lsst.afw.typehandling, containing the Storable and HeteroMap interfaces. A single reference implementation of HeteroMap, SimpleHeteroMap, is provided as well. HeteroMap is expected to be used to simplify the persistence of complex compound types such as ExposureInfo.

(Mutable)HeteroMap has all the capabilities of a Python (Mutable)Mapping, aside from some restrictions on key and value types. The C++ class is more limited; in particular, until DM-19467 there is no way to iterate over or persist a HeteroMap in C++.

Copy link
Member

@TallJimbo TallJimbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Extremely impressive work.

I've made a number of minor comments, the most substantial of which involve the question of whether it's worthwhile trying to support Storable outside of shared_ptr; I'm not opposed to keeping the unique_ptr/Storable &/clone logic, but I'm not certain we need it, and removing it would definitely lead to some simplifications.

Your use of WrapperCollection et al seemed fine to me, and I didn't spot any places where you missed obvious opportunities for code reuse, but if there was some duplication that particularly bothered you, it's quite possible I just didn't spot it.

I also don't have any great ideas for moving code outside of headers. You could try to split code that depends on K only into helper functions that could be explicitly instantiated (I think we can safely enumerate the types we want for K in advance, especially as we need to do so for Python anyway), but I'm by no means getting that code out of the headers is worth the loss in readability that would probably come from trying to split up the code along those lines.

Two other big-picture items we may want to consider before merging, or at least before others start to use these classes in earnest:

  • the set of supported primitive types: I think we want to add at least int64; I don't know if that should replace int or not;
  • the class names: this will (hopefully) be sufficiently ubiquitous that just calling the interfaces Mapping and MutableMapping or (Dict and MutableDict) might make more sense, especially for Python users for whom type heterogeneity is nothing special (and for whom "hetero" doesn't automatically imply type heterogeneity). I'm not in love with any of those names, but perhaps this is a question we could put to #dm-naming-things on Slack to see if there are other ideas?

include/lsst/afw/typehandling/HeteroMap.h Outdated Show resolved Hide resolved
include/lsst/afw/typehandling/HeteroMap.h Outdated Show resolved Hide resolved
include/lsst/afw/typehandling/HeteroMap.h Outdated Show resolved Hide resolved
include/lsst/afw/typehandling/HeteroMap.h Outdated Show resolved Hide resolved
include/lsst/afw/typehandling/HeteroMap.h Outdated Show resolved Hide resolved
include/lsst/afw/typehandling/Storable.h Show resolved Hide resolved
include/lsst/afw/typehandling/Storable.h Show resolved Hide resolved
include/lsst/afw/typehandling/python.h Show resolved Hide resolved
return py::cast(value);
}
// HeteroMap::get returns the same reference, depend on RVP to avoid double-deletion
py::object operator()(std::unique_ptr<Storable> const& pointer) { return py::cast(*pointer); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be another reason to just drop support for unique_ptr<Storable>/Storable &. Even with RVP protection, a Python caller could:

  • retrieve and hold an element corresponding to a Storable held by unique_ptr;
  • delete that element from the map via erase/__delitem__, deallocating it;
  • attempt to use the (dangling) element retrieved earlier.

In other words, pybind11::reference_internal only protects you from the case where the object is deleted because the parent object is deleted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... we can't use copy because you should be able to modify mutable Storable objects from Python and have the map reflect that. I don't like forcing C++ users to hold everything by shared_ptr, especially because of potential aliasing problems, but that might be the least of the evils here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it comes down to copy (which would actually need to invoke clone, of course) vs only supporting shared_ptr to Storable. This is the kind of fundamental mismatch between Python and C++ that makes me really want to strongly encourage all Storables to be immutable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know, I'm not sure this is a problem. You say it's a mismatch between Python and C++, but you could follow the exact same steps in C++ (including for standard library containers, which return deletable elements by reference). If we can't defend against such code in C++, I'm not sure there's much point in worrying about it in Python, either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the difference is that undefined behavior is something C++ developers are familiar with and (should!) always guard against. Python developers have the expectation that the worst they can do is cause a runtime exception to be raised.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktlim, I suppose? I wasn't really familiar with that language either.

Copy link
Member Author

@kfindeisen kfindeisen May 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktlim, we'd like an exception to the rule

The reference_internal policy SHALL be used for functions (or properties) giving write access to internal data members

In this case, we're returning a non-const reference to a map element (consistent with idiomatic C++), but we're worried that might lead to broken references if somebody calls the pybind11 getter, saves the result, and then deletes the element from the map. We wish to use the copyautomatic policy instead.

We're not worried about being able to modify the values where this bug could come up (in particular, it could only happen if the element was originally added to the map from the C++ side).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I'd claim that you are not giving write access to a data member: you are giving write access to a component of a data member and so formally no exception is required.

In any case, you are authorized to use a safer return value policy (I believe the current code does use copy rather than automatic). However, I think it would be good to document the differing behavior between Python and C++ wherever possible and reasonable, on both sides of the language divide.

And, just checking, this is primarily about get() and __getitem__(), not the original operator() line that this is attached to, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the code in question got refactored out after @TallJimbo did his review, but the underlying issue is with how the getters behave.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've appended the following text to the (in-Python) type explanation for GenericMap, MutableGenericMap, and SimpleGenericMap:

As a safety precaution, ~lsst.afw.typehandling.Storable objects that are
added from C++ may be copied when you retrieve them from Python, making it
impossible to modify them in-place. This issue does not affect objects that
are added from Python, or objects that are always passed by
:cpp:class:shared_ptr in C++.

python/lsst/afw/typehandling/_HeteroMap.py Outdated Show resolved Hide resolved
@kfindeisen
Copy link
Member Author

kfindeisen commented Apr 30, 2019

Oops, I forgot to respond to your general comments:

the set of supported primitive types: I think we want to add at least int64; I don't know if that should replace int or not

Theoretically we should just support all the primitives. However, boost::variant tends to be buggy with types that are implicitly convertible to each other (I even had some problems with int vs. bool that I'm not sure how I fixed), so we should be cautious. I don't think that std::variant will have the same problem, given that it even allows const and non-const versions of the same type.

the class names: this will (hopefully) be sufficiently ubiquitous that just calling the interfaces Mapping and MutableMapping or (Dict and MutableDict) might make more sense, especially for Python users for whom type heterogeneity is nothing special (and for whom "hetero" doesn't automatically imply type heterogeneity). I'm not in love with any of those names, but perhaps this is a question we could put to #dm-naming-things on Slack to see if there are other ideas?

I don't think Mapping is a good name (it might lead to confusion with the standard library type, especially since I couldn't make HeteroMap an actual Mapping). I'm reluctant to open the can of worms that is #dm-naming-things, but I think either that or an RFC is inevitable at some point.

@kfindeisen
Copy link
Member Author

Unit tests don't yell at me for adding std::int64_t as well as int. I also tried replacing it with std::int32_t (which is equal to int on my system) and didn't get any complaints, so it should be good cross-platform.

Should the variant be int and std::int64_t, or std::int32_t and std::int64_t?

@TallJimbo
Copy link
Member

TallJimbo commented Apr 30, 2019

I'd vote for int32_t and int64_t.

@kfindeisen
Copy link
Member Author

Following discussion on #dm-naming-things, the classes have been renamed to GenericMap, MutableGenericMap, and SimpleGenericMap. All comments, variables, commit messages, etc. have been reworded analogously.

The current interface includes only single-key operations, and does not (yet)
support iteration.
While developing SimpleGenericMap, I found that letting GenericMap store
type information internally (via Key) made the interface prohibitively
complex, with dozens of methods that had to be implemented by each
subclass. Therefore, GenericMap relies on RTTI, making it not entirely
type-safe (though users shouldn't see any difference
besides performance).
Given Storable's purpose, the wrapper is specifically designed to
allow Python subclasses.
GenericMap and MutableGenericMap are wrapped as a Mapping and
MutableMapping, respectively. Python unit tests check conformity
with dict.
PolymorphicValue is a more specialized class that can be used by
GenericMap as an ordinary value type.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants