New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-32662: rework catalog column array access to improve type dispatch performance #618
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few suggestions and questions but looks good.
include/lsst/afw/table/Catalog.h
Outdated
_columns = _NotContiguous{}; | ||
} | ||
} | ||
auto result_ptr = std::get_if<ColumnView>(&_columns); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It strikes me that if you are caching anyway, you could probably avoid these lines on each call by having the variant directly refer to <_MabeContiguous, Optional<columns.value()>, std::nullopt> instead of introducing the _NotContiguous type., though I it really is not a huge overhead, so it does not really matter.
include/lsst/afw/table/Catalog.h
Outdated
Internal& getInternal() { return _internal; } | ||
Internal const& getInternal() const { return _internal; } | ||
Internal const & getInternal() const { return _internal; } | ||
Internal drainIntoInternal() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is drainIntoInternal
the right name here? I think it sounds more like draining something and putting it into internal.
* clear from how they are obtained that modifying them will do different | ||
* things. In Python, that clarity is lost, so we mark copied returns as | ||
* const to prevent users from thinking that by modifying the column they are | ||
* modifying the catalog; instead they just can't modify copies at all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see knowing when you can and cant modify a given column (because coping) as a new source of frustration for python users. In what cases do we expect users to be able to set table entries through a view? Is it possible to just always return a const view? This would be memory friendly, but still un-modifiable. If someone does need a mutable container for whatever they are doing, a np.array
should be pretty cheep right? Then it really does not matter when you need to copy.
} | ||
|
||
void _declare_getitem(std::string const *) { | ||
// String columns cannot be retrieved as arrays. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mean that there will be no method declared for getitem based on on a string. Does this mean that someone will get a harry pybind message about overloads not supporting type with a list of all overload methods? Would it be better to actually implement the method with a lambda that raised some sort of message that is more helpful for the user?
/// A constexpr variable that evaluates to true if U is in the list (no | ||
/// checking for const, reference, mutable, etc.) | ||
template <typename U> | ||
static constexpr bool contains = std::is_same_v<T, U> || TypeList<E...>::template contains<U>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I am missing something, and it probably is not important to the review, but I cant see where contains is actually used anywhere. Don't take too much time, but if you could explain it or point me to something I would appreciate satisfying my curiosity.
60c0e18
to
defbe29
Compare
This gets in the way of Catalog knowing when it has been made noncontiguous, which is a problem when caching an associated ColumnView.
Making move delegate to copy was playing it extra super safe for backwards compatibility, but in this case it was never actually needed.
This is another loophole in Catalog being able to tell when it has been made noncontiguous, and it appears it was never actually used anyway.
This is a good idea for a host of reasons: - As long as C++ code can get a non-const reference to a Catalog, Python code can't possibly invalidate that cache reliably, because it can't tell when the Catalog is mutated. We're basically assuming that only Python code modifies a Catalog that has a cached ColumnView, and (apparently) getting away with it, but that's at some level just luck, even though typical usage patterns _should_ make it rare. - People don't expect to see a class's state split between C++ and Python (even a cache). It's unusual and surprising. - Having this cache in Python forces the implementation of __getitem__ to straddle the C++/Python boundary in an awkward way, and that's getting in the way of making it go faster. At the same time, I've made it so attempting to get the column view for a noncontiguous catalog results in an empty std::optional return, not an exception, *in C++*. In Python, the behavior is the same for backwards compatibility, but I'm pretty confident that this method was never called in C++ (aside from those Python bindings), and if not, we'll get an obvious compile failure downstream. We also cache the fact that a catalog is not contiguous, rather than recomputing it each time.
This provides more fine-grained control over the overloaded methods we wrap for Python, while also making it harder to forget less-used types (as the old code did for at least the smaller integer types we support).
Implementation was already doing this; we just weren't using it.
Combined with the other changes on this ticket, quick benchmarks show that this is: - 3.8-4.6x faster for by-name column array access (for views) - 1.2-1.5x faster for Key column array access (for views) - 2.6-4.3x faster for integer row indexing - about the same for slice and boolean-array indexing of catalogs. This wasn't exhaustively checked - some Key types will be slower than others, due to the way pybind11 overloads are dispatched, and the non-type-dispatch parts of these calls sometimes scales with the number of rows or columns. But overall this speeds things up, in some cases by a lot. By-name column access is now a bit faster than Key access, which is nice because names are much more natural in Python. That's possible because type dispatch in pure C++ with std::variant (probably a switch or vtable under the hood) is much faster than pybind11 type-conversion dispatch (a loop over all types until one matches).
In some contexts calling this "Internal" because it's the container used inside Catalog made sense, but in the method names in particular it was confusing.
This is probably a tiny bit less storage-efficient, but it's a lot more readable.
While the main purpose of this view is for Python (since Angle isn't a valid numpy dtype), it doesn't depend on any Python stuff, and this makes it more easily usable in both the Catalog and ColumnView pybind11 bindings.
cda5143
to
dd375a0
Compare
Revert "Merge pull request #618 from lsst/tickets/DM-32662"
No description provided.