Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi input concat #2232

Merged
merged 44 commits into from Oct 20, 2021
Merged

Multi input concat #2232

merged 44 commits into from Oct 20, 2021

Conversation

SimonHeybrock
Copy link
Member

@SimonHeybrock SimonHeybrock commented Oct 15, 2021

Fixes #2166.

I have chosen to keep behavior of concatenate, but deprecate it. Instead this provides concat, which matches the naming in the Python Array API specification (and also xarray). I had considered this rename for a while (not least because concatenate is very long), by doing this now we avoid breaking things twice.

Also fix/change so masks cannot be shared with inputs.

@@ -41,6 +41,10 @@ template <class T> void bind_concatenate(py::module &m) {
},
py::arg("x"), py::arg("y"), py::arg("dim"),
py::call_guard<py::gil_scoped_release>());
m.def(
"concat",
[](const std::vector<T> &x, const Dim dim) { return concat(x, dim); },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes copies of all inputs. Is it a shallow or a deep copy?
Unfortunately, I don't see a way to avoid this completely (except for ranges, see other comment) because concat takes a span as argument and Python cannot provide a contiguous memory layout. So we need to at least shallow-copy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy constructors correspond to shallow copies. Deep copies would be calls to copy().

@@ -345,7 +345,7 @@ class TargetBinBuilder {

void join(const Dim dim, const Variable &coord) {
m_dims.addInner(dim, 1);
m_joined.emplace_back(concatenate(min(coord), max(coord), dim));
m_joined.emplace_back(concat(std::vector{min(coord), max(coord)}, dim));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the function parameter should be something like a pair of pointers (-> a range?) in order to avoid these kinds of allocations. Do we have access to ranges yet?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, next step is #2150.

Deprecations
~~~~~~~~~~~~

* ``concatenate`` and ``groupby(..).concatenate`` are deprecated. Use ``concat`` and ``groupby(..).concatenate`` instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* ``concatenate`` and ``groupby(..).concatenate`` are deprecated. Use ``concat`` and ``groupby(..).concatenate`` instead.
* ``concatenate`` and ``groupby(..).concatenate`` are deprecated. Use ``concat`` and ``groupby(..).concat`` instead.

return _call_cpp_func(_cpp.buckets.concatenate, self._obj, dim)
raise RuntimeError("Reduction along all dims not supported yet.")
out = _call_cpp_func(_cpp.buckets.concatenate, self._obj, other)
return out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mark this function as deprecated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I'd avoid a deprecation cycle for bins.concatenate since it likely has extremely few users for now.

Note that the function name itself is actually not deprecated (yet). There are two overloads of bins.concatenate (and unfortunately there where mixing code here). What I did here was to rename the first overload to bins.concat. The second overload is semantically different (concatenating from 2 input arrays), and I have (I think) a better plan on how to refactor/improve it. I'll describe it in an issue.

variable/shape.cpp Outdated Show resolved Hide resolved
dataset/shape.cpp Outdated Show resolved Hide resolved
dataset/shape.cpp Outdated Show resolved Hide resolved
"Either both or neither of the inputs must be bin edges.");
} else if (a_.dims()[dim] == (dimsA.contains(dim) ? dimsA.at(dim) : 1)) {
out.emplace(key, concatenate(a_, b[key], dim));
"Either all or none of the inputs must have bin edge coordinates.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, TEST(ConcatenateTest, fail_mixing_point_data_and_histogram) {, but I'll extend it to also cover more than 2 inputs.

dataset/shape.cpp Outdated Show resolved Hide resolved
dataset/shape.cpp Show resolved Hide resolved
@SimonHeybrock SimonHeybrock merged commit 811aa16 into main Oct 20, 2021
@SimonHeybrock SimonHeybrock deleted the multi-input-concat branch October 20, 2021 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multi-input concatenate
2 participants