-
-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Eigen nullaryExpr for rvalue indexing with multi_index #3046
Conversation
Should we wait to do this till Eigen 3.4 comes out? Because then I think we can use the slices and index views for this |
Do these need to be returned using |
What is the typical timeline for Eigen releases? Weeks or months after RC? We will also have to wait for RcppEigen to upgrade, though I guess we can help with that. If this brings 10% of speedup, I think its worth doing now. |
…stan into nullary_expr_indexing
No. This is for multi indexing that can not use slices anyway. Once we do have Eigen 3.4 we can use those for indices, but I don't think it will affect performance.
Correctness-wise it is not strictly necessary. These are using a lambda to capture the variable that would otherwise be local and would need holder. Since |
See the docs under "Array of indices" which is exactly what we want to do here. 3.4 should come out pretty soon (they are on the RC right now) and then we would just pass the std vector for multi-indexing to the |
I see. I expect using "Array of indices" would perform exactly as fast as what I have in this PR using |
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.14.6 BuildVersion: 18G8022CPU: G++: Clang: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple review comments. I'm fine with this PR but imo I'd rather just wait and use Eigen's indexing stuff.
src/stan/model/indexing/rvalue.hpp
Outdated
[name, &idx](auto& x_ref) { | ||
return plain_type_t<EigMat>::NullaryExpr( | ||
idx.ns_.size(), x_ref.cols(), | ||
[name, &idx, &x_ref](Eigen::Index i, Eigen::Index j) { | ||
return x_ref.coeff(idx.ns_[i] - 1, j); | ||
}); | ||
}, | ||
stan::math::to_ref(x)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need name in either of these lambdas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Removed.
return v_ref.coeff(idx.ns_[i] - 1); | ||
}); | ||
}, | ||
stan::math::to_ref(v)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the to_ref
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indexing with multi index is often used with index much larger than the indexed vector. So each element of the vector is used multiple times in the output. This to ref makes sure that if the indexed vector is expression each element is calculated only once.
const int n = idx.ns_[i]; | ||
math::check_range("matrix[multi] row indexing", name, x_ref.rows(), n); | ||
x_ret.row(i) = x_ref.row(n - 1); | ||
math::check_range("matrix[multi] row indexing", name, x.rows(), idx.ns_[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this check range happen in the lambda like the others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. We only need this check once per row of the result. In lambda it would happen once for each element of the result.
inline auto rvalue(EigVec&& v, const char* name, const index_multi& idx) { | ||
return stan::math::make_holder( | ||
[name, &idx](auto& v_ref) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If idx
is an temporary could this fall out of scope before it's executed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general C++ code yes. In C++ generated by stanc no. This is the same as for any function in math. So as not to litter whole math library with holders we have decided to ignore such cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm following, if stanc generated
auto blah = rvalue(rvalue(v, "blah1", index_multi(std::vector<int>{1, 2, 3}), "blah1", index_uni(3)))
Wouldn't the index_multi
fall out of scope here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. It stays in scope until everything in the same line of code is executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me play with this on godbolt for a minute and if I can't get it to break then I'm cool with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay so I'm cool with this, as long as we are agreeing that we won't ever use auto
in transformed data/ transformed parameters in the compiler because then we hit a no-no. But don't we want auto
in the compiler for the OpenCL stuff? How less performant is it to just make the multi index have a perfect forwarding template and forward the object along into the lambda?
So as not to litter whole math library with holders we have decided to ignore such cases.
Do you have a link to where that decision was made at? tmk didn't we want to add those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But don't we want auto in the compiler for the OpenCL stuff?
No need. We can directly use matrix_cl
. As for expressions we can handle them more or less exactly as Eigen ones.
How less performant is it to just make the multi index have a perfect forwarding template and forward the object along into the lambda?
That would not work if the same index is used multiple times.
Do you have a link to where that decision was made at? tmk didn't we want to add those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright then I'm good with this
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.14.6 BuildVersion: 18G8022CPU: G++: Clang: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops hang on one sec, I'm looking over the stanc code for closures right now which uses auto and just want to check that is ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
false alarm, that auto
is only used to construct the closure so this wouldn't effect it
Submission Checklist
./runTests.py src/test/unit
make cpplint
Summary
By using Eigen
nullaryExpression
for indexing withmulti_index
we can have these indexing functions propagate expressions, potentially reducing the number of times memory needs to be accessed.I have measured that there is no slown for simple indexing like
a[b]
. The speedup fora[b] + a[c]
is around 15% for prim and 10% for rev signature.Intended Effect
Speedup indexing with
multi_index
when used in expressions.How to Verify
Run tests for
rvalue
. Benchmark indexing withmulti_index
.Side Effects
None.
Documentation
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Tadej Ciglarič
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: