-
-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Use syrk
to compute certain dot products more quickly and accurately
#6932
Conversation
I think something might be wrong with the Also, I have tried to check to see if an array is a transpose view using an ad hoc mechanism I came up with. If somebody has a better way to check this, I am open to suggestions. |
b41860e
to
95049a0
Compare
I tried building this with OpenBLAS locally and am not seeing this same error. Also, I am seeing an appropriate speed up here on transposed arrays (e.g. ~2x faster). |
95049a0
to
520e209
Compare
Ok, I still can't say that this is totally correct, but it doesn't seem to be complaining about the |
syrk
to compute certain dot products more quickly and accuratelysyrk
to compute certain dot products more quickly and accurately
Pinging: @njsmith @charris @argriffing |
|
* Otherwise, use gemm for all other cases. | ||
*/ | ||
if ( | ||
(ap1 == PyArray_BASE(ap2)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, this check is definitely not right. The base pointer doesn't have any semantics.
I think the correct check would be something like: same data pointer, and shapes are transposes of each other, and one is C contiguous and the other is F contiguous (the Trans checks are probably sufficient for the last part).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, not super familiar with the NumPy C-API. So, this was my first attempt doing something significant with it. I'll think about how to rework it.
@jakirkham Trying to figure out when one matrix is the transpose of the other is a bit tricky. Furthermore, these days we tend to design functions to operate on stacks of matrices. On account of that, I think it would be better to have explicit functions that receive a single argument and perform Other possibilities are two argument functions for |
@charris, regarding the first case (i.e. the one argument form) I think I still agree with you, which is why I had mentioned something similar earlier. ( #6794 (comment) ) That being said, it wasn't and still isn't clear to me how I was going to make traction on that problem. As the initial problem proposed by @njsmith seemed a bit simpler to get traction, I tried it here. Adding a new function or function(s) to do this may be better in the long run. I am aware of these two argument problems (including this |
R names those operations I think that's an orthogonal issue though. I don't see any reason not to optimize |
520e209
to
c52facb
Compare
This I definitely agree with, which is why I took the time to write this code at all. Regarding the NumPy optimization stuff, I believe both of you know more than me in this area so I'll let you guys hash it out. As for the checks, I think I have fixed them, but they probably should be fused into one conditional to reduce the rechecking. |
Might be good to add tests for the trickier cases like, |
c52facb
to
6e4950a
Compare
Ok, sure that's a good idea. I'll try to work on the tests tomorrow. I've tried to combine the two checks a bit so that they are not repeated for each case. |
cb441e8
to
c4dff61
Compare
@njsmith, I've added some tests that you have suggested. They have checked out locally. Travis should be done soon. If there are any more you can think of, please let me know. |
Travis checks out. |
This is looking good to me. (@charris: do you still have any objections?) @jakirkham: One last thing: this is a pretty awesome user-visible feature, so we should advertise it in the release notes. Could you add a short note to |
...I guess if you're feeling inspired you could also usefully add some benchmarks to the |
Absolutely, I added a note under new features. Though I could see it going under improvements. Please let me know if I need to move it or change wording.
Sure, I have no problem doing this, but I may need a pointer on how to go about doing this. In the worst case, I can copy and tweak existing benchmarks to serve our ends here. |
44b3077
to
0197427
Compare
Updated the benchmarks so they would be sorted in the output next to each other to make it easy to compare. |
0197427
to
0576eb1
Compare
… accurately in certain special cases.
… @ A` and `A @ A.T`.
…ata, which should be optimized, versus the same configuration without shared data.
0576eb1
to
8d8a74d
Compare
Please let me know if there is anything else needed. Otherwise, I am assuming this is complete. Also, I have tried to rebase against master a few times, but there seem to a bunch of updates going in. Instead of clogging up the build queue with rebases of this PR, I am going to let it sit. If you would like to merge and want me to rebase beforehand, feel free to give me a nudge and I will do so. |
FYI as a general rule, there's no need to rebase at all unless master has changed in a way that breaks your patch, or else you want to rewrite history for some other reason (e.g. fix commit messages or squash commits together). Anyway, I think this is ready to go -- thanks @jakirkham! |
ENH: Use `syrk` to compute certain dot products more quickly and accurately
Ah ok, I think in previous PRs that I have done to numpy I had been encouraged to rebase on master if I was out of date. I was assuming this was to keep the git graph from being a giant indecipherable mess. If that is not necessary, I will avoid doing rebases strictly to move commits onto master unless explicitly asked. Cool, thanks for your help in guiding me in this effort @njsmith. Also, thank you everyone for your suggestions and feedback. If anything else comes up with regards to this, feel free to to let me know. |
Benchmarks show the 2x improvement now (for |
Oh wow, that's super cool. I didn't realize you had an awesome benchmark viewer for NumPy. Also, it is good to see it checks out. Thanks for sharing @rgommers. |
Yes we should advertise that better. I guess @pv didn't advertise because it's on his personal site now. But it's pretty cool indeed. |
For those who plan to use this on Mac with OpenBLAS, something to be aware of ( OpenMathLib/OpenBLAS#730 ). This does not affect Linux. |
Turns out this actually optimizes |
Fixes #6794
Related: #6930
Deals with the special cases of
a.T @ a
anda @ a.T
usingsyrk
instead ofgemm
.