Skip to content

Complex dot() #1106

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 12, 2021
Merged

Complex dot() #1106

merged 6 commits into from
Nov 12, 2021

Conversation

emmatyping
Copy link
Contributor

@emmatyping emmatyping commented Nov 6, 2021

This uses BLAS zgemm/cgemm to do complex matrix matrix multiplication.

I know there probably should be many more tests (the ones I added were more just a brief correctness check if nothing else) but I thought I'd open this to get feedback on whether or not something like this approach seems reasonable.

I'm mainly focusing on matrix matrix products as that is what I am using, but I can follow up with matrix vector and vector vector product implementation if there is interest.

Fixes #158

Copy link
Member

@bluss bluss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. I guess we need to document somewhere that what gemm operations we support with matrixmultiply vs blas

@jturner314
Copy link
Member

jturner314 commented Nov 6, 2021

Fwiw, in the absence of BLAS, falling back to four matrix multiplications with matrixmultiply and a couple of additions ((A + B i) (C + D i) = (AC - BD) + (AD + BC) i, where A, B, C, and D are real) may be faster than falling back to mat_mul_general.

@emmatyping
Copy link
Contributor Author

Good point! I can add that as well if you want.

@jturner314
Copy link
Member

If you decide to do so, PR #1029 would make it easier and is useful functionality on its own anyway. Feel free to take over work on that if you'd like; I'm very busy right now.

@emmatyping
Copy link
Contributor Author

I actually could use that functionality too (I implemented it myself for my own uses, but want to upstream things...) so I will go ahead and take over that PR and tackle it first!

#[cfg(feature = "blas")]
use crate::dimension::offset_from_low_addr_ptr_to_logical_ptr;
use crate::imp_prelude::*;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(We don't need to change anything, just talking). It's just weird that tools would move down the implementation prelude import. They are wrong, that import should clearly be first. (Probably accomplished by having it with a blank line after).

Copy link
Collaborator

@adamreichold adamreichold Nov 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I have always liked that Rustfmt sorts use statements. Whether sorting them alphabetically is good or a bad is a reasonable discussion to have, but as usual with aesthetics, just the fact that the decision is automatic seems to bring almost all of the benefit, i.e. no more tedious discussions on how to sort here or there or writing up rules as a company-wide guideline.

Additionally, I find the ecosystem-wide readability resulting from the common application of Rustfmt's defaults highly beneficial to upstream collaboration.

(Especially since the workaround using a blank line really seems like a good idea if the import actually is special among the others next to it.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree, it is a bit strange. Perhaps so other imports don't silently clobber items brought in from the prelude?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good @adamreichold. You'll notice that I'm not yet very fond of rustfmt, unfortunately. I'd be happier if it could accept "multiple good ways" to format the same code.

@bluss
Copy link
Member

bluss commented Nov 8, 2021

I think I want to add fallback implementations of complex gemm into matrixmultiply. That might get a ball rolling and should perform ok-ish to start with.

Reusing dgemm seems like good way to go too, but in practice I think it has a problem with the general matrix multiply and it's alpha muliplier? What's to be done if imag(alpha) != 0? Then it needs extra steps after calling dgemm, if I'm not mistaken.

For some reason I came across this, that some libraries reuse their dgemm by using a multiplication reduction algorithm. That's interesting but seems like quite the project. (paper link). Just using it naively has massive memory overhead.

bild

@bluss
Copy link
Member

bluss commented Nov 8, 2021

This PR would be fine without addressing matrixmultply or fallback much at all, of course. Just saying that anything extra can be a separate PR.

@emmatyping emmatyping marked this pull request as ready for review November 9, 2021 04:56
@emmatyping emmatyping changed the title [WIP] Complex dot() Complex dot() Nov 9, 2021
@emmatyping
Copy link
Contributor Author

Yeah I think I will leave the followup to another PR. I've fixed the small mutability issue so I think this is ready for another round of review.

Copy link
Member

@bluss bluss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, only minor comments

@bluss
Copy link
Member

bluss commented Nov 10, 2021

Matrixmult complex wip is at bluss/matrixmultiply#58 performance for f32 + FMA with auto vectorization is not too bad (maybe it reaches 1/2-1/3 of the potential :). Matrixmultiply as a crate doesn't need to be so important.

@bluss
Copy link
Member

bluss commented Nov 11, 2021

I edited the description to say it's not WIP. I think it's ready. I'd usually say we need to edit the PR description before merge, but it seems like github has gotten to the point where it doesn't even include it in the merge in git, so it's not so important anymore (did github change here?)

(I pushed to fix merge conflicts and do very small edits)

@emmatyping
Copy link
Contributor Author

Matrixmult complex wip is at bluss/matrixmultiply#58 performance for f32 + FMA with auto vectorization is not too bad (maybe it reaches 1/2-1/3 of the potential :). Matrixmultiply as a crate doesn't need to be so important.

Great! I can try to take a look at that a bit later.

I edited the description to say it's not WIP. I think it's ready. I'd usually say we need to edit the PR description before merge, but it seems like github has gotten to the point where it doesn't even include it in the merge in git, so it's not so important anymore (did github change here?)

I think(?) the message is only added if there is a single commit or depending on the kind of merge you do maybe.

@bluss
Copy link
Member

bluss commented Nov 12, 2021

Thanks for finally closing this one :)

I think the use dgemm-for-zgemm idea still can give good benefits compared with matrixmultiply (in its current state). Handling the general case with complex alpha, beta is a bit tricky, I don't know what the best performing solution is. But alpha != 1 or beta != 0 is the uncommon case.

@bluss bluss added this to the 0.15.4 milestone Nov 12, 2021
@bluss bluss merged commit 0172657 into rust-ndarray:master Nov 12, 2021
@bluss bluss added the blas label Nov 12, 2021
@emmatyping emmatyping deleted the complexmatmul branch November 12, 2021 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use BLAS for complex .mat_mul too
4 participants