Add 4FMA and 4VNNI instructions #40

Closed

rsdubtso
Contributor

The two commits below:

  1. Add new instructions as described in https://software.intel.com/sites/default/files/managed/26/40/319433-026.pdf

  2. Fix a typo in leaf 7 CPUID detection
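
For context, the feature bits for these extensions are reported by CPUID leaf 7, sub-leaf 0, in EDX: bit 2 for 4VNNI (named AVX512_4VNNIW in Intel's reference) and bit 3 for AVX512_4FMAPS. A minimal sketch of decoding them is shown here. The constant and function names are hypothetical and are not xbyak's API, and the caller is assumed to have already obtained the EDX value from the CPUID instruction.

```cpp
#include <cstdint>

// Feature flags reported in EDX by CPUID leaf 7, sub-leaf 0 (EAX=7, ECX=0).
constexpr std::uint32_t kEdxAvx512_4VNNIW = 1u << 2;  // EDX bit 2
constexpr std::uint32_t kEdxAvx512_4FMAPS = 1u << 3;  // EDX bit 3

// Hypothetical helpers (illustration only, not part of xbyak):
// decode an EDX value that the caller read from CPUID leaf 7.
constexpr bool hasAvx512_4VNNIW(std::uint32_t edx)
{
    return (edx & kEdxAvx512_4VNNIW) != 0;
}

constexpr bool hasAvx512_4FMAPS(std::uint32_t edx)
{
    return (edx & kEdxAvx512_4FMAPS) != 0;
}
```

Keeping the decoding separate from the CPUID call itself makes the bit positions easy to check without running on hardware that supports the extensions.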

@herumi
Owner

herumi commented Nov 26, 2016

Thank you for fixing the CPUID bug and for the AVX512_4FMAPS patch.
But I have a question. I do not know how to handle the new Op/En T1_4X for compressed displacement (disp8*N) in instructions such as v4fmaddps zmm1, zmm2, [rax + 64] and v4fmaddss xmm1, xmm2, [rax + 64]. Does 45906ae generate the correct encoding for them?
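
For reference, the disp8*N rule behind this question can be sketched as follows. Under EVEX encoding, a displacement is stored in a single byte only if it is a multiple of N and the quotient fits in a signed 8-bit value; otherwise a full 32-bit displacement is used. The helper below is hypothetical and is not xbyak's implementation. It also assumes N = 16 for T1_4X (four 32-bit elements), which should be checked against Intel's reference.

```cpp
#include <cstdint>
#include <optional>

// Assumption: the T1_4X tuple covers four 32-bit elements, so N = 16 bytes.
constexpr std::int32_t kT1_4X_N = 16;

// Hypothetical helper (illustration only, not part of xbyak).
// Returns the scaled 8-bit displacement when `disp` can be stored as disp8*N,
// or std::nullopt when the encoder has to fall back to a 32-bit displacement.
std::optional<std::int8_t> compressDisp8(std::int32_t disp, std::int32_t n)
{
    // The displacement must be an exact multiple of N.
    if (n <= 0 || disp % n != 0) return std::nullopt;
    const std::int32_t scaled = disp / n;
    // The quotient must fit in a signed byte.
    if (scaled < -128 || scaled > 127) return std::nullopt;
    return static_cast<std::int8_t>(scaled);
}
```

Under that assumption, the [rax + 64] operand in the examples above would be stored as a disp8 value of 4 (64 / 16), while a displacement of 8 is not a multiple of 16 and would need the 32-bit form.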

@rsdubtso
Contributor Author

I've checked the encoding using an internal version of the XED assembler/disassembler (part of the Intel Software Development Emulator); unfortunately, the latest externally available version does not yet support 4FMAPS and 4VNNI.

XED recognizes the compressed displacement encoding in the examples above as expected.

As a precaution, I suggest keeping this PR open until Intel releases a public SDE version that supports the new instructions. I can post here once that happens, and then you will be able to verify that the encoding is indeed correct.

I also want to thank you for an awesome tool. We are using it for MKL-DNN.

@herumi
Owner

herumi commented Nov 27, 2016

I looked at https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=47acf0bd9faef8634d242e19ec3b7f784d10ba76 and verified that xbyak generates the expected encoding, so I merged it into master. Thank you for your feedback.

@herumi
Owner

herumi commented Nov 27, 2016

I renamed AVX512_4VNNI to AVX512_4VNNIW, following the Intel Architecture Instruction Set Extensions Programming Reference.

@rsdubtso
Contributor Author

Thanks. Yes, 4VNNIW is the correct name.

@herumi herumi closed this Nov 27, 2016