AArch64: Support unaligned inputs for top-level APIs #992
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113103 cycles | 113147 cycles | 1.00 |
| ML-DSA-44 sign | 355458 cycles | 355180 cycles | 1.00 |
| ML-DSA-44 verify | 117778 cycles | 117748 cycles | 1.00 |
| ML-DSA-65 keypair | 196374 cycles | 196464 cycles | 1.00 |
| ML-DSA-65 sign | 588499 cycles | 588331 cycles | 1.00 |
| ML-DSA-65 verify | 194525 cycles | 194432 cycles | 1.00 |
| ML-DSA-87 keypair | 322283 cycles | 322129 cycles | 1.00 |
| ML-DSA-87 sign | 751478 cycles | 752572 cycles | 1.00 |
| ML-DSA-87 verify | 319567 cycles | 319915 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 212369 cycles | 212671 cycles | 1.00 |
| ML-DSA-44 sign | 759604 cycles | 759391 cycles | 1.00 |
| ML-DSA-44 verify | 228684 cycles | 228993 cycles | 1.00 |
| ML-DSA-65 keypair | 379979 cycles | 380383 cycles | 1.00 |
| ML-DSA-65 sign | 1252009 cycles | 1251394 cycles | 1.00 |
| ML-DSA-65 verify | 371528 cycles | 372186 cycles | 1.00 |
| ML-DSA-87 keypair | 604426 cycles | 605169 cycles | 1.00 |
| ML-DSA-87 sign | 1593286 cycles | 1591399 cycles | 1.00 |
| ML-DSA-87 verify | 618286 cycles | 617312 cycles | 1.00 |
oqs-bot
left a comment
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68994 cycles | 69132 cycles | 1.00 |
| ML-DSA-44 sign | 187502 cycles | 188226 cycles | 1.00 |
| ML-DSA-44 verify | 68839 cycles | 69219 cycles | 0.99 |
| ML-DSA-65 keypair | 119505 cycles | 119330 cycles | 1.00 |
| ML-DSA-65 sign | 300007 cycles | 300188 cycles | 1.00 |
| ML-DSA-65 verify | 115357 cycles | 115292 cycles | 1.00 |
| ML-DSA-87 keypair | 203395 cycles | 203812 cycles | 1.00 |
| ML-DSA-87 sign | 394047 cycles | 395329 cycles | 1.00 |
| ML-DSA-87 verify | 195377 cycles | 195762 cycles | 1.00 |
oqs-bot
left a comment
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 56712 cycles | 56534 cycles | 1.00 |
| ML-DSA-44 sign | 181126 cycles | 181554 cycles | 1.00 |
| ML-DSA-44 verify | 60963 cycles | 61421 cycles | 0.99 |
| ML-DSA-65 keypair | 98739 cycles | 99035 cycles | 1.00 |
| ML-DSA-65 sign | 298327 cycles | 298572 cycles | 1.00 |
| ML-DSA-65 verify | 100362 cycles | 100520 cycles | 1.00 |
| ML-DSA-87 keypair | 152557 cycles | 152827 cycles | 1.00 |
| ML-DSA-87 sign | 354373 cycles | 355251 cycles | 1.00 |
| ML-DSA-87 verify | 153134 cycles | 153945 cycles | 0.99 |
oqs-bot
left a comment
Graviton4
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68316 cycles | 68196 cycles | 1.00 |
| ML-DSA-44 sign | 201904 cycles | 201973 cycles | 1.00 |
| ML-DSA-44 verify | 70857 cycles | 70544 cycles | 1.00 |
| ML-DSA-65 keypair | 121055 cycles | 121006 cycles | 1.00 |
| ML-DSA-65 sign | 330827 cycles | 331382 cycles | 1.00 |
| ML-DSA-65 verify | 117616 cycles | 118067 cycles | 1.00 |
| ML-DSA-87 keypair | 198261 cycles | 198246 cycles | 1.00 |
| ML-DSA-87 sign | 426845 cycles | 426898 cycles | 1.00 |
| ML-DSA-87 verify | 194474 cycles | 194357 cycles | 1.00 |
oqs-bot
left a comment
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 134547 cycles | 134725 cycles | 1.00 |
| ML-DSA-44 sign | 524933 cycles | 524539 cycles | 1.00 |
| ML-DSA-44 verify | 147503 cycles | 148068 cycles | 1.00 |
| ML-DSA-65 keypair | 226615 cycles | 227491 cycles | 1.00 |
| ML-DSA-65 sign | 860232 cycles | 864775 cycles | 0.99 |
| ML-DSA-65 verify | 234772 cycles | 235581 cycles | 1.00 |
| ML-DSA-87 keypair | 372609 cycles | 373600 cycles | 1.00 |
| ML-DSA-87 sign | 1082433 cycles | 1084589 cycles | 1.00 |
| ML-DSA-87 verify | 384569 cycles | 385427 cycles | 1.00 |
oqs-bot
left a comment
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 41364 cycles | 40899 cycles | 1.01 |
| ML-DSA-44 sign | 132691 cycles | 133167 cycles | 1.00 |
| ML-DSA-44 verify | 43485 cycles | 43772 cycles | 0.99 |
| ML-DSA-65 keypair | 71976 cycles | 71918 cycles | 1.00 |
| ML-DSA-65 sign | 214079 cycles | 214333 cycles | 1.00 |
| ML-DSA-65 verify | 72701 cycles | 72735 cycles | 1.00 |
| ML-DSA-87 keypair | 107842 cycles | 108000 cycles | 1.00 |
| ML-DSA-87 sign | 251560 cycles | 251840 cycles | 1.00 |
| ML-DSA-87 verify | 111427 cycles | 110205 cycles | 1.01 |
oqs-bot
left a comment
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 157459 cycles | 157349 cycles | 1.00 |
| ML-DSA-44 sign | 549434 cycles | 550739 cycles | 1.00 |
| ML-DSA-44 verify | 169228 cycles | 169371 cycles | 1.00 |
| ML-DSA-65 keypair | 267721 cycles | 268104 cycles | 1.00 |
| ML-DSA-65 sign | 903267 cycles | 903355 cycles | 1.00 |
| ML-DSA-65 verify | 274193 cycles | 274822 cycles | 1.00 |
| ML-DSA-87 keypair | 448386 cycles | 448166 cycles | 1.00 |
| ML-DSA-87 sign | 1158776 cycles | 1157655 cycles | 1.00 |
| ML-DSA-87 verify | 458169 cycles | 458080 cycles | 1.00 |
oqs-bot
left a comment
Graviton4 (no-opt)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128521 cycles | 128200 cycles | 1.00 |
| ML-DSA-44 sign | 447738 cycles | 447698 cycles | 1.00 |
| ML-DSA-44 verify | 138449 cycles | 144650 cycles | 0.96 |
| ML-DSA-65 keypair | 221172 cycles | 220629 cycles | 1.00 |
| ML-DSA-65 sign | 727356 cycles | 727485 cycles | 1.00 |
| ML-DSA-65 verify | 223271 cycles | 223203 cycles | 1.00 |
| ML-DSA-87 keypair | 365594 cycles | 365009 cycles | 1.00 |
| ML-DSA-87 sign | 927658 cycles | 925883 cycles | 1.00 |
| ML-DSA-87 verify | 373613 cycles | 372797 cycles | 1.00 |
oqs-bot
left a comment
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120234 cycles | 120079 cycles | 1.00 |
| ML-DSA-44 sign | 447266 cycles | 445606 cycles | 1.00 |
| ML-DSA-44 verify | 130859 cycles | 130275 cycles | 1.00 |
| ML-DSA-65 keypair | 204156 cycles | 204060 cycles | 1.00 |
| ML-DSA-65 sign | 729810 cycles | 727326 cycles | 1.00 |
| ML-DSA-65 verify | 210434 cycles | 209046 cycles | 1.01 |
| ML-DSA-87 keypair | 339526 cycles | 337824 cycles | 1.01 |
| ML-DSA-87 sign | 923483 cycles | 922255 cycles | 1.00 |
| ML-DSA-87 verify | 346563 cycles | 346248 cycles | 1.00 |
oqs-bot
left a comment
Graviton3
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 72358 cycles | 72238 cycles | 1.00 |
| ML-DSA-44 sign | 212138 cycles | 211968 cycles | 1.00 |
| ML-DSA-44 verify | 75655 cycles | 75586 cycles | 1.00 |
| ML-DSA-65 keypair | 127554 cycles | 127524 cycles | 1.00 |
| ML-DSA-65 sign | 350157 cycles | 350022 cycles | 1.00 |
| ML-DSA-65 verify | 125339 cycles | 125427 cycles | 1.00 |
| ML-DSA-87 keypair | 205615 cycles | 208263 cycles | 0.99 |
| ML-DSA-87 sign | 443492 cycles | 448961 cycles | 0.99 |
| ML-DSA-87 verify | 205139 cycles | 205274 cycles | 1.00 |
oqs-bot
left a comment
Graviton3 (no-opt)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138682 cycles | 138472 cycles | 1.00 |
| ML-DSA-44 sign | 484073 cycles | 483937 cycles | 1.00 |
| ML-DSA-44 verify | 148468 cycles | 162299 cycles | 0.91 |
| ML-DSA-65 keypair | 241326 cycles | 241394 cycles | 1.00 |
| ML-DSA-65 sign | 792952 cycles | 792376 cycles | 1.00 |
| ML-DSA-65 verify | 240749 cycles | 241234 cycles | 1.00 |
| ML-DSA-87 keypair | 395469 cycles | 396580 cycles | 1.00 |
| ML-DSA-87 sign | 1013175 cycles | 1012710 cycles | 1.00 |
| ML-DSA-87 verify | 402892 cycles | 402628 cycles | 1.00 |
oqs-bot
left a comment
Graviton2
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113371 cycles | 113500 cycles | 1.00 |
| ML-DSA-44 sign | 355488 cycles | 355773 cycles | 1.00 |
| ML-DSA-44 verify | 118001 cycles | 118295 cycles | 1.00 |
| ML-DSA-65 keypair | 196603 cycles | 196483 cycles | 1.00 |
| ML-DSA-65 sign | 588574 cycles | 588268 cycles | 1.00 |
| ML-DSA-65 verify | 194607 cycles | 194737 cycles | 1.00 |
| ML-DSA-87 keypair | 322577 cycles | 323067 cycles | 1.00 |
| ML-DSA-87 sign | 753420 cycles | 753282 cycles | 1.00 |
| ML-DSA-87 verify | 320148 cycles | 320275 cycles | 1.00 |
oqs-bot
left a comment
Graviton2 (no-opt)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 213057 cycles | 212962 cycles | 1.00 |
| ML-DSA-44 sign | 760326 cycles | 761109 cycles | 1.00 |
| ML-DSA-44 verify | 241281 cycles | 234656 cycles | 1.03 |
| ML-DSA-65 keypair | 380798 cycles | 380376 cycles | 1.00 |
| ML-DSA-65 sign | 1252337 cycles | 1253515 cycles | 1.00 |
| ML-DSA-65 verify | 372454 cycles | 371858 cycles | 1.00 |
| ML-DSA-87 keypair | 606382 cycles | 604334 cycles | 1.00 |
| ML-DSA-87 sign | 1593353 cycles | 1594506 cycles | 1.00 |
| ML-DSA-87 verify | 618144 cycles | 618425 cycles | 1.00 |
oqs-bot
left a comment
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 34518 cycles | 35118 cycles | 0.98 |
| ML-DSA-44 sign | 119680 cycles | 120041 cycles | 1.00 |
| ML-DSA-44 verify | 38022 cycles | 38197 cycles | 1.00 |
| ML-DSA-65 keypair | 61187 cycles | 60705 cycles | 1.01 |
| ML-DSA-65 sign | 201763 cycles | 200568 cycles | 1.01 |
| ML-DSA-65 verify | 62728 cycles | 62638 cycles | 1.00 |
| ML-DSA-87 keypair | 94268 cycles | 93448 cycles | 1.01 |
| ML-DSA-87 sign | 238850 cycles | 237996 cycles | 1.00 |
| ML-DSA-87 verify | 96329 cycles | 95574 cycles | 1.01 |
oqs-bot
left a comment
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 93618 cycles | 93421 cycles | 1.00 |
| ML-DSA-44 sign | 332433 cycles | 332489 cycles | 1.00 |
| ML-DSA-44 verify | 99590 cycles | 99620 cycles | 1.00 |
| ML-DSA-65 keypair | 159749 cycles | 159711 cycles | 1.00 |
| ML-DSA-65 sign | 544197 cycles | 544428 cycles | 1.00 |
| ML-DSA-65 verify | 160722 cycles | 160796 cycles | 1.00 |
| ML-DSA-87 keypair | 267030 cycles | 266842 cycles | 1.00 |
| ML-DSA-87 sign | 706066 cycles | 706051 cycles | 1.00 |
| ML-DSA-87 verify | 270154 cycles | 269870 cycles | 1.00 |
polyz_unpack_{17,19}_asm are the only AArch64 assembly routines that
load from potentially unaligned user-provided buffers (the signature
passed to verify). All other assembly (NTT, rej_uniform, etc.) operates
on aligned internal buffers.
The ld1 with .4s/.2d element sizes and the ldr s/d instructions used
here require 4/8-byte alignment on Device memory (bare-metal AArch64
without MMU). Replace with .16b element sizes and ld1 {v.8b}, which
do not require alignment.
With this fixed, remove the MLD_TEST_NO_UNALIGNED workaround from the
aarch64-virt baremetal platform so the unaligned-buffer functional test
runs on baremetal as well.
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
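As a rough illustration of the idea in the commit message above (reading a user-provided buffer at byte granularity so that no wider alignment is assumed), here is a minimal C sketch. It is not code from this PR: `load32_le` and `load_words_le` are hypothetical helper names, and the actual fix lives in the AArch64 assembly for polyz_unpack_{17,19}_asm. Whether a compiler keeps these as byte-wide accesses depends on the target and flags (for example `-mstrict-align` with GCC/Clang on AArch64), so treat this as an analogy rather than a guarantee for Device memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the PR): read one little-endian 32-bit word
 * from a byte pointer of arbitrary alignment. The source expresses only
 * byte-wide reads, so no 4-byte alignment of p is assumed. */
static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper (not from the PR): unpack n little-endian words from a
 * possibly unaligned input buffer into an aligned output array. */
static void load_words_le(uint32_t *out, const uint8_t *in, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
    {
        out[i] = load32_le(in + 4 * i);
    }
}
```

For example, calling `load_words_le(out, sig + 1, n)` on a buffer offset by one byte yields the same words as running it on an aligned copy of the same bytes, which is the property the unaligned-buffer functional test exercises at the API level.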
730be94 to 18abe46
CBMC Results (ML-DSA-65): Full Results (175 proofs)
CBMC Results (ML-DSA-44): Full Results (175 proofs)
CBMC Results (ML-DSA-87): Full Results (175 proofs)
hanno-becker
left a comment
I agree with this change. I checked that the API documentation did not mention the AArch64 alignment requirements, so we don't need any change there.