AArch64: Support unaligned inputs for top-level APIs #992

Merged
hanno-becker merged 1 commit into main from unaligned-inputs on Mar 9, 2026

@mkannwischer
Contributor

polyz_unpack_{17,19}_asm are the only AArch64 assembly routines that load from potentially unaligned user-provided buffers (the signature passed to verify). All other assembly (NTT, rej_uniform, etc.) operates on aligned internal buffers.

The ld1 loads with .4s/.2d arrangements and the ldr s/d loads used here require 4-/8-byte alignment on Device memory, which is how all data accesses are treated on bare-metal AArch64 with the MMU disabled. Replace them with ld1 loads using the .16b and .8b arrangements, whose byte-sized elements carry no alignment requirement.
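
As a rough C analogue of the same idea (not the assembly from this PR), the sketch below unpacks four 18-bit fields from 9 packed bytes using single-byte reads only, so the input pointer needs no particular alignment. The helper names `load24_le` and `unpack4x18` are hypothetical, and the 18-bit layout is an assumption about what the `17` variant handles. The mapping from raw fields to signed coefficients is left out.

```c
#include <stdint.h>

/* Hypothetical helper (not from the PR): assemble a little-endian 24-bit
 * value from three single-byte reads. Byte reads have no alignment
 * requirement, which mirrors loading with byte-sized vector elements. */
static uint32_t load24_le(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/* Unpack four raw 18-bit fields from 9 packed bytes (assumed layout:
 * fields stored back to back, least significant bits first). */
static void unpack4x18(uint32_t out[4], const uint8_t in[9])
{
  const uint32_t mask = 0x3FFFF; /* low 18 bits */

  out[0] = load24_le(in + 0) & mask;        /* bits  0..17 */
  out[1] = (load24_le(in + 2) >> 2) & mask; /* bits 18..35 */
  out[2] = (load24_le(in + 4) >> 4) & mask; /* bits 36..53 */
  out[3] = (load24_le(in + 6) >> 6) & mask; /* bits 54..71 */
}
```

In C the compiler still chooses the actual load instructions and may merge byte reads into wider accesses. On strict-alignment targets one would typically also build with an option such as `-mstrict-align` (available in GCC and Clang for AArch64).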

With this fixed, remove the MLD_TEST_NO_UNALIGNED workaround from the aarch64-virt baremetal platform so the unaligned-buffer functional test runs on baremetal as well.
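
The description names the unaligned-buffer functional test but does not show it. A minimal sketch of how such a check could look follows, with every name hypothetical: `consume_fn` stands in for an API that reads a caller-provided buffer, `sum_bytes` is a toy implementation of it, and `check_unaligned` replays the same input at byte offsets 0 to 7 and compares the results.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LEN 64  /* largest input this sketch supports */
#define MAX_SHIFT 8 /* number of byte offsets to try */

/* Hypothetical stand-in for an API that reads a caller-provided buffer. */
typedef uint32_t (*consume_fn)(const uint8_t *buf, size_t len);

/* Toy consumer used only to exercise the harness. */
static uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
  uint32_t acc = 0;
  size_t i;
  for (i = 0; i < len; i++)
  {
    acc += buf[i];
  }
  return acc;
}

/* Returns 0 if f gives the same result at every offset, -1 otherwise
 * (including when len exceeds MAX_LEN). */
static int check_unaligned(consume_fn f, const uint8_t *data, size_t len)
{
  /* The union forces 8-byte alignment of the backing storage, so that
   * shifts 1..7 really produce misaligned pointers. */
  union
  {
    uint64_t align;
    uint8_t bytes[MAX_LEN + MAX_SHIFT];
  } backing;
  uint32_t expected;
  size_t shift;

  if (len > MAX_LEN)
  {
    return -1;
  }
  expected = f(data, len);
  for (shift = 0; shift < MAX_SHIFT; shift++)
  {
    memcpy(backing.bytes + shift, data, len);
    if (f(backing.bytes + shift, len) != expected)
    {
      return -1;
    }
  }
  return 0;
}
```

On a platform where misaligned loads fault, a routine with an alignment bug would trap inside `f` rather than return a wrong value, which is presumably why the test had to be disabled on the baremetal platform before this fix.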

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113103 cycles | 113147 cycles | 1.00 |
| ML-DSA-44 sign | 355458 cycles | 355180 cycles | 1.00 |
| ML-DSA-44 verify | 117778 cycles | 117748 cycles | 1.00 |
| ML-DSA-65 keypair | 196374 cycles | 196464 cycles | 1.00 |
| ML-DSA-65 sign | 588499 cycles | 588331 cycles | 1.00 |
| ML-DSA-65 verify | 194525 cycles | 194432 cycles | 1.00 |
| ML-DSA-87 keypair | 322283 cycles | 322129 cycles | 1.00 |
| ML-DSA-87 sign | 751478 cycles | 752572 cycles | 1.00 |
| ML-DSA-87 verify | 319567 cycles | 319915 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 212369 cycles | 212671 cycles | 1.00 |
| ML-DSA-44 sign | 759604 cycles | 759391 cycles | 1.00 |
| ML-DSA-44 verify | 228684 cycles | 228993 cycles | 1.00 |
| ML-DSA-65 keypair | 379979 cycles | 380383 cycles | 1.00 |
| ML-DSA-65 sign | 1252009 cycles | 1251394 cycles | 1.00 |
| ML-DSA-65 verify | 371528 cycles | 372186 cycles | 1.00 |
| ML-DSA-87 keypair | 604426 cycles | 605169 cycles | 1.00 |
| ML-DSA-87 sign | 1593286 cycles | 1591399 cycles | 1.00 |
| ML-DSA-87 verify | 618286 cycles | 617312 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68994 cycles | 69132 cycles | 1.00 |
| ML-DSA-44 sign | 187502 cycles | 188226 cycles | 1.00 |
| ML-DSA-44 verify | 68839 cycles | 69219 cycles | 0.99 |
| ML-DSA-65 keypair | 119505 cycles | 119330 cycles | 1.00 |
| ML-DSA-65 sign | 300007 cycles | 300188 cycles | 1.00 |
| ML-DSA-65 verify | 115357 cycles | 115292 cycles | 1.00 |
| ML-DSA-87 keypair | 203395 cycles | 203812 cycles | 1.00 |
| ML-DSA-87 sign | 394047 cycles | 395329 cycles | 1.00 |
| ML-DSA-87 verify | 195377 cycles | 195762 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 56712 cycles | 56534 cycles | 1.00 |
| ML-DSA-44 sign | 181126 cycles | 181554 cycles | 1.00 |
| ML-DSA-44 verify | 60963 cycles | 61421 cycles | 0.99 |
| ML-DSA-65 keypair | 98739 cycles | 99035 cycles | 1.00 |
| ML-DSA-65 sign | 298327 cycles | 298572 cycles | 1.00 |
| ML-DSA-65 verify | 100362 cycles | 100520 cycles | 1.00 |
| ML-DSA-87 keypair | 152557 cycles | 152827 cycles | 1.00 |
| ML-DSA-87 sign | 354373 cycles | 355251 cycles | 1.00 |
| ML-DSA-87 verify | 153134 cycles | 153945 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton4

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68316 cycles | 68196 cycles | 1.00 |
| ML-DSA-44 sign | 201904 cycles | 201973 cycles | 1.00 |
| ML-DSA-44 verify | 70857 cycles | 70544 cycles | 1.00 |
| ML-DSA-65 keypair | 121055 cycles | 121006 cycles | 1.00 |
| ML-DSA-65 sign | 330827 cycles | 331382 cycles | 1.00 |
| ML-DSA-65 verify | 117616 cycles | 118067 cycles | 1.00 |
| ML-DSA-87 keypair | 198261 cycles | 198246 cycles | 1.00 |
| ML-DSA-87 sign | 426845 cycles | 426898 cycles | 1.00 |
| ML-DSA-87 verify | 194474 cycles | 194357 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 134547 cycles | 134725 cycles | 1.00 |
| ML-DSA-44 sign | 524933 cycles | 524539 cycles | 1.00 |
| ML-DSA-44 verify | 147503 cycles | 148068 cycles | 1.00 |
| ML-DSA-65 keypair | 226615 cycles | 227491 cycles | 1.00 |
| ML-DSA-65 sign | 860232 cycles | 864775 cycles | 0.99 |
| ML-DSA-65 verify | 234772 cycles | 235581 cycles | 1.00 |
| ML-DSA-87 keypair | 372609 cycles | 373600 cycles | 1.00 |
| ML-DSA-87 sign | 1082433 cycles | 1084589 cycles | 1.00 |
| ML-DSA-87 verify | 384569 cycles | 385427 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 41364 cycles | 40899 cycles | 1.01 |
| ML-DSA-44 sign | 132691 cycles | 133167 cycles | 1.00 |
| ML-DSA-44 verify | 43485 cycles | 43772 cycles | 0.99 |
| ML-DSA-65 keypair | 71976 cycles | 71918 cycles | 1.00 |
| ML-DSA-65 sign | 214079 cycles | 214333 cycles | 1.00 |
| ML-DSA-65 verify | 72701 cycles | 72735 cycles | 1.00 |
| ML-DSA-87 keypair | 107842 cycles | 108000 cycles | 1.00 |
| ML-DSA-87 sign | 251560 cycles | 251840 cycles | 1.00 |
| ML-DSA-87 verify | 111427 cycles | 110205 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 157459 cycles | 157349 cycles | 1.00 |
| ML-DSA-44 sign | 549434 cycles | 550739 cycles | 1.00 |
| ML-DSA-44 verify | 169228 cycles | 169371 cycles | 1.00 |
| ML-DSA-65 keypair | 267721 cycles | 268104 cycles | 1.00 |
| ML-DSA-65 sign | 903267 cycles | 903355 cycles | 1.00 |
| ML-DSA-65 verify | 274193 cycles | 274822 cycles | 1.00 |
| ML-DSA-87 keypair | 448386 cycles | 448166 cycles | 1.00 |
| ML-DSA-87 sign | 1158776 cycles | 1157655 cycles | 1.00 |
| ML-DSA-87 verify | 458169 cycles | 458080 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128521 cycles | 128200 cycles | 1.00 |
| ML-DSA-44 sign | 447738 cycles | 447698 cycles | 1.00 |
| ML-DSA-44 verify | 138449 cycles | 144650 cycles | 0.96 |
| ML-DSA-65 keypair | 221172 cycles | 220629 cycles | 1.00 |
| ML-DSA-65 sign | 727356 cycles | 727485 cycles | 1.00 |
| ML-DSA-65 verify | 223271 cycles | 223203 cycles | 1.00 |
| ML-DSA-87 keypair | 365594 cycles | 365009 cycles | 1.00 |
| ML-DSA-87 sign | 927658 cycles | 925883 cycles | 1.00 |
| ML-DSA-87 verify | 373613 cycles | 372797 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120234 cycles | 120079 cycles | 1.00 |
| ML-DSA-44 sign | 447266 cycles | 445606 cycles | 1.00 |
| ML-DSA-44 verify | 130859 cycles | 130275 cycles | 1.00 |
| ML-DSA-65 keypair | 204156 cycles | 204060 cycles | 1.00 |
| ML-DSA-65 sign | 729810 cycles | 727326 cycles | 1.00 |
| ML-DSA-65 verify | 210434 cycles | 209046 cycles | 1.01 |
| ML-DSA-87 keypair | 339526 cycles | 337824 cycles | 1.01 |
| ML-DSA-87 sign | 923483 cycles | 922255 cycles | 1.00 |
| ML-DSA-87 verify | 346563 cycles | 346248 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton3

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 72358 cycles | 72238 cycles | 1.00 |
| ML-DSA-44 sign | 212138 cycles | 211968 cycles | 1.00 |
| ML-DSA-44 verify | 75655 cycles | 75586 cycles | 1.00 |
| ML-DSA-65 keypair | 127554 cycles | 127524 cycles | 1.00 |
| ML-DSA-65 sign | 350157 cycles | 350022 cycles | 1.00 |
| ML-DSA-65 verify | 125339 cycles | 125427 cycles | 1.00 |
| ML-DSA-87 keypair | 205615 cycles | 208263 cycles | 0.99 |
| ML-DSA-87 sign | 443492 cycles | 448961 cycles | 0.99 |
| ML-DSA-87 verify | 205139 cycles | 205274 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138682 cycles | 138472 cycles | 1.00 |
| ML-DSA-44 sign | 484073 cycles | 483937 cycles | 1.00 |
| ML-DSA-44 verify | 148468 cycles | 162299 cycles | 0.91 |
| ML-DSA-65 keypair | 241326 cycles | 241394 cycles | 1.00 |
| ML-DSA-65 sign | 792952 cycles | 792376 cycles | 1.00 |
| ML-DSA-65 verify | 240749 cycles | 241234 cycles | 1.00 |
| ML-DSA-87 keypair | 395469 cycles | 396580 cycles | 1.00 |
| ML-DSA-87 sign | 1013175 cycles | 1012710 cycles | 1.00 |
| ML-DSA-87 verify | 402892 cycles | 402628 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton2

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113371 cycles | 113500 cycles | 1.00 |
| ML-DSA-44 sign | 355488 cycles | 355773 cycles | 1.00 |
| ML-DSA-44 verify | 118001 cycles | 118295 cycles | 1.00 |
| ML-DSA-65 keypair | 196603 cycles | 196483 cycles | 1.00 |
| ML-DSA-65 sign | 588574 cycles | 588268 cycles | 1.00 |
| ML-DSA-65 verify | 194607 cycles | 194737 cycles | 1.00 |
| ML-DSA-87 keypair | 322577 cycles | 323067 cycles | 1.00 |
| ML-DSA-87 sign | 753420 cycles | 753282 cycles | 1.00 |
| ML-DSA-87 verify | 320148 cycles | 320275 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 213057 cycles | 212962 cycles | 1.00 |
| ML-DSA-44 sign | 760326 cycles | 761109 cycles | 1.00 |
| ML-DSA-44 verify | 241281 cycles | 234656 cycles | 1.03 |
| ML-DSA-65 keypair | 380798 cycles | 380376 cycles | 1.00 |
| ML-DSA-65 sign | 1252337 cycles | 1253515 cycles | 1.00 |
| ML-DSA-65 verify | 372454 cycles | 371858 cycles | 1.00 |
| ML-DSA-87 keypair | 606382 cycles | 604334 cycles | 1.00 |
| ML-DSA-87 sign | 1593353 cycles | 1594506 cycles | 1.00 |
| ML-DSA-87 verify | 618144 cycles | 618425 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 34518 cycles | 35118 cycles | 0.98 |
| ML-DSA-44 sign | 119680 cycles | 120041 cycles | 1.00 |
| ML-DSA-44 verify | 38022 cycles | 38197 cycles | 1.00 |
| ML-DSA-65 keypair | 61187 cycles | 60705 cycles | 1.01 |
| ML-DSA-65 sign | 201763 cycles | 200568 cycles | 1.01 |
| ML-DSA-65 verify | 62728 cycles | 62638 cycles | 1.00 |
| ML-DSA-87 keypair | 94268 cycles | 93448 cycles | 1.01 |
| ML-DSA-87 sign | 238850 cycles | 237996 cycles | 1.00 |
| ML-DSA-87 verify | 96329 cycles | 95574 cycles | 1.01 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: 730be94 | Previous: 339e496 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 93618 cycles | 93421 cycles | 1.00 |
| ML-DSA-44 sign | 332433 cycles | 332489 cycles | 1.00 |
| ML-DSA-44 verify | 99590 cycles | 99620 cycles | 1.00 |
| ML-DSA-65 keypair | 159749 cycles | 159711 cycles | 1.00 |
| ML-DSA-65 sign | 544197 cycles | 544428 cycles | 1.00 |
| ML-DSA-65 verify | 160722 cycles | 160796 cycles | 1.00 |
| ML-DSA-87 keypair | 267030 cycles | 266842 cycles | 1.00 |
| ML-DSA-87 sign | 706066 cycles | 706051 cycles | 1.00 |
| ML-DSA-87 verify | 270154 cycles | 269870 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

polyz_unpack_{17,19}_asm are the only AArch64 assembly routines that
load from potentially unaligned user-provided buffers (the signature
passed to verify). All other assembly (NTT, rej_uniform, etc.) operates
on aligned internal buffers.

The ld1 with .4s/.2d element sizes and the ldr s/d instructions used
here require 4/8-byte alignment on Device memory (bare-metal AArch64
without MMU). Replace with .16b element sizes and ld1 {v.8b}, which
do not require alignment.

With this fixed, remove the MLD_TEST_NO_UNALIGNED workaround from the
aarch64-virt baremetal platform so the unaligned-buffer functional test
runs on baremetal as well.

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
@oqs-bot
Contributor

oqs-bot commented Mar 9, 2026

CBMC Results (ML-DSA-65)

Full Results (175 proofs)
Proof Status Current Previous Change
**TOTAL** 2328s 2385s -2.4%
polyvecl_pointwise_acc_montgomery_c 236s 258s -9%
mld_attempt_signature_generation 201s 202s -0%
sign_verify_internal 181s 181s +0%
polyvec_matrix_expand 149s 150s -1%
rej_uniform_native 147s 145s +1%
poly_pointwise_montgomery_c 142s 153s -7%
mld_invntt_layer 118s 124s -5%
mld_ct_memcmp 79s 86s -8%
polyvec_matrix_expand_serial 66s 70s -6%
sign_signature_internal 51s 52s -2%
keccak_squeezeblocks_x4 45s 42s +7%
mld_ntt_layer 43s 45s -4%
mld_compute_t0_t1_tr_from_sk_components 26s 26s +0%
fqmul 21s 20s +5%
polyveck_decompose 20s 18s +11%
rej_uniform 20s 21s -5%
rej_uniform_c 20s 20s +0%
polymat_permute_bitrev_to_custom 19s 18s +6%
poly_uniform_eta_4x 18s 17s +6%
poly_chknorm_c 16s 18s -11%
poly_uniform_4x 16s 17s -6%
polyt0_unpack 15s 19s -21%
polyvec_matrix_pointwise_montgomery 14s 13s +8%
polyveck_use_hint 14s 12s +17%
keccakf1600x4_permute_native 13s 15s -13%
mld_polyvecl_permute_bitrev_to_custom_native 12s 14s -14%
polyveck_invntt_tomont 12s 8s +50%
mld_check_pct 11s 13s -15%
mld_ntt_butterfly_block 11s 14s -21%
polyveck_add 10s 10s +0%
keccak_absorb_once_x4 9s 9s +0%
poly_caddq_c 9s 8s +12%
poly_invntt_tomont_c 9s 15s -40%
sign 9s 12s -25%
keccakf1600_permute 8s 7s +14%
poly_use_hint_c 8s 5s +60%
polyveck_ntt 8s 8s +0%
polyveck_reduce 8s 9s -11%
mld_sample_s1_s2_serial 7s 5s +40%
poly_decompose_c 7s 9s -22%
polyeta_unpack 7s 7s +0%
polyveck_caddq 7s 10s -30%
polyveck_make_hint 7s 7s +0%
polyveck_power2round 7s 12s -42%
polyveck_sub 7s 7s +0%
sign_pk_from_sk 7s 8s -12%
keccakf1600_permute_native 6s 8s -25%
mld_h 6s 5s +20%
mld_sample_s1_s2 6s 4s +50%
polyveck_shiftl 6s 9s -33%
polyvecl_ntt 6s 8s -25%
polyvecl_uniform_gamma1_serial 6s 5s +20%
rej_eta_c 6s 3s +100%
sign_signature 6s 6s +0%
sign_verify_pre_hash_internal 6s 6s +0%
sign_verify_pre_hash_shake256 6s 4s +50%
caddq 5s 2s +150%
keccak_absorb 5s 6s -17%
mld_compute_pack_z 5s 6s -17%
mld_ct_get_optblocker_i64 5s 3s +67%
montgomery_reduce 5s 1s +400%
pack_pk 5s 2s +150%
poly_caddq_native 5s 2s +150%
poly_decompose_native 5s 3s +67%
poly_power2round 5s 3s +67%
poly_shiftl 5s 3s +67%
poly_uniform 5s 4s +25%
polyveck_pack_t0 5s 2s +150%
polyveck_pointwise_poly_montgomery 5s 6s -17%
polyz_unpack_c 5s 6s -17%
shake256_squeeze 5s 2s +150%
sign_open 5s 5s +0%
sign_signature_extmu 5s 3s +67%
sign_signature_pre_hash_internal 5s 5s +0%
unpack_hints 5s 5s +0%
unpack_pk 5s 4s +25%
unpack_sig 5s 3s +67%
unpack_sk 5s 5s +0%
make_hint 4s 3s +33%
poly_add 4s 4s +0%
poly_caddq 4s 4s +0%
poly_caddq_native_aarch64 4s 2s +100%
poly_challenge 4s 5s -20%
poly_chknorm 4s 4s +0%
poly_ntt 4s 3s +33%
poly_ntt_c 4s 3s +33%
poly_ntt_native 4s 4s +0%
poly_pointwise_montgomery 4s 3s +33%
poly_reduce 4s 2s +100%
poly_sub 4s 3s +33%
poly_uniform_eta 4s 8s -50%
polyeta_pack 4s 3s +33%
polyt0_pack 4s 6s -33%
polyt1_pack 4s 1s +300%
polyveck_chknorm 4s 5s -20%
polyveck_unpack_eta 4s 3s +33%
polyvecl_chknorm 4s 5s -20%
polyvecl_pack_eta 4s 5s -20%
polyvecl_uniform_gamma1 4s 2s +100%
polyvecl_unpack_eta 4s 4s +0%
polyw1_pack 4s 2s +100%
polyz_unpack 4s 3s +33%
power2round 4s 5s -20%
rej_eta_native 4s 3s +33%
shake128_absorb 4s 3s +33%
shake128_squeeze 4s 2s +100%
sign_keypair 4s 3s +33%
sign_keypair_internal 4s 3s +33%
sign_signature_pre_hash_shake256 4s 4s +0%
decompose 3s 2s +50%
keccak_finalize 3s 2s +50%
keccak_init 3s 3s +0%
keccakf1600_xor_bytes (big endian) 3s 3s +0%
keccakf1600x4_permute 3s 2s +50%
keccakf1600x4_xor_bytes 3s 2s +50%
mld_ct_cmask_nonzero_u32 3s 5s -40%
mld_ct_cmask_nonzero_u8 3s 3s +0%
mld_ct_get_optblocker_u8 3s 1s +200%
mld_prepare_domain_separation_prefix 3s 2s +50%
mld_value_barrier_u8 3s 1s +200%
ntt_native_x86_64 3s 5s -40%
pack_sk 3s 2s +50%
poly_chknorm_native 3s 3s +0%
poly_invntt_tomont 3s 4s -25%
poly_uniform_gamma1 3s 3s +0%
poly_uniform_gamma1_4x 3s 7s -57%
poly_use_hint 3s 3s +0%
poly_use_hint_native 3s 3s +0%
polyt1_unpack 3s 2s +50%
polyveck_pack_eta 3s 3s +0%
polyveck_pack_w1 3s 4s -25%
polyveck_unpack_t0 3s 6s -50%
polyvecl_permute_bitrev_to_custom 3s 3s +0%
polyvecl_pointwise_acc_montgomery 3s 4s -25%
polyvecl_pointwise_acc_montgomery_native 3s 5s -40%
polyvecl_unpack_z 3s 3s +0%
polyz_pack 3s 4s -25%
polyz_unpack_native 3s 3s +0%
shake128_init 3s 2s +50%
shake128_release 3s 4s -25%
shake128x4_squeezeblocks 3s 2s +50%
shake256 3s 2s +50%
shake256_init 3s 3s +0%
shake256_release 3s 3s +0%
sign_verify 3s 4s -25%
sys_check_capability 3s 5s -40%
fqscale 2s 3s -33%
intt_native_x86_64 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 5s -60%
keccakf1600_xor_bytes 2s 2s +0%
keccakf1600x4_extract_bytes 2s 2s +0%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_get_optblocker_u32 2s 4s -50%
mld_ct_sel_int32 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_value_barrier_i64 2s 3s -33%
mld_value_barrier_u32 2s 2s +0%
pack_sig_c_h 2s 2s +0%
pack_sig_z 2s 2s +0%
poly_decompose 2s 5s -60%
poly_invntt_tomont_native 2s 3s -33%
poly_make_hint 2s 4s -50%
poly_pointwise_montgomery_native 2s 2s +0%
rej_eta 2s 2s +0%
shake128x4_absorb_once 2s 2s +0%
shake256_absorb 2s 1s +100%
shake256_finalize 2s 3s -33%
shake256x4_absorb_once 2s 1s +100%
sign_verify_extmu 2s 5s -60%
use_hint 2s 3s -33%
keccak_squeeze 1s 2s -50%
mld_ct_abs_i32 1s 2s -50%
reduce32 1s 2s -50%
shake128_finalize 1s 2s -50%
shake256x4_squeezeblocks 1s 4s -75%

@oqs-bot
Contributor

oqs-bot commented Mar 9, 2026

CBMC Results (ML-DSA-44)

Full Results (175 proofs)
Proof Status Current Previous Change
**TOTAL** 2191s 2160s +1.4%
sign_verify_internal 277s 268s +3%
mld_attempt_signature_generation 239s 238s +0%
polyvecl_pointwise_acc_montgomery_c 227s 221s +3%
rej_uniform_native 152s 149s +2%
poly_pointwise_montgomery_c 147s 152s -3%
mld_ct_memcmp 88s 86s +2%
mld_invntt_layer 56s 55s +2%
mld_ntt_layer 47s 48s -2%
sign_signature_internal 47s 49s -4%
poly_invntt_tomont_c 46s 42s +10%
keccak_squeezeblocks_x4 42s 43s -2%
rej_uniform 21s 23s -9%
poly_uniform_eta_4x 20s 16s +25%
fqmul 19s 19s +0%
rej_uniform_c 18s 20s -10%
poly_uniform_4x 17s 17s +0%
polyt0_unpack 17s 16s +6%
polyvec_matrix_expand 17s 15s +13%
polymat_permute_bitrev_to_custom 16s 16s +0%
keccakf1600x4_permute_native 14s 13s +8%
mld_compute_t0_t1_tr_from_sk_components 14s 15s -7%
mld_polyvecl_permute_bitrev_to_custom_native 14s 16s -12%
poly_chknorm_c 14s 15s -7%
mld_ntt_butterfly_block 13s 15s -13%
polyz_unpack_c 13s 12s +8%
polyeta_unpack 12s 14s -14%
keccak_absorb_once_x4 10s 12s -17%
keccakf1600_permute 9s 7s +29%
sign 9s 7s +29%
keccakf1600_permute_native 8s 8s +0%
mld_check_pct 8s 6s +33%
polyveck_add 8s 7s +14%
polyveck_pointwise_poly_montgomery 8s 6s +33%
keccak_absorb 7s 6s +17%
poly_caddq_c 7s 9s -22%
polyvec_matrix_expand_serial 7s 8s -12%
polyveck_caddq 7s 4s +75%
sign_pk_from_sk 7s 6s +17%
mld_compute_pack_z 6s 7s -14%
mld_h 6s 5s +20%
polyvec_matrix_pointwise_montgomery 6s 5s +20%
polyveck_decompose 6s 6s +0%
polyveck_power2round 6s 4s +50%
polyveck_sub 6s 7s -14%
polyvecl_chknorm 6s 5s +20%
rej_eta_c 6s 4s +50%
sign_keypair 6s 2s +200%
sign_verify_pre_hash_shake256 6s 5s +20%
mld_prepare_domain_separation_prefix 5s 2s +150%
mld_sample_s1_s2 5s 5s +0%
pack_sig_z 5s 2s +150%
poly_caddq 5s 2s +150%
poly_caddq_native 5s 5s +0%
poly_challenge 5s 5s +0%
poly_decompose_c 5s 7s -29%
poly_invntt_tomont 5s 3s +67%
poly_ntt_native 5s 5s +0%
poly_uniform 5s 5s +0%
polyt0_pack 5s 4s +25%
polyveck_make_hint 5s 3s +67%
polyveck_ntt 5s 6s -17%
polyveck_pack_eta 5s 2s +150%
polyveck_reduce 5s 5s +0%
polyveck_use_hint 5s 7s -29%
polyvecl_pack_eta 5s 4s +25%
rej_eta_native 5s 3s +67%
shake128_squeeze 5s 3s +67%
sign_verify 5s 3s +67%
unpack_hints 5s 4s +25%
caddq 4s 2s +100%
keccak_finalize 4s 2s +100%
keccakf1600_xor_bytes (big endian) 4s 3s +33%
mld_ct_cmask_neg_i32 4s 2s +100%
mld_ct_cmask_nonzero_u32 4s 4s +0%
mld_ct_cmask_nonzero_u8 4s 4s +0%
mld_ct_get_optblocker_i64 4s 1s +300%
mld_ct_get_optblocker_u32 4s 3s +33%
mld_sample_s1_s2_serial 4s 5s -20%
mld_value_barrier_u8 4s 2s +100%
poly_caddq_native_aarch64 4s 3s +33%
poly_chknorm 4s 4s +0%
poly_decompose_native 4s 2s +100%
poly_make_hint 4s 5s -20%
poly_pointwise_montgomery_native 4s 3s +33%
poly_sub 4s 3s +33%
polyeta_pack 4s 4s +0%
polyt1_pack 4s 6s -33%
polyveck_chknorm 4s 5s -20%
polyveck_invntt_tomont 4s 4s +0%
polyveck_unpack_t0 4s 2s +100%
polyvecl_ntt 4s 5s -20%
polyvecl_unpack_eta 4s 4s +0%
polyz_unpack_native 4s 3s +33%
shake256 4s 2s +100%
shake256_init 4s 4s +0%
shake256_squeeze 4s 2s +100%
shake256x4_squeezeblocks 4s 3s +33%
sign_open 4s 4s +0%
sign_signature 4s 4s +0%
sign_signature_extmu 4s 5s -20%
sign_verify_extmu 4s 5s -20%
unpack_pk 4s 3s +33%
unpack_sig 4s 3s +33%
unpack_sk 4s 4s +0%
use_hint 4s 2s +100%
decompose 3s 2s +50%
intt_native_x86_64 3s 5s -40%
keccak_init 3s 3s +0%
keccakf1600x4_extract_bytes 3s 3s +0%
keccakf1600x4_xor_bytes 3s 2s +50%
make_hint 3s 1s +200%
mld_ct_get_optblocker_u8 3s 2s +50%
mld_ct_sel_int32 3s 4s -25%
mld_keccakf1600_extract_bytes 3s 1s +200%
mld_value_barrier_i64 3s 2s +50%
mld_value_barrier_u32 3s 4s -25%
montgomery_reduce 3s 4s -25%
ntt_native_x86_64 3s 4s -25%
pack_pk 3s 3s +0%
pack_sig_c_h 3s 3s +0%
poly_decompose 3s 6s -50%
poly_power2round 3s 1s +200%
poly_reduce 3s 4s -25%
poly_shiftl 3s 3s +0%
poly_uniform_eta 3s 5s -40%
poly_uniform_gamma1 3s 4s -25%
poly_uniform_gamma1_4x 3s 4s -25%
poly_use_hint 3s 4s -25%
poly_use_hint_c 3s 5s -40%
polyt1_unpack 3s 3s +0%
polyveck_pack_t0 3s 4s -25%
polyveck_shiftl 3s 5s -40%
polyvecl_permute_bitrev_to_custom 3s 4s -25%
polyvecl_pointwise_acc_montgomery 3s 3s +0%
polyvecl_pointwise_acc_montgomery_native 3s 7s -57%
polyvecl_uniform_gamma1 3s 2s +50%
polyvecl_uniform_gamma1_serial 3s 3s +0%
polyz_pack 3s 2s +50%
rej_eta 3s 3s +0%
shake128_finalize 3s 2s +50%
shake128_init 3s 2s +50%
shake128_release 3s 3s +0%
shake256_absorb 3s 2s +50%
sign_keypair_internal 3s 6s -50%
sign_signature_pre_hash_internal 3s 5s -40%
sign_signature_pre_hash_shake256 3s 4s -25%
sys_check_capability 3s 2s +50%
fqscale 2s 2s +0%
keccak_squeeze 2s 4s -50%
keccakf1600_xor_bytes 2s 3s -33%
keccakf1600x4_permute 2s 2s +0%
mld_ct_abs_i32 2s 3s -33%
pack_sk 2s 2s +0%
poly_add 2s 3s -33%
poly_chknorm_native 2s 3s -33%
poly_invntt_tomont_native 2s 4s -50%
poly_ntt 2s 4s -50%
poly_ntt_c 2s 3s -33%
poly_pointwise_montgomery 2s 2s +0%
poly_use_hint_native 2s 3s -33%
polyveck_pack_w1 2s 6s -67%
polyveck_unpack_eta 2s 3s -33%
polyvecl_unpack_z 2s 4s -50%
polyz_unpack 2s 1s +100%
power2round 2s 2s +0%
reduce32 2s 2s +0%
shake128_absorb 2s 1s +100%
shake128x4_absorb_once 2s 3s -33%
shake128x4_squeezeblocks 2s 2s +0%
shake256_finalize 2s 3s -33%
shake256_release 2s 5s -60%
shake256x4_absorb_once 2s 3s -33%
sign_verify_pre_hash_internal 2s 3s -33%
keccakf1600_extract_bytes (big endian) 1s 1s +0%
polyw1_pack 1s 6s -83%

@mkannwischer mkannwischer marked this pull request as ready for review March 9, 2026 09:10
@mkannwischer mkannwischer requested a review from a team as a code owner March 9, 2026 09:10
@oqs-bot
Contributor

oqs-bot commented Mar 9, 2026

CBMC Results (ML-DSA-87)

Full Results (175 proofs)
Proof Status Current Previous Change
**TOTAL** 2434s 2459s -1.0%
sign_verify_internal 356s 358s -1%
mld_attempt_signature_generation 234s 230s +2%
polyvecl_pointwise_acc_montgomery_c 164s 174s -6%
polyvec_matrix_expand 155s 154s +1%
poly_pointwise_montgomery_c 139s 127s +9%
rej_uniform_native 137s 143s -4%
mld_invntt_layer 119s 117s +2%
polyvec_matrix_expand_serial 105s 108s -3%
mld_ct_memcmp 80s 78s +3%
sign_signature_internal 45s 46s -2%
keccak_squeezeblocks_x4 44s 42s +5%
mld_ntt_layer 43s 45s -4%
mld_compute_t0_t1_tr_from_sk_components 24s 27s -11%
polymat_permute_bitrev_to_custom 24s 24s +0%
rej_uniform 20s 22s -9%
fqmul 18s 23s -22%
poly_chknorm_c 18s 18s +0%
poly_uniform_eta_4x 17s 15s +13%
poly_uniform_4x 16s 15s +7%
rej_uniform_c 16s 15s +7%
polyt0_unpack 14s 15s -7%
polyvec_matrix_pointwise_montgomery 14s 12s +17%
polyeta_unpack 13s 14s -7%
keccakf1600x4_permute_native 12s 15s -20%
polyveck_power2round 12s 12s +0%
mld_ntt_butterfly_block 11s 12s -8%
poly_invntt_tomont_c 11s 11s +0%
keccak_absorb_once_x4 10s 9s +11%
keccakf1600_permute_native 10s 8s +25%
polyveck_reduce 10s 9s +11%
poly_decompose_c 9s 10s -10%
keccakf1600_permute 8s 7s +14%
mld_polyvecl_permute_bitrev_to_custom_native 8s 8s +0%
mld_sample_s1_s2_serial 8s 9s -11%
polyveck_decompose 8s 8s +0%
polyveck_use_hint 8s 9s -11%
sign 8s 8s +0%
mld_check_pct 7s 6s +17%
mld_sample_s1_s2 7s 5s +40%
poly_caddq_c 7s 6s +17%
polyveck_add 7s 5s +40%
polyveck_caddq 7s 6s +17%
polyveck_chknorm 7s 7s +0%
polyveck_invntt_tomont 7s 6s +17%
polyveck_make_hint 7s 4s +75%
polyveck_pointwise_poly_montgomery 7s 7s +0%
polyveck_shiftl 7s 7s +0%
polyveck_sub 7s 4s +75%
polyvecl_ntt 7s 8s -12%
sign_keypair_internal 7s 3s +133%
sign_pk_from_sk 7s 10s -30%
keccak_absorb 6s 6s +0%
keccak_squeeze 6s 4s +50%
mld_compute_pack_z 6s 5s +20%
sign_verify_pre_hash_internal 6s 2s +200%
poly_challenge 5s 3s +67%
poly_invntt_tomont_native 5s 3s +67%
poly_uniform 5s 5s +0%
poly_uniform_gamma1_4x 5s 6s -17%
poly_use_hint_c 5s 3s +67%
polyt0_pack 5s 4s +25%
polyveck_ntt 5s 6s -17%
polyvecl_chknorm 5s 5s +0%
polyvecl_uniform_gamma1 5s 4s +25%
sign_verify 5s 5s +0%
sign_verify_extmu 5s 4s +25%
unpack_hints 5s 4s +25%
caddq 4s 4s +0%
fqscale 4s 3s +33%
keccakf1600_xor_bytes 4s 1s +300%
keccakf1600_xor_bytes (big endian) 4s 3s +33%
mld_ct_cmask_nonzero_u8 4s 1s +300%
ntt_native_x86_64 4s 6s -33%
pack_pk 4s 4s +0%
pack_sig_c_h 4s 5s -20%
pack_sig_z 4s 2s +100%
pack_sk 4s 3s +33%
poly_caddq 4s 5s -20%
poly_make_hint 4s 4s +0%
poly_power2round 4s 4s +0%
poly_reduce 4s 4s +0%
poly_uniform_eta 4s 5s -20%
poly_uniform_gamma1 4s 4s +0%
poly_use_hint 4s 5s -20%
poly_use_hint_native 4s 2s +100%
polyveck_pack_t0 4s 3s +33%
polyveck_unpack_eta 4s 5s -20%
polyveck_unpack_t0 4s 4s +0%
polyvecl_uniform_gamma1_serial 4s 4s +0%
polyz_unpack_c 4s 4s +0%
rej_eta 4s 3s +33%
rej_eta_c 4s 4s +0%
rej_eta_native 4s 5s -20%
shake256_finalize 4s 4s +0%
sign_signature 4s 6s -33%
sys_check_capability 4s 3s +33%
decompose 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
keccakf1600x4_extract_bytes 3s 4s -25%
keccakf1600x4_permute 3s 3s +0%
mld_ct_cmask_nonzero_u32 3s 3s +0%
mld_ct_get_optblocker_i64 3s 3s +0%
mld_h 3s 5s -40%
mld_prepare_domain_separation_prefix 3s 4s -25%
mld_value_barrier_u32 3s 4s -25%
poly_caddq_native 3s 5s -40%
poly_chknorm 3s 2s +50%
poly_decompose 3s 4s -25%
poly_ntt_c 3s 2s +50%
poly_pointwise_montgomery_native 3s 4s -25%
poly_shiftl 3s 1s +200%
polyt1_pack 3s 3s +0%
polyveck_pack_eta 3s 2s +50%
polyvecl_permute_bitrev_to_custom 3s 1s +200%
polyvecl_pointwise_acc_montgomery 3s 6s -50%
polyvecl_pointwise_acc_montgomery_native 3s 2s +50%
polyvecl_unpack_eta 3s 4s -25%
polyw1_pack 3s 3s +0%
polyz_pack 3s 3s +0%
polyz_unpack 3s 2s +50%
power2round 3s 3s +0%
shake128_absorb 3s 3s +0%
shake128_finalize 3s 3s +0%
shake128x4_squeezeblocks 3s 2s +50%
shake256_absorb 3s 1s +200%
shake256_init 3s 2s +50%
sign_keypair 3s 4s -25%
sign_open 3s 5s -40%
sign_signature_extmu 3s 2s +50%
sign_signature_pre_hash_shake256 3s 4s -25%
sign_verify_pre_hash_shake256 3s 4s -25%
unpack_sig 3s 2s +50%
unpack_sk 3s 4s -25%
intt_native_x86_64 2s 2s +0%
keccak_finalize 2s 2s +0%
keccak_init 2s 1s +100%
keccakf1600x4_xor_bytes 2s 1s +100%
make_hint 2s 2s +0%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_neg_i32 2s 3s -33%
mld_ct_get_optblocker_u32 2s 3s -33%
mld_ct_get_optblocker_u8 2s 3s -33%
mld_ct_sel_int32 2s 3s -33%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_value_barrier_i64 2s 4s -50%
mld_value_barrier_u8 2s 3s -33%
montgomery_reduce 2s 3s -33%
poly_add 2s 5s -60%
poly_caddq_native_aarch64 2s 3s -33%
poly_invntt_tomont 2s 3s -33%
poly_ntt_native 2s 3s -33%
poly_pointwise_montgomery 2s 5s -60%
polyeta_pack 2s 3s -33%
polyveck_pack_w1 2s 3s -33%
polyvecl_pack_eta 2s 2s +0%
polyvecl_unpack_z 2s 3s -33%
polyz_unpack_native 2s 4s -50%
reduce32 2s 3s -33%
shake128_init 2s 2s +0%
shake128_release 2s 4s -50%
shake128_squeeze 2s 2s +0%
shake256 2s 3s -33%
shake256_release 2s 5s -60%
shake256_squeeze 2s 3s -33%
shake256x4_squeezeblocks 2s 3s -33%
sign_signature_pre_hash_internal 2s 2s +0%
unpack_pk 2s 3s -33%
use_hint 2s 3s -33%
poly_chknorm_native 1s 3s -67%
poly_decompose_native 1s 2s -50%
poly_ntt 1s 4s -75%
poly_sub 1s 4s -75%
polyt1_unpack 1s 3s -67%
shake128x4_absorb_once 1s 3s -67%
shake256x4_absorb_once 1s 4s -75%

Contributor

@hanno-becker hanno-becker left a comment

I agree with this change. I checked that the API documentation did not mention the AArch64 alignment requirements, so we don't need any change there.

@hanno-becker hanno-becker merged commit 521f3bb into main Mar 9, 2026
371 checks passed
@hanno-becker hanno-becker deleted the unaligned-inputs branch March 9, 2026 22:40