b9670

github-actions released this 16 Jun 13:10

02810c7

Fix and restrict NVFP4 edge-cases in llama-graph (#24331)

Move the post-GEMM MUL required for dequantization before the LoRA and bias add

See #23484:

For LoRA, I would presume we want fully dequantized values before
applying the residuals, but this depends on how the LoRAs were
generated. The literature indicates that LoRA is applied post-MUL but before the bias add (#8332).
For ModelOPT, the bias add should happen on fully dequantized
values.
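
The ordering described above can be illustrated with a minimal sketch. This is not the llama-graph code: the function names, the single scalar `scale`, and the nested-list matrices are all hypothetical, chosen only to show why the position of the post-GEMM MUL matters once a LoRA delta and a bias are added.

```python
def matvec(m, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def linear_scale_first(x, w_q, scale, lora_a, lora_b, bias):
    """Hypothetical ordering: GEMM -> MUL (dequant) -> LoRA -> bias."""
    y = matvec(w_q, x)                         # GEMM on not-yet-rescaled weights
    y = [scale * v for v in y]                 # post-GEMM MUL completes dequantization
    delta = matvec(lora_b, matvec(lora_a, x))  # LoRA delta B(Ax), full precision
    y = [v + d for v, d in zip(y, delta)]      # LoRA added to dequantized values
    return [v + b for v, b in zip(y, bias)]    # bias added last


def linear_scale_last(x, w_q, scale, lora_a, lora_b, bias):
    """Contrasting ordering: the MUL also rescales the LoRA delta and the bias."""
    y = matvec(w_q, x)
    delta = matvec(lora_b, matvec(lora_a, x))
    y = [v + d for v, d in zip(y, delta)]
    y = [v + b for v, b in zip(y, bias)]
    return [scale * v for v in y]


# Illustrative values only (assumed, not taken from any model).
x = [1.0, 2.0]
w_q = [[2.0, 0.0], [0.0, 4.0]]
scale = 0.5
lora_a = [[1.0, 1.0]]    # rank-1 adapter, A: 1x2
lora_b = [[1.0], [2.0]]  # B: 2x1
bias = [0.5, -0.5]

print(linear_scale_first(x, w_q, scale, lora_a, lora_b, bias))  # [4.5, 9.5]
print(linear_scale_last(x, w_q, scale, lora_a, lora_b, bias))   # [2.75, 6.75]
```

With the MUL applied last, the LoRA delta and the bias are both scaled by `scale` as well, which is only correct if they were generated against the unscaled values.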

Restrict build_ffn for NVFP4 to supported combinations

macOS/iOS:

macOS Apple Silicon (arm64)
macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED
macOS Intel (x64)
iOS XCFramework

Linux:

Android:

Android arm64 (CPU)

Windows:

openEuler: DISABLED

openEuler x86 (310p)
openEuler x86 (910b, ACL Graph)
openEuler aarch64 (310p)
openEuler aarch64 (910b, ACL Graph)

UI:

UI
