b9670

Latest

github-actions released this 16 Jun 13:10
02810c7

Fix and restrict NVFP4 edge-cases in llama-graph (#24331)

  • Move the post-GEMM MUL required for dequantization before the LoRA and bias add

See #23484:

  1. For LoRA, I would presume we want fully dequantized values before
    adding the residuals, but this depends on how the LoRAs were
    generated. The literature indicates that LoRA is applied post-MUL but pre-bias add (#8332).
  2. For ModelOPT, the bias add should happen on fully dequantized
    values.
  • Restrict build_ffn for NVFP4 to supported combinations
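
The ordering described in the first change can be illustrated with a toy sketch. This is not llama.cpp code: plain Python lists stand in for tensors, the dequantization scale is assumed to be a single scalar, and every name (`matvec`, `linear_scaled`, `linear_scale_last`) is hypothetical. It only shows why the scale MUL has to come before the LoRA delta and the bias add.

```python
# Toy illustration only; all names are hypothetical, not llama.cpp API.
# Assumption: the post-GEMM dequantization scale is one scalar per tensor.

def matvec(w, x):
    """Dense matrix-vector product (stands in for the GEMM step)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def linear_scaled(w_q, scale, x, lora_a=None, lora_b=None, bias=None):
    """GEMM -> scale MUL -> optional LoRA delta -> optional bias add."""
    y = matvec(w_q, x)                     # GEMM on the still-scaled weights
    y = [scale * v for v in y]             # post-GEMM MUL: fully dequantized now
    if lora_a is not None and lora_b is not None:
        delta = matvec(lora_b, matvec(lora_a, x))  # low-rank update B(Ax)
        y = [v + d for v, d in zip(y, delta)]
    if bias is not None:
        y = [v + b for v, b in zip(y, bias)]       # bias sees dequantized values
    return y

def linear_scale_last(w_q, scale, x, bias):
    """The ordering to avoid: the bias is multiplied by the scale as well."""
    y = matvec(w_q, x)
    y = [v + b for v, b in zip(y, bias)]
    return [scale * v for v in y]

w_q = [[2.0, 4.0], [6.0, 8.0]]
scale = 0.5
x = [1.0, 1.0]
bias = [1.0, 1.0]
lora_a = [[1.0, 0.0]]      # rank-1 adapter, shape 1x2
lora_b = [[0.5], [0.5]]    # shape 2x1

good = linear_scaled(w_q, scale, x, lora_a, lora_b, bias)
bad = linear_scale_last(w_q, scale, x, bias)
print(good)  # [4.5, 8.5]
print(bad)   # [3.5, 7.5]
```

In the second ordering the bias is halved along with the GEMM output, which is the kind of mismatch the change is meant to rule out.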

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI: