Layerwise int4 kimi#973

Draft
abhishek-singh591 wants to merge 69 commits into quic:main from abhishek-singh591:layerwise_int4_kimi
@abhishek-singh591 abhishek-singh591 commented May 7, 2026

Setup and Run Instructions

Follow the steps below to set up and run Kimi K2.5 layerwise export/compile using run.py.

Step 1: Download the Model

Download Kimi K2.5 from Hugging Face:
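
The download command itself is not shown in this description. As a rough sketch only, the snippet below assumes the repository ID is moonshotai/Kimi-K2.5 (inferred from the cache directory name used in Step 2) and uses `huggingface_hub.snapshot_download`. The helper names `cache_snapshot_dir` and `download_kimi` are hypothetical and are not part of run.py.

```python
from pathlib import Path

# Assumption: repo ID inferred from the "models--moonshotai--Kimi-K2.5" cache folder name.
REPO_ID = "moonshotai/Kimi-K2.5"


def cache_snapshot_dir(cache_root: str, repo_id: str, revision: str) -> Path:
    """Build the Hugging Face cache path: models--<org>--<name>/snapshots/<revision>."""
    folder = "models--" + repo_id.replace("/", "--")
    return Path(cache_root) / folder / "snapshots" / revision


def download_kimi(cache_root: str) -> str:
    """Download the full snapshot into cache_root and return its local directory."""
    # Imported lazily so the path helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, cache_dir=cache_root)
```

The directory returned by `download_kimi(...)` is the value that would be passed as --model_path in the next step.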

Step 2: Set Model Path

run.py now requires --model_path (no hardcoded default).

Example:

--model_path /home/huggingface_hub/models--moonshotai--Kimi-K2.5/snapshots/54383e83fa343a1331754112fb9e3410c55efa2f

Step 3: Command-Line Arguments

| Argument | Description | Default |
| --- | --- | --- |
| --model_path | Path to the downloaded Kimi model | None (required) |
| --aic_hw_version | Accelerator HW version passed to the final compile | ai100 |
| --window_size | Number of layers per export window | 1 |
| --layerwise_mode | single_qpc or multiple_qpc | single_qpc |
| --total_layers | Total text layers; auto-resolved from the model config if not set | None |
| --num-devices | Number of devices for compile stages | 1 |
| --batch_size | Batch size for the specialization config | 1 |
| --seq_len | Prefill/compile sequence length | 1 |
| --ctx_len | Context length | 128 |
| --num_cores | Number of accelerator cores | 16 |
| --mxfp6 / --no-mxfp6 | Enable/disable the MXFP6 matmul compile flag | Enabled |
| --mxint8_kv_cache / --no-mxint8_kv_cache | Enable/disable the MXINT8 KV cache | Enabled |
| --enable_blocking / --no-enable_blocking | Enable/disable blocking in the QAIC config | Disabled |
| --blocking_mode | Blocking mode | kv |
| --num_kv_heads_repeat | KV heads repeat count | 1 |
| --num_kv_blocks | Number of KV blocks | 4 |
| --head_block_size | Head block size | 4 |
| --absorption | Enable MLA absorption | Disabled |
| --online | Enable MLA online mode | Disabled |
| --prefill_only | Compile in prefill-only mode | Disabled |
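
To make the paired `--flag / --no-flag` options concrete, here is a minimal argparse sketch covering a subset of the arguments listed above. It is an illustration under assumptions, not the actual parser in run.py: the use of `argparse.BooleanOptionalAction` (Python 3.9+) and the function name `build_parser` are guesses about how such flags could be declared.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring part of the argument list above."""
    p = argparse.ArgumentParser(prog="run.py")
    p.add_argument("--model_path", required=True)
    p.add_argument("--aic_hw_version", default="ai100")
    p.add_argument("--window_size", type=int, default=1)
    p.add_argument(
        "--layerwise_mode",
        choices=["single_qpc", "multiple_qpc"],
        default="single_qpc",
    )
    # None means "resolve from the model config later".
    p.add_argument("--total_layers", type=int, default=None)
    p.add_argument("--ctx_len", type=int, default=128)
    # BooleanOptionalAction registers both --mxfp6 and --no-mxfp6.
    p.add_argument("--mxfp6", action=argparse.BooleanOptionalAction, default=True)
    p.add_argument("--mxint8_kv_cache", action=argparse.BooleanOptionalAction, default=True)
    p.add_argument("--enable_blocking", action=argparse.BooleanOptionalAction, default=False)
    p.add_argument("--prefill_only", action="store_true")
    return p


args = build_parser().parse_args(["--model_path", "/path/to/snapshot", "--no-mxfp6"])
print(args.layerwise_mode, args.mxfp6, args.mxint8_kv_cache)  # single_qpc False True
```

With this shape, omitting --model_path makes argparse exit with a usage error, which matches the "no hardcoded default" behaviour described in Step 2.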

Step 4: Layerwise Strategy

  • single_qpc:
    • Builds layerwise ONNX windows
    • Merges them via QEfficient.utils.layerwise_pipeline(...)
    • Runs the final full-model compile via the compile_full_model.py flow
  • multiple_qpc:
    • Compiles the layerwise outputs directly with QEfficient.utils.compile_layerwise(...)
    • Runs QEfficient.utils.inference_pipeline(...)
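
The two modes above can be pictured as a simple dispatch on --layerwise_mode. The sketch below only records the stage order described in this step. It does not call the QEfficient functions, since their signatures are not shown here, and `STAGES` and `plan_stages` are hypothetical names.

```python
from typing import Dict, List

# Stage order per mode, taken from the description above (labels only, no real calls).
STAGES: Dict[str, List[str]] = {
    "single_qpc": [
        "build layerwise ONNX windows",
        "merge via QEfficient.utils.layerwise_pipeline",
        "final full-model compile (compile_full_model.py flow)",
    ],
    "multiple_qpc": [
        "compile layerwise outputs via QEfficient.utils.compile_layerwise",
        "run QEfficient.utils.inference_pipeline",
    ],
}


def plan_stages(mode: str) -> List[str]:
    """Return the ordered stage labels for a layerwise mode."""
    if mode not in STAGES:
        raise ValueError(f"unknown layerwise_mode: {mode!r}")
    return list(STAGES[mode])


for step in plan_stages("single_qpc"):
    print(step)
```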

Step 5: Run Examples

Minimal:

  python run.py \
    --model_path /path/to/Kimi-K2.5/snapshot

Three-layer test run (total_layers=3):

  python run.py \
    --model_path /home/huggingface_hub/models--moonshotai--Kimi-K2.5/snapshots/54383e83fa343a1331754112fb9e3410c55efa2f \
    --total_layers 3 \
    --window_size 1 \
    --layerwise_mode single_qpc \
    --aic_hw_version ai100

Multiple-QPC mode:

  python run.py \
    --model_path /path/to/Kimi-K2.5/snapshot \
    --layerwise_mode multiple_qpc \
    --window_size 1
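
When these runs are scripted, it can help to assemble the command as an argument list rather than a shell string. The helper below is a small sketch with a hypothetical name (`build_run_command`). It assumes every option takes a value, so the boolean flags from Step 3 would need separate handling.

```python
import shlex
from typing import List


def build_run_command(model_path: str, **options: object) -> List[str]:
    """Assemble a run.py invocation as an argv list (value-taking options only)."""
    cmd = ["python", "run.py", "--model_path", model_path]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_run_command(
    "/path/to/Kimi-K2.5/snapshot",
    layerwise_mode="multiple_qpc",
    window_size=1,
)
# shlex.join quotes arguments only when needed, so the result is safe to paste into a shell.
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` without going through a shell.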

Note on Layer Windows

Windowing uses half-open intervals:

[start, end)

  • start is inclusive
  • end is exclusive

Windows are generated from the highest layer range down to layer 0. For example, with total_layers=3 and window_size=1, the windows are (2,3), (1,2), (0,1).
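
The windowing rule can be sketched as a short generator of half-open intervals. The function name `layer_windows` is hypothetical, and the handling of a total that is not a multiple of the window size (the lowest window is clamped at 0) is an assumption; the description only covers the evenly divisible case.

```python
from typing import List, Tuple


def layer_windows(total_layers: int, window_size: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) windows, ordered from the top layers down to 0."""
    if total_layers < 1 or window_size < 1:
        raise ValueError("total_layers and window_size must be positive")
    windows = []
    end = total_layers
    while end > 0:
        # Assumption: a short remainder window is clamped at layer 0.
        start = max(0, end - window_size)
        windows.append((start, end))
        end = start
    return windows


print(layer_windows(3, 1))  # [(2, 3), (1, 2), (0, 1)]
```

Because each window's start is the next window's end, every layer index from 0 to total_layers - 1 falls in exactly one window.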

abhishek-singh591 and others added 30 commits April 29, 2026 16:08
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Signed-off-by: Mamta Singh <168400541+quic-mamta@users.noreply.github.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Merge remote-tracking branch 'upstream/mla_int4_moe' into layerwise_int4_kimi
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
abhishek-singh591 and others added 9 commits May 8, 2026 20:56
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
fix prefill output

Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
fixed EP Q chunking

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
@abhishek-singh591 abhishek-singh591 marked this pull request as draft May 18, 2026 06:08
Signed-off-by: Abhishek kumar singh <sabhis@qti.qualcomm.com>
abhishek-singh591 and others added 11 commits May 18, 2026 13:56
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek kumar singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek kumar singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek kumar singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek kumar singh <sabhis@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@ochougul ochougul force-pushed the layerwise_int4_kimi branch from 9d11bf8 to a9d5413 Compare May 18, 2026 22:39
ochougul and others added 5 commits May 19, 2026 18:11
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
Signed-off-by: Abhishek Kumar Singh <sabhis@qti.qualcomm.com>
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

5 participants