@asmigosw
Contributor

Added flags:

  1. --iteration: Number of inference iterations to run after loading the QPC once.
  2. --automation: When set, prints the input, output, and performance stats.

Example command: `python -m QEfficient.cloud.infer --model_name gpt2 --batch_size 1 --prompt_len 32 --ctx_len 128 --mxfp6 --num_cores 16 --device_group [0] --prompt "My name is" --mos 1 --aic_enable_depth_first --iteration 2 --automation`
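
For reviewers who want to picture the flag semantics, here is a minimal sketch of the "load once, run N iterations" pattern described above. It is not the code in this PR: `load_once` and `run_inference` are hypothetical stand-ins for QPC loading and execution, and the defaults (`--iteration` defaulting to 1, `--automation` as a boolean switch) are assumptions inferred from the example command.

```python
import argparse
import time


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the two new flags plus --prompt (illustrative subset)."""
    parser = argparse.ArgumentParser(description="Sketch of the new infer flags")
    parser.add_argument("--prompt", type=str, default="My name is")
    # Assumed default of 1: a single run when the flag is not given.
    parser.add_argument("--iteration", type=int, default=1,
                        help="Number of inference iterations after one load")
    # Assumed to be a switch, as used in the example command.
    parser.add_argument("--automation", action="store_true",
                        help="Print input, output, and performance stats")
    return parser


def load_once() -> dict:
    """Hypothetical stand-in for loading the compiled QPC a single time."""
    return {"loaded": True}


def run_inference(session: dict, prompt: str) -> str:
    """Hypothetical stand-in for one inference call on the loaded session."""
    return prompt + " ..."


def main(argv=None):
    args = build_parser().parse_args(argv)
    session = load_once()  # loaded once, reused by every iteration
    outputs, timings = [], []
    for i in range(args.iteration):
        start = time.perf_counter()
        output = run_inference(session, args.prompt)
        elapsed = time.perf_counter() - start
        outputs.append(output)
        timings.append(elapsed)
        if args.automation:
            # Report input, output, and a simple timing figure per iteration.
            print(f"[iter {i + 1}] input={args.prompt!r} "
                  f"output={output!r} time={elapsed:.6f}s")
    return outputs, timings
```

Calling `main(["--iteration", "2", "--automation"])` in this sketch runs the stand-in inference twice against the same session and prints one stats line per iteration.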

@quic-rishinr quic-rishinr marked this pull request as ready for review October 9, 2025 04:58
Signed-off-by: Asmita Goswami <asmigosw@qti.qualcomm.com>
@quic-hemagnih quic-hemagnih merged commit 8e13633 into quic:main Oct 9, 2025
5 checks passed
ochougul pushed a commit that referenced this pull request Nov 3, 2025
Signed-off-by: Asmita Goswami <asmigosw@qti.qualcomm.com>
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>