From 674f31174a639314a3d42e3cc4ee94bf581febe1 Mon Sep 17 00:00:00 2001
From: Abhishek Kumar Singh
Date: Fri, 10 Oct 2025 23:32:14 +0530
Subject: [PATCH] Update Qeff Documentation to indicate vLLM Support in Validated Models Page

JIRA: https://jira-dc.qualcomm.com/jira/browse/QRNMSWREQ-3782
Signed-off-by: Varun Gupta
---
 docs/source/validate.md | 84 +++++++++++++++++++----------------------
 1 file changed, 38 insertions(+), 46 deletions(-)

diff --git a/docs/source/validate.md b/docs/source/validate.md
index e17e85578..b5ab87629 100644
--- a/docs/source/validate.md
+++ b/docs/source/validate.md
@@ -4,21 +4,21 @@
 ## Text-only Language Models
 ### Text Generation Task
-**QEff Auto Class:** [`QEFFAutoModelForCausalLM`](#QEFFAutoModelForCausalLM)
+**QEff Auto Class:** `QEFFAutoModelForCausalLM`
-| Architecture | Model Family | Representative Models | CB Support |
-|-------------------------|--------------------|--------------------------------------------------------------------------------------|------------|
-| **FalconForCausalLM** | Falcon | [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) | ✔️ |
-| **Qwen3MoeForCausalLM** | Qwen3Moe | [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) | ✔️ |
+| Architecture | Model Family | Representative Models | [vLLM Support](https://quic.github.io/cloud-ai-sdk-pages/latest/Getting-Started/Installation/vLLM/vLLM/index.html) |
+|-------------------------|--------------------|--------------------------------------------------------------------------------------|--------------|
+| **FalconForCausalLM** | Falcon** | [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) | ✔️ |
+| **Qwen3MoeForCausalLM** | Qwen3Moe | [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) | ✕ |
 | **GemmaForCausalLM** | CodeGemma | [google/codegemma-2b](https://huggingface.co/google/codegemma-2b)
[google/codegemma-7b](https://huggingface.co/google/codegemma-7b) | ✔️ |
-| | Gemma | [google/gemma-2b](https://huggingface.co/google/gemma-2b)
[google/gemma-7b](https://huggingface.co/google/gemma-7b)
[google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)
[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)
[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) | ✔️ |
+| | Gemma*** | [google/gemma-2b](https://huggingface.co/google/gemma-2b)
[google/gemma-7b](https://huggingface.co/google/gemma-7b)
[google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b)
[google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b)
[google/gemma-2-27b](https://huggingface.co/google/gemma-2-27b) | ✔️ |
 | **GPTBigCodeForCausalLM** | Starcoder1.5 | [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) | ✔️ |
 | | Starcoder2 | [bigcode/starcoder2-15b](https://huggingface.co/bigcode/starcoder2-15b) | ✔️ |
 | **GPTJForCausalLM** | GPT-J | [EleutherAI/gpt-j-6b](https://huggingface.co/EleutherAI/gpt-j-6b) | ✔️ |
 | **GPT2LMHeadModel** | GPT-2 | [openai-community/gpt2](https://huggingface.co/openai-community/gpt2) | ✔️ |
 | **GraniteForCausalLM** | Granite 3.1 | [ibm-granite/granite-3.1-8b-instruct](https://huggingface.co/ibm-granite/granite-3.1-8b-instruct)
[ibm-granite/granite-guardian-3.1-8b](https://huggingface.co/ibm-granite/granite-guardian-3.1-8b) | ✔️ |
 | | Granite 20B | [ibm-granite/granite-20b-code-base-8k](https://huggingface.co/ibm-granite/granite-20b-code-base-8k)
[ibm-granite/granite-20b-code-instruct-8k](https://huggingface.co/ibm-granite/granite-20b-code-instruct-8k) | ✔️ |
-| **InternVLChatModel** | Intern-VL | [OpenGVLab/InternVL2_5-1B](https://huggingface.co/OpenGVLab/InternVL2_5-1B) | |
+| **InternVLChatModel** | Intern-VL | [OpenGVLab/InternVL2_5-1B](https://huggingface.co/OpenGVLab/InternVL2_5-1B) | ✔️ | | |
 | **LlamaForCausalLM** | CodeLlama | [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)
[codellama/CodeLlama-13b-hf](https://huggingface.co/codellama/CodeLlama-13b-hf)
[codellama/CodeLlama-34b-hf](https://huggingface.co/codellama/CodeLlama-34b-hf) | ✔️ |
 | | DeepSeek-R1-Distill-Llama | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | ✔️ |
 | | InceptionAI-Adapted | [inceptionai/jais-adapted-7b](https://huggingface.co/inceptionai/jais-adapted-7b)
[inceptionai/jais-adapted-13b-chat](https://huggingface.co/inceptionai/jais-adapted-13b-chat)
[inceptionai/jais-adapted-70b](https://huggingface.co/inceptionai/jais-adapted-70b) | ✔️ |
@@ -31,45 +31,42 @@
 | **MistralForCausalLM** | Mistral | [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) | ✔️ |
 | **MixtralForCausalLM** | Codestral
Mixtral | [mistralai/Codestral-22B-v0.1](https://huggingface.co/mistralai/Codestral-22B-v0.1)
[mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | ✔️ |
 | **MPTForCausalLM** | MPT | [mosaicml/mpt-7b](https://huggingface.co/mosaicml/mpt-7b) | ✔️ |
-| **Phi3ForCausalLM** | Phi-3, Phi-3.5 | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | ✔️ |
+| **Phi3ForCausalLM** | Phi-3**, Phi-3.5** | [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | ✔️ |
 | **QwenForCausalLM** | DeepSeek-R1-Distill-Qwen | [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) | ✔️ |
 | | Qwen2, Qwen2.5 | [Qwen/Qwen2-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2-1.5B-Instruct) | ✔️ |
 | **LlamaSwiftKVForCausalLM** | swiftkv | [Snowflake/Llama-3.1-SwiftKV-8B-Instruct](https://huggingface.co/Snowflake/Llama-3.1-SwiftKV-8B-Instruct) | ✔️ |
-| **Grok1ModelForCausalLM** | grok-1 | [hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1) | ✔️ |
-
----
-
+| **Grok1ModelForCausalLM** | grok-1 | [hpcai-tech/grok-1](https://huggingface.co/hpcai-tech/grok-1) | ✕ |
+- ** set "trust-remote-code" flag to True for e2e inference with vLLM
+- *** pass "disable-sliding-window" flag for e2e inference of Gemma-2 family of models with vLLM
 ## Embedding Models
 ### Text Embedding Task
-**QEff Auto Class:** [`QEFFAutoModel`](#QEFFAutoModel)
-
-| Architecture | Model Family | Representative Models |
-|--------------|--------------|---------------------------------|
-| **BertModel** | BERT-based | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
[BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5)
[BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
[e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) |
-| **LlamaModel** | Llama-based | [intfloat/e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) |
-| **MPNetForMaskedLM** | MPNet | [sentence-transformers/multi-qa-mpnet-base-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-cos-v1) |
-| **MistralModel** | Mistral | [e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) |
-| **NomicBertModel** | NomicBERT | [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) |
-| **Qwen2ForCausalLM** | Qwen2 | [stella_en_1.5B_v5](https://huggingface.co/NovaSearch/stella_en_1.5B_v5) |
-| **RobertaModel** | RoBERTa | [ibm-granite/granite-embedding-30m-english](https://huggingface.co/ibm-granite/granite-embedding-30m-english)
[ibm-granite/granite-embedding-125m-english](https://huggingface.co/ibm-granite/granite-embedding-125m-english) |
-| **XLMRobertaForSequenceClassification** | XLM-RoBERTa | [bge-reranker-v2-m3bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) |
-| **XLMRobertaModel** | XLM-RoBERTa |[ibm-granite/granite-embedding-107m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-107m-multilingual)
[ibm-granite/granite-embedding-278m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual) |
-
----
+**QEff Auto Class:** `QEFFAutoModel`
+
+| Architecture | Model Family | Representative Models | vLLM Support |
+|--------------|--------------|---------------------------------|--------------|
+| **BertModel** | BERT-based | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
[BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5)
[BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
[e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) | ✔️ |
+| **MPNetForMaskedLM** | MPNet | [sentence-transformers/multi-qa-mpnet-base-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-cos-v1) | ✕ |
+| **MistralModel** | Mistral | [e5-mistral-7b-instruct](https://huggingface.co/intfloat/e5-mistral-7b-instruct) | ✕ |
+| **NomicBertModel** | NomicBERT | [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) | ✕ |
+| **Qwen2ForCausalLM** | Qwen2 | [stella_en_1.5B_v5](https://huggingface.co/NovaSearch/stella_en_1.5B_v5) | ✔️ |
+| **RobertaModel** | RoBERTa | [ibm-granite/granite-embedding-30m-english](https://huggingface.co/ibm-granite/granite-embedding-30m-english)
[ibm-granite/granite-embedding-125m-english](https://huggingface.co/ibm-granite/granite-embedding-125m-english) | ✔️ |
+| **XLMRobertaForSequenceClassification** | XLM-RoBERTa | [bge-reranker-v2-m3bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) | ✕ |
+| **XLMRobertaModel** | XLM-RoBERTa |[ibm-granite/granite-embedding-107m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-107m-multilingual)
[ibm-granite/granite-embedding-278m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-278m-multilingual) | ✔️ |
 ## Multimodal Language Models
 ### Vision-Language Models (Text + Image Generation)
-**QEff Auto Class:** [`QEFFAutoModelForImageTextToText`](#QEFFAutoModelForImageTextToText)
+**QEff Auto Class:** `QEFFAutoModelForImageTextToText`
-| Architecture | Model Family | Representative Models | CB Support | Single Qpc Support | Dual Qpc Support |
-|-----------------------------|--------------|----------------------------------------------------------------------------------------|------------|--------------------|------------------|
-| **LlavaForConditionalGeneration** | LLaVA-1.5 | [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | ✕ | ✔️ | ✔️ |
-| **MllamaForConditionalGeneration** | Llama 3.2 | [meta-llama/Llama-3.2-11B-Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
[meta-llama/Llama-3.2-90B-Vision](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision) | ✕ | ✔️ | ✔️ |
-|**LlavaNextForConditionalGeneration** | Granite Vision | [ibm-granite/granite-vision-3.2-2b](https://huggingface.co/ibm-granite/granite-vision-3.2-2b) | ✕ | ✕ | ✔️ |
-|**Llama4ForConditionalGeneration** | Llama-4-Scout | [Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) | ✕ | ✔️ | ✔️ |
-|**Gemma3ForConditionalGeneration** | Gemma3 | [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)| ✕ | ✔️ | ✔️ |
+| Architecture | Model Family | Representative Models | Qeff Single Qpc | Qeff Dual Qpc | vllm Single Qpc | vllm Dual Qpc |
+|------------------------------------|--------------|----------------------------------------------------------------------------------------|------------|---------------------|-------------------|-----------------|
+| **LlavaForConditionalGeneration** | LLaVA-1.5 | [llava-hf/llava-1.5-7b-hf](https://huggingface.co/llava-hf/llava-1.5-7b-hf) | ✔️ | ✔️ | ✔️ | ✔️ |
+| **MllamaForConditionalGeneration** | Llama 3.2 | [meta-llama/Llama-3.2-11B-Vision Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct)
[meta-llama/Llama-3.2-90B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct) | ✔️ | ✔️ | ✔️ | ✔️ |
+| **LlavaNextForConditionalGeneration** | Granite Vision | [ibm-granite/granite-vision-3.2-2b](https://huggingface.co/ibm-granite/granite-vision-3.2-2b) | ✕ | ✔️ | ✕ | ✔️ |
+| **Llama4ForConditionalGeneration** | Llama-4-Scout | [Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct) | ✔️ | ✔️ | ✔️ | ✔️ |
+| **Gemma3ForConditionalGeneration** | Gemma3*** | [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) | ✔️ | ✔️ | ✔️ | ✕ |
+- *** pass "disable-sliding-window" flag for e2e inference with vLLM
 **Dual QPC:**
@@ -85,25 +82,20 @@ In the Dual QPC(Qualcomm Program Container) setup, the model is split across two
 **Single QPC:**
 In the single QPC(Qualcomm Program Container) setup, the entire model—including both image encoding and text generation—runs within a single QPC. There is no model splitting, and all components operate within the same execution environment.
-**For more details click [here](#QEFFAutoModelForImageTextToText)**
-```{NOTE}
+
+**Note:**
 The choice between Single and Dual QPC is determined during model instantiation using the `kv_offload` setting. If the `kv_offload` is set to `True` it runs in dual QPC and if its set to `False` model runs in single QPC mode.
-```
 ---
-
 ### Audio Models (Automatic Speech Recognition) - Transcription Task
+**QEff Auto Class:** `QEFFAutoModelForSpeechSeq2Seq`
-**QEff Auto Class:** [`QEFFAutoModelForSpeechSeq2Seq`](#QEFFAutoModelForSpeechSeq2Seq)
-
-| Architecture | Model Family | Representative Models |
-|--------------|--------------|----------------------------------------------------------------------------------------|
-| **Whisper** | Whisper | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
[openai/whisper-base](https://huggingface.co/openai/whisper-base)
[openai/whisper-small](https://huggingface.co/openai/whisper-small)
[openai/whisper-medium](https://huggingface.co/openai/whisper-medium)
[openai/whisper-large](https://huggingface.co/openai/whisper-large)
[openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) |
-
----
+| Architecture | Model Family | Representative Models | vLLM Support |
+|--------------|--------------|----------------------------------------------------------------------------------------|--------------|
+| **Whisper** | Whisper | [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny)
[openai/whisper-base](https://huggingface.co/openai/whisper-base)
[openai/whisper-small](https://huggingface.co/openai/whisper-small)
[openai/whisper-medium](https://huggingface.co/openai/whisper-medium)
[openai/whisper-large](https://huggingface.co/openai/whisper-large)
[openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) | ✔️ |
 (models_coming_soon)=
 # Models Coming Soon