Merged
4 changes: 2 additions & 2 deletions README.md
@@ -108,8 +108,8 @@ For more details about using ``QEfficient`` via Cloud AI 100 Apps SDK, visit [Li

## Documentation

- * [Quick Start Guide](https://quic.github.io/efficient-transformers/source/quick_start.html#)
- * [Python API](https://quic.github.io/efficient-transformers/source/hl_api.html)
+ * [Quick Start Guide](https://quic.github.io/efficient-transformers/source/quick_start.html)
+ * [QEFF API](https://quic.github.io/efficient-transformers/source/qeff_autoclasses.html)
* [Validated Models](https://quic.github.io/efficient-transformers/source/validate.html)
* [Models coming soon](https://quic.github.io/efficient-transformers/source/validate.html#models-coming-soon)

2 changes: 2 additions & 0 deletions docs/source/supported_features.rst
@@ -30,6 +30,8 @@ Supported Features
- Enables execution with FP8 precision, significantly improving performance and reducing memory usage for computational tasks.
* - Prefill caching
- Enhances inference speed by caching key-value pairs for shared prefixes, reducing redundant computations and improving efficiency.
+ * - On Device Sampling
+   - Enables sampling operations to be executed directly on the QAIC device rather than the host CPU for QEffForCausalLM models. This significantly reduces host-device communication overhead and improves inference throughput and scalability. Refer to the `sample script <https://github.com/quic/efficient-transformers/blob/main/examples/on_device_sampling.py>`_ for more details.
 * - Prompt-Lookup Decoding
   - Speeds up text generation by reusing overlapping spans between the input prompt and the generated text, accelerating decoding without loss of output quality. Refer to the `sample script <https://github.com/quic/efficient-transformers/blob/main/examples/pld_spd_inference.py>`_ for more details.
* - :ref:`PEFT LoRA support <QEffAutoPeftModelForCausalLM>`
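For context on the Prompt-Lookup Decoding entry touched by this diff, the core idea can be sketched in a few lines of plain Python. This is a hypothetical `prompt_lookup_candidates` helper illustrating the technique, not QEfficient's implementation (the linked `examples/pld_spd_inference.py` script shows the real usage): match the trailing n-gram of the sequence generated so far against earlier occurrences in the same sequence, and propose the tokens that followed that earlier occurrence as draft candidates.

```python
def prompt_lookup_candidates(tokens, ngram_size=3, num_draft=5):
    """Propose draft tokens by matching the trailing n-gram of `tokens`
    against an earlier occurrence in the same sequence (prompt lookup).

    Returns up to `num_draft` tokens that followed the matched n-gram,
    or an empty list when no earlier occurrence exists.
    """
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan earlier positions (most recent first) for the same n-gram,
    # excluding the trailing occurrence itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            follow = tokens[start + ngram_size:start + ngram_size + num_draft]
            if follow:
                return follow
    return []


if __name__ == "__main__":
    # The trailing trigram [1, 2, 3] also appears at the start of the
    # sequence, so the tokens that followed it become draft candidates.
    print(prompt_lookup_candidates([1, 2, 3, 4, 5, 1, 2, 3]))  # [4, 5, 1, 2, 3]
```

In a speculative-decoding setup, the target model then verifies these drafts in a single forward pass; drafts that diverge from the model's own choices are discarded, which is why the technique speeds up generation without changing output quality.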