Merged
4 changes: 2 additions & 2 deletions README.md
@@ -108,8 +108,8 @@ For more details about using ``QEfficient`` via Cloud AI 100 Apps SDK, visit [Li

## Documentation

- * [Quick Start Guide](https://quic.github.io/efficient-transformers/source/quick_start.html#)
- * [Python API](https://quic.github.io/efficient-transformers/source/hl_api.html)
+ * [Quick Start Guide](https://quic.github.io/efficient-transformers/source/quick_start.html)
+ * [QEFF API](https://quic.github.io/efficient-transformers/source/qeff_autoclasses.html)
* [Validated Models](https://quic.github.io/efficient-transformers/source/validate.html)
* [Models coming soon](https://quic.github.io/efficient-transformers/source/validate.html#models-coming-soon)

2 changes: 2 additions & 0 deletions docs/source/supported_features.rst
@@ -30,6 +30,8 @@ Supported Features
- Enables execution with FP8 precision, significantly improving performance and reducing memory usage for computational tasks.
* - Prefill caching
- Enhances inference speed by caching key-value pairs for shared prefixes, reducing redundant computations and improving efficiency.
+ * - On Device Sampling
+   - Enables sampling operations to be executed directly on the QAIC device rather than the host CPU for QEffForCausalLM models. This significantly reduces host-device communication overhead and improves inference throughput and scalability. Refer to the `sample script <https://github.com/quic/efficient-transformers/blob/main/examples/on_device_sampling.py>`_ for more details.
 * - Prompt-Lookup Decoding
   - Speeds up text generation by reusing overlapping spans between the input prompt and the generated text, accelerating decoding without loss of output quality. Refer to the `sample script <https://github.com/quic/efficient-transformers/blob/main/examples/pld_spd_inference.py>`_ for more details.
* - :ref:`PEFT LoRA support <QEffAutoPeftModelForCausalLM>`
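For context on the Prompt-Lookup Decoding entry touched by this diff, the core idea can be sketched in a few lines of plain Python. This is a hypothetical `prompt_lookup_candidates` helper illustrating the technique, not QEfficient's implementation (the linked `examples/pld_spd_inference.py` script shows the real usage): match the trailing n-gram of the sequence generated so far against earlier occurrences in the same sequence, and propose the tokens that followed that earlier occurrence as draft candidates.

```python
def prompt_lookup_candidates(tokens, ngram_size=3, num_draft=5):
    """Propose draft tokens by matching the trailing n-gram of `tokens`
    against an earlier occurrence in the same sequence (prompt lookup).

    Returns up to `num_draft` tokens that followed the matched n-gram,
    or an empty list when no earlier occurrence exists.
    """
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan earlier positions (most recent first) for the same n-gram,
    # excluding the trailing occurrence itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            follow = tokens[start + ngram_size:start + ngram_size + num_draft]
            if follow:
                return follow
    return []


if __name__ == "__main__":
    # The trailing trigram [1, 2, 3] also appears at the start of the
    # sequence, so the tokens that followed it become draft candidates.
    print(prompt_lookup_candidates([1, 2, 3, 4, 5, 1, 2, 3]))  # [4, 5, 1, 2, 3]
```

In a speculative-decoding setup, the target model then verifies these drafts in a single forward pass; drafts that diverge from the model's own choices are discarded, which is why the technique speeds up generation without changing output quality.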