generate 🤜 🤛 torch.compile
Part of the PyTorch 2024 H2 roadmap.
This issue tracks the compatibility between .generate and torch.compile (intro docs by PyTorch). The goal is to enable fullgraph=True compilation on the main generate use cases.
⚠️ Is your generate use case not covered by this tracker? Check if it was requested below and upvote it if it was. Otherwise, add a comment. We will consider expanding the selection below on widely requested use cases 🤗
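With fullgraph=True, torch.compile must capture the whole function as a single graph and raises an error on any graph break instead of falling back to eager execution. A common source of graph breaks in decoding loops is Python control flow that depends on tensor values, such as branching on whether each sequence has produced an end-of-sequence token. The toy, pure-Python sketch below shows the usual workaround for batched greedy decoding: the per-sequence "finished" state is kept as a 0/1 mask and applied with arithmetic. It is not the transformers implementation. `greedy_step`, `toy_generate`, `PAD_ID` and `EOS_ID` are hypothetical names, and plain lists stand in for tensors.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical padding token id
EOS_ID = 2  # hypothetical end-of-sequence token id


def argmax(row: List[float]) -> int:
    """Index of the highest score; max() keeps the first index on ties."""
    return max(range(len(row)), key=lambda i: row[i])


def greedy_step(logits: List[List[float]], unfinished: List[int]) -> Tuple[List[int], List[int]]:
    """One batched greedy step with no branching on per-sequence values.

    unfinished[b] is 1 while sequence b is still generating and 0 after it
    has emitted EOS_ID. Every row always yields exactly one token, so the
    batch never changes shape.
    """
    next_tokens = [argmax(row) for row in logits]
    # Finished rows get PAD_ID through arithmetic instead of an if/else.
    next_tokens = [t * u + PAD_ID * (1 - u) for t, u in zip(next_tokens, unfinished)]
    # A row stays unfinished only if it did not just emit EOS_ID.
    unfinished = [u * int(t != EOS_ID) for t, u in zip(next_tokens, unfinished)]
    return next_tokens, unfinished


def toy_generate(step_logits: List[List[List[float]]], max_new_tokens: int) -> List[List[int]]:
    """Run greedy_step for at most max_new_tokens steps over precomputed logits."""
    batch_size = len(step_logits[0])
    unfinished = [1] * batch_size
    outputs: List[List[int]] = [[] for _ in range(batch_size)]
    for step in range(max_new_tokens):
        tokens, unfinished = greedy_step(step_logits[step], unfinished)
        for b, tok in enumerate(tokens):
            outputs[b].append(tok)
        # The early-exit test lives in the outer loop, outside greedy_step,
        # which is the part one would hand to a compiler.
        if max(unfinished) == 0:
            break
    return outputs
```

Finished rows keep emitting PAD_ID, so toy_generate always returns equal-length rows for the whole batch. A fixed-shape compiled step relies on exactly this property.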
Decoding Strategies (end-to-end compilation)
- greedy_search / sample are compatible (Generate: end-to-end compilation #30788)
- beam_search / beam_sample are compatible, depends on the step above
- assisted_decoding (aka speculative decoding) is compatible, depends on the steps above
Generate Flags and Options
- LogitsProcessor classes were checked for compatibility (and the appropriate exceptions are raised when not compatible)
- StoppingCriteria classes were checked for compatibility (and the appropriate exceptions are raised when not compatible)
Models
Notes:
- models tagged as "important models" in our CI + popular models
- language models released starting from v4.42 should ALL support compile
Decoder-only:
- [Core generation] Adds support for static KV cache #27931
- [gemma] Adds support for Gemma 💎 #29167
- … torch.compile implementation #29891
Encoder-decoder:
Quantization
Others
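The Decoder-only entries reference static KV cache support (#27931). The usual idea is to preallocate the key/value cache at a fixed maximum length and write each new entry at an explicit position instead of concatenating along the sequence axis. Tensor shapes then stay the same from one decoding step to the next, which is what a compiled graph needs to avoid re-tracing. The pure-Python sketch below illustrates only that idea for a single sequence, with floats standing in for key/value tensors. `ToyStaticCache`, `update`, `attention_mask` and `max_cache_len` are illustrative names, not the transformers API.

```python
from typing import List, Optional


class ToyStaticCache:
    """Fixed-size key/value cache for one sequence (illustration only).

    Storage is allocated once at max_cache_len. Each decoding step writes into
    an explicit slot instead of appending, so the length of `keys` and
    `values` never changes from one step to the next.
    """

    def __init__(self, max_cache_len: int) -> None:
        self.max_cache_len = max_cache_len
        self.keys: List[Optional[float]] = [None] * max_cache_len
        self.values: List[Optional[float]] = [None] * max_cache_len
        self.filled = 0  # one past the highest position written so far

    def update(self, position: int, key: float, value: float) -> None:
        """Write one key/value pair at `position` (in place, no resizing)."""
        if not 0 <= position < self.max_cache_len:
            raise IndexError("position is outside the preallocated cache")
        self.keys[position] = key
        self.values[position] = value
        self.filled = max(self.filled, position + 1)

    def attention_mask(self) -> List[int]:
        """1 for written slots, 0 for empty ones; always max_cache_len long."""
        return [1 if i < self.filled else 0 for i in range(self.max_cache_len)]
```

A dynamic cache would grow by one entry per step, changing shapes on every iteration. In this sketch only the mask contents change, and they change as data rather than as shape.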