Describe the solution you would like:
Implement self-speculative decoding as described in this paper, where the earlier layers act as the draft stage and the remaining layers act as the verification stage.
Describe the alternatives you have considered:
There are two ways to implement this:
Implement regular speculative decoding, where the draft stage is a separate model. Self-speculative decoding could then be implemented by providing a subset of the layers as the draft model (e.g., this implementation).
With this setup, we can add flags that tell the earlier layers whether they are running in the draft stage or the verification stage.
Implement self-speculative decoding directly, as done here.
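To make the draft/verify split concrete, here is a minimal, framework-free sketch of greedy self-speculative decoding. It is only an illustration under stated assumptions, not the method from the paper or either linked implementation: the "model" is a toy stack of integer-mixing functions, and every name (`make_layers`, `embed`, `predict`, `exit_layer`, `draft_len`) is hypothetical. The draft stage exits after the first `exit_layer` layers; the verify stage runs all layers and keeps the longest prefix of the draft that the full model agrees with.

```python
from typing import Callable, List

Layer = Callable[[int], int]
VOCAB = 16  # toy vocabulary size (assumption for this sketch)


def make_layers(n: int) -> List[Layer]:
    # Hypothetical stand-ins for transformer layers: each mixes an integer "hidden state".
    return [lambda h, i=i: (h * 31 + i * 7 + 3) % 1009 for i in range(n)]


def embed(tokens: List[int]) -> int:
    # Fold the whole context into one integer hidden state.
    h = 0
    for t in tokens:
        h = (h * 17 + t + 1) % 1009
    return h


def predict(layers: List[Layer], tokens: List[int], num_layers: int) -> int:
    # Run only the first `num_layers` layers, then apply a toy "LM head".
    h = embed(tokens)
    for layer in layers[:num_layers]:
        h = layer(h)
    return h % VOCAB


def greedy_decode(layers: List[Layer], prompt: List[int], max_new_tokens: int) -> List[int]:
    # Baseline: one full-model call per generated token.
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(predict(layers, tokens, len(layers)))
    return tokens


def self_speculative_decode(layers: List[Layer], prompt: List[int], exit_layer: int,
                            max_new_tokens: int, draft_len: int = 4) -> List[int]:
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(tokens) < target_len:
        k = min(draft_len, target_len - len(tokens))
        # Draft stage: early exit after `exit_layer` layers, autoregressively.
        draft: List[int] = []
        for _ in range(k):
            draft.append(predict(layers, tokens + draft, exit_layer))
        # Verify stage: the full model checks each draft position.
        # A real implementation would do this in one batched forward pass
        # and reuse the early layers' cached state; here it is sequential.
        for i in range(k):
            full_tok = predict(layers, tokens + draft[:i], len(layers))
            if full_tok != draft[i]:
                # Keep the agreed prefix, then the full model's correction.
                tokens.extend(draft[:i])
                tokens.append(full_tok)
                break
        else:
            tokens.extend(draft)  # every draft token was accepted
    return tokens
```

With greedy verification, the output matches plain greedy decoding with the full model by construction; any speedup in a real system would come from how often the early-exit draft agrees with the full model, which this toy does not attempt to model.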
Additional Context: