MTP (Multi-Token Prediction) support #5

@sureshg

Gemma 4 ships with lightweight drafter models (gemma-4-E4B-it-assistant, gemma-4-26B-A4B-it-assistant, etc.) that enable Multi-Token Prediction (MTP) for speculative decoding. llama.cpp has already landed support for this, and people are reporting 40-50% faster generation. It would be great to have this in Gemma4.java.

https://x.com/ggerganov/status/2056391115469689330
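
For anyone unfamiliar with the idea, here is a rough sketch of the draft-then-verify loop behind speculative decoding, in its simplest greedy form. Everything in it is an assumption for illustration: the class name, the `generate` signature, and the toy models in `main` are hypothetical and are not the Gemma4.java or llama.cpp API. Each model is reduced to a function from a token context to its greedy next token, and the target is called once per position here, whereas a real engine would score all drafted positions in a single batched forward pass (which is where the speedup comes from).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical illustration only; not the Gemma4.java or llama.cpp API.
class SpeculativeDecodingSketch {

    // Greedy speculative decoding: the drafter proposes draftLen tokens,
    // the target keeps the longest prefix it agrees with.
    static List<Integer> generate(ToIntFunction<List<Integer>> target,
                                  ToIntFunction<List<Integer>> drafter,
                                  List<Integer> prompt,
                                  int maxNewTokens,
                                  int draftLen) {
        if (draftLen < 1) {
            throw new IllegalArgumentException("draftLen must be >= 1");
        }
        List<Integer> out = new ArrayList<>(prompt);
        int limit = prompt.size() + maxNewTokens;

        while (out.size() < limit) {
            // 1. The small drafter proposes draftLen tokens autoregressively.
            List<Integer> ctx = new ArrayList<>(out);
            List<Integer> draft = new ArrayList<>();
            for (int i = 0; i < draftLen; i++) {
                int t = drafter.applyAsInt(ctx);
                draft.add(t);
                ctx.add(t);
            }

            // 2. The target checks each drafted position. Its own token is
            //    always the one appended, so the output matches plain greedy
            //    decoding with the target alone.
            for (int i = 0; i < draft.size() && out.size() < limit; i++) {
                int expected = target.applyAsInt(out);
                out.add(expected);
                if (expected != draft.get(i)) {
                    break; // first mismatch: discard the rest of the draft
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy target: counts upward modulo 10.
        ToIntFunction<List<Integer>> target =
                ctx -> (ctx.get(ctx.size() - 1) + 1) % 10;
        // Toy drafter: agrees with the target except right after token 4.
        ToIntFunction<List<Integer>> drafter = ctx -> {
            int last = ctx.get(ctx.size() - 1);
            return last == 4 ? 0 : (last + 1) % 10;
        };
        System.out.println(generate(target, drafter, List.of(0), 8, 4));
    }
}
```

The point of the sketch is that the drafter can only affect speed, never the output: a wrong draft token is replaced by the target's own choice. A sampling-based variant would need a probabilistic accept/reject rule instead of the equality check.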
