
Fix the prompt wrapping of gemma3-1b #523


Merged: 1 commit merged into google:dev on Mar 19, 2025

Conversation

@ufownl (Contributor) commented Mar 18, 2025

The weights of gemma3-1b on Kaggle are instruction-tuned, so the value of its prompt wrapping should be PromptWrapping::GEMMA_IT.
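The change itself amounts to a single setting. As an illustrative sketch only (not the actual diff in this PR), wherever the gemma3-1b ModelInfo is filled in, its wrapping field should report instruction-tuned wrapping:

// Illustrative sketch: the gemma3-1b config should report IT wrapping,
// since the Kaggle weights are instruction-tuned.
gcpp::ModelInfo info;
info.wrapping = gcpp::PromptWrapping::GEMMA_IT;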

@pculliton (Collaborator) left a comment


Thanks! Agreed - was planning on including this in a fix from the Google side. Appreciate it.

@ufownl (Contributor, Author) commented Mar 18, 2025

The implementation of prompt wrapping for Gemma3 also does not work properly. I have fixed it in my project but have not had time to port it to gemma.cpp yet.

@pculliton (Collaborator)

The implementation of prompt wrapping for Gemma3 also does not work properly. I have fixed it in my project but have not had time to port it to gemma.cpp yet.

Thanks - anything in particular we should be looking for? Do you mean that the image wrapping isn't working, or the IT wrapping for the VLM models? If your timeline for porting would be long, we may just tackle it ourselves, as we'd like to get any fixes into the codebase as soon as we can.

@ufownl (Contributor, Author) commented Mar 18, 2025

There are three issues I have found so far:

1st, IT wrapping is not applied to the VLM model at all:

In the file gemma/common.cc:

void Wrap(const ModelInfo& info, size_t pos, std::string& prompt) {
  // Instruction-tuned models are trained to expect control tokens.
  // Note: only GEMMA_IT is handled here, so GEMMA_VLM prompts pass through unwrapped.
  if (info.wrapping == PromptWrapping::GEMMA_IT) {
    // Prepend "<end_of_turn>" if this is a multi-turn dialogue continuation.
    const std::string start = (pos == 0)
                                  ? "<start_of_turn>user\n"
                                  : "<end_of_turn>\n<start_of_turn>user\n";
    prompt = start + prompt + "<end_of_turn>\n<start_of_turn>model\n";
  }
}

2nd, when I tried to fix the first issue by modifying the condition in the Wrap function above, the generation phase could not be completed correctly; it looks like it gets stuck in an endless loop. I'm not sure what happened and have not looked into it yet. Maybe it was caused by changes in the tokenizer? In my project, I stop the generation phase when the IT model outputs <end_of_turn>.
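A minimal sketch of that stop condition (simplified from what I do in lua-cgemma; stream_token here is just an illustrative per-token callback that stops generation when it returns false, not the gemma.cpp API):

// Encode "<end_of_turn>" once; it should map to a single control token.
std::vector<int> eot;
if (!inst_->model().Tokenizer().Encode("<end_of_turn>", &eot) || eot.empty()) {
  throw std::runtime_error("Tokenizer encoding failed. (stop token)");
}
const int stop_token = eot.front();
// Illustrative per-token callback: returning false ends the generation loop.
auto stream_token = [stop_token](int token, float /*prob*/) -> bool {
  return token != stop_token;
};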

3rd, the implementation of image token wrapping for VLM models is currently not compatible with IT wrapping, and the <bos> token is added repeatedly. This section may need to be completely rewritten.

I looked at gemma_pytorch's preprocessor and the chat template on Hugging Face, and used the following wrapping in my project:

First turn:
<bos><start_of_turn>user\n[text tokens]\n\n<start_of_image>[image tokens]<end_of_image>\n\n<end_of_turn>\n<start_of_turn>model\n

Subsequent turns:
<end_of_turn><start_of_turn>user\n[text tokens]\n\n<start_of_image>[image tokens]<end_of_image>\n\n<end_of_turn>\n<start_of_turn>model\n

It might be a good idea to separate the tokenization of text and images from the IT wrapping. This is how I implemented it in lua-cgemma (some irrelevant code removed):

std::vector<int> session::tokenize(const gcpp::ImageTokens& image, const char* text, size_t len) const {
  // Tokenize the text part first; IT wrapping is applied separately at the end.
  auto text_part = tokenize_text(std::string(text, len));
  std::vector<int> prompt;
  switch (inst_->model().Info().wrapping) {
    case gcpp::PromptWrapping::PALIGEMMA: {
      ...
    }
    case gcpp::PromptWrapping::GEMMA_VLM: {
      std::vector<int> soi;
      soi.reserve(2);
      if (!inst_->model().Tokenizer().Encode("\n\n<start_of_image>", &soi)) {
        throw std::runtime_error("Tokenizer encoding failed. (session::tokenize)");
      }
      std::vector<int> eoi;
      eoi.reserve(2);
      if (!inst_->model().Tokenizer().Encode("<end_of_image>\n\n", &eoi)) {
        throw std::runtime_error("Tokenizer encoding failed. (session::tokenize)");
      }
      const auto prompt_size = text_part.size() + soi.size() + image.BatchSize() + eoi.size();
      prompt.reserve(prompt_size);
      prompt.insert(prompt.cend(), text_part.cbegin(), text_part.cend());
      prompt.insert(prompt.cend(), soi.cbegin(), soi.cend());
      // One placeholder id (-2) per image token slot.
      prompt.insert(prompt.cend(), image.BatchSize(), -2);
      prompt.insert(prompt.cend(), eoi.cbegin(), eoi.cend());
      return tokenize_wrap(prompt);  // apply IT wrapping
    }
    default:
      throw std::invalid_argument("Current variant does not support images.");
  }
}
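tokenize_wrap is where the IT wrapping happens. A simplified sketch of what it does (not the exact lua-cgemma code; pos_ stands for the session's current position and BOS_ID for the tokenizer's <bos> id), matching the templates above:

std::vector<int> session::tokenize_wrap(const std::vector<int>& tokens) const {
  // "<bos><start_of_turn>user\n" on the first turn,
  // "<end_of_turn><start_of_turn>user\n" on continuations.
  const std::string start = pos_ == 0 ? "<start_of_turn>user\n"
                                      : "<end_of_turn><start_of_turn>user\n";
  std::vector<int> prefix;
  std::vector<int> suffix;
  if (!inst_->model().Tokenizer().Encode(start, &prefix) ||
      !inst_->model().Tokenizer().Encode("<end_of_turn>\n<start_of_turn>model\n", &suffix)) {
    throw std::runtime_error("Tokenizer encoding failed. (session::tokenize_wrap)");
  }
  std::vector<int> prompt;
  prompt.reserve(1 + prefix.size() + tokens.size() + suffix.size());
  if (pos_ == 0) {
    prompt.push_back(BOS_ID);  // <bos> exactly once, at the start of the dialogue
  }
  prompt.insert(prompt.cend(), prefix.cbegin(), prefix.cend());
  prompt.insert(prompt.cend(), tokens.cbegin(), tokens.cend());
  prompt.insert(prompt.cend(), suffix.cbegin(), suffix.cend());
  return prompt;
}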

@pculliton (Collaborator)

Thank you! Much appreciated.

@pculliton merged commit 3d419ec into google:dev on Mar 19, 2025
5 of 6 checks passed
@pculliton added the copybara-import (Trigger Copybara for merging pull requests) label on Mar 20, 2025