
Fix the prompt wrapping of gemma3-1b #523


Merged: 1 commit merged into google:dev on Mar 19, 2025

Conversation

@ufownl (Contributor) commented Mar 18, 2025

The weights of gemma3-1b on Kaggle are instruction-tuned, so the value of its prompt wrapping should be PromptWrapping::GEMMA_IT.
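The change itself amounts to a single setting. As an illustrative sketch only (not the actual diff in this PR), wherever the gemma3-1b ModelInfo is filled in, its wrapping field should report instruction-tuned wrapping:

// Illustrative sketch: the gemma3-1b config should report IT wrapping,
// since the Kaggle weights are instruction-tuned.
gcpp::ModelInfo info;
info.wrapping = gcpp::PromptWrapping::GEMMA_IT;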

@pculliton (Collaborator) left a comment


Thanks! Agreed - was planning on including this in a fix from the Google side. Appreciate it.

@ufownl (Contributor, Author) commented Mar 18, 2025

The implementation of prompt wrapping for Gemma3 also does not work properly. I have fixed it in my project but have not had time to port it to gemma.cpp yet.

@pculliton (Collaborator)

The implementation of prompt wrapping for Gemma3 also does not work properly. I have fixed it in my project but have not had time to port it to gemma.cpp yet.

Thanks - anything in particular we should be looking for? Do you mean that the image wrapping isn't working, or the IT wrapping for the VLM models? If your timeline for porting would be long, we may just tackle it ourselves, as we'd like to get any fixes into the codebase as soon as we can.

@ufownl (Contributor, Author) commented Mar 18, 2025

There are three issues I have found so far:

1st, IT wrapping is not applied to the VLM model at all:

In the file gemma/common.cc:

void Wrap(const ModelInfo& info, size_t pos, std::string& prompt) {
  // Instruction-tuned models are trained to expect control tokens.
  // Note: only GEMMA_IT is handled here, so GEMMA_VLM prompts pass through unwrapped.
  if (info.wrapping == PromptWrapping::GEMMA_IT) {
    // Prepend "<end_of_turn>" if this is a multi-turn dialogue continuation.
    const std::string start = (pos == 0)
                                  ? "<start_of_turn>user\n"
                                  : "<end_of_turn>\n<start_of_turn>user\n";
    prompt = start + prompt + "<end_of_turn>\n<start_of_turn>model\n";
  }
}

2nd, when I tried to fix the first issue by modifying the condition in the Wrap function above, the generation phase could not be completed correctly; it looks like it gets stuck in an endless loop. I'm not sure what happened and have not looked into it yet. Maybe it was caused by changes in the tokenizer? In my project, I stop the generation phase when the IT model outputs <end_of_turn>.
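A minimal sketch of that stop condition (simplified from what I do in lua-cgemma; stream_token here is just an illustrative per-token callback that stops generation when it returns false, not the gemma.cpp API):

// Encode "<end_of_turn>" once; it should map to a single control token.
std::vector<int> eot;
if (!inst_->model().Tokenizer().Encode("<end_of_turn>", &eot) || eot.empty()) {
  throw std::runtime_error("Tokenizer encoding failed. (stop token)");
}
const int stop_token = eot.front();
// Illustrative per-token callback: returning false ends the generation loop.
auto stream_token = [stop_token](int token, float /*prob*/) -> bool {
  return token != stop_token;
};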

3rd, the implementation of image token wrapping for VLM models is currently not compatible with IT wrapping, and the <bos> token is added repeatedly. This section may need to be completely rewritten.

I looked at gemma_pytorch's preprocessor and the chat template on Hugging Face, and used the following wrapping in my project:

First turn:
<bos><start_of_turn>user\n[text tokens]\n\n<start_of_image>[image tokens]<end_of_image>\n\n<end_of_turn>\n<start_of_turn>model\n

Subsequent turns:
<end_of_turn><start_of_turn>user\n[text tokens]\n\n<start_of_image>[image tokens]<end_of_image>\n\n<end_of_turn>\n<start_of_turn>model\n

It might be a good idea to separate the tokenization of text and images from the IT wrapping. This is how I implemented it in lua-cgemma (some irrelevant code removed):

std::vector<int> session::tokenize(const gcpp::ImageTokens& image, const char* text, size_t len) const {
  // Tokenize the text part first; IT wrapping is applied separately at the end.
  auto text_part = tokenize_text(std::string(text, len));
  std::vector<int> prompt;
  switch (inst_->model().Info().wrapping) {
    case gcpp::PromptWrapping::PALIGEMMA: {
      ...
    }
    case gcpp::PromptWrapping::GEMMA_VLM: {
      std::vector<int> soi;
      soi.reserve(2);
      if (!inst_->model().Tokenizer().Encode("\n\n<start_of_image>", &soi)) {
        throw std::runtime_error("Tokenizer encoding failed. (session::tokenize)");
      }
      std::vector<int> eoi;
      eoi.reserve(2);
      if (!inst_->model().Tokenizer().Encode("<end_of_image>\n\n", &eoi)) {
        throw std::runtime_error("Tokenizer encoding failed. (session::tokenize)");
      }
      const auto prompt_size = text_part.size() + soi.size() + image.BatchSize() + eoi.size();
      prompt.reserve(prompt_size);
      prompt.insert(prompt.cend(), text_part.cbegin(), text_part.cend());
      prompt.insert(prompt.cend(), soi.cbegin(), soi.cend());
      // One placeholder id (-2) per image token slot.
      prompt.insert(prompt.cend(), image.BatchSize(), -2);
      prompt.insert(prompt.cend(), eoi.cbegin(), eoi.cend());
      return tokenize_wrap(prompt);  // apply IT wrapping
    }
    default:
      throw std::invalid_argument("Current variant does not support images.");
  }
}
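tokenize_wrap is where the IT wrapping happens. A simplified sketch of what it does (not the exact lua-cgemma code; pos_ stands for the session's current position and BOS_ID for the tokenizer's <bos> id), matching the templates above:

std::vector<int> session::tokenize_wrap(const std::vector<int>& tokens) const {
  // "<bos><start_of_turn>user\n" on the first turn,
  // "<end_of_turn><start_of_turn>user\n" on continuations.
  const std::string start = pos_ == 0 ? "<start_of_turn>user\n"
                                      : "<end_of_turn><start_of_turn>user\n";
  std::vector<int> prefix;
  std::vector<int> suffix;
  if (!inst_->model().Tokenizer().Encode(start, &prefix) ||
      !inst_->model().Tokenizer().Encode("<end_of_turn>\n<start_of_turn>model\n", &suffix)) {
    throw std::runtime_error("Tokenizer encoding failed. (session::tokenize_wrap)");
  }
  std::vector<int> prompt;
  prompt.reserve(1 + prefix.size() + tokens.size() + suffix.size());
  if (pos_ == 0) {
    prompt.push_back(BOS_ID);  // <bos> exactly once, at the start of the dialogue
  }
  prompt.insert(prompt.cend(), prefix.cbegin(), prefix.cend());
  prompt.insert(prompt.cend(), tokens.cbegin(), tokens.cend());
  prompt.insert(prompt.cend(), suffix.cbegin(), suffix.cend());
  return prompt;
}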

@pculliton (Collaborator)

Thank you! Much appreciated.

@pculliton merged commit 3d419ec into google:dev on Mar 19, 2025
5 of 6 checks passed
@pculliton added the copybara-import (Trigger Copybara for merging pull requests) label on Mar 20, 2025