Fix the prompt wrapping of gemma3-1b #523
Conversation
Thanks! Agreed - was planning on including this in a fix from the Google side. Appreciate it.
The implementation of prompt wrapping for Gemma3 also does not work properly. I have fixed it in my project but have not had time to port it to gemma.cpp yet.
Thanks - anything in particular we should be looking for? Do you mean that the image wrapping isn't working, or IT wrapping for the VLM models? If your timeline for porting would be long, we may just tackle it ourselves, as we'd like to get any fixes into the codebase as soon as we can.
There are three issues I have found so far:

1st, IT wrapping is not applied to the VLM model at all. In the file gemma/common.cc:

void Wrap(const ModelInfo& info, size_t pos, std::string& prompt) {
  // Instruction-tuned models are trained to expect control tokens.
  if (info.wrapping == PromptWrapping::GEMMA_IT) {
    // Prepend "<end_of_turn>" if this is a multi-turn dialogue continuation.
    const std::string start = (pos == 0) ? "<start_of_turn>user\n"
                                         : "<end_of_turn>\n<start_of_turn>user\n";
    prompt = start + prompt + "<end_of_turn>\n<start_of_turn>model\n";
  }
}

2nd, I've tried to fix the first issue by modifying the condition in Wrap().

3rd, the implementation of image-token wrapping for VLM models is currently not compatible with IT wrapping. I viewed gemma_pytorch's preprocessor and the chat template on HuggingFace and used the corresponding wrapping in my project (the lua-cgemma code further down shows how).
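As an aside, here is a minimal sketch of what the naive condition change from the 2nd point could look like. This is a sketch only, not the commenter's patch: it reuses the ModelInfo / PromptWrapping names from the snippet above, and on its own it does not address the image-token incompatibility described in the 3rd point.

// Sketch: also apply the GEMMA_IT control tokens to the GEMMA_VLM wrapping.
void Wrap(const ModelInfo& info, size_t pos, std::string& prompt) {
  if (info.wrapping == PromptWrapping::GEMMA_IT ||
      info.wrapping == PromptWrapping::GEMMA_VLM) {
    // Prepend "<end_of_turn>" if this is a multi-turn dialogue continuation.
    const std::string start = (pos == 0) ? "<start_of_turn>user\n"
                                         : "<end_of_turn>\n<start_of_turn>user\n";
    prompt = start + prompt + "<end_of_turn>\n<start_of_turn>model\n";
  }
}

Because this wraps the raw string, it clashes with any image-token insertion that happens at the token-id level, which is what motivates the separation proposed next.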
It might be a good idea to separate tokenization of text and images from IT wrapping. This is how I implemented it in lua-cgemma (some irrelevant code removed):

std::vector<int> session::tokenize(const gcpp::ImageTokens& image, const char* text, size_t len) const {
  auto text_part = tokenize_text(std::string(text, len));
  std::vector<int> prompt;
  switch (inst_->model().Info().wrapping) {
    case gcpp::PromptWrapping::PALIGEMMA: {
      ...
    }
    case gcpp::PromptWrapping::GEMMA_VLM: {
      // Control tokens surrounding the image: "\n\n<start_of_image>" before, "<end_of_image>\n\n" after.
      std::vector<int> soi;
      soi.reserve(2);
      if (!inst_->model().Tokenizer().Encode("\n\n<start_of_image>", &soi)) {
        throw std::runtime_error("Tokenizer encoding failed. (session::tokenize)");
      }
      std::vector<int> eoi;
      eoi.reserve(2);
      if (!inst_->model().Tokenizer().Encode("<end_of_image>\n\n", &eoi)) {
        throw std::runtime_error("Tokenizer encoding failed. (session::tokenize)");
      }
      // Layout: text tokens, <start_of_image>, one placeholder id per image token, <end_of_image>.
      const auto prompt_size = text_part.size() + soi.size() + image.BatchSize() + eoi.size();
      prompt.reserve(prompt_size);
      prompt.insert(prompt.cend(), text_part.cbegin(), text_part.cend());
      prompt.insert(prompt.cend(), soi.cbegin(), soi.cend());
      prompt.insert(prompt.cend(), image.BatchSize(), -2);  // placeholder ids for the image embeddings
      prompt.insert(prompt.cend(), eoi.cbegin(), eoi.cend());
      return tokenize_wrap(prompt);  // IT wrapping applied last, on token ids
    }
    default:
      throw std::invalid_argument("Current variant does not support images.");
  }
}
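For completeness, the tokenize_wrap call at the end is roughly the token-id-level equivalent of Wrap(). The following is a sketch under the assumption that the same Tokenizer().Encode() API is available, with multi-turn continuation handling omitted for brevity; it is not the exact lua-cgemma code.

// Sketch of a tokenize_wrap() helper: wrap an already-tokenized prompt
// in the GEMMA_IT control tokens.
std::vector<int> session::tokenize_wrap(const std::vector<int>& prompt) const {
  std::vector<int> pre;
  if (!inst_->model().Tokenizer().Encode("<start_of_turn>user\n", &pre)) {
    throw std::runtime_error("Tokenizer encoding failed. (session::tokenize_wrap)");
  }
  std::vector<int> post;
  if (!inst_->model().Tokenizer().Encode("<end_of_turn>\n<start_of_turn>model\n", &post)) {
    throw std::runtime_error("Tokenizer encoding failed. (session::tokenize_wrap)");
  }
  std::vector<int> wrapped;
  wrapped.reserve(pre.size() + prompt.size() + post.size());
  wrapped.insert(wrapped.cend(), pre.cbegin(), pre.cend());
  wrapped.insert(wrapped.cend(), prompt.cbegin(), prompt.cend());
  wrapped.insert(wrapped.cend(), post.cbegin(), post.cend());
  return wrapped;
}

Keeping the wrapping as a separate token-level step means the image-token insertion above does not need to know anything about the control tokens.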
Thank you! Much appreciated.
The weights of gemma3-1b on Kaggle are instruction-tuned, so the value of its prompt wrapping should be PromptWrapping::GEMMA_IT.
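As a quick illustration of what that change implies, using the Wrap() signature quoted earlier in this thread (the surrounding setup is a sketch, not code from the PR): with GEMMA_IT set, the prompt receives the control tokens the IT weights expect.

// Sketch: gemma3-1b configured with GEMMA_IT wrapping.
ModelInfo info;
info.wrapping = PromptWrapping::GEMMA_IT;
std::string prompt = "Why is the sky blue?";
Wrap(info, /*pos=*/0, prompt);
// prompt is now:
// "<start_of_turn>user\nWhy is the sky blue?<end_of_turn>\n<start_of_turn>model\n"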