Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Microsoft has released a new Phi-4 14B model. So far it is available only on Azure AI Foundry; in a few days it will appear on HuggingFace.
Motivation
The model is advertised as having strong reasoning abilities despite its relatively small size. It would be great to have it supported in llama.cpp.
Possible Implementation
The model uses the Phi3ForCausalLM architecture, which is already supported in llama.cpp. The differences I noticed that cause problems are listed below (a quick way to confirm both is sketched after the quoted report excerpt):
- It uses the GPT2Tokenizer tokenizer_class, not LlamaTokenizer like the previous Phi models. The convert_hf_to_gguf.py script expects Phi3ForCausalLM-based models to have a SentencePiece tokenizer.model file and throws an exception if it is not present, so it has to be modified to support Phi-4.
- The model has the sliding_window parameter set to null in config.json. The Phi-4 Technical Report says:
The phi-4 model is based on a decoder-only transformer architecture with 14B parameters and a default context length of 4096. This is later extended to a 16K context length during midtraining. The architecture closely follows phi-3-medium, except that we now use the tiktoken tokenizer (for better multilingual support) with a padded vocabulary size of 100,352 (including unused tokens) and we use full attention over the 4K context length, rather than a 2K sliding window used in phi-3-medium
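For reference, a quick way to confirm both differences on a local copy of the checkpoint could look like the snippet below (the ./phi-4 path is just a placeholder for wherever the model is downloaded):

import json
from pathlib import Path

# "./phi-4" is a placeholder for a local download of the Phi-4 checkpoint.
model_dir = Path("./phi-4")

tokenizer_config = json.loads((model_dir / "tokenizer_config.json").read_text())
config = json.loads((model_dir / "config.json").read_text())

print(tokenizer_config.get("tokenizer_class"))  # "GPT2Tokenizer" instead of "LlamaTokenizer"
print(config.get("sliding_window"))             # None (null in config.json)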
My initial solution for the 1st problem was:
diff --git a/convert_hf_to_gguf.py b/convert_hf_to_gguf.py
index c63d929c..1ae37b83 100755
--- a/convert_hf_to_gguf.py
+++ b/convert_hf_to_gguf.py
@@ -2129,6 +2129,9 @@ class Phi3MiniModel(Model):
     model_arch = gguf.MODEL_ARCH.PHI3
 
     def set_vocab(self):
+        if self.metadata.name == "Phi 4":
+            return self._set_vocab_gpt2()
+
         from sentencepiece import SentencePieceProcessor
 
         tokenizer_path = self.dir_model / 'tokenizer.model'
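Matching on metadata.name == "Phi 4" is only a stopgap, since the name is taken from the model card. A possibly more robust detection (just a sketch, written as a standalone helper rather than actual convert_hf_to_gguf.py code) would be to fall back to the GPT-2 vocab path whenever the checkpoint ships no SentencePiece tokenizer.model file:

from pathlib import Path

def has_sentencepiece_vocab(model_dir: Path) -> bool:
    # Phi-1/2/3 checkpoints ship a SentencePiece tokenizer.model file;
    # Phi-4 ships only a BPE (GPT-2 style) tokenizer, so the file is absent.
    return (model_dir / "tokenizer.model").is_file()

set_vocab could then call self._set_vocab_gpt2() for self.dir_model when this check fails, instead of comparing the model name.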
As for the second problem, I manually changed the sliding_window parameter value to the maximum context length (16384) in config.json before conversion, which allowed me to test the model. I suppose the final implementation should detect the presence of a Phi-4 model and build a full KQ mask instead of a sliding-window KQ mask.
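A minimal sketch of how that could be handled during conversion without hand-editing config.json, again as a standalone function rather than the converter's actual code path (the field names are the ones from Phi-4's config.json; 16384 is its max_position_embeddings):

import json
from pathlib import Path

def effective_attention_window(model_dir: Path) -> int:
    # Phi-4 sets "sliding_window" to null, i.e. full attention over the whole
    # context, so fall back to max_position_embeddings (16384 for Phi-4).
    config = json.loads((model_dir / "config.json").read_text())
    window = config.get("sliding_window")
    return window if window is not None else config["max_position_embeddings"]

The converter could then store that value (or an explicit full-attention flag) so that llama.cpp builds a full KQ mask for Phi-4 at inference time.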