Feature Request: Add support for Phi-4 model #10814

@fairydreaming

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Microsoft has released a new Phi-4 14B model. So far it's available only on Azure AI Foundry; in a few days it will appear on HuggingFace.

Motivation

The model is advertised as having strong reasoning abilities despite its relatively small size. It would be great to have it supported in llama.cpp.

Possible Implementation

The model uses the Phi3ForCausalLM architecture, which is already supported in llama.cpp. The differences I noticed that cause problems are:

  1. It uses the GPT2Tokenizer tokenizer_class, not LlamaTokenizer like the previous Phi models. The convert_hf_to_gguf.py script expects Phi3ForCausalLM-based models to have a SentencePiece tokenizer.model file and throws an exception if it's not present, so it has to be modified to support Phi-4.
  2. The model has the sliding_window parameter set to null in config.json. The Phi-4 Technical Report says:

The phi-4 model is based on a decoder-only transformer architecture with 14B parameters and a default context length of 4096. This is later extended to a 16K context length during midtraining. The architecture closely follows phi-3-medium, except that we now use the tiktoken tokenizer (for better multilingual support) with a padded vocabulary size of 100,352 (including unused tokens) and we use full attention over the 4K context length, rather than a 2K sliding window used in phi-3-medium

My initial solution for the 1st problem was:

diff --git a/convert_hf_to_gguf.py b/convert_hf_to_gguf.py
index c63d929c..1ae37b83 100755
--- a/convert_hf_to_gguf.py
+++ b/convert_hf_to_gguf.py
@@ -2129,6 +2129,9 @@ class Phi3MiniModel(Model):
     model_arch = gguf.MODEL_ARCH.PHI3
 
     def set_vocab(self):
+        if self.metadata.name == "Phi 4":
+            return self._set_vocab_gpt2()
+
         from sentencepiece import SentencePieceProcessor
 
         tokenizer_path = self.dir_model / 'tokenizer.model'
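
Keying on the model name works, but it is a bit fragile. A possibly more robust check (just a sketch, not a tested patch) would be to look at tokenizer_class in tokenizer_config.json and take the BPE path whenever it is GPT2Tokenizer; the _uses_gpt2_tokenizer helper below is made up for illustration:

    def _uses_gpt2_tokenizer(self) -> bool:
        # Hypothetical helper: True when the HF checkpoint declares a GPT2Tokenizer
        # (BPE) instead of the SentencePiece tokenizer of earlier Phi-3 releases.
        import json

        config_path = self.dir_model / 'tokenizer_config.json'
        if not config_path.is_file():
            return False
        with open(config_path, encoding='utf-8') as f:
            tokenizer_config = json.load(f)
        return tokenizer_config.get('tokenizer_class') == 'GPT2Tokenizer'

    def set_vocab(self):
        if self._uses_gpt2_tokenizer():
            return self._set_vocab_gpt2()
        # ... existing SentencePiece path unchanged ...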

As for the second problem, I manually changed the sliding_window parameter value to the max context length (16384) in config.json before conversion. This allowed me to test the model. I suppose the final implementation should detect the presence of a Phi-4 model and build a full KQ mask instead of a sliding-window KQ mask.
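
For reference, the conversion-side part of that could look roughly like the sketch below (assuming Phi3MiniModel writes the window size through gguf_writer.add_sliding_window in set_gguf_parameters, as other sliding-window models do; the final fix may instead belong in llama.cpp's KQ-mask construction):

        # Sketch: treat a null sliding_window as "full attention" by falling back
        # to the trained context length, so the manual config.json edit above is
        # no longer needed.
        sliding_window = self.hparams.get('sliding_window')
        if sliding_window is None:
            # Phi-4 sets sliding_window to null; use max_position_embeddings (16384)
            sliding_window = self.hparams.get('max_position_embeddings', 16384)
        self.gguf_writer.add_sliding_window(sliding_window)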
