feat: adds phi model #1442
Conversation
You need to implement the flash one; the non-flash versions are not interesting anymore and are just kept as legacy.
```python
# NOTE: this is the main difference between Llama and Phi:
# in Llama the rotary embeddings are applied to the whole query and key.
# Phi uses PARTIAL rotary embeddings, which are applied to the first 32 dimensions
#
# Apply partial positional embeddings in place
self.rotary_emb(
    query[:, :, :self.num_heads], kv[:, 0, :, :self.num_heads],
    cos, sin
)
```
Highlighting the most important lines here. Note that only a portion of the query and key are rotary embedded.
Hey @OlivierDehaene, would you mind taking a look at this model (specifically flash phi)? The non-flash version is only for legacy purposes. Notes: outside of those changes there are a couple of differences in weight naming, and Phi uses bias. I think it makes sense to have Phi in a separate file rather than bloat the Llama implementation. Please let me know if I should make any changes! Thanks 🙏
```python
        use_medusa=use_medusa,
    )
else:
    raise NotImplementedError(FLASH_ATT_ERROR_MESSAGE.format("Phi"))
```
Could we also load it with the Phi class based on CausalLM? Or is the layout different?
I'm not too familiar with Phi.
Thanks for the feedback!

Yes, Phi can be loaded as a CausalLM with version 4.37.1 of transformers. In my last PR I bumped the version, removed the NotImplementedError, and tested that Phi works on my local CPU.
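For reference, loading Phi through the upstream CausalLM path looks roughly like this. This is a generic transformers sketch (assuming transformers >= 4.37, which ships the Phi port), not code from this PR; it downloads the model from the Hub, so the revision pin mentioned in the description may also apply.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: microsoft/phi-2 is the checkpoint under discussion.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

inputs = tokenizer("What is Deep Learning?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```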
Great thanks!
This PR adds basic modeling for phi-2.

run
```bash
text-generation-server \
  serve \
  microsoft/phi-2 \
  --revision 834565c23f9b28b96ccbeabe614dd906b6db551a
```

test
```bash
curl -s localhost:3000/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
  -H 'Content-Type: application/json' | jq .
# {
#   "generated_text": "\nDeep learning is a subset of machine learning that uses artificial neural networks to learn from data. These"
# }
```

notes
- recently (~1 day ago) the Phi weights and model were updated to accommodate adding [GQA/MQA attention to the model](huggingface/transformers#28163). This implementation expects the original model format, so a fixed revision is required at the moment.
- this PR only includes a basic implementation of the model; it can later be extended to support Flash and Sharded versions, as well as to make use of better optimizations.