
feat: adds phi model #1442

Merged: 13 commits merged from impl-phi into main on Jan 25, 2024
Conversation

drbh (Collaborator) commented Jan 13, 2024

This PR adds basic modeling for phi-2

run

```bash
text-generation-server \
    serve \
    microsoft/phi-2 \
    --revision 834565c23f9b28b96ccbeabe614dd906b6db551a
```

test

```bash
curl -s localhost:3000/generate \
   -X POST \
   -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
   -H 'Content-Type: application/json' | jq .
# {
#   "generated_text": "\nDeep learning is a subset of machine learning that uses artificial neural networks to learn from data. These"
# }
```

notes

  • recently (~1 day ago) the Phi weights and model were updated to accommodate adding GQA/MQA attention to the model (huggingface/transformers#28163). This implementation expects the original model format, so a fixed revision is required at the moment.
  • this PR only includes a basic implementation of the model; it can later be extended to support Flash and Sharded versions and to make use of better optimizations.

drbh requested a review from Narsil January 15, 2024 14:07
Narsil (Collaborator) commented Jan 16, 2024

You need to implement the flash one; the non-flash models are not interesting anymore and are just kept as legacy.

Comment on lines 185 to 193

```python
# NOTE: this is the main difference between Llama and Phi
# in llama the rotary embeddings are applied to the whole query and key.
# Phi uses PARTIAL rotary embeddings, which are applied to the first 32 dimensions
#
# Apply partial positional embeddings in place
self.rotary_emb(
    query[:, :, :self.num_heads], kv[:, 0, :, :self.num_heads],
    cos, sin
)
```
drbh (Collaborator, Author) replied:
Highlighting the most important lines here: note that only a portion of the query and key is rotary embedded.
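
For readers skimming the diff, here is a minimal, self-contained sketch of the partial-rotary idea in plain PyTorch. It is not the PR's code: the function name `apply_partial_rotary`, the rotate-half formulation, and the `rotary_dim` parameter are illustrative assumptions; the point is only that the first `rotary_dim` channels of each head are rotated while the remainder passes through untouched.

```python
import torch


def apply_partial_rotary(query, key, cos, sin, rotary_dim):
    # Illustrative sketch, not the PR's implementation.
    # query, key: [batch, seq_len, num_heads, head_dim]
    # cos, sin:   broadcastable to [batch, seq_len, num_heads, rotary_dim]
    def rotate_half(x):
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat((-x2, x1), dim=-1)

    # Split each head into the slice that receives rotary embeddings
    # and the slice that does not.
    q_rot, q_pass = query[..., :rotary_dim], query[..., rotary_dim:]
    k_rot, k_pass = key[..., :rotary_dim], key[..., rotary_dim:]

    # Standard rotate-half rotary embedding, applied only to the rotary slice.
    q_rot = q_rot * cos + rotate_half(q_rot) * sin
    k_rot = k_rot * cos + rotate_half(k_rot) * sin

    # Unlike Llama, the untouched "pass" slice is concatenated back unchanged.
    return (
        torch.cat((q_rot, q_pass), dim=-1),
        torch.cat((k_rot, k_pass), dim=-1),
    )
```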

drbh (Collaborator, Author) commented Jan 24, 2024

Hey @OlivierDehaene, would you mind taking a look at this model (specifically flash phi)? The non-flash version is only for legacy purposes.

Notes:
I've copied the flash llama file and adjusted it to load the phi weights. Almost all layers are the same as llama except for the attention layer: Phi applies partial rotary embeddings (highlighted above).

Outside of those changes there are a couple of differences in weight naming, and Phi uses bias. I think it makes sense to have Phi in a separate file rather than bloat the llama implementation.
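
As a rough illustration of the bias point (this is not the PR's code, which goes through TGI's tensor-parallel linear layers; the dimensions below are phi-2-like values used purely for the example):

```python
import torch.nn as nn

# Illustrative phi-2-like dimensions, not taken from the PR.
hidden_size, num_heads, head_dim = 2560, 32, 80

# Llama-style attention projections carry no bias ...
llama_qkv_proj = nn.Linear(hidden_size, 3 * num_heads * head_dim, bias=False)

# ... while Phi's projections include bias terms that also have to be loaded
# from the checkpoint (the same applies to its MLP and output projections).
phi_qkv_proj = nn.Linear(hidden_size, 3 * num_heads * head_dim, bias=True)
```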

Please let me know if I should make any changes! Thanks 🙏

```python
        use_medusa=use_medusa,
    )
else:
    raise NotImplementedError(FLASH_ATT_ERROR_MESSAGE.format("Phi"))
```
Member commented:
Could we also load it with the Phi class based on CausalLM? Or is the layout different?
I'm not too familiar with Phi.

drbh (Collaborator, Author) replied Jan 24, 2024:
Thanks for the feedback!

Yes, Phi can be loaded as a CausalLM with the 4.37.1 version of transformers. In my last PR I bumped the version, removed the NotImplementedError, and tested that Phi works on my local CPU.
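
For reference, a minimal sketch of that CausalLM path using transformers directly (assuming transformers >= 4.37.1, which ships native Phi support; the prompt and generation settings mirror the test above and are otherwise illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# With transformers >= 4.37 the checkpoint resolves to the built-in Phi architecture.
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is Deep Learning?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```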

OlivierDehaene (Member) left a comment

Great, thanks!

OlivierDehaene merged commit 7e2a743 into main on Jan 25, 2024
7 checks passed
OlivierDehaene deleted the impl-phi branch on January 25, 2024
helena-intel pushed a commit to helena-intel/text-generation-inference-hf that referenced this pull request Feb 1, 2024
drbh mentioned this pull request Feb 3, 2024
kdamaszk pushed a commit to kdamaszk/tgi-gaudi that referenced this pull request Apr 29, 2024