
Add LLaMA 2 example for DirectML #701

Merged: 63 commits into main, Nov 10, 2023

Conversation

PatriceVignola (Contributor)

This PR adds the LLaMA 2 optimizations for DirectML, with examples and a sample ChatApp that was inspired by this but stripped down to an MVP and customized for DirectML.

It also adds a "CompositePyTorchModel", which follows the same principle as the composite Optimum model but for raw PyTorch models instead.

@@ -759,6 +759,43 @@ def to_json(self, check_object: bool = False):
return serialize_to_json(config, check_object)


class CompositePyTorchModel(PyTorchModel):
guotuofeng (Collaborator) commented Nov 7, 2023

What's the difference between this one and OptimumModel? Could we leverage OptimumModel or unify them? It seems the two classes are similar.

(Review thread on pyproject.toml: outdated, resolved)
trajepl (Contributor) commented Nov 7, 2023

For reference:
In the coming release of Optimum, the Optimum exporter can export only the merged models.
https://github.com/huggingface/optimum/pull/1257/files

Also, with nightly ORT/Optimum, Olive optimizes LLaMA 2 as follows:
https://github.com/microsoft/Olive/tree/main/examples/llama2

PatriceVignola (Contributor, PR author) replied:

What's the difference between this one and OptimumModel? Could we leverage OptimumModel or unify them? It seems the two classes are similar.

I decoupled OptimumModel and CompositePyTorchModel. They look similar at first glance, but they don't have much in common aside from the model_components part. OptimumModel is a very lightweight class that only returns strings of the actual model names, which are handled by Optimum in its own conversion pass, whereas CompositePyTorchModel is a container of multiple other PyTorchModels that are handled recursively in the main OnnxConversion pass.
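The container-plus-recursion idea described above can be sketched roughly as follows. This is an illustrative sketch only; the class and method names mirror the discussion but are not Olive's actual API.

```python
# Hypothetical sketch of the composite-model idea discussed in this thread.
# None of these names are Olive's real signatures.

class PyTorchModel:
    """A single PyTorch model plus the metadata a conversion pass needs."""

    def __init__(self, name):
        self.name = name

    def convert_to_onnx(self, output_dir):
        # A real pass would load the model and call torch.onnx.export here;
        # this sketch just returns the would-be output path.
        return f"{output_dir}/{self.name}.onnx"


class CompositePyTorchModel(PyTorchModel):
    """A container of component PyTorchModels, converted recursively."""

    def __init__(self, name, components):
        super().__init__(name)
        self.components = components  # list of PyTorchModel

    def convert_to_onnx(self, output_dir):
        # The main conversion pass recurses into each component.
        return [c.convert_to_onnx(output_dir) for c in self.components]
```

For example, a composite holding two components would yield one ONNX output per component when the conversion pass recurses into it.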

jambayk (Contributor) commented Nov 7, 2023

That's what I thought too when looking at the code.
On another note: I am thinking of removing the OptimumModel class entirely later. It is only used for OptimumConversion, and I don't see a reason why we can't just use PyTorchModel + hf_config, with OptimumConversion taking model_components as pass config. This pass would be able to return an ONNXModel or a CompositeONNXModel based on the number of components.
There are other Olive passes that also only support PyTorch models with hf_config.
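jambayk's suggestion, returning a single model or a composite depending on how many components the pass config lists, could look roughly like this. A minimal sketch: the class names mirror the discussion, but the function and its parameters are hypothetical, not Olive's real API.

```python
# Hypothetical sketch: a conversion pass that returns ONNXModel or
# CompositeONNXModel based on the number of components in the pass config.

class ONNXModel:
    """A single exported ONNX model."""

    def __init__(self, path):
        self.path = path


class CompositeONNXModel:
    """A group of exported ONNX models treated as one logical model."""

    def __init__(self, models):
        self.models = models


def optimum_conversion(model_components, output_dir):
    # The actual Optimum export is elided; each component becomes one file.
    exported = [ONNXModel(f"{output_dir}/{c}.onnx") for c in model_components]
    if len(exported) == 1:
        return exported[0]
    return CompositeONNXModel(exported)
```

With one component (e.g. a merged decoder) the pass returns a plain ONNXModel; with several it wraps them in a CompositeONNXModel, so downstream passes can branch on the result type.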

(Review thread on olive/model/__init__.py: outdated, resolved)
guotuofeng previously approved these changes Nov 9, 2023
PatriceVignola merged commit 28cf0dc into main on Nov 10, 2023
31 checks passed
PatriceVignola deleted the user/pavignol/directml-llama-sample-2 branch on November 10, 2023 at 02:24

4 participants