
Add LLaMA 2 example for DirectML #701

Merged: 63 commits into main, Nov 10, 2023

Conversation

PatriceVignola (Contributor)

This PR adds the LLaMA 2 optimizations for DirectML, with examples and a sample ChatApp that was inspired by this but stripped down to an MVP and customized for DirectML.

It also adds a "CompositePyTorchModel", which follows the same principle as the composite Optimum model but for raw PyTorch models instead.

@@ -759,6 +759,43 @@ def to_json(self, check_object: bool = False):
return serialize_to_json(config, check_object)


class CompositePyTorchModel(PyTorchModel):
guotuofeng (Collaborator) commented Nov 7, 2023

What's the difference between this one and OptimumModel? Could we leverage OptimumModel or unify them? It seems the two classes are similar.

(Review thread on pyproject.toml: outdated, resolved)
trajepl (Contributor) commented Nov 7, 2023

For reference:
In the coming release of Optimum, the Optimum exporter can export only the merged models.
https://github.com/huggingface/optimum/pull/1257/files

Also, with nightly ORT/Optimum, Olive optimizes LLaMA 2 as follows:
https://github.com/microsoft/Olive/tree/main/examples/llama2

PatriceVignola (Contributor, PR author) replied:

What's the difference between this one and OptimumModel? Could we leverage OptimumModel or unify them? It seems the two classes are similar.

I decoupled OptimumModel and CompositePyTorchModel. They look similar at first glance, but they don't have much in common aside from the model_components part. OptimumModel is a very lightweight class that only returns strings of the actual model names, which are handled by Optimum in its own conversion pass, whereas CompositePyTorchModel is a container of multiple other PyTorchModels that are handled recursively in the main OnnxConversion pass.
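The container-plus-recursion idea described above can be sketched roughly as follows. This is an illustrative sketch only; the class and method names mirror the discussion but are not Olive's actual API.

```python
# Hypothetical sketch of the composite-model idea discussed in this thread.
# None of these names are Olive's real signatures.

class PyTorchModel:
    """A single PyTorch model plus the metadata a conversion pass needs."""

    def __init__(self, name):
        self.name = name

    def convert_to_onnx(self, output_dir):
        # A real pass would load the model and call torch.onnx.export here;
        # this sketch just returns the would-be output path.
        return f"{output_dir}/{self.name}.onnx"


class CompositePyTorchModel(PyTorchModel):
    """A container of component PyTorchModels, converted recursively."""

    def __init__(self, name, components):
        super().__init__(name)
        self.components = components  # list of PyTorchModel

    def convert_to_onnx(self, output_dir):
        # The main conversion pass recurses into each component.
        return [c.convert_to_onnx(output_dir) for c in self.components]
```

For example, a composite holding two components would yield one ONNX output per component when the conversion pass recurses into it.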

jambayk (Contributor) commented Nov 7, 2023

That's what I thought too when looking at the code.
On another note: I am thinking of removing the OptimumModel class entirely later. It is only used for OptimumConversion, and I don't see a reason why we can't just use PyTorchModel + hf_config, with OptimumConversion taking model_components as pass config. This pass would be able to return an ONNXModel or a CompositeONNXModel based on the number of components.
There are other Olive passes that also only support PyTorch models with hf_config.
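jambayk's suggestion, returning a single model or a composite depending on how many components the pass config lists, could look roughly like this. A minimal sketch: the class names mirror the discussion, but the function and its parameters are hypothetical, not Olive's real API.

```python
# Hypothetical sketch: a conversion pass that returns ONNXModel or
# CompositeONNXModel based on the number of components in the pass config.

class ONNXModel:
    """A single exported ONNX model."""

    def __init__(self, path):
        self.path = path


class CompositeONNXModel:
    """A group of exported ONNX models treated as one logical model."""

    def __init__(self, models):
        self.models = models


def optimum_conversion(model_components, output_dir):
    # The actual Optimum export is elided; each component becomes one file.
    exported = [ONNXModel(f"{output_dir}/{c}.onnx") for c in model_components]
    if len(exported) == 1:
        return exported[0]
    return CompositeONNXModel(exported)
```

With one component (e.g. a merged decoder) the pass returns a plain ONNXModel; with several it wraps them in a CompositeONNXModel, so downstream passes can branch on the result type.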

(Review thread on olive/model/__init__.py: outdated, resolved)
guotuofeng previously approved these changes Nov 9, 2023
PatriceVignola merged commit 28cf0dc into main on Nov 10, 2023
31 checks passed
PatriceVignola deleted the user/pavignol/directml-llama-sample-2 branch on November 10, 2023 at 02:24

4 participants