0.8.0 rc3 cherry picks #1475

RyanUnderhill · 2025-05-12T22:06:32Z

Address previous PR review comments from #1470 (#1473)
Address QNN specific regressions (#1470)
Fix array eos_token_id handling (#1463)
Constrained decoding integration (#1381)
Remove BF16 CPU from valid GQA configuration (#1469)
Avoid adding providers if not requested (#1464)
Persist provider options across ClearProviders, AppendProvider where possible (#1454)
Fix accuracy issues with Gemma models (#1448)
Add bfloat16 support in model builder (#1447)
Add final norm for LoRA models (#1446)

Update version to 0.8.0-rc3

### Description

This PR adds the missing pattern to identify the final norm layer in LoRA models. It also cleans up some of the classes in the model builder.

### Motivation and Context

The missing final norm layer in LoRA models caused the generated LoRA models to be incorrect.

### Description

This PR adds [bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) support in the model builder.

### Motivation and Context

Most SLMs and LLMs are trained in bfloat16 precision. Casting from bfloat16 to [float16](https://en.wikipedia.org/wiki/Half-precision_floating-point_format) can cause accuracy loss in models (e.g. Google's Gemma model family).

The NumPy dependency when converting a [torch.Tensor](https://pytorch.org/docs/stable/tensors.html) object to an [ONNX TensorProto](https://onnx.ai/onnx/api/helper.html#onnx.helper.make_tensor) object has been removed. This will allow torch.Tensor objects in [other precisions that are not supported in NumPy](https://numpy.org/doc/stable/user/basics.types.html#relationship-between-numpy-data-types-and-c-data-types) to be converted to ONNX TensorProto objects.

This PR also fixes [this issue](#691).
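To make the accuracy concern concrete, here is a minimal standard-library sketch (not the model builder's code) of why a bfloat16 value can be lost when cast to float16. It relies on the fact that bfloat16 is the upper 16 bits of an IEEE float32, so it keeps float32's exponent range, while float16 has a much narrower range. All function names below are hypothetical helpers for illustration. `pack_bfloat16_raw` only shows that the 16-bit patterns can be serialized to little-endian bytes without NumPy; how the PR actually fills the TensorProto is not shown here.

```python
import struct


def to_bfloat16_bits(x: float) -> int:
    """Round x to the nearest bfloat16 and return its 16-bit pattern.

    bfloat16 is the upper half of an IEEE float32, so the conversion goes
    through float32 and rounds away the low 16 bits (round-to-nearest-even).
    NaN inputs are not handled in this sketch.
    """
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def from_bfloat16_bits(bits16: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def through_float16(x: float) -> float:
    """Round-trip x through IEEE float16 (struct format 'e')."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Magnitude is beyond the float16 range (max finite value is 65504).
        return float("inf") if x > 0 else float("-inf")


def pack_bfloat16_raw(values) -> bytes:
    """Little-endian bytes for a list of floats stored as bfloat16."""
    bits = [to_bfloat16_bits(v) for v in values]
    return struct.pack(f"<{len(bits)}H", *bits)


if __name__ == "__main__":
    for value in (1e-10, 70000.0):
        bf16 = from_bfloat16_bits(to_bfloat16_bits(value))
        print(value, "-> bfloat16", bf16, "-> float16", through_float16(bf16))
```

Small magnitudes such as `1e-10` survive in bfloat16 but underflow to zero in float16, and magnitudes above the float16 range overflow, which is the kind of loss that keeping weights in bfloat16 avoids.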

### Description

This PR fixes accuracy issues with Google's Gemma models by using bfloat16 precision, [always using float32 precision to compute any LayerNorms](https://github.com/huggingface/transformers/blob/fee1190601b5d04ec6d3f7f58fd22788d7f3236d/src/transformers/models/gemma3/modeling_gemma3.py#L141-L146), and always casting the output logits to float32.

### Motivation and Context

This PR has been tested with Gemma-2 and Gemma-3. It uses the bfloat16 changes from [this PR](#1447) as well as the missing final norm changes from [this PR](#1420).

---------

Co-authored-by: Nenad Banfic <46795300+nenad1002@users.noreply.github.com>
Co-authored-by: Nenad Banfic <nebanfic@microsoft.com>
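The "compute the norm in higher precision, then cast back" pattern can be sketched in plain Python as follows. This is an illustration only, not the model builder's graph: Python floats (float64) stand in for the float32 upcast, the RMSNorm form with a `(1 + weight)` scale is an assumption about the linked Hugging Face implementation, and `round_to_bfloat16` and `rms_norm` are hypothetical names.

```python
import math
import struct


def round_to_bfloat16(x: float) -> float:
    """Round x to the nearest value representable in bfloat16.

    NaN inputs are not handled in this sketch.
    """
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the 16 low bits, then clear them.
    rounded = (bits32 + 0x7FFF + ((bits32 >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", rounded))[0]


def rms_norm(hidden, weight, eps=1e-6):
    """RMSNorm where all arithmetic happens before the bfloat16 cast.

    hidden and weight are lists of floats that are assumed to hold
    bfloat16 values; only the final output is rounded back down.
    """
    mean_sq = sum(h * h for h in hidden) / len(hidden)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    out = [h * inv_rms * (1.0 + w) for h, w in zip(hidden, weight)]
    return [round_to_bfloat16(v) for v in out]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [0.0, 0.0]))
```

The point of the design is that the squared sum and the reciprocal square root never pass through the 8-bit bfloat16 significand; rounding happens once, at the output.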


Most CPUs do not support BF16, so this removes it as a valid option because some of the underlying kernel implementations are missing.

Integrate constrained decoding using the LLGuidance library. Based on Ying's constrained decoding branch (`yingxiong/constrained_decoding`).

---------

Co-authored-by: Ying Xiong <yingxiong@microsoft.com>
Co-authored-by: Michał Moskal <michal@moskal.me>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Co-authored-by: Ryan Hill <38674843+RyanUnderhill@users.noreply.github.com>
Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
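For readers unfamiliar with the technique, constrained decoding generally works by masking the logits of tokens that the constraint does not allow before the next token is selected. The sketch below shows only that general idea with a toy vocabulary. It does not use or mirror the LLGuidance API, and `apply_token_mask` and `greedy_pick` are hypothetical names.

```python
import math


def apply_token_mask(logits, allowed_token_ids):
    """Return a copy of logits with every disallowed token set to -inf."""
    allowed = set(allowed_token_ids)
    return [
        logit if token_id in allowed else -math.inf
        for token_id, logit in enumerate(logits)
    ]


def greedy_pick(logits):
    """Index of the highest logit (the first one wins on ties)."""
    return max(range(len(logits)), key=lambda token_id: logits[token_id])


if __name__ == "__main__":
    vocab = ["hello", "{", "}", '"k"']
    logits = [5.0, 1.0, 0.5, 0.2]

    # Toy constraint: the output must start with "{", so only id 1 is allowed.
    allowed_first_tokens = [1]

    print("unconstrained:", vocab[greedy_pick(logits)])
    masked = apply_token_mask(logits, allowed_first_tokens)
    print("constrained:  ", vocab[greedy_pick(masked)])
```

In a real integration the allowed set would come from the grammar or schema state after each generated token rather than from a fixed list.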


RyanUnderhill and others added 12 commits May 12, 2025 13:41

Update version

f8fd5ee

Persist provider options across ClearProviders, AppendProvider where …

7cc7c81

…possible (#1454)

Avoid adding providers if not requested (#1464)

9100f53

Remove BF16 CPU from valid GQA configuration (#1469)

53b64ac

Most CPUs do not support BF16, so this removes it as a valid option because some of the underlying kernel implementations are missing.

Fix array eos_token_id handling (#1463)

537c739

Address QNN specific regressions (#1470)

24f4b4f

Address previous PR review comments from #1470 (#1473)

1105548

Remove a small block of code that cannot work without the following PR, which is not part of this branch:

4014c92

Update windows packaging pipelines to use build.py by aciddelgado · Pull Request #1468 · microsoft/onnxruntime-genai

baijumeswani approved these changes May 14, 2025


baijumeswani merged commit c22b15a into rel-0.8.0 May 14, 2025
13 checks passed

baijumeswani deleted the ryanunderhill/rc3_cherry_picks branch May 14, 2025 16:38

natke changed the title ~~Ryanunderhill/rc3 cherry picks~~ 0.8.0 rc3 cherry picks Jun 3, 2025
