
onnxruntime 1.17.0 - fp16 model of inswapper causing render issues #19437

Closed

henryruhs opened this issue Feb 6, 2024 · 13 comments


henryruhs commented Feb 6, 2024

Describe the issue

Since we updated to onnxruntime==1.17.0, the float16 version of the inswapper model has stopped working and produces broken results depending on the integration.

Falling back to onnxruntime==1.16.3 resolves the issue. It seems to be broken for the CPU and CUDA execution providers but works with TensorRT.

Distorted face (CUDA):

[Image: Broken1]

Face box rendered black (CPU):

[Image: Broken2]

To reproduce

I created a dedicated repository to reproduce the issue and convert the model.

https://github.com/henryruhs/onnxruntime-fp16-issue

Urgency

Not sure how to define urgency, but this affects thousands of users, as our project (FaceFusion) is kinda popular.

Platform

Linux

OS Version

Ubuntu 22 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU, CUDA

Execution Provider Library Version

No response


tianleiwu commented Feb 6, 2024

I can reproduce the issue. Let me try dumping node inputs/outputs to see which operator causes the result change.

tianleiwu self-assigned this Feb 6, 2024

tianleiwu commented Feb 6, 2024

The difference is caused by #17953: some Cast nodes are no longer removed in ORT 1.17.0.

For example, the Cast node after Mul that causes the overflow can be removed safely using an offline tool:

[Image: graph snippet showing the Cast node after Mul]

The proper fix belongs in the fp16 conversion tool, which should not add extra Cast nodes that can cause overflow. Some simple post-processing like the following should be enough:

import onnx
from onnxruntime.transformers.onnx_model import OnnxModel

# Remove the cascaded Cast nodes that the fp16 conversion tool added.
onnx_model = OnnxModel(onnx.load("inswapper_128_fp16.onnx"))
onnx_model.remove_cascaded_cast_nodes()
onnx_model.save_model_to_file("inswapper_128_fp16_v2.onnx", use_external_data_format=False, all_tensors_to_one_file=True)

Here is an example change to the run.py script that makes it run under ORT 1.17:

import onnx
import onnxruntime
from onnx import numpy_helper
from onnxruntime.transformers.onnx_model import OnnxModel

SWAPPER_MODEL_PATH = 'models/inswapper_128_fp16.onnx'
INSWAPPER_MATRIX = numpy_helper.to_array(onnx.load(SWAPPER_MODEL_PATH).graph.initializer[-1])

# Remove the cascaded Cast nodes and save a fixed copy of the model.
fixed_model_path = 'models/inswapper_128_fp16_fixed.onnx'
onnx_model = OnnxModel(onnx.load(SWAPPER_MODEL_PATH))
onnx_model.remove_cascaded_cast_nodes()
onnx_model.save_model_to_file(fixed_model_path, use_external_data_format=False, all_tensors_to_one_file=True)

# Run the fixed model instead of the original one.
provider = 'CUDAExecutionProvider'
INSWAPPER = onnxruntime.InferenceSession(fixed_model_path, providers=[provider])
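
For reference, a minimal sketch for quantifying how much removing the cascaded Cast nodes changes the outputs; it builds random feeds from the session metadata and assumes any symbolic dimension can be set to 1 (the file paths are assumptions):

import numpy
import onnxruntime

ORIGINAL_MODEL_PATH = 'models/inswapper_128_fp16.onnx'
FIXED_MODEL_PATH = 'models/inswapper_128_fp16_fixed.onnx'

def random_feeds(session):
    # Build random inputs from the session metadata; symbolic dims become 1.
    feeds = {}
    for node_input in session.get_inputs():
        shape = [dim if isinstance(dim, int) else 1 for dim in node_input.shape]
        dtype = numpy.float16 if node_input.type == 'tensor(float16)' else numpy.float32
        feeds[node_input.name] = numpy.random.rand(*shape).astype(dtype)
    return feeds

original = onnxruntime.InferenceSession(ORIGINAL_MODEL_PATH, providers=['CUDAExecutionProvider'])
fixed = onnxruntime.InferenceSession(FIXED_MODEL_PATH, providers=['CUDAExecutionProvider'])

feeds = random_feeds(original)
for before, after in zip(original.run(None, feeds), fixed.run(None, feeds)):
    diff = numpy.abs(before.astype(numpy.float32) - after.astype(numpy.float32))
    print('max output difference:', diff.max())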


tianleiwu commented Feb 7, 2024

For the CPU execution provider, it is better to run the fp32 model. The CPU cannot run fp16 in most computation operators and needs to convert to fp32 to run those operators (you can see this by saving the optimized model from the CPU provider). If you run a benchmark, the fp32 model should be faster than the fp16 model on CPU.
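
As a side note, a minimal sketch of saving that optimized model via SessionOptions.optimized_model_filepath (the file paths are assumptions):

import onnxruntime

# Ask ORT to save the optimized graph that the CPU provider will actually run.
session_options = onnxruntime.SessionOptions()
session_options.optimized_model_filepath = 'models/inswapper_128_fp16_cpu_optimized.onnx'
onnxruntime.InferenceSession('models/inswapper_128_fp16.onnx', session_options,
                             providers=['CPUExecutionProvider'])
# Opening the saved model (e.g. in Netron) shows the Cast-to-fp32 nodes
# inserted around operators that only have fp32 CPU kernels.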

henryruhs commented:

Do you suggest waiting for a fix, or should we re-convert the model with the suggested tweaks?


tianleiwu commented Feb 12, 2024

@henryruhs, please re-convert the model. That's the fastest way to work around the issue.

The fix I mentioned needs to be done in onnxconverter-common; you can track its status here: microsoft/onnxconverter-common#271.

henryruhs commented:

@tianleiwu We tried onnx_model.remove_cascaded_cast_nodes() but it does not work. Therefore we need to wait for a fix on your side. What about a compat mode for the time being?

tianleiwu commented:

@henryruhs, I verified that the issue was resolved using the code snippet I provided above. Note that the list of initializers might have changed, so you might need to use the name to find the initializer for INSWAPPER_MATRIX.

henryruhs commented:

@tianleiwu Are you using the CUDA 12.2 version of onnxruntime 1.17.x?


tianleiwu commented Mar 22, 2024

> @tianleiwu Are you using the CUDA 12.2 version of onnxruntime 1.17.x?

Yes. I think it should also work with the CUDA 11.8 version of ORT 1.17.x once the extra Cast nodes are removed from the onnx model.


henryruhs commented Apr 5, 2024

@tianleiwu First, thanks for your patience. I revisited the issue and figured out that the "fixed" version does indeed work when using the initializer from the original (pre-fix) model.

That being said, removing the cascaded Cast nodes seems to mess up the shape or internals of the last initializer:

source_embedding = numpy.dot(source_embedding, INSWAPPER_MATRIX) / numpy.linalg.norm(source_embedding)
ValueError: shapes (1,512) and (1,) not aligned: 512 (dim 1) != 1 (dim 0)

I can verify the report with this code - it works under CUDA 12:

INSWAPPER_MATRIX = numpy.load('./MODEL_INITIALIZER.npy')

File: https://github.com/henryruhs/onnxruntime-fp16-issue/raw/master/inswapper_initializer.npy

Not sure what we can do from here; I wish we could undo the changes from #17953.

tianleiwu commented:

You can get the initializer by name (assuming that the initializer name does not change).

As for #17953, we will keep it. Maybe we can add an option to remove cascaded Cast nodes as a native optimizer so that users can apply it without a Python script.
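
For illustration, a minimal sketch of looking the initializer up by name instead of by position (the value of TARGET_NAME is a placeholder; inspect the graph for the real name):

import onnx
from onnx import numpy_helper

model = onnx.load('models/inswapper_128_fp16_fixed.onnx')

# Placeholder name; inspect model.graph.initializer to find the real one.
TARGET_NAME = 'initializer'
INSWAPPER_MATRIX = next(
    numpy_helper.to_array(tensor)
    for tensor in model.graph.initializer
    if tensor.name == TARGET_NAME
)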

henryruhs commented:

I fixed it by adding the initializer back afterwards:

import onnx
from onnx import numpy_helper
from onnxruntime.transformers.onnx_model import OnnxModel

import numpy as np

PATH = '.assets/models/'
SWAPPER_MODEL_PATH = PATH + 'inswapper_128_fp16.onnx'

model = onnx.load(SWAPPER_MODEL_PATH)

# Remove the cascaded Cast nodes that break ORT 1.17.
onnx_model = OnnxModel(model)
onnx_model.remove_cascaded_cast_nodes()

# Append the previously exported initializer back to the graph.
INSWAPPER_INITIALIZER = np.load('initializer.npy')
model.graph.initializer.append(numpy_helper.from_array(INSWAPPER_INITIALIZER, name='initializer'))

onnx_model.save_model_to_file(PATH + 'inswapper_128_fp16_v2.onnx', use_external_data_format=False, all_tensors_to_one_file=True)

henryruhs commented:

Thanks again for the support
