
onnxruntime 1.17.0 - fp16 model of inswapper causing render issues #19437

Closed

henryruhs opened this issue Feb 6, 2024 · 13 comments


henryruhs commented Feb 6, 2024

Describe the issue

Since we updated to onnxruntime==1.17.0, the float16 version of the inswapper model has stopped working and produces broken results depending on the integration.

Falling back to onnxruntime==1.16.3 resolves the issue. It seems to be broken for the CPU and CUDA execution providers but works with TensorRT.

Distorted face (CUDA):

[Image: Broken1]

Face box rendered black (CPU):

[Image: Broken2]

To reproduce

I created a dedicated repository to reproduce the issue and convert the model.

https://github.com/henryruhs/onnxruntime-fp16-issue

Urgency

Not sure how to define urgency, but this affects thousands of users, as our project (FaceFusion) is kinda popular.

Platform

Linux

OS Version

Ubuntu 22 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU, CUDA

Execution Provider Library Version

No response


tianleiwu commented Feb 6, 2024

I can reproduce the issue. Let me try dumping node inputs/outputs to see which operator causes the result change.

tianleiwu self-assigned this Feb 6, 2024

tianleiwu commented Feb 6, 2024

The difference is caused by #17953: some Cast nodes are no longer removed in ORT 1.17.0.

For example, the Cast node after Mul that causes the overflow can be removed safely using an offline tool:

[Image: graph snippet showing the Cast node after Mul]

The proper fix belongs in the fp16 conversion tool, which should not add extra Cast nodes that can cause overflow. Some simple post-processing like the following should be enough:

import onnx
from onnxruntime.transformers.onnx_model import OnnxModel

# Remove the cascaded Cast nodes that the fp16 conversion tool added.
onnx_model = OnnxModel(onnx.load("inswapper_128_fp16.onnx"))
onnx_model.remove_cascaded_cast_nodes()
onnx_model.save_model_to_file("inswapper_128_fp16_v2.onnx", use_external_data_format=False, all_tensors_to_one_file=True)

Here is an example change to the run.py script that makes it run under ORT 1.17:

import onnx
import onnxruntime
from onnx import numpy_helper
from onnxruntime.transformers.onnx_model import OnnxModel

SWAPPER_MODEL_PATH = 'models/inswapper_128_fp16.onnx'
INSWAPPER_MATRIX = numpy_helper.to_array(onnx.load(SWAPPER_MODEL_PATH).graph.initializer[-1])

# Remove the cascaded Cast nodes and save a fixed copy of the model.
fixed_model_path = 'models/inswapper_128_fp16_fixed.onnx'
onnx_model = OnnxModel(onnx.load(SWAPPER_MODEL_PATH))
onnx_model.remove_cascaded_cast_nodes()
onnx_model.save_model_to_file(fixed_model_path, use_external_data_format=False, all_tensors_to_one_file=True)

# Run the fixed model instead of the original one.
provider = 'CUDAExecutionProvider'
INSWAPPER = onnxruntime.InferenceSession(fixed_model_path, providers=[provider])
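
For reference, a minimal sketch for quantifying how much removing the cascaded Cast nodes changes the outputs; it builds random feeds from the session metadata and assumes any symbolic dimension can be set to 1 (the file paths are assumptions):

import numpy
import onnxruntime

ORIGINAL_MODEL_PATH = 'models/inswapper_128_fp16.onnx'
FIXED_MODEL_PATH = 'models/inswapper_128_fp16_fixed.onnx'

def random_feeds(session):
    # Build random inputs from the session metadata; symbolic dims become 1.
    feeds = {}
    for node_input in session.get_inputs():
        shape = [dim if isinstance(dim, int) else 1 for dim in node_input.shape]
        dtype = numpy.float16 if node_input.type == 'tensor(float16)' else numpy.float32
        feeds[node_input.name] = numpy.random.rand(*shape).astype(dtype)
    return feeds

original = onnxruntime.InferenceSession(ORIGINAL_MODEL_PATH, providers=['CUDAExecutionProvider'])
fixed = onnxruntime.InferenceSession(FIXED_MODEL_PATH, providers=['CUDAExecutionProvider'])

feeds = random_feeds(original)
for before, after in zip(original.run(None, feeds), fixed.run(None, feeds)):
    diff = numpy.abs(before.astype(numpy.float32) - after.astype(numpy.float32))
    print('max output difference:', diff.max())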


tianleiwu commented Feb 7, 2024

For the CPU execution provider, it is better to run the fp32 model. The CPU cannot run fp16 in most computation operators and needs to convert to fp32 to run those operators (you can see this by saving the optimized model from the CPU provider). If you run a benchmark, the fp32 model should be faster than the fp16 model on CPU.
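
As a side note, a minimal sketch of saving that optimized model via SessionOptions.optimized_model_filepath (the file paths are assumptions):

import onnxruntime

# Ask ORT to save the optimized graph that the CPU provider will actually run.
session_options = onnxruntime.SessionOptions()
session_options.optimized_model_filepath = 'models/inswapper_128_fp16_cpu_optimized.onnx'
onnxruntime.InferenceSession('models/inswapper_128_fp16.onnx', session_options,
                             providers=['CPUExecutionProvider'])
# Opening the saved model (e.g. in Netron) shows the Cast-to-fp32 nodes
# inserted around operators that only have fp32 CPU kernels.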

henryruhs commented:

Do you suggest waiting for a fix, or should we re-convert the model with the suggested tweaks?


tianleiwu commented Feb 12, 2024

@henryruhs, please re-convert the model. That's the fastest way to work around the issue.

The fix I mentioned needs to be done in onnxconverter-common; you can track its status here: microsoft/onnxconverter-common#271.

henryruhs commented:

@tianleiwu We tried onnx_model.remove_cascaded_cast_nodes() but it does not work. Therefore we need to wait for a fix on your side. What about a compat mode for the time being?

tianleiwu commented:

@henryruhs, I verified that the issue was resolved using the code snippet I provided above. Note that the list of initializers might have changed, so you might need to use the name to find the initializer for INSWAPPER_MATRIX.

henryruhs commented:

@tianleiwu Are you using the CUDA 12.2 version of onnxruntime 1.17.x?


tianleiwu commented Mar 22, 2024

> @tianleiwu Are you using the CUDA 12.2 version of onnxruntime 1.17.x?

Yes. I think it should also work with the CUDA 11.8 version of ORT 1.17.x once the extra Cast nodes are removed from the onnx model.


henryruhs commented Apr 5, 2024

@tianleiwu First, thanks for your patience. I revisited the issue and figured out that the "fixed" version does indeed work when using the initializer from the original (pre-fix) model.

That being said, removing the cascaded Cast nodes seems to mess up the shape or internals of the last initializer:

source_embedding = numpy.dot(source_embedding, INSWAPPER_MATRIX) / numpy.linalg.norm(source_embedding)
ValueError: shapes (1,512) and (1,) not aligned: 512 (dim 1) != 1 (dim 0)

I can verify the report with this code - it works under CUDA 12:

INSWAPPER_MATRIX = numpy.load('./MODEL_INITIALIZER.npy')

File: https://github.com/henryruhs/onnxruntime-fp16-issue/raw/master/inswapper_initializer.npy

Not sure what we can do from here; I wish we could undo the changes from #17953.

tianleiwu commented:

You can get the initializer by name (assuming that the initializer name does not change).

As for #17953, we will keep it. Maybe we can add an option to remove cascaded Cast nodes as a native optimizer so that users can apply it without a Python script.
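
For illustration, a minimal sketch of looking the initializer up by name instead of by position (the value of TARGET_NAME is a placeholder; inspect the graph for the real name):

import onnx
from onnx import numpy_helper

model = onnx.load('models/inswapper_128_fp16_fixed.onnx')

# Placeholder name; inspect model.graph.initializer to find the real one.
TARGET_NAME = 'initializer'
INSWAPPER_MATRIX = next(
    numpy_helper.to_array(tensor)
    for tensor in model.graph.initializer
    if tensor.name == TARGET_NAME
)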

henryruhs commented:

I fixed it by adding the initializer back afterwards:

import onnx
from onnx import numpy_helper
from onnxruntime.transformers.onnx_model import OnnxModel

import numpy as np

PATH = '.assets/models/'
SWAPPER_MODEL_PATH = PATH + 'inswapper_128_fp16.onnx'

model = onnx.load(SWAPPER_MODEL_PATH)

# Remove the cascaded Cast nodes that break ORT 1.17.
onnx_model = OnnxModel(model)
onnx_model.remove_cascaded_cast_nodes()

# Append the previously exported initializer back to the graph.
INSWAPPER_INITIALIZER = np.load('initializer.npy')
model.graph.initializer.append(numpy_helper.from_array(INSWAPPER_INITIALIZER, name='initializer'))

onnx_model.save_model_to_file(PATH + 'inswapper_128_fp16_v2.onnx', use_external_data_format=False, all_tensors_to_one_file=True)

henryruhs commented:

Thanks again for the support
