parse onnx model failed #346

Closed
nanmi opened this issue Jan 6, 2023 · 6 comments

@nanmi

nanmi commented Jan 6, 2023

Problem:

kls@ubuntu:~/workspace$ ~/libraries/TensorRT-8.4.1.5/bin/trtexec --onnx=./unet-q.onnx --saveEngine=unet-int8.engine
&&&& RUNNING TensorRT.trtexec [TensorRT v8401] # /home/kls/libraries/TensorRT-8.4.1.5/bin/trtexec --onnx=./unet-q.onnx --saveEngine=unet-int8.engine
[01/06/2023-17:32:20] [I] === Model Options ===
[01/06/2023-17:32:20] [I] Format: ONNX
[01/06/2023-17:32:20] [I] Model: ./unet-q.onnx
[01/06/2023-17:32:20] [I] Output:
[01/06/2023-17:32:20] [I] === Build Options ===
[01/06/2023-17:32:20] [I] Max batch: explicit batch
[01/06/2023-17:32:20] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[01/06/2023-17:32:20] [I] minTiming: 1
[01/06/2023-17:32:20] [I] avgTiming: 8
[01/06/2023-17:32:20] [I] Precision: FP32
[01/06/2023-17:32:20] [I] LayerPrecisions: 
[01/06/2023-17:32:20] [I] Calibration: 
[01/06/2023-17:32:20] [I] Refit: Disabled
[01/06/2023-17:32:20] [I] Sparsity: Disabled
[01/06/2023-17:32:20] [I] Safe mode: Disabled
[01/06/2023-17:32:20] [I] DirectIO mode: Disabled
[01/06/2023-17:32:20] [I] Restricted mode: Disabled
[01/06/2023-17:32:20] [I] Build only: Disabled
[01/06/2023-17:32:20] [I] Save engine: unet-int8.engine
[01/06/2023-17:32:20] [I] Load engine: 
[01/06/2023-17:32:20] [I] Profiling verbosity: 0
[01/06/2023-17:32:20] [I] Tactic sources: Using default tactic sources
[01/06/2023-17:32:20] [I] timingCacheMode: local
[01/06/2023-17:32:20] [I] timingCacheFile: 
[01/06/2023-17:32:20] [I] Input(s)s format: fp32:CHW
[01/06/2023-17:32:20] [I] Output(s)s format: fp32:CHW
[01/06/2023-17:32:20] [I] Input build shapes: model
[01/06/2023-17:32:20] [I] Input calibration shapes: model
[01/06/2023-17:32:20] [I] === System Options ===
[01/06/2023-17:32:20] [I] Device: 0
[01/06/2023-17:32:20] [I] DLACore: 
[01/06/2023-17:32:20] [I] Plugins:
[01/06/2023-17:32:20] [I] === Inference Options ===
[01/06/2023-17:32:20] [I] Batch: Explicit
[01/06/2023-17:32:20] [I] Input inference shapes: model
[01/06/2023-17:32:20] [I] Iterations: 10
[01/06/2023-17:32:20] [I] Duration: 3s (+ 200ms warm up)
[01/06/2023-17:32:20] [I] Sleep time: 0ms
[01/06/2023-17:32:20] [I] Idle time: 0ms
[01/06/2023-17:32:20] [I] Streams: 1
[01/06/2023-17:32:20] [I] ExposeDMA: Disabled
[01/06/2023-17:32:20] [I] Data transfers: Enabled
[01/06/2023-17:32:20] [I] Spin-wait: Disabled
[01/06/2023-17:32:20] [I] Multithreading: Disabled
[01/06/2023-17:32:20] [I] CUDA Graph: Disabled
[01/06/2023-17:32:20] [I] Separate profiling: Disabled
[01/06/2023-17:32:20] [I] Time Deserialize: Disabled
[01/06/2023-17:32:20] [I] Time Refit: Disabled
[01/06/2023-17:32:20] [I] Inputs:
[01/06/2023-17:32:20] [I] === Reporting Options ===
[01/06/2023-17:32:20] [I] Verbose: Disabled
[01/06/2023-17:32:20] [I] Averages: 10 inferences
[01/06/2023-17:32:20] [I] Percentile: 99
[01/06/2023-17:32:20] [I] Dump refittable layers:Disabled
[01/06/2023-17:32:20] [I] Dump output: Disabled
[01/06/2023-17:32:20] [I] Profile: Disabled
[01/06/2023-17:32:20] [I] Export timing to JSON file: 
[01/06/2023-17:32:20] [I] Export output to JSON file: 
[01/06/2023-17:32:20] [I] Export profile to JSON file: 
[01/06/2023-17:32:20] [I] 
[01/06/2023-17:32:20] [I] === Device Information ===
[01/06/2023-17:32:20] [I] Selected Device: NVIDIA A10
[01/06/2023-17:32:20] [I] Compute Capability: 8.6
[01/06/2023-17:32:20] [I] SMs: 72
[01/06/2023-17:32:20] [I] Compute Clock Rate: 1.695 GHz
[01/06/2023-17:32:20] [I] Device Global Memory: 22731 MiB
[01/06/2023-17:32:20] [I] Shared Memory per SM: 100 KiB
[01/06/2023-17:32:20] [I] Memory Bus Width: 384 bits (ECC enabled)
[01/06/2023-17:32:20] [I] Memory Clock Rate: 6.251 GHz
[01/06/2023-17:32:20] [I] 
[01/06/2023-17:32:20] [I] TensorRT version: 8.4.1
[01/06/2023-17:32:21] [I] [TRT] [MemUsageChange] Init CUDA: CPU +535, GPU +0, now: CPU 542, GPU 499 (MiB)
[01/06/2023-17:32:21] [I] Start parsing network model
[01/06/2023-17:32:21] [I] [TRT] ----------------------------------------------------------------
[01/06/2023-17:32:21] [I] [TRT] Input filename:   ./unet-q.onnx
[01/06/2023-17:32:21] [I] [TRT] ONNX IR version:  0.0.7
[01/06/2023-17:32:21] [I] [TRT] Opset version:    13
[01/06/2023-17:32:21] [I] [TRT] Producer name:    PPL Quantization Tool
[01/06/2023-17:32:21] [I] [TRT] Producer version: 
[01/06/2023-17:32:21] [I] [TRT] Domain:           
[01/06/2023-17:32:21] [I] [TRT] Model version:    0
[01/06/2023-17:32:21] [I] [TRT] Doc string:       
[01/06/2023-17:32:21] [I] [TRT] ----------------------------------------------------------------
[01/06/2023-17:32:21] [E] [TRT] ModelImporter.cpp:720: While parsing node number 23 [QuantizeLinear -> "PPQ_Variable_297"]:
[01/06/2023-17:32:21] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[01/06/2023-17:32:21] [E] [TRT] ModelImporter.cpp:722: input: "outc.conv.weight"
input: "PPQ_Variable_295"
input: "PPQ_Variable_296"
output: "PPQ_Variable_297"
name: "PPQ_Operation_98"
op_type: "QuantizeLinear"
attribute {
  name: "axis"
  i: 0
  type: INT
}

[01/06/2023-17:32:21] [E] [TRT] ModelImporter.cpp:723: --- End node ---
[01/06/2023-17:32:21] [E] [TRT] ModelImporter.cpp:726: ERROR: builtin_op_importers.cpp:1096 In function QuantDequantLinearHelper:
[6] Assertion failed: axis == INVALID_AXIS && "Quantization axis attribute is not valid with a single quantization scale"
[01/06/2023-17:32:21] [E] Failed to parse onnx file
[01/06/2023-17:32:21] [I] Finish parsing network model
[01/06/2023-17:32:21] [E] Parsing model failed
[01/06/2023-17:32:21] [E] Failed to create engine from model or file.
[01/06/2023-17:32:21] [E] Engine set up failed
@ZhangZhiPku
Collaborator

This problem arises because the definition of the QuantizeLinear operator is not universally agreed upon. See the ONNX documentation:
https://github.com/onnx/onnx/blob/main/docs/Operators.md#QuantizeLinear

QuantizeLinear takes an axis attribute that specifies the axis along which quantization is applied. For per-channel quantization that is perfectly reasonable, but activation (per-tensor) quantization has no meaningful axis; in that case the ONNX spec says the axis attribute can be ignored.

In practice, however, vendors' inference frameworks simply do not follow this part of the standard. OpenVINO, for example, demands an axis attribute whether the quantization is per-channel or per-tensor. TensorRT, as the error you are seeing shows, clearly does not want the axis attribute to be present at all.

To work around this, you can manually edit ppq/parser/onnxruntime_exporter.py. The functions insert_quantize_node and insert_dequantize_node are responsible for inserting the Quantize and Dequantize nodes into the network and contain this code:

  if config.policy.has_property(QuantizationProperty.PER_CHANNEL):
      created.attributes['axis'] = config.channel_axis
  else:
      created.attributes['axis'] = None

Simply delete the else clause; PPQ will then no longer write an axis attribute for per-tensor quantization.

@nanmi
Author

nanmi commented Jan 9, 2023

I commented out the per-tensor axis in insert_quantize_node and insert_dequantize_node, but the exported model still contains an axis attribute.
unet model

@ZhangZhiPku
Collaborator

So it is per-channel?

@nanmi
Author

nanmi commented Jan 9, 2023

So it is per-channel?

Possibly, but TensorRT seems unable to parse QuantizeLinear whenever any axis attribute is present. [not yet verified]

@nanmi
Author

nanmi commented Jan 9, 2023

So it is per-channel?

I found that the activations all use per-tensor quantization while the weights use per-channel. I am not yet sure how to configure everything to use per-tensor mode.

@ZhangZhiPku
Collaborator

ZhangZhiPku commented Jan 9, 2023

If you want everything quantized per-tensor, go into the ppq/quantization/quantizer folder, find the quantizer you are using, and modify it. It is not very difficult.
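For context, the numerical difference between the two modes can be sketched with numpy (an illustration, not PPQ code): per-tensor quantization derives one scalar scale for the whole tensor, while per-channel derives one scale per output channel, and it is the latter that forces an axis attribute into the exported graph.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3)).astype(np.float32)  # conv weight, OIHW

# Per-tensor: one symmetric int8 scale for the whole tensor.
scale_pt = np.abs(w).max() / 127.0                 # scalar

# Per-channel: one scale per output channel (axis 0 for conv weights).
scale_pc = np.abs(w).max(axis=(1, 2, 3)) / 127.0   # shape (8,)

print(np.ndim(scale_pt))  # 0  -> exported as a scalar scale, no axis needed
print(scale_pc.shape)     # (8,) -> exported with axis=0
```

Forcing per-tensor for weights trades some accuracy for a simpler, parser-friendly graph.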

Also, if you are deploying with TensorRT: newer versions of PPQ pass quantization information via onnx+json, yet you seem to be deploying via qdq+onnx. How did that come about?
