
Bug at line 125 of scheduler/dispatcher.py #51

Closed

Menace-Dragon opened this issue Apr 12, 2022 · 11 comments
@Menace-Dragon

Great project! But I hit an error when running the efficientnet-lite4-11.onnx model from the ONNX model zoo. The error is raised at line 125 of scheduler/dispatcher.py. My analysis of the cause:

  1. The model's graph contains the flow ···-->Conv-->BN-->Clip-->···. PPQ fuses Conv and BN by default, but the fused operation is appended to the end of graph.operations.
  2. When binding a platform to Clip, the statement at line 125 of scheduler/dispatcher.py is executed.

Putting 1 and 2 together: at that point dispatching_table has no entry for the fused ConvBN operation, which causes the error. It is an ordering problem; I'll leave it to you to decide how best to fix it. A sketch of the failing lookup is below.
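For illustration, a minimal standalone sketch of the ordering problem (this is not PPQ's actual code; the names just mirror the description above):

# dispatching_table is built from the pre-fusion graph
graph_operations = ['Conv', 'BN', 'Clip']
dispatching_table = {name: 'INT8' for name in graph_operations}

# ConvBN fusion removes Conv/BN and appends a fused op the table never saw
graph_operations = ['Clip', 'ConvBN']

for name in graph_operations:
    platform = dispatching_table[name]  # KeyError: 'ConvBN'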

@ZhangZhiPku (Collaborator)

Acknowledged; this seems to be one of the leftovers from the last change to the scheduler logic.
As you saw, what line 125 of dispatcher.py does is force an activation function onto the same platform as the compute operator that precedes it, which is necessary for neural network deployment.
I will try to fix this with a patch; in the meantime, you could simply comment out the statements around line 125 and see whether it runs.

As for efficientnet, I recall that the results our executor computes do not quite match onnx runtime's, though I don't yet know what the problem is.

@ZhangZhiPku (Collaborator)

That's odd; I just ran this model and it went through perfectly happily...
The code I used is shown below:

import torch

from ppq import *
from ppq.api import *

# 32 random calibration batches; this model takes NHWC input.
d = [torch.rand(1, 224, 224, 3) for _ in range(32)]
quantize_onnx_model(
    onnx_import_file='Models/cls_model/efficientnet-lite4-11.onnx',
    calib_dataloader=d,
    calib_steps=32,
    input_shape=[1, 224, 224, 3],
    collate_fn=lambda x: x.to('cuda'),
)

The model was downloaded from https://github.com/onnx/models/blob/main/vision/classification/efficientnet-lite4/model/efficientnet-lite4-11.onnx

@Menace-Dragon (Author)

Hmm, does that run actually involve ConvBN fusion? (I'm not familiar with PPQ's internals, haha.) I found that my problem only shows up when ConvBN fusion is performed.

@ZhangZhiPku (Collaborator)

ZhangZhiPku commented Apr 12, 2022

I believe it did... without BN fusion it should have failed with a different error...
Maybe check whether the ppq you downloaded is the latest version? Or just comment out those lines for now and it should be fine. I looked at your network; it doesn't need any complex scheduling, running the whole net in int8 is enough.

@Menace-Dragon (Author)

Haha, I patched the code myself: check whether dispatching_table contains source_op and handle the two cases separately, and it works now. A rough sketch of the guard is below.
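Something like this (hypothetical names; the actual statement around dispatcher.py line 125 may look different):

def lookup_platform(op, source_op, dispatching_table):
    # Guarded lookup: operations appended after ConvBN fusion are missing
    # from the table, so fall back to the source op's own platform attribute.
    if source_op.name in dispatching_table:
        return dispatching_table[source_op.name]
    return source_op.platform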

I then ran into another raise Exception, from line 254 of quantization/observer/range.py, saying "torch.quantile can not handle such many values". What does that mean?

@Menace-Dragon (Author)

By the way, could I trouble you to list the models that are currently supported (i.e. can run quantization to completion)? 😣

@a1trl9 (Contributor)

a1trl9 commented Apr 12, 2022

torch.quantile

torch.quantile has a limit on the number of tensor values when run on CPU; maybe try running it on GPU? Or try switching to a different observer.

@ZhangZhiPku (Collaborator)

torch.quantile can only handle tensors with up to 16,777,216 elements; that seems to be a limitation in PyTorch itself. This function is used by the percentile observer. If you enable the cuda kernels, we use our own hand-written quantile function, which has no such limit. And, as a1trl9 said, you can also switch observers with the following code:

QUANT_SETTING = QuantizationSettingFactory.pplcuda_setting()
QUANT_SETTING.quantize_activation_setting.calib_algorithm = 'minmax'
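For context, the CPU limit is easy to reproduce in plain PyTorch (the exact error message may vary across versions):

import torch

x = torch.rand(2 ** 24 + 1)  # one element past the 16,777,216 limit
try:
    torch.quantile(x, 0.99)
except RuntimeError as err:
    print(err)  # PyTorch complains that the input tensor is too large

The QUANT_SETTING above would then be passed into the quantization call, e.g. quantize_onnx_model(..., setting=QUANT_SETTING), assuming your ppq version exposes that parameter.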

@Menace-Dragon (Author)

👌👌

@ZhangZhiPku (Collaborator)

Our support is per-operator: as long as your model is composed of the operators below, with nothing outside this set, it will work (a quick way to check your own model against this list is sketched below):
'BatchNormalization_forward', 'Cast_forward', 'Clip_forward', 'Concat_forward',
'Constant_forward', 'ConstantOfShape_forward', 'Conv_forward', 'Eltwise_forward', 'Equal_forward',
'UnaryEltwise_forward', 'Expand_forward', 'Flatten_forward', 'Gather_forward', 'GatherND_forward', 'Gemm_forward',
'Grid_sampler_forward', 'AveragePool_forward', 'Greater_forward', 'Less_forward', 'MatMul_forward',
'MaxPool2d_forward', '_NMS_forward', 'NonZero_forward', 'Not_forward', 'Range_forward',
'ReduceL2_forward', 'ReduceMax_forward', 'Reshape_forward', 'Resize_forward', 'ScatterElements_forward',
'ScatterND_forward', 'Shape_forward', 'Slice_forward', 'Softmax_forward', 'Squeeze_forward', 'Tile_forward',
'TopK_forward', 'Transpose_forward', 'Unsqueeze_forward', 'Where_forward', 'ReduceSum_forward', 'ArgMax_forward',
'Split_forward', 'ReduceMean_forward', 'PRelu_forward', 'Pad_forward', 'LeakyRelu_forward', 'ConvTranspose_forward',
'Sqrt_forward', 'Log_forward', 'Floor_forward', 'RoiAlign_forward', 'MMCVRoiAlign_forward', 'SpaceToDepth_forward',
'DepthToSpace_forward', 'Tanh_forward', 'Pow_forward', 'Crop_forward', 'ChannelShuffle_forward',
'InstanceNormalization_forward', 'Parameter_forward', 'Interp_forward', 'CaffeArgMax_forward'

No matter how these operators are connected, ppq should be able to quantize the model (in principle), and for complex models ppq will automatically partition the graph and schedule it.

Note that some models in the onnx model zoo contain Loop or If operators; we do not support those, and the backends most likely cannot run them either.
Overall, the set of networks we can quantize is definitely larger than what the backends can run, but quantizing networks the backend cannot run has little value.

You should avoid operators that look exotic, such as swish, elu, sigmoid, tanh, if, loop, lstm, gru, nms, resize; backend support is much weaker than ours, and your network only has real value if it can actually be deployed on hardware.
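As a quick pre-check, one way to list the operator types an ONNX model actually uses and compare them against the set above (this helper is just a suggestion, not part of ppq):

import onnx

model = onnx.load('Models/cls_model/efficientnet-lite4-11.onnx')
op_types = sorted({node.op_type for node in model.graph.node})
print(op_types)  # anything not covered by the list above may need attention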

@Menace-Dragon (Author)

Great, thank you 👍👍👍
