response map fusion implementation #77

meiqua · 2020-05-05T03:01:10Z

Motivation

According to the Halide paper, fusion can greatly speed up the creation of the response map. However, configuring Halide is not easy, and our response map doesn't need most of Halide's features. So implementing a simple tile-based fusion method is preferred. This is also the approach OpenCV 4 takes.

Related issues

Current works

Currently, a simple tile-based fusion pipeline is implemented, and gaussian / sobel / mag / phase / hist / spread ... are finished and tested. Refer to the fusion_by_hand branch for more info. The basic idea is to implement only tile-based fusion and do Halide's compilation work by hand. Though it is not as fancy as Halide, it simplifies the job a lot and is easy to use too.
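To make the fusion idea concrete, here is a minimal sketch, not the fusion_by_hand code: two separable stages (a hypothetical 1x3 horizontal box sum followed by a 3x1 vertical box sum, standing in for pairs like gauss1x5/gauss5x1) are run either unfused, with a full intermediate image, or fused, where each stage-1 row is produced on demand into a 3-row ring buffer and consumed immediately by stage 2. The names `Image`, `hsum_row`, `blur_unfused` and `blur_fused` are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical illustration: fuse a 1x3 horizontal box sum with a 3x1
// vertical box sum. Borders replicate the edge pixel (clamped coordinates).
using Image = std::vector<int>;  // row-major, width * height

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

// Stage 1 for row r: out[c] = in(r, c-1) + in(r, c) + in(r, c+1).
static void hsum_row(const Image& in, int w, int r, int* out) {
    const int* row = &in[r * w];
    for (int c = 0; c < w; ++c)
        out[c] = row[clampi(c - 1, 0, w - 1)] + row[c] + row[clampi(c + 1, 0, w - 1)];
}

// Reference: stage 1 over the whole image, then stage 2 over the whole image.
Image blur_unfused(const Image& in, int w, int h) {
    assert(static_cast<int>(in.size()) == w * h);
    Image tmp(in.size()), out(in.size());
    for (int r = 0; r < h; ++r) hsum_row(in, w, r, &tmp[r * w]);
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c)
            out[r * w + c] = tmp[clampi(r - 1, 0, h - 1) * w + c] + tmp[r * w + c] +
                             tmp[clampi(r + 1, 0, h - 1) * w + c];
    return out;
}

// Fused: stage-1 rows live in a 3-row ring buffer instead of a full image,
// so each row is consumed by stage 2 while it is still in cache.
Image blur_fused(const Image& in, int w, int h) {
    assert(static_cast<int>(in.size()) == w * h);
    Image out(in.size());
    std::vector<int> ring(3 * w);              // 3 rows of stage-1 output
    std::array<int, 3> slot_row{-1, -1, -1};   // source row held by each slot
    auto get_row = [&](int r) -> const int* {
        r = clampi(r, 0, h - 1);
        const int slot = r % 3;                // consecutive rows never share a slot
        if (slot_row[slot] != r) {             // stage 1 runs once per row
            hsum_row(in, w, r, &ring[slot * w]);
            slot_row[slot] = r;
        }
        return &ring[slot * w];
    };
    for (int r = 0; r < h; ++r) {
        const int* up = get_row(r - 1);
        const int* mid = get_row(r);
        const int* down = get_row(r + 1);
        for (int c = 0; c < w; ++c) out[r * w + c] = up[c] + mid[c] + down[c];
    }
    return out;
}
```

Both versions produce identical output; the fused one only trades the full `tmp` image for a few rows of scratch memory, which is where the speedup of fusion comes from.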

Results and TODOs

It is roughly 10x faster than the OpenCV implementation. We will use it to create the response map in the future.

See test_fusion.cpp for more examples. Any discussion, testing, or improvements are welcome!

Update

Now all tests pass and the match function can be used as usual! The full pipeline for creating the response map is about 6x faster, and images no longer need to be cropped to a multiple of 16 as before.
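One common way to drop a "width must be a multiple of 16" requirement is a vector body plus a scalar tail. The sketch below is an assumption about the general technique, not the branch's code: `kLanes` stands in for the 16 uint8_t lanes of a 128-bit register, and the per-pixel operation (`sat_add`) is a hypothetical example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: kLanes stands in for the number of uint8_t lanes in
// one 128-bit SIMD register. Whole blocks go through the "vector" body and
// the last width % kLanes pixels go through a scalar tail.
constexpr std::size_t kLanes = 16;

// Example per-pixel operation: saturating add of a constant.
inline std::uint8_t sat_add(std::uint8_t a, std::uint8_t b) {
    const unsigned s = unsigned(a) + unsigned(b);
    return static_cast<std::uint8_t>(s > 255u ? 255u : s);
}

void add_const_row(const std::uint8_t* src, std::uint8_t* dst, std::size_t width, std::uint8_t k) {
    assert(src != nullptr && dst != nullptr);
    std::size_t c = 0;
    // Body: whole blocks only. With real SIMD this inner loop would be one
    // load, one saturating add and one store.
    for (; c + kLanes <= width; c += kLanes)
        for (std::size_t i = 0; i < kLanes; ++i) dst[c + i] = sat_add(src[c + i], k);
    // Tail: the remaining width % kLanes pixels, one at a time.
    for (; c < width; ++c) dst[c] = sat_add(src[c], k);
}
```

With a width of 37, for example, the body handles 32 pixels and the tail the last 5, so no cropping is needed.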

Update

RGB images are now also supported, by calling cvtColor first. After investigating many solutions, we found that using OpenCV is the cleanest way. Compared with using a gray image, cvtColor costs only ~5% more.

DennisLiu-elogic · 2020-05-11T09:35:00Z

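Hi meiqua, it's me again (awkward)

I tried the hand-written fusion right away, and it crashes here.

The image is the same one with lots of hearts.

In VS, Enable Enhanced Instruction Set is set to SSE2.

If AVX2 is selected, it crashes at a different place.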

meiqua · 2020-05-11T09:58:07Z

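Usually this is because MIPP doesn't implement some functions on certain instruction sets. Does it run normally if you turn SIMD off first?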

DennisLiu-elogic · 2020-05-11T10:02:39Z


Now it crashes somewhere else.

meiqua · 2020-05-11T10:08:37Z

What's the error?

DennisLiu-elogic · 2020-05-11T10:13:21Z


I found that op_row was set to the wrong value. After changing it to 5:

meiqua · 2020-05-11T10:31:20Z

Is this the latest code run as-is? I'll find a Windows laptop and try it.

DennisLiu-elogic · 2020-05-11T10:37:14Z

fusion by hand branch

It's this fusion.h. I changed a few of the pointer declarations that use gauss_size so that VS could compile it. MIPP also comes from here.

meiqua · 2020-05-11T14:00:02Z

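@DennisLiu-elogic I tried it: using a vector where gauss_size is used, it runs with SIMD turned off.
With SIMD, everything except AVX2 crashes... I'll look into filling in everything MIPP doesn't define.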

DennisLiu-elogic · 2020-05-11T15:23:00Z


That's strange. Changing int32_t* parent_buf_ptr[gauss_size] to int32_t* parent_buf_ptr[5] causes errors even without SIMD...?

aemior · 2020-05-19T12:21:44Z

@meiqua Are there plans to update fusion for RGB images soon?

meiqua · 2020-05-20T13:26:32Z

@aemior I plan to solve this SIMD problem first, then do fusion for RGB2GRAY. Full RGB fusion is a bit troublesome and doesn't seem very necessary.
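An RGB2GRAY stage is a pure per-pixel kernel, so it fits naturally as the first node of a fused pipeline. A minimal sketch, assuming interleaved BGR input and the BT.601 weights that OpenCV documents for RGB to gray (Y = 0.299 R + 0.587 G + 0.114 B), expressed here in 14-bit fixed point; `bgr_to_gray` and `bgr_row_to_gray` are hypothetical names, not the branch's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch (assumption): BT.601 luma in 14-bit fixed point. The weights
// 1868 + 9617 + 4899 sum to 1 << 14, so white stays 255.
constexpr int kShift = 14;
constexpr int kB = 1868, kG = 9617, kR = 4899;

inline std::uint8_t bgr_to_gray(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    // + (1 << (kShift - 1)) rounds to nearest before the shift.
    return static_cast<std::uint8_t>((b * kB + g * kG + r * kR + (1 << (kShift - 1))) >> kShift);
}

// Converts one interleaved BGR row (3 bytes per pixel) to gray.
void bgr_row_to_gray(const std::uint8_t* bgr, std::uint8_t* gray, std::size_t width) {
    assert(bgr != nullptr && gray != nullptr);
    for (std::size_t x = 0; x < width; ++x)
        gray[x] = bgr_to_gray(bgr[3 * x], bgr[3 * x + 1], bgr[3 * x + 2]);
}
```

Because it works row by row, the converted gray row can be handed straight to the next stage (e.g. the horizontal Gaussian) instead of being written out as a full image first.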

aemior · 2020-05-22T02:19:54Z

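@meiqua OK. I've built an RGB pipeline on my side. For detecting different objects in natural scenes, RGB should improve accuracy; for industrial scenes it's indeed unnecessary.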

mangoeffect · 2020-05-22T03:25:09Z

Hello, I tested fusion, but it doesn't compile in VS.

Is it OK to define an array like this?
Also, the -Wno-sign-compare flag has no effect in VS.

meiqua · 2020-05-24T06:37:26Z

@mangosroom The VS compiler doesn't support variable-length arrays; the new commit changes them to vector, which works.
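For reference, a small sketch of that change. The declaration `int32_t* parent_buf_ptr[gauss_size]` with a runtime `gauss_size` is a variable-length array, which GCC and Clang accept as an extension but MSVC rejects; a `std::vector` gives the same runtime-sized array in standard C++. A fixed size such as `[5]` is only safe if `gauss_size` can never exceed it. `make_row_ptrs` and the row-major buffer layout are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: one pointer per row of a row-major scratch buffer.
// Replaces the non-standard VLA `std::int32_t* parent_buf_ptr[gauss_size];`.
std::vector<std::int32_t*> make_row_ptrs(std::vector<std::int32_t>& buf, int gauss_size, int row_len) {
    assert(gauss_size > 0 && row_len > 0);
    assert(static_cast<long long>(buf.size()) >= static_cast<long long>(gauss_size) * row_len);
    std::vector<std::int32_t*> parent_buf_ptr(gauss_size);   // sized at run time
    for (int i = 0; i < gauss_size; ++i)
        parent_buf_ptr[i] = buf.data() + i * row_len;         // start of row i
    return parent_buf_ptr;
}
```

Indexing `parent_buf_ptr[i]` works exactly as with the array, so the surrounding code does not need to change.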

mangoeffect · 2020-05-24T07:10:03Z

Yes, I changed it the same way. Algorithm-level code is best written in standard C++.

meiqua · 2020-05-31T15:11:45Z

@DennisLiu-elogic SSE2 should work now. Earlier tests showed that SSE4 and AVX2 work.

DennisLiu-elogic · 2020-06-04T03:14:58Z

Could you show me the changes to line2Dup.h/.cpp?

DennisLiu-elogic · 2020-06-04T06:40:12Z

It breaks here.

The roi's x and y are both -4; calling .ptr() like this is bound to fail, right?

meiqua · 2020-06-04T13:02:55Z

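Those two files weren't changed; the changes are in MIPP, which adds mul<int32_t>, abs<int32_t>, and cvt<int16_t,int32_t>.
Without that debug assert there's no problem, because there's a range check afterward. You can move this line after the range check, or simply use in.at(r, c).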

DennisLiu-elogic · 2020-06-05T17:09:50Z


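I hadn't noticed there's a check afterward...
However, putting that if inside the for loop in copyToBound wastes some time. Couldn't we fill out with 0 first and then fill in values according to the roi?
Or is there something I'm missing?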

meiqua · 2020-06-06T11:14:28Z

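Filling with 0 first isn't as fast as this, because it adds an extra pass of writes. But this isn't a hot path, so the time difference is small.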
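The two strategies can be sketched side by side. This is a hypothetical illustration, not the branch's copyToBound: a ROI that may lie partly outside the source (e.g. x = y = -4) is copied into an output buffer, with out-of-image pixels set to 0. The single-pass version writes every output pixel once and checks the range before indexing the source; the fill-first version has a simpler inner loop but writes the in-bounds pixels twice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Roi { int x, y, w, h; };  // hypothetical ROI type

// Single pass: one write per output pixel, range check before indexing src.
void copy_bounded_single_pass(const std::vector<int>& src, int sw, int sh, Roi roi, std::vector<int>& out) {
    assert(static_cast<int>(src.size()) == sw * sh);
    out.resize(static_cast<std::size_t>(roi.w) * roi.h);
    for (int r = 0; r < roi.h; ++r) {
        const int sy = roi.y + r;
        for (int c = 0; c < roi.w; ++c) {
            const int sx = roi.x + c;
            const bool inside = sy >= 0 && sy < sh && sx >= 0 && sx < sw;
            out[r * roi.w + c] = inside ? src[sy * sw + sx] : 0;
        }
    }
}

// Fill first: zero everything, then copy only the intersection with the image.
void copy_bounded_fill_first(const std::vector<int>& src, int sw, int sh, Roi roi, std::vector<int>& out) {
    assert(static_cast<int>(src.size()) == sw * sh);
    out.assign(static_cast<std::size_t>(roi.w) * roi.h, 0);   // first write pass
    const int y0 = std::max(roi.y, 0), y1 = std::min(roi.y + roi.h, sh);
    const int x0 = std::max(roi.x, 0), x1 = std::min(roi.x + roi.w, sw);
    for (int sy = y0; sy < y1; ++sy)
        for (int sx = x0; sx < x1; ++sx)
            out[(sy - roi.y) * roi.w + (sx - roi.x)] = src[sy * sw + sx];  // second write
}
```

Both produce the same buffer; which is faster depends on how much of the ROI is inside the image, and as noted above it matters little outside the hot path.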

DennisLiu-elogic · 2020-06-08T09:51:52Z

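I updated line2Dup.h/.cpp in the fusion branch to go through the original matching flow.
Without SIMD:

the out_header array goes out of bounds.

With SIMD:

test_fusion.cpp runs without problems.
---- Correction
test_fusion.cpp with use_simd=true: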

meiqua · 2020-06-08T11:51:33Z

If use_simd = true but SIMD isn't configured, it will indeed fail; use_simd = false runs fine for me. Are you using the latest code?

DennisLiu-elogic · 2020-06-08T16:16:24Z


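I didn't explain clearly: SSE2 is enabled in the Visual Studio compiler options in all cases; the only thing I changed was use_simd.

So it's actually test_fusion that errors with use_simd=true and SSE2 enabled in the compiler options,
while use_simd=false with SSE2 enabled works normally.

fusion.h is indeed the new code.

The new line2Dup.h/.cpp was tested with angle_test() from the original test.cpp, and that part hasn't been updated on my side yet. I'll try it tomorrow.

----- 06/09
I checked angle_test(): only the rotated-template part (use_rot) was updated, and I already have the new code.

--
With use_simd=false and the compiler option turned off as well:
in the Gaussian node, out_header has the wrong size when r=8; for any other value of r it's fine.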

meiqua · 2020-06-14T03:03:41Z

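It does go out of bounds; a condition should be added. It ran normally before only because the out-of-bounds value happened not to be used, and the compiler doesn't do bounds checking.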
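As a general illustration of that last point (not the branch's code): the C++ standard does not require `std::vector::operator[]` to check bounds, so an out-of-range read is undefined behaviour that can go unnoticed. Below are three ways to make such an access explicit; the function names are hypothetical and `out_header` is only reused as a parameter name.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// 1) Add the condition: only index when the index is in range.
int read_guarded(const std::vector<int>& out_header, std::size_t idx, int fallback) {
    return idx < out_header.size() ? out_header[idx] : fallback;
}

// 2) Debug-only check: the assert fires in debug builds, costs nothing with NDEBUG.
int read_debug_checked(const std::vector<int>& out_header, std::size_t idx) {
    assert(idx < out_header.size() && "out_header index out of range");
    return out_header[idx];
}

// 3) Always-on check: .at() throws std::out_of_range for a bad index.
bool at_reports_overflow(const std::vector<int>& v, std::size_t idx) {
    try {
        (void)v.at(idx);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Option 1 is what "add a condition" amounts to in hot code; options 2 and 3 are mainly for catching this kind of bug during testing.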

DennisLiu-elogic · 2020-06-15T02:22:38Z


With that check added, it works.

But with use_simd=true and SSE2 enabled in the compiler, it still errors:

in update_simd(), when dxint16.r = 0

Test image:
https://drive.google.com/file/d/1FTuiw5dEgCmpNi3bnPTc8QwAmcVS0zFu/view?usp=sharing

meiqua · 2020-06-15T11:22:09Z

What's the error?

DennisLiu-elogic · 2020-06-15T14:30:13Z

According to the call stack, the order is as follows:
line 748

meiqua · 2020-06-15T15:33:42Z

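It looks like low<int16_t> is undefined, but it's actually already defined here. This should happen when use_simd=true but SSE2 isn't configured. Are you sure SSE2 is enabled? You can run mipp_test() to check.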

DennisLiu-elogic · 2020-06-16T02:12:59Z

It turns out enabling SSE2 has no effect on my machine; only AVX2 works... Why is that?

meiqua · 2020-06-16T11:06:40Z

MIPP enters the SSE branch through the macros here; I'm not sure whether the VS compiler defines them.

XuleiTao · 2020-06-25T03:38:26Z

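With VS I can also only use AVX2, but my CPU doesn't support the AVX instruction set. How can I use MIPP in this case? MIPP's documentation says it supports SSE.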

meiqua · 2020-06-25T13:48:26Z

Is it the same problem as above, where SSE is enabled but MIPP doesn't enter the SSE branch?

meiqua · 2020-06-25T14:08:22Z

I searched, and that's indeed the case:

According to their documentation (msdn.microsoft.com/en-us/library/b0084kay.aspx), Visual Studio doesn’t set the SSEn macros (but they do set AVX and AVX2). – Stephen Canon May 22 '14 at 15:27
Typical, I suppose - everybody else defines the SSEn macros, but not Microsoft. – Paul R May 22 '14 at 15:39

Try this branch and see whether it's fixed.

XuleiTao · 2020-06-25T16:44:28Z

It still doesn't seem to work. I'm compiling for x86. The VS documentation says SSE and SSE2 are only available when building for the x86 architecture.

meiqua · 2020-06-25T23:46:08Z

That's not really related. With SSE2, the __SSE__ macro should be defined as well. I've changed it; could you try again?
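A sketch of that kind of fix, under stated assumptions: MSVC documents `__AVX__` / `__AVX2__` for /arch:AVX and /arch:AVX2, and `_M_IX86_FP` on 32-bit x86 (1 for /arch:SSE, 2 for /arch:SSE2 and above), while x64 always has SSE2; it does not define `__SSE__` / `__SSE2__`. The block derives those two from MSVC's own macros, assuming MIPP picks its SSE path from `__SSE__`/`__SSE2__`-style macros (check the macros your MIPP version actually tests). It must come before any SIMD header is included. `detected_simd` is a hypothetical helper for checking the result.

```cpp
#include <string>

// Names starting with "__" are reserved, so keep this confined to MSVC.
#if defined(_MSC_VER) && !defined(__SSE2__)
#  if defined(_M_X64) || defined(_M_AMD64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#    define __SSE2__ 1
#    ifndef __SSE__
#      define __SSE__ 1   // SSE2 implies SSE
#    endif
#  elif defined(_M_IX86_FP) && _M_IX86_FP >= 1
#    ifndef __SSE__
#      define __SSE__ 1
#    endif
#  endif
#endif

// Reports the widest x86 SIMD level visible to the preprocessor.
std::string detected_simd() {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "none";
#endif
}
```

Printing `detected_simd()` (or running mipp_test()) is a quick way to confirm which branch the build actually takes.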

XuleiTao · 2020-06-26T15:10:00Z

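It works now, great. However, in my tests on VS, using MIPP doesn't make a noticeable difference.
In all tests, the template has 128 feature points.

Code without MIPP: I added two OpenMP lines to gradient spreading and gradient response, which saved about 10 ms (130 ms -> 120 ms). Using the parallel snippet you provided in matchClass saved about 20 ms (30 ms -> 10 ms). Image: 2 MP (1600x1200); CPU: i7-6700; VS2015
The master code with MIPP, AVX2 enabled: gradient response takes about 110-120 ms, matching about 10 ms. Image: 2 MP (1600x1200); CPU: i7-6700; VS2015

However, it runs fast on Linux: with padding=500 and an image larger than 2 MP, the total is about 80 ms. CPU: i7-8700
With the same parameters, VS2015 with AVX2 takes about 150 ms.

Then, VS2017, AVX2, CPU: i5-6300, same master code, padding=500: about 280 ms.

CPU: i3, VS2017, image: 2 MP. Comparing the MIPP code with the non-MIPP code: with MIPP and SSE2 enabled it takes about 300-400 ms; without MIPP it's also about 300 ms, slightly faster on average.

Then, the fusion code: (1) 2 MP image, VS2017, SSE2, CPU: i3: about 100-110 ms with AVX2 on or off. (2) 2 MP image, VS2015, CPU: i7-6700: about 80 ms with AVX2 on or off.

These environments are a bit mixed, but MIPP barely speeds things up on VS, while on Linux the improvement is obvious. According to the MIPP notes, do I need to upgrade to VS2019?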
On msvc 14.10 (Microsoft Visual Studio 2017), the performances are reduced compared to the other compilers, the compiler is not able to fully inline all the MIPP methods. This has been fixed on msvc 14.21 (Microsoft Visual Studio 2019) and now you can expect high performances.

meiqua · 2020-06-27T02:50:21Z

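Compared with the original SSE implementation, MIPP shouldn't improve speed much; it was added so the code can run on ARM. Being a bit faster on Linux is plausible: first, OpenCV speed may differ across versions and build options; second, as mentioned there, inlining may be done better.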

XuleiTao · 2020-06-28T02:12:04Z

I see. The fusion code takes about 70-80 ms on a 2 MP image; CPU: i7, OpenCV: 3.4.6. Is that normal?

meiqua · 2020-06-28T11:28:35Z

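That's not normal; it runs in 20 ms on my Ubuntu 16.04 i7. You can change this line to false to turn off MIPP and see whether it's an inlining problem; with MIPP off I get about 40 ms.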

XuleiTao · 2020-06-28T15:33:07Z

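With the bundled image and padding=500, it also takes 20 ms on Ubuntu 16.04 i7, and about 50 ms with MIPP off.
With VS2015 it's about 60-70 ms whether MIPP is on or off. This CPU is an i7-8700, a bit faster than the 80 ms on the earlier i7-6700. Could it be a VS problem? Do I need to upgrade VS?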

meiqua · 2020-06-28T15:45:47Z

It looks that way. Since the fusion code doesn't call OpenCV, it's probably insufficient compiler optimization.

XuleiTao · 2020-06-28T15:55:06Z

OK, I'll find a machine with VS2017 and try. Thanks a lot.

XuleiTao · 2020-06-30T15:20:03Z

VS2017 does improve the speed. It seems VS2015 doesn't support MIPP well either.

zzqusst · 2021-11-23T08:03:10Z

For multiple template instances in a single image, add cv_dnn_nms::NMSBoxes with a suitable overlap threshold, then do ICP registration.
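For readers without that helper at hand, here is a minimal greedy NMS as a stand-in. It follows the same basic idea as OpenCV's cv::dnn::NMSBoxes (score threshold, then suppression by IoU), but it is not the repository's cv_dnn_nms code, and `Box`, `iou` and `nms` are hypothetical names. Keep the highest-scoring box, drop every remaining box whose IoU with a kept box exceeds the threshold, and repeat.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Box { float x, y, w, h; };  // hypothetical axis-aligned box

// Intersection over union of two boxes.
float iou(const Box& a, const Box& b) {
    const float ix = std::max(0.f, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const float iy = std::max(0.f, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const float inter = ix * iy;
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Returns indices of kept boxes, highest score first.
std::vector<int> nms(const std::vector<Box>& boxes, const std::vector<float>& scores,
                     float score_thresh, float iou_thresh) {
    assert(boxes.size() == scores.size());
    std::vector<int> order(boxes.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](int a, int b) { return scores[a] > scores[b]; });
    std::vector<int> keep;
    for (int i : order) {
        if (scores[i] < score_thresh) break;   // sorted, so the rest score lower
        bool suppressed = false;
        for (int k : keep)
            if (iou(boxes[i], boxes[k]) > iou_thresh) { suppressed = true; break; }
        if (!suppressed) keep.push_back(i);
    }
    return keep;
}
```

With line2Dup, the boxes would come from each match's position and the template size, and the score from match.similarity; the surviving matches are then refined with ICP as described above.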

wiekern · 2022-05-06T08:21:15Z

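Test image: 1200x1200
Training: padding=100, angles [-60, 60] at 1-degree steps, 121 templates in total, a single scale only (line2Dup::Detector detector(128, {4});)
Testing: padding=250, top 1 only, stride=16
CPU: Intel Xeon E3-1270, supports AVX2
OS: Win11
Build environment: Qt Creator (I added a #pragma message after #define SSE2 and saw it printed at compile time, so the SSE2 path was enabled), Qt_6_2_4_MinGW_64; the default release build uses -O2 (the compile output shows g++ -c -fno-keep-inline-dllexport -O2)
Branch: fusion_fix_memo
The timings are below: about 250 ms overall, far from the 70-80 ms for 2 MP mentioned above. What might I have set up incorrectly? Any advice is appreciated, thanks!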

----------thread 1---------
bgr2gray: 2.3253ms
gauss1x5: 8.8146ms
gauss5x1: 8.4448ms
sobel1x3_sxx_syx: 1.5282ms
sobel3x1_sxy_syy: 1.4955ms
mag_phase_quant1x1: 15.6051ms
hist3x3: 47.7554ms
spread1xn: 0.595ms
spreadnx1: 1.6778ms
response1x1: 1.9622ms
linearizeTxT: 17.3772ms
-----------------------------------------
fusion time
elasped time:0.114451s

match time
elasped time:0.138171s

wiekern · 2022-05-06T09:58:38Z

I ran the test program on the fusion_by_hand branch; the results are below. The first printed fusion time is much slower than the second.

MIPP tests
----------

Instr. type:       SSE
Instr. full type:  SSE3
Instr. version:    3
Instr. size:       128 bits
Instr. lanes:      1
64-bit support:    yes
Byte/word support: yes
in this SIMD, int8 max is not inplemented by MIPP
in this SIMD, int8 shuff is not inplemented by MIPP
----------

test img size: 2356800

fusion time
elasped time:0.100045s

fusion time
elasped time:0.0262209s

match time
elasped time:0.027269s

match total time
elasped time:0.156801s

matches.size(): 7

match.template_id: 340
match.similarity: 100

DennisLiu1993 · 2022-05-08T13:04:18Z

@wiekern
@zzqusst
@XuleiTao
You can take a look at my GitHub; there's an alternative to shape-based matching there that can replace it in some application scenarios.
https://github.com/DennisLiu1993/Fastest_Image_Pattern_Matching

meiqua added the enhancement New feature or request label May 5, 2020

This was referenced May 5, 2020

Is there any way to speed it up further? #54


How to speed up the creation of responsemap? #21


meiqua mentioned this issue Jun 1, 2020

There are quite a few branches now; could you write a document describing what each branch does? Thanks #84


meiqua pinned this issue Aug 20, 2020

meiqua mentioned this issue Oct 25, 2020

fusion test exception issue #108

