How to speed up the creation of responsemap? #21

Dyson-Ido · 2019-05-06T13:29:52Z

Hi Meiqua,
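During testing I found that most of the total time is spent creating the response map. Is there any way to speed this step up? For example, for a 1.3-megapixel image, creation takes 150 ms. Can it be made faster? Thanks!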

meiqua · 2019-05-06T15:32:17Z

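Yes, it can be sped up by another order of magnitude.
The current bottleneck is OpenCV's filters. I looked into this earlier: the acceleration techniques OpenCV already uses are separable filters and SIMD. Techniques that could still be applied are parallelism and kernel fusion (Sobel, phase, and pyrDown can mostly be fused together).
Writing this so that it is both readable and as fast as possible is actually quite hard. However, a group of experts has designed a DSL (domain-specific language) specifically for this, Halide (Halide talk, Halide GitHub). It is very good, and even if you don't use it directly, its ideas are worth borrowing.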
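To make the kernel fusion idea concrete, here is a minimal sketch in plain C++ without OpenCV. It is not the project's implementation: the `GradientMaps` struct, the `fusedGradient` name, and the simple 8-bin orientation rule are all assumptions made for illustration. The point is only that the 3x3 Sobel responses, the squared magnitude, and the quantized orientation are computed per pixel in one pass, so no intermediate Sobel images are written to memory and read back.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result container: squared gradient magnitude and an
// orientation label in [0, 8) per pixel. Border pixels are left at zero.
struct GradientMaps {
    std::vector<float> magnitudeSq;
    std::vector<std::uint8_t> orientation;
};

// Single pass over a row-major, single-channel 8-bit image.
GradientMaps fusedGradient(const std::vector<std::uint8_t>& img, int width, int height)
{
    assert(img.size() == static_cast<std::size_t>(width) * height);

    GradientMaps out;
    out.magnitudeSq.assign(img.size(), 0.0f);
    out.orientation.assign(img.size(), 0);
    const float kPi = 3.14159265358979f;

    for (int y = 1; y + 1 < height; ++y) {
        const std::uint8_t* up   = &img[static_cast<std::size_t>(y - 1) * width];
        const std::uint8_t* mid  = &img[static_cast<std::size_t>(y) * width];
        const std::uint8_t* down = &img[static_cast<std::size_t>(y + 1) * width];
        for (int x = 1; x + 1 < width; ++x) {
            // 3x3 Sobel responses computed directly from the source rows.
            const int gx = (up[x + 1] + 2 * mid[x + 1] + down[x + 1])
                         - (up[x - 1] + 2 * mid[x - 1] + down[x - 1]);
            const int gy = (down[x - 1] + 2 * down[x] + down[x + 1])
                         - (up[x - 1] + 2 * up[x] + up[x + 1]);

            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            out.magnitudeSq[i] = static_cast<float>(gx * gx + gy * gy);

            // Fold the angle into [0, pi) so opposite directions share a
            // label, then split that range into 8 equal bins.
            float angle = std::atan2(static_cast<float>(gy), static_cast<float>(gx));
            if (angle < 0.0f) angle += kPi;
            out.orientation[i] =
                static_cast<std::uint8_t>(static_cast<int>(angle / kPi * 8.0f) % 8);
        }
    }
    return out;
}
```

The outer loop over rows has no dependencies between iterations, so it is also the natural place to add parallelism, and the inner loop is a candidate for SIMD. A Gaussian blur or pyramid downsampling step is not included here and would need wider row windows to fuse in.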

DennisLiu-elogic · 2020-04-18T15:20:36Z

I'd like to ask meiqua: what do you think of the G-API in OpenCV 4.0?
This is the official explanation:
**G-API is a separate OpenCV module so its header files have to be included explicitly. The first four lines of main() create and initialize OpenCV's standard video capture object, which fetches video frames from either an attached camera or a specified file.

G-API pipeline is constructed next. In fact, it is a series of G-API operation calls on cv::GMat data. The important aspect of G-API is that this code block is just a declaration of actions, but not the actions themselves. No processing happens at this point, G-API only tracks which operations form pipeline and how it is connected. G-API Data objects (here it is cv::GMat) are used to connect operations each other. in is an empty cv::GMat signalling that it is a beginning of computation.**

I think the concept is very similar to Halide.
I wrote a small test program:
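#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/gapi.hpp>
#include <opencv2/gapi/core.hpp>
#include <opencv2/gapi/imgproc.hpp>

using namespace cv;
using namespace std;

int main()
{
// Raw string literal, so the backslashes in the Windows path are not parsed as escape sequences
Mat matSrc = imread (R"(C:\Users\User\source\repos\ShapeBasedMatching\ShapeBasedMatching\test\MyCase0\cross.bmp)");
Mat matSmoothed, matSobelX, matSobelY, matMag, matAg;

// Declare the G-API graph; nothing is executed until apply () is called
GMat gmatSrc;
GMat gmatSmoothed = cv::gapi::gaussianBlur (gmatSrc, Size (7, 7), 0, 0, BORDER_REPLICATE);
GMat gmatSobelX = cv::gapi::Sobel (gmatSmoothed, CV_32F, 1, 0, 3, 1.0, 0.0, BORDER_REPLICATE);
GMat gmatSobelY = cv::gapi::Sobel (gmatSmoothed, CV_32F, 0, 1, 3, 1.0, 0.0, BORDER_REPLICATE);
GMat gmatMag = cv::gapi::add (cv::gapi::mul (gmatSobelX, gmatSobelX), cv::gapi::mul (gmatSobelY, gmatSobelY));
//GMat gmatAg = cv::gapi::phase (gmatSobelX, gmatSobelY, true);
GComputation gcomputeMag (gmatSrc, gmatMag);
//GComputation gcomputeAg (gmatSrc, gmatAg);

// Wall-clock ticks to milliseconds (clock () counts CPU ticks, which are not milliseconds on every platform)
const double dMsPerTick = 1000.0 / getTickFrequency ();

for (int i = 0 ; i < 4 ; i++)
{
	double d1 = (double) getTickCount ();

	// Plain OpenCV path; note that this interval also includes phase ()
	GaussianBlur (matSrc, matSmoothed, Size (7, 7), 0, 0, BORDER_REPLICATE);
	Sobel (matSmoothed, matSobelX, CV_32F, 1, 0, 3, 1.0, 0.0, BORDER_REPLICATE);
	Sobel (matSmoothed, matSobelY, CV_32F, 0, 1, 3, 1.0, 0.0, BORDER_REPLICATE);
	add (matSobelX.mul (matSobelX), matSobelY.mul (matSobelY), matMag);
	phase (matSobelX, matSobelY, matAg, true);
	double d2 = (double) getTickCount ();

	// G-API path; magnitude only
	gcomputeMag.apply (matSrc, matMag);
	//gcomputeAg.apply (matSrc, matAg);
	double d3 = (double) getTickCount ();

	cout << (d2 - d1) * dMsPerTick << "ms, " << (d3 - d2) * dMsPerTick << "ms" << endl;
}

return 0;
}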
Computing only the magnitude, it is about 2x faster.

meiqua · 2020-04-18T16:20:00Z

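@DennisLiu-elogic Very nice. It is essentially a simplified Halide: there is no interface for writing your own schedule, but it is entirely sufficient for ordinary use.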

meiqua · 2020-05-05T10:51:43Z

I recently put together a simple implementation of the whole fusion process. Anyone interested can give it a run.

aemior · 2020-05-09T03:13:22Z

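I have recently been trying to use Halide to speed up response map construction. So far I have only tested whether the quantizedOrientations function can be accelerated, but I ran into something I don't understand. For a 2048x2048 RGB image, when timed as part of the whole matching pipeline, quantizedOrientations takes only about 120 ms. When quantizedOrientations is run on its own, however, it takes 450 ms. By "on its own" I mean the following: to compare against my Halide-accelerated quantizedOrientations, I copied quantizedOrientations together with hysteresisGradient straight out of line2Dup.cpp, compiled them into a separate executable, and ran it on a single image. Does anyone know what might cause this? I'm completely at a loss. The OpenCV version is 3.4.5 and the gcc version is 7.5; I wonder whether it is a compilation issue.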

aemior · 2020-05-09T03:56:55Z

I had my head down hitting pitfalls the whole time and only just noticed that @meiqua has already made a hand-fused version. So efficient!

meiqua · 2020-05-10T02:30:37Z

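@aemior Did you turn optimization on?
Haha, I arrived at the hand-written fusion by hitting those pitfalls myself. If you have Halide set up, it would be interesting to compare the two.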
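For comparing a standalone build against the full pipeline, a small timing harness can rule out some measurement effects. The following is only a sketch under stated assumptions: `medianMillis` is a hypothetical helper, not part of the project, and the warm-up and repeat counts are arbitrary. It runs the work a few times untimed, then reports the median wall-clock time, which reduces the influence of first-call costs and outliers. It does not replace building both executables with the same optimization settings (for example `-O2` or `-O3` with gcc).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: runs `work` `warmup` times untimed, then returns the
// median wall-clock time in milliseconds over `repeats` timed runs.
double medianMillis(const std::function<void()>& work, int warmup = 2, int repeats = 9)
{
    assert(repeats > 0);

    // Untimed runs absorb one-off costs such as cache misses and lazy setup.
    for (int i = 0; i < warmup; ++i) work();

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(repeats));
    for (int i = 0; i < repeats; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    // Partial sort is enough to place the median element.
    const auto middle = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), middle, samples.end());
    return *middle;
}
```

Usage would look like `medianMillis([&] { /* call the function under test on the same image */ })`, called identically in the standalone program and inside the matching pipeline.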

aemior · 2020-05-11T00:57:29Z

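@meiqua I have just finished writing the pipeline in Halide syntax. It only goes as far as orientation quantization, with no spread yet. The only optimization so far is some parallelization and vectorization, and I haven't tried automatic optimization yet. Right now I'm stuck on the standalone test I mentioned earlier: the Halide version speeds the 450 ms flow up to 160 ms, but when it is put into the whole matching pipeline it is actually slower than the OpenCV version. I'm a bit baffled and still checking.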

aemior · 2020-05-11T01:00:12Z

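Also, Halide doesn't feel as easy to use as I had imagined, mainly because the syntax for expressing image operations is rather different. Installing it and configuring LLVM isn't easy either; I've hit an exhausting number of pitfalls.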

meiqua · 2020-05-11T02:41:35Z

@aemior Haha, keep it up!

tingcao-njust · 2020-07-03T01:06:50Z

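Hi, I have a question about @DennisLiu-elogic's test program above:
The d1-to-d2 interval computes both the gradient magnitude and the gradient direction, while the d2-to-d3 interval computes only the gradient magnitude, so comparing the two times doesn't seem quite fair, does it?
If the d2-to-d3 interval computes both the gradient magnitude and the gradient direction, it takes longer than the d1-to-d2 interval does for the same work (I suspect because SobelX and SobelY are computed twice).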

zhirui-gao · 2021-03-18T15:25:58Z

Hi meiqua! I'd like to ask: when recognizing images with multiple ROIs, are there any offline steps that could reduce the total matching time?

meiqua · 2021-03-26T13:54:34Z

@zhirui-gao If the ROIs are all fixed and overlap little, you can crop them into separate small patches and match each one.
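As a rough illustration of the cropping suggestion, here is a sketch in plain C++ on a row-major, single-channel buffer. The `Roi` struct and the `cropRoi` and `overlaps` helpers are hypothetical names introduced here, not part of the project. With OpenCV, the equivalent of `cropRoi` would be taking a sub-matrix with a `cv::Rect`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical ROI type: top-left corner plus size, in pixels.
struct Roi { int x, y, width, height; };

// Copies one ROI out of a row-major, single-channel image.
std::vector<std::uint8_t> cropRoi(const std::vector<std::uint8_t>& img,
                                  int imgWidth, int imgHeight, const Roi& r)
{
    assert(img.size() == static_cast<std::size_t>(imgWidth) * imgHeight);
    assert(r.x >= 0 && r.y >= 0 && r.width > 0 && r.height > 0);
    assert(r.x + r.width <= imgWidth && r.y + r.height <= imgHeight);

    std::vector<std::uint8_t> patch;
    patch.reserve(static_cast<std::size_t>(r.width) * r.height);
    for (int row = 0; row < r.height; ++row) {
        const auto first = img.begin()
            + static_cast<std::ptrdiff_t>(r.y + row) * imgWidth + r.x;
        patch.insert(patch.end(), first, first + r.width);
    }
    return patch;
}

// True when two ROIs share at least one pixel.
bool overlaps(const Roi& a, const Roi& b)
{
    return a.x < b.x + b.width && b.x < a.x + a.width &&
           a.y < b.y + b.height && b.y < a.y + a.height;
}
```

Each patch can then be matched separately, so the response map is only built for the cropped pixels. Checking `overlaps` beforehand shows whether the ROIs are disjoint enough for this to save work; heavily overlapping ROIs would process the shared pixels more than once.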

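meiqua added the enhancement (New feature or request) and question (Further information is requested) labels on May 28, 2019.

meiqua mentioned this issue on Sep 4, 2019: 運行速度問題 (Running speed problem) #43 (Open)

meiqua mentioned this issue on Sep 11, 2019: arm板（ti5728）匹配速度问题 (Matching speed problem on an ARM board, TI 5728) #45 (Closed)

meiqua mentioned this issue on Oct 14, 2019: 還有辦法再加速嗎? (Is there any way to speed it up further?) #54 (Open)

meiqua mentioned this issue on Nov 15, 2019: 本案例对标的是halcon的find_shape_model算子？ (Is this project meant as a counterpart to Halcon's find_shape_model operator?) #62 (Open)

meiqua mentioned this issue on May 5, 2020: response map fusion implementation #77 (Open)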
