1x1 Convolution of 2 stride code issue #95

Closed · sj-leo opened this issue Aug 7, 2017 · 12 comments


sj-leo commented Aug 7, 2017

Dear MKLDNN developers,

When I ran ResNet-50 with Intel Caffe and MKLDNN, the memory address used by the 1x1 convolution with stride 2 never changed.

In the code, your team maps a 1x1 convolution with stride 2 to a ‘reduce’ step and stores the address of the reduced data in ‘jcp.ws’.
When I print the address of ‘jcp.ws’, it is always the same (I think it should change).
So, I would like to know whether this address is supposed to change or not.

For a detailed explanation, I list my setup and the file containing the 1x1 convolution code below.

Setup

  • ML framework: Intel Caffe (version: v1.0.3)
  • MKLDNN version: v0.9
  • ML Algorithm: ResNet-50

Code

src/cpu/jit_avx512_common_1x1_convolution.cpp (lines 166-175)

Thank you.


emfomenk commented Aug 7, 2017

Hi @sj-leo,

ws stands for workspace. It is internal memory for the primitive.
1x1 convolutions w/ non-unit stride currently work as follows:

  • compress (reduce) source image into workspace (take only pixels with given stride)
  • run regular 1x1 conv kernel for unit-stride with workspace as input (instead of original src)

So even if the address of the actual src data changes (i.e. another src is taken), the workspace address remains the same. In other words, I would not expect ws to change.
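
For illustration, here is a minimal, hypothetical sketch of that two-step scheme (single channel, plain loops; the function names are mine, and the real primitive is blocked and JIT-generated):

```cpp
#include <cstdio>
#include <vector>

// Step 1: "reduce" -- copy only the pixels hit by the stride into a
// dense workspace, so the kernel can treat it as unit-stride input.
static void reduce_src(const float *src, float *ws,
                       int ih, int iw, int stride) {
    const int oh = (ih + stride - 1) / stride;
    const int ow = (iw + stride - 1) / stride;
    for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x)
            ws[y * ow + x] = src[y * stride * iw + x * stride];
}

// Step 2: a plain unit-stride 1x1 convolution over the workspace
// (one input and one output channel, just to show the control flow).
static void conv_1x1_unit_stride(const float *ws, float *dst,
                                 int spatial, float weight) {
    for (int i = 0; i < spatial; ++i)
        dst[i] = ws[i] * weight;
}

int main() {
    const int ih = 4, iw = 4, stride = 2;
    const int oh = ih / stride, ow = iw / stride;
    std::vector<float> src(ih * iw);
    for (int i = 0; i < ih * iw; ++i) src[i] = float(i);

    // The workspace is allocated once with the primitive, so its base
    // address stays the same across runs -- matching the observation.
    std::vector<float> ws(oh * ow), dst(oh * ow);
    reduce_src(src.data(), ws.data(), ih, iw, stride);
    conv_1x1_unit_stride(ws.data(), dst.data(), oh * ow, 1.0f);
    for (float v : dst) std::printf("%g ", v); // prints: 0 2 8 10
    std::printf("\n");
    return 0;
}
```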


sj-leo commented Aug 7, 2017

In your workflow description, you said that 1x1 convolutions with non-unit stride compress the source image into the workspace.
In order to use the workspace, I think the code should change the input height (ih) and input width (iw) offsets relative to the base ws address. But ih and iw never change (I also looked at jit_avx512_common_1x1_conv_kernel.cpp).

So, given that the ws address never changes, I think the convolution is not performed over the whole space of ws, and I am not sure this is the correct implementation.

I wonder why the convolution is performed iteratively on the same area without changing the ih and iw of ws.


rsdubtso commented Aug 7, 2017

inner_ker() is called from a loop over the original dimensions; look at the code starting at src/cpu/jit_avx512_common_1x1_convolution.cpp:180. The data from the original input tensor is copied into the workspace to remove the strides. And you are right that each thread reuses its own part of the workspace across different inner_ker() calls, but the data in the workspace comes from different parts of the input.
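
A hedged model of that driver loop, as an illustration only (names such as iwork, bcast_step, and ws mirror the discussion, but the real code is additionally blocked over channels and parallelized over threads):

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int ih = 8, iw = 8, stride = 2;
    const int oh = ih / stride, ow = iw / stride;
    const int os = oh * ow;     // output spatial size
    const int bcast_step = ow;  // pixels handled per inner_ker() call

    std::vector<float> src(ih * iw), ws(bcast_step), dst(os);
    for (int i = 0; i < ih * iw; ++i) src[i] = float(i);

    for (int iwork = 0; iwork < os; iwork += bcast_step) {
        // A new src offset on every iteration...
        const float *src_blk = src.data() + (iwork / ow) * stride * iw;
        // ...but always the same workspace base address.
        for (int x = 0; x < bcast_step; ++x)
            ws[x] = src_blk[x * stride];     // "reduce" one output row
        for (int x = 0; x < bcast_step; ++x) // "inner_ker" body on ws
            dst[iwork + x] = ws[x] * 1.0f;
        std::printf("iwork=%2d: src offset=%3td, ws=%p\n",
                    iwork, src_blk - src.data(), (void *)ws.data());
    }
    return 0;
}
```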


sj-leo commented Aug 7, 2017

@rsdubtso As you said, I checked the code: data is copied to remove the strides, and the workspace is filled from different parts of the input at the thread level.

But within a single thread, there seems to be no place where iw and ih change.
If you could tell me more about that part, I would appreciate it.


emfomenk commented Aug 7, 2017

The convolution kernel goes over the whole image ih * iw, so the pointer shouldn't change with respect to the spatial dimension.


sj-leo commented Aug 8, 2017

As far as I know, the convolution kernel covers only a fraction of the image ih * iw at a time, determined by bcast_step.
If iwork advances by bcast_step, I think the rp.ws pointer needs to change as well.

Could you provide further information?


emfomenk commented Aug 8, 2017

Yeah, my bad -- you are right, the kernel does not always process ih * iw.

According to the line, we reduce src from the right place.
The strange thing, though, is the condition on line 169.
It doesn't seem to be always correct (at least not for all the loop orders).
Let me double-check that.

Do you have a particular failing example, or is this question mostly due to curiosity? :)


sj-leo commented Aug 8, 2017

In my case, I am looking at how the convolution operation of ResNet-50 is performed and whether there is room for performance optimization, mostly out of curiosity.
That is why I was in doubt about whether the operation is correct.

ankalinin (Contributor) commented
The loop order is set in the jit_avx512_common_1x1_conv_kernel::init_conf method in jit_avx512_common_1x1_conv_kernel.cpp. There are two possible loop orders for 1x1 convolutions with non-unit strides: loop_blr (bcast-load-reduce) or loop_rbl (reduce-bcast-load). The condition on line 169 is correct for these loop orders; it ensures that the reduction is performed only once for a given block of input channels and a given fraction of the image ih * iw.
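
As a toy illustration of what such a condition achieves (this is not the real check on line 169): with the bcast-load-reduce order, one reduced block is reused for every load (output-channel) block, so the copy into ws only needs to run on the first load iteration.

```cpp
#include <cstdio>

int main() {
    const int nb_bcast = 2, nb_load = 3; // hypothetical block counts
    for (int b = 0; b < nb_bcast; ++b)
        for (int l = 0; l < nb_load; ++l) {
            if (l == 0) // reduce exactly once per bcast block
                std::printf("bcast %d: reduce src -> ws\n", b);
            std::printf("bcast %d, load %d: 1x1 kernel on ws\n", b, l);
        }
    return 0;
}
```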

ankalinin (Contributor) commented
> But within a single thread, there seems to be no place where iw and ih change.

iw and ih change on each iteration of the loop over the bcast dimension; see, e.g., line 190.
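
In other words, the spatial position is recovered from the flattened bcast index on every iteration. A hypothetical version of that offset math (the variable names are mine, not those in the actual code around line 190):

```cpp
#include <cstdio>

int main() {
    // Assumed geometry: 4x4 output, stride 2, two pixels per step.
    const int ow = 4, stride_h = 2, stride_w = 2;
    const int os = 16, bcast_step = 2;
    for (int iwork = 0; iwork < os; iwork += bcast_step) {
        // Decompose the flattened index into an output row/column,
        // then map it back to the strided input position (ih, iw).
        const int ih_off = (iwork / ow) * stride_h;
        const int iw_off = (iwork % ow) * stride_w;
        std::printf("iwork=%2d -> ih=%d, iw=%d\n", iwork, ih_off, iw_off);
    }
    return 0;
}
```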


haidj commented Nov 8, 2017

Hi @emfomenk,

I wonder if this issue with the 1x1 convolution has been resolved.

Thank you for your help.

> Yeah, my bad -- you are right, the kernel does not always process ih * iw.
>
> According to the line, we reduce src from the right place.
> The strange thing, though, is the condition on line 169.
> It doesn't seem to be always correct (at least not for all the loop orders).
> Let me double-check that.
>
> Do you have a particular failing example, or is this question mostly due to curiosity? :)


emfomenk commented Nov 8, 2017

Hi @haidj,

My suspicion was incorrect.
According to this comment, there is no issue with the current code.
Thanks to @ankalinin for the confirmation.
