
Accelerate PyTorch just-in-time compilation using MKL-DNN #23657

Open
Jianhui-Li opened this issue Aug 1, 2019 · 15 comments

Labels
feature: A request for a proper, new feature.
module: mkldnn: Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration
oncall: jit: Add this issue/PR to JIT oncall triage queue
triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@Jianhui-Li

🚀 Feature

Accelerate PyTorch just-in-time compilation using MKL-DNN

Motivation

PyTorch's just-in-time (JIT) compiler rewrites and runs PyTorch models at production efficiency. MKL-DNN is built to accelerate deep learning applications in production environments. With high-performance primitives such as convolution, RNN, and GEMM, MKL-DNN significantly accelerates most deep learning models across multiple generations of Intel CPUs using AVX2, AVX512, AVX512-VNNI, and future deep learning acceleration technologies.

With MKL-DNN enabled in the JIT compiler, users can get the best MKL-DNN performance in JIT mode with minimal changes to their PyTorch code. In imperative mode, the user has to insert format conversions explicitly around MKL-DNN operations using tensor.to_mkldnn() and to_dense(). In JIT mode this is not necessary: the user may only need to pass an explicit flag or invoke a specific MKL-DNN optimization pass, which automatically converts CPU ops to MKL-DNN ops and propagates the MKL-DNN format across neighboring MKL-DNN operations. This captures all the performance benefits achievable in imperative mode, plus additional graph optimizations.
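
For context, a minimal sketch of the imperative-mode workflow described above, using the existing tensor.to_mkldnn() / to_dense() conversions (relu here stands in for any op with an MKL-DNN kernel):

import torch

x = torch.randn(64, 128)

# Imperative mode: the user inserts the layout conversions by hand.
x_mkl = x.to_mkldnn()        # dense CPU layout -> opaque MKL-DNN (blocked) layout
y_mkl = torch.relu(x_mkl)    # runs on the MKL-DNN tensor and stays in MKL-DNN layout
y = y_mkl.to_dense()         # convert back before mixing with regular CPU ops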

Pitch

Use PyTorch just-in-time compilation to get MKL-DNN acceleration with one flag (or function call)

Additional context

The MKL-DNN optimization pass includes MKL-DNN format propagation and fusion as an initial step. The MKL-DNN format propagation converts CPU ops to MKL-DNN ops, and format conversion ops are added between CPU and MKL-DNN ops.
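
To make the intended user experience concrete, a rough sketch; every name below is a placeholder for illustration, not a settled API:

import torch

model = MyConvNet().eval()              # MyConvNet is a placeholder user model
scripted = torch.jit.script(model)

# Hypothetical opt-in call (the name is illustrative only). Conceptually, the
# pass rewrites CPU ops (conv2d, relu, ...) to their MKL-DNN counterparts,
# inserts to_mkldnn()/to_dense() conversion ops only at the boundaries between
# CPU regions and MKL-DNN regions, and keeps the blocked layout alive across
# neighboring MKL-DNN ops instead of converting around every single op.
scripted = torch.jit.optimize_for_mkldnn(scripted)

out = scripted(torch.randn(1, 3, 224, 224))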

The implementation of the PyTorch MKL-DNN JIT backend will be located in the 'backend' directory under the JIT subdirectory.

@mrshenli added the feature, oncall: jit, module: mkldnn, and triaged labels on Aug 1, 2019
@gottbrath assigned suo, ZolotukhinM, bddppq and jianyuh and unassigned suo and ZolotukhinM on Aug 29, 2019
@gottbrath
Contributor

Jianhui, do I understand correctly that this optimization only makes sense for inference use cases?

@Jianhui-Li
Author

@gottbrath The optimization we are implementing at this stage is for inference. But the fusion and MKL-DNN format propagation can be extended to work for training as well, which depends on the pending imperative-mode training PRs that enable the MKL-DNN backward operations.

@gottbrath
Contributor

In the meeting today you said you had a concern with weights being treated as constants or not. Can you articulate this question/request either here or in another issue? I think we have the right people watching.

@Jianhui-Li
Author

Yes. The question is whether you have a plan to support graph freezing, i.e. marking model weights as constants to allow constant propagation and subexpression elimination. TF supports this: https://www.tensorflow.org/guide/extend/model_files#freezing. The user needs to call it explicitly to enable optimization for inference.
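
To illustrate what such a freezing step would enable, a small sketch; the freeze call mentioned in the comment is hypothetical:

import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(128, 256)

    def forward(self, x):
        return self.linear(x)

m = torch.jit.script(M()).eval()

# Today, self.linear.weight remains a mutable module attribute: the compiled
# graph has to re-read it on every call, so the JIT cannot constant-fold
# anything computed from it or pre-pack it into the MKL-DNN blocked layout
# ahead of time. A hypothetical explicit step, analogous to TensorFlow's
# freeze_graph,
#   frozen = freeze(m)   # name is illustrative
# would mark the weights as constants and unlock constant propagation,
# subexpression elimination, and one-time weight pre-packing.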

@CaoZhongZ
Contributor

CaoZhongZ commented Aug 30, 2019

We did some experiments to 'freeze params' and described the intention and expected results (along with some code snippets) here:
https://gist.github.com/CaoZhongZ/34c2796deef1cc8871039b3d7441f770

@jgong5
Collaborator

jgong5 commented Sep 2, 2019

One thing I would like to clarify about the "graph freezing" feature is that it is not specific to MKL-DNN; it is a general feature applicable to all backends.

@gottbrath
Contributor

@suo I don't think I've heard "freezing" as a stage in our desired JIT workflow but I can see how knowing that the weights are constant could allow optimizations that wouldn't be possible otherwise. What are your thoughts on this topic?

@soumith
Member

soumith commented Sep 4, 2019

@dzhulgakov and I have talked quite a bit about "freezing". It was a topic especially for MKL-DNN packing. Without the user explicitly marking that a weight is locked / frozen, I don't think we have a way to do this across iterations (i.e. we separate the pointers given as inputs from the graph, so we have no guarantees that the pointers correspond to the same Tensor, or even the same data).

@ZolotukhinM

@soumith @dzhulgakov, how about we perform the MKLDNN conversion as a module-to-module transformation? In this transform we would create a copy of the original module with all weights packed appropriately and all ops rewritten to their mkldnn counterparts.
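
For reference, the eager-mode helper in torch.utils.mkldnn already behaves roughly this way for a fixed set of modules: it returns a converted copy whose weights are pre-packed into the MKL-DNN blocked layout. A minimal sketch, assuming that helper is available in the build:

import torch
from torch.utils import mkldnn as mkldnn_utils

model = torch.nn.Linear(128, 256).eval()

# Returns a converted copy: supported submodules (Linear, Conv2d, ...) are
# replaced by MKL-DNN counterparts whose weights are packed once at conversion
# time; the original `model` is left unchanged.
mkldnn_model = mkldnn_utils.to_mkldnn(model)

x = torch.randn(32, 128)
y = mkldnn_model(x.to_mkldnn()).to_dense()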

@soumith
Member

soumith commented Sep 5, 2019

@ZolotukhinM here's sample code that will fail with such a program transform:

import torch

x = torch.randn(20, 10)
y = torch.nn.Linear(10, 20)
y.weight = torch.nn.Parameter(x)  # the parameter shares x's storage

y = torch.jit.script(y)  # let's say the MKLDNN transform has been applied here
x.fill_(0)
out = y(inp)  # WRONG because y uses the weight captured at transform time, but that's not what the user expects


@ZolotukhinM

ZolotukhinM commented Sep 5, 2019

I would expect it to be slightly different:

import torch

x = torch.randn(20, 10)
y = torch.nn.Linear(10, 20)
y.weight = torch.nn.Parameter(x)  # the parameter shares x's storage

y = torch.jit.script(y)
z = to_mkldnn(y)  # explicit conversion step that packs the current weights
x.fill_(0)
out = y(inp)   # will use the zeroed version of x
out2 = z(inp)  # will use the original version of x that was packed at the 'to_mkldnn' step

Do you think it's still confusing?

In other words, I think it's very important for that step to be explicit; otherwise I would absolutely agree that it would be confusing. If we're going to make this step implicit, then I agree with your and Dima's conclusion. However, note that quantization performs a somewhat similar transformation, and its API looks like the one I showed here, with similar assumptions.
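
For comparison, a minimal sketch of the eager-mode post-training quantization flow referred to above, which likewise produces a new, transformed module from an explicit call (torch.quantization API; calibration elided, MyFloatModel is a placeholder):

import torch
import torch.quantization

model = MyFloatModel().eval()          # placeholder float model
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')

prepared = torch.quantization.prepare(model)       # instrumented copy for calibration
# ... run representative inputs through `prepared` here ...
quantized = torch.quantization.convert(prepared)   # new module with packed int8 weights

# `model` itself is untouched; `quantized` holds the transformed weights,
# mirroring the explicit `z = to_mkldnn(y)` step in the snippet above.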

@CaoZhongZ
Contributor

Yes, we expect the user to explicitly specify which 'params' are constants; otherwise there is no safe way to pre-pack weights for inference. Although we requested this feature from MKL-DNN's perspective, cuDNN would also benefit from it: the cudnnReorderFilterAndBias call becomes possible once the user has guaranteed that they won't change the weight and bias.

@soumith
Member

soumith commented Sep 5, 2019

That's not confusing; that's what we need to figure out: a locking / unlocking API.

@ZolotukhinM

FWIW, I think the "freezing" feature is valuable independently of whether we use it for MKLDNN or not. I just think that the conversion might be implemented without it, and that it would be better aligned with how quantization, another model transformation, is planned to be implemented.

@soumith
Member

soumith commented Sep 10, 2019

FWIW, I think the "freezing" feature is valuable independently of whether we use it for MKLDNN or not.

Yes, previously it came up in the context of CuDNN / RNN, because CuDNN wanted the weights to be pre-packed in a certain way.

I think as long as we have an undo / unlock mechanism, something like model.freeze() might be worth adding.
