Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jit][tensorexpr] Added aten::batch_norm into fuser when in inference mode #54204

Closed
wants to merge 6 commits into from

Conversation

huiguoo
Copy link
Contributor

@huiguoo huiguoo commented Mar 17, 2021

Stack from ghstack:

Differential Revision: D27134348

@facebook-github-bot facebook-github-bot added cla signed oncall: jit Add this issue/PR to JIT oncall triage queue labels Mar 17, 2021
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Mar 17, 2021

💊 CI failures summary and remediations

As of commit 80161bf (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build clang-tidy (1/1)

Step: "Add annotations" (full log | diagnosis details | 🔁 rerun)

2021-03-23T03:03:51.4993713Z update-alternatives: error: no alternatives for nsight-sys
2021-03-23T03:03:51.4531092Z Setting up x11proto-damage-dev (1:2018.4-4) ...
2021-03-23T03:03:51.4557069Z Setting up cuda-nvjpeg-10-2 (10.2.89-1) ...
2021-03-23T03:03:51.4626440Z Setting up libxcb-dri3-dev:amd64 (1.13-2~ubuntu18.04) ...
2021-03-23T03:03:51.4652923Z Setting up libcublas10 (10.2.3.254-1) ...
2021-03-23T03:03:51.4681220Z Setting up libcublas-dev (10.2.3.254-1) ...
2021-03-23T03:03:51.4711141Z Setting up cuda-cufft-10-2 (10.2.89-1) ...
2021-03-23T03:03:51.4783640Z Setting up cuda-nsight-compute-10-2 (10.2.89-1) ...
2021-03-23T03:03:51.4812669Z Setting up nsight-systems-2020.4.3 (2020.4.3.7-10543b6) ...
2021-03-23T03:03:51.4905754Z update-alternatives: using /opt/nvidia/nsight-systems/2020.4.3/target-linux-x64/nsys to provide /usr/local/bin/nsys (nsys) in auto mode
2021-03-23T03:03:51.4982237Z update-alternatives: error: alternative path /opt/nvidia/nsight-systems/2020.4.3/host-linux-x64/nsight-sys doesn't exist
2021-03-23T03:03:51.4993713Z update-alternatives: error: no alternatives for nsight-sys
2021-03-23T03:03:51.5052400Z update-alternatives: using /opt/nvidia/nsight-systems/2020.4.3/host-linux-x64/nsys-ui to provide /usr/local/bin/nsys-ui (nsys-ui) in auto mode
2021-03-23T03:03:51.5093605Z Setting up libxcb-shape0-dev:amd64 (1.13-2~ubuntu18.04) ...
2021-03-23T03:03:51.5122847Z Setting up cuda-cusparse-10-2 (10.2.89-1) ...
2021-03-23T03:03:51.5194743Z Setting up cuda-cuobjdump-10-2 (10.2.89-1) ...
2021-03-23T03:03:51.5222038Z Setting up libxcb-glx0-dev:amd64 (1.13-2~ubuntu18.04) ...
2021-03-23T03:03:51.5278933Z Setting up libgles2:amd64 (1.0.0-2ubuntu2.3) ...
2021-03-23T03:03:51.5310230Z Setting up libglu1-mesa:amd64 (9.0.0-2.1build1) ...
2021-03-23T03:03:51.5339589Z Setting up libegl-mesa0:amd64 (20.0.8-0ubuntu1~18.04.1) ...
2021-03-23T03:03:51.5422164Z Setting up cuda-sanitizer-api-10-2 (10.2.89-1) ...
2021-03-23T03:03:51.5493774Z Setting up cuda-nvjpeg-dev-10-2 (10.2.89-1) ...

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

huiguoo added a commit that referenced this pull request Mar 17, 2021
… mode

ghstack-source-id: 2393e6582316f865f83c9291d5e5b8e25b7c9d5d
Pull Request resolved: #54204
Copy link

@ZolotukhinM ZolotukhinM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a couple of comments inline!


// parameter list: input, weight, bias, mean, var, training,
// momentum, eps, cudnn_enabled
Tensor* input = tensors_.at(n->inputs()[0]->unique());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: use n->input(X) instead of n->inputs()[X].

Comment on lines 1123 to 1132
Tensor* weight = tensors_.at(n->inputs()[1]->unique());
Tensor* bias = tensors_.at(n->inputs()[2]->unique());

auto inv_var = rsqrt(var->call(c) + eps);
auto weight_v = weight->call(c);
auto bias_v = bias->call(c);
auto alpha = inv_var * weight_v;
auto beta = bias_v - mean->call(c) * alpha;
auto output = input->call(axes) * alpha + beta;
return demoteOutput(output, n->output());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could merge these four cases using weight_v = 1 and bias_v = 0 as default values.

Copy link
Contributor

@bertmaher bertmaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice! Can you check two things before landing:

  1. How does that training boolean parameter get set? Is it from the eval() state of the model? What happens if the user switches the model between eval() and train()?
  2. The promoteInputs question I have below; make sure we "do the right thing" if the argument types don't match.

ExprHandle eps = constant(n->inputs()[7]);

// axes: N, C, H, W
std::vector<VarHandle> c(axes.begin() + 1, axes.begin() + 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're just getting one axis here, right (c)? I think I'd be inclined to do VarHandle c = axes[1]; here instead of constructing a one-element vector. call has a variadic overload so nothing below should have to change (at least, I don't think so)

auto weight_v = weight->call(c);
auto alpha = inv_var * weight_v;
auto output = input->call(axes) * alpha - mean->call(c) * alpha;
return demoteOutput(output, n->output());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need some promoteInputs as well. I'm not certain here b/c pytorch type promotion is complicated, but I wonder what you get if, e.g., input is a FloatTensor and mean is a DoubleTensor. (Not something that's likely to happen but we should handle it correctly all the same).

@huiguoo
Copy link
Contributor Author

huiguoo commented Mar 18, 2021

@bertmaher

How does that training boolean parameter get set? Is it from the eval() state of the model? What happens if the user switches the model between eval() and train()?

Great question! Has investigated on this and below is what I found.
There are two types of calls to aten::batch_norm in a pytorch program :
(1) nn.functional.batch_norm(input, mean, var, training=false)
(2) m = nn.BatchNorm2d(c) output=m(input)
In the first case, training parameter in aten::batch_norm is a constant and set to false in default. The state change between eval() and train() doesn't affect its value. We'll check the value of training parameter and only fuse batch_norm when training is false.
In the second case, the training parameter is determined by the model state: false in eval() and true in train(). If the user switches the model between eval() and train(), the value of training parameter will change correspondingly. In the torchscript graph, training is a prim::GetAttr node - we would not be able to fuse batch_norm unless we freeze the graph first so training becomes a constant.

huiguoo added a commit that referenced this pull request Mar 19, 2021
… mode

ghstack-source-id: 5507e00060377a2f519fd392359e792cd9374c16
Pull Request resolved: #54204
huiguoo added a commit that referenced this pull request Mar 19, 2021
… mode

ghstack-source-id: df5e2fdea5ade06ac43c7e28418cd2daff595013
Pull Request resolved: #54204
huiguoo added a commit that referenced this pull request Mar 23, 2021
… mode

ghstack-source-id: 459b504f3065716efc6d3a49a4c0120ed3b2640e
Pull Request resolved: #54204
huiguoo added a commit that referenced this pull request Mar 23, 2021
… mode

ghstack-source-id: 20b079b389b951acfc16651f4b7f5aeeeac5815f
Pull Request resolved: #54204
@facebook-github-bot
Copy link
Contributor

@huiguoo merged this pull request in 2a53897.

@facebook-github-bot facebook-github-bot deleted the gh/huiguoo/12/head branch March 26, 2021 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed Merged oncall: jit Add this issue/PR to JIT oncall triage queue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants