[JIT] Optimize FunctionSchema::checkArg for the Tensor case. #48034
Conversation
This results in a ~25% improvement on the DeepAndWide model and would improve other models as well.

Before the change:

```
522[ms] 507[ms] 559[ms] 512[ms] 510[ms] 548[ms] 515[ms] 600[ms] 518[ms] 494[ms]
```

After the change:

```
388[ms] 622[ms] 404[ms] 405[ms] 380[ms] 379[ms] 579[ms] 417[ms] 377[ms] 409[ms]
```

[ghstack-poisoned]
💊 CI failures summary: as of commit b48cd83 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚 (This comment was automatically generated by Dr. CI and has been revised 8 times.)
It's a good idea to try to speed up this pathway for common cases. The caching approach here has a number of issues, which I commented on below. Maybe tackle improving the performance of checkAndNormalizeInputs first. For instance, isSubtypeOf is very slow and probably does accidental atomic reference counting of types. For common types like Tensors and tuples of them, this can be made much faster by a fast path in checkArg, at the line `if (!value.type()->isSubtypeOf(argument.type())) {`.
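For concreteness, a minimal sketch of what such a fast path could look like, mirroring the checkArg signature that appears in the diff below (an illustration of the idea, not the exact committed code):

```cpp
// Sketch: when the value is a Tensor and the argument is annotated with the
// plain, unspecialized Tensor type (a singleton, so pointer equality on the
// TypePtr is sufficient), skip the expensive isSubtypeOf() walk entirely.
inline void FunctionSchema::checkArg(
    const IValue& value,
    const Argument& argument,
    optional<size_t> pos) const {
  if (value.isTensor() && argument.type() == TensorType::get()) {
    return; // common case: nothing else to verify
  }
  if (!value.type()->isSubtypeOf(argument.type())) {
    // ... existing slow path with detailed error reporting ...
  }
}
```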
torch/csrc/jit/api/function_impl.cpp (outdated):

```cpp
bool need_schema_check = true;
if (!kwargs.size()) { // Fast path
  size_t input_types_hash = computeInputTypesHash(stack);
  if (!schema_checks_cache_.count(input_types_hash)) {
```
If the hash collides, this check produces wrong results. To make the fast path (a hit) sound, one would need to check the equality of the actual types, which would require more computation.
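For illustration, a collision-safe variant would have to store the actual input types alongside the hash and compare them on every hit. A sketch with hypothetical names, against PyTorch's TypePtr:

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// The hash only selects a bucket; the cached type sequence must still be
// compared against the actual inputs, otherwise two different argument-type
// sequences that happen to hash alike would be treated as the same.
std::unordered_map<size_t, std::vector<TypePtr>> schema_checks_cache_;

bool cacheHit(size_t hash, const std::vector<TypePtr>& input_types) {
  auto it = schema_checks_cache_.find(hash);
  return it != schema_checks_cache_.end() &&
      it->second.size() == input_types.size() &&
      // this full equality walk is exactly the extra computation that a
      // hash-only lookup was trying to avoid
      std::equal(
          it->second.begin(), it->second.end(), input_types.begin(),
          [](const TypePtr& a, const TypePtr& b) { return *a == *b; });
}
```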
torch/csrc/jit/api/function_impl.cpp
Outdated
if (!kwargs.size()) { // Fast path | ||
size_t input_types_hash = computeInputTypesHash(stack); | ||
if (!schema_checks_cache_.count(input_types_hash)) { | ||
getSchema().checkAndNormalizeInputs(stack, kwargs); |
If the schema has default arguments, or other things in the 'NormalizeInputs' bucket, then caching this is invalid, because those actions need to be applied on each invocation.
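To make the point concrete, a simplified sketch (not the actual checkAndNormalizeInputs) of the per-invocation work that a cached "already checked" bit would incorrectly skip:

```cpp
// For a schema like foo(Tensor x, int y=1), a caller that passes only `x`
// relies on normalization to push the default for `y` onto the stack on
// every single call, not just on the first call whose types were checked.
void pushDefaults(Stack& stack, const FunctionSchema& schema) {
  for (size_t i = stack.size(); i < schema.arguments().size(); ++i) {
    const auto& arg = schema.arguments()[i];
    if (arg.default_value()) {
      stack.push_back(*arg.default_value()); // side effect needed each time
    }
  }
}
```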
torch/csrc/jit/api/function_impl.cpp
Outdated
size_t input_types_hash = computeInputTypesHash(stack); | ||
if (!schema_checks_cache_.count(input_types_hash)) { | ||
getSchema().checkAndNormalizeInputs(stack, kwargs); | ||
schema_checks_cache_.insert(input_types_hash); |
Mutating the GraphFunction data structure requires holding a lock, because it is invoked from multiple threads.
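A sketch of what thread-safe insertion would require, assuming the hash-set cache from the diff above:

```cpp
#include <mutex>
#include <unordered_set>

std::mutex schema_checks_mutex_;
std::unordered_set<size_t> schema_checks_cache_;

// Returns true if the hash was newly inserted (i.e., not seen before).
// The lock_guard serializes concurrent callers mutating the shared set.
bool markChecked(size_t input_types_hash) {
  std::lock_guard<std::mutex> guard(schema_checks_mutex_);
  return schema_checks_cache_.insert(input_types_hash).second;
}
```

Even with the lock, the correctness issues above remain; this only addresses the data race.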
The Tensor case is one of the most common and the existing check can be made faster. This results in a ~21% improvement on the DeepAndWide model and would improve other models as well.

Before the change:

```
505[ms] 491[ms] 514[ms] 538[ms] 514[ms] 554[ms] 556[ms] 512[ms] 516[ms] 527[ms]
```

After the change:

```
406[ms] 394[ms] 414[ms] 423[ms] 449[ms] 397[ms] 410[ms] 389[ms] 395[ms] 414[ms]
```

Differential Revision: [D24999486](https://our.internmc.facebook.com/intern/diff/D24999486)

[ghstack-poisoned]
Thanks for the feedback! Indeed, we could achieve similar gains with a safer change in `checkArg`.
Cool! I have additional suggestions for further speed improvement below.
```diff
@@ -151,6 +151,10 @@ inline void FunctionSchema::checkArg(
     const IValue& value,
     const Argument& argument,
     optional<size_t> pos) const {
+  if (value.isTensor() && argument.type() == TensorType::get()) {
```
This is correct (and faster, yay!) but still does 2 shared_ptr operations: `argument.type()` and `TensorType::get()` both return a shared_ptr, which has to be destructed. Using `argument.type()->kind() == TensorType::kind()` would eliminate one shared_ptr creation, and modifying `argument.type()` to return a `const TypePtr&` would eliminate the other. Not sure if they will speed things up further, but worth checking.
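A sketch of both tweaks together (signatures assumed for illustration, not the actual headers):

```cpp
// 1. Returning the stored TypePtr by const reference avoids a shared_ptr
//    copy, and therefore an atomic refcount bump, on every call.
struct ArgumentSketch {
  const TypePtr& type() const { return type_; }
  TypePtr type_;
};

// 2. TypeKind is a plain enum, so comparing kinds touches no refcounts at all.
bool isPlainTensorArg(const IValue& value, const ArgumentSketch& argument) {
  return value.isTensor() &&
      argument.type()->kind() == TypeKind::TensorType;
}
```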
I thought about checking only the type kind, but wasn't sure it would work correctly in a theoretical case where the argument type had, say, specialized shapes, i.e. if the graph looked like `graph(%input : Float(10))`. In that case, IIUC, `argument.type()->kind()` would equal `TensorType::kind()`, but `argument.type()` would not be equal to `TensorType::get()`.
As for your second suggestion, let me try that. FWIW, I also tried changing the signature of `Type::isSubtypeOfExt` to take a reference for the first argument, but I didn't notice improved performance from that.
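For reference, a sketch of the specialization concern, assuming `TensorType::createContiguous` can build the shape/dtype-specialized type corresponding to the `Float(10)` annotation:

```cpp
#include <cassert>

void specializationExample() {
  auto specialized =
      TensorType::createContiguous(at::kFloat, at::kCPU, /*sizes=*/{10});
  auto plain = TensorType::get();

  assert(specialized->kind() == plain->kind()); // same TypeKind...
  assert(specialized != plain); // ...but not the same type, so a kind-only
                                // fast path would wrongly skip the full check
}
```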
Codecov Report

```
@@            Coverage Diff                           @@
##    gh/ZolotukhinM/378/base   #48034          +/-  ##
========================================================
  Coverage             81.30%   81.30%
========================================================
  Files                  1839     1839
  Lines                198446   198446
========================================================
+ Hits                 161337   161338          +1
+ Misses                37109    37108          -1
```
@ZolotukhinM merged this pull request in 3611d26.
Stack from ghstack:
The Tensor case is one of the most common and the existing check can be
made faster. This results in a ~21% improvement on DeepAndWide model and
would improve other models as well.
Before the change: 505[ms] 491[ms] 514[ms] 538[ms] 514[ms] 554[ms] 556[ms] 512[ms] 516[ms] 527[ms]
After the change: 406[ms] 394[ms] 414[ms] 423[ms] 449[ms] 397[ms] 410[ms] 389[ms] 395[ms] 414[ms]
Differential Revision: D24999486