Port `cat` kernel to structured kernels.
#68640
Conversation
Tracking issue: #55070 [ghstack-poisoned]
⚛️ CI Flow Status
You can add a comment to the PR and tag @pytorchbot with the following commands:

```
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun
# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow
```

For more information, please take a look at the CI Flow Wiki.

💊 CI failures summary and remediations
As of commit 95b2799 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns. The following CI failure does not appear to be due to upstream breakages:
pull / linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (1/1), Step: "Test"
aten/src/ATen/native/TensorShape.cpp (outdated):

```cpp
bool all_contiguous = true;
bool all_same_dtype = true;
bool all_same_sizes_and_stride = true;
auto memory_format = cat_compute_output_memory_format(tensors);
```
The CPU and CUDA kernels computed the memory format in different ways. Here, I adopted how CUDA used to do it. I'm not sure this is the best way to go about it. Any thoughts?
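For readers following along, here is a rough Python paraphrase of what the CUDA-style rule boils down to (a hedged sketch, not the actual C++ helper; the `suggest` stand-in below is an assumption that glosses over details such as ambiguous strides):

```python
import torch

def cat_output_memory_format(tensors):
    # Rough paraphrase: propagate a memory format only when every input
    # suggests the same one; otherwise fall back to contiguous_format.
    def suggest(t):
        # Stand-in for ATen's suggest_memory_format(); ignores ambiguous cases.
        if t.is_contiguous(memory_format=torch.channels_last):
            return torch.channels_last
        return torch.contiguous_format

    formats = {suggest(t) for t in tensors}
    return formats.pop() if len(formats) == 1 else torch.contiguous_format
```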
cc @ngimel for another opinion. I found the PR that added support for CUDA, and a bit later for CPU. There's no explicit mention of why they differ, so going with the way CUDA does it now seems reasonable to me.
@mruberry helped point me to some discussion about this: #62560 (comment).
It sounds like making CUDA's behavior the general behavior is the right move in this PR. We should also add a test (if one doesn't exist already) that confirms that memory_format behavior is the same across CPU and CUDA.
@mruberry also pointed out another thing we need to fix in this PR: see #64709. Apparently, there are parts of the codebase that call `cat` and expect `cat(out=...)` to resize the output tensor. That will start printing warnings now that `cat` is structured. We should fix all of the places in our codebase that call `cat(out=...)` and expect the resize to happen, so that users don't start seeing warnings they can't fix.
The easiest way to do that is probably to copy the conditions in `resize_output()` that produce the warning, run them directly in the cat meta function, and raise an error instead of a warning. Then fix all the errors that show up in CI.
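A minimal sketch of such a cross-device consistency check might look like the following (hypothetical test code, assuming a CUDA device is available; not the test that actually landed):

```python
import torch

def check_cat_out_memory_format_consistency():
    for device in ("cpu", "cuda"):
        a = torch.randn(2, 3, 4, 5, device=device).contiguous(memory_format=torch.channels_last)
        b = torch.randn(2, 3, 4, 5, device=device).contiguous(memory_format=torch.channels_last)
        out = torch.empty(0, device=device)  # wrong shape: cat must resize it
        res = torch.cat((a, b), out=out)
        # With the unified rule, both devices should propagate the shared
        # channels_last format of the inputs.
        assert res.is_contiguous(memory_format=torch.channels_last), device
```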
The existing cat memory_format test is here:
pytorch/test/test_tensor_creation_ops.py, line 721 (at commit 1d269e8):

```python
def test_cat_out_memory_format(self, device):
```
Thanks @ysiraichi, indeed the CUDA and CPU memory format calculation is inconsistent. I opened issue #63998 (about exactly this) earlier and wanted to work on a fix, but never got around to it because of some new responsibilities until earlier this week. I just came across this PR by chance and see you have already incorporated the change.
So, with this PR, #63998 should also be fixed.
@bdhirsh when I added the existing cat memory format test, I kept the different memory format behaviour in mind. It will now need a slight modification with your update making CPU and CUDA consistent, which should be quite straightforward (Edit: Yukio already covered that!).
```diff
@@ -626,16 +626,22 @@ def test_cat_out(self, device):
         y = torch.randn((4, 6), device=device)

         with self.assertRaisesRegex(
-                RuntimeError, r"unsupported operation:.* input tensor 0"):
+                RuntimeError,
+                r"unsupported operation: some elements of the input tensor and "
```
The error messages changed here because I'm using the `at::assert_no_overlap` function. Should I revert that to show better error messages?
This seems reasonable. The previous error message was also related to overlap:
```python
>>> torch.cat([x, y], out=x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: unsupported operation: the input tensors cannot refer to any of the output memory locations. Found overlap in input tensor 0
```
Yeah. I was a bit worried that not having the number of the input tensor that had an overlap would be bad. :)
```diff
@@ -751,7 +756,7 @@ def test_cat_out_memory_format(self, device):
         res2_cpu = torch.cat((a_cpu, b_cpu), out=out_cpu)

         self.assertTrue(res2_cuda.is_contiguous(memory_format=torch.contiguous_format))
-        self.assertTrue(res2_cpu.is_contiguous(memory_format=torch.channels_last))
+        self.assertTrue(res2_cpu.is_contiguous(memory_format=torch.contiguous_format))
```
Since I'm using only one method for inferring the memory format, CPU and CUDA behave consistently.
Tracking issue: #55070 [ghstack-poisoned]
```yaml
  CPU: _cat_out_cpu
  CUDA: cat_out_cuda
  QuantizedCPU: cat_out_quantized_cpu
```
@zou3519 do you happen to know the history behind why we had both an `at::cat` and an `at::_cat`? I couldn't dig up much of a reason from git blame, although it seems useful to try to kill it as part of this structured kernel port.
I don't know why we have both. I agree that if we can kill `_cat` we should; there doesn't seem to be a need for both `_cat` and `cat` to exist.
```yaml
  dispatch:
    CompositeExplicitAutograd: cat
    SparseCPU, SparseCUDA: cat_sparse
```
cleaning up the sparse logic :)
I guess for consistency we'd like to have an out= variant for sparse kernels, but it looks like one doesn't exist today. It's probably not necessary to try to fix that in this PR.
```diff
@@ -739,8 +745,7 @@ def test_cat_out_memory_format(self, device):
         self.assertTrue(res1_cpu.is_contiguous(memory_format=torch.contiguous_format))

         # Case 2: if out= is not the correct shape then the output it is resized internally
-        # - For the CPU variant the memory format is that of the first tensor
-        # - For the CUDA variant it only propagates memory format if all the tensors have
+        # - For both CPU and CUDA variants, it only propagates memory format if all the tensors have
```
@ngimel, is this change in memory format propagation on CPU an issue? It seems useful to clean this logic up so it's backend-agnostic, but it sounds mildly BC-breaking.
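To make the BC question concrete, here is a hypothetical snippet (based only on the comments in the diff above) showing the one case whose CPU result changes:

```python
import torch

a = torch.randn(2, 3, 8, 8).contiguous(memory_format=torch.channels_last)
b = torch.randn(2, 3, 8, 8)  # default contiguous format
out = torch.empty(0)         # wrong shape, so cat resizes it

res = torch.cat((a, b), out=out)
# Old CPU behavior: the resized output took the memory format of the first
# input, so res would have been channels_last here.
# New (CUDA-style) behavior: a format is propagated only when all inputs share
# it, so res falls back to contiguous_format.
assert res.is_contiguous(memory_format=torch.contiguous_format)
```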
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Tracking issue: #55070 Differential Revision: [D34521686](https://our.internmc.facebook.com/intern/diff/D34521686) [ghstack-poisoned]
Tracking issue: pytorch#55070 ghstack-source-id: cc4bb13cb10c842b31b3ff5e3be2b5360e927d39 Pull Request resolved: pytorch#68640
@ngimel @ezyang @bdhirsh
Benchmark
```python
# Compare wall-clock time (CPU and CUDA) and callgrind instruction counts for torch.cat.
import torch
import torch.utils.benchmark as benchmark

ntensors = 1000
sizes = [
    (1, 2),
    (1, 2, 2),
    (3, 256, 256),
]

results = []
results_callgrind = []

for size in sizes:
    size_str = f"""[{",".join(str(s) for s in size)}]"""

    timer = benchmark.Timer(
        stmt="torch.cat(xs)",
        setup=f"import torch; xs = [torch.rand(*{size}) for _ in range({ntensors})];",
        label="Cat",
        sub_label=size_str,
        description="time (cpu)",
    )
    timer_cuda = benchmark.Timer(
        stmt="torch.cat(xs); torch.cuda.synchronize()",
        setup=f"import torch; xs = [torch.rand(*{size}, device='cuda') for _ in range({ntensors})]; torch.cuda.synchronize()",
        label="Cat",
        sub_label=size_str,
        description="time (cuda)",
    )

    results.append(timer.blocked_autorange(min_run_time=1))
    results.append(timer_cuda.blocked_autorange(min_run_time=1))
    results_callgrind.append(timer.collect_callgrind())

compare = benchmark.Compare(results)
compare.print()

for r in results_callgrind:
    print(r)
```
I would love to start merging this stack, but there are still build failures.

Since you've got the benchmark, I wonder if you can run a little experiment: try NOT materializing in the body of the kernel, and be willing to iterate multiple times instead. It may be that the dynamic allocation is swamping the predictable branches, so we'd rather pump up the instruction count and avoid the dynamic alloc. Would be nice to know one way or another.
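To spell out the experiment in pseudocode terms (a generic Python illustration, not the ATen kernel): "materializing" builds an intermediate container of per-input properties up front, at the cost of a dynamic allocation, while the alternative re-walks the inputs each time a property is needed, trading more instructions for no allocation.

```python
def total_size_materialized(tensors, dim=0):
    # One pass that allocates an intermediate list and then reuses it.
    sizes = [t.size(dim) for t in tensors]  # dynamic allocation happens here
    return sum(sizes), max(sizes)

def total_size_streaming(tensors, dim=0):
    # No intermediate container: iterate over the inputs once per query.
    return (sum(t.size(dim) for t in tensors),
            max(t.size(dim) for t in tensors))
```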
I'm working on the build failures.
Sure! I will do that once CI is green.
@pytorchbot merge this
Hey @ysiraichi. |
This is about to get yanked from the diff train because it breaks internal users. I just need to update the call sites to not use `native::`.
Summary: Tracking issue: #55070 Pull Request resolved: #68640 Approved by: https://github.com/ezyang Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/22a10ce51310e690745accce0910d740b82a1503 Reviewed By: dagitses Differential Revision: D34521686 Pulled By: mehtanirav fbshipit-source-id: 58434529551e9e09939f6e28367316a6a20d7774
Fixed the internal references and a few other linter errors in the internal diff stack.
@ezyang
#68640 broke our build by porting `cat` structured kernels, not sure how CI didn't catch this Differential Revision: [D35780296](https://our.internmc.facebook.com/intern/diff/D35780296/) [ghstack-poisoned]
I'm having a little difficulty interpreting the numbers here. What does the parenthesized number mean? If I go only by the non-parenthesized one, it seems like not materializing is better despite the higher instruction count (what I suspected), and we should use that.
Summary: Pull Request resolved: #76111 #68640 broke our build by porting `cat` structured kernels, not sure how CI didn't catch this ghstack-source-id: 154335722 Test Plan: CI Reviewed By: navahgar, ajyu Differential Revision: D35780296 fbshipit-source-id: 0a262eb06a8d619227e5db10b6a775bf0b2e17c1 (cherry picked from commit aea6fbf)
It's the standard deviation. I left it there just to give a sense of how much the runs varied.
Agreed.
Stack from ghstack (oldest at bottom):
- `Tensor?[]` for structured kernels. #73351
- `Tensor[]` for structured kernels. #73350
- `index.Tensor` to structured kernels. #69607
- `Tensor[]?` for structured kernel codegen. #69606
- Port `cat` kernel to structured kernels. #68640

Tracking issue: #55070
Differential Revision: D34521686