{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":131370498,"defaultBranch":"master","name":"pytorch","ownerLogin":"huaxz1986","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2018-04-28T03:51:08.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/3395177?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1613639915.8490849","currentOid":""},"activityList":{"items":[{"before":"be0b12ece576c86c5f059d15a64dcba0eb886ddd","after":"a39ea6f21361e531ce7e703224bfbce7fc564083","ref":"refs/heads/master","pushedAt":"2023-04-19T12:55:34.000Z","pushType":"push","commitsCount":8,"pusher":{"login":"huaxz1986","name":null,"path":"/huaxz1986","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3395177?s=80&v=4"},"commit":{"message":"Delete even more files\n\nEntire branch is about to get deleted, it needs only .github and README","shortMessageHtmlLink":"Delete even more files"}},{"before":"6e1e27fc4e36edc7d8dad602de7e8250ad16073b","after":"be0b12ece576c86c5f059d15a64dcba0eb886ddd","ref":"refs/heads/master","pushedAt":"2023-04-16T23:46:03.000Z","pushType":"push","commitsCount":7,"pusher":{"login":"huaxz1986","name":null,"path":"/huaxz1986","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3395177?s=80&v=4"},"commit":{"message":"make untemplated gemm calls data_ptr-correct (#99184)\n\nmake untemplated gemm calls data_ptr-correct\n\nTest Plan: Rely on CI.\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/99184\nApproved by: https://github.com/ezyang","shortMessageHtmlLink":"make untemplated gemm calls data_ptr-correct (pytorch#99184)"}},{"before":"039faf0dbf75c8e6bb3c097c1b8d257eebb74c45","after":"6e1e27fc4e36edc7d8dad602de7e8250ad16073b","ref":"refs/heads/master","pushedAt":"2023-04-16T04:30:53.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"huaxz1986","name":null,"path":"/huaxz1986","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3395177?s=80&v=4"},"commit":{"message":"[inductor] Refactor pre-grad passes into inductor.fx_passes (#99130)\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/99130\nApproved by: https://github.com/ngimel","shortMessageHtmlLink":"[inductor] Refactor pre-grad passes into inductor.fx_passes (pytorch#…"}},{"before":"157c869026bd0aa866e0138d5ed57d09966863fc","after":"039faf0dbf75c8e6bb3c097c1b8d257eebb74c45","ref":"refs/heads/master","pushedAt":"2023-04-16T02:18:09.000Z","pushType":"push","commitsCount":25,"pusher":{"login":"huaxz1986","name":null,"path":"/huaxz1986","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3395177?s=80&v=4"},"commit":{"message":"Add invariant that all symbolic shapes must be bound in graph (#99089)\n\nPreviously, we had a problem when partitioning forward-backward dynamic graphs, which is that we could end up with a backward graph that mentions a symbol in an input tensor (e.g., `f32[s0 + s1]`), but without this symbol being otherwise bound elsewhere. When this happens, we have no way of actually deriving the values of `s0` and `s1`. Our fix for this in https://github.com/pytorch/pytorch/pull/93059 was to just retrace the graph, so that s0 + s1 got allocated a new symbol s2 and everything was happy. 
[2023-04-16 02:18 UTC] 25 commits pushed · head commit 039faf0
Add invariant that all symbolic shapes must be bound in graph (#99089)

Previously, we had a problem when partitioning forward-backward dynamic graphs: we could end up with a backward graph that mentions a symbol in an input tensor (e.g., `f32[s0 + s1]`) without this symbol being otherwise bound elsewhere. When this happens, we have no way of actually deriving the values of `s0` and `s1`. Our fix for this in https://github.com/pytorch/pytorch/pull/93059 was to just retrace the graph, so that s0 + s1 got allocated a new symbol s2 and everything was happy. However, this strategy had other problems, namely (1) we lost all information from the previous ShapeEnv, including guards, and (2) we end up allocating a LOT of fresh new symbols in backwards.

With this change, we preserve the same ShapeEnv between forward and backwards. How do we do this? We simply require that every symbol which may be present inside tensors ALSO be a plain SymInt input to the graph. This invariant is enforced by Dynamo. Once we have done this, we can straightforwardly modify the partitioner to save these SymInts for backwards if they are needed in the backwards graph, preserving the invariant there as well.

This apparently breaks yolov3, but since everything else is OK I'm merging this as obviously good and investigating later.

Signed-off-by: Edward Z. Yang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99089
Approved by: https://github.com/voznesenskym
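For context on the scenario #99089 describes: the dangling symbols show up when a function is compiled with dynamic shapes and its backward graph needs a size such as `s0 + s1`. A minimal repro-style sketch, assuming a recent build with `torch.compile` dynamic-shape support (the function, backend choice, and sizes are illustrative, not taken from the PR):

```python
import torch

def f(a, b):
    # The concatenated tensor has the symbolic size s0 + s1 when a and b
    # have dynamic first dimensions.
    return torch.cat([a, b]).sin().sum()

# dynamic=True asks Dynamo to trace with symbolic shapes; aot_eager exercises
# the AOTAutograd forward/backward partitioning path without needing a GPU.
compiled = torch.compile(f, backend="aot_eager", dynamic=True)

a = torch.randn(3, requires_grad=True)
b = torch.randn(5, requires_grad=True)
compiled(a, b).backward()
```

With the invariant in place, the plain SymInt inputs that define sizes like `s0 + s1` are saved for backward alongside the tensors, instead of being recovered by retracing.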
[2023-04-15 06:00 UTC] 3 commits pushed · head commit 157c869
Enable FSDP ``use_orig_params=True`` mixed precision training when some ranks have no (non-zero sized) parameter shards (#99175)

Fixes #99174

### The issue

Now that ``use_orig_params=True`` allows non-uniform ``requires_grad`` (:tada: :rocket: thanks @awgu!!!) with [#98221](https://github.com/pytorch/pytorch/pull/98221), there will be circumstances wherein some ranks have no (non-zero sized) local shards of the original parameters (and hence no associated gradients).

### Use cases

For a simple Transformer case, imagine a user wraps all encoder layers in separate FSDP instances but allows the classifier head to be wrapped in the same FSDP instance as the relatively large embedding layers. While this is a sub-optimal wrapping strategy for most use cases, I believe it is expected to be supported (full precision training works in that context).

I originally encountered this issue while extending a package I maintain, leveraging the relaxed ``requires_grad`` constraint to simplify multi-phase scheduled fine-tuning FSDP configuration, so a [concrete example is there](https://finetuning-scheduler.readthedocs.io/en/latest/advanced/fsdp_scheduled_fine_tuning.html#basic-scheduled-fine-tuning-with-fsdp).

### Reproduction and remediation

Currently, ``ShardedGradScaler`` does not accommodate these situations, failing to initialize ``optimizer_state["found_inf_per_device"]`` when ``unscale_`` is called.

In this PR, I extend the existing ``ShardedGradScaler`` tests with a ``use_orig_params=True`` dimension added to the parameterization and test scenarios wherein one rank possesses no (non-zero sized) parameter shards.

The relevant issue can be reproduced with the tests I'm adding in this PR. The current (pre-PR) execution of these tests fails in ``use_orig_params=True`` mode with this error:

```python
./test_fsdp_sharded_grad_scaler.py::TestShardedGradScalerParityWithDDP::test_fsdp_ddp_parity_with_grad_scaler_offload_false_none_mixed_precision_use_orig_params Failed with Error: Process 0 exited with error code 10 and exception:
Traceback (most recent call last):
  File "/home/speediedan/repos/pytorch/torch/testing/_internal/common_distributed.py", line 657, in run_test
    getattr(self, test_name)()
  File "/home/speediedan/repos/pytorch/torch/testing/_internal/common_distributed.py", line 543, in wrapper
    fn()
  File "/home/speediedan/repos/pytorch/torch/testing/_internal/common_utils.py", line 259, in instantiated_test
    test(self, **param_kwargs)
  File "/home/speediedan/repos/pytorch/torch/testing/_internal/common_distributed.py", line 174, in wrapper
    return func(*args, **kwargs)
  File "/home/speediedan/repos/pytorch/test/distributed/fsdp/test_fsdp_sharded_grad_scaler.py", line 187, in test_fsdp_ddp_parity_with_grad_scaler
    self._test_fsdp_parity(
  File "/home/speediedan/repos/pytorch/torch/testing/_internal/common_fsdp.py", line 1152, in _test_fsdp_parity
    fsdp_loss = self._train_for_several_steps(
  File "/home/speediedan/repos/pytorch/torch/testing/_internal/common_fsdp.py", line 1016, in _train_for_several_steps
    sharded_grad_scaler.step(optim)
  File "/home/speediedan/repos/pytorch/torch/distributed/fsdp/sharded_grad_scaler.py", line 291, in step
    return super().step(optimizer, *args, **kwargs)
  File "/home/speediedan/repos/pytorch/torch/cuda/amp/grad_scaler.py", line 368, in step
    assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.
```

A few implementation notes, considerations, and questions:

1. Rather than just initialize ``per_device_found_inf``, one could disable the grad scaler altogether for the relevant ranks, altering ``unscale_`` to reduce with a subgroup or some rank-mask construct to keep the ``all_reduce``s in ``distributed/fsdp/sharded_grad_scaler.py:unscale_()`` from hanging. Given that users may subsequently add parameter groups to an optimizer that would require re-enabling the scaler, and the complexity associated with maintaining a separate mask construct or process subgroup, I thought this implementation was cleaner.
2. I extended ``_train_for_several_steps`` and ``_test_fsdp_parity`` in ``torch/testing/_internal/common_fsdp.py`` with the ability to configure ``sharded_grad_scaler_kwargs`` for future testing flexibility.
3. Should the user be warned that no parameter shards were associated with a given rank? My initial thought is that this should be considered an implementation detail, part of supporting ``use_orig_params`` with heterogeneous ``requires_grad``, and therefore should be transparently handled by PyTorch. Should a DEBUG-level message be added? If so, likely further upstream rather than at the scaler step level.
4. Rather than extend the existing ``ShardedGradScaler`` tests with a ``use_orig_params=True`` dimension added to the parameterization, let me know if you prefer that I instead narrow the scope of the new testing to a single additional test, e.g.:

    ```python
    # from typing import Optional
    from typing import Optional, List
    # ...
    # use_orig_params = ["enable_use_orig_params", None]
    use_orig_params: List[Optional[str]] = [None]
    # ...
    configs = list(itertools.product(cpu_offload_config, sharding_strategy_config, mixed_precision, use_orig_params))
    configs.append((CPUOffload(offload_params=False), None, "enable_mixed_precision", "enable_use_orig_params"))
    ```

Thanks as always to the PyTorch distributed team for your astonishingly impressive and valuable contributions to the open-source ML engineering community!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99175
Approved by: https://github.com/awgu
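For context on #99175, a minimal training-step sketch showing the pieces the fix exercises; it assumes a process group launched via `torchrun` with one CUDA device per rank, and the tiny model, optimizer, and data are placeholders:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = FSDP(
    torch.nn.Linear(16, 16).cuda(),
    use_orig_params=True,  # original params kept; per-parameter requires_grad allowed
    mixed_precision=MixedPrecision(param_dtype=torch.float16),
)
optim = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = ShardedGradScaler()

loss = model(torch.randn(8, 16, device="cuda")).sum()
scaler.scale(loss).backward()
scaler.step(optim)   # pre-fix, a rank holding no non-zero-sized shard hit the
scaler.update()      # "No inf checks were recorded" assertion here
```

With wrapping strategies where some rank ends up holding no non-zero-sized shard (and hence no gradients to inf-check), the `step` call above is the point the PR patches.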
[2023-04-15 02:31 UTC] 13 commits pushed · head commit 3c4622c
Patch failing slow-test logic for inductor-dynamic (#99182)

Fixes #98954

But... I'm not sure what the right fix is.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99182
Approved by: https://github.com/huydhn

[2023-04-14 23:06 UTC] 92 commits pushed · head commit c0d9a02
[inductor] Use FakeTensorMode() when creating patterns (#99128)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99128
Approved by: https://github.com/ngimel

[2023-04-13 12:43 UTC] 54 commits pushed · head commit 670c5cf
AOTAutograd: fix 'Trying to backward through the graph a second time' error (#98960)

Fixes https://github.com/pytorch/pytorch/issues/97745. See discussion and comments in the PR for more details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98960
Approved by: https://github.com/bertmaher, https://github.com/albanD

[2023-04-12 14:14 UTC] 6 commits pushed · head commit 0c0e5c5
[inductor] Consolidate constant_args and cpp_constant_args (#98742)

Summary: Refactor code to simplify the logic. Support convolution as an extern call in CudaWrapperCodeGen.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98742
Approved by: https://github.com/jgong5, https://github.com/jansel

[2023-04-12 06:15 UTC] 5 commits pushed · head commit d3a1a77
inductor: rewrite mkldnn fx fusion using pattern_matcher(conv_transpose_unary) (#97140)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97140
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/jansel

[2023-04-12 04:31 UTC] 1 commit pushed · head commit 6ff32b5
[MPS] Expose mps package in torch (#98837)

Fixes #98740

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98837
Approved by: https://github.com/albanD, https://github.com/Neilblaze
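As a quick illustration of what #98837 makes reachable under `torch.mps` (the specific calls are illustrative, not taken from the PR), guarded so it is a no-op on builds without Apple-silicon support:

```python
import torch

if torch.backends.mps.is_available():
    x = torch.randn(4, 4, device="mps")
    y = (x @ x).relu()
    torch.mps.synchronize()  # wait for outstanding MPS work, via the torch.mps package
    print(y.device)          # mps:0
```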
[2023-04-12 04:10 UTC] 49 commits pushed · head commit d3a3595
Skip dtensor ops on CPU-only runner due to flaky timeout (#98868)

`distributed/_tensor/test_dtensor_ops` is still flaky in trunk with a curious timeout issue, for example https://hud.pytorch.org/pytorch/pytorch/commit/ce4df4cc596aa10534ac6d54912f960238264dfd. It seems that the test just hangs without any failure. The root cause is unclear. On the other hand, https://github.com/pytorch/pytorch/issues/98816 might offer a solution for this. Anyway, I'm disabling the test on CPU for now while the investigation is being done.

The test is still being run on CUDA-available runners because it's not flaky there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98868
Approved by: https://github.com/clee2000

[2023-04-11 11:37 UTC] 5 commits pushed · head commit 6145964
distinguish implementation of data() and mutable_data() on TensorImpl (#98732)

The old style had them both going through a mutable method on Storage, which would prevent us from implementing checks differently depending on whether we are writing or reading.

Differential Revision: [D44831044](https://our.internmc.facebook.com/intern/diff/D44831044/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98732
Approved by: https://github.com/ezyang

[2023-04-11 05:30 UTC] 40 commits pushed · head commit c377a85
Add `nonzero_static()` op to pytorch to unblock export (#97417)

Summary: Add a new experimental Python op (`torch.nonzero_static`) for export. There is NO CUDA impl included in this PR.

Example: say the input tensor is `x = torch.tensor([[1, 0], [3, 2]])`.

- Calling regular `nonzero()` on x gives you `tensor([[0, 0], [1, 0], [1, 1]])`.
- Calling `nonzero_static(x, size=4)` gives you `tensor([[0, 0], [1, 0], [1, 1], [fill_value, fill_value]])` (padded).
- Calling `nonzero_static(x, size=2)` gives you `tensor([[0, 0], [1, 0]])` (truncated).

Test Plan:

**Unit Tests**
```
buck test @mode/dev-nosan //caffe2/test:test_dynamo -- 'caffe2/test:test_dynamo - test_export.py::ExportTests::test_export_with_nonzero_static' -- 'caffe2/test:test_dynamo - test_misc.py::MiscTests::test_nonzero_static'
```

**PT2 Export with `nonzero_static()`**
Example of the `GraphModule` in the exported graph:
```
def forward(self, x):
    arg0, = fx_pytree.tree_flatten_spec(([x], {}), self._in_spec)
    nonzero_static_default = torch.ops.aten.nonzero_static.default(arg0, size = 4);  arg0 = None
    return pytree.tree_unflatten([nonzero_static_default], self._out_spec)
```

Differential Revision: D44324808

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97417
Approved by: https://github.com/ezyang
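A small runnable sketch of the op on a build that includes it, assuming the keyword-only `size` (and optional `fill_value`) signature described above; CPU tensors only, per the note that no CUDA impl is included:

```python
import torch

x = torch.tensor([[1, 0], [3, 2]])

print(torch.nonzero(x))                 # data-dependent output shape (3 rows here)
print(torch.nonzero_static(x, size=4))  # always 4 rows, padded with the fill value
print(torch.nonzero_static(x, size=2))  # always 2 rows, truncated
```

The fixed `size` is what makes the op export-friendly: the output shape no longer depends on the tensor's values.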
[2023-04-10 13:38 UTC] 478 commits pushed · head commit 537c346
feat(add method is_private_use1() in class Device) (#98123)

As the title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98123
Approved by: https://github.com/bdhirsh

[2023-03-29 13:11 UTC] 1 commit pushed · head commit 2ce6ad9
[inductor] make `run_and_get_cpp_code` signature match `run_and_get_triton_code` (#97826)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97826
Approved by: https://github.com/ezyang

[2023-03-29 12:20 UTC] 3 commits pushed · head commit 4ae4c6f
Fix typo when setting FSDP state dict config (#97110)

`get_state_dict_type` in FSDP looks for a key called `_optim_state_dict_config` when getting the optimizer state dict config. However, `set_state_dict_type` sets the config at a key called `_optimstate_dict_config`. This looks like a typo.

This fixes the discrepancy, so that when you set the state dict type, it is correctly used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97110
Approved by: https://github.com/awgu, https://github.com/fegin
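To illustrate the code path #97110 fixes, a sketch of setting both the model and optimizer state-dict configs; it assumes a `torchrun`-launched process group with CUDA devices and the config classes exported from `torch.distributed.fsdp`:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    StateDictType,
    FullStateDictConfig,
    FullOptimStateDictConfig,
)

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
fsdp_model = FSDP(torch.nn.Linear(8, 8).cuda())

FSDP.set_state_dict_type(
    fsdp_model,
    StateDictType.FULL_STATE_DICT,
    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=True),
    optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=True),
)

# With the fix, the optimizer-state config set above is the one FSDP reads back,
# instead of silently falling back to the default.
print(FSDP.get_state_dict_type(fsdp_model))
```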
[2023-03-29 09:50 UTC] 1 commit pushed · head commit faccd87
[NNC] Fix the issue that the void** could not store a scalar if the bit width of the scalar is greater than 32 bits on a 32-bit platform (#97669)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97669
Approved by: https://github.com/jgong5

[2023-03-29 08:46 UTC] 124 commits pushed · head commit 6871665
Avoid copies in matmul (no ghstack) (#97355)

Resubmit of https://github.com/pytorch/pytorch/pull/76828 without using ghstack, so that @ngimel can import it and help me debug why it was reverted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97355
Approved by: https://github.com/ngimel, https://github.com/malfet

[2023-03-26 22:52 UTC] 3 commits pushed · head commit 542fb0b
Specify file encoding in test_torch.py (#97628)

Attempt to fix
```
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 5260: ordinal not in range(128)
```
in https://github.com/pytorch/pytorch/actions/runs/4522628359/jobs/7965372405

In general, it's good practice to explicitly specify the encoding, as otherwise it depends on environment variables and makes test failures unpredictable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97628
Approved by: https://github.com/dagitses, https://github.com/kit1980

[2023-03-26 07:10 UTC] 3 commits pushed · head commit 4c0dce5
[BE] Apply ufmt to run_test and GitHub Python util scripts (#97588)

This has been bugging me for a while as I'm working on these Python scripts and they are not tracked by the ufmt linter. So I add these scripts to that linter:

```
[[linter]]
code = 'UFMT'
include_patterns = [
    '.github/**/*.py',
    'test/run_test.py',
```

This change should just work and not break anything, as the ufmt (black + usort) linter is very safe to use for standalone util scripts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97588
Approved by: https://github.com/kit1980

[2023-03-26 02:32 UTC] 69 commits pushed · head commit dc45ad7
[inductor] support SymPy exprs in `reflection_pad2d_backward` lowering (#97604)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97604
Approved by: https://github.com/ezyang

[2023-03-23 23:38 UTC] 19 commits pushed · head commit 12da0c7
Revert "remove dead torch_pb.h library (#97323)"

This reverts commit 364d92f9b6864ce284fa13519c7ca5c87460e477.

Reverted https://github.com/pytorch/pytorch/pull/97323 on behalf of https://github.com/malfet because it depends on https://github.com/pytorch/pytorch/pull/97322, which has been reverted.

[2023-03-23 10:29 UTC] 165 commits pushed · head commit a331cd4
[inductor] fix cpp legalize bf16 reduction (#97228)

When legalizing bf16 for reduction, operators with a result dtype of torch.int64, like argmax, currently hit an assertion error. The PR fixes the int64 case, enabling several bf16 models (hf_Reformer, doctr_reco_predictor) to run successfully.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97228
Approved by: https://github.com/jgong5, https://github.com/EikanWang, https://github.com/desertfire

[2023-03-18 04:52 UTC] 2 commits pushed · head commit a1c46e5
component-level configurable logging for dynamo, inductor, aot (#94858)

Summary: Adds NNC-like logging that is configured through the env var `TORCH_LOGS`.

Examples:
- `TORCH_LOGS="dynamo,guards" python script.py` prints dynamo logs at level INFO, plus the guards of all functions that are compiled.
- `TORCH_LOGS="+dynamo,guards,graph" python script.py` prints dynamo logs at level DEBUG, plus the guards and graphs (in tabular format) of all graphs that are compiled.

[More examples with full output](https://gist.github.com/mlazos/b17f474457308ce15e88c91721ac1cce)

Implementation: the implementation parses the log settings from the environment, finds any components (aot, dynamo, inductor) or other loggable objects (guards, graph, etc.) and generates a log_state object. This object contains all of the enabled artifacts and a qualified log name -> level mapping. _init_logs then adds handlers to the highest-level logs (the registered logs) and sets any artifact loggers to level DEBUG if the artifact is enabled.

Note: set_logs is an alternative for manipulating the log_state, but if the environment contains TORCH_LOGS, the environment settings will be prioritized.

Adding a new log: a dev should add their log name to torch._logging._registrations (there are examples there already).

Adding a new artifact: a dev should add their artifact name to torch._logging._registrations as well. Additionally, wherever the artifact is logged, `torch._logging.getArtifactLogger(__name__, <artifact name>)` should be used instead of the standard logging implementation.

[design doc](https://docs.google.com/document/d/1ZRfTWKa8eaPq1AxaiHrq4ASTPouzzlPiuquSBEJYwS8/edit#)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94858
Approved by: https://github.com/ezyang
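Alongside the `TORCH_LOGS` environment variable shown above, the in-process `set_logs` path mentioned in the note can be sketched roughly like this (assuming a build that ships this logging stack; per the commit, `TORCH_LOGS` in the environment still takes priority):

```python
import logging
import torch
import torch._logging

# Component loggers take a level; artifacts such as graph and guards are booleans.
torch._logging.set_logs(dynamo=logging.DEBUG, graph=True, guards=True)

@torch.compile
def f(x):
    return torch.sin(x) + 1

f(torch.randn(8))  # emits dynamo DEBUG logs plus the graph and guard artifacts
```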
[2023-03-18 02:51 UTC] 41 commits pushed · head commit 34256bc
[inductor] do benchmark in sub processes for max autotuning (#96410)

This PR implements support for benchmarking max-autotune choices in subprocesses. This way a crash like https://github.com/openai/triton/issues/1298 will only abort the autotuning child process, while the parent process can continue.

There are a few things to note:
- The CUDA runtime does not work with fork, so we have to use spawn to create child processes. See the best practices in the PyTorch multiprocessing notes: https://pytorch.org/docs/stable/notes/multiprocessing.html
- To run a job in a child process, the multiprocessing module needs to pickle both the target function and its arguments and pass them to the child process. This is the major complexity of this prototype, since there are quite a lot of corner cases that make pickle fail.

Here I list the pickle-related issues I encountered:
- Pickling a StorageBox causes infinite recursion. Error: https://gist.github.com/171e5ab404b7855dee2dfa1d9f093442. Worked around by pickling the inner buffer.
- An IRNode stores fx.Nodes in its origins field. However, we can not pickle an fx.Node; it fails with the following error when pickling the fx.Node.graph: https://gist.github.com/9c289e895d7091d7ec787c67bc3c0d70. Worked around by skipping origins when pickling an IRNode.
- A jinja Template in TritonTemplateKernel can not be pickled: `TypeError: Template.__new__() missing 1 required positional argument: 'source'`. Worked around by pickling the source rather than the jinja Template; during unpickling, the jinja template is rebuilt.
- Due to how select_algorithm.template_kernels is populated, it is empty in the child process. Worked around by passing select_algorithm.template_kernels from the parent process to the child process directly.
- There is some change in TritonTemplate.generate to make a TritonTemplateKernel pickleable. A TritonTemplate is referred to in the closure for a TritonTemplateKernel object.
- We can not pass the choice to the child process directly because pickling fails for the lambdas/local functions being used. However, cloudpickle can handle lambdas. Worked around by passing the cloudpickled choice object to the child process; the child process needs to unpickle it explicitly.

Test:
```
python test/inductor/test_max_autotune.py -k test_max_autotune_mm_plus_mm
```
This is basically the repro I got from Bert Maher.

Benchmarking in a subprocess is about 4x slower than benchmarking in the same process. Without doing any profiling, I suspect the time is spent on starting a new process and doing initialization. Some process (rather than thread) pool may help.

```
AUTOTUNE ref_mm_plus_mm(2048x64, 64x1536, 2048x64, 64x1536)
  triton_mm_plus_mm_0 0.0276s 100.0%
  triton_mm_plus_mm_6 0.0287s 96.4%
  triton_mm_plus_mm_5 0.0317s 87.1%
  triton_mm_plus_mm_1 0.0328s 84.4%
  ref_mm_plus_mm 0.0379s 73.0%
  triton_mm_plus_mm_7 0.0379s 73.0%
  triton_mm_plus_mm_2 0.0399s 69.2%
  triton_mm_plus_mm_3 0.0410s 67.5%
  triton_mm_plus_mm_4 0.0410s 67.5%
AUTOTUNE takes 12.001659393310547 seconds

AUTOTUNE ref_mm_plus_mm(2048x64, 64x1536, 2048x64, 64x1536)
  triton_mm_plus_mm_0 0.0276s 100.0%
  triton_mm_plus_mm_6 0.0287s 96.4%
  triton_mm_plus_mm_1 0.0317s 87.1%
  triton_mm_plus_mm_5 0.0317s 87.1%
  ref_mm_plus_mm 0.0379s 73.0%
  triton_mm_plus_mm_7 0.0389s 71.1%
  triton_mm_plus_mm_2 0.0399s 69.2%
  triton_mm_plus_mm_3 0.0410s 67.5%
  triton_mm_plus_mm_4 0.0410s 67.5%
AUTOTUNE takes 51.39659810066223 seconds
```

The feature is disabled by default and can be enabled by setting the following config or env var:
```
autotune_in_subproc = os.environ.get("TORCHINDUCTOR_AUTOTUNE_IN_SUBPROC") == "1"
```

Differential Revision: [D43996048](https://our.internmc.facebook.com/intern/diff/D43996048)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96410
Approved by: https://github.com/jansel

[2023-03-16 22:44 UTC] 13 commits pushed · head commit 5842e5c
vmap support for torch.tril and torch.triu (#94287)

Summary: Add vmap support for torch.tril and torch.triu.

Fix: #91403

Test Plan: GitHub pipeline

Differential Revision: D43016624

### Expected behavior
Same as using a for-loop:

```python
import torch

x = torch.randn(32, 3)
results = []
for xi in x:
    y = torch.triu(xi)
    results.append(y)
"""
triu: input tensor must have at least 2 dimensions
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
      4 results = []
      5 for xi in x:
----> 6     y = torch.triu(xi)
      7     results.append(y)
RuntimeError: triu: input tensor must have at least 2 dimensions
"""
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94287
Approved by: https://github.com/Skylion007, https://github.com/zou3519
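A sketch of the now-supported path; it uses a batch of 3x3 matrices (shapes are illustrative) so each mapped slice is 2-D, as `triu` requires:

```python
import torch

x = torch.randn(32, 3, 3)

# vmap maps triu over the leading batch dimension instead of a Python loop
y = torch.vmap(torch.triu)(x)

assert torch.equal(y, torch.stack([torch.triu(xi) for xi in x]))
```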
[2023-03-16 10:24 UTC] 8 commits pushed · head commit bf08d13
[primTorch] handle out in `sort` meta function (#96719)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96719
Approved by: https://github.com/ezyang

[2023-03-16 04:21 UTC] 4 commits pushed · head commit 3162f71
[memory debugging] Extract frame information from inductor (#95753)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95753
Approved by: https://github.com/Chillee

[2023-03-16 00:55 UTC] 45 commits pushed · head commit 308a58e
[FSDP] Rename to _get_orig_buffer_dtypes (#96790)

Reland this PR

Differential Revision: [D44078430](https://our.internmc.facebook.com/intern/diff/D44078430/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96790
Approved by: https://github.com/awgu

Older activity is available on the next page of the feed.