{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":346494449,"defaultBranch":"master","name":"pytorch","ownerLogin":"lithuak","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2021-03-10T21:17:13.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/225642?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1641830627.71297","currentOid":""},"activityList":{"items":[{"before":"0cfc5899f9bade72c7e18666e2006b003b5848bc","after":"0f88d93b10fbe715afa7affb7ee0a3da90f406cd","ref":"refs/heads/master","pushedAt":"2023-09-09T13:37:47.000Z","pushType":"push","commitsCount":352,"pusher":{"login":"lithuak","name":"Ilya Persky","path":"/lithuak","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/225642?s=80&v=4"},"commit":{"message":"decomposition spectral ops fixes (#108360)\n\nFixes https://github.com/pytorch/pytorch/issues/105986, https://github.com/pytorch/pytorch/issues/108204, https://github.com/pytorch/pytorch/issues/108205\n\nFix all issues flagged when making changes for https://github.com/pytorch/pytorch/pull/107421\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/108360\nApproved by: https://github.com/ezyang","shortMessageHtmlLink":"decomposition spectral ops fixes (pytorch#108360)"}},{"before":"5251ae6fb7995f0d26f2438179fb5d5a75b9eafd","after":"d4230e55748c66c72e7a17b1cd08540b742b20a5","ref":"refs/heads/viable/strict","pushedAt":"2023-09-09T13:37:38.000Z","pushType":"push","commitsCount":365,"pusher":{"login":"lithuak","name":"Ilya Persky","path":"/lithuak","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/225642?s=80&v=4"},"commit":{"message":"reland [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108883)\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/108883\nApproved by: https://github.com/voznesenskym, https://github.com/huydhn","shortMessageHtmlLink":"reland [finishing 
colesbury's PR 100642] Guard on nn.Module dicts and…"}},{"before":"11602ac564c0e3178b38a65e09be13644322d303","after":"5251ae6fb7995f0d26f2438179fb5d5a75b9eafd","ref":"refs/heads/viable/strict","pushedAt":"2023-08-29T06:39:28.000Z","pushType":"push","commitsCount":389,"pusher":{"login":"lithuak","name":"Ilya Persky","path":"/lithuak","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/225642?s=80&v=4"},"commit":{"message":"Explicitly include iostream (#108103)\n\nSummary: Similar to D48568760\n\nTest Plan: Sandcastle\n\nReviewed By: osalpekar\n\nDifferential Revision: D48758708\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/108103\nApproved by: https://github.com/osalpekar","shortMessageHtmlLink":"Explicitly include iostream (pytorch#108103)"}},{"before":"5ed60477a7cd930298dbdab71046ec62024427c4","after":"0cfc5899f9bade72c7e18666e2006b003b5848bc","ref":"refs/heads/master","pushedAt":"2023-08-29T06:39:05.000Z","pushType":"push","commitsCount":370,"pusher":{"login":"lithuak","name":"Ilya Persky","path":"/lithuak","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/225642?s=80&v=4"},"commit":{"message":"[inductor] Improved grid_sampler_2d decomposition for cuda (#104710)\n\nDescription:\n- Improved grid_sampler_2d decomposition code to generate single cuda kernel instead of two\n\nRelated to https://github.com/pytorch/pytorch/issues/104296\n\nPerfs:\n- speed-up on cuda (~x5) and cpu (~x2) for bicubic mode\n\n```\nSpeed-up PR vs Nightly = ratio between columns \"Compiled (2.1.0a0+git52598e9) PR\" and \"Compiled (2.1.0a0+gitcf76938) Nightly\"\n\n[------------------------------------------------------------------------------------------------------------------------------- Affine grid sampling, cpu -------------------------------------------------------------------------------------------------------------------------------]\n | Eager (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+gitcf76938) Nightly | 
speed-up PR vs Nightly | Eager (2.1.0a0+gitcf76938) Nightly\n1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 38.010 (+-0.118) | 51.466 (+-1.257) | 47.867 (+-0.124) | 0.930 (+-0.000) | 33.654 (+-0.411)\n Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 35.532 (+-0.236) | 52.189 (+-0.093) | 58.979 (+-0.206) | 1.130 (+-0.000) | 32.543 (+-0.198)\n Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 38.187 (+-0.112) | 47.892 (+-0.117) | 45.833 (+-0.081) | 0.957 (+-0.000) | 33.752 (+-0.116)\n Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 36.708 (+-0.244) | 51.680 (+-0.104) | 58.360 (+-0.108) | 1.129 (+-0.000) | 32.576 (+-0.751)\n Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 24.201 (+-0.088) | 27.451 (+-0.059) | 27.937 (+-0.081) | 1.018 (+-0.000) | 24.367 (+-0.074)\n Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 19.266 (+-0.105) | 26.070 (+-0.085) | 26.092 (+-0.054) | 1.001 (+-0.000) | 20.144 (+-0.064)\n Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 24.293 (+-0.125) | 26.085 (+-0.064) | 26.575 (+-0.061) | 1.019 (+-0.000) | 24.515 (+-0.095)\n Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 19.440 (+-0.075) | 25.252 (+-0.059) | 25.259 (+-0.051) | 1.000 (+-0.000) | 19.770 (+-0.070)\n Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 114.900 (+-0.508) 
| 113.416 (+-1.271) | 248.679 (+-1.431) | 2.193 (+-0.000) | 114.609 (+-0.515)\n Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 115.973 (+-0.555) | 124.711 (+-1.596) | 282.187 (+-2.418) | 2.263 (+-0.000) | 115.368 (+-0.652)\n Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 111.730 (+-0.562) | 110.914 (+-0.865) | 253.899 (+-2.226) | 2.289 (+-0.000) | 111.285 (+-1.226)\n Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 112.859 (+-0.487) | 131.696 (+-1.298) | 294.124 (+-1.963) | 2.233 (+-0.000) | 110.910 (+-0.969)\n\nTimes are in milliseconds (ms).\n\n[------------------------------------------------------------------------------------------------------------------------------- Affine grid sampling, cuda ------------------------------------------------------------------------------------------------------------------------------]\n | Eager (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+git52598e9) PR | Compiled (2.1.0a0+gitcf76938) Nightly | speed-up PR vs Nightly | Eager (2.1.0a0+gitcf76938) Nightly\n1 threads: --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bilinear | 228.811 (+-0.037) | 92.990 (+-0.446) | 92.648 (+-0.286) | 0.996 (+-0.000) | 228.274 (+-0.067)\n Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bilinear | 222.107 (+-0.076) | 93.247 (+-0.387) | 92.528 (+-0.423) | 0.992 (+-0.000) | 221.922 (+-0.297)\n Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bilinear | 235.654 (+-0.055) | 75.781 (+-0.566) | 115.865 (+-0.419) | 1.529 
(+-0.000) | 236.032 (+-0.111)\n Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bilinear | 226.752 (+-0.088) | 76.312 (+-0.328) | 116.468 (+-0.477) | 1.526 (+-0.000) | 226.950 (+-0.027)\n Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=nearest | 225.540 (+-0.013) | 75.638 (+-0.341) | 72.621 (+-0.292) | 0.960 (+-0.000) | 225.937 (+-0.017)\n Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=nearest | 217.425 (+-0.024) | 75.484 (+-0.545) | 73.518 (+-0.296) | 0.974 (+-0.000) | 217.793 (+-0.008)\n Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=nearest | 231.474 (+-0.020) | 75.972 (+-0.339) | 73.030 (+-0.387) | 0.961 (+-0.000) | 231.991 (+-0.184)\n Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=nearest | 223.408 (+-0.016) | 75.622 (+-0.279) | 73.542 (+-0.336) | 0.973 (+-0.000) | 223.893 (+-0.021)\n Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=True, mode=bicubic | 319.382 (+-0.023) | 149.060 (+-0.190) | 772.116 (+-0.266) | 5.180 (+-0.000) | 320.549 (+-0.387)\n Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=True, mode=bicubic | 319.987 (+-0.134) | 154.443 (+-0.014) | 797.651 (+-0.232) | 5.165 (+-0.000) | 320.665 (+-0.397)\n Input: (8, 3, 345, 456) torch.float32, torch.contiguous_format, align_corners=False, mode=bicubic | 326.138 (+-0.439) | 149.092 (+-0.036) | 772.508 (+-0.259) | 5.181 (+-0.000) | 325.751 (+-0.398)\n Input: (8, 3, 345, 456) torch.float32, torch.channels_last, align_corners=False, mode=bicubic | 326.024 (+-0.118) | 154.452 (+-0.209) | 797.756 (+-0.229) | 5.165 (+-0.000) | 326.870 (+-0.372)\n\nTimes are in microseconds (us).\n\n```\n\n[Source](https://raw.githubusercontent.com/vfdev-5/pth-inductor-dev/master/output/20230828-134459-affine-grid-sampler-PR-vs-Nightly-speedup.md)\n\nPull Request resolved: 
https://github.com/pytorch/pytorch/pull/104710\nApproved by: https://github.com/lezcano","shortMessageHtmlLink":"[inductor] Improved grid_sampler_2d decomposition for cuda (pytorch#1…"}},{"before":"89de0485638bd1e4c8819f9e6a349a58a54b8ab9","after":"11602ac564c0e3178b38a65e09be13644322d303","ref":"refs/heads/viable/strict","pushedAt":"2023-08-21T11:49:08.000Z","pushType":"push","commitsCount":50,"pusher":{"login":"lithuak","name":"Ilya Persky","path":"/lithuak","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/225642?s=80&v=4"},"commit":{"message":"[dynamo] fix disable_saved_tensors_hooks - graph break (#106875)\n\n```python\ndef wrapper_fn(x):\n with torch.autograd.graph.disable_saved_tensors_hooks(\"ERROR\"):\n y = x + 1\n print(\"HI\")\n return y + 2\n\nx = torch.randn(())\n\na = wrapper_fn(x)\nopt = torch.compile(wrapper_fn, backend='eager', fullgraph=False)\ne = opt(x)\n```\n\nWithout the fix fails with,\n```\nTraceback (most recent call last):\n File \"/home/kshiteej/Pytorch/pytorch_functorch/test/test_trace_grad.py\", line 182, in \n e = opt(x)\n File \"/home/kshiteej/Pytorch/pytorch_functorch/torch/_dynamo/eval_frame.py\", line 333, in _fn\n return fn(*args, **kwargs)\n File \"/home/kshiteej/Pytorch/pytorch_functorch/test/test_trace_grad.py\", line 165, in wrapper_fn\n def wrapper_fn(x):\nAttributeError: module 'torch.autograd.graph' has no attribute 'disable_saved_tensors_hook'\n```\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/106875\nApproved by: https://github.com/zou3519","shortMessageHtmlLink":"[dynamo] fix disable_saved_tensors_hooks - graph break (pytorch#106875)"}},{"before":"fd214aa8be7160cba5c6efec7a1b06c9c33b37a4","after":"5ed60477a7cd930298dbdab71046ec62024427c4","ref":"refs/heads/master","pushedAt":"2023-08-21T11:48:53.000Z","pushType":"push","commitsCount":169,"pusher":{"login":"lithuak","name":"Ilya 
Persky","path":"/lithuak","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/225642?s=80&v=4"},"commit":{"message":"Optimize load inline via pch (#106696)\n\nAdd PreCompiled Header(PCH) to reduce load_inline build time.\nPCH is gcc built-in mechanism: https://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Precompiled-Headers.html\n\nAdd PCH for '#include '. This file will used in all load_inline modules. All load_inline modules can take benifit from this PR.\n\nChanges:\n1. Add PCH signature to guarantee PCH(gch) file take effect.\n2. Unification get cxx compiler funtions.\n3. Unification get build flags funtions.\n\nBefore this PR:\n![image](https://github.com/pytorch/pytorch/assets/8433590/f190cdcb-236c-4312-b165-d419a7efafe3)\n\nAdded this PR:\n![image](https://github.com/pytorch/pytorch/assets/8433590/b45c5ad3-e902-4fc8-b450-743cf73505a4)\n\nCompiling time is reduced from 14.06s to 7.36s.\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/106696\nApproved by: https://github.com/jgong5, https://github.com/jansel","shortMessageHtmlLink":"Optimize load inline via pch (pytorch#106696)"}},{"before":"2624da638d989c902ef9e1a5cff6028ab816605c","after":"89de0485638bd1e4c8819f9e6a349a58a54b8ab9","ref":"refs/heads/viable/strict","pushedAt":"2023-08-18T07:58:37.000Z","pushType":"push","commitsCount":96,"pusher":{"login":"lithuak","name":"Ilya Persky","path":"/lithuak","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/225642?s=80&v=4"},"commit":{"message":"[BE] Use allocator to allocate workspace (#107178)\n\nAs suggested in https://github.com/pytorch/pytorch/pull/106844#discussion_r1293839247 it's better to just allocate DataPtr than the whole tensor\nPull Request resolved: https://github.com/pytorch/pytorch/pull/107178\nApproved by: https://github.com/albanD\nghstack dependencies: #106977, #106844","shortMessageHtmlLink":"[BE] Use allocator to allocate workspace 
(pytorch#107178)"}},{"before":"e787872a477b688c9cf0e38b090b7db3d0d7dc3c","after":"2624da638d989c902ef9e1a5cff6028ab816605c","ref":"refs/heads/viable/strict","pushedAt":"2023-08-15T13:45:46.000Z","pushType":"push","commitsCount":10000,"pusher":{"login":"lithuak","name":"Ilya Persky","path":"/lithuak","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/225642?s=80&v=4"},"commit":{"message":"Support third-party devices to use the init_process_group method with… (#107113)\n\n…out specifying the Backend\n\nWhen init_process_group is not been done before, it will automatically apply init_process_group within Devicemesh without specifying the backend. Thus, when a third-party device want to use Devicemesh without doing init_process_group before, there comes a problem. In this PR, add a default_device_backend_map for third-party device users to add their backends to this map when they register their backends to pytorch firstly. When doing init_process_group without parameter backend, it will init the backends in this map. 
Thus, a third-party user can use init_process_group method without specifying the Backend.\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/107113\nApproved by: https://github.com/wanchaol","shortMessageHtmlLink":"Support third-party devices to use the init_process_group method with… ("}},{"before":"9266b2af73311792a746a2f741cc13145b3701df","after":"fd214aa8be7160cba5c6efec7a1b06c9c33b37a4","ref":"refs/heads/master","pushedAt":"2023-08-15T11:29:42.000Z","pushType":"push","commitsCount":10000,"pusher":{"login":"lithuak","name":"Ilya Persky","path":"/lithuak","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/225642?s=80&v=4"},"commit":{"message":"Revert \"Add OnCompletion Hook to ProcessGroup (#106988)\"\n\nThis reverts commit ba1da47e8fa95ca0dd8b2d63430f7eb54fdbbccb.\n\nReverted https://github.com/pytorch/pytorch/pull/106988 on behalf of https://github.com/huydhn due to Sorry for reverting you change, but it is failing Windows build with some linker error. The Windows failures on PR looks legit ([comment](https://github.com/pytorch/pytorch/pull/106988#issuecomment-1678580899))","shortMessageHtmlLink":"Revert \"Add OnCompletion Hook to ProcessGroup (pytorch#106988)\""}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAADfOB9egA","startCursor":null,"endCursor":null}},"title":"Activity · lithuak/pytorch"}
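The grid_sampler_2d entry above concerns the inductor decomposition behind `torch.nn.functional.grid_sample`. A minimal sketch of the bicubic path that the PR speeds up (the small shapes, identity transform, and tolerance here are illustrative assumptions, not the PR's benchmark configuration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Small stand-in for the benchmarked op: affine grid sampling in bicubic
# mode, the case the PR reports ~5x faster on cuda when compiled.
inp = torch.rand(1, 3, 8, 8)                  # (N, C, H, W)
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])     # identity affine transform
grid = F.affine_grid(theta, size=list(inp.shape), align_corners=True)
out = F.grid_sample(inp, grid, mode="bicubic", align_corners=True)

# An identity grid with align_corners=True samples at pixel centers, so
# bicubic interpolation reproduces the input up to float rounding.
assert out.shape == inp.shape
assert torch.allclose(out, inp, atol=1e-3)
```

Wrapping the same function in `torch.compile` would route it through the decomposition the PR improves; in eager mode it dispatches to the ATen kernel instead.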