{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":116207739,"defaultBranch":"master","name":"pytorch","ownerLogin":"sighingnow","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2018-01-04T03:02:41.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/7144772?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1697507308.0","currentOid":""},"activityList":{"items":[{"before":"e0e15a4ac61648cc8f63f0ab102c32e8884fb5d1","after":"93a9b1314b4bc88ccddc0aa438d4d332955027a8","ref":"refs/heads/master","pushedAt":"2023-10-20T02:08:19.000Z","pushType":"push","commitsCount":121,"pusher":{"login":"sighingnow","name":"Tao He","path":"/sighingnow","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7144772?s=80&v=4"},"commit":{"message":"Make step() faster by passing in a tensor vs scalar 1 (#111084)\n\nThis is the culminated result of https://github.com/pytorch/pytorch/pull/110954#issuecomment-1758520411.\n\nWe are making the code slightly more complicated to gain some perf in minimizing calls to `.copy_()` and `.to()`.\n\n### Code\n```\nimport torch\nwith torch.cuda.device(0):\n steps = [torch.zeros((), device=\"cpu\", dtype=torch.float32) for i in range(1000)]\n\n with torch.profiler.profile(\n activities=[\n torch.profiler.ProfilerActivity.CPU,\n torch.profiler.ProfilerActivity.CUDA,\n ]\n ) as p:\n # New code:\n # step_device = steps[0].device\n # one = torch.tensor(1.0, device=step_device) if str(step_device) == \"cpu\" else 1\n # torch._foreach_add_(steps, one, 1.0)\n\n # Old code:\n torch._foreach_add_(steps, 1)\n\n print(p.key_averages().table(sort_by=\"cpu_time_total\"))\n```\n\n### Profiles\n**with old code**\n```\n------------------------- ------------ ------------ ------------ ------------ ------------ ------------\n Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls\n------------------------- ------------ ------------ 
------------ ------------ ------------ ------------\n aten::_foreach_add_ 35.31% 52.089ms 99.99% 147.495ms 147.495ms 1\n aten::add_ 25.05% 36.949ms 64.68% 95.406ms 95.406us 1000\n aten::to 3.97% 5.852ms 39.63% 58.457ms 58.457us 1000\n aten::_to_copy 10.11% 14.917ms 35.66% 52.605ms 52.605us 1000\n aten::copy_ 21.65% 31.939ms 21.65% 31.939ms 31.939us 1000\n aten::empty_strided 3.90% 5.749ms 3.90% 5.749ms 5.749us 1000\n cudaDeviceSynchronize 0.01% 18.000us 0.01% 18.000us 18.000us 1\n------------------------- ------------ ------------ ------------ ------------ ------------ ------------\nSelf CPU time total: 147.513ms\n```\n\n**with new code**\n```\n------------------------- ------------ ------------ ------------ ------------ ------------ ------------\n Name Self CPU % Self CPU CPU total % CPU total CPU time avg # of Calls\n------------------------- ------------ ------------ ------------ ------------ ------------ ------------\n aten::_foreach_add_ 55.06% 49.963ms 99.86% 90.625ms 90.625ms 1\n aten::add_ 44.81% 40.662ms 44.81% 40.662ms 40.662us 1000\n aten::detach_ 0.01% 8.000us 0.05% 45.000us 45.000us 1\n detach_ 0.04% 37.000us 0.04% 37.000us 37.000us 1\n aten::empty 0.03% 30.000us 0.03% 30.000us 30.000us 1\n aten::to 0.03% 23.000us 0.03% 23.000us 23.000us 1\n cudaDeviceSynchronize 0.02% 22.000us 0.02% 22.000us 22.000us 1\n aten::lift_fresh 0.01% 6.000us 0.01% 6.000us 6.000us 1\n------------------------- ------------ ------------ ------------ ------------ ------------ ------------\nSelf CPU time total: 90.751ms\n```\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/111084\nApproved by: https://github.com/albanD\nghstack dependencies: #111079","shortMessageHtmlLink":"Make step() faster by passing in a tensor vs scalar 1 
(pytorch#111084)"}},{"before":"74dcd1fb5fd62185205bec276b6e0e2e0e41e306","after":null,"ref":"refs/heads/ht/fix-typo","pushedAt":"2023-10-17T01:48:28.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"sighingnow","name":"Tao He","path":"/sighingnow","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7144772?s=80&v=4"}},{"before":"ee8accfc62fa14b19da7e51033b76d3076c984a8","after":"74dcd1fb5fd62185205bec276b6e0e2e0e41e306","ref":"refs/heads/ht/fix-typo","pushedAt":"2023-10-16T05:39:45.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"sighingnow","name":"Tao He","path":"/sighingnow","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7144772?s=80&v=4"},"commit":{"message":"Fixes a typo in docstring\n\nSigned-off-by: Tao He ","shortMessageHtmlLink":"Fixes a typo in docstring"}},{"before":null,"after":"ee8accfc62fa14b19da7e51033b76d3076c984a8","ref":"refs/heads/ht/fix-typo","pushedAt":"2023-10-16T05:22:05.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"sighingnow","name":"Tao He","path":"/sighingnow","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7144772?s=80&v=4"},"commit":{"message":"Fixes a typo in docstring\n\nSigned-off-by: Tao He ","shortMessageHtmlLink":"Fixes a typo in docstring"}},{"before":"cc70a33e747ef38c5242476e34af63086f5600aa","after":"e0e15a4ac61648cc8f63f0ab102c32e8884fb5d1","ref":"refs/heads/master","pushedAt":"2023-10-16T05:21:23.000Z","pushType":"push","commitsCount":10000,"pusher":{"login":"sighingnow","name":"Tao He","path":"/sighingnow","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7144772?s=80&v=4"},"commit":{"message":"update int4 tinygemm kernels (#111327)\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/111327\nApproved by: https://github.com/msaroufim\nghstack dependencies: #111314","shortMessageHtmlLink":"update int4 tinygemm kernels 
(pytorch#111327)"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAADm3562gA","startCursor":null,"endCursor":null}},"title":"Activity · sighingnow/pytorch"}