{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":116207739,"defaultBranch":"master","name":"pytorch","ownerLogin":"sighingnow","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2018-01-04T03:02:41.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/7144772?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1697507308.0","currentOid":""},"activityList":{"items":[{"before":"e0e15a4ac61648cc8f63f0ab102c32e8884fb5d1","after":"93a9b1314b4bc88ccddc0aa438d4d332955027a8","ref":"refs/heads/master","pushedAt":"2023-10-20T02:08:19.000Z","pushType":"push","commitsCount":121,"pusher":{"login":"sighingnow","name":"Tao He","path":"/sighingnow","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7144772?s=80&v=4"},"commit":{"message":"Make step() faster by passing in a tensor vs scalar 1 (#111084)\n\nThis is the culminated result of https://github.com/pytorch/pytorch/pull/110954#issuecomment-1758520411.\n\nWe are making the code slightly more complicated to gain some perf in minimizing calls to `.copy_()` and `.to()`.\n\n### Code\n```\nimport torch\nwith torch.cuda.device(0):\n    steps = [torch.zeros((), device=\"cpu\", dtype=torch.float32) for i in range(1000)]\n\n    with torch.profiler.profile(\n        activities=[\n            torch.profiler.ProfilerActivity.CPU,\n            torch.profiler.ProfilerActivity.CUDA,\n        ]\n    ) as p:\n        # New code:\n        # step_device = steps[0].device\n        # one = torch.tensor(1.0, device=step_device) if str(step_device) == \"cpu\" else 1\n        # torch._foreach_add_(steps, one, 1.0)\n\n        # Old code:\n        torch._foreach_add_(steps, 1)\n\n    print(p.key_averages().table(sort_by=\"cpu_time_total\"))\n```\n\n### Profiles\n**with old code**\n```\n-------------------------  ------------  ------------  ------------  ------------  ------------  ------------\n                     Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls\n-------------------------  ------------  ------------  ------------  ------------  ------------  ------------\n      aten::_foreach_add_        35.31%      52.089ms        99.99%     147.495ms     147.495ms             1\n               aten::add_        25.05%      36.949ms        64.68%      95.406ms      95.406us          1000\n                 aten::to         3.97%       5.852ms        39.63%      58.457ms      58.457us          1000\n           aten::_to_copy        10.11%      14.917ms        35.66%      52.605ms      52.605us          1000\n              aten::copy_        21.65%      31.939ms        21.65%      31.939ms      31.939us          1000\n      aten::empty_strided         3.90%       5.749ms         3.90%       5.749ms       5.749us          1000\n    cudaDeviceSynchronize         0.01%      18.000us         0.01%      18.000us      18.000us             1\n-------------------------  ------------  ------------  ------------  ------------  ------------  ------------\nSelf CPU time total: 147.513ms\n```\n\n**with new code**\n```\n-------------------------  ------------  ------------  ------------  ------------  ------------  ------------\n                     Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls\n-------------------------  ------------  ------------  ------------  ------------  ------------  ------------\n      aten::_foreach_add_        55.06%      49.963ms        99.86%      90.625ms      90.625ms             1\n               aten::add_        44.81%      40.662ms        44.81%      40.662ms      40.662us          1000\n            aten::detach_         0.01%       8.000us         0.05%      45.000us      45.000us             1\n                  detach_         0.04%      37.000us         0.04%      37.000us      37.000us             1\n              aten::empty         0.03%      30.000us         0.03%      30.000us      30.000us             1\n                 aten::to         0.03%      23.000us         0.03%      23.000us      23.000us             1\n    cudaDeviceSynchronize         0.02%      22.000us         0.02%      22.000us      22.000us             1\n         aten::lift_fresh         0.01%       6.000us         0.01%       6.000us       6.000us             1\n-------------------------  ------------  ------------  ------------  ------------  ------------  ------------\nSelf CPU time total: 90.751ms\n```\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/111084\nApproved by: https://github.com/albanD\nghstack dependencies: #111079","shortMessageHtmlLink":"Make step() faster by passing in a tensor vs scalar 1 (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"1938695965\" data-permission-text=\"Title is private\" data-url=\"https://github.com/pytorch/pytorch/issues/111084\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/pytorch/pytorch/pull/111084/hovercard\" href=\"https://github.com/pytorch/pytorch/pull/111084\">pytorch#111084</a>)"}},{"before":"74dcd1fb5fd62185205bec276b6e0e2e0e41e306","after":null,"ref":"refs/heads/ht/fix-typo","pushedAt":"2023-10-17T01:48:28.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"sighingnow","name":"Tao He","path":"/sighingnow","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7144772?s=80&v=4"}},{"before":"ee8accfc62fa14b19da7e51033b76d3076c984a8","after":"74dcd1fb5fd62185205bec276b6e0e2e0e41e306","ref":"refs/heads/ht/fix-typo","pushedAt":"2023-10-16T05:39:45.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"sighingnow","name":"Tao He","path":"/sighingnow","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7144772?s=80&v=4"},"commit":{"message":"Fixes a typo in docstring\n\nSigned-off-by: Tao He <sighingnow@gmail.com>","shortMessageHtmlLink":"Fixes a typo in docstring"}},{"before":null,"after":"ee8accfc62fa14b19da7e51033b76d3076c984a8","ref":"refs/heads/ht/fix-typo","pushedAt":"2023-10-16T05:22:05.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"sighingnow","name":"Tao He","path":"/sighingnow","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7144772?s=80&v=4"},"commit":{"message":"Fixes a typo in docstring\n\nSigned-off-by: Tao He <sighingnow@gmail.com>","shortMessageHtmlLink":"Fixes a typo in docstring"}},{"before":"cc70a33e747ef38c5242476e34af63086f5600aa","after":"e0e15a4ac61648cc8f63f0ab102c32e8884fb5d1","ref":"refs/heads/master","pushedAt":"2023-10-16T05:21:23.000Z","pushType":"push","commitsCount":10000,"pusher":{"login":"sighingnow","name":"Tao He","path":"/sighingnow","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7144772?s=80&v=4"},"commit":{"message":"update int4 tinygemm kernels (#111327)\n\nPull Request resolved: https://github.com/pytorch/pytorch/pull/111327\nApproved by: https://github.com/msaroufim\nghstack dependencies: #111314","shortMessageHtmlLink":"update int4 tinygemm kernels (<a class=\"issue-link js-issue-link\" data-error-text=\"Failed to load title\" data-id=\"1943800856\" data-permission-text=\"Title is private\" data-url=\"https://github.com/pytorch/pytorch/issues/111327\" data-hovercard-type=\"pull_request\" data-hovercard-url=\"/pytorch/pytorch/pull/111327/hovercard\" href=\"https://github.com/pytorch/pytorch/pull/111327\">pytorch#111327</a>)"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAADm3562gA","startCursor":null,"endCursor":null}},"title":"Activity · sighingnow/pytorch"}