[Docs][MPS] Add mps environment variable table #129008
Conversation
🔗 Helpful Links: see artifacts and rendered test results at hud.pytorch.org/pr/129008. Note: links to docs will display an error until the doc builds have completed.
❌ 1 new failure as of commit 408c627 with merge base 9a7e251.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 6905ac57e91d92d3c30d9be625cdfa7e403bdcf4 Pull Request resolved: #129008
ghstack-source-id: 98f71f85ac9b9ea53c44cac317d22b2245b7db90 Pull Request resolved: #129008
ghstack-source-id: e7fa7fbfd215d9727b61d530241a16ecd564cb49 Pull Request resolved: #129008
ghstack-source-id: b7f196ebe495fd814d817a84b50cea35f48c0365 Pull Request resolved: #129008
Looks good to me
ghstack-source-id: 006b50c0d43befe8ec719905d6dba73b7f80b3e3 Pull Request resolved: #129008
@pytorchbot merge -f "doc tests passed"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Allow users to decide whether they want fast math enabled, via an environment variable. Pull Request resolved: #129007. Approved by: https://github.com/malfet. ghstack dependencies: #129006, #129008
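The PR above gates MPS fast math behind an environment variable. As a hedged sketch of how such a switch is typically used (the variable name `PYTORCH_MPS_FAST_MATH` is taken from the PR's intent and should be checked against the merged docs table; the helper function here is illustrative, not a PyTorch API):

```python
import os

def enable_mps_fast_math(enabled: bool = True) -> None:
    """Set the (assumed) MPS fast-math env var.

    Backend env vars like this are usually read when torch initializes,
    so set it before `import torch` runs in a real script.
    """
    os.environ["PYTORCH_MPS_FAST_MATH"] = "1" if enabled else "0"

enable_mps_fast_math(True)
# a real script would `import torch` only after this point
```

Equivalently, from a shell: `PYTORCH_MPS_FAST_MATH=1 python train.py`.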
This PR generalizes the multi_tensor_apply function so it can serve other fused optimizers. Pull Request resolved: #129105. Approved by: https://github.com/malfet. ghstack dependencies: #129006, #129008, #129007
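To make the generalization concrete, here is a simplified, pure-Python sketch of the multi_tensor_apply idea: instead of launching one kernel per tensor, aligned lists of tensors are grouped into chunks and a single callback (the "fused kernel") processes each chunk. All names here are illustrative, not PyTorch's internal API:

```python
def multi_tensor_apply(tensor_lists, op, chunk_size=2):
    """Apply `op` to aligned groups of tensors, one chunk of tensors at a time."""
    n = len(tensor_lists[0])
    assert all(len(lst) == n for lst in tensor_lists), "lists must be aligned"
    for start in range(0, n, chunk_size):
        # One "kernel launch" handles a whole chunk of tensors at once.
        chunk = [lst[start:start + chunk_size] for lst in tensor_lists]
        op(*chunk)

# Example op: a fused-SGD-like update p -= lr * g, with plain lists of
# floats standing in for tensors.
def sgd_chunk(params_chunk, grads_chunk, lr=0.1):
    for p, g in zip(params_chunk, grads_chunk):
        for i in range(len(p)):
            p[i] -= lr * g[i]

params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
multi_tensor_apply([params, grads], sgd_chunk)
# params is now [[0.9, 1.9], [2.9, 3.9], [4.9, 5.9]]
```

Passing a different `op` (e.g. an Adam-style update reading extra state lists) is what "generalizing for other fused optimizers" amounts to in this sketch.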
```
[-------------------------------------- Fused SGD --------------------------------------]
                                                  |  Fused: True  |  Fused: False
1 threads: ------------------------------------------------------------------------------
  numel: 1024, num_tensors: 100, momentum: True   |       2       |       15
  numel: 1024, num_tensors: 100, momentum: False  |       2       |        5
  numel: 65536, num_tensors: 100, momentum: True  |       3       |       16
  numel: 65536, num_tensors: 100, momentum: False |       2       |        5
  numel: 1048576, num_tensors: 100, momentum: True  |    11       |       16
  numel: 1048576, num_tensors: 100, momentum: False |     8       |        6
  numel: 1024, num_tensors: 500, momentum: True   |      29       |       70
  numel: 1024, num_tensors: 500, momentum: False  |      20       |       24
  numel: 65536, num_tensors: 500, momentum: True  |      33       |       76
  numel: 65536, num_tensors: 500, momentum: False |      22       |       26
  numel: 1048576, num_tensors: 500, momentum: True  |    70       |       80
  numel: 1048576, num_tensors: 500, momentum: False |    43       |       40
  numel: 1024, num_tensors: 1000, momentum: True  |     108       |      139
  numel: 1024, num_tensors: 1000, momentum: False |      72       |       48
  numel: 65536, num_tensors: 1000, momentum: True |     116       |      150
  numel: 65536, num_tensors: 1000, momentum: False |     77       |       52
  numel: 1048576, num_tensors: 1000, momentum: True  |  190       |      170
  numel: 1048576, num_tensors: 1000, momentum: False |  120       |       50
```

```python
def profile_fused_sgd():
    import torch
    from torch.optim.sgd import sgd
    import torch.utils.benchmark as benchmark
    import itertools

    def profile(fn, params, grads, momentum_buffer_list, fused):
        fn(
            params,
            grads,
            momentum_buffer_list,
            momentum=True if len(momentum_buffer_list) > 0 else False,
            dampening=0.0,
            nesterov=False,
            foreach=False,
            fused=fused,
            lr=1e-3,
            weight_decay=0.0,
            maximize=False,
            grad_scale=None,
            found_inf=None,
        )
        torch.mps.synchronize()

    device = "mps"
    results = []
    for num_tensors, numel, momentum in itertools.product(
        [100, 500, 1000], [1024, 65536, 1048576], [True, False]
    ):
        sublabel = f"numel: {numel}, num_tensors: {num_tensors}, momentum: {momentum}"
        print(sublabel)
        params, grads = [
            [torch.arange(numel, dtype=torch.float32, device=device) + (numel * i)
             for i in range(num_tensors)]
            for _ in range(2)
        ]
        momentum_buffer_list = (
            [torch.arange(numel, dtype=torch.float32, device=device) + (numel * i)
             for i in range(num_tensors)]
            if momentum
            else []
        )
        fn = sgd
        for fused in [True, False]:
            t = benchmark.Timer(
                stmt="profile(fn, params, grads, momentum_buffer_list, fused)",
                label="Fused SGD",
                sub_label=sublabel,
                globals=locals(),
                description=f"Fused: {fused}",
            ).blocked_autorange(min_run_time=5)
            results.append(t)
    compare = benchmark.Compare(results)
    compare.trim_significant_figures()
    compare.colorize(rowwise=True)
    compare.print()
```

Pull Request resolved: #129350. Approved by: https://github.com/janeyx99. ghstack dependencies: #129006, #129008, #129007, #129105
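The `Timer.blocked_autorange` pattern in the benchmark script needs torch and an MPS device; a rough stdlib approximation with `timeit` shows the same shape, timing a statement repeatedly and reporting per-call cost. The `bench` helper is illustrative, not part of any library:

```python
import timeit

def bench(stmt, globals_=None):
    """Return approximate seconds per call for `stmt`, like a tiny
    stand-in for torch.utils.benchmark.Timer.blocked_autorange()."""
    timer = timeit.Timer(stmt, globals=globals_ or {})
    # autorange() picks a call count so total time is at least ~0.2 s,
    # reducing timer noise for fast statements.
    number, elapsed = timer.autorange()
    return elapsed / number

per_call = bench("sum(range(1000))")
```

`torch.utils.benchmark.Timer` adds on top of this: warmup, CUDA/MPS synchronization, and `Compare` for rendering tables like the one above.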
Stack from ghstack (oldest at bottom):