add torchbench for Distributed Shampoo Optimizer v2 #2616
Closed
Conversation
This pull request was exported from Phabricator. Differential Revision: D51192560
Summary:
- No optimizer has been integrated into TorchBench yet. Distributed Shampoo is fairly complex and has a direct dependency on PyTorch, so adding it to TorchBench guards it against PyTorch 2.0 changes.
- This diff implements that integration, enabling Distributed Shampoo in TorchBench in eager mode. A follow-up diff will add the PT2 compile path.
- Current integration design:
  - Pick the Ads DHEN CMF 5x model, since CMF is a major MC model.
  - Benchmark the optimizer stage alone rather than end to end. The optimizer step itself is relatively lighter than forward and backward, so in an e2e run the optimizer-step results would be shadowed by the other stages (fwd, bwd) and lose sensitivity.
  - Build on top of the original ads_dhen_5x pipeline, skip the fwd and bwd stages, and set up the Shampoo config inside the model's __init__ stage.
  - Distributed Shampoo performs a matrix root inverse computation; in production its frequency is governed by precondition_frequency, so its contribution to the overall computation is trivial. TorchBench also skips it by advancing the iteration count to bypass the first root inverse computation, inside the _prepare_before_optimizer function (see the sketch below).
  - The TorchBench run therefore: 1. initializes the ads_dhen_cmf 5x model on a local GPU, preloads the data, and runs fwd and bwd; 2. adjusts some Shampoo state variables (e.g., the iteration step for preconditioning) to get the optimizer ready; 3. benchmarks the optimizer with the TorchBench pipeline and returns the results.

05/16:
- Updated the diff for the Shampoo v2 implementation.

Reviewed By: xuzhao9

Differential Revision: D51192560
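A minimal sketch of the benchmark flow described above. The Ads DHEN CMF 5x model, the internal TorchBench harness, and Distributed Shampoo's internal state layout are not public, so this uses a small stand-in model, `torch.optim.Adam`, and a plain CUDA-event timing loop as hypothetical placeholders; only the structure (one fwd/bwd pass to populate gradients, a prepare step, then timing `optimizer.step()` alone) mirrors the PR.

```python
# Hypothetical sketch of the optimizer-stage-only benchmark; assumes a CUDA device,
# as in the PR's local-GPU setup. Names here are illustrative, not the PR's code.
import torch
import torch.nn as nn


def build_model_and_optimizer(device="cuda"):
    # Stand-in for initializing the ads_dhen_cmf 5x model on a local GPU.
    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 1)).to(device)
    # In the PR, Distributed Shampoo is configured inside the model's __init__ stage;
    # Adam is used here only so the sketch runs without internal dependencies.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    return model, optimizer


def prepare_before_optimizer(model, device="cuda"):
    # One forward + backward pass to populate gradients; fwd/bwd are not benchmarked.
    x = torch.randn(64, 512, device=device)
    model(x).sum().backward()
    # For Distributed Shampoo, this is where the PR advances the iteration counter
    # so the first (expensive) matrix root inverse is bypassed during benchmarking;
    # the exact state key depends on the Shampoo v2 implementation and is omitted here.


def benchmark_optimizer_step(optimizer, iters=100):
    # Time only optimizer.step(), mirroring the optimizer-stage-only benchmark.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        optimizer.step()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # milliseconds per optimizer step


if __name__ == "__main__":
    model, optimizer = build_model_and_optimizer()
    prepare_before_optimizer(model)
    print(f"avg optimizer.step() time: {benchmark_optimizer_step(optimizer):.3f} ms")
```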
minddrummer added a commit to minddrummer/FBGEMM that referenced this pull request on May 21, 2024
minddrummer force-pushed the export-D51192560 branch from 5873dc6 to c38db34 on May 21, 2024 at 22:13
This pull request was exported from Phabricator. Differential Revision: D51192560
minddrummer force-pushed the export-D51192560 branch from c38db34 to 2a576d9 on May 21, 2024 at 22:13
This pull request was exported from Phabricator. Differential Revision: D51192560
This pull request has been merged in d7a5500.