[RELAND] Force synced KJT to trace unbacked SymInt (#108960) #109216

ezyang · 2023-09-13T17:45:32Z

Summary:

The basic concept behind this diff is to modify Dynamo's tracing behavior when it encounters a KeyedJaggedTensor (KJT) that is synced, i.e., has `_length_per_key` and `_offset_per_key` populated. These fields are lists of integers. Ordinarily, Dynamo optimistically specializes on integers; for KJTs, however, we know that these integers will definitely vary from run to run. Furthermore, Dynamo would ordinarily specialize these integers whenever they are 0 or 1, but we frequently expect features in KJTs to have length 0 or 1.
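The recompilation pressure this creates can be illustrated with a toy model. The sketch below is hypothetical and heavily simplified: it is not Dynamo's code, it ignores Dynamo's automatic-dynamic fallback, and the policy names `"specialize"`, `"dynamic"`, and `"unbacked"` are invented for illustration. It counts how many distinct graphs a tracer would compile for a stream of per-feature lengths under each policy.

```python
# Toy model (hypothetical, not Dynamo's implementation) of how an integer
# specialization policy determines which cached graph an input maps to.


def graph_key(n: int, mode: str) -> str:
    """Return the cache key a toy tracer would compile under for input n."""
    if mode == "specialize":
        # Optimistic specialization: every distinct value gets its own graph.
        return f"const={n}"
    if mode == "dynamic":
        # Ordinary dynamic shapes: 0 and 1 are still specialized,
        # every other value shares one symbolic graph.
        return f"const={n}" if n in (0, 1) else "symbolic"
    if mode == "unbacked":
        # Unbacked treatment: a single symbolic graph for every value.
        return "symbolic"
    raise ValueError(f"unknown mode: {mode}")


def count_compiles(lengths: list, mode: str) -> int:
    """Count how many distinct graphs the toy tracer compiles over a run."""
    cache = set()
    for n in lengths:
        cache.add(graph_key(n, mode))
    return len(cache)


# Per-feature lengths that vary between batches and are often 0 or 1.
batch_lengths = [3, 0, 1, 7, 1, 0, 12]
for mode in ("specialize", "dynamic", "unbacked"):
    print(mode, count_compiles(batch_lengths, mode))
# specialize 5
# dynamic 3
# unbacked 1
```

Under this toy model, only the unbacked policy yields one graph regardless of which lengths show up, which is the property the PR is after.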

The fix is to detect KJTs and treat these integers as unbacked integers. This is NOT a universally sound optimization: when treating these integers as unbacked, we never report them as equal to zero or one. In return, we always generate graphs that generalize regardless of the lengths of the values on each feature. This is enough to trace through APS sparse arch, torchrec_dlrm, and some small split-cat examples.
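To see why never reporting these integers as 0 or 1 is unsound, consider a minimal sketch, assuming a hypothetical `ToyUnbackedInt` that answers `== 0` and `== 1` with False at trace time, as the paragraph above describes. The traced program (a mean with a zero-length guard) is invented for illustration and is not taken from torchrec or Dynamo.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ToyUnbackedInt:
    """Hypothetical stand-in for an unbacked, size-like SymInt.

    The tracer has no concrete value for it. Comparisons against 0 or 1 are
    answered False at trace time instead of installing a guard; any other
    equality test is treated as data-dependent.
    """

    name: str

    def statically_equals(self, k: int) -> bool:
        if k in (0, 1):
            return False  # unsound assumption: never equal to 0 or 1
        raise RuntimeError(f"data-dependent comparison: {self.name} == {k}")


def trace_mean(n: ToyUnbackedInt) -> Callable[[List[float], int], float]:
    """Trace `0.0 if n == 0 else sum(values[:n]) / n`, baking in one branch."""
    if n.statically_equals(0):
        return lambda values, k: 0.0
    # The n == 0 branch is never taken while tracing, so it is dropped.
    return lambda values, k: sum(values[:k]) / k


compiled = trace_mean(ToyUnbackedInt("u0"))
print(compiled([1.0, 2.0, 3.0], 3))  # 2.0: the generic graph works for n > 0
try:
    compiled([], 0)  # a zero-length feature hits the baked-in generic branch
except ZeroDivisionError:
    print("unsound for n == 0")
```

The trade-off is the one stated above: the graph generalizes to every length, but any code path that is only correct because of a 0/1 special case is silently dropped.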

The special integer behavior is triggered by a dynamically scoped `force_unspec_int_unbacked_size_like` variable on TracingContext, which we set when we wrap a KJT. There are probably other ways to do this, but this was simple and worked.
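Dynamic scoping of a flag like this is commonly done in Python with a context manager that sets an attribute and restores the previous value in `finally`. The sketch below shows that pattern; `ToyTracingContext`, `force_unbacked`, `wrap_int`, and `wrap_kjt` are hypothetical names, only the attribute name comes from the PR, and the real TracingContext code may be structured differently.

```python
import contextlib
from typing import Iterator, List


class ToyTracingContext:
    """Hypothetical stand-in for Dynamo's TracingContext (not the real class)."""

    def __init__(self) -> None:
        # Attribute name mirrors the PR description; off by default.
        self.force_unspec_int_unbacked_size_like = False


@contextlib.contextmanager
def force_unbacked(ctx: ToyTracingContext) -> Iterator[None]:
    """Set the flag for the dynamic extent of a `with` block, then restore it."""
    prior = ctx.force_unspec_int_unbacked_size_like
    ctx.force_unspec_int_unbacked_size_like = True
    try:
        yield
    finally:
        # Restore even if wrapping raises, so the flag never leaks out.
        ctx.force_unspec_int_unbacked_size_like = prior


def wrap_int(ctx: ToyTracingContext, value: int) -> str:
    """Decide how a toy tracer would represent a Python int it encounters."""
    if ctx.force_unspec_int_unbacked_size_like:
        return "unbacked SymInt"
    return f"specialized constant {value}"


def wrap_kjt(ctx: ToyTracingContext, length_per_key: List[int]) -> List[str]:
    """Wrap every per-key length while the flag is in effect."""
    with force_unbacked(ctx):
        return [wrap_int(ctx, n) for n in length_per_key]


ctx = ToyTracingContext()
print(wrap_kjt(ctx, [0, 1, 4]))  # ['unbacked SymInt', 'unbacked SymInt', 'unbacked SymInt']
print(wrap_int(ctx, 4))          # specialized constant 4
```

Restoring the prior value (rather than resetting to False) keeps nested wraps correct: an inner KJT wrap does not switch the flag off for an enclosing one.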

Test Plan:

```
buck2 test mode/dev-nosan //pytorch/benchmark/fb/test_gpu:run_test_gpu
```

from aakhundov

1. First, build feed_lower_benchmark:

```
buck2 build --show-output mode/opt -c python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true hpc/new/models/feed/benchmark:feed_lower_benchmark
```

2. Then run the lowering of the model with it:

```
TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 TORCH_LOGS="output_code,graph_code" TORCH_COMPILE_DEBUG=1 ../buck-out/v2/gen/fbcode/79c6b019ee0f9469/hpc/new/models/feed/benchmark/__feed_lower_benchmark__/feed_lower_benchmark.par --load=manifold://ig_inference_model/tree/user/facebook/fblearner/predictor/960999465/60/gpu_lowering/input.predictor --skip-trt --skip-ait --sync-mode=0 --enable-aot-inductor --lower-presets="ig_stories" --gpu-trace
```

cf https://docs.google.com/document/d/1yD30xYrdmM8r2HTdmXnZTg0-MHVexfVrAa0294m1AUE/edit?pli=1#heading=h.qiv3fp7e6zg0

From torchrec: https://www.internalfb.com/intern/wiki/Torchrec/Development/Testing_production_models/

From ge0405
baseline (without your diff): f477293168
your diff: f477292363

```
buck2 test //caffe2/test/dynamo:test_dynamo_torchrec
buck2 run 'fbcode//mode/opt' fbcode//pytorch/benchmark/fb/test_gpu:run_test_gpu -- 'pytorch.benchmark.fb.test_gpu.test_gpu.TestBenchmarkFbGpu.test_train_blue_reels_vdd_v3_inductor_speedup'
```

Differential Revision: D49236757

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov @kadeng

pytorch-bot · 2023-09-13T17:45:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109216


✅ No Failures

As of commit 93c783b with merge base afad0d0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2023-09-13T17:45:39Z

This pull request was exported from Phabricator. Differential Revision: D49236757

ezyang · 2023-09-13T17:46:53Z

torch/_dynamo/config.py

```diff
+# they (1) generalize immediately and (2) unsoundly never compare equal to
+# 0/1.  This is not on by default as AOTAutograd/Inductor cannot currently
+# compile this code; however, this can be useful for export.
+force_unspec_int_unbacked_size_like_on_torchrec_kjt = False
```


The delta from the original PR (#108960) is that this behavior is now guarded by a config flag.
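The gating this flag introduces can be sketched as a two-part check: the special path applies only when the flag is on and the value is a synced KJT. In the diff the flag is `force_unspec_int_unbacked_size_like_on_torchrec_kjt` in `torch/_dynamo/config.py`, defaulting to False. Everything else below is an assumption for illustration: `FakeKJT`, `is_synced_kjt`, and `should_force_unbacked` are hypothetical, the duck-typed field check is one plausible reading of "has `_length_per_key` and `_offset_per_key` populated" (the real detection may check the torchrec class instead), and the offset values assume a running sum starting at 0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeKJT:
    """Hypothetical stand-in for torchrec's KeyedJaggedTensor, holding only
    the two cached per-key fields the PR description mentions."""

    _length_per_key: Optional[List[int]] = None
    _offset_per_key: Optional[List[int]] = None


def is_synced_kjt(obj: object) -> bool:
    """Treat a KJT as synced when both per-key caches are populated."""
    return (
        getattr(obj, "_length_per_key", None) is not None
        and getattr(obj, "_offset_per_key", None) is not None
    )


def should_force_unbacked(obj: object, flag_enabled: bool) -> bool:
    """The special tracing path applies only when the config flag is on."""
    return flag_enabled and is_synced_kjt(obj)


# Illustrative values: offsets assumed to be the running sum of lengths from 0.
synced = FakeKJT(_length_per_key=[2, 0, 1], _offset_per_key=[0, 2, 2, 3])
unsynced = FakeKJT()
print(should_force_unbacked(synced, flag_enabled=False))   # False (default)
print(should_force_unbacked(synced, flag_enabled=True))    # True
print(should_force_unbacked(unsynced, flag_enabled=True))  # False
```

Keeping the default off matches the config comment: AOTAutograd/Inductor could not yet compile the resulting code, so the behavior is opt-in, for example for export.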


facebook-github-bot · 2023-09-13T17:48:25Z

This pull request was exported from Phabricator. Differential Revision: D49236757

facebook-github-bot · 2023-09-15T16:35:55Z

This pull request was exported from Phabricator. Differential Revision: D49236757

Summary: This is a re-land of pytorch#108960. The rest of the summary and test plan are identical to the PR description above.

Reviewed By: voznesenskym

Differential Revision: D49236757

facebook-github-bot · 2023-09-15T21:08:14Z

This pull request was exported from Phabricator. Differential Revision: D49236757

facebook-github-bot · 2023-09-15T21:08:47Z

This pull request was exported from Phabricator. Differential Revision: D49236757

facebook-github-bot · 2023-09-18T14:37:58Z

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorchmergebot · 2023-09-18T14:39:38Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


facebook-github-bot added the fb-exported label Sep 13, 2023

github-actions bot added module: dynamo ciflow/inductor labels Sep 13, 2023

ezyang changed the title ~~Force synced KJT to trace unbacked SymInt (#108960)~~ [RELAND] Force synced KJT to trace unbacked SymInt (#108960) Sep 13, 2023

ezyang commented Sep 13, 2023


ezyang force-pushed the export-D49236757 branch from 59fb225 to 003af11 on September 13, 2023 17:48

ezyang added the topic: new features and release notes: dynamo labels Sep 14, 2023

voznesenskym approved these changes Sep 14, 2023


ezyang force-pushed the export-D49236757 branch from 003af11 to 67f4d18 on September 15, 2023 16:35

ezyang force-pushed the export-D49236757 branch from 67f4d18 to 93c783b on September 15, 2023 21:08

pytorch-bot added the ciflow/trunk label (trigger trunk jobs on your pull request) Sep 18, 2023

pytorchmergebot added the merging label Sep 18, 2023

pytorchmergebot added Merged and removed merging labels Sep 18, 2023

pytorchmergebot closed this in 88600e7 Sep 18, 2023
