Skip to content

Conversation

mengluy0125
Copy link
Contributor

@mengluy0125 mengluy0125 commented Jul 31, 2024

Summary:
We observed that many introduced nodes during split cat and batch fusion pattern optimization did not have example value meta data, which will cause problems in our follow up pattern optimizations, thus we add all missing values.

We also fix bugs in some meta update and corner case bug for the old pattern, which caused problems in the follow up pattern optimization.

We delete merge_stack_tahn_unbind_pass pattern, which was designed for cmf model, and it could be replaced by the more advanced pattern we added, thus we remove it for easy maintenance.

Test Plan:

unit test

buck2 test //caffe2/test/inductor:split_cat_fx_passes

Test UI: https://www.internalfb.com/intern/testinfra/testrun/15481123762720165
Network: Up: 230KiB Down: 702KiB (reSessionID-756346bf-6da3-4fa0-8d03-1b4fd61e0a7a)
Jobs completed: 30. Time elapsed: 7:23.9s.
Cache hits: 20%. Commands: 5 (cached: 1, remote: 0, local: 4)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 1. Build failure 0

buck2 test @mode/opt pytorch/diff_train_tests/ads/optimus:local_pt2_runner

Network: Up: 1.3GiB Down: 84MiB (reSessionID-ff135cdd-e42c-4ab5-8217-907ada465f01)
Jobs completed: 61. Time elapsed: 21:56.5s.
Cache hits: 0%. Commands: 39 (cached: 0, remote: 0, local: 39)
Tests finished: Pass 8. Fail 0. Fatal 0. Skip 0. Build failure 0

benchmark

CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run @mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697

Counter({'pattern_matcher_nodes': 752, 'pattern_matcher_count': 732, 'normalization_pass': 328, 'normalization_aten_pass': 12, 'scmerge_cat_removed': 5, 'scmerge_cat_added': 4, 'scmerge_split_removed': 3, 'unbind_stack_pass': 3, 'batch_tanh': 2, 'scmerge_split_sections_removed': 2, 'scmerge_split_added': 2, 'optimize_cat_inputs_pass': 1, 'unbind_cat_to_view_pass': 1, 'fxgraph_cache_miss': 1})

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

Copy link

pytorch-bot bot commented Jul 31, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/132297

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (8 Unrelated Failures)

As of commit 6254680 with merge base 9853c04 (image):

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60364165

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60364165

mengluy0125 added a commit to mengluy0125/pytorch that referenced this pull request Jul 31, 2024
…h#132297)

Summary:
Pull Request resolved: pytorch#132297

We observed that many introduced nodes during split cat and batch fusion pattern optimization did not have example value meta data, which will cause problems in our follow up pattern optimizations, thus we add all missing values.

We also fix bugs in some meta update and corner case bug for the old pattern, which caused problems in the follow up pattern optimization.

We delete merge_stack_tahn_unbind_pass pattern, which was designed for cmf model, and it could be replaced by the more advanced pattern we added, thus we remove it for easy maintenance.

Test Plan:
# unit test
```
buck2 test //caffe2/test/inductor:split_cat_fx_passes
```

Test UI: https://www.internalfb.com/intern/testinfra/testrun/3940649915381395
Network: Up: 90KiB  Down: 118KiB  (reSessionID-a4a438a0-b043-4ae0-8cf4-a83a5150d0fa)
Jobs completed: 28. Time elapsed: 4:57.9s.
Cache hits: 0%. Commands: 2 (cached: 0, remote: 0, local: 2)
Tests finished: Pass 10. Fail 0. Fatal 0. Skip 1. Build failure 0

```
buck2 test mode/opt pytorch/diff_train_tests/ads/optimus:local_pt2_runner
```

Network: Up: 1.3GiB  Down: 84MiB  (reSessionID-ff135cdd-e42c-4ab5-8217-907ada465f01)
Jobs completed: 61. Time elapsed: 21:56.5s.
Cache hits: 0%. Commands: 39 (cached: 0, remote: 0, local: 39)
Tests finished: Pass 8. Fail 0. Fatal 0. Skip 0. Build failure 0

# benchmark

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697
```

Counter({'pattern_matcher_nodes': 752, 'pattern_matcher_count': 732, 'normalization_pass': 328, 'normalization_aten_pass': 12, 'scmerge_cat_removed': 5, 'scmerge_cat_added': 4, 'scmerge_split_removed': 3, 'unbind_stack_pass': 3, 'batch_tanh': 2, 'scmerge_split_sections_removed': 2, 'scmerge_split_added': 2, 'optimize_cat_inputs_pass': 1, 'unbind_cat_to_view_pass': 1, 'fxgraph_cache_miss': 1})

Differential Revision: D60364165
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60364165

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60364165

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60364165

pytorch-bot bot pushed a commit that referenced this pull request Jul 31, 2024
Summary:
Pull Request resolved: #132297

We observed that many introduced nodes during split cat and batch fusion pattern optimization did not have example value meta data, which will cause problems in our follow up pattern optimizations, thus we add all missing values.

We also fix bugs in some meta update and corner case bug for the old pattern, which caused problems in the follow up pattern optimization.

We delete merge_stack_tahn_unbind_pass pattern, which was designed for cmf model, and it could be replaced by the more advanced pattern we added, thus we remove it for easy maintenance.

Test Plan:
# unit test
```
buck2 test //caffe2/test/inductor:split_cat_fx_passes
```

Test UI: https://www.internalfb.com/intern/testinfra/testrun/3940649915381395
Network: Up: 90KiB  Down: 118KiB  (reSessionID-a4a438a0-b043-4ae0-8cf4-a83a5150d0fa)
Jobs completed: 28. Time elapsed: 4:57.9s.
Cache hits: 0%. Commands: 2 (cached: 0, remote: 0, local: 2)
Tests finished: Pass 10. Fail 0. Fatal 0. Skip 1. Build failure 0

```
buck2 test mode/opt pytorch/diff_train_tests/ads/optimus:local_pt2_runner
```

Network: Up: 1.3GiB  Down: 84MiB  (reSessionID-ff135cdd-e42c-4ab5-8217-907ada465f01)
Jobs completed: 61. Time elapsed: 21:56.5s.
Cache hits: 0%. Commands: 39 (cached: 0, remote: 0, local: 39)
Tests finished: Pass 8. Fail 0. Fatal 0. Skip 0. Build failure 0

# benchmark

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697
```

Counter({'pattern_matcher_nodes': 752, 'pattern_matcher_count': 732, 'normalization_pass': 328, 'normalization_aten_pass': 12, 'scmerge_cat_removed': 5, 'scmerge_cat_added': 4, 'scmerge_split_removed': 3, 'unbind_stack_pass': 3, 'batch_tanh': 2, 'scmerge_split_sections_removed': 2, 'scmerge_split_added': 2, 'optimize_cat_inputs_pass': 1, 'unbind_cat_to_view_pass': 1, 'fxgraph_cache_miss': 1})

Differential Revision: D60364165
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 31, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60364165

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60364165

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60364165

mengluy0125 added a commit to mengluy0125/pytorch that referenced this pull request Aug 1, 2024
…h#132297)

Summary:
Pull Request resolved: pytorch#132297

We observed that many introduced nodes during split cat and batch fusion pattern optimization did not have example value meta data, which will cause problems in our follow up pattern optimizations, thus we add all missing values.

We also fix bugs in some meta update and corner case bug for the old pattern, which caused problems in the follow up pattern optimization.

We delete merge_stack_tahn_unbind_pass pattern, which was designed for cmf model, and it could be replaced by the more advanced pattern we added, thus we remove it for easy maintenance.

Test Plan:
# unit test
```
buck2 test //caffe2/test/inductor:split_cat_fx_passes
```

Test UI: https://www.internalfb.com/intern/testinfra/testrun/15481123762720165
Network: Up: 230KiB  Down: 702KiB  (reSessionID-756346bf-6da3-4fa0-8d03-1b4fd61e0a7a)
Jobs completed: 30. Time elapsed: 7:23.9s.
Cache hits: 20%. Commands: 5 (cached: 1, remote: 0, local: 4)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 1. Build failure 0

```
buck2 test mode/opt pytorch/diff_train_tests/ads/optimus:local_pt2_runner
```

Network: Up: 1.3GiB  Down: 84MiB  (reSessionID-ff135cdd-e42c-4ab5-8217-907ada465f01)
Jobs completed: 61. Time elapsed: 21:56.5s.
Cache hits: 0%. Commands: 39 (cached: 0, remote: 0, local: 39)
Tests finished: Pass 8. Fail 0. Fatal 0. Skip 0. Build failure 0

# benchmark

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697
```

Counter({'pattern_matcher_nodes': 752, 'pattern_matcher_count': 732, 'normalization_pass': 328, 'normalization_aten_pass': 12, 'scmerge_cat_removed': 5, 'scmerge_cat_added': 4, 'scmerge_split_removed': 3, 'unbind_stack_pass': 3, 'batch_tanh': 2, 'scmerge_split_sections_removed': 2, 'scmerge_split_added': 2, 'optimize_cat_inputs_pass': 1, 'unbind_cat_to_view_pass': 1, 'fxgraph_cache_miss': 1})

Reviewed By: jackiexu1992

Differential Revision: D60364165
…h#132297)

Summary:
Pull Request resolved: pytorch#132297

We observed that many introduced nodes during split cat and batch fusion pattern optimization did not have example value meta data, which will cause problems in our follow up pattern optimizations, thus we add all missing values.

We also fix bugs in some meta update and corner case bug for the old pattern, which caused problems in the follow up pattern optimization.

We delete merge_stack_tahn_unbind_pass pattern, which was designed for cmf model, and it could be replaced by the more advanced pattern we added, thus we remove it for easy maintenance.

Test Plan:
# unit test
```
buck2 test //caffe2/test/inductor:split_cat_fx_passes
```

Test UI: https://www.internalfb.com/intern/testinfra/testrun/15481123762720165
Network: Up: 230KiB  Down: 702KiB  (reSessionID-756346bf-6da3-4fa0-8d03-1b4fd61e0a7a)
Jobs completed: 30. Time elapsed: 7:23.9s.
Cache hits: 20%. Commands: 5 (cached: 1, remote: 0, local: 4)
Tests finished: Pass 9. Fail 0. Fatal 0. Skip 1. Build failure 0

```
buck2 test mode/opt pytorch/diff_train_tests/ads/optimus:local_pt2_runner
```

Network: Up: 1.3GiB  Down: 84MiB  (reSessionID-ff135cdd-e42c-4ab5-8217-907ada465f01)
Jobs completed: 61. Time elapsed: 21:56.5s.
Cache hits: 0%. Commands: 39 (cached: 0, remote: 0, local: 39)
Tests finished: Pass 8. Fail 0. Fatal 0. Skip 0. Build failure 0

# benchmark

```
CUDA_VISIBLE_DEVICES=3 OC_CAUSE=1 buck2 run mode/opt //scripts/jackiexu0313/pt2:local_model_with_pt2 -- --test_mode batch-split --model_type "ig_ctr" --flow_id 584880697
```

Counter({'pattern_matcher_nodes': 752, 'pattern_matcher_count': 732, 'normalization_pass': 328, 'normalization_aten_pass': 12, 'scmerge_cat_removed': 5, 'scmerge_cat_added': 4, 'scmerge_split_removed': 3, 'unbind_stack_pass': 3, 'batch_tanh': 2, 'scmerge_split_sections_removed': 2, 'scmerge_split_added': 2, 'optimize_cat_inputs_pass': 1, 'unbind_cat_to_view_pass': 1, 'fxgraph_cache_miss': 1})

Reviewed By: jackiexu1992

Differential Revision: D60364165
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60364165

@facebook-github-bot
Copy link
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/rexporting the PR!

Details for Dev Infra team Raised by workflow job

@mengluy0125
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/rexporting the PR!

Details for Dev Infra team Raised by workflow job

@dshi7
Copy link
Contributor

dshi7 commented Aug 2, 2024

@pytorchbot merge -f 'unrelated'

Copy link

pytorch-bot bot commented Aug 2, 2024

You need to provide a reason for using force merge, in the format @pytorchbot merge -f 'Explanation'.
The explanation needs to be clear on why this is needed. Here are some good examples:

  • Bypass checks due to unrelated upstream failures from ...
  • This is a minor fix to ..., which shouldn't break anything
  • This is pre-tested in a previous CI run
  • Bypass flaky ... check

@dshi7
Copy link
Contributor

dshi7 commented Aug 2, 2024

@pytorchbot merge -f 'bypass flaky tests'

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants