Add mkl implementation for exponential on CPU #69967

Closed

CaoE
Collaborator

@CaoE CaoE commented Dec 15, 2021

Description

Add an MKL implementation of exponential on CPU to improve its performance.

Testing

data type: float32
single socket (28cores):

before: torch.Size([10, 128, 10, 124])  0.065 s
        torch.Size([10, 128, 20, 124])  0.130 s
            
after:  torch.Size([10, 128, 10, 124])  5.9e-05 s
        torch.Size([10, 128, 20, 124])  0.000113 s

single core:

before: torch.Size([10, 128, 10, 124])  0.065 s
        torch.Size([10, 128, 20, 124])  0.130 s 

after:  torch.Size([10, 128, 10, 124])  0.00117 s
        torch.Size([10, 128, 20, 124])  0.002347 s
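
Dividing the reported timings gives a rough sense of the gain: on the order of 1,100x on the 28-core socket and about 55x on a single core. The snippet below is only a sketch of that arithmetic. It uses the numbers listed above and nothing else, and the helper name `speedup` is illustrative, not part of the PR.

```python
# Timings in seconds, copied from the float32 benchmark above.
# Keys: (configuration, tensor shape) -> (before, after).
timings = {
    ("single socket (28 cores)", (10, 128, 10, 124)): (0.065, 5.9e-05),
    ("single socket (28 cores)", (10, 128, 20, 124)): (0.130, 0.000113),
    ("single core", (10, 128, 10, 124)): (0.065, 0.00117),
    ("single core", (10, 128, 20, 124)): (0.130, 0.002347),
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster the 'after' timing is."""
    return before / after


speedups = {key: speedup(*pair) for key, pair in timings.items()}

for (config, shape), factor in speedups.items():
    print(f"{config:26s} {str(shape):20s} {factor:8.1f}x")
```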

cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-probot

pytorch-probot bot commented Dec 15, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/CaoE/pytorch/blob/f3412187fedaa72f0f4f06a6ac68c10b279e260e/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries/conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries/libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries/libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries/wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk, ciflow/xla ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/linux, ciflow/rocm, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@jbschlosser
Contributor

Hey @mruberry - not sure who the best person to review this internally was - please feel free to reassign as appropriate :)

@jbschlosser jbschlosser added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Dec 15, 2021
@facebook-github-bot
Contributor

facebook-github-bot commented Dec 16, 2021

✅ No Failures (0 Pending)

As of commit 54c8ef6 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

@mruberry
Collaborator

cc @VitalyFedyunin

@CaoE CaoE force-pushed the exp_optimization2 branch 3 times, most recently from b6c6ec9 to ea43e5f Compare December 31, 2021 07:41
@CaoE CaoE force-pushed the exp_optimization2 branch 2 times, most recently from 9fd25c2 to a7abb44 Compare January 10, 2022 02:04
@CaoE CaoE force-pushed the exp_optimization2 branch 5 times, most recently from f341218 to f75ba5b Compare January 24, 2022 03:01
@izaitsevfb
Contributor

https://github.com/facebookresearch/multimodal/blob/afe52dec1987e5a3764539606a55f430994ebbc8/test/modules/losses/test_mdetr_losses.py#L115.
May I know why fixed numbers are used here, given that test_exponential_sample in test_distribution.py already uses scipy.stats to test sampling results for exponential?
The fixed expected_l1_loss = torch.tensor(0.7721) and expected_giou_loss = torch.tensor(1.1768) are not applicable to the new exponential implementation.

Perhaps the test author @ebsmothers can answer that?
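
To illustrate the point raised in the quoted question, here is a minimal sketch of a distribution-level check that survives a change of sampler, in contrast to comparing against hard-coded values. It is not the code from test_distribution.py: it uses Python's standard `random.expovariate` as a stand-in for the PyTorch sampler and a hand-written Kolmogorov-Smirnov statistic instead of scipy.stats. The names `ks_statistic` and `looks_exponential`, the rate, the sample count, and the threshold are all assumptions chosen for illustration.

```python
import math
import random


def ks_statistic(samples, rate):
    """Kolmogorov-Smirnov distance between the samples and Exp(rate)."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)  # theoretical CDF at x
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def looks_exponential(samples, rate, coeff=1.95):
    """Accept if the KS distance is below coeff / sqrt(n).

    coeff=1.95 corresponds roughly to a 0.1% significance level.
    """
    return ks_statistic(samples, rate) < coeff / math.sqrt(len(samples))


rate = 2.0
n = 10_000
# Two differently seeded generators stand in for the "old" and "new" samplers:
# the individual values differ, but both should follow Exp(rate).
old_rng = random.Random(0)
new_rng = random.Random(1)
old_samples = [old_rng.expovariate(rate) for _ in range(n)]
new_samples = [new_rng.expovariate(rate) for _ in range(n)]

print("identical values:", old_samples == new_samples)
print("old fits Exp(rate):", looks_exponential(old_samples, rate))
print("new fits Exp(rate):", looks_exponential(new_samples, rate))
```

A test written this way checks that the samples fit the distribution, so it would not need new expected values when the underlying random number stream changes.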

facebook-github-bot pushed a commit to facebookresearch/multimodal that referenced this pull request Nov 21, 2022
…nential implementation of PyTorch (#364)

Summary:
pytorch/pytorch#69967 adds an MKL implementation for exponential in PyTorch. Although the new implementation still fits the exponential distribution, its output is not the same as the old one's.
Change the expected values to accommodate the new exponential implementation of PyTorch.

Pull Request resolved: #364

Reviewed By: pikapecan

Differential Revision: D41430249

Pulled By: ebsmothers

fbshipit-source-id: c3333fd7f4449881a239c6b0c26d22fd63009b68
@CaoE
Collaborator Author

CaoE commented Nov 23, 2022

@malfet @izaitsevfb As suggested by @ebsmothers, we changed the expected values to match the exact values from the new exponential implementation: https://github.com/facebookresearch/multimodal/pull/364. Could you please check whether this PR will cause internal breakage?

@facebook-github-bot
Contributor

@izaitsevfb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@izaitsevfb
Contributor

@malfet @izaitsevfb As suggested by @ebsmothers, we changed the expected values to match the exact values from the new exponential implementation: https://github.com/facebookresearch/multimodal/pull/364. Could you please check whether this PR will cause internal breakage?

I've manually imported this PR to run the internal tests. For Meta employees, see D41587318

@malfet
Contributor

malfet commented Nov 29, 2022

@izaitsevfb you need to re-import again, as @CaoE just pushed several commits to the PR after the import.

@izaitsevfb
Contributor

izaitsevfb commented Nov 29, 2022

@izaitsevfb you need to re-import again, as @CaoE just pushed several commits to the PR after the import.

I think the GH timeline is misleading: the commits were made 2 hours ago, before the import. I'll double-check via the code.

UPD: I believe we have the latest version imported.

@izaitsevfb
Contributor

@CaoE, there are some failures related to the values expected by internal models, which I can't share specifics about. Please don't merge this yet. I'm trying to find a PoC who can provide more context.

@izaitsevfb
Contributor

@CaoE, I've confirmed with the test owners, it's all good. The tests didn't expect the RNG change, but they can be forward fixed.

For the diff train oncall: please see the details in D41587318.

@CaoE
Collaborator Author

CaoE commented Dec 5, 2022

@CaoE, I've confirmed with the test owners, it's all good. The tests didn't expect the RNG change, but they can be forward fixed.

Thank you @izaitsevfb. May I know if these tests will be fixed for the new RNG in the near future?

@izaitsevfb
Contributor

@CaoE, I've confirmed with the test owners, it's all good. The tests didn't expect the RNG change, but they can be forward fixed.

Thank you @izaitsevfb. May I know if these tests will be fixed for the new RNG in the near future?

I think there are plans to rewrite the tests to not rely on RNG specifics, but I don't have the specifics. Please don't worry about that, as I've confirmed that the internal test failures are expected and can be forward-fixed.

@CaoE
Collaborator Author

CaoE commented Dec 12, 2022

I think there are plans to rewrite the tests to not rely on RNG specifics, but I don't have the specifics. Please don't worry about that, as I've confirmed that the internal test failures are expected and can be forward-fixed.

@izaitsevfb That's great! Can we merge this PR first and then fix the tests?

@izaitsevfb
Contributor

I think there are plans to rewrite the tests to not rely on RNG specifics, but I don't have the specifics. Please don't worry about that, as I've confirmed that the internal test failures are expected and can be forward-fixed.

@izaitsevfb That's great! Can we merge this PR first and then fix the tests?

Yep, that's what I meant, sorry for not being clear.

@CaoE
Collaborator Author

CaoE commented Dec 13, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

facebook-github-bot pushed a commit to facebookresearch/multimodal that referenced this pull request Feb 9, 2023
Summary:
Due to a change in pytorch's random distribution calculation (see pytorch/pytorch#69967), `torch.multinomial` is producing different values for unit tests. `GenerationUtil` uses multinomial for autoregressive decoding and token prediction. The `test_sample` unit test now tests for output shape instead of exact tokens to mitigate sensitivities to random distributions.

Pull Request resolved: #403

Test Plan:
```
python -m pytest -v tests/utils/test_generate.py

tests/utils/test_generate.py::TestLogitsMask::test_normal PASSED                                                                                                                         [  6%]
tests/utils/test_generate.py::TestLogitsMask::test_zero_dims PASSED                                                                                                                      [ 13%]
tests/utils/test_generate.py::TestLogitsMask::test_in_seq_only PASSED                                                                                                                    [ 20%]
tests/utils/test_generate.py::TestLogitsMask::test_out_seq_only PASSED                                                                                                                   [ 26%]
tests/utils/test_generate.py::TestGenerationUtil::test_model_eval_warning PASSED                                                                                                         [ 33%]
tests/utils/test_generate.py::TestGenerationUtil::test_sample PASSED                                                                                                                     [ 40%]
tests/utils/test_generate.py::TestGenerationUtil::test_filter_logits PASSED                                                                                                              [ 46%]
tests/utils/test_generate.py::TestLogitsFilterTopK::test_min_tokens_to_keep PASSED                                                                                                       [ 53%]
tests/utils/test_generate.py::TestLogitsFilterTopK::test_top_k_invalid PASSED                                                                                                            [ 60%]
tests/utils/test_generate.py::TestLogitsFilterTopK::test_default PASSED                                                                                                                  [ 66%]
tests/utils/test_generate.py::TestLogitsFilterTopK::test_top_k PASSED                                                                                                                    [ 73%]
tests/utils/test_generate.py::TestLogitsFilterTopP::test_min_tokens_to_keep PASSED                                                                                                       [ 80%]
tests/utils/test_generate.py::TestLogitsFilterTopP::test_top_p_invalid PASSED                                                                                                            [ 86%]
tests/utils/test_generate.py::TestLogitsFilterTopP::test_default PASSED                                                                                                                  [ 93%]
tests/utils/test_generate.py::TestLogitsFilterTopP::test_top_p PASSED                                                                                                                    [100%]

```

Reviewed By: pbontrager

Differential Revision: D43110663

Pulled By: RdoubleA

fbshipit-source-id: d6c8665cb1ea102352e09c4557ff6982ffa40f84
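
The mitigation described in the summary above (checking output shape instead of exact tokens) can be sketched as follows. This is not the `GenerationUtil` test itself: `random.choices` from the standard library stands in for `torch.multinomial`, and the helpers `sample_tokens` and `check_sample` are hypothetical names introduced only for this illustration.

```python
import random


def sample_tokens(probs, num_samples, rng):
    """Draw token ids for each row of per-row categorical distributions."""
    vocab = range(len(probs[0]))
    return [rng.choices(vocab, weights=row, k=num_samples) for row in probs]


def check_sample(tokens, batch_size, num_samples, vocab_size):
    """Assert properties that hold for any RNG stream: shape and id range."""
    assert len(tokens) == batch_size
    for row in tokens:
        assert len(row) == num_samples
        assert all(0 <= t < vocab_size for t in row)


probs = [[0.1, 0.2, 0.7], [0.5, 0.5, 0.0]]

# The same checks pass regardless of the seed, so a change in the
# underlying random number stream does not break them.
for seed in (0, 1, 2):
    tokens = sample_tokens(probs, num_samples=4, rng=random.Random(seed))
    check_sample(tokens, batch_size=2, num_samples=4, vocab_size=3)
    # A token with zero probability must never be drawn.
    assert 2 not in tokens[1]
```

The trade-off is that such a test no longer detects a sampler that returns well-formed but wrongly distributed tokens, so it is weaker than a statistical check.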
Labels
ciflow/trunk (Trigger trunk jobs on your pull request)
cla signed
intel priority (matters to intel architecture from performance wise)
intel (This tag is for PR from Intel)
Merged
module: cpu (CPU specific problem (e.g., perf, algorithm))
open source
Reverted
Stale
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Projects
Status: Done