Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ASAN and TSAN handling in cmake #93147

Closed
wants to merge 17 commits into from

Conversation

cyyever
Copy link
Collaborator

@cyyever cyyever commented Jan 27, 2023

No description provided.

@pytorch-bot
Copy link

pytorch-bot bot commented Jan 27, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/93147

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 51ea450:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@cyyever
Copy link
Collaborator Author

cyyever commented Jan 27, 2023

Now we can test windows VS2022 with ASAN enabled but I don't know how to setup CI targets.

@cyyever
Copy link
Collaborator Author

cyyever commented Jan 27, 2023

The test failures are due to ASAN finally taking effect, and the old ASAN switch has no effect...:joy:
The failures can be solved by adding ASAN_OPTIONS=detect_container_overflow=0

@ngimel ngimel added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jan 28, 2023
@Skylion007 Skylion007 added module: infra Relates to CI infrastructure test-config/tsan labels Jan 28, 2023
@cyyever cyyever force-pushed the google_sanitizer branch 2 times, most recently from b9bc807 to 9eb07d4 Compare January 29, 2023 00:51
@cyyever
Copy link
Collaborator Author

cyyever commented Jan 29, 2023

I have fix some leaks and there remains several JIT indirect leaks to digest. I think enable leak detection by default is not a bad idea now. But I need someone who is familiar with CI to help me setup the related config.

@Skylion007
Copy link
Collaborator

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased google_sanitizer onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout google_sanitizer && git pull --rebase)

@Skylion007
Copy link
Collaborator

Hmm, @malfet Any idea what is going on with the build_docs error?

@malfet
Copy link
Contributor

malfet commented Feb 3, 2023

Hmm, @malfet Any idea what is going on with the build_docs error?

Just a hiccup, it could not download build artifacts from the previous step. Re-run should fix it (hopefully)

@malfet
Copy link
Contributor

malfet commented Feb 3, 2023

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased google_sanitizer onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout google_sanitizer && git pull --rebase)

@cyyever cyyever force-pushed the google_sanitizer branch 2 times, most recently from f222675 to 0246128 Compare February 4, 2023 11:58
@cyyever
Copy link
Collaborator Author

cyyever commented Feb 4, 2023

@malfet asan tests fail with Indirect leak detected. A big success. The next step is to fix them. But JIT code is more complicated, it will take more time.

@Skylion007
Copy link
Collaborator

Skylion007 commented Feb 4, 2023

@malfet asan tests fail with Indirect leak detected. A big success. The next step is to fix them. But JIT code is more complicated, it will take more time.

@cyyever Unfortunately, it looks like most of the leaks are from pybind11::enum which is known to be leaky pybind/pybind11#3865 . If you can find the root cause and fix it, we would gladly accept as a PR in pybind11 though. Otherwise, i suppose we have to add supressions.

@cyyever
Copy link
Collaborator Author

cyyever commented Feb 5, 2023

@Skylion007 It didn't report pybind11 leaks on 3.10, maybe fixed by official.

@cyyever
Copy link
Collaborator Author

cyyever commented Mar 7, 2023

@pytorchmergebot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@cyyever cyyever mentioned this pull request Mar 7, 2023
ydwu4 pushed a commit to ydwu4/pytorch that referenced this pull request Mar 10, 2023
cyyever added a commit to cyyever/pytorch_private that referenced this pull request Mar 12, 2023
cyyever added a commit to cyyever/pytorch_private that referenced this pull request Mar 12, 2023
ydwu4 added a commit to ydwu4/pytorch that referenced this pull request Mar 13, 2023

set(CMAKE_REQUIRED_QUIET ON)
set(_run_res 0)
include(CheckSourceRuns)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be wrong here, but this requires CMake version upper than 3.18 (minimum required specified in pytorch CMakeLists.txt):

cmake_minimum_required(VERSION 3.18 FATAL_ERROR)

According to the logs when pytorch-linux-focal-py3-clang7-asan is built, CI is running CMake version 3.22.1.

Should we update it or at least specify cmake_minimum_required(VERSION 3.22 FATAL_ERROR) in this file ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_source_runs can be changed to check_cxx_source_runs which exists in 3.18

pytorchmergebot pushed a commit that referenced this pull request Apr 22, 2023
As indicated by the last comment from PR #93147, we should replace CheckSourceRuns in **cmake/Modules/FindSanitizer.cmake**  with older versions to avoid dependency on CMake 3.19+
Pull Request resolved: #97073
Approved by: https://github.com/vfdev-5, https://github.com/Skylion007
@cyyever cyyever deleted the google_sanitizer branch August 8, 2023 05:08
malfet added a commit that referenced this pull request Oct 14, 2023
If `USE_ASAN` is set, compile FBGEMM with ASAN as well, by setting `USE_SANITIZER` to `address,undefined`

This fixes regression in sanitizer coverage introduced by #93147  that change effects of sanitizer from the entire project to just torch libraries
pytorchmergebot pushed a commit that referenced this pull request Oct 14, 2023
If `USE_ASAN` is set, compile FBGEMM with ASAN as well, by setting `USE_SANITIZER` to `address,undefined`

This fixes regression in sanitizer coverage introduced by #93147  that change effects of sanitizer from the entire project to just torch libraries
pytorchmergebot pushed a commit that referenced this pull request Oct 14, 2023
If `USE_ASAN` is set, compile FBGEMM with ASAN as well, by setting `USE_SANITIZER` to `address,undefined`

This fixes regression in sanitizer coverage introduced by #93147  that change effects of sanitizer from the entire project to just torch libraries, and finally allows one to reliably catch regression reported in #111189

Pull Request resolved: #111266
Approved by: https://github.com/huydhn
yeounoh pushed a commit to yeounoh/pytorch that referenced this pull request Oct 16, 2023
If `USE_ASAN` is set, compile FBGEMM with ASAN as well, by setting `USE_SANITIZER` to `address,undefined`

This fixes regression in sanitizer coverage introduced by pytorch#93147  that change effects of sanitizer from the entire project to just torch libraries, and finally allows one to reliably catch regression reported in pytorch#111189

Pull Request resolved: pytorch#111266
Approved by: https://github.com/huydhn
yeounoh pushed a commit to yeounoh/pytorch that referenced this pull request Oct 16, 2023
If `USE_ASAN` is set, compile FBGEMM with ASAN as well, by setting `USE_SANITIZER` to `address,undefined`

This fixes regression in sanitizer coverage introduced by pytorch#93147  that change effects of sanitizer from the entire project to just torch libraries, and finally allows one to reliably catch regression reported in pytorch#111189

Pull Request resolved: pytorch#111266
Approved by: https://github.com/huydhn
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged module: infra Relates to CI infrastructure open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants