Skip to content

Conversation

Polyomino
Copy link
Contributor

Summary:

Key Changes Made

making num_threads const ensures there is no data race.

Benefits of This Fix

  • Eliminates the Data Race: The tsan error should no longer occur because the threadpool initialization is now atomic
  • Thread Safety: Multiple threads can safely call get_threadpool() concurrently
  • Maintains Compatibility: All existing functionality is preserved

Verification

  • ✅ Code compiles without errors
  • ✅ Buck build succeeds
  • ✅ No diagnostic issues
  • ✅ Maintains existing functionality

This fix should resolve the tsan failures encountered when running assistant integration tests under ThreadSanitizer. The data race that was occurring between threads T391 and T393 on the num_threads variable at address 0x000016aa6cf0 should now be eliminated.

Differential Revision: D82560656

Copy link

pytorch-bot bot commented Sep 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14358

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 2 Cancelled Jobs, 2 Unrelated Failures

As of commit 21cf901 with merge base b3f3111 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 16, 2025
@facebook-github-bot
Copy link
Contributor

@Polyomino has exported this pull request. If you are a Meta employee, you can view the originating diff in D82560656.

Copy link
Contributor

@swolchok swolchok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review automatically exported from Phabricator review in Meta.

Copy link

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

*/
constexpr int tsan_thread_limit = 63;
num_threads = std::min(num_threads, tsan_thread_limit);
static const int num_threads = ([]() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pinged you to get link to tsan failure to understand the issue better

Comment on lines 96 to 97
// get_threadpool is not thread safe due to leak_corrupted_threadpool
// Make this part threadsafe: TODO(kimishpatel)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is unrelated. I think (trying to remember) the reason why i mentioned this issue is because at fork we will hit leak_corrupted_threadpool = false at line 126 and that is being carried out in thread unsafe way. So after process fork if there are two threads call get_threadpool then leak_corrupted_threadpool = false can potentially be executed at the same time and can lead to unsafe behaviors

Copy link
Contributor

@kimishpatel kimishpatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

restore the thread unsafe comment

Polyomino added a commit to Polyomino/executorch that referenced this pull request Sep 17, 2025
Summary:

### Key Changes Made

making num_threads const ensures there is no data race.

### Benefits of This Fix

*   **Eliminates the Data Race**: The tsan error should no longer occur because the threadpool initialization is now atomic
*   **Thread Safety**: Multiple threads can safely call `get_threadpool()` concurrently
*   **Maintains Compatibility**: All existing functionality is preserved

### Verification

*   ✅ Code compiles without errors
*   ✅ Buck build succeeds
*   ✅ No diagnostic issues
*   ✅ Maintains existing functionality

This fix should resolve the tsan failures encountered when running assistant integration tests under ThreadSanitizer. The data race that was occurring between threads T391 and T393 on the `num_threads` variable at address `0x000016aa6cf0` should now be eliminated.

Differential Revision: D82560656
@facebook-github-bot
Copy link
Contributor

@Polyomino has exported this pull request. If you are a Meta employee, you can view the originating diff in D82560656.

Polyomino added a commit to Polyomino/executorch that referenced this pull request Sep 17, 2025
Summary:

### Key Changes Made

making num_threads const ensures there is no data race.

### Benefits of This Fix

*   **Eliminates the Data Race**: The tsan error should no longer occur because the threadpool initialization is now atomic
*   **Thread Safety**: Multiple threads can safely call `get_threadpool()` concurrently
*   **Maintains Compatibility**: All existing functionality is preserved

### Verification

*   ✅ Code compiles without errors
*   ✅ Buck build succeeds
*   ✅ No diagnostic issues
*   ✅ Maintains existing functionality

This fix should resolve the tsan failures encountered when running assistant integration tests under ThreadSanitizer. The data race that was occurring between threads T391 and T393 on the `num_threads` variable at address `0x000016aa6cf0` should now be eliminated.

Differential Revision: D82560656
@facebook-github-bot
Copy link
Contributor

@Polyomino has exported this pull request. If you are a Meta employee, you can view the originating diff in D82560656.

kimishpatel
kimishpatel previously approved these changes Sep 17, 2025
@Polyomino Polyomino requested a review from swolchok September 17, 2025 21:56
swolchok
swolchok previously approved these changes Sep 18, 2025
Polyomino added a commit to Polyomino/executorch that referenced this pull request Sep 18, 2025
Summary:

### Key Changes Made

making num_threads const ensures there is no data race.

### Benefits of This Fix

*   **Eliminates the Data Race**: The tsan error should no longer occur because the threadpool initialization is now atomic
*   **Thread Safety**: Multiple threads can safely call `get_threadpool()` concurrently
*   **Maintains Compatibility**: All existing functionality is preserved

### Verification

*   ✅ Code compiles without errors
*   ✅ Buck build succeeds
*   ✅ No diagnostic issues
*   ✅ Maintains existing functionality

This fix should resolve the tsan failures encountered when running assistant integration tests under ThreadSanitizer. The data race that was occurring between threads T391 and T393 on the `num_threads` variable at address `0x000016aa6cf0` should now be eliminated.

Differential Revision: D82560656
@facebook-github-bot
Copy link
Contributor

@Polyomino has exported this pull request. If you are a Meta employee, you can view the originating diff in D82560656.

Copy link
Contributor

@swolchok swolchok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review automatically exported from Phabricator review in Meta.

Polyomino added a commit to Polyomino/executorch that referenced this pull request Sep 18, 2025
Summary:

### Key Changes Made

making num_threads const ensures there is no data race.

### Benefits of This Fix

*   **Eliminates the Data Race**: The tsan error should no longer occur because the threadpool initialization is now atomic
*   **Thread Safety**: Multiple threads can safely call `get_threadpool()` concurrently
*   **Maintains Compatibility**: All existing functionality is preserved

### Verification

*   ✅ Code compiles without errors
*   ✅ Buck build succeeds
*   ✅ No diagnostic issues
*   ✅ Maintains existing functionality

This fix should resolve the tsan failures encountered when running assistant integration tests under ThreadSanitizer. The data race that was occurring between threads T391 and T393 on the `num_threads` variable at address `0x000016aa6cf0` should now be eliminated.

Differential Revision: D82560656
@facebook-github-bot
Copy link
Contributor

@Polyomino has exported this pull request. If you are a Meta employee, you can view the originating diff in D82560656.

swolchok
swolchok previously approved these changes Sep 18, 2025
Copy link
Contributor

@swolchok swolchok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review automatically exported from Phabricator review in Meta.

@Polyomino Polyomino closed this Sep 18, 2025
@Polyomino Polyomino reopened this Sep 18, 2025
@pytorch-bot pytorch-bot bot dismissed stale reviews from kimishpatel, swolchok, and swolchok September 18, 2025 22:06

This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.

Polyomino added a commit to Polyomino/executorch that referenced this pull request Sep 18, 2025
Summary:

### Key Changes Made

making num_threads const ensures there is no data race.

### Benefits of This Fix

*   **Eliminates the Data Race**: The tsan error should no longer occur because the threadpool initialization is now atomic
*   **Thread Safety**: Multiple threads can safely call `get_threadpool()` concurrently
*   **Maintains Compatibility**: All existing functionality is preserved

### Verification

*   ✅ Code compiles without errors
*   ✅ Buck build succeeds
*   ✅ No diagnostic issues
*   ✅ Maintains existing functionality

This fix should resolve the tsan failures encountered when running assistant integration tests under ThreadSanitizer. The data race that was occurring between threads T391 and T393 on the `num_threads` variable at address `0x000016aa6cf0` should now be eliminated.

Reviewed By: swolchok

Differential Revision: D82560656
@facebook-github-bot
Copy link
Contributor

@Polyomino has exported this pull request. If you are a Meta employee, you can view the originating diff in D82560656.

Copy link
Contributor

@swolchok swolchok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review automatically exported from Phabricator review in Meta.

Polyomino added a commit to Polyomino/executorch that referenced this pull request Sep 19, 2025
Summary:

### Key Changes Made

making num_threads const ensures there is no data race.

### Benefits of This Fix

*   **Eliminates the Data Race**: The tsan error should no longer occur because the threadpool initialization is now atomic
*   **Thread Safety**: Multiple threads can safely call `get_threadpool()` concurrently
*   **Maintains Compatibility**: All existing functionality is preserved

### Verification

*   ✅ Code compiles without errors
*   ✅ Buck build succeeds
*   ✅ No diagnostic issues
*   ✅ Maintains existing functionality

This fix should resolve the tsan failures encountered when running assistant integration tests under ThreadSanitizer. The data race that was occurring between threads T391 and T393 on the `num_threads` variable at address `0x000016aa6cf0` should now be eliminated.

Reviewed By: swolchok

Differential Revision: D82560656
@facebook-github-bot
Copy link
Contributor

@Polyomino has exported this pull request. If you are a Meta employee, you can view the originating diff in D82560656.

Polyomino added a commit to Polyomino/executorch that referenced this pull request Sep 22, 2025
Summary:

### Key Changes Made

making num_threads const ensures there is no data race.

### Benefits of This Fix

*   **Eliminates the Data Race**: The tsan error should no longer occur because the threadpool initialization is now atomic
*   **Thread Safety**: Multiple threads can safely call `get_threadpool()` concurrently
*   **Maintains Compatibility**: All existing functionality is preserved

### Verification

*   ✅ Code compiles without errors
*   ✅ Buck build succeeds
*   ✅ No diagnostic issues
*   ✅ Maintains existing functionality

This fix should resolve the tsan failures encountered when running assistant integration tests under ThreadSanitizer. The data race that was occurring between threads T391 and T393 on the `num_threads` variable at address `0x000016aa6cf0` should now be eliminated.

Reviewed By: swolchok

Differential Revision: D82560656
@facebook-github-bot
Copy link
Contributor

@Polyomino has exported this pull request. If you are a Meta employee, you can view the originating diff in D82560656.

Polyomino added a commit to Polyomino/executorch that referenced this pull request Sep 22, 2025
Summary:

### Key Changes Made

making num_threads const ensures there is no data race.

### Benefits of This Fix

*   **Eliminates the Data Race**: The tsan error should no longer occur because the threadpool initialization is now atomic
*   **Thread Safety**: Multiple threads can safely call `get_threadpool()` concurrently
*   **Maintains Compatibility**: All existing functionality is preserved

### Verification

*   ✅ Code compiles without errors
*   ✅ Buck build succeeds
*   ✅ No diagnostic issues
*   ✅ Maintains existing functionality

This fix should resolve the tsan failures encountered when running assistant integration tests under ThreadSanitizer. The data race that was occurring between threads T391 and T393 on the `num_threads` variable at address `0x000016aa6cf0` should now be eliminated.

Reviewed By: swolchok

Differential Revision: D82560656
@facebook-github-bot
Copy link
Contributor

@Polyomino has exported this pull request. If you are a Meta employee, you can view the originating diff in D82560656.

@facebook-github-bot
Copy link
Contributor

@Polyomino has imported this pull request. If you are a Meta employee, you can view this in D82560656.

Polyomino added a commit to Polyomino/executorch that referenced this pull request Sep 24, 2025
Summary:
### Key Changes Made

making num_threads const ensures there is no data race.

### Benefits of This Fix

*   **Eliminates the Data Race**: The tsan error should no longer occur because the threadpool initialization is now atomic
*   **Thread Safety**: Multiple threads can safely call `get_threadpool()` concurrently
*   **Maintains Compatibility**: All existing functionality is preserved

### Verification

*   ✅ Code compiles without errors
*   ✅ Buck build succeeds
*   ✅ No diagnostic issues
*   ✅ Maintains existing functionality

This fix should resolve the tsan failures encountered when running assistant integration tests under ThreadSanitizer. The data race that was occurring between threads T391 and T393 on the `num_threads` variable at address `0x000016aa6cf0` should now be eliminated.


Reviewed By: swolchok

Differential Revision: D82560656

Pulled By: Polyomino
@facebook-github-bot
Copy link
Contributor

@Polyomino has exported this pull request. If you are a Meta employee, you can view the originating diff in D82560656.

Summary:
### Key Changes Made

making num_threads const ensures there is no data race.

### Benefits of This Fix

*   **Eliminates the Data Race**: The tsan error should no longer occur because the threadpool initialization is now atomic
*   **Thread Safety**: Multiple threads can safely call `get_threadpool()` concurrently
*   **Maintains Compatibility**: All existing functionality is preserved

### Verification

*   ✅ Code compiles without errors
*   ✅ Buck build succeeds
*   ✅ No diagnostic issues
*   ✅ Maintains existing functionality

This fix should resolve the tsan failures encountered when running assistant integration tests under ThreadSanitizer. The data race that was occurring between threads T391 and T393 on the `num_threads` variable at address `0x000016aa6cf0` should now be eliminated.


Reviewed By: swolchok

Differential Revision: D82560656

Pulled By: Polyomino
@facebook-github-bot
Copy link
Contributor

@Polyomino has exported this pull request. If you are a Meta employee, you can view the originating diff in D82560656.

@Polyomino Polyomino merged commit 50ab90a into pytorch:main Sep 25, 2025
121 of 132 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants