fix segfault issue #283 #284

Merged
Trinkle23897 merged 2 commits into main from fix_segfault
Oct 26, 2023

mavenlin
Member

@mavenlin mavenlin commented Oct 26, 2023

Description

Describe your changes in detail.

Motivation and Context

Let's accept #282 before this one.

This closes #283

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • New environment (non-breaking change which adds 3rd-party environment)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of example)

Implemented Tasks

The XlaSend call requires envpool to copy the action, so that the XLA runtime cannot recycle the action buffer before envpool has finished using it. Originally, I used cudaMemcpy to ensure the copy completed synchronously. However, that appears to cause the problem reported in issue #283.

Here I replace the original cudaMemcpy call with its async version, cudaMemcpyAsync, followed by an explicit cudaStreamSynchronize.

It is not clear how a cudaMemcpy on the default stream inside a custom call interacts with the stream managed by PJRT. However, from the code here, I hypothesize that an explicit stream synchronization in the custom call is safe.

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the code using make lint (required)
  • I have ensured make bazel-test pass. (required)

@mavenlin mavenlin mentioned this pull request Oct 26, 2023
3 tasks
@Trinkle23897 Trinkle23897 merged commit a1249e0 into main Oct 26, 2023
2 of 4 checks passed
@Trinkle23897 Trinkle23897 deleted the fix_segfault branch October 26, 2023 16:50
@ethanluoyc
Contributor

A new release maybe?

Development

Successfully merging this pull request may close these issues.

[BUG] XLA Segmentation Fault
3 participants