Skip to content

Conversation

@bharel
Copy link
Contributor

@bharel bharel commented Aug 25, 2024

The variable is now set and reset only within the lock.

Previously, it was a global variable that was reset outside of the lock and caused an access violation at line 265:

if (!config->legacy_windows_stdio && sys_stdin == stdin)

Happened with higher chance during thread closure. tstate at line 260 turned null, which caused the config to be null.

I'd appreciate help with the tests, I chose to test it on PDB but it's a difficult catch.
I also do not have a Windows machine readily available which is required to trigger.

@bharel
Copy link
Contributor Author

bharel commented Aug 25, 2024

By the way, I believe there's an issue with the pdb tests, I'm not sure if the doctests actually run or not.

"""

if not SKIP_ASYNCIO_TESTS:
def test_pdb_issue_gh_123321():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test is failing in the free-threaded build, probably because the initialization of the _PyOS_ReadlineLock lock isn't thread-safe without the GIL (and a few other related issues).

That shouldn't be too hard to fix, but we should do it after this PR has landed because the bug you are fixing exists in 3.12, and the free-threading fixes should only go back to 3.13.

In the meantime, you can add a @support.requires_gil_enabled() decorator to skip this test in the free-threaded build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh thanks, I'll do that.
I'm changing the tests now and moving them to the readline or builtins module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was fixed when I took the lock inside the GIL.

I was worried about a potential deadlock, but I don't see any other uses for the lock except this function so it's fine.

Copy link
Contributor Author

@bharel bharel Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it does deadlock:

First thread:

GIL.acq() -> lock.acq() -> GIL.rls() -> |GIL.acq()| -> lock.rls()

Second thread:
GIL.acq() -> |lock.acq()| -> GIL.rls() -> GIL.acq() -> lock.rls()

Taking the lock before releasing the GIL will require releasing the lock without the GIL, which is very dangerous if the internal function acquires the GIL in any way. i.e. this will work, but dangerous if GIL is taken again after the release:

GIL.acq() -> lock.acq() -> GIL.rls() -> |dangerous if GIL taken here| -> lock.rls() -> GIL.acq()

I will have to take the lock outside the GIL and we need to ensure that taking the lock is threadsafe outside the GIL.


PyThread_acquire_lock(_PyOS_ReadlineLock, 1);
Py_BEGIN_ALLOW_THREADS
_PyOS_ReadlineTState = tstate;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe set _PyOS_ReadlineTState while you are still holding the GIL, before Py_BEGIN_ALLOW_THREADS. It should not make a big difference, since other threads should be blocked at PyThread_acquire_lock(_PyOS_ReadlineLock, 1) :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alrighty :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we have to release the lock outside the GIL and preferably take the lock too, otherwise it deadlocks.

It fails on the free-threaded as according to @colesbury the taking of the lock isn't safe outside the GIL.

I'm a little confused as theoretically if it isn't safe it should fail on the normal build too but I'm not entirely sure how that works.

@bharel bharel marked this pull request as ready for review August 30, 2024 20:31
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

If I understood @colesbury correctly, the @requires_gil_enabled() decorator avoids a test failure on Free Threaded build for now, and so the fix is good enough for "Python with GIL (enabled)".

@bharel
Copy link
Contributor Author

bharel commented Aug 30, 2024

LGTM.

If I understood @colesbury correctly, the @requires_gil_enabled() decorator avoids a test failure on Free Threaded build for now, and so the fix is good enough for "Python with GIL (enabled)".

So here's the thing - if it requires the GIL in order to take the lock then it shouldn't work in the normal build after ALLOW_THREADS. If it doesn't require the GIL and it is threadsafe (atomic / other lock) then it should work on both builds.

I fear that it isn't safe outside the GIL and then we create a new issue that we aren't aware of. If it is then the Free Threaded should work too.

Either way, how would one solve it in the free threaded build? Shall I open a new issue for that or will there be a large undertaking to take care of everything containing that decorator at a later time?

Py_END_ALLOW_THREADS

// gh-123321: Must set the variable and then release the lock before
// taking the GIL. Otherwise a deadlock or segfault may occur.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can a deadlock occur due to this variable not being NULL?

Copy link
Contributor Author

@bharel bharel Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Release lock then take GIL.
If you take GIL and then release lock there's a deadlock.

If you set the variable before releasing the lock you segfault.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah the link you posted is exactly what I wanted to know. Thanks!

@colesbury
Copy link
Contributor

If I understood colesbury correctly, the @requires_gil_enabled() decorator avoids a test failure on Free Threaded build for now, and so the fix is good enough for "Python with GIL (enabled)".

Yeah, the fix for the free-threaded build will require some code that's not available in 3.12. So my thinking is that we fix it first of for the default build like this and backport it all the way to 3.12, and then I can put up a PR that fixes it for the free-threaded build (and removes the skip annotation). That one won't get backported to 3.12.

@vstinner
Copy link
Member

vstinner commented Sep 4, 2024

@colesbury @pablogsal: Are you ok with this change?

@vstinner vstinner merged commit a4562fe into python:main Sep 4, 2024
@vstinner vstinner added needs backport to 3.12 only security fixes needs backport to 3.13 bugs and security fixes labels Sep 4, 2024
@miss-islington-app
Copy link

Thanks @bharel for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Thanks @bharel for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 4, 2024
… a multi-threaded race (pythonGH-123323)

(cherry picked from commit a4562fe)

Co-authored-by: Bar Harel <bharel@barharel.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 4, 2024
… a multi-threaded race (pythonGH-123323)

(cherry picked from commit a4562fe)

Co-authored-by: Bar Harel <bharel@barharel.com>
@bedevere-app
Copy link

bedevere-app bot commented Sep 4, 2024

GH-123676 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Sep 4, 2024
@bedevere-app
Copy link

bedevere-app bot commented Sep 4, 2024

GH-123677 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 only security fixes label Sep 4, 2024
@bharel
Copy link
Contributor Author

bharel commented Sep 4, 2024

Sweet. I'd appreciate a mention and a link to this issue in the free threaded build.

Thanks everyone for the quick review and suggestions, amazing work 🥂

And to @amit-w for helping me debug it ❤️

pablogsal pushed a commit that referenced this pull request Sep 4, 2024
…g a multi-threaded race (GH-123323) (#123676)

gh-123321: Fix Parser/myreadline.c to prevent a segfault during a multi-threaded race (GH-123323)
(cherry picked from commit a4562fe)

Co-authored-by: Bar Harel <bharel@barharel.com>
@bharel bharel deleted the gh-123321 branch September 5, 2024 02:01
vstinner pushed a commit that referenced this pull request Sep 5, 2024
…g a multi-threaded race (GH-123323) (#123677)

* gh-123321: Fix Parser/myreadline.c to prevent a segfault during a multi-threaded race (GH-123323)
(cherry picked from commit a4562fe)

Co-authored-by: Bar Harel <bharel@barharel.com>

* Remove @requires_gil_enabled for 3.12

---------

Co-authored-by: Bar Harel <bharel@barharel.com>
Co-authored-by: Sam Gross <colesbury@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants