Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release GVL around pw and gr function calls in Etc #40

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

jeremyevans
Copy link
Contributor

These can block as they parse files.

I'm not sure if other function calls in Etc can block, so this doesn't add GVL releasing to them.

Releasing the GVL for these methods results in thread-safety issues, since thread-safety for the methods relied previously on the GVL. Add a mutex and surround GVL-releasing calls with a mutex.

This appears to fail on TruffleRuby due to a TruffleRuby bug.

These can block as they parse files.

I'm not sure if other function calls in Etc can block, so this
doesn't add GVL releasing to them.

Releasing the GVL for these methods results in thread-safety
issues, since thread-safety for the methods relied previously
on the GVL.  Add a mutex and surround GVL-releasing calls
with a mutex.
@jeremyevans jeremyevans requested review from nobu and eregon July 19, 2024 18:34
ext/etc/etc.c Outdated Show resolved Hide resolved
Co-authored-by: Olle Jonsson <olle.jonsson@gmail.com>
@nobu
Copy link
Member

nobu commented Jul 21, 2024

The man page of macOS says:

All of these routines are thread-safe. The getpwent(), getpwnam(), and getpwuid() routines return a pointer to a result managed by the system library in a thread-specific data structure.

@jeremyevans
Copy link
Contributor Author

I'm not sure they are thread safe on all operating systems. OpenBSD doesn't appear to use a thread-safe data structure. The Linux man pages don't mention thread safety.

I'm not sure we have thread-safety tests/specs for getpwnam and such, but there is one for getgrgid, and it segfaulted consistently on Linux before I added the mutex synchronization:
https://github.com/jeremyevans/ruby/actions/runs/10000243457

MacOS getgrgid man page has similar thread-safety language. Would you like like mutex synchronization turned off for MacOS?

ext/etc/etc.c Outdated Show resolved Hide resolved
@eregon
Copy link
Member

eregon commented Jul 22, 2024

The TruffleRuby issue is oracle/truffleruby#3624 it would be nice to wait for the fix before merging this, so at least it passes on truffleruby-head .
The CI matrix should also be made fail-fast: false.

@eregon
Copy link
Member

eregon commented Jul 22, 2024

As an aside, the signature of rb_mutex_synchronize in Ruby headers is quite wrong:

VALUE rb_mutex_synchronize(VALUE mutex, VALUE (*func)(VALUE arg), VALUE arg);

but it should be:

void* rb_mutex_synchronize(VALUE mutex, void* (*func)(void* arg), void* arg);

Could we fix that in Ruby headers?

There were significant thread-safety issues with releasing the GVL
on platforms where the underlying functions are not thread-safe.
However, I was only able to trigger the issues by manually adding
rb_thread_schedule calls after the mutexes were released.  This
fixes the thread safety issues so that the mutex is held while
creating the related Ruby objects to be returned.

I've added thread-safety tests for get{pwuid,pwnam,grgid,grnam}.
I've also added support for -DTEST_THREAD_SAFETY compile flag,
which will call rb_thread_schedule after all mutex releases.
The thread-safety tests pass with this compile flag set.
@jeremyevans
Copy link
Contributor Author

I found the GVL releasing caused significant thread-safety issues on platforms where the underlying functions are not thread-safe. However, I was only able to trigger the issues by manually adding rb_thread_schedule calls after the mutexes were released. I pushed a commit to fix the thread-safety issues by holding the mutex while creating the related Ruby objects to be returned. I also added thread-safety test, and support for a -DTEST_THREAD_SAFETY cflag to deliberately trying to expose thread-safety issues using rb_thread_schedule. All added thread-safety tests pass with -DTEST_THREAD_SAFETY.

@jeremyevans
Copy link
Contributor Author

As an aside, the signature of rb_mutex_synchronize in Ruby headers is quite wrong:

VALUE rb_mutex_synchronize(VALUE mutex, VALUE (*func)(VALUE arg), VALUE arg);

but it should be:

void* rb_mutex_synchronize(VALUE mutex, void* (*func)(void* arg), void* arg);

Could we fix that in Ruby headers?

I think this is likely to introduce int to pointer conversion compiler warnings. Maybe we could add a new function for the void * version?

@eregon
Copy link
Member

eregon commented Jul 23, 2024

oracle/truffleruby#3624 is fixed, so once https://github.com/ruby/truffleruby-dev-builder/actions/runs/10057958196 is done it should pass on truffleruby-head.

I just thought of an alternative to rb_mutex_synchronize: to use the rb_nativethread_lock* APIs of https://github.com/ruby/ruby/blob/57b11be15ae518288b719fb36068ceb23da6e050/include/ruby/thread_native.h#L88.
This feels a bit nicer than using a Ruby Mutex around a call releasing the GVL, and avoids running into oracle/truffleruby#3624 for existing TruffleRuby releases.
It also removes the need for all these casts.

@andrykonchin
Copy link
Member

The fix in TruffleRuby is merged and should be available in the next dev build (I suppose tomorrow?) cc @eregon

@jeremyevans
Copy link
Contributor Author

I just thought of an alternative to rb_mutex_synchronize: to use the rb_nativethread_lock* APIs of https://github.com/ruby/ruby/blob/57b11be15ae518288b719fb36068ceb23da6e050/include/ruby/thread_native.h#L88. This feels a bit nicer than using a Ruby Mutex around a call releasing the GVL, and avoids running into oracle/truffleruby#3624 for existing TruffleRuby releases. It also removes the need for all these casts.

I tried this approach. You need to wrap rb_nativethread_lock_lock in rb_thread_call_without_gvl to avoid deadlock, and you still need some casts. I pushed a commit for it, but I'm not sure whether it is nicer. It does appear to pass on TruffleRuby, though.

eregon
eregon previously approved these changes Jul 25, 2024
Copy link
Member

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach looks good to me but it would be good to have another review

The approach of releasing the GVL and using a mutex around
get{pwuid,pwnam,pwent,grgid,grnam,grent} is only safe if the only
calls to those functions are inside Etc.  However, core Ruby can
call the functions in process.c and file.c, and concurrent calls
to get{pwuid,pwnam,pwent,grgid,grnam,grent} could result in
segfaults if the GVL is released.

If get{pwuid,pwnam,grgid,grnam}_r functions are supported, use
them.  If they are not supported, continue to call
get{pwuid,pwnam,grgid,grnam}, but do not release the GVL, as
doing so is not safe.

This still releases the GVL for {end,set}{pw,gr}ent function
calls, since those return void and should not cause problems.

The get{pwuid,pwnam,grgid,grnam}_r function call logic was
taken from Ruby's process.c.

Remove use of native mutexes as they are no longer needed.
@jeremyevans
Copy link
Contributor Author

After some more thought, I came to the determination that the approach of releasing the GVL and using a mutex is not safe. It works fine if all calls to get{pwuid,pwnam,grgid,grnam} originate in etc.c. However, Ruby core in process.c and file.c can call the functions, and concurrent calls could result in segfaults.

I've pushed a commit to drop the mutex usage, retain the GVL for get{pwuid,pwnam,grgid,grnam} calls, and add support for calling get{pwuid,pwnam,grgid,grnam}_r instead, using the implementations from process.c (specifically, the implementations in ruby/ruby#11202, which adds GVL releasing to these function calls).

@eregon eregon self-requested a review July 25, 2024 21:25
@eregon eregon dismissed their stale review July 25, 2024 21:25

big code changes

Copy link
Member

@nobu nobu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The others look fine.

pwd = getpwuid(uid);

# ifdef HAVE_GETPWUID_R
char *bufid;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/ruby/etc/actions/runs/10101599428/job/27935519316#step:5:40

../../../../ext/etc/etc.c:325:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  325 |     char *bufid;
      |     ^~~~

Still the required ruby version is 2.6, we can't assume C99.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants