Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ruby 3.2 + 3.3+ #17

Closed
wants to merge 2 commits into from
Closed

Conversation

jpcamara
Copy link
Contributor

@jpcamara jpcamara commented Dec 28, 2023

This is still WIP, but I wanted to open it up so the general approach is clear. Main oddities for me right now:

  • Having to free the data at the end of tracing. I guess that's ok? It was the only way I could figure out how to get the new rb_internal_thread_specific_* api working properly without a memory leak
  • Name is not going to work correctly in ruby 3.3+ as this code is right now, since the events don't guarantee to fire from their associated thread. Ideally we'd get the name from the thread, but that is kind of tricky since ruby does not expose a method for it, and Thread.list doesn't contain any threads that have exited at the end of the tracing
  • I know object_id is considered inefficient (Reduce usage of object_id in core gems ruby/ruby#9276), but the overhead here seems minimal enough? I like it better as an identifier than a serial id and it lets MN threads work the same as native threads

Commit notes

  • There are lots of changes to how Ruby 3.3+ works with thread instrumentation.

  • There is no guarantee which thread will trigger hooks. For instance, in Ruby 3.3 the RUBY_INTERNAL_THREAD_EVENT_STARTED event is always called from a different thread

  • As a result, you have to use the event_data->thread member in conjunction with rb_internal_thread_specific_get/set to emulate "thread local" data tracking. You can't use actual thread locals anymore because the hooks are not fired on the same thread

  • As a result, we can't reliably retrieve the native thread id because ruby does not expose a way to do this and using system calls won't work on different threads. Now uses object_id in ruby 3.3+

  • These changes also work better with the M:N thread scheduler, since even if we could access native thread ids the M:N scheduler mostly reuses threads so the associated thread ids would be wrong

  • Because the events can fire from different threads, the current way of getting the thread name (pthread_getname_np), will probably be wrong

  • Using rb_thread_current() in the instrumentation hook calls causes a segfault, but it seems to work fine in the GC methods and when starting/stopping tracing

  • Because we malloc the thread data in ruby 3.3, we free it up when we stop tracing, or whenever a thread exits

* There are lots of changes to how Ruby 3.3+ works with thread instrumentation.

* There is no guarantee which thread will trigger hooks. For instance, in Ruby 3.3 the RUBY_INTERNAL_THREAD_EVENT_STARTED event is always called from a different thread

* As a result, you have to use the `event_data->thread` member in conjunction with rb_internal_thread_specific_get/set to emulate "thread local" data tracking. You can't use actual thread locals anymore because the hooks are not fired on the same thread

* As a result, we can't reliably retrieve the native thread id because ruby does not expose a way to do this and using system calls won't work on different threads. Now uses object_id in ruby 3.3+

* These changes also work better with the M:N thread scheduler, since even if we could access native thread ids the M:N scheduler mostly reuses threads so the associated thread ids would be wrong

* Because the events can fire from different threads, the current way of getting the thread name (pthread_getname_np), will probably be wrong

* Using rb_thread_current() in the instrumentation hook calls causes a segfault, but it seems to work fine in the GC methods and when starting/stopping tracing

* Because we malloc the thread data in ruby 3.3, we free it up when we stop tracing, or whenever a thread exits
@jpcamara
Copy link
Contributor Author

jpcamara commented Dec 28, 2023

Just noticed a mistake I'll fix around the thread access in Ruby 3.3, will push that up later today

Windows is still broken which I thought was turned off since it would never work but I'll check it another time

@jpcamara
Copy link
Contributor Author

Any chance you could weigh in on the approach @byroot?

In terms of the need to manually free, I've started trying to use TypedData_Make_Struct, but don't have it fully working yet and I'm not sure it's the right approach since that seems more meant for externally exposed structs?

Also not sure the best approach for getting thread names to display. Most of my testing involves the threads exiting before the block can pull them from Thread.list so the names are always either wrong or empty

thread_meta = (thread_meta_t*)malloc(sizeof(thread_meta_t));
memset(thread_meta, 0, sizeof(thread_meta_t));
thread_meta->thread = thread;
thread_meta->thread_object_id = RB_NUM2INT(rb_obj_id(thread));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could Thread#native_thread_id work here?

Although there's a stipulation that might make it unusable here:

Note

If the thread is not associated yet or already deassociated with a native thread, it returns nil. If the Ruby implementation uses M:N thread model, the ID may change depending on the timing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input @amomchilov !

There are a couple obstacles, unfortunately.

First, as far as I can tell there is no way to safely access the native_thread_id method from C without the GVL. The thread api exposed to C extension code is pretty limited, that is safe to run without the GVL. With the way this code works it is always updating the file directly during the hook events, which have no GVL guarantees. Part of me is hoping you'll tell me i'm wrong here and there is a way to invoke thread methods safely in this context :D

Second is like you mentioned - ideally we'd solve M:N support here as well. The M:N model is actually even more strict than that note, it always returns nil in the current implementation.

Copy link

@amomchilov amomchilov Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! I’m just squinting and doing loose pattern matching here. I have no expertise in this domain :)

There aren’t all that many threads. Touching their object IDs would be totally fine.

Copy link
Owner

@ivoanjo ivoanjo Jan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since for M:N threads the notion of a native thread id no longer exists in a lot of cases, I think it's a good idea to use the object id instead.

Having said that, I expect that for a while most folks will not be using M:N threads for Ruby 3.3 (e.g. most people won't be using ractors + won't be using M:N scheduling for the main ractor).

It would be nice to still use the native thread for such cases, methinks!

Having said all that, I realize there's already quite a few moving parts for this PR so I'm perfectly fine with starting with object id for all on 3.3 and punt on doing a fancier thing ;)

Copy link

@casperisfine casperisfine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed too deeply, but the changes overall make sense to me.

For a sec I was worried the state allocation in the callback may be racy, but actually it's fine because a given thread can't process two events concurrently, so seems OK to me.

Comment on lines +118 to +119
thread_meta = (thread_meta_t*)malloc(sizeof(thread_meta_t));
memset(thread_meta, 0, sizeof(thread_meta_t));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
thread_meta = (thread_meta_t*)malloc(sizeof(thread_meta_t));
memset(thread_meta, 0, sizeof(thread_meta_t));
thread_meta = ZALLOC(thread_meta_t);

#elif HAVE_GETTID
native_thread_id = gettid();
#else
native_thread_id = thread_local_meta->current_thread_serial; // TODO: Better fallback for Windows?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you know, but the GVL API is a noop on Windows

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, I didn't know about that!

In that case, maybe go with removing current_thread_serial and adding an ifdef to make the gem become a no-op on windows may be the way to go.

(I think it's still nice if the gem builds/installs, so that folks don't need to have conditions on their Gemfiles)

@jpcamara
Copy link
Contributor Author

I haven't reviewed too deeply, but the changes overall make sense to me.

For a sec I was worried the state allocation in the callback may be racy, but actually it's fine because a given thread can't process two events concurrently, so seems OK to me.

@casperisfine Thanks for the feedback! I'll take a look at GVL tracing and see if I can clean this up a bit based on that.

Do you have any thoughts on a way to track thread names that die before we can call Thread.list at the end?

// Used to mark function arguments that are deliberately left unused
#ifdef __GNUC__
#define UNUSED_ARG __attribute__((unused))
#else
#define UNUSED_ARG
#endif

typedef struct {
VALUE thread;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed this. If you store a VALUE reference in that struct, you MUST mark it. Otherwise it can be GCed and the object slot re-used, or GC compaction could move that object in another slot.

But if you do that, it also solves your question about how to get the thread name.

Copy link
Contributor Author

@jpcamara jpcamara Feb 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed this. If you store a VALUE reference in that struct, you MUST mark it. Otherwise it can be GCed and the object slot re-used, or GC compaction could move that object in another slot.

@byroot Thanks for the feedback here! Can you provide an example of how to do this in the context of this code? I've tried piecing it together hopping between gvltools and cruby, but hours in and I haven't gotten anything working. I can keep chipping away, but tbh the appropriate ways of when and where to mark in cruby are still pretty opaque to me.

Same goes for things like TypedData_Make_Struct. I thought that might be a good approach because of the free/mark options it has, but I landed on this current version because I just couldn't get that working.

But if you do that, it also solves your question about how to get the thread name.

I think this lines up with @ivoanjo's comment:

(E.g. keep it in a C array, that we need to mark but otherwise can operate on without the GVL)

So i'm assuming a combo of mark + a large-ish C array (maybe 1024 like the CRuby instrumentation tests?) would suffice to hold onto the metadata I need so I can get the thread names once the GvlTracing block finishes.

I am curious - for something like vernier, which presumably would be used to wrap long running ruby processes, how would you expect it to efficiently keep track of thread names? For gvl-tracing, you're not terribly likely to be running 1k+ threads. But for something like vernier there really are no limits to what the code running inside of it might do.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpcamara Not sure if you know this guide -- https://silverhammermba.github.io/emberb/c/#data -- it has a pretty good guide of how to setup a TypedData object, including the marking.

Currently, gvl-tracing is all class methods ("static"), but I guess to be able to mark we'd need to introduce a TypedData object, so maybe something like a GvlTracing::State object, which would contain inside it the threads array, that would then get marked.

I am curious - for something like vernier, which presumably would be used to wrap long running ruby processes, how would you expect it to efficiently keep track of thread names? For gvl-tracing, you're not terribly likely to be running 1k+ threads. But for something like vernier there really are no limits to what the code running inside of it might do.

This is a good point. I haven't looked at how Vernier does it but we actually don't need to necessarily do it at the end of the tracing. We could check from time to time if a name is available (e.g. to avoid doing it on every event), and then render it and move on, rather than waiting until the end.

The "wait until the end" strategy was just a bit of a simplification ;)

Copy link
Owner

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey!

First of all, big thanks for working on this! And for the patience with my sloooow review haha >_>

Having to free the data at the end of tracing. I guess that's ok? It was the only way I could figure out how to get the new rb_internal_thread_specific_* api working properly without a memory leak

This seems very fine! In fact, it's better than the previous design, where the data would "live forever" as part of the native thread's thread-local stuff.

Name is not going to work correctly in ruby 3.3+ as this code is right now, since the events don't guarantee to fire from their associated thread. Ideally we'd get the name from the thread, but that is kind of tricky since ruby does not expose a method for it, and Thread.list doesn't contain any threads that have exited at the end of the tracing

Yeah... This is a bit annoying. Maybe something similar to @casperisfine suggested would work: we could keep references to all the threads we've seen (even those that die) and then in stop, we'd use that instead of Thread.list to iterate and get names from.

(E.g. keep it in a C array, that we need to mark but otherwise can operate on without the GVL)

I know object_id is considered inefficient (Reduce usage of object_id in core gems ruby/ruby#9276), but the overhead here seems minimal enough? I like it better as an identifier than a serial id and it lets MN threads work the same as native threads

I think if you create enough threads for object_id calls to add any measurable overhead, you're probably already holding it wrong 🤣 .

There is no guarantee which thread will trigger hooks. For instance, in Ruby 3.3 the RUBY_INTERNAL_THREAD_EVENT_STARTED event is always called from a different thread

I'm curious -- is that the case even when not using M:N scheduling on the main ractor? What thread does it get called from? 🤔

Comment on lines +112 to +113
void *data = rb_internal_thread_specific_get(thread, gvl_tracing_key);
thread_meta_t* thread_meta = NULL;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Assigning directly to thread_meta simplifies a few things below I think :)

Suggested change
void *data = rb_internal_thread_specific_get(thread, gvl_tracing_key);
thread_meta_t* thread_meta = NULL;
thread_meta_t* data = (thread_meta_t *) rb_internal_thread_specific_get(thread, gvl_tracing_key);

thread_meta = (thread_meta_t*)malloc(sizeof(thread_meta_t));
memset(thread_meta, 0, sizeof(thread_meta_t));
thread_meta->thread = thread;
thread_meta->thread_object_id = RB_NUM2INT(rb_obj_id(thread));
Copy link
Owner

@ivoanjo ivoanjo Jan 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since for M:N threads the notion of a native thread id no longer exists in a lot of cases, I think it's a good idea to use the object id instead.

Having said that, I expect that for a while most folks will not be using M:N threads for Ruby 3.3 (e.g. most people won't be using ractors + won't be using M:N scheduling for the main ractor).

It would be nice to still use the native thread for such cases, methinks!

Having said all that, I realize there's already quite a few moving parts for this PR so I'm perfectly fine with starting with object id for all on 3.3 and punt on doing a fancier thing ;)

Comment on lines +120 to +121
thread_meta->thread = thread;
thread_meta->thread_object_id = RB_NUM2INT(rb_obj_id(thread));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking there's a few places where we're initializing the thread-local structure. What do you think of adding a function to initialize it that can be used throughout?

Comment on lines +128 to +131
static inline void initialize_thread_id(thread_meta_t *thread_meta) {
thread_meta->current_thread_seen = true;
thread_meta->thread_object_id = RB_NUM2INT(rb_obj_id(thread_meta->thread));
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems somewhat redundant/duplicate from current_thread_meta above?

#elif HAVE_GETTID
native_thread_id = gettid();
#else
native_thread_id = thread_local_meta->current_thread_serial; // TODO: Better fallback for Windows?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, I didn't know about that!

In that case, maybe go with removing current_thread_serial and adding an ifdef to make the gem become a no-op on windows may be the way to go.

(I think it's still nice if the gem builds/installs, so that folks don't need to have conditions on their Gemfiles)

Comment on lines +143 to +145
static thread_meta_t* current_thread_meta(VALUE thread) {
return &thread_local_meta;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this approach!

@@ -49,6 +49,10 @@ def start(file)

def stop
thread_list = Thread.list
thread_list.each { |t|
GvlTracing._clean(t)
thread_label(t)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is calling thread_label needed here? 🤔

@byroot
Copy link
Contributor

byroot commented Jan 20, 2024

I'm curious -- is that the case even when not using M:N scheduling on the main ractor? What thread does it get called from? 🤔

Yes, it's fired from the parent thread. That simplified the implementation quite a bit.

@jpcamara
Copy link
Contributor Author

Hey!

First of all, big thanks for working on this! And for the patience with my sloooow review haha >_>

Having to free the data at the end of tracing. I guess that's ok? It was the only way I could figure out how to get the new rb_internal_thread_specific_* api working properly without a memory leak

This seems very fine! In fact, it's better than the previous design, where the data would "live forever" as part of the native thread's thread-local stuff.

Name is not going to work correctly in ruby 3.3+ as this code is right now, since the events don't guarantee to fire from their associated thread. Ideally we'd get the name from the thread, but that is kind of tricky since ruby does not expose a method for it, and Thread.list doesn't contain any threads that have exited at the end of the tracing

Yeah... This is a bit annoying. Maybe something similar to @casperisfine suggested would work: we could keep references to all the threads we've seen (even those that die) and then in stop, we'd use that instead of Thread.list to iterate and get names from.

(E.g. keep it in a C array, that we need to mark but otherwise can operate on without the GVL)

I know object_id is considered inefficient (Reduce usage of object_id in core gems ruby/ruby#9276), but the overhead here seems minimal enough? I like it better as an identifier than a serial id and it lets MN threads work the same as native threads

I think if you create enough threads for object_id calls to add any measurable overhead, you're probably already holding it wrong 🤣 .

There is no guarantee which thread will trigger hooks. For instance, in Ruby 3.3 the RUBY_INTERNAL_THREAD_EVENT_STARTED event is always called from a different thread

I'm curious -- is that the case even when not using M:N scheduling on the main ractor? What thread does it get called from? 🤔

Thanks for the feedback! I'll make some changes in the next few days so we can get this buttoned up 🎉

@jpcamara
Copy link
Contributor Author

Closing in favor of #18

@jpcamara jpcamara closed this Feb 19, 2024
ivoanjo added a commit that referenced this pull request Feb 21, 2024
Fix #17 
Paired with @byroot 

This PR adds support for Ruby 3.3 by using the new
`rb_internal_thread_specific_get` which allows accessing data specific
to a Ruby thread without requiring the GVL to be locked. It also handles
N:M threads automatically.

More info here:
ruby/ruby@352a885

In Ruby 3.3, the thread events don't necessarily run from the thread
they correspond to, so the thread corresponding to the event is sent in
`event_data`, and now `GT_LOCAL_STATE` needs to take the thread too.

To handle GC, the thread specific data is wrapped into a TypedData
struct and stored in the Ruby thread as a instance variable. This
requires getting the GVL, so we have an `allocate` flag on
`GT_LOCAL_STATE` too, which is false when we're not running from the
thread and can't allocate. The hook only runs from its corresponding
thread for the `RUBY_INTERNAL_THREAD_EVENT_STARTED` and
`RUBY_INTERNAL_THREAD_EVENT_RESUMED` events, so we can allocate there.

If the local state hasn't been initialized yet, we just return early.
This means we would skipping events from already existing threads that
haven't been resumed yet. To avoid this, we initialize all threads when
tracing starts. We do this from Ruby, which ensures we have the GVL. We
added a spec for that, which passes on main on 3.2, but not on 3.3.

To support N:M threads (and differentiate events from different threads
but the same native thread), instead of returning the native thread id,
we know return the address of the Ruby thread in `tid` (on Ruby 3.3).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants