-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for 3.3 and N:M threads #18
Conversation
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
FYI @jpcamara |
We're not cleaning up the thread specific data. From the Ruby commit:
This only applies if the thread hasn't been GC'd yet. When we stop tracing, we should clean up behind ourselves. Another thing is that the thread name handling is still using the native thread id: https://github.com/etiennebarrie/gvl-tracing/blob/7b484b979f4322bf5057e9edda2bd2163497f510/lib/gvl-tracing.rb#L75 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see a TypedData approach here vs my PRs approach of a more standard c malloc/free🏅. I've left some comments in the code. Some are questions, some are issues to address. Thanks!
lib/gvl-tracing.rb
Outdated
@@ -35,6 +35,9 @@ class << self | |||
private :_stop | |||
|
|||
def start(file) | |||
Thread.list.each do |thread| | |||
_init_local_storage(thread) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever 👌🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really minor: It may be worth running this loop from native code to be sure we catch any threads (e.g. to make sure Ruby doesn't switch away and creates threads in the middle of this loop).
(Although I guess it'd be pretty rare in practice?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose passing a list of threads would make sense, as to reduce the potential for a race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a commit passing the array returned from Thread.list
and iterating over the threads instead, but I don't know what I'm doing 😬 . I've looked at the documentation and they seem to discourage using RARRAY_PTR
so I tried RARRAY_PTR_USE
, it seems to work.
char native_thread_name_buffer[64] = "(unnamed)"; | ||
|
||
#ifdef HAVE_PTHREAD_GETNAME_NP | ||
pthread_getname_np(pthread_self(), native_thread_name_buffer, sizeof(native_thread_name_buffer)); | ||
#endif | ||
|
||
#ifdef HAVE_RB_INTERNAL_THREAD_SPECIFIC_GET | ||
uint64_t thread_id = (uint64_t)state->thread; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this is using the VALUE
and the finalizer code uses native id, there is a mismatch and the names are not applied properly. RB_NUM2INT(rb_obj_id(thread));
and later thread->object_id
seem like the most reliable option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we haven't concerned ourselves much with names in the resulting JSON. I'd rather go baby step to make the things easier to review. Having the valid timeline is more important than having the name, so we did this first.
Having the actual Ruby thread in there should make it easy to have the correct names later. Anyway the whole native_thread_id
thing should be ignored because of MaNy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally I'd still want to have the native thread id when possible (e.g. when not using MaNy on the main ractor) as I think it's a useful info as you can use it to correlate with other native-level tools but totally fine to ignore that and simplify for now :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could collect it even with MaNy, it just it can't be the thread identifier anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivoanjo I completely forgot - I tried using VALUE
initially, based on your recommendation elsewhere @byroot. I switched to object_id
because VALUE
is too large - it is invalid in the perfetto ui. If I try importing the file this PR generates, none of the threads show up. Annoyingly, tid
is defined as an int32
and so perfetto completely ignores/rejects every thread written using VALUE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess perhaps we could go back to the sequential thread id to make perfetto happy, and then we could have all relevant ids as part of the thread name string maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've hacked something where I xor the 32 higher bits with the lower 32 ones of the VALUE, that way tid
is 32 bits (which I assume is what's expected by Perfetto UI since TIDs are PIDs and pid_t
is 32 bits).
Now the threads are visible, even if they've lost their name in the thread_name_with_extension example.
VALUE wrapper = TypedData_Make_Struct(rb_cObject, thread_local_state, &thread_local_state_type, state); | ||
rb_thread_local_aset(thread, rb_intern("__gvl_tracing_local_state"), wrapper); | ||
rb_internal_thread_specific_set(thread, thread_storage_key, state); | ||
RB_GC_GUARD(wrapper); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was on the right path when trying to use TypedData_Make_Struct
but 100% would not have landed here 😆. A few questions that would awesome to get answers to 🙏🏼:
- Am I right to say that using
TypedData_Make_Struct
is mostly for the side effects of it? Meaning, it's utilized to be able to mark the thread, and to getfree
'd by the runtime automatically? - What's the reason the
rb_thread_local_aset
call is needed here? Is it so something actually has a reference to the wrapper so it will get garbage collected when the thread does? - I saw
RB_GC_GUARD
get used ingvltools
as well. Is that required because ofRUBY_TYPED_WB_PROTECTED
? I've seen a couple examples ofTypedData_Make_Struct
in other gems and i've never seenRB_GC_GUARD
orRUBY_TYPED_WB_PROTECTED
used in those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's utilized to be able to mark the thread, and to get free'd by the runtime automatically?
Yes. Absolutely.
Is it so something actually has a reference to the wrapper so it will get garbage collected when the thread does?
Exactly. It ties that object lifetime with the thread lifetime, so they'll be GCed together.
I saw RB_GC_GUARD
It's a bit more complicated than that.
Ruby GC is "conservative". This means every GC cycle will scan the C stack, and every value on the stack that look like a reference to a Ruby object will be marked.
Here we have VALUE wrapper
so if GC trigger in the middle of that function, the reference should be on the stack and the object not GCed. But C compilers have a tendency to optimize some stuff away and sometime this cause references not to be on the stack which cause GC bugs.
RB_GC_GUARD
is a macro that spew some code that prevent the reference being optimized away. It's likely not strictly necessary here, but doesn't hurt to be explicit.
The idea is to add RB_GC_GUARD(variable)
after the last line where you want to be 100% sure that object won't be GCed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice approach!
@@ -195,6 +263,12 @@ static void render_event(const char *event_name) { | |||
// Important note: We've observed some rendering issues in perfetto if the tid or pid are numbers that are "too big", | |||
// see https://github.com/ivoanjo/gvl-tracing/pull/4#issuecomment-1196463364 for an example. | |||
|
|||
#ifdef HAVE_RB_INTERNAL_THREAD_SPECIFIC_GET | |||
uint64_t thread_id = (uint64_t)state->thread; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note as above about object_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@byroot you mention in a comment here:
I just noticed this. If you store a VALUE reference in that struct, you MUST mark it. Otherwise it can be GCed and the object slot re-used, or GC compaction could move that object in another slot.
But if you do that, it also solves your question about how to get the thread name.
In running this code, it doesn't resolve the thread name issue. It's still using Thread.list
to collect names at the end, which will only get the names of threads that are still alive. Any thread that ran and finished during instrumentation will be lost at that point. Since the pthread_getname_np
code won't work in ruby 3.3, it means a set of names can always be missing at the end.
Am I missing something about how this should help get thread name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, as I said, fixing the thread name issue was out of scope, but this should make it easier I think. We can now more easily store something like Thread#inspect
directly from C (as long as we hold the GVL).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with leaving that for later, agree that having any 3.3 support is better than no 3.3 support.
Minor: It would be nice to have a pending
spec for this, as a way of documenting the gap maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey! Thanks a lot for the great contribution + the patience for my slow reviews! ;)
Overall I like it! I've left a few suggestions/comments, but I'd be up for merging this mostly as-is, as anyway we can't do worse than the current state of "completely wrong on 3.3".
Thanks also to @jpcamara for his insightful review + work in exploring and uncovering the way in #17.
I've noticed the build on Windows is failing. If you want to take a stab at making it not fail (probably by making it a no-op) it would be cool, but again not a blocker for this PR.
So yeah, let me know when you're ready, and I'll press the magic merge button. And thanks again for this :)
lib/gvl-tracing.rb
Outdated
@@ -35,6 +35,9 @@ class << self | |||
private :_stop | |||
|
|||
def start(file) | |||
Thread.list.each do |thread| | |||
_init_local_storage(thread) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really minor: It may be worth running this loop from native code to be sure we catch any threads (e.g. to make sure Ruby doesn't switch away and creates threads in the middle of this loop).
(Although I guess it'd be pretty rare in practice?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with leaving that for later, agree that having any 3.3 support is better than no 3.3 support.
Minor: It would be nice to have a pending
spec for this, as a way of documenting the gap maybe?
char native_thread_name_buffer[64] = "(unnamed)"; | ||
|
||
#ifdef HAVE_PTHREAD_GETNAME_NP | ||
pthread_getname_np(pthread_self(), native_thread_name_buffer, sizeof(native_thread_name_buffer)); | ||
#endif | ||
|
||
#ifdef HAVE_RB_INTERNAL_THREAD_SPECIFIC_GET | ||
uint64_t thread_id = (uint64_t)state->thread; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally I'd still want to have the native thread id when possible (e.g. when not using MaNy on the main ractor) as I think it's a useful info as you can use it to correlate with other native-level tools but totally fine to ignore that and simplify for now :)
static void set_native_thread_id(void) { | ||
static void set_native_thread_id(thread_local_state *state) { | ||
uint64_t native_thread_id = 0; | ||
|
||
#ifdef HAVE_PTHREAD_THREADID_NP | ||
pthread_threadid_np(pthread_self(), &native_thread_id); | ||
#elif HAVE_GETTID | ||
native_thread_id = gettid(); | ||
#else | ||
native_thread_id = current_thread_serial; // TODO: Better fallback for Windows? | ||
native_thread_id = state->current_thread_serial; // TODO: Better fallback for Windows? | ||
#endif | ||
|
||
thread_id = native_thread_id; | ||
state->thread_id = native_thread_id; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this may be completely incorrect for 3.3? E.g. the assumption for gettid()
and pthread_self()
is that they get called from the thread that the event is about, but if events are emitted from other threads now then this is wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the alternative could be to call Thread#native_thread_id
. But you need the GVL for that, so to me it's akin to Thread#name
, some things to display, but not a good "primary key".
static void on_gc_event(VALUE tpval, UNUSED_ARG void *_unused1) { | ||
const char* event_name = "bug_unknown_event"; | ||
thread_local_state *state = GT_LOCAL_STATE(rb_thread_current(), false); // no alloc during GC | ||
switch (rb_tracearg_event_flag(rb_tracearg_from_tracepoint(tpval))) { | ||
case RUBY_INTERNAL_EVENT_GC_ENTER: event_name = "gc"; break; | ||
// TODO: is it possible the thread wasn't running? Might need to save the last state. | ||
case RUBY_INTERNAL_EVENT_GC_EXIT: event_name = "running"; break; | ||
} | ||
render_event(event_name); | ||
render_event(state, event_name); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: It may be a bit awkward right now, but at some point I've been considering maybe introducing a fake "missing" thread that could be a sink for these events, so that e.g. you at least saw in the timeline something happened, even if we couldn't attribute it to a thread.
5c4e3f2
to
4a261ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few more notes!
@jpcamara raised a few relevant bugs (start tid of 0, tid too big for perfetto). Would be nice to include them in this PR, or otherwise create pending specs or open issues to track them.
I guess otherwise we're mostly good to go?
lib/gvl-tracing.rb
Outdated
_init_local_storage(Thread.list) | ||
_start(file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for applying my previous suggestion!
I've been thinking -- should we reorder these two lines? E.g. should we start listening to events, and then initialize the threads? This way if any threads get created concurrently, we'll go ahead and create contexts for them, and for existing threads we do that anyway.
The only downside I see is that we may miss a few events from the existing threads right at the beginning, but arguably that seems OK since they were anyway concurrent with _start
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I swapped those, not sure of the consequences but it seems to work fine for the examples where the threads are either created before or after tracing starts, but not concurrently.
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
4a261ed
to
c232b30
Compare
c232b30
to
7c83ec6
Compare
I've fixed more indentation crap I had done (4 tabs), and I've also indented the ifdefs inside methods like you had previously done. And also the warnings under 3.2. The diff is annoying but looking without whitespace we can see the last few changes I've done: https://github.com/ivoanjo/gvl-tracing/compare/4a261ed9920e3d486a43be93ba56b3cbde1d3520..7c83ec6f406ee98da279bb0a650cb33db68faa66?w=1 At this point, the traces from the examples look similar enough between 3.2 and 3.3 that I feel like it's mergeable. Feel free to merge or take over the branch if you want to add more before merging. |
Thanks! This looks great. I still want to do a bit of manual testing/experimentation before putting out a release, but I'll go ahead press the magic button now :) |
Heeey! Has it already been more than a month? 😅 I've just released this change in gvl-tracing 1.5.0. I also took a stab at fixing thread names (main gory details in cafbeb7), let me know what you think ;) |
I'm not too sure about the Ractor compatibility of this, but I'm no expert on them. Other than that, LGTM. |
Fix #17
Paired with @byroot
This PR adds support for Ruby 3.3 by using the new
rb_internal_thread_specific_get
which allows accessing data specific to a Ruby thread without requiring the GVL to be locked. It also handles N:M threads automatically.More info here: ruby/ruby@352a885
In Ruby 3.3, the thread events don't necessarily run from the thread they correspond to, so the thread corresponding to the event is sent in
event_data
, and nowGT_LOCAL_STATE
needs to take the thread too.To handle GC, the thread specific data is wrapped into a TypedData struct and stored in the Ruby thread as a instance variable. This requires getting the GVL, so we have an
allocate
flag onGT_LOCAL_STATE
too, which is false when we're not running from the thread and can't allocate. The hook only runs from its corresponding thread for theRUBY_INTERNAL_THREAD_EVENT_STARTED
andRUBY_INTERNAL_THREAD_EVENT_RESUMED
events, so we can allocate there.If the local state hasn't been initialized yet, we just return early. This means we would skipping events from already existing threads that haven't been resumed yet. To avoid this, we initialize all threads when tracing starts. We do this from Ruby, which ensures we have the GVL. We added a spec for that, which passes on main on 3.2, but not on 3.3.
To support N:M threads (and differentiate events from different threads but the same native thread), instead of returning the native thread id, we know return the address of the Ruby thread in
tid
(on Ruby 3.3).