Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for 3.3 and N:M threads #18

Merged
merged 7 commits into from
Feb 21, 2024
Merged

Conversation

etiennebarrie
Copy link
Contributor

@etiennebarrie etiennebarrie commented Feb 6, 2024

Fix #17
Paired with @byroot

This PR adds support for Ruby 3.3 by using the new rb_internal_thread_specific_get which allows accessing data specific to a Ruby thread without requiring the GVL to be locked. It also handles N:M threads automatically.

More info here: ruby/ruby@352a885

In Ruby 3.3, the thread events don't necessarily run from the thread they correspond to, so the thread corresponding to the event is sent in event_data, and now GT_LOCAL_STATE needs to take the thread too.

To handle GC, the thread specific data is wrapped into a TypedData struct and stored in the Ruby thread as a instance variable. This requires getting the GVL, so we have an allocate flag on GT_LOCAL_STATE too, which is false when we're not running from the thread and can't allocate. The hook only runs from its corresponding thread for the RUBY_INTERNAL_THREAD_EVENT_STARTED and RUBY_INTERNAL_THREAD_EVENT_RESUMED events, so we can allocate there.

If the local state hasn't been initialized yet, we just return early. This means we would skipping events from already existing threads that haven't been resumed yet. To avoid this, we initialize all threads when tracing starts. We do this from Ruby, which ensures we have the GVL. We added a spec for that, which passes on main on 3.2, but not on 3.3.

To support N:M threads (and differentiate events from different threads but the same native thread), instead of returning the native thread id, we know return the address of the Ruby thread in tid (on Ruby 3.3).

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
@byroot
Copy link
Contributor

byroot commented Feb 6, 2024

FYI @jpcamara

@etiennebarrie
Copy link
Contributor Author

We're not cleaning up the thread specific data. From the Ruby commit:

Note that you should not forget to clean up the set data.

This only applies if the thread hasn't been GC'd yet. When we stop tracing, we should clean up behind ourselves.

Another thing is that the thread name handling is still using the native thread id: https://github.com/etiennebarrie/gvl-tracing/blob/7b484b979f4322bf5057e9edda2bd2163497f510/lib/gvl-tracing.rb#L75

@jpcamara
Copy link
Contributor

jpcamara commented Feb 6, 2024

FYI @jpcamara

Thanks @byroot, should be a helpful learning experience going through the struct code 👍🏼 . I've found a couple issues trying it out - i'll post some review comments

Copy link
Contributor

@jpcamara jpcamara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see a TypedData approach here vs my PRs approach of a more standard c malloc/free🏅. I've left some comments in the code. Some are questions, some are issues to address. Thanks!

@@ -35,6 +35,9 @@ class << self
private :_stop

def start(file)
Thread.list.each do |thread|
_init_local_storage(thread)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever 👌🏼

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really minor: It may be worth running this loop from native code to be sure we catch any threads (e.g. to make sure Ruby doesn't switch away and creates threads in the middle of this loop).

(Although I guess it'd be pretty rare in practice?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose passing a list of threads would make sense, as to reduce the potential for a race.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a commit passing the array returned from Thread.list and iterating over the threads instead, but I don't know what I'm doing 😬 . I've looked at the documentation and they seem to discourage using RARRAY_PTR so I tried RARRAY_PTR_USE, it seems to work.

char native_thread_name_buffer[64] = "(unnamed)";

#ifdef HAVE_PTHREAD_GETNAME_NP
pthread_getname_np(pthread_self(), native_thread_name_buffer, sizeof(native_thread_name_buffer));
#endif

#ifdef HAVE_RB_INTERNAL_THREAD_SPECIFIC_GET
uint64_t thread_id = (uint64_t)state->thread;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is using the VALUE and the finalizer code uses native id, there is a mismatch and the names are not applied properly. RB_NUM2INT(rb_obj_id(thread)); and later thread->object_id seem like the most reliable option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we haven't concerned ourselves much with names in the resulting JSON. I'd rather go baby step to make the things easier to review. Having the valid timeline is more important than having the name, so we did this first.

Having the actual Ruby thread in there should make it easy to have the correct names later. Anyway the whole native_thread_id thing should be ignored because of MaNy.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally I'd still want to have the native thread id when possible (e.g. when not using MaNy on the main ractor) as I think it's a useful info as you can use it to correlate with other native-level tools but totally fine to ignore that and simplify for now :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could collect it even with MaNy, it just it can't be the thread identifier anymore.

Copy link
Contributor

@jpcamara jpcamara Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivoanjo I completely forgot - I tried using VALUE initially, based on your recommendation elsewhere @byroot. I switched to object_id because VALUE is too large - it is invalid in the perfetto ui. If I try importing the file this PR generates, none of the threads show up. Annoyingly, tid is defined as an int32 and so perfetto completely ignores/rejects every thread written using VALUE.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess perhaps we could go back to the sequential thread id to make perfetto happy, and then we could have all relevant ids as part of the thread name string maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've hacked something where I xor the 32 higher bits with the lower 32 ones of the VALUE, that way tid is 32 bits (which I assume is what's expected by Perfetto UI since TIDs are PIDs and pid_t is 32 bits).

Now the threads are visible, even if they've lost their name in the thread_name_with_extension example.

VALUE wrapper = TypedData_Make_Struct(rb_cObject, thread_local_state, &thread_local_state_type, state);
rb_thread_local_aset(thread, rb_intern("__gvl_tracing_local_state"), wrapper);
rb_internal_thread_specific_set(thread, thread_storage_key, state);
RB_GC_GUARD(wrapper);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was on the right path when trying to use TypedData_Make_Struct but 100% would not have landed here 😆. A few questions that would awesome to get answers to 🙏🏼:

  • Am I right to say that using TypedData_Make_Struct is mostly for the side effects of it? Meaning, it's utilized to be able to mark the thread, and to get free'd by the runtime automatically?
  • What's the reason the rb_thread_local_aset call is needed here? Is it so something actually has a reference to the wrapper so it will get garbage collected when the thread does?
  • I saw RB_GC_GUARD get used in gvltools as well. Is that required because of RUBY_TYPED_WB_PROTECTED? I've seen a couple examples of TypedData_Make_Struct in other gems and i've never seen RB_GC_GUARD or RUBY_TYPED_WB_PROTECTED used in those cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's utilized to be able to mark the thread, and to get free'd by the runtime automatically?

Yes. Absolutely.

Is it so something actually has a reference to the wrapper so it will get garbage collected when the thread does?

Exactly. It ties that object lifetime with the thread lifetime, so they'll be GCed together.

I saw RB_GC_GUARD

It's a bit more complicated than that.

Ruby GC is "conservative". This means every GC cycle will scan the C stack, and every value on the stack that look like a reference to a Ruby object will be marked.

Here we have VALUE wrapper so if GC trigger in the middle of that function, the reference should be on the stack and the object not GCed. But C compilers have a tendency to optimize some stuff away and sometime this cause references not to be on the stack which cause GC bugs.

RB_GC_GUARD is a macro that spew some code that prevent the reference being optimized away. It's likely not strictly necessary here, but doesn't hurt to be explicit.

The idea is to add RB_GC_GUARD(variable) after the last line where you want to be 100% sure that object won't be GCed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice approach!

@@ -195,6 +263,12 @@ static void render_event(const char *event_name) {
// Important note: We've observed some rendering issues in perfetto if the tid or pid are numbers that are "too big",
// see https://github.com/ivoanjo/gvl-tracing/pull/4#issuecomment-1196463364 for an example.

#ifdef HAVE_RB_INTERNAL_THREAD_SPECIFIC_GET
uint64_t thread_id = (uint64_t)state->thread;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note as above about object_id

ext/gvl_tracing_native_extension/gvl_tracing.c Outdated Show resolved Hide resolved
ext/gvl_tracing_native_extension/gvl_tracing.c Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@byroot you mention in a comment here:

I just noticed this. If you store a VALUE reference in that struct, you MUST mark it. Otherwise it can be GCed and the object slot re-used, or GC compaction could move that object in another slot.
But if you do that, it also solves your question about how to get the thread name.

In running this code, it doesn't resolve the thread name issue. It's still using Thread.list to collect names at the end, which will only get the names of threads that are still alive. Any thread that ran and finished during instrumentation will be lost at that point. Since the pthread_getname_np code won't work in ruby 3.3, it means a set of names can always be missing at the end.

Am I missing something about how this should help get thread name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as I said, fixing the thread name issue was out of scope, but this should make it easier I think. We can now more easily store something like Thread#inspect directly from C (as long as we hold the GVL).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with leaving that for later, agree that having any 3.3 support is better than no 3.3 support.

Minor: It would be nice to have a pending spec for this, as a way of documenting the gap maybe?

Copy link
Owner

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey! Thanks a lot for the great contribution + the patience for my slow reviews! ;)

Overall I like it! I've left a few suggestions/comments, but I'd be up for merging this mostly as-is, as anyway we can't do worse than the current state of "completely wrong on 3.3".

Thanks also to @jpcamara for his insightful review + work in exploring and uncovering the way in #17.

I've noticed the build on Windows is failing. If you want to take a stab at making it not fail (probably by making it a no-op) it would be cool, but again not a blocker for this PR.

So yeah, let me know when you're ready, and I'll press the magic merge button. And thanks again for this :)

@@ -35,6 +35,9 @@ class << self
private :_stop

def start(file)
Thread.list.each do |thread|
_init_local_storage(thread)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really minor: It may be worth running this loop from native code to be sure we catch any threads (e.g. to make sure Ruby doesn't switch away and creates threads in the middle of this loop).

(Although I guess it'd be pretty rare in practice?)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with leaving that for later, agree that having any 3.3 support is better than no 3.3 support.

Minor: It would be nice to have a pending spec for this, as a way of documenting the gap maybe?

spec/gvl_tracing_spec.rb Outdated Show resolved Hide resolved
ext/gvl_tracing_native_extension/gvl_tracing.c Outdated Show resolved Hide resolved
ext/gvl_tracing_native_extension/gvl_tracing.c Outdated Show resolved Hide resolved
char native_thread_name_buffer[64] = "(unnamed)";

#ifdef HAVE_PTHREAD_GETNAME_NP
pthread_getname_np(pthread_self(), native_thread_name_buffer, sizeof(native_thread_name_buffer));
#endif

#ifdef HAVE_RB_INTERNAL_THREAD_SPECIFIC_GET
uint64_t thread_id = (uint64_t)state->thread;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally I'd still want to have the native thread id when possible (e.g. when not using MaNy on the main ractor) as I think it's a useful info as you can use it to correlate with other native-level tools but totally fine to ignore that and simplify for now :)

Comment on lines -166 to 257
static void set_native_thread_id(void) {
static void set_native_thread_id(thread_local_state *state) {
uint64_t native_thread_id = 0;

#ifdef HAVE_PTHREAD_THREADID_NP
pthread_threadid_np(pthread_self(), &native_thread_id);
#elif HAVE_GETTID
native_thread_id = gettid();
#else
native_thread_id = current_thread_serial; // TODO: Better fallback for Windows?
native_thread_id = state->current_thread_serial; // TODO: Better fallback for Windows?
#endif

thread_id = native_thread_id;
state->thread_id = native_thread_id;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this may be completely incorrect for 3.3? E.g. the assumption for gettid() and pthread_self() is that they get called from the thread that the event is about, but if events are emitted from other threads now then this is wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the alternative could be to call Thread#native_thread_id. But you need the GVL for that, so to me it's akin to Thread#name, some things to display, but not a good "primary key".

ext/gvl_tracing_native_extension/gvl_tracing.c Outdated Show resolved Hide resolved
Comment on lines 319 to 344
static void on_gc_event(VALUE tpval, UNUSED_ARG void *_unused1) {
const char* event_name = "bug_unknown_event";
thread_local_state *state = GT_LOCAL_STATE(rb_thread_current(), false); // no alloc during GC
switch (rb_tracearg_event_flag(rb_tracearg_from_tracepoint(tpval))) {
case RUBY_INTERNAL_EVENT_GC_ENTER: event_name = "gc"; break;
// TODO: is it possible the thread wasn't running? Might need to save the last state.
case RUBY_INTERNAL_EVENT_GC_EXIT: event_name = "running"; break;
}
render_event(event_name);
render_event(state, event_name);
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: It may be a bit awkward right now, but at some point I've been considering maybe introducing a fake "missing" thread that could be a sink for these events, so that e.g. you at least saw in the timeline something happened, even if we couldn't attribute it to a thread.

@jpcamara jpcamara mentioned this pull request Feb 19, 2024
@etiennebarrie etiennebarrie force-pushed the 3.3-support branch 2 times, most recently from 5c4e3f2 to 4a261ed Compare February 19, 2024 16:00
Copy link
Owner

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more notes!

@jpcamara raised a few relevant bugs (start tid of 0, tid too big for perfetto). Would be nice to include them in this PR, or otherwise create pending specs or open issues to track them.

I guess otherwise we're mostly good to go?

_init_local_storage(Thread.list)
_start(file)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for applying my previous suggestion!

I've been thinking -- should we reorder these two lines? E.g. should we start listening to events, and then initialize the threads? This way if any threads get created concurrently, we'll go ahead and create contexts for them, and for existing threads we do that anyway.

The only downside I see is that we may miss a few events from the existing threads right at the beginning, but arguably that seems OK since they were anyway concurrent with _start.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I swapped those, not sure of the consequences but it seems to work fine for the examples where the threads are either created before or after tracing starts, but not concurrently.

etiennebarrie and others added 5 commits February 20, 2024 22:26
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
@etiennebarrie
Copy link
Contributor Author

etiennebarrie commented Feb 20, 2024

I've fixed more indentation crap I had done (4 tabs), and I've also indented the ifdefs inside methods like you had previously done. And also the warnings under 3.2. The diff is annoying but looking without whitespace we can see the last few changes I've done: https://github.com/ivoanjo/gvl-tracing/compare/4a261ed9920e3d486a43be93ba56b3cbde1d3520..7c83ec6f406ee98da279bb0a650cb33db68faa66?w=1

At this point, the traces from the examples look similar enough between 3.2 and 3.3 that I feel like it's mergeable. Feel free to merge or take over the branch if you want to add more before merging.

@ivoanjo
Copy link
Owner

ivoanjo commented Feb 21, 2024

Thanks! This looks great. I still want to do a bit of manual testing/experimentation before putting out a release, but I'll go ahead press the magic button now :)

@ivoanjo ivoanjo merged commit 4ffa17f into ivoanjo:master Feb 21, 2024
14 of 19 checks passed
@ivoanjo
Copy link
Owner

ivoanjo commented Mar 29, 2024

Heeey! Has it already been more than a month? 😅

I've just released this change in gvl-tracing 1.5.0. I also took a stab at fixing thread names (main gory details in cafbeb7), let me know what you think ;)

@casperisfine
Copy link

let me know what you think

I'm not too sure about the Ractor compatibility of this, but I'm no expert on them. Other than that, LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants