Support for 3.3 and N:M threads #18

etiennebarrie · 2024-02-06T10:45:38Z

Fix #17
Paired with @byroot

This PR adds support for Ruby 3.3 by using the new rb_internal_thread_specific_get which allows accessing data specific to a Ruby thread without requiring the GVL to be locked. It also handles N:M threads automatically.

More info here: ruby/ruby@352a885

In Ruby 3.3, the thread events don't necessarily run from the thread they correspond to, so the thread corresponding to the event is sent in event_data, and now GT_LOCAL_STATE needs to take the thread too.

To handle GC, the thread specific data is wrapped into a TypedData struct and stored in the Ruby thread as a instance variable. This requires getting the GVL, so we have an allocate flag on GT_LOCAL_STATE too, which is false when we're not running from the thread and can't allocate. The hook only runs from its corresponding thread for the RUBY_INTERNAL_THREAD_EVENT_STARTED and RUBY_INTERNAL_THREAD_EVENT_RESUMED events, so we can allocate there.

If the local state hasn't been initialized yet, we just return early. This means we would skipping events from already existing threads that haven't been resumed yet. To avoid this, we initialize all threads when tracing starts. We do this from Ruby, which ensures we have the GVL. We added a spec for that, which passes on main on 3.2, but not on 3.3.

To support N:M threads (and differentiate events from different threads but the same native thread), instead of returning the native thread id, we know return the address of the Ruby thread in tid (on Ruby 3.3).

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>

byroot · 2024-02-06T10:51:23Z

FYI @jpcamara

etiennebarrie · 2024-02-06T10:57:20Z

We're not cleaning up the thread specific data. From the Ruby commit:

Note that you should not forget to clean up the set data.

This only applies if the thread hasn't been GC'd yet. When we stop tracing, we should clean up behind ourselves.

Another thing is that the thread name handling is still using the native thread id: https://github.com/etiennebarrie/gvl-tracing/blob/7b484b979f4322bf5057e9edda2bd2163497f510/lib/gvl-tracing.rb#L75

jpcamara · 2024-02-06T22:37:15Z

FYI @jpcamara

Thanks @byroot, should be a helpful learning experience going through the struct code 👍🏼 . I've found a couple issues trying it out - i'll post some review comments

jpcamara

Glad to see a TypedData approach here vs my PRs approach of a more standard c malloc/free🏅. I've left some comments in the code. Some are questions, some are issues to address. Thanks!

jpcamara · 2024-02-07T02:15:52Z

lib/gvl-tracing.rb

@@ -35,6 +35,9 @@ class << self
    private :_stop

    def start(file)
+      Thread.list.each do |thread|
+        _init_local_storage(thread)


Clever 👌🏼

Really minor: It may be worth running this loop from native code to be sure we catch any threads (e.g. to make sure Ruby doesn't switch away and creates threads in the middle of this loop).

(Although I guess it'd be pretty rare in practice?)

I suppose passing a list of threads would make sense, as to reduce the potential for a race.

I've added a commit passing the array returned from Thread.list and iterating over the threads instead, but I don't know what I'm doing 😬 . I've looked at the documentation and they seem to discourage using RARRAY_PTR so I tried RARRAY_PTR_USE, it seems to work.

jpcamara · 2024-02-07T02:25:57Z

ext/gvl_tracing_native_extension/gvl_tracing.c

  char native_thread_name_buffer[64] = "(unnamed)";

  #ifdef HAVE_PTHREAD_GETNAME_NP
    pthread_getname_np(pthread_self(), native_thread_name_buffer, sizeof(native_thread_name_buffer));
  #endif

+#ifdef HAVE_RB_INTERNAL_THREAD_SPECIFIC_GET
+  uint64_t thread_id = (uint64_t)state->thread;


Because this is using the VALUE and the finalizer code uses native id, there is a mismatch and the names are not applied properly. RB_NUM2INT(rb_obj_id(thread)); and later thread->object_id seem like the most reliable option.

https://github.com/ivoanjo/gvl-tracing/pull/17/files#diff-84e98b0dfbf6ada7e7e82bb92cd6e1b2ee542cb3d4f513544eb114bce88bdeddR121

https://github.com/ivoanjo/gvl-tracing/pull/17/files#diff-580c43b3e9db7af30c90ebf4b57f0af9874ccaa13e29dda84cd3b746b0e37b8cR76

Yeah, we haven't concerned ourselves much with names in the resulting JSON. I'd rather go baby step to make the things easier to review. Having the valid timeline is more important than having the name, so we did this first.

Having the actual Ruby thread in there should make it easy to have the correct names later. Anyway the whole native_thread_id thing should be ignored because of MaNy.

Ideally I'd still want to have the native thread id when possible (e.g. when not using MaNy on the main ractor) as I think it's a useful info as you can use it to correlate with other native-level tools but totally fine to ignore that and simplify for now :)

I think you could collect it even with MaNy, it just it can't be the thread identifier anymore.

@ivoanjo I completely forgot - I tried using VALUE initially, based on your recommendation elsewhere @byroot. I switched to object_id because VALUE is too large - it is invalid in the perfetto ui. If I try importing the file this PR generates, none of the threads show up. Annoyingly, tid is defined as an int32 and so perfetto completely ignores/rejects every thread written using VALUE.

I guess perhaps we could go back to the sequential thread id to make perfetto happy, and then we could have all relevant ids as part of the thread name string maybe?

I've hacked something where I xor the 32 higher bits with the lower 32 ones of the VALUE, that way tid is 32 bits (which I assume is what's expected by Perfetto UI since TIDs are PIDs and pid_t is 32 bits).

Now the threads are visible, even if they've lost their name in the thread_name_with_extension example.

jpcamara · 2024-02-07T02:38:10Z

ext/gvl_tracing_native_extension/gvl_tracing.c

+        VALUE wrapper = TypedData_Make_Struct(rb_cObject, thread_local_state, &thread_local_state_type, state);
+        rb_thread_local_aset(thread, rb_intern("__gvl_tracing_local_state"), wrapper);
+        rb_internal_thread_specific_set(thread, thread_storage_key, state);
+        RB_GC_GUARD(wrapper);


I was on the right path when trying to use TypedData_Make_Struct but 100% would not have landed here 😆. A few questions that would awesome to get answers to 🙏🏼:

Am I right to say that using TypedData_Make_Struct is mostly for the side effects of it? Meaning, it's utilized to be able to mark the thread, and to get free'd by the runtime automatically?

What's the reason the rb_thread_local_aset call is needed here? Is it so something actually has a reference to the wrapper so it will get garbage collected when the thread does?

I saw RB_GC_GUARD get used in gvltools as well. Is that required because of RUBY_TYPED_WB_PROTECTED? I've seen a couple examples of TypedData_Make_Struct in other gems and i've never seen RB_GC_GUARD or RUBY_TYPED_WB_PROTECTED used in those cases.

it's utilized to be able to mark the thread, and to get free'd by the runtime automatically?

Yes. Absolutely.

Is it so something actually has a reference to the wrapper so it will get garbage collected when the thread does?

Exactly. It ties that object lifetime with the thread lifetime, so they'll be GCed together.

I saw RB_GC_GUARD

It's a bit more complicated than that.

Ruby GC is "conservative". This means every GC cycle will scan the C stack, and every value on the stack that look like a reference to a Ruby object will be marked.

Here we have VALUE wrapper so if GC trigger in the middle of that function, the reference should be on the stack and the object not GCed. But C compilers have a tendency to optimize some stuff away and sometime this cause references not to be on the stack which cause GC bugs.

RB_GC_GUARD is a macro that spew some code that prevent the reference being optimized away. It's likely not strictly necessary here, but doesn't hurt to be explicit.

The idea is to add RB_GC_GUARD(variable) after the last line where you want to be 100% sure that object won't be GCed.

This is a nice approach!

ext/gvl_tracing_native_extension/gvl_tracing.c

jpcamara · 2024-02-07T02:41:43Z

ext/gvl_tracing_native_extension/gvl_tracing.c

@@ -195,6 +263,12 @@ static void render_event(const char *event_name) {
  // Important note: We've observed some rendering issues in perfetto if the tid or pid are numbers that are "too big",
  // see https://github.com/ivoanjo/gvl-tracing/pull/4#issuecomment-1196463364 for an example.

+#ifdef HAVE_RB_INTERNAL_THREAD_SPECIFIC_GET
+  uint64_t thread_id = (uint64_t)state->thread;


Same note as above about object_id

ext/gvl_tracing_native_extension/gvl_tracing.c

jpcamara · 2024-02-07T03:09:16Z

lib/gvl-tracing.rb

@byroot you mention in a comment here:

I just noticed this. If you store a VALUE reference in that struct, you MUST mark it. Otherwise it can be GCed and the object slot re-used, or GC compaction could move that object in another slot.
But if you do that, it also solves your question about how to get the thread name.

In running this code, it doesn't resolve the thread name issue. It's still using Thread.list to collect names at the end, which will only get the names of threads that are still alive. Any thread that ran and finished during instrumentation will be lost at that point. Since the pthread_getname_np code won't work in ruby 3.3, it means a set of names can always be missing at the end.

Am I missing something about how this should help get thread name?

Yeah, as I said, fixing the thread name issue was out of scope, but this should make it easier I think. We can now more easily store something like Thread#inspect directly from C (as long as we hold the GVL).

I'm OK with leaving that for later, agree that having any 3.3 support is better than no 3.3 support.

Minor: It would be nice to have a pending spec for this, as a way of documenting the gap maybe?

ivoanjo

Hey! Thanks a lot for the great contribution + the patience for my slow reviews! ;)

Overall I like it! I've left a few suggestions/comments, but I'd be up for merging this mostly as-is, as anyway we can't do worse than the current state of "completely wrong on 3.3".

Thanks also to @jpcamara for his insightful review + work in exploring and uncovering the way in #17.

I've noticed the build on Windows is failing. If you want to take a stab at making it not fail (probably by making it a no-op) it would be cool, but again not a blocker for this PR.

So yeah, let me know when you're ready, and I'll press the magic merge button. And thanks again for this :)

ivoanjo · 2024-02-18T20:53:35Z

lib/gvl-tracing.rb

@@ -35,6 +35,9 @@ class << self
    private :_stop

    def start(file)
+      Thread.list.each do |thread|
+        _init_local_storage(thread)


Really minor: It may be worth running this loop from native code to be sure we catch any threads (e.g. to make sure Ruby doesn't switch away and creates threads in the middle of this loop).

(Although I guess it'd be pretty rare in practice?)

ivoanjo · 2024-02-18T20:54:37Z

lib/gvl-tracing.rb

I'm OK with leaving that for later, agree that having any 3.3 support is better than no 3.3 support.

Minor: It would be nice to have a pending spec for this, as a way of documenting the gap maybe?

spec/gvl_tracing_spec.rb

ext/gvl_tracing_native_extension/gvl_tracing.c

ivoanjo · 2024-02-18T22:21:19Z

ext/gvl_tracing_native_extension/gvl_tracing.c

  char native_thread_name_buffer[64] = "(unnamed)";

  #ifdef HAVE_PTHREAD_GETNAME_NP
    pthread_getname_np(pthread_self(), native_thread_name_buffer, sizeof(native_thread_name_buffer));
  #endif

+#ifdef HAVE_RB_INTERNAL_THREAD_SPECIFIC_GET
+  uint64_t thread_id = (uint64_t)state->thread;


Ideally I'd still want to have the native thread id when possible (e.g. when not using MaNy on the main ractor) as I think it's a useful info as you can use it to correlate with other native-level tools but totally fine to ignore that and simplify for now :)

ivoanjo · 2024-02-18T22:28:54Z

ext/gvl_tracing_native_extension/gvl_tracing.c

-static void set_native_thread_id(void) {
+static void set_native_thread_id(thread_local_state *state) {
  uint64_t native_thread_id = 0;

  #ifdef HAVE_PTHREAD_THREADID_NP
    pthread_threadid_np(pthread_self(), &native_thread_id);
  #elif HAVE_GETTID
    native_thread_id = gettid();
  #else
-    native_thread_id = current_thread_serial; // TODO: Better fallback for Windows?
+    native_thread_id = state->current_thread_serial; // TODO: Better fallback for Windows?
  #endif

-  thread_id = native_thread_id;
+  state->thread_id = native_thread_id;
 }


I guess this may be completely incorrect for 3.3? E.g. the assumption for gettid() and pthread_self() is that they get called from the thread that the event is about, but if events are emitted from other threads now then this is wrong?

Yeah, the alternative could be to call Thread#native_thread_id. But you need the GVL for that, so to me it's akin to Thread#name, some things to display, but not a good "primary key".

ext/gvl_tracing_native_extension/gvl_tracing.c

ivoanjo · 2024-02-18T22:36:43Z

ext/gvl_tracing_native_extension/gvl_tracing.c

 static void on_gc_event(VALUE tpval, UNUSED_ARG void *_unused1) {
  const char* event_name = "bug_unknown_event";
+  thread_local_state *state = GT_LOCAL_STATE(rb_thread_current(), false); // no alloc during GC
  switch (rb_tracearg_event_flag(rb_tracearg_from_tracepoint(tpval))) {
    case RUBY_INTERNAL_EVENT_GC_ENTER: event_name = "gc"; break;
    // TODO: is it possible the thread wasn't running? Might need to save the last state.
    case RUBY_INTERNAL_EVENT_GC_EXIT: event_name = "running"; break;
  }
-  render_event(event_name);
+  render_event(state, event_name);
 }


Minor: It may be a bit awkward right now, but at some point I've been considering maybe introducing a fake "missing" thread that could be a sink for these events, so that e.g. you at least saw in the timeline something happened, even if we couldn't attribute it to a thread.

ext/gvl_tracing_native_extension/gvl_tracing.c

ivoanjo

Left a few more notes!

@jpcamara raised a few relevant bugs (start tid of 0, tid too big for perfetto). Would be nice to include them in this PR, or otherwise create pending specs or open issues to track them.

I guess otherwise we're mostly good to go?

ivoanjo · 2024-02-20T20:03:55Z

lib/gvl-tracing.rb

+      _init_local_storage(Thread.list)
      _start(file)


Thanks for applying my previous suggestion!

I've been thinking -- should we reorder these two lines? E.g. should we start listening to events, and then initialize the threads? This way if any threads get created concurrently, we'll go ahead and create contexts for them, and for existing threads we do that anyway.

The only downside I see is that we may miss a few events from the existing threads right at the beginning, but arguably that seems OK since they were anyway concurrent with _start.

I swapped those, not sure of the consequences but it seems to work fine for the examples where the threads are either created before or after tracing starts, but not concurrently.

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>

etiennebarrie · 2024-02-20T23:32:01Z

I've fixed more indentation crap I had done (4 tabs), and I've also indented the ifdefs inside methods like you had previously done. And also the warnings under 3.2. The diff is annoying but looking without whitespace we can see the last few changes I've done: https://github.com/ivoanjo/gvl-tracing/compare/4a261ed9920e3d486a43be93ba56b3cbde1d3520..7c83ec6f406ee98da279bb0a650cb33db68faa66?w=1

At this point, the traces from the examples look similar enough between 3.2 and 3.3 that I feel like it's mergeable. Feel free to merge or take over the branch if you want to add more before merging.

ivoanjo · 2024-02-21T14:58:04Z

Thanks! This looks great. I still want to do a bit of manual testing/experimentation before putting out a release, but I'll go ahead press the magic button now :)

ivoanjo · 2024-03-29T00:20:17Z

Heeey! Has it already been more than a month? 😅

I've just released this change in gvl-tracing 1.5.0. I also took a stab at fixing thread names (main gory details in cafbeb7), let me know what you think ;)

casperisfine · 2024-03-29T07:07:46Z

let me know what you think

I'm not too sure about the Ractor compatibility of this, but I'm no expert on them. Other than that, LGTM.

Refactor all thread locals in a single struct

2cc1a4d

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>

jpcamara reviewed Feb 7, 2024

View reviewed changes

ivoanjo reviewed Feb 18, 2024

View reviewed changes

jpcamara mentioned this pull request Feb 19, 2024

Support ruby 3.2 + 3.3+ #17

Closed

etiennebarrie force-pushed the 3.3-support branch 2 times, most recently from 5c4e3f2 to 4a261ed Compare February 19, 2024 16:00

byroot reviewed Feb 19, 2024

View reviewed changes

ext/gvl_tracing_native_extension/gvl_tracing.c Outdated Show resolved Hide resolved

ivoanjo reviewed Feb 20, 2024

View reviewed changes

etiennebarrie and others added 5 commits February 20, 2024 22:26

Avoid using native thread local storage in 3.3

abb474f

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>

Use Ruby thread address instead of native thread id in 3.3

fbcbdbc

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>

Initialize existing threads at GvlTracing.start

3874e37

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>

Run CI with RUBY_MN_THREADS

7c9e3c6

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>

Pass the full list of threads to _init_local_storage

6f8ace7

Co-authored-by: Jean Boussier <jean.boussier@gmail.com>

etiennebarrie force-pushed the 3.3-support branch from 4a261ed to c232b30 Compare February 20, 2024 23:17

Limit tids to 32 bits to please Perfetto UI

7c83ec6

etiennebarrie force-pushed the 3.3-support branch from c232b30 to 7c83ec6 Compare February 20, 2024 23:20

ivoanjo merged commit 4ffa17f into ivoanjo:master Feb 21, 2024
14 of 19 checks passed

Support for 3.3 and N:M threads #18

Support for 3.3 and N:M threads #18

Conversation

etiennebarrie commented Feb 6, 2024 • edited Loading

byroot commented Feb 6, 2024

etiennebarrie commented Feb 6, 2024

jpcamara commented Feb 6, 2024

jpcamara left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jpcamara Feb 19, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ivoanjo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ivoanjo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

etiennebarrie commented Feb 20, 2024 • edited Loading

ivoanjo commented Feb 21, 2024

ivoanjo commented Mar 29, 2024

casperisfine commented Mar 29, 2024

etiennebarrie commented Feb 6, 2024 •

edited

Loading

jpcamara Feb 19, 2024 •

edited

Loading

etiennebarrie commented Feb 20, 2024 •

edited

Loading