Draft sketch of "external debug section" for feedback #6706

@KJTsanaktsidis KJTsanaktsidis commented Nov 10, 2022

This is a draft PR sketch to collect some feedback about how we might integrate external profiling tools with Ruby processes.

Bug tracker issue: https://bugs.ruby-lang.org/issues/19119

rb_method_entry_t (and its const cousin rb_callable_method_entry_t) need
to be RVALUEs (they are stored in T_IMEMO objects managed by the GC).
However, these structs are currently using all five words of the RVALUE.
Thus, it is not possible to add additional fields to them.

In order to solve this, we define a rb_method_entry_ext_t structure to
hold additional attributes and manage it in a similar way to how
rb_classext_struct is managed for RClass.

If USE_RVARGC is on (i.e. multiple size pools in the GC are enabled),
then we store the method entry in a larger size pool and place the
rb_method_entry_ext_t data inline with the object. Otherwise, we store a
pointer to a C heap-allocated rb_method_entry_ext_t in one of the RVALUE
words.

The details of either method are abstracted behind METHOD_ENTRY_EXT()
and CALLABLE_METHOD_ENTRY_EXT() macros, which act analogously to the
RCLASS_EXT() one.
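
A minimal sketch of how one accessor macro can hide both layouts is shown
below. It is not the code from this PR: every `sketch_`/`SKETCH_` name,
the field list, and the use of calloc() in place of the GC are
assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the real method.h types. */
typedef uintptr_t VALUE;

#ifndef SKETCH_USE_RVARGC
#define SKETCH_USE_RVARGC 1   /* set to 0 to get the pointer-based layout */
#endif

/* Extra attributes that do not fit in the fixed-size entry. */
typedef struct sketch_method_entry_ext {
    VALUE owner;
} sketch_method_entry_ext_t;

typedef struct sketch_method_entry {
    VALUE flags;
    VALUE defined_class;
    void *def;
    VALUE called_id;
#if !SKETCH_USE_RVARGC
    sketch_method_entry_ext_t *ext;   /* pointer to C heap-allocated ext */
#endif
} sketch_method_entry_t;

#if SKETCH_USE_RVARGC
/* Larger slot: the ext data sits directly after the entry itself. */
#define SKETCH_METHOD_ENTRY_EXT(me) \
    ((sketch_method_entry_ext_t *)((char *)(me) + sizeof(sketch_method_entry_t)))
#else
/* Fixed-size slot: follow the pointer stored in the entry. */
#define SKETCH_METHOD_ENTRY_EXT(me) ((me)->ext)
#endif

/* calloc() stands in for the GC handing out a slot of the right size. */
static sketch_method_entry_t *
sketch_method_entry_new(VALUE owner)
{
#if SKETCH_USE_RVARGC
    sketch_method_entry_t *me =
        calloc(1, sizeof(sketch_method_entry_t) + sizeof(sketch_method_entry_ext_t));
    if (!me) return NULL;
#else
    sketch_method_entry_t *me = calloc(1, sizeof(*me));
    if (!me) return NULL;
    me->ext = calloc(1, sizeof(*me->ext));
    if (!me->ext) { free(me); return NULL; }
#endif
    /* Callers use the same accessor regardless of the layout. */
    SKETCH_METHOD_ENTRY_EXT(me)->owner = owner;
    return me;
}

static void
sketch_method_entry_free(sketch_method_entry_t *me)
{
    if (!me) return;
#if !SKETCH_USE_RVARGC
    free(me->ext);
#endif
    free(me);
}

/* Small usage check: the owner is reachable through the accessor. */
static void
sketch_method_entry_demo(void)
{
    sketch_method_entry_t *me = sketch_method_entry_new((VALUE)0x1234);
    assert(me != NULL);
    assert(SKETCH_METHOD_ENTRY_EXT(me)->owner == (VALUE)0x1234);
    sketch_method_entry_free(me);
}
```

Compiling with `-DSKETCH_USE_RVARGC=0` switches to the pointer-based
layout without touching any call site, which is the point of hiding the
choice behind a macro.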

In order to make room for the ext pointer in non-RVARGC configurations,
the "owner" field of rb_method_entry_t has been moved to the
rb_method_entry_ext_t structure.

This commit adds a method Module#debug_name, which prints a
human-readable name intended for use in profiling tools for describing a
class.

The rules are documented in a test in test_module.rb, but boil down to:

* "<refinement Foo of Bar>" for a refinement module adding methods to
  Bar
* "<singleton of Foo>" for Foo's singleton class
* "<instance of Foo>" for the singleton class of a particular instance
  of Foo
* "<anonymous subclass of Foo>" for an anonymous subclass
* The usual classpath string otherwise
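
The rules above can be sketched as a small formatting function. This is
an illustration only, not the implementation in this PR: the
`sketch_class_kind` enum and `sketch_debug_name()` are hypothetical, and
the caller is assumed to have already worked out which rule applies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of a class/module; not a real Ruby enum. */
enum sketch_class_kind {
    SKETCH_NAMED,                  /* has an ordinary classpath */
    SKETCH_REFINEMENT,
    SKETCH_SINGLETON_OF_CLASS,
    SKETCH_SINGLETON_OF_INSTANCE,
    SKETCH_ANONYMOUS_SUBCLASS
};

/*
 * Writes the debug name into buf and returns snprintf()'s result.
 * `name` is Foo in the rules above; `target` is Bar and is only
 * consulted for refinements.
 */
static int
sketch_debug_name(char *buf, size_t len, enum sketch_class_kind kind,
                  const char *name, const char *target)
{
    switch (kind) {
      case SKETCH_REFINEMENT:
        return snprintf(buf, len, "<refinement %s of %s>", name, target);
      case SKETCH_SINGLETON_OF_CLASS:
        return snprintf(buf, len, "<singleton of %s>", name);
      case SKETCH_SINGLETON_OF_INSTANCE:
        return snprintf(buf, len, "<instance of %s>", name);
      case SKETCH_ANONYMOUS_SUBCLASS:
        return snprintf(buf, len, "<anonymous subclass of %s>", name);
      case SKETCH_NAMED:
      default:
        return snprintf(buf, len, "%s", name);
    }
}

/* Small usage check covering two of the rules. */
static void
sketch_debug_name_demo(void)
{
    char buf[64];
    sketch_debug_name(buf, sizeof(buf), SKETCH_SINGLETON_OF_CLASS, "Foo", NULL);
    assert(strcmp(buf, "<singleton of Foo>") == 0);
    sketch_debug_name(buf, sizeof(buf), SKETCH_NAMED, "Foo::Bar", NULL);
    assert(strcmp(buf, "Foo::Bar") == 0);
}
```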

Importantly, none of these strings contain any addresses in them (i.e.
no %p of VALUEs). Such addresses are arguably useless anyway; now that
the Ruby GC is a compacting GC, and with moves afoot to compact by
default, they are not even guaranteed to be stable from moment to
moment within the same process. However, they're _especially_ useless
for aggregating across different processes. The intended use for these
strings is to build up fully-qualified method names for profiling tools;
addresses in the class parts of those method names would just cause
under-aggregation.

This is a method analogue for Module#debug_name; it prints the name of
the method qualified with the name of the class it's on, as printed by
Module#debug_name. Thus, it gives a name for the method that is
guaranteed not to contain any addresses etc. and thus be suitable for
aggregation across processes in e.g. profilers.

This prints a thread backtrace using the same format as
Method#debug_name.

This commit adds some extra atomic operations to atomic.h:

* ATOMIC_PTR_SET & ATOMIC_SIZE_SET; these are like ATOMIC_SET (which
  already exists), but for `void *` and `size_t` types respectively.
  These do a store in a way that is a) guaranteed not to tear and b)
  ordered with respect to other stores using the atomic.h macros.
* ATOMIC_LOAD, ATOMIC_PTR_LOAD and ATOMIC_SIZE_LOAD; these do an atomic
  load operation and work with `ruby_atomic_t`, `void *`, and `size_t`
  respectively. Again, these are needed to perform variable reads that
  a) are guaranteed to read a valid, non-torn value and b) are ordered
  with respect to other loads/stores through atomic.h.
* ATOMIC_BARRIER, which issues a memory fence/barrier instruction and
  orders loads/stores before the barrier with respect to loads/stores
  after the barrier, even if those loads/stores are not done using the
  atomic.h macros.
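
To illustrate how these operations fit together, here is a rough sketch
of a publish/read pattern. It uses C11 `<stdatomic.h>` as a stand-in;
the real atomic.h may well be built on compiler builtins or
platform-specific calls instead. All `SKETCH_`/`sketch_` names and the
frame-list structure are assumptions, and the reader here runs in the
same process purely to keep the example runnable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical equivalents of the macros described above. */
#define SKETCH_ATOMIC_PTR_SET(var, val)  atomic_store(&(var), (void *)(val))
#define SKETCH_ATOMIC_PTR_LOAD(var)      atomic_load(&(var))
#define SKETCH_ATOMIC_SIZE_SET(var, val) atomic_store(&(var), (size_t)(val))
#define SKETCH_ATOMIC_SIZE_LOAD(var)     atomic_load(&(var))
#define SKETCH_ATOMIC_BARRIER()          atomic_thread_fence(memory_order_seq_cst)

/* Hypothetical data a profiler might want to read. */
struct sketch_frame_list {
    size_t count;
    const char *names[4];
};

static struct sketch_frame_list sketch_storage;
static void *_Atomic sketch_published = NULL;   /* NULL until the data is ready */
static _Atomic size_t sketch_generation = 0;

/* Writer: fill in plain fields, fence, then publish the pointer. */
static void
sketch_publish(void)
{
    sketch_storage.names[0] = "Foo#bar";
    sketch_storage.count = 1;
    SKETCH_ATOMIC_BARRIER();   /* plain stores above are ordered before the publish */
    SKETCH_ATOMIC_PTR_SET(sketch_published, &sketch_storage);
    SKETCH_ATOMIC_SIZE_SET(sketch_generation,
                           SKETCH_ATOMIC_SIZE_LOAD(sketch_generation) + 1);
}

/* Reader: load the pointer atomically, fence, then read the plain fields. */
static size_t
sketch_read_count(void)
{
    struct sketch_frame_list *list = SKETCH_ATOMIC_PTR_LOAD(sketch_published);
    if (list == NULL) return 0;
    SKETCH_ATOMIC_BARRIER();
    return list->count;
}

/* Small single-threaded usage check. */
static void
sketch_atomic_demo(void)
{
    sketch_publish();
    assert(sketch_read_count() == 1);
    assert(SKETCH_ATOMIC_SIZE_LOAD(sketch_generation) >= 1);
}
```

The barrier is what covers the plain (non-atomic) field writes: without
it, only the accesses made through the macros would be ordered with
respect to each other.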

The motivation for adding these is for use in the debug_external
structures; these structures are intended to be read from other
processes, and so it is important that this access is done through
instructions that guarantee valid/non-torn data is read by the external
process.

This commit introduces a "debug_external" interface for Ruby. This is a
block of memory inside a Ruby process that exposes information about the
program for consumption by external tools in a documented manner.

The entrypoint is the rb_debug_ext_section global variable, which is
stored in its own ELF/PE/MachO/etc section so that its address is
discoverable even in Ruby binaries which have had their symbols
stripped. Ruby will keep information in that structure up-to-date as the
program executes.
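
As a rough illustration of placing a global in its own section, the
sketch below uses the GCC/Clang `section` attribute. The section name
`rb_debug_ext`, the structure fields, and the magic value are all
assumptions rather than what this PR defines, and PE/MSVC builds would
need a different mechanism that is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout; the real structure is defined by the PR. */
struct sketch_debug_ext_section {
    uint32_t magic;     /* lets a reader sanity-check what it found */
    uint32_t version;   /* layout version of the fields that follow */
};

#define SKETCH_DEBUG_EXT_MAGIC 0x52424458u   /* arbitrary illustrative value */

#if defined(__APPLE__)
/* Mach-O section names take the form "segment,section". */
# define SKETCH_DEBUG_SECTION \
    __attribute__((used, section("__DATA,rb_debug_ext")))
#elif defined(__GNUC__) || defined(__clang__)
# define SKETCH_DEBUG_SECTION \
    __attribute__((used, section("rb_debug_ext")))
#else
# define SKETCH_DEBUG_SECTION   /* no dedicated section on other compilers */
#endif

/* `used` keeps the variable even if nothing in the program references it. */
SKETCH_DEBUG_SECTION
struct sketch_debug_ext_section sketch_debug_ext_section = {
    SKETCH_DEBUG_EXT_MAGIC,
    1
};

/* Small usage check: the variable behaves like any other global. */
static void
sketch_debug_section_demo(void)
{
    assert(sketch_debug_ext_section.magic == SKETCH_DEBUG_EXT_MAGIC);
    assert(sketch_debug_ext_section.version == 1);
}
```

On an ELF system, a tool could then locate the structure through the
section header table (for example as listed by `readelf -S`) rather
than through the symbol table, which is what makes the approach
workable for stripped binaries.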

The first piece of information stored in there is a list of Ruby
execution contexts (i.e. fibres & threads), and the current call stack
for each one; this is also introduced in this commit.