rbspy is a little complicated. I want other people to be able to contribute to it easily, so here is an architecture document to help you understand how it works.
Here’s what happens when you run rbspy snapshot --pid $PID. This is the simplest subcommand (it takes a PID and gets you the current stack trace from that PID), and if you understand how snapshot works, you can relatively easily understand how the rest of the rbspy subcommands work as well.
The implementation of the snapshot function in main.rs is really simple: just 6 lines of code.
The goal of this document is to explain how that code works behind the scenes.
fn snapshot(pid: pid_t) -> Result<(), Error> {
    let getter = initialize::initialize(pid)?;
    let trace = getter.get_trace()?;
    for x in trace.iter().rev() {
        println!("{}", x);
    }
    Ok(())
}
Our first goal is to create a struct (StackTraceGetter) which we can call .get_trace() on to get a stack trace. This struct contains a PID, a function, and the address in the target process of the current thread. The initialization code is somewhat complicated but has a simple interface: you give it a PID, and it returns a struct that you can call .get_trace() on:
let getter = initialize::initialize(pid)?;
let trace = getter.get_trace()?;
Here's what happens when you call initialize(pid).
Step 1: Find the Ruby version of the process. The code to do this is in a function called get_ruby_version.
Step 2: Find the address of the ruby_current_thread global variable. This address is the starting point for getting a stack trace from our Ruby process -- we start there every time. How we do this depends on 2 things -- whether the Ruby process we’re profiling has symbols, and the Ruby version (in 2.5.0+ there are some small differences).
If there are symbols, we find the address of the current thread using the symbol table (the current_thread_address_location_symbol_table function). This is pretty straightforward. We look up ruby_current_thread or ruby_current_execution_context_ptr depending on the Ruby version.
If there aren’t symbols, instead we use a heuristic (current_thread_address_location_search_bss) where we search through the .bss section of our binary’s memory for something that plausibly looks like the address of the current thread. This assumes that the address we want is in the .bss section somewhere. How this works:
- Find the address of the .bss section and read it from memory.
- Cast the .bss section to an array of usize (so an array of addresses).
- Iterate through that array and for every address run the is_maybe_thread function on that address. is_maybe_thread is a Ruby-version-specific function (we compile a different version of this function for every Ruby version). We'll explain this later.
- Return an address if is_maybe_thread returns true for any of them. Otherwise abort.
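
The search described above can be sketched roughly as follows. This is a minimal illustration, not rbspy's actual code: bytes_to_words and search_bss are hypothetical names, the .bss contents are assumed to have already been copied into a byte buffer, and the version-specific is_maybe_thread check is passed in as a closure.

```rust
/// Reinterpret a raw copy of the .bss section as native-endian,
/// pointer-sized words. Trailing bytes that do not fill a whole word
/// are ignored.
fn bytes_to_words(bytes: &[u8]) -> Vec<usize> {
    const WORD: usize = std::mem::size_of::<usize>();
    bytes
        .chunks_exact(WORD)
        .map(|chunk| {
            let mut word = [0u8; WORD];
            word.copy_from_slice(chunk);
            usize::from_ne_bytes(word)
        })
        .collect()
}

/// Return the first candidate word accepted by the predicate, or None
/// if nothing in the section looks like a thread address.
/// (Hypothetical signature: the real check also needs to read the
/// target process's memory.)
fn search_bss<F>(bss: &[u8], is_maybe_thread: F) -> Option<usize>
where
    F: Fn(usize) -> bool,
{
    bytes_to_words(bss)
        .into_iter()
        .find(|&candidate| is_maybe_thread(candidate))
}
```

Returning an Option leaves the "otherwise abort" decision to the caller, which can turn None into an error.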
Step 3: Get the right stack_trace function. We compile 30+ different functions to get stack traces (we'll explain this later). The code to decide which function to use is basically a huge match statement on the Ruby version.
"1.9.1" => self::ruby_1_9_1_0::get_stack_trace,
"1.9.2" => self::ruby_1_9_2_0::get_stack_trace,
"1.9.3" => self::ruby_1_9_3_0::get_stack_trace,
Step 4: Return the getter struct.
Now we're done! We return our StackTraceGetter struct.
pub fn initialize(pid: pid_t) -> Result<StackTraceGetter, Error> {
    let version = get_ruby_version_retry(pid).context("Couldn't determine Ruby version")?;
    debug!("version: {}", version);
    Ok(StackTraceGetter {
        pid: pid,
        current_thread_addr_location: os_impl::current_thread_address(pid, &version)?,
        stack_trace_function: stack_trace::get_stack_trace_function(&version),
    })
}
impl StackTraceGetter {
    pub fn get_trace(&self) -> Result<Vec<StackFrame>, MemoryCopyError> {
        let stack_trace_function = &self.stack_trace_function;
        stack_trace_function(self.current_thread_addr_location, self.pid)
    }
}
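
To make Steps 3 and 4 concrete, here is a small self-contained sketch of picking a function by version string and storing it in the getter as a function pointer. Everything in it is simplified and partly hypothetical: Pid, StackTraceFn and make_getter are made-up names, frames are plain strings instead of StackFrame, the per-version functions are stubs that read no memory, and returning None for an unknown version is an assumption about error handling.

```rust
// Hypothetical stand-ins for pid_t, StackFrame and MemoryCopyError.
type Pid = i32;
type StackTraceFn = fn(usize, Pid) -> Result<Vec<String>, String>;

// Stub per-version modules; in rbspy these are generated by macros.
mod ruby_1_9_2_0 {
    pub fn get_stack_trace(addr: usize, pid: i32) -> Result<Vec<String>, String> {
        Ok(vec![format!("1.9.2 layout: thread pointer at {:#x}, pid {}", addr, pid)])
    }
}

mod ruby_1_9_3_0 {
    pub fn get_stack_trace(addr: usize, pid: i32) -> Result<Vec<String>, String> {
        Ok(vec![format!("1.9.3 layout: thread pointer at {:#x}, pid {}", addr, pid)])
    }
}

/// Step 3: map a version string to the function compiled for it.
fn get_stack_trace_function(version: &str) -> Option<StackTraceFn> {
    match version {
        "1.9.2" => Some(ruby_1_9_2_0::get_stack_trace as StackTraceFn),
        "1.9.3" => Some(ruby_1_9_3_0::get_stack_trace as StackTraceFn),
        _ => None, // assumption: unknown versions are rejected
    }
}

/// The getter holds a PID, the thread address location, and a function.
pub struct StackTraceGetter {
    pid: Pid,
    current_thread_addr_location: usize,
    stack_trace_function: StackTraceFn,
}

impl StackTraceGetter {
    pub fn get_trace(&self) -> Result<Vec<String>, String> {
        (self.stack_trace_function)(self.current_thread_addr_location, self.pid)
    }
}

/// Step 4: bundle everything into the getter struct.
fn make_getter(pid: Pid, addr_location: usize, version: &str) -> Option<StackTraceGetter> {
    Some(StackTraceGetter {
        pid,
        current_thread_addr_location: addr_location,
        stack_trace_function: get_stack_trace_function(version)?,
    })
}
```

Because the version is resolved once during initialization, every later get_trace() call is just an indirect function call with no further version checks.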
Once we've initialized, all that remains is calling the get_trace function. How does that function work?
Like we said before -- we compile a different version of the code to get stack traces for every Ruby version. This is because every Ruby version has slightly different struct layouts.
The Ruby structs are defined in a ruby-bindings crate. All the code in that crate is autogenerated by bindgen, using a hacky script called bindgen.sh.
These functions are defined through a bunch of macros (4 different macros, for different ranges of Ruby versions) which implement get_stack_trace for every Ruby version. Each one uses the struct definitions for the right Ruby version.
There's a lot of code in ruby_version.rs but this is the core of how it works. First, it defines a $ruby_version module and inside that module uses bindings::$ruby_version which includes all the required struct definitions for that Ruby version.
Then it includes more macros which together make up the body of that module. This is because some functions are the same across all Ruby versions (like get_ruby_string) and some are different (like get_stack_frame which changes frequently because the way Ruby organizes that code changes a lot).
macro_rules! ruby_version_v_2_0_to_2_2(
    ($ruby_version:ident) => (
        pub mod $ruby_version {
            use bindings::$ruby_version::*;
            ...
            get_stack_trace!(rb_thread_struct);
            get_ruby_string!();
            get_cfps!();
            get_lineno_2_0_0!();
            get_stack_frame_2_0_0!();
            is_stack_base_1_9_0!();
        }
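
The pattern of one macro stamping out a module per Ruby version can be tried out in a much smaller setting. The sketch below is an illustration only: the bindings module is a hand-written stand-in for the bindgen output with invented struct layouts, and define_thread_struct_size and ruby_version are hypothetical macro names. It shows the same function body compiling against a different rb_thread_struct in each generated module.

```rust
// Hand-written stand-in for the autogenerated bindings: one module per
// Ruby version, each with its own (invented) struct layout.
#[allow(non_camel_case_types, dead_code)]
mod bindings {
    pub mod ruby_2_1_0 {
        #[repr(C)]
        pub struct rb_thread_struct {
            pub cfp: usize,
        }
    }
    pub mod ruby_2_5_0 {
        #[repr(C)]
        pub struct rb_thread_struct {
            pub extra: [usize; 2],
            pub cfp: usize,
        }
    }
}

// A shared helper macro: the body is identical for every version, but
// the struct it names resolves differently in each module.
macro_rules! define_thread_struct_size(
    ($thread_type:ident) => (
        pub fn thread_struct_size() -> usize {
            std::mem::size_of::<$thread_type>()
        }
    )
);

// One macro per range of versions: it creates the module, imports the
// matching bindings, and pulls in the helper macros.
macro_rules! ruby_version(
    ($ruby_version:ident) => (
        pub mod $ruby_version {
            use super::bindings::$ruby_version::*;
            define_thread_struct_size!(rb_thread_struct);
        }
    )
);

ruby_version!(ruby_2_1_0);
ruby_version!(ruby_2_5_0);
```

Calling ruby_2_1_0::thread_struct_size() and ruby_2_5_0::thread_struct_size() then gives different answers even though the function was written only once, which is why the version-specific code has to be generated at compile time.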