Initial release of rbspy
jvns committed Jan 22, 2018
0 parents commit f645348
Showing 75 changed files with 238,168 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
target
.ruby-version
85 changes: 85 additions & 0 deletions .travis.yml
@@ -0,0 +1,85 @@
# Adapted from rust-everywhere:
# https://github.com/japaric/rust-everywhere
#
# Changes:
# - no deb
# - no osx, 32-bit or ARM
# - darwin commented out for now
# - no ARM

language: rust
cache: cargo

services:
- docker

env:
  global:
    # This will be part of the release tarball
    - PROJECT_NAME=rbspy

# AFAICT There are a few ways to set up the build jobs. This one is not the DRYest but I feel is the
# easiest to reason about.
# NOTE Make *sure* you don't remove a reference (&foo) if you are going to dereference it (*foo)
matrix:
  include:
    # stable channel
    - os: linux
      rust: stable
      env: TARGET=x86_64-unknown-linux-musl
      dist: trusty
      sudo: required
      addons:
        apt:
          packages: &musl_packages
          - musl
          - musl-dev
          - musl-tools

before_install:
- export PATH="$PATH:$HOME/.cargo/bin"
- sudo apt-get update
- sudo apt-get -y install perl

install:
- bash ci/install.sh

script:
- bash ci/script.sh

before_deploy:
- bash ci/before_deploy.sh

deploy:
provider: releases
# *REALLY* not production ready!
prerelease: true
# To generate key:
# - Go to 'https://github.com/settings/tokens/new' and generate a Token with only the
# `public_repo` scope enabled
# - Call `travis encrypt $github_token` where $github_token is the token you got in the previous
# step and `travis` is the official Travis CI gem (see https://rubygems.org/gems/travis/)
# - Enter the "encrypted value" below
api_key:
secure: g3yZvnK3m4hW2MNk6Nm2qVkgjL/hnNrsWwqvO7HvvD7VmQGWX2LBpNQE3plQsXhZeELdwWJuvYE5fp/rgjl3xS6/fRw/vpCvvfWB/Meq/YVoPJyq+FjXFLonkONhnJlIqvGohkRWZrAg0K+DBcu0+Db0uuQrjIrUG2TPNfETCRYKj90eJYChYQhqdxr0tv4SywpKZhR6AZqkvKkOMItM2kuCePEshtDJACj4OxWFXzCMEri1JZhqqdCqpYYsq/hqCcBlkey6tp4ul0d9qcNSaCRbWZlJ4Tli2h/K7MsfLVesSifisBVE/cdNvu7RxF1Ei01tivkT8Up69LM9JZdiBcmOrveOJd7Hd0R69R3Z3cGYvJiQq2svh6fqMIY0E/mLTfyuc+khJdf+zgr9tX3Qqa3HORyCDaDNVYws0HXEVpfNvmHrLeG3DWbtuYcFpKDbvcmN2fSuqvvxdhHrrSSwGhzD/dr4v3zq+xJUIS3qtS1lpu/kCcGmT9n7i73bFOyEOClCukPa//61/2W0KdBYJKU+jJaKljMJ3rNSbjH5DzYxBAbWgcywSebJwmcGTwourKJZsBoInCWLK6QXHuHrm6VcGJPO01JDHNuAXS5t7CY0Ju+ft6vbtOEaWykC4fbns2EuxOeLKA1U+KpjP4mQdpr/fV4bpMVAJyY9QO/FgVo=
file_glob: true
file: ${PROJECT_NAME}-${TRAVIS_TAG}-${TARGET}.*
# don't delete the artifacts from previous phases
skip_cleanup: true
# deploy when a new tag is pushed
on:
# channel to use to produce the release artifacts
condition: $TRAVIS_RUST_VERSION = stable
tags: true

branches:
  only:
    # Pushes and PR to the master branch
    - master
    # IMPORTANT Ruby regex to match tags. Required, or travis won't trigger deploys when a new tag
    # is pushed. This regex matches semantic versions like v1.2.3-rc4+2016.02.22
    - /^v\d+\.\d+\.\d+.*$/

notifications:
  email:
    on_success: never
132 changes: 132 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,132 @@
# rbspy architecture

rbspy is a little complicated. I want other people to be able to contribute to it easily, so here is
an architecture document to help you understand how it works.

Here’s what happens when you run `rbspy snapshot --pid $PID`. This is the simplest subcommand (it takes a
PID and gets you the current stack trace from that PID), and if you understand how `snapshot` works
you can relatively easily understand how the rest of the `rbspy` subcommands work as well.

The implementation of the `snapshot` function in `main.rs` is really simple: just 6 lines of code.
The goal of this document is to explain how that code works behind the scenes.

```
fn snapshot(pid: pid_t) -> Result<(), Error> {
    let getter = initialize::initialize(pid)?;
    let trace = getter.get_trace()?;
    for x in trace.iter().rev() {
        println!("{}", x);
    }
    Ok(())
}
```

## Phase 1: Initialize. (`initialize.rs` + `address_finder.rs`)

Our first goal is to create a struct (`StackTraceGetter`) which we can call `.get_trace()` on to get
a stack trace. This struct contains a PID, a function, and the address in the target process of the
current thread. The initialization code is somewhat complicated but has a simple interface: you give
it a PID, and it returns the struct, ready to use:

```
let getter = initialize::initialize(pid)?;
getter.get_trace()?;
```

Here's what happens when you call `initialize(pid)`.

**Step 1**: **Find the Ruby version of the process**. The code to do this is in a function called
`get_ruby_version`.

**Step 2**: **Find the address of the `ruby_current_thread` global variable**. This address is the
starting point for getting a stack trace from our Ruby process -- we start there every time. How we
do this depends on 2 things -- whether the Ruby process we’re profiling has symbols, and the Ruby
version (in 2.5.0+ there are some small differences).

If there are symbols, we find the address of the current thread using the symbol table (the
`current_thread_address_location_symbol_table` function). This is pretty straightforward: we look
up `ruby_current_thread` or `ruby_current_execution_context_ptr` depending on the Ruby version.
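As a rough illustration of the "depending on the Ruby version" part, here is a minimal sketch of picking the symbol name. It assumes that the 2.5.0+ difference mentioned above is what selects `ruby_current_execution_context_ptr`; `parse_version` and `current_thread_symbol_name` are hypothetical helpers for this example, not functions from rbspy.

```rust
/// Parse "major.minor.patch" into a tuple we can compare (hypothetical helper).
fn parse_version(version: &str) -> Option<(u32, u32, u32)> {
    let mut parts = version.split('.').map(|part| part.parse::<u32>());
    let major = parts.next()?.ok()?;
    let minor = parts.next()?.ok()?;
    let patch = parts.next()?.ok()?;
    Some((major, minor, patch))
}

/// Pick the global symbol to look up in the symbol table.
/// Assumption: versions from 2.5.0 onward use the execution context pointer.
fn current_thread_symbol_name(version: &str) -> Option<&'static str> {
    let parsed = parse_version(version)?;
    if parsed >= (2, 5, 0) {
        Some("ruby_current_execution_context_ptr")
    } else {
        Some("ruby_current_thread")
    }
}

fn main() {
    for version in ["1.9.3", "2.4.2", "2.5.0"].iter() {
        println!("{} -> {:?}", version, current_thread_symbol_name(version));
    }
}
```

Returning `None` for a version string that doesn't parse keeps the "we don't know this Ruby" case explicit instead of silently guessing a symbol.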

If there **aren’t** symbols, instead we use a heuristic
(`current_thread_address_location_search_bss`) where we search through the `.bss` section of our
binary’s memory for something that plausibly looks like the address of the current thread. This
assumes that the address we want is in the `.bss` section somewhere. How this works:

* Find the address of the `.bss` section and read it from memory
* Cast the `.bss` section to an array of `usize` (so an array of addresses).
* Iterate through that array and for every address run the `is_maybe_thread` function on that
address. `is_maybe_thread` is a Ruby-version-specific function (we compile a different version of
this function for every Ruby version). We'll explain this later.
* Return an address if `is_maybe_thread` returns true for any of them. Otherwise abort.
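The steps above can be sketched roughly like this. It is a simplified illustration, not rbspy's actual code: the `.bss` contents are passed in as a byte slice instead of being read from another process, the predicate is a plain closure standing in for the version-specific `is_maybe_thread`, and the function name `search_bss` is made up for this example. It also assumes that what we want back is the location in `.bss` where the candidate was found.

```rust
use std::mem::size_of;

/// Reinterpret raw section bytes as native-endian, pointer-sized words.
/// Trailing bytes that don't fill a whole word are ignored.
fn bytes_to_words(bytes: &[u8]) -> Vec<usize> {
    bytes
        .chunks_exact(size_of::<usize>())
        .map(|chunk| {
            let mut buf = [0u8; size_of::<usize>()];
            buf.copy_from_slice(chunk);
            usize::from_ne_bytes(buf)
        })
        .collect()
}

/// Return the address (in the target process) of the first word in `.bss`
/// whose value passes the predicate, or `None` if nothing looks plausible.
fn search_bss<F>(bss_start: usize, bss_bytes: &[u8], is_maybe_thread: F) -> Option<usize>
where
    F: Fn(usize) -> bool,
{
    bytes_to_words(bss_bytes)
        .iter()
        .position(|&candidate| is_maybe_thread(candidate))
        .map(|index| bss_start + index * size_of::<usize>())
}

fn main() {
    // Fake `.bss` contents: four words, one of which we treat as "the thread".
    let words: [usize; 4] = [0, 0x1234, 0xdead_b000, 7];
    let bss_bytes: Vec<u8> = words.iter().flat_map(|w| w.to_ne_bytes().to_vec()).collect();

    match search_bss(0x60_0000, &bss_bytes, |candidate| candidate == 0xdead_b000) {
        Some(addr) => println!("plausible current thread pointer at {:#x}", addr),
        None => println!("no plausible thread address found"),
    }
}
```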

**Step 3**: **Get the right `stack_trace` function**. We compile 30+ different functions to get
stack traces (we'll explain why later). The code to decide which function to use is basically a huge
`match` expression on the Ruby version.

```
"1.9.1" => self::ruby_1_9_1_0::get_stack_trace,
"1.9.2" => self::ruby_1_9_2_0::get_stack_trace,
"1.9.3" => self::ruby_1_9_3_0::get_stack_trace,
```
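To make the dispatch concrete, here is a small self-contained sketch of the same idea: a `match` on the version string that returns a function pointer. The stub modules, the `StackTraceFn` alias, the `Vec<String>` return type and the `Option` wrapper are all assumptions made for this example; the real functions read memory from the target process.

```rust
/// Signature shared by every version-specific function in this sketch:
/// (address of the current thread pointer, PID) -> rendered frames.
type StackTraceFn = fn(usize, i32) -> Vec<String>;

// Stand-ins for the per-version modules; each would use its own struct layouts.
mod ruby_1_9_3_0 {
    pub fn get_stack_trace(_current_thread_addr: usize, _pid: i32) -> Vec<String> {
        vec!["frame read with 1.9.3 layouts".to_string()]
    }
}

mod ruby_2_5_0 {
    pub fn get_stack_trace(_current_thread_addr: usize, _pid: i32) -> Vec<String> {
        vec!["frame read with 2.5.0 layouts".to_string()]
    }
}

/// Map a Ruby version string to the function compiled for that version.
fn get_stack_trace_function(version: &str) -> Option<StackTraceFn> {
    let function: StackTraceFn = match version {
        "1.9.3" => ruby_1_9_3_0::get_stack_trace,
        "2.5.0" => ruby_2_5_0::get_stack_trace,
        _ => return None, // unknown version: let the caller decide what to do
    };
    Some(function)
}

fn main() {
    for version in ["1.9.3", "2.5.0", "0.0.1"].iter() {
        match get_stack_trace_function(version) {
            Some(get_stack_trace) => println!("{}: {:?}", version, get_stack_trace(0x1000, 1234)),
            None => println!("{}: unsupported Ruby version", version),
        }
    }
}
```

Because the choice is made once during initialization, every later sample is just a call through the stored function pointer.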

**Step 4**: **Return the `getter` struct**.

Now we're done! We return our `StackTraceGetter` struct.

```
pub fn initialize(pid: pid_t) -> Result<StackTraceGetter, Error> {
    let version = get_ruby_version_retry(pid).context("Couldn't determine Ruby version")?;
    debug!("version: {}", version);
    Ok(StackTraceGetter {
        pid: pid,
        current_thread_addr_location: os_impl::current_thread_address(pid, &version)?,
        stack_trace_function: stack_trace::get_stack_trace_function(&version),
    })
}

impl StackTraceGetter {
    pub fn get_trace(&self) -> Result<Vec<StackFrame>, MemoryCopyError> {
        let stack_trace_function = &self.stack_trace_function;
        stack_trace_function(self.current_thread_addr_location, self.pid)
    }
}
```

## Phase 2: Get stack traces (`ruby_version.rs`, `ruby-bindings/` crate, `bindgen.sh`)

Once we've initialized, all that remains is calling the `get_trace` function. How does that function
work?

Like we said before -- we compile a different version of the code to get stack traces for every Ruby
version. This is because every Ruby version has slightly different struct layouts.

The Ruby structs are defined in a `ruby-bindings` crate. All the code in that crate is autogenerated
by bindgen, using a hacky script called `bindgen.sh`.

The version-specific functions are defined through a bunch of macros (4 different macros, for
different ranges of Ruby versions) which implement `get_stack_trace` for every Ruby version. Each
one uses the struct definitions for the right Ruby version.

There's a lot of code in `ruby_version.rs` but this is the core of how it works. First, it defines a
`$ruby_version` module and inside that module uses `bindings::$ruby_version` which includes all the
required struct definitions for that Ruby version.

Then it includes **more** macros which together make up the body of that module. This is because
some functions are the same across all Ruby versions (like `get_ruby_string`) and some are different
(like `get_stack_frame` which changes frequently because the way Ruby organizes that code changes a
lot).

```
macro_rules! ruby_version_v_2_0_to_2_2(
    ($ruby_version:ident) => (
        pub mod $ruby_version {
            use bindings::$ruby_version::*;
            ...
            get_stack_trace!(rb_thread_struct);
            get_ruby_string!();
            get_cfps!();
            get_lineno_2_0_0!();
            get_stack_frame_2_0_0!();
            is_stack_base_1_9_0!();
        }
    )
);
```
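Here is a tiny, self-contained sketch of the same trick, in case the macro above is hard to picture: one macro body, expanded once per version, compiled against a different struct layout each time. The two `rb_thread_struct` layouts below are invented for illustration (the real ones are generated by bindgen), and `ruby_version_sketch`, `thread_struct_size` and `get_cfp` are hypothetical names.

```rust
// Invented layouts standing in for the bindgen-generated `ruby-bindings` crate.
#[allow(dead_code, non_camel_case_types)]
mod bindings {
    pub mod ruby_2_1_0 {
        #[repr(C)]
        pub struct rb_thread_struct {
            pub stack_size: usize,
            pub cfp: usize,
        }
    }
    pub mod ruby_2_2_0 {
        #[repr(C)]
        pub struct rb_thread_struct {
            pub self_value: usize, // extra field shifts everything after it
            pub stack_size: usize,
            pub cfp: usize,
        }
    }
}

// One body of code, expanded into one module per Ruby version.
macro_rules! ruby_version_sketch {
    ($ruby_version:ident) => {
        pub mod $ruby_version {
            // Same source text, but `rb_thread_struct` resolves to this version's layout.
            use super::bindings::$ruby_version::*;

            pub fn thread_struct_size() -> usize {
                std::mem::size_of::<rb_thread_struct>()
            }

            pub fn get_cfp(thread: &rb_thread_struct) -> usize {
                thread.cfp
            }
        }
    };
}

ruby_version_sketch!(ruby_2_1_0);
ruby_version_sketch!(ruby_2_2_0);

fn main() {
    let old = bindings::ruby_2_1_0::rb_thread_struct { stack_size: 1024, cfp: 0x7000 };
    let new = bindings::ruby_2_2_0::rb_thread_struct { self_value: 1, stack_size: 1024, cfp: 0x8000 };
    println!("2.1.0: {} bytes, cfp = {:#x}", ruby_2_1_0::thread_struct_size(), ruby_2_1_0::get_cfp(&old));
    println!("2.2.0: {} bytes, cfp = {:#x}", ruby_2_2_0::thread_struct_size(), ruby_2_2_0::get_cfp(&new));
}
```

The functions in the two generated modules look identical in source, but the compiler emits different field offsets for each, which is the reason for compiling one copy per Ruby version.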
17 changes: 17 additions & 0 deletions CODE_OF_CONDUCT.md
@@ -0,0 +1,17 @@
# rbspy code of conduct

Adapted from the [Rust code of conduct](https://www.rust-lang.org/conduct.html).

## Conduct

**Contact**: [julia@jvns.ca](mailto:julia@jvns.ca)

* We are committed to providing a friendly, safe and welcoming environment for all, regardless of level of experience, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, nationality, or other similar characteristic.
* On IRC/gitter, please avoid using overtly sexual nicknames or other nicknames that might detract from a friendly, safe and welcoming environment for all.
* Please be kind and courteous. There's no need to be mean or rude.
* Respect that people have differences of opinion and that every design or implementation choice carries a trade-off and numerous costs. There is seldom a right answer.
* Please keep unstructured critique to a minimum. If you have solid ideas you want to experiment with, make a fork and see how it works.
* We will exclude you from interaction if you insult, demean or harass anyone. That is not welcome behaviour. We interpret the term "harassment" as including the definition in the <a href="http://citizencodeofconduct.org/">Citizen Code of Conduct</a>; if you have any lack of clarity about what might be included in that concept, please read their definition. In particular, we don't tolerate behavior that excludes people in socially marginalized groups.
* Private harassment is also unacceptable. No matter who you are, if you feel you have been or are being harassed or made uncomfortable by a community member, please contact Julia immediately. Whether you're a regular contributor or a newcomer, we care about making this community a safe place for you and we've got your back.
* Likewise any spamming, trolling, flaming, baiting or other attention-stealing behaviour is not welcome.
