Skip to content
This repository
Browse code

Add a blog post for profiling JIT-ted code with OProfile

  • Loading branch information...
commit 6fe72d070369177e7761605a8c7e334f69eabb01 1 parent 6d2b3c8
Ryo Onodera authored March 20, 2013
379  web/_posts/2013-03-20-profiling-jitted-ruby-code-with-oprofile.markdown
Source Rendered
... ...
@@ -0,0 +1,379 @@
  1
+---
  2
+layout: post
  3
+title: Profiling JIT-ted Ruby code with OProfile
  4
+author: Ryo Onodera
  5
+---
  6
+
  7
+You can now profile JIT-ted Ruby code!
  8
+
  9
+Before:
  10
+
  11
+    samples  %        linenr info                 app name                 symbol name
  12
+    497      28.7117  instructions.cpp:82         rbx                      rubinius::MachineCode::interpreter(rubinius::State*, rubinius::MachineCode*, rubinius::InterpreterCallFrame*)
  13
+    460      26.5742  (no location information)   anon (tgid:4118 range:0xb581c000-0xb589c000) anon (tgid:4118 range:0xb581c000-0xb589c000)
  14
+      # => The dark world of JIT-ted Ruby code
  15
+    158       9.1277  jit_util.cpp:787            rbx                      rbx_set_local_depth
  16
+    113       6.5280  block_environment.cpp:365   rbx                      rubinius::BlockEnvironment::call(rubinius::State*, rubinius::CallFrame*, rubinius::Arguments&, int)
  17
+    102       5.8925  (no location information)   rbx                      __x86.get_pc_thunk.bx
  18
+    93        5.3726  jit_util.cpp:848            rbx                      rbx_push_local_depth
  19
+    71        4.1017  variable_scope.cpp:108      rbx                      rubinius::VariableScope::set_local(int, rubinius::Object*)
  20
+    68        3.9284  block_environment.cpp:62    rbx                      rubinius::BlockEnvironment::invoke(rubinius::State*, rubinius::CallFrame*, rubinius::BlockEnvironment*, rubinius::Arguments&, rubinius::BlockInvocation&)
  21
+    52        3.0040  variable_scope.cpp:126      rbx                      rubinius::VariableScope::get_local(int)
  22
+    47        2.7152  variable_scope.cpp:100      rbx                      rubinius::VariableScope::set_local(rubinius::State*, int, rubinius::Object*)
  23
+    45        2.5997  kind_of.hpp:153             rbx                      bool rubinius::kind_of<rubinius::Fiber>(rubinius::Object const*)
  24
+    24        1.3865  shared_state.cpp:295        rbx                      rubinius::SharedState::checkpoint(rubinius::ManagedThread*)
  25
+
  26
+After:
  27
+
  28
+    samples  %        linenr info                 app name                 symbol name
  29
+    3468     27.0388  instructions.cpp:82         rbx                      rubinius::MachineCode::interpreter(rubinius::State*, rubinius::MachineCode*, rubinius::InterpreterCallFrame*)
  30
+    3048     23.7642  loop.rb:7                   5060.jo                  _X_Object#forever$block@1
  31
+      # => Wow, the name of JIT-ted Ruby code is shown along side C++ function symbols!
  32
+    1044      8.1397  jit_util.cpp:848            rbx                      rbx_push_local_depth
  33
+    947       7.3834  (no location information)   rbx                      __x86.get_pc_thunk.bx
  34
+    916       7.1417  jit_util.cpp:787            rbx                      rbx_set_local_depth
  35
+    781       6.0892  block_environment.cpp:365   rbx                      rubinius::BlockEnvironment::call(rubinius::State*, rubinius::CallFrame*, rubinius::Arguments&, int)
  36
+    639       4.9821  block_environment.cpp:62    rbx                      rubinius::BlockEnvironment::invoke(rubinius::State*, rubinius::CallFrame*, rubinius::BlockEnvironment*, rubinius::Arguments&, rubinius::BlockInvocation&)
  37
+    421       3.2824  variable_scope.cpp:108      rbx                      rubinius::VariableScope::set_local(int, rubinius::Object*)
  38
+    389       3.0329  variable_scope.cpp:126      rbx                      rubinius::VariableScope::get_local(int)
  39
+    326       2.5417  variable_scope.cpp:100      rbx                      rubinius::VariableScope::set_local(rubinius::State*, int, rubinius::Object*)
  40
+    288       2.2454  shared_state.cpp:295        rbx                      rubinius::SharedState::checkpoint(rubinius::ManagedThread*)
  41
+    283       2.2065  kind_of.hpp:153             rbx                      bool rubinius::kind_of<rubinius::Fiber>(rubinius::Object const*)
  42
+
  43
+In short, you can tell how the machine code that the Rubinius JIT generates
  44
+performs.
  45
+
  46
+Not satisfied yet? Even annotated profile is supported! That means you can even
  47
+know how much time it spends on each line of Ruby code or even on each CPU
  48
+instruction:
  49
+
  50
+    /* 
  51
+     * Total samples for file : "/tmp/loop.rb"
  52
+     * 
  53
+     *   5658 19.9436
  54
+     */
  55
+
  56
+
  57
+       718  2.5308 :def increment(i)
  58
+       482  1.6990 :  i + 1
  59
+                   :end
  60
+                   :
  61
+                   :def forever
  62
+                   :  i = 0
  63
+      2933 10.3384 :  loop do /* _X_Object#forever$block@1 total:   6990 24.6387 */
  64
+      1525  5.3754 :    i = increment(i)
  65
+                   :  end
  66
+                   :end
  67
+                   :
  68
+                   :forever
  69
+
  70
+We're using [OProfile](http://oprofile.sourceforge.net/news/), a profiling
  71
+software. In this blog post, I'll show you how to profile using it!
  72
+
  73
+### What's OProfile?
  74
+
  75
+It's a very useful profiling tool available on Linux. It's a sampling-based
  76
+one. That means there is absolutely no change to Rubinius and your Ruby code to
  77
+profile. Just run it as you normally do. Also, the overhead of profiling is
  78
+minimal.
  79
+
  80
+This is contrasted to measuring-based profiling. Rubinius's built-in profiler
  81
+and ruby-prof are both examples of measuring profilers. This means the overhead
  82
+is much bigger and can skew results because of that.
  83
+
  84
+Basically, it works by reporting how many individual profiled items are sampled
  85
+compared to the overall total samples. It doesn't measure elapsed time. It's
  86
+much like top command's individual processes' CPU % usage with far greater
  87
+flexibility. The actual profiled items can be any of C/C++ libraries, C/C++
  88
+functions, C/C++ source code lines, or CPU instructions.
  89
+
  90
+So, OProfile can't usually profile Ruby code because it works on CPU
  91
+instruction level. From OProfile, it can't tell which Ruby source code line
  92
+Rubinius currently executing from the CPU instructions. However, it can profile
  93
+JIT-ted Ruby code because Rubinius compiles it very down into the CPU
  94
+instructions by definition.
  95
+
  96
+OProfile works as a Linux kernel module. So, it's supported only for Linux.
  97
+Sadly, Ubuntu's OProfile and LLVM have bugs relating to this feature.
  98
+Apparently, there is no one using this. So you can say, we are really on the
  99
+cutting edge. ;)
  100
+
  101
+Anyway, we must work around these problems. But how? There are a few options.
  102
+:)
  103
+
  104
+### Setup (PPA: the super simple way; Ubuntu 12.10 only)
  105
+
  106
+I prepared a [PPA](https://launchpad.net/~ryoqun/+archive/ppa) just for you.
  107
+Add it to your system. To be specific, run this:
  108
+
  109
+    $ sudo add-apt-repository ppa:ryoqun/ppa
  110
+    $ sudo apt-get update
  111
+    $ sudo apt-get install oprofile llvm-3.1
  112
+    $ sudo apt-get dist-upgrade # Upgrade preinstalled libllvm3.1 to the PPA
  113
+
  114
+By default, Rubinius doesn't use system-provided LLVM, so re-configure Rubinius
  115
+to use it and re-build Rubinius:
  116
+
  117
+    $ cd /path/to/rubinius-git-repository
  118
+    $ rake clean
  119
+    $ ./configure --llvm-config llvm-config-3.1
  120
+    $ rake
  121
+
  122
+Done!
  123
+
  124
+### Setup (manual build: the hard way)
  125
+
  126
+If you really want to build LLVM and OProfile manually or if you're using an
  127
+other Linux distribution and the distribution doesn't provide OProfile-enabled
  128
+ LLVM packages, manually build LLVM and OProfile like this:
  129
+
  130
+(I tested this on Ubuntu 12.10. Minor adjustments may be needed to build on
  131
+your environment)
  132
+
  133
+Build and Install OProfile:
  134
+
  135
+    $ sudo apt-get build-dep oprofile # do equivalent thing on your distro.
  136
+    $ cd /path/to/working-dir-to-build-things
  137
+    $ wget http://prdownloads.sourceforge.net/oprofile/oprofile-0.9.8.tar.gz
  138
+    $ tar -xf oprofile-0.9.8.tar.gz
  139
+    $ cd oprofile-0.9.8
  140
+    $ ./autogen.sh
  141
+    $ ./configure --prefix /usr # LLVM has a build issue. So install to /usr
  142
+    $ make
  143
+    $ sudo make install
  144
+    $ adduser oprofile # this is needed for profiling JITted code
  145
+    $ opreport --version
  146
+      # => opreport: oprofile 0.9.8 compiled on Mar  8 2013 00:57:08
  147
+
  148
+Build LLVM with OProfile support enabled and rebuild Rubinius:
  149
+
  150
+    $ sudo apt-get build-dep llvm # do equivalent thing on your distro.
  151
+    $ cd /path/to/working-dir-to-build-things
  152
+    $ wget http://llvm.org/releases/3.2/llvm-3.2.src.tar.gz
  153
+    $ tar -xf llvm-3.2.src.tar.gz
  154
+    $ cd llvm-3.2.src
  155
+    $ ./configure --enable-optimized --disable-assertions --with-oprofile
  156
+    $ make
  157
+    $ sudo make install
  158
+
  159
+If the compilation of `OProfileWrapper.cpp` fails like this:
  160
+
  161
+    OProfileWrapper.cpp: In member function ‘bool llvm::OProfileWrapper::checkForOProfileProcEntry()’:
  162
+    OProfileWrapper.cpp:141:62: error: ‘read’ was not declared in this scope
  163
+    OProfileWrapper.cpp:142:24: error: ‘close’ was not declared in this scope
  164
+
  165
+Apply this patch, then, re-`make` and continue:
  166
+
  167
+    diff --git a/lib/ExecutionEngine/OProfileJIT/OProfileWrapper.cpp b/lib/ExecutionEngine/OProfileJIT/OProfileWrapper.cpp
  168
+    index d67f537..7c0d395 100644
  169
+    --- a/llvm-3.1-3.1/lib/ExecutionEngine/OProfileJIT/OProfileWrapper.cpp
  170
+    +++ b/llvm-3.1-3.1/lib/ExecutionEngine/OProfileJIT/OProfileWrapper.cpp
  171
+    @@ -29,6 +29,7 @@
  172
+     #include <dirent.h>
  173
+     #include <sys/stat.h>
  174
+     #include <fcntl.h>
  175
+    +#include <unistd.h>
  176
+     
  177
+     namespace {
  178
+     
  179
+
  180
+Phew, finally rebuild Rubinius!:
  181
+
  182
+    $ cd /path/to/rubinius-git-repository
  183
+    $ rake clean
  184
+    $ rm -rf vendor/llvm # If you build Rubinius with vendorized LLVM.
  185
+    $ ./configure
  186
+    $ rake
  187
+
  188
+### Start OProfile
  189
+
  190
+OProfile can be configured by a command called `opcontrol`, not by a
  191
+configuration file, which are common.
  192
+
  193
+    $ sudo opcontrol --deinit
  194
+    $ sudo modprobe oprofile timer=1      # Needed only inside VirtualBox VMs
  195
+    $ echo 0 | sudo tee /proc/sys/kernel/nmi_watchdog
  196
+    $ sudo opcontrol --no-vmlinux
  197
+    $ sudo opcontrol --separete all
  198
+    $ sudo opcontrol --start
  199
+    $ sudo opcontrol --dump               # Flushes buffered raw profile data
  200
+    $ opreport --merge all --threshold 1  # Prints the report of profile
  201
+
  202
+Setup is complete if you see an output from `opreport` like this:
  203
+
  204
+    CPU: CPU with timer interrupt, speed 1858.39 MHz (estimated)
  205
+    Profiling through timer interrupt
  206
+              TIMER:0|
  207
+      samples|      %|
  208
+    ------------------
  209
+        92127 67.6157 no-vmlinux
  210
+        21920 16.0880 rbx
  211
+         7950  5.8348 libc-2.15.so
  212
+         4068  2.9857 runner
  213
+         3429  2.5167 libstdc++.so.6.0.17
  214
+         2139  1.5699 cc1
  215
+         1416  1.0393 vm
  216
+
  217
+Congratulations!
  218
+
  219
+### Run Ruby code!
  220
+
  221
+To annotate Ruby code correctly, your current directory must be the top
  222
+directory of the Rubinius git repository:
  223
+
  224
+    $ cd /path/to/rubinius-git-repository
  225
+    $ ./bin/benchmark ./benchmark/core/hash/bench_access.rb
  226
+
  227
+### Generate profile report
  228
+
  229
+Let's check the profile report of the above benchmark.
  230
+
  231
+NOTE: try to run `opcontrol --dump` at least once while running Rubinius if
  232
+JIT-ted Ruby code doesn't show in the profile report. OProfile automatically
  233
+dumps periodically, but if your Ruby process terminates too quickly, you
  234
+should manually run `opcontrol --dump` while it's still alive.
  235
+
  236
+    $ sudo opcontrol --dump && sudo opjitconv /var/lib/oprofile/ 0 0
  237
+    $ opreport --merge all --threshold 1 image:./bin/rbx --symbols --debug-info \
  238
+        > /tmp/hash-access-symbols
  239
+    $ opannotate --merge all --threshold 0.5 image:./bin/rbx --source \
  240
+        > /tmp/hash-access-source
  241
+
  242
+I'll omit but you can generate profile report of annotated assembly by passing
  243
+`--assembly` to `opannotate` instead of `--source`.
  244
+
  245
+### Reset current profile
  246
+
  247
+By default, OProfile keeps the profile data indefinitely. To reset it, run
  248
+this:
  249
+
  250
+    $ sudo opcontrol --reset
  251
+
  252
+### How to read the profile result
  253
+
  254
+Open `/tmp/hash-access-symbols`, the content should be like this:
  255
+
  256
+    CPU: CPU with timer interrupt, speed 1860.42 MHz (estimated)
  257
+    Profiling through timer interrupt
  258
+    samples  %        linenr info                 app name                 symbol name
  259
+    3571     31.5990  (no location information)   no-vmlinux               /no-vmlinux
  260
+    619       5.4774  bench_access.rb:16          8295.jo                  _X_Object#__block__$block@7
  261
+    599       5.3004  bench_access.rb:24          8295.jo                  _X_Object#__block__$block@11
  262
+    531       4.6987  bench_access.rb:8           8295.jo                  _X_Object#__block__$block@1
  263
+    457       4.0439  (no location information)   rbx                      __x86.get_pc_thunk.bx
  264
+    318       2.8139  jit_util.cpp:787            rbx                      rbx_set_local_depth
  265
+    298       2.6369  bench_access.rb:42          8295.jo                  _X_Object#__block__$block@20
  266
+    277       2.4511  object.cpp:362              rbx                      rubinius::Object::hash(rubinius::State*)
  267
+    236       2.0883  jit_primitives.cpp:4494     rbx                      jit_stub_object_hash
  268
+    236       2.0883  tuple.cpp:36                rbx                      rubinius::Tuple::put(rubinius::State*, int, rubinius::Object*)
  269
+    222       1.9644  inline_cache.cpp:696        rbx                      rubinius::InlineCache::check_cache_poly(rubinius::State*, rubinius::InlineCache*, rubinius::CallFrame*, rubinius::Arguments&)
  270
+    197       1.7432  jit_util.cpp:848            rbx                      rbx_push_local_depth
  271
+    187       1.6547  variable_scope.cpp:126      rbx                      rubinius::VariableScope::get_local(int)
  272
+    179       1.5839  compiledcode.cpp:212        rbx                      rubinius::CompiledCode::specialized_executor(rubinius::State*, rubinius::CallFrame*, rubinius::Executable*, rubinius::Module*, rubinius::Arguments&)
  273
+    162       1.4335  hash.rb:220                 8295.jo                  _X_Hash#key_index@15
  274
+    147       1.3008  hash.rb:325                 8295.jo                  _X_Hash#values_at$block@13
  275
+    143       1.2654  object.cpp:400              rbx                      rubinius::Object::hash_prim(rubinius::State*)
  276
+    137       1.2123  (no location information)   libc-2.15.so             /lib/i386-linux-gnu/libc-2.15.so
  277
+    127       1.1238  variable_scope.cpp:100      rbx                      rubinius::VariableScope::set_local(rubinius::State*, int, rubinius::Object*)
  278
+    111       0.9822  inline_cache.cpp:576        rbx                      rubinius::InlineCache::check_cache_reference(rubinius::State*, rubinius::InlineCache*, rubinius::CallFrame*, rubinius::Arguments&)
  279
+    108       0.9557  vm.cpp:175                  rbx                      rubinius::VM::new_object_typed_dirty(rubinius::Class*, unsigned int, rubinius::object_type)
  280
+    99        0.8760  vm.cpp:207                  rbx                      rubinius::VM::new_young_tuple_dirty(unsigned int)
  281
+    97        0.8583  array18.rb:6                8295.jo                  _X_Array#map@12
  282
+    93        0.8229  integer.cpp:84              rbx                      rubinius::Integer::from(rubinius::State*, int)
  283
+    90        0.7964  hash.rb:137                 8295.jo                  _X_Hash#[]@23
  284
+    76        0.6725  array.cpp:205               rbx                      rubinius::Array::set(rubinius::State*, int, rubinius::Object*)
  285
+    72        0.6371  linkedlist.cpp:27           rbx                      LinkedList::remove(LinkedList::Node*)
  286
+    68        0.6017  tuple.cpp:64                rbx                      rubinius::Tuple::create(rubinius::State*, int)
  287
+    57        0.5044  method_primitives.cpp:12327 rbx                      rubinius::Primitives::object_hash(rubinius::State*, rubinius::CallFrame*, rubinius::Executable*, rubinius::Module*, rubinius::Arguments&)
  288
+
  289
+As you can guess, symbol names beginning with `_X_` is the JIT-ted Ruby code.
  290
+Many Ruby benchmark blocks are JIT-ted (like
  291
+`_X_Object#__block__$block@7` at `bench_access.rb:16`) and listed near the top.
  292
+Also, `_X_Array#map@12` is a JIT-ted code of `Array#map`. I'll explain the
  293
+format of the report using it as an example:
  294
+
  295
+    97        0.8583  array18.rb:6                8295.jo                  _X_Array#map@12
  296
+
  297
+`97` is the number of counts OProfile found what Rubinius was executing, while
  298
+periodically sampling it.
  299
+
  300
+`0.8583` is the percentage to the total number of samples OProfile collected.
  301
+Note that summing all entries up doesn't equal to 100%, because `opreport` only
  302
+reported the top part of whole profile result (by `--threshold 0.5`).
  303
+
  304
+[`array18.rb:6`](https://github.com/rubinius/rubinius/blob/1d7d7b2e2880478776476089d4dd93fd97aff122/kernel/bootstrap/array18.rb#L6)
  305
+is the source location where this method is defined (this is same as
  306
+`Method#source_location`).
  307
+
  308
+`8295.jo` is a special `app name` for JIT-ted code. Usually `app name` is the
  309
+name of file C/C++ functions reside in (shared libraries or executables).
  310
+So, for normal Rubinius' C++ functions, `app name` is just `rbx` because they
  311
+reside in `./bin/rbx`. For JIT-ted Ruby code, there can be no meaningful `app
  312
+name`, because JIT-ted Ruby code is generated at runtime and they reside in no
  313
+file, but only in memory. So, OProfile uses `*.jo` as `app name` in such cases.
  314
+And, `8295` means this is profiled when running the process of PID 8295.
  315
+
  316
+`_X_Array#map@12` is the name of symbol for this entry of profile report
  317
+(duh!).
  318
+
  319
+Open `/tmp/hash-access-source`, the content should be like this (I'll omit some
  320
+unimportant part, because this file is big):
  321
+
  322
+    /* 
  323
+     * Total samples for file : "kernel/common/hash18.rb"
  324
+     * 
  325
+     *   4870 12.8904
  326
+     */
  327
+
  328
+    <credited to line zero>     19  0.0503 :
  329
+                   :# -*- encoding: us-ascii -*-
  330
+                   :
  331
+                   :class Hash
  332
+                   :
  333
+                   :  include Enumerable
  334
+                   :
  335
+    ...
  336
+                   :
  337
+                   :  class Bucket
  338
+                   :    attr_accessor :key
  339
+                   :
  340
+       633  1.6755 :    def match?(key, key_hash) /* _X_Hash::Bucket#match?@16     75  0.1985, _X_Hash::Bucket#match?@16     45  0.1191, total:    120  0.3176 */
  341
+        40  0.1059 :      case key
  342
+       218  0.5770 :      when Symbol, Fixnum
  343
+       319  0.8444 :        return key.equal?(@key)
  344
+                   :      end
  345
+                   :
  346
+                   :      @key_hash == key_hash and (Rubinius::Type::object_equal(key, @key) or key.eql?(@key))
  347
+                   :    end
  348
+                   :  end
  349
+    ...
  350
+                   :end
  351
+    ...
  352
+
  353
+You might wonder why `Hash::Bucket#match?` has so many counts while this didn't
  354
+appear in the report for symbols. The reason is that the method is inlined to
  355
+each its callers (maybe many top-appearing JIT-ted Ruby blocks). How cool this
  356
+is! You can really know which Ruby code is actually taking too much time.
  357
+
  358
+### My last favor
  359
+
  360
+I wish the official Ubuntu packages are fixed and there is no need to add any
  361
+PPA. I want it-just-works. I reported affecting bugs to Ubuntu's bug tracking
  362
+system. So, please vote it up for fixing (vote them with "affects you?",
  363
+please!!):
  364
+
  365
+- [Bug 1154025](https://launchpad.net/bugs/1154025) Install PIC version of libbfd.a
  366
+- [Bug 1148682](https://launchpad.net/bugs/1148682) symbol lookup error: /usr/lib/libopagent.so: undefined symbol: bfd_init
  367
+- [Bug 1148529](https://launchpad.net/bugs/1148529) OProfile support is disabled
  368
+
  369
+### Further profiling
  370
+
  371
+There is new profiling tool on Linux: `perf` and `operf` (OProfile's new
  372
+command). I don't know them well, but certainly they will be useful for
  373
+profiling Rubinius.
  374
+
  375
+As a last note, this is partially based on [Unladen Swallow's
  376
+information](http://code.google.com/p/unladen-swallow/wiki/UsingOProfile)
  377
+of profiling LLVM-based JIT-ted code by OProfile.
  378
+
  379
+Happy profiling!

0 notes on commit 6fe72d0

Please sign in to comment.
Something went wrong with that request. Please try again.