Updated RubiniusVM metrics for Rubinius 2.3.0. #197

yorickpeterse · 2014-11-10T14:25:01Z

The commit 6038c791526f48b7d1b18d663432967a1e176d0d says it all. I'm currently deploying this to a production application to see how things look when presented in the RPM web interface. I'll update this pull-request once things look good from my end.

yorickpeterse · 2014-11-10T14:46:36Z

Worth mentioning: looking at the data I'm having some doubts about my heap_free calculation being correct. I'll do some extra digging into this.

yorickpeterse · 2014-11-10T15:39:00Z

Another note: I'll do a rebase once I've figured things out properly.

jasonrclark · 2014-11-10T23:10:51Z

Thanks for sending this along @yorickpeterse! Let me know once you've got things figured, and I'll be happy to pull the changes in. We'll probably get Rubinius in our CI bumped to 2.3 next week, so that'll be a good time to add your changes if they're available.

yorickpeterse · 2014-11-24T17:07:33Z

Allrighty, I rebased my commits together and this is ready to be merged from my end.

jasonrclark · 2014-11-24T22:13:02Z

Thanks @yorickpeterse. We're still pending on getting our CI up and running with rbx 2.3, so this will likely get into the newrelic_rpm release following 3.9.8 which is due out very soon.

Very much appreciate your help on getting this all lined up right! ✨

yorickpeterse · 2014-12-15T20:55:11Z

Any updates on this? I'll be moving over more applications to Rubinius in the coming weeks so it would be nice having this in as it saves me from monkey-patching this in every application.

benweint · 2014-12-15T21:14:07Z

Hey @yorickpeterse!

We use ruby-build to build the Rubies that we test against in CI, so until rbx-2.3, and rbx-2.4 are available there, we're kind of blocked. Work on that seems to be happening in rbenv/ruby-build#664 and rbenv/ruby-build#664.

yorickpeterse · 2015-01-14T23:56:52Z

With ruby-build now supporting Rubinius 2.3 & friends (rbenv/ruby-build#664 (comment)) what's the status of this?

jasonrclark · 2015-01-15T16:38:02Z

Hey @yorickpeterse. Had seen that but hadn't gotten time yet to update our CI boxes. Will try to do that, probably first thing next week, and then get this merged.

We do really appreciate the work, and sorry for all the delay in getting in pulled in!

yorickpeterse · 2015-01-15T17:02:34Z

No problem, just checking up on this :)

yorickpeterse · 2015-02-05T13:52:42Z

Any news on this, anything I can help with?

benweint · 2015-02-05T16:09:46Z

Hi @yorickpeterse! This has been merged to our internal development branch and is just awaiting release, which should be coming soon. We'll write back here and let you know when it's released.

benweint · 2015-02-12T19:15:44Z

Hey @yorickpeterse - we finally got around to actually trying this out with a test Rubinius app, and it looks like we're hitting a Rubinius bug of some kind related to Bignum. Here's what happens:

The sampler runs once, and saves an initial snapshot. In that snapshot, total_allocated_object is recorded with a value of 0, represented as a Bignum (I'm not sure why 0 is being represented as a Bignum here).
The sampler runs a second time, and records a larger value for total_allocated_object into a second snapshot. This value is represented as a Fixnum rather than a Bignum.
We attempt to take a delta for each of the fields in the two snapshots, and so end up doing Fixnum(N) - Bignum(0), and this fails with an exception.
We catch the exception, and halt the VM sampler because of it, so it never runs again.

The exception that occurs looks like this:

[02/12/15 11:01:32 -0800 koan.local (95039)] WARN : TypeError: Tried to use non-reference value 0x1 as type Bignum (10)
[02/12/15 11:01:32 -0800 koan.local (95039)] DEBUG : Debugging backtrace:
/Users/ben/src/ruby_agent/lib/new_relic/agent/samplers/vm_sampler.rb:76:in `record_delta'
  /Users/ben/src/ruby_agent/lib/new_relic/agent/samplers/vm_sampler.rb:114:in `poll'
  /Users/ben/src/ruby_agent/lib/new_relic/agent/sampler_collection.rb:41:in `poll_samplers'
<...snip...>

I can reproduce this in an IRB session on rbx-2.5.0 as follows (the coerce weirdness was the only way I could find to produce a Bignum with a value of 0):

irb(main):010:0> 9000 - (0x3FFFFFFFFFFFFFFF+1).coerce(0).first
TypeError: Tried to use non-reference value 0x1 as type Bignum (10)
    from (irb):10
    from kernel/common/block_environment.rb:53:in `call_on_instance'
    from kernel/common/eval.rb:176:in `eval'
    from kernel/common/kernel.rb:510:in `loop'
    from kernel/bootstrap/proc.rb:20:in `call'
    from kernel/common/throw_catch.rb:30:in `catch'
    from kernel/common/throw_catch.rb:8:in `register'
    from kernel/common/throw_catch.rb:29:in `catch'
    from kernel/bootstrap/proc.rb:20:in `call'
    from kernel/common/throw_catch.rb:30:in `catch'
    from kernel/common/throw_catch.rb:8:in `register'
    from kernel/common/throw_catch.rb:29:in `catch'
    from /Users/ben/.gem/rbx/2.1.0/gems/rubysl-irb-2.1.1/bin/irb:12:in `__script__'
    from kernel/common/kernel.rb:497:in `load'
    from /Users/ben/.gem/rbx/2.1.0/bin/irb:23:in `__script__'
    from kernel/delta/code_loader.rb:66:in `load_script'
    from kernel/delta/code_loader.rb:152:in `load_script'
    from kernel/loader.rb:645:in `script'

On MRI (2.1.4, haven't tried others), this appears to work correctly:

irb(main):001:0> 9000 - (0x3FFFFFFFFFFFFFFF+1).coerce(0).first
=> 9000
irb(main):002:0> (0x3FFFFFFFFFFFFFFF+1).coerce(0).first.class
=> Bignum

So, two questions for you:

Is it intended that we'd ever get a Bignum value of 0 for total_allocated_object?
Does the subtraction issue that I can reproduce in IRB seem worthy of an rbx bug report to you?

We're quite close to release, so I don't know if we're going to be able to get this fixed before then, but hopefully we can get it fixed in a follow-up release. Sorry for not finding this sooner!

yorickpeterse · 2015-02-12T19:34:44Z

Huh, that's interesting as I don't recall having any of these problems when running the code in some of my applications. Either way, no, the metrics used for total_allocated_object should be returned as a Fixnum unless you would have more than 4611686018427387903 objects allocated. In an empty IRB session running Rubinius::Metrics.data['memory.large.objects.total'] returns 2185 for me, so getting it to overflow to a Bignum would be quite the challenge.

Is there some script I can run to reproduce this problem?

Does the subtraction issue that I can reproduce in IRB seem worthy of an rbx bug report to you?

I'm not 100% sure but it never hurts reporting it as an issue :)

benweint · 2015-02-12T21:30:59Z

@yorickpeterse yeah, there's something strange going on with the query against Rubinius::Metrics. In an irb session, I can reliably get a non-zero Fixnum out of Rubinius::Metrics.data[:'memory.large.objects.total'], but when running from a script, I cannot. You can try the script in this gist.

For me, the first ~10 calls to Rubinius::Metrics.data return 0-valued Bignums using this script.

After that, the object allocation counter never increments. Perhaps I'm misunderstanding what memory.large.objects.total is meant to represent, or I'm not actually triggering object allocations in the script due to some optimization in Rubinius? Just so we're on the same page, is there some documentation that describes exactly what memory.large.objects.total is meant to represent? I'm operating under the assumption that it's a monotonically increasing counter the is incremented once for each object allocation in the host process.

benweint · 2015-02-12T21:39:00Z

(FWIW, I'm running on rbx-2.5.0, but I see there's a newer version - I'll try on the latest and see if this behavior still manifests.)

yorickpeterse · 2015-02-12T21:45:49Z

The specifics on large/young/etc objects can be found at http://rubini.us/doc/en/memory-system/object-layout/. The gist of it is that large objects are, well, larger than regular ones. As such simply running Object.new won't increment this metric if I'm not mistaken. However, the counter simply being 0 and returning a Bignum seems like a bug in the underlying metrics code. I tested this on 2.5.2 which also has the problem. Unfortunately I can't try master since @brixen broke it for Linux :P

Which reminds me, the current metric used for the total amount of allocated objects in this PR is most likely wrong as it only counts the large objects. Since I'm not entirely sure if that's correct or not I'm summoning the mighty @brixen for the question: what exactly is the best way to get the total amount of allocated objects at a given point in time (without relying on ObjectSpace)?

benweint · 2015-02-12T21:51:51Z

Ok, cool. So I think we have three issues then :)

The fact that subtracting a 0-valued Bignum from a Fixnum raises an exception
The fact that Rubinius::Metrics.data['memory.large.objects.total'] returns 0 as a Bignum
The semantic mismatch between memory.large.objects.total (a subset of all object allocations) and our total_allocated_object key (meant to represent all object allocations).

Of these, 1 looks like the most clear-cut rbx bug, so I'll file an issue for that.

yorickpeterse · 2015-02-12T23:12:31Z

I applied a few fixes for proper counting of total objects and the heap size (apparently I butchered both those metrics). I also made sure that RubiniusVM#supports? exists to keep it compatible with the other VMs.

Interesting enough I now get the following test failures:

/home/yorickpeterse/.rubies/rbx-2.5.2/bin/rbx -I"lib:/home/yorickpeterse/Private/Projects/ruby/rpm/test:/home/yorickpeterse/Private/Projects/ruby/rpm/lib" -I"/home/yorickpeterse/.gem/rbx/2.1.0/gems/rake-10.1.0/lib" "/home/yorickpeterse/.gem/rbx/2.1.0/gems/rake-10.1.0/lib/rake/rake_test_loader.rb" "/home/yorickpeterse/Private/Projects/ruby/rpm/test/new_relic/**/*_test.rb" 
Running tests in standalone mode.
Skipping tests in /home/yorickpeterse/Private/Projects/ruby/rpm/test/new_relic/agent/instrumentation/active_record_subscriber_test.rb because Rails >= 4 is unavailable
Run options: --seed 24917

# Running tests:

...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................EE................................................................................................................................................................................................................................................................................

Finished tests in 31.645260s, 66.2342 tests/s, 506.8058 assertions/s.

  1) Failure:
NewRelic::Agent::StatsEngine::GCProfilerTest#test_gc_profiler_clear_does_not_reset_count [/home/yorickpeterse/Private/Projects/ruby/rpm/test/new_relic/agent/stats_engine/gc_profiler_test.rb:97]:
Expected 856 to be < 856.

  2) Error:
NewRelic::MetricSpecTest#test_initialize_can_take_a_nil_name:
TypeError: Coercion error: nil.to_str => String failed
    kernel/common/type.rb:36:in `coerce_to_failed'
    kernel/common/type.rb:26:in `execute_coerce_to'
    kernel/common/type.rb:19:in `coerce_to'
    kernel/common/kernel.rb:147:in `StringValue'
    kernel/common/string.rb:1548:in `replace'
    kernel/common/string.rb:620:in `to_s'
    /home/yorickpeterse/Private/Projects/ruby/rpm/lib/new_relic/metric_spec.rb:15:in `initialize'
    /home/yorickpeterse/Private/Projects/ruby/rpm/test/new_relic/metric_spec_test.rb:84:in `test_initialize_can_take_a_nil_name'

  3) Error:
NewRelic::MetricSpecTest#test_initialize_can_take_a_non_string_name:
TypeError: Coercion error: #<#<Class:0xf1090>:0xf1094 @value="name">.to_str => String failed
    kernel/common/type.rb:36:in `coerce_to_failed'
    kernel/common/type.rb:26:in `execute_coerce_to'
    kernel/common/type.rb:19:in `coerce_to'
    kernel/common/kernel.rb:147:in `StringValue'
    kernel/common/string.rb:1548:in `replace'
    kernel/common/string.rb:620:in `to_s'
    /home/yorickpeterse/Private/Projects/ruby/rpm/lib/new_relic/metric_spec.rb:15:in `initialize'
    /home/yorickpeterse/Private/Projects/ruby/rpm/test/new_relic/metric_spec_test.rb:93:in `test_initialize_can_take_a_non_string_name'

2096 tests, 16038 assertions, 1 failures, 2 errors, 0 skips
rake aborted!
Command failed with status (1): [ruby -I"lib:/home/yorickpeterse/Private/Projects/ruby/rpm/test:/home/yorickpeterse/Private/Projects/ruby/rpm/lib" -I"/home/yorickpeterse/.gem/rbx/2.1.0/gems/rake-10.1.0/lib" "/home/yorickpeterse/.gem/rbx/2.1.0/gems/rake-10.1.0/lib/rake/rake_test_loader.rb" "/home/yorickpeterse/Private/Projects/ruby/rpm/test/new_relic/**/*_test.rb" ]
kernel/bootstrap/proc.rb:20:in `call'
kernel/bootstrap/proc.rb:20:in `call'
kernel/bootstrap/array.rb:76:in `each'
kernel/bootstrap/array.rb:76:in `each'
kernel/bootstrap/array.rb:76:in `each'
kernel/bootstrap/array.rb:76:in `each'
kernel/common/kernel.rb:497:in `load'
kernel/delta/code_loader.rb:66:in `load_script'
kernel/delta/code_loader.rb:152:in `load_script'
kernel/loader.rb:655:in `script'
kernel/loader.rb:809:in `main'
Tasks: TOP => default => test => test:newrelic
(See full trace by running task with --trace)

Also, is there a preference to have the various commits rebased into a single one? Doing so is no problem for me.

benweint · 2015-02-12T23:14:03Z

Thanks, @yorickpeterse! Rebasing would be cool if it's easy for you, since I could then just cherry-pick your fixes onto the branch we're about to release. I'll take a look at those test failures and see if I can figure out what's up there, too.

Rubinius 2.3 introduced a new constant: Rubinius::Metrics. This constant, and in particular its method Rubinius::Metrics.data, contain periodically updated metrics. This includes garbage collection data, information about locks, thread counts, etc. This commit updates the metrics collection process for Rubinius to make use of Rubinius::Metrics. This also comes with support for aggregating data such as the heap slots and method cache resets. Constant cache resets are not yet available. I also rewrote the tests to not use any mocks (these were breaking parts of Rubinius by the looks of it) as well as to make sure the important parts are actually tested. Tracking of free heap slots is not supported by Rubinius as it doesn't use a memory allocation mechanism that produces free heap slots.

yorickpeterse · 2015-02-12T23:36:20Z

Rebase done.

jemc · 2015-02-12T23:37:10Z

Regarding "The fact that Rubinius::Metrics.data['memory.large.objects.total'] returns 0 as a Bignum":

If it's desired this could be "fixed" by making the numeric metrics do a Bignum::normalize before returning, so that numbers small enough to be a Fixnum would be returned as a Fixnum.

jemc · 2015-02-12T23:52:42Z

@brixen just fixed Metrics to return either Fixnum or Bignum - rubinius/rubinius@b0825d8

benweint · 2015-02-13T00:20:13Z

@brixen and @jemc thanks for the quick fix on that!

The latest changes still fail on extant Rubinius versions because Rubinius::Metrics returns Bignum 0s for some of the other metrics (gc.immix.count for example), and I don't see a Bignum::normalize method that is accessible from Ruby, but doing a + 0 seems to have the same effect of truncating to a Fixnum if it's possible without loss, so I think I'm going to do that for now.

jemc · 2015-02-13T06:25:37Z

@benweint
Yeah, Bignum::normalize is only in C++.

Fixnum minus Bignum is now fixed on master, rubinius/rubinius@0318b9a , but as you said, this doesn't help for rubinius versions already released.

Thanks for your work in finding the minimal repro!

yorickpeterse · 2015-02-17T18:41:02Z

Thanks! 👍

benweint · 2015-02-17T18:41:54Z

Yeah, thank you! Sorry it took so long to get this all figured out, but you should be good to go with newrelic_rpm version 3.10.0.

brixen · 2015-02-17T20:46:06Z

@benweint awesome, thank you!

jc00ke mentioned this pull request Jan 1, 2015

Ensure the tests for these projects run on Travis rubinius/rubinius#3221

Closed

14 tasks

yorickpeterse mentioned this pull request Feb 12, 2015

Subtracting a 0-valued Bignum from a Fixnum raises a TypeError rubinius/rubinius#3316

Closed

newrelicbot merged commit c6aba79 into newrelic:master Feb 17, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updated RubiniusVM metrics for Rubinius 2.3.0. #197

Updated RubiniusVM metrics for Rubinius 2.3.0. #197

yorickpeterse commented Nov 10, 2014

yorickpeterse commented Nov 10, 2014

yorickpeterse commented Nov 10, 2014

jasonrclark commented Nov 10, 2014

yorickpeterse commented Nov 24, 2014

jasonrclark commented Nov 24, 2014

yorickpeterse commented Dec 15, 2014

benweint commented Dec 15, 2014

yorickpeterse commented Jan 14, 2015

jasonrclark commented Jan 15, 2015

yorickpeterse commented Jan 15, 2015

yorickpeterse commented Feb 5, 2015

benweint commented Feb 5, 2015

benweint commented Feb 12, 2015

yorickpeterse commented Feb 12, 2015

benweint commented Feb 12, 2015

benweint commented Feb 12, 2015

yorickpeterse commented Feb 12, 2015

benweint commented Feb 12, 2015

yorickpeterse commented Feb 12, 2015

benweint commented Feb 12, 2015

yorickpeterse commented Feb 12, 2015

jemc commented Feb 12, 2015

jemc commented Feb 12, 2015

benweint commented Feb 13, 2015

jemc commented Feb 13, 2015

yorickpeterse commented Feb 17, 2015

benweint commented Feb 17, 2015

brixen commented Feb 17, 2015

Updated RubiniusVM metrics for Rubinius 2.3.0. #197

Updated RubiniusVM metrics for Rubinius 2.3.0. #197

Conversation

yorickpeterse commented Nov 10, 2014

yorickpeterse commented Nov 10, 2014

yorickpeterse commented Nov 10, 2014

jasonrclark commented Nov 10, 2014

yorickpeterse commented Nov 24, 2014

jasonrclark commented Nov 24, 2014

yorickpeterse commented Dec 15, 2014

benweint commented Dec 15, 2014

yorickpeterse commented Jan 14, 2015

jasonrclark commented Jan 15, 2015

yorickpeterse commented Jan 15, 2015

yorickpeterse commented Feb 5, 2015

benweint commented Feb 5, 2015

benweint commented Feb 12, 2015

yorickpeterse commented Feb 12, 2015

benweint commented Feb 12, 2015

benweint commented Feb 12, 2015

yorickpeterse commented Feb 12, 2015

benweint commented Feb 12, 2015

yorickpeterse commented Feb 12, 2015

benweint commented Feb 12, 2015

yorickpeterse commented Feb 12, 2015

jemc commented Feb 12, 2015

jemc commented Feb 12, 2015

benweint commented Feb 13, 2015

jemc commented Feb 13, 2015

yorickpeterse commented Feb 17, 2015

benweint commented Feb 17, 2015

brixen commented Feb 17, 2015