Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated RubiniusVM metrics for Rubinius 2.3.0. #197

Merged
merged 1 commit into from
Feb 17, 2015
Merged

Updated RubiniusVM metrics for Rubinius 2.3.0. #197

merged 1 commit into from
Feb 17, 2015

Conversation

yorickpeterse
Copy link
Contributor

The commit 6038c791526f48b7d1b18d663432967a1e176d0d says it all. I'm currently deploying this to a production application to see how things look when presented in the RPM web interface. I'll update this pull-request once things look good from my end.

@yorickpeterse
Copy link
Contributor Author

Worth mentioning: looking at the data I'm having some doubts about my heap_free calculation being correct. I'll do some extra digging into this.

@yorickpeterse
Copy link
Contributor Author

Another note: I'll do a rebase once I've figured things out properly.

@jasonrclark
Copy link
Contributor

Thanks for sending this along @yorickpeterse! Let me know once you've got things figured, and I'll be happy to pull the changes in. We'll probably get Rubinius in our CI bumped to 2.3 next week, so that'll be a good time to add your changes if they're available.

@yorickpeterse
Copy link
Contributor Author

Allrighty, I rebased my commits together and this is ready to be merged from my end.

@jasonrclark
Copy link
Contributor

Thanks @yorickpeterse. We're still pending on getting our CI up and running with rbx 2.3, so this will likely get into the newrelic_rpm release following 3.9.8 which is due out very soon.

Very much appreciate your help on getting this all lined up right! ✨

@yorickpeterse
Copy link
Contributor Author

Any updates on this? I'll be moving over more applications to Rubinius in the coming weeks so it would be nice having this in as it saves me from monkey-patching this in every application.

@benweint
Copy link
Contributor

Hey @yorickpeterse!

We use ruby-build to build the Rubies that we test against in CI, so until rbx-2.3, and rbx-2.4 are available there, we're kind of blocked. Work on that seems to be happening in rbenv/ruby-build#664 and rbenv/ruby-build#664.

@yorickpeterse
Copy link
Contributor Author

With ruby-build now supporting Rubinius 2.3 & friends (rbenv/ruby-build#664 (comment)) what's the status of this?

@jasonrclark
Copy link
Contributor

Hey @yorickpeterse. Had seen that but hadn't gotten time yet to update our CI boxes. Will try to do that, probably first thing next week, and then get this merged.

We do really appreciate the work, and sorry for all the delay in getting in pulled in!

@yorickpeterse
Copy link
Contributor Author

No problem, just checking up on this :)

@yorickpeterse
Copy link
Contributor Author

Any news on this, anything I can help with?

@benweint
Copy link
Contributor

benweint commented Feb 5, 2015

Hi @yorickpeterse! This has been merged to our internal development branch and is just awaiting release, which should be coming soon. We'll write back here and let you know when it's released.

@benweint
Copy link
Contributor

Hey @yorickpeterse - we finally got around to actually trying this out with a test Rubinius app, and it looks like we're hitting a Rubinius bug of some kind related to Bignum. Here's what happens:

  1. The sampler runs once, and saves an initial snapshot. In that snapshot, total_allocated_object is recorded with a value of 0, represented as a Bignum (I'm not sure why 0 is being represented as a Bignum here).
  2. The sampler runs a second time, and records a larger value for total_allocated_object into a second snapshot. This value is represented as a Fixnum rather than a Bignum.
  3. We attempt to take a delta for each of the fields in the two snapshots, and so end up doing Fixnum(N) - Bignum(0), and this fails with an exception.
  4. We catch the exception, and halt the VM sampler because of it, so it never runs again.

The exception that occurs looks like this:

[02/12/15 11:01:32 -0800 koan.local (95039)] WARN : TypeError: Tried to use non-reference value 0x1 as type Bignum (10)
[02/12/15 11:01:32 -0800 koan.local (95039)] DEBUG : Debugging backtrace:
/Users/ben/src/ruby_agent/lib/new_relic/agent/samplers/vm_sampler.rb:76:in `record_delta'
  /Users/ben/src/ruby_agent/lib/new_relic/agent/samplers/vm_sampler.rb:114:in `poll'
  /Users/ben/src/ruby_agent/lib/new_relic/agent/sampler_collection.rb:41:in `poll_samplers'
<...snip...>

I can reproduce this in an IRB session on rbx-2.5.0 as follows (the coerce weirdness was the only way I could find to produce a Bignum with a value of 0):

irb(main):010:0> 9000 - (0x3FFFFFFFFFFFFFFF+1).coerce(0).first
TypeError: Tried to use non-reference value 0x1 as type Bignum (10)
    from (irb):10
    from kernel/common/block_environment.rb:53:in `call_on_instance'
    from kernel/common/eval.rb:176:in `eval'
    from kernel/common/kernel.rb:510:in `loop'
    from kernel/bootstrap/proc.rb:20:in `call'
    from kernel/common/throw_catch.rb:30:in `catch'
    from kernel/common/throw_catch.rb:8:in `register'
    from kernel/common/throw_catch.rb:29:in `catch'
    from kernel/bootstrap/proc.rb:20:in `call'
    from kernel/common/throw_catch.rb:30:in `catch'
    from kernel/common/throw_catch.rb:8:in `register'
    from kernel/common/throw_catch.rb:29:in `catch'
    from /Users/ben/.gem/rbx/2.1.0/gems/rubysl-irb-2.1.1/bin/irb:12:in `__script__'
    from kernel/common/kernel.rb:497:in `load'
    from /Users/ben/.gem/rbx/2.1.0/bin/irb:23:in `__script__'
    from kernel/delta/code_loader.rb:66:in `load_script'
    from kernel/delta/code_loader.rb:152:in `load_script'
    from kernel/loader.rb:645:in `script'

On MRI (2.1.4, haven't tried others), this appears to work correctly:

irb(main):001:0> 9000 - (0x3FFFFFFFFFFFFFFF+1).coerce(0).first
=> 9000
irb(main):002:0> (0x3FFFFFFFFFFFFFFF+1).coerce(0).first.class
=> Bignum

So, two questions for you:

  1. Is it intended that we'd ever get a Bignum value of 0 for total_allocated_object?
  2. Does the subtraction issue that I can reproduce in IRB seem worthy of an rbx bug report to you?

We're quite close to release, so I don't know if we're going to be able to get this fixed before then, but hopefully we can get it fixed in a follow-up release. Sorry for not finding this sooner!

@yorickpeterse
Copy link
Contributor Author

Huh, that's interesting as I don't recall having any of these problems when running the code in some of my applications. Either way, no, the metrics used for total_allocated_object should be returned as a Fixnum unless you would have more than 4611686018427387903 objects allocated. In an empty IRB session running Rubinius::Metrics.data['memory.large.objects.total'] returns 2185 for me, so getting it to overflow to a Bignum would be quite the challenge.

Is there some script I can run to reproduce this problem?

Does the subtraction issue that I can reproduce in IRB seem worthy of an rbx bug report to you?

I'm not 100% sure but it never hurts reporting it as an issue :)

@benweint
Copy link
Contributor

@yorickpeterse yeah, there's something strange going on with the query against Rubinius::Metrics. In an irb session, I can reliably get a non-zero Fixnum out of Rubinius::Metrics.data[:'memory.large.objects.total'], but when running from a script, I cannot. You can try the script in this gist.

For me, the first ~10 calls to Rubinius::Metrics.data return 0-valued Bignums using this script.

After that, the object allocation counter never increments. Perhaps I'm misunderstanding what memory.large.objects.total is meant to represent, or I'm not actually triggering object allocations in the script due to some optimization in Rubinius? Just so we're on the same page, is there some documentation that describes exactly what memory.large.objects.total is meant to represent? I'm operating under the assumption that it's a monotonically increasing counter the is incremented once for each object allocation in the host process.

@benweint
Copy link
Contributor

(FWIW, I'm running on rbx-2.5.0, but I see there's a newer version - I'll try on the latest and see if this behavior still manifests.)

@yorickpeterse
Copy link
Contributor Author

The specifics on large/young/etc objects can be found at http://rubini.us/doc/en/memory-system/object-layout/. The gist of it is that large objects are, well, larger than regular ones. As such simply running Object.new won't increment this metric if I'm not mistaken. However, the counter simply being 0 and returning a Bignum seems like a bug in the underlying metrics code. I tested this on 2.5.2 which also has the problem. Unfortunately I can't try master since @brixen broke it for Linux :P

Which reminds me, the current metric used for the total amount of allocated objects in this PR is most likely wrong as it only counts the large objects. Since I'm not entirely sure if that's correct or not I'm summoning the mighty @brixen for the question: what exactly is the best way to get the total amount of allocated objects at a given point in time (without relying on ObjectSpace)?

@benweint
Copy link
Contributor

Ok, cool. So I think we have three issues then :)

  1. The fact that subtracting a 0-valued Bignum from a Fixnum raises an exception
  2. The fact that Rubinius::Metrics.data['memory.large.objects.total'] returns 0 as a Bignum
  3. The semantic mismatch between memory.large.objects.total (a subset of all object allocations) and our total_allocated_object key (meant to represent all object allocations).

Of these, 1 looks like the most clear-cut rbx bug, so I'll file an issue for that.

@yorickpeterse
Copy link
Contributor Author

I applied a few fixes for proper counting of total objects and the heap size (apparently I butchered both those metrics). I also made sure that RubiniusVM#supports? exists to keep it compatible with the other VMs.

Interesting enough I now get the following test failures:

/home/yorickpeterse/.rubies/rbx-2.5.2/bin/rbx -I"lib:/home/yorickpeterse/Private/Projects/ruby/rpm/test:/home/yorickpeterse/Private/Projects/ruby/rpm/lib" -I"/home/yorickpeterse/.gem/rbx/2.1.0/gems/rake-10.1.0/lib" "/home/yorickpeterse/.gem/rbx/2.1.0/gems/rake-10.1.0/lib/rake/rake_test_loader.rb" "/home/yorickpeterse/Private/Projects/ruby/rpm/test/new_relic/**/*_test.rb" 
Running tests in standalone mode.
Skipping tests in /home/yorickpeterse/Private/Projects/ruby/rpm/test/new_relic/agent/instrumentation/active_record_subscriber_test.rb because Rails >= 4 is unavailable
Run options: --seed 24917

# Running tests:

...........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................EE................................................................................................................................................................................................................................................................................

Finished tests in 31.645260s, 66.2342 tests/s, 506.8058 assertions/s.

  1) Failure:
NewRelic::Agent::StatsEngine::GCProfilerTest#test_gc_profiler_clear_does_not_reset_count [/home/yorickpeterse/Private/Projects/ruby/rpm/test/new_relic/agent/stats_engine/gc_profiler_test.rb:97]:
Expected 856 to be < 856.

  2) Error:
NewRelic::MetricSpecTest#test_initialize_can_take_a_nil_name:
TypeError: Coercion error: nil.to_str => String failed
    kernel/common/type.rb:36:in `coerce_to_failed'
    kernel/common/type.rb:26:in `execute_coerce_to'
    kernel/common/type.rb:19:in `coerce_to'
    kernel/common/kernel.rb:147:in `StringValue'
    kernel/common/string.rb:1548:in `replace'
    kernel/common/string.rb:620:in `to_s'
    /home/yorickpeterse/Private/Projects/ruby/rpm/lib/new_relic/metric_spec.rb:15:in `initialize'
    /home/yorickpeterse/Private/Projects/ruby/rpm/test/new_relic/metric_spec_test.rb:84:in `test_initialize_can_take_a_nil_name'

  3) Error:
NewRelic::MetricSpecTest#test_initialize_can_take_a_non_string_name:
TypeError: Coercion error: #<#<Class:0xf1090>:0xf1094 @value="name">.to_str => String failed
    kernel/common/type.rb:36:in `coerce_to_failed'
    kernel/common/type.rb:26:in `execute_coerce_to'
    kernel/common/type.rb:19:in `coerce_to'
    kernel/common/kernel.rb:147:in `StringValue'
    kernel/common/string.rb:1548:in `replace'
    kernel/common/string.rb:620:in `to_s'
    /home/yorickpeterse/Private/Projects/ruby/rpm/lib/new_relic/metric_spec.rb:15:in `initialize'
    /home/yorickpeterse/Private/Projects/ruby/rpm/test/new_relic/metric_spec_test.rb:93:in `test_initialize_can_take_a_non_string_name'

2096 tests, 16038 assertions, 1 failures, 2 errors, 0 skips
rake aborted!
Command failed with status (1): [ruby -I"lib:/home/yorickpeterse/Private/Projects/ruby/rpm/test:/home/yorickpeterse/Private/Projects/ruby/rpm/lib" -I"/home/yorickpeterse/.gem/rbx/2.1.0/gems/rake-10.1.0/lib" "/home/yorickpeterse/.gem/rbx/2.1.0/gems/rake-10.1.0/lib/rake/rake_test_loader.rb" "/home/yorickpeterse/Private/Projects/ruby/rpm/test/new_relic/**/*_test.rb" ]
kernel/bootstrap/proc.rb:20:in `call'
kernel/bootstrap/proc.rb:20:in `call'
kernel/bootstrap/array.rb:76:in `each'
kernel/bootstrap/array.rb:76:in `each'
kernel/bootstrap/array.rb:76:in `each'
kernel/bootstrap/array.rb:76:in `each'
kernel/common/kernel.rb:497:in `load'
kernel/delta/code_loader.rb:66:in `load_script'
kernel/delta/code_loader.rb:152:in `load_script'
kernel/loader.rb:655:in `script'
kernel/loader.rb:809:in `main'
Tasks: TOP => default => test => test:newrelic
(See full trace by running task with --trace)

Also, is there a preference to have the various commits rebased into a single one? Doing so is no problem for me.

@benweint
Copy link
Contributor

Thanks, @yorickpeterse! Rebasing would be cool if it's easy for you, since I could then just cherry-pick your fixes onto the branch we're about to release. I'll take a look at those test failures and see if I can figure out what's up there, too.

Rubinius 2.3 introduced a new constant: Rubinius::Metrics. This constant, and in
particular its method Rubinius::Metrics.data, contain periodically updated
metrics. This includes garbage collection data, information about locks, thread
counts, etc.

This commit updates the metrics collection process for Rubinius to make use of
Rubinius::Metrics. This also comes with support for aggregating data such as the
heap slots and method cache resets. Constant cache resets are not yet available.

I also rewrote the tests to not use any mocks (these were breaking parts of
Rubinius by the looks of it) as well as to make sure the important parts are
actually tested.

Tracking of free heap slots is not supported by Rubinius as it doesn't use a
memory allocation mechanism that produces free heap slots.
@yorickpeterse
Copy link
Contributor Author

Rebase done.

@jemc
Copy link

jemc commented Feb 12, 2015

Regarding "The fact that Rubinius::Metrics.data['memory.large.objects.total'] returns 0 as a Bignum":

If it's desired this could be "fixed" by making the numeric metrics do a Bignum::normalize before returning, so that numbers small enough to be a Fixnum would be returned as a Fixnum.

@jemc
Copy link

jemc commented Feb 12, 2015

@brixen just fixed Metrics to return either Fixnum or Bignum - rubinius/rubinius@b0825d8

@benweint
Copy link
Contributor

@brixen and @jemc thanks for the quick fix on that!

The latest changes still fail on extant Rubinius versions because Rubinius::Metrics returns Bignum 0s for some of the other metrics (gc.immix.count for example), and I don't see a Bignum::normalize method that is accessible from Ruby, but doing a + 0 seems to have the same effect of truncating to a Fixnum if it's possible without loss, so I think I'm going to do that for now.

@jemc
Copy link

jemc commented Feb 13, 2015

@benweint
Yeah, Bignum::normalize is only in C++.

Fixnum minus Bignum is now fixed on master, rubinius/rubinius@0318b9a , but as you said, this doesn't help for rubinius versions already released.

Thanks for your work in finding the minimal repro!

@newrelicbot newrelicbot merged commit c6aba79 into newrelic:master Feb 17, 2015
@yorickpeterse
Copy link
Contributor Author

Thanks! 👍

@benweint
Copy link
Contributor

Yeah, thank you! Sorry it took so long to get this all figured out, but you should be good to go with newrelic_rpm version 3.10.0.

@brixen
Copy link

brixen commented Feb 17, 2015

@benweint awesome, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants