Alternative memory used by Erlang VM calculation strategy #1223

Closed
gerhard opened this issue May 15, 2017 · 3 comments

Comments

@gerhard
Contributor

gerhard commented May 15, 2017

We have seen the following scenario play out many times:

  • Memory alarm is not triggered, publishers continue publishing
  • RabbitMQ claims more memory via the Erlang VM
  • Erlang VM claims more memory from the Kernel
  • Kernel cannot allocate more memory since there isn't enough system memory available
  • Kernel kills Erlang VM

This erlang-questions thread explains some of the discrepancies: the kernel and the runtime use different approaches to accounting and the kernel is blissfully unaware of whatever pre-allocation strategies a process can use.

We propose using the memory reported by the kernel everywhere erlang:memory/0 is currently called. With {use_system_memory_reporting,1} set, we will try to use the process memory as reported by the system for vm_memory_high_watermark, most likely via the /proc/pid subsystem. If this is not possible, we will fall back to the current implementation.
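To illustrate the proposal above, here is a minimal Python sketch of "ask the kernel first, fall back to the runtime's figure". It is not the RabbitMQ implementation. The function names are hypothetical, and it assumes a Linux-style `/proc/<pid>/status` file with a `VmRSS:` line reported in kB.

```python
import re
from typing import Callable, Optional

# Matches a line such as "VmRSS:\t    2048 kB" in /proc/<pid>/status.
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vm_rss_bytes(status_text: str) -> Optional[int]:
    """Return the resident set size in bytes, or None if no VmRSS line is found."""
    match = _VMRSS.search(status_text)
    return int(match.group(1)) * 1024 if match else None


def process_memory_bytes(pid: int, runtime_total: Callable[[], int]) -> int:
    """Prefer the kernel's view of the process; otherwise use the runtime's own figure.

    `runtime_total` stands in for whatever the current implementation reports
    (hypothetical callback, e.g. a wrapper around erlang:memory(total)).
    """
    try:
        with open(f"/proc/{pid}/status") as status_file:
            rss = parse_vm_rss_bytes(status_file.read())
    except OSError:
        # No /proc (non-Linux), no such process, or no permission.
        rss = None
    return rss if rss is not None else runtime_total()
```

Keeping the parser separate from the file access makes the fallback path easy to exercise on systems that have no `/proc` at all.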

We are thinking of introducing this as an experimental feature in 3.6 and learning whether it can be a viable default in 3.7.

A few related issues that could benefit from this: #993 & #965.

UPD: the option name was changed to rabbit.vm_memory_calculation_strategy, which can be rss or erlang. rss is the new default; erlang switches back to the earlier calculation strategy, which is less accurate, often significantly so.

@michaelklishin
Member

michaelklishin commented May 15, 2017

Windows is particularly problematic here because gathering certain OS level metrics currently requires an external tool to be installed. The plan is to fall back to the less precise implementation if that's not available.

Also worth pointing out that we already obtain the total amount of memory in an OS-specific way (such as /proc/meminfo on Linux).

@michaelklishin changed the title from "Alternative memory reporting for memory alarm" to "Alternative memory used by Erlang VM calculation strategy" on May 15, 2017
hairyhum pushed a commit that referenced this issue Jun 12, 2017
Memory reported by erlang:memory(total) is not supposed to
be equal to the total size of all pages mapped to the emulator,
according to http://erlang.org/doc/man/erlang.html#memory-0
erlang:memory(total) under-reports memory usage by around 20%

[#1223]
[#145451399]
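
A rough worked example of why the under-reporting in the commit message above matters for the memory alarm. It assumes "under-reports by around 20%" means the runtime reports about 80% of what the kernel sees. The function name and the sample numbers are illustrative only.

```python
def actual_usage_at_alarm(total_ram: float, watermark: float, under_report: float) -> float:
    """Real memory in use at the moment the *reported* figure reaches the watermark.

    total_ram    -- installed memory, in any unit
    watermark    -- vm_memory_high_watermark as a fraction of total_ram
    under_report -- fraction by which the runtime's figure falls short (assumed 0.2)
    """
    threshold = total_ram * watermark
    # reported = actual * (1 - under_report), so solve for actual at reported == threshold
    return threshold / (1.0 - under_report)


# With 10 GiB of RAM and a 0.4 watermark, the alarm is meant to fire at 4 GiB,
# but under these assumptions it fires only once about 5 GiB is really in use.
print(actual_usage_at_alarm(10.0, 0.4, 0.2))
```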
@michaelklishin michaelklishin modified the milestones: 3.6.11, 3.6.x Jun 14, 2017
@michaelklishin
Member

A follow-up change for Windows: #1270.

gerhard added a commit that referenced this issue Jul 10, 2017
As discussed with @michaelklishin:

    We discovered that `erlang:memory(system).` can be almost as large
    as OsTotal when swapping is in effect. This means that total
    (processes + system) will be larger than the OsTotal, therefore
    OsTotal - ErlangTotal cannot be assumed to be non-negative. I think
    having some "unaccounted" memory is better than having it
    "accounted" as negative.

re #1223

[finishes #148435813]
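
The point made in the commit message above can be sketched as a simple clamp. This is an illustration under assumptions, not the code from the commit: `os_total` and `erlang_total` mirror the OsTotal and ErlangTotal names in the message, and the function name is hypothetical.

```python
def unaccounted_memory(os_total: int, erlang_total: int) -> int:
    """Memory the OS attributes to the process that the runtime does not account for.

    When swapping is in effect, erlang_total (processes + system) can exceed
    os_total, so the difference is clamped at zero instead of going negative.
    """
    return max(0, os_total - erlang_total)
```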
@michaelklishin
Member

michaelklishin commented Oct 6, 2017

An update: based on what we learned in #1343 and a few other places, the strategy was adjusted further as of 3.6.13. As of rabbitmq/rabbitmq-common#225 it uses the runtime's allocator stats (which supposedly track every single malloc performed), so no external tools are invoked. In addition, as of rabbitmq/rabbitmq-common#221 we avoid frequent calls to the function in question: it is now invoked once a second.
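
One way to picture the "invoked once a second" change is a time-based cache around an expensive measurement. The sketch below shows that general technique only. Whether rabbitmq-common uses a cache or a periodic timer is not stated here, and the class name and parameters are hypothetical.

```python
import time
from typing import Callable, Optional


class CachedMemoryReading:
    """Recompute an expensive memory figure at most once per `ttl_seconds`."""

    def __init__(
        self,
        compute: Callable[[], int],
        ttl_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._stamp: Optional[float] = None
        self._value: Optional[int] = None

    def get(self) -> int:
        now = self._clock()
        if self._stamp is None or now - self._stamp >= self._ttl:
            # Cache is empty or stale: take a fresh measurement.
            self._value = self._compute()
            self._stamp = now
        return self._value
```

Callers that check the watermark many times per second then share one measurement instead of each triggering a new one.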

Because the existing strategy names no longer make sense with these changes, we renamed them to allocated (née rss) and legacy (née erlang). rss and erlang are still supported for backwards compatibility, and allocated/rss is still the default.
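
The renaming above amounts to an alias table with a default. A minimal sketch, assuming the four names mentioned in this thread are the only accepted values. The function and constant names are hypothetical and this is not RabbitMQ's own parsing code.

```python
from typing import Optional

# Old names kept for backwards compatibility, mapped to their replacements.
_ALIASES = {"rss": "allocated", "erlang": "legacy"}
_CURRENT_NAMES = {"allocated", "legacy"}
DEFAULT_STRATEGY = "allocated"


def normalize_strategy(name: Optional[str] = None) -> str:
    """Map a configured strategy name to its current canonical name."""
    if name is None:
        return DEFAULT_STRATEGY
    canonical = _ALIASES.get(name, name)
    if canonical not in _CURRENT_NAMES:
        raise ValueError(f"unknown memory calculation strategy: {name!r}")
    return canonical
```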
