Puma Memory abnormal memory consumption #1600

Closed
shirkavand opened this Issue Jun 21, 2018 · 13 comments

shirkavand commented Jun 21, 2018

I have read several threads related to this topic:

a) Some of them say that the problem is specific to Ruby 2.2 and was fixed in 2.3.

b) Others say that running Puma with jemalloc (along with some other tweaks) fixes the problem.

c) Others, Heroku users in this case, say that they did nothing and it just started working properly.

I am using Ruby 2.4.0p0, Rails 5.1.6 and Puma 3.11.4 and still see abnormal memory consumption with Puma.

Side note: For testing purposes, I replaced Puma and installed Passenger. On the same machine, with the same Rails, the same Ruby, and the same JMeter tests run for the same period of time, I had no issue at all. That is, memory usage grows up to a plateau, and when the test finishes it returns to about 150 MB per worker.

Steps to reproduce

There are many ways; a simple one is:

  1. Create a Rails app from scratch

  2. Create a basic GET/POST endpoint (one that only returns head :ok)

  3. Start hitting the endpoint using some tool like JMeter for a long period of time

Expected behavior

Memory usage grows up to a plateau, and when the test finishes it returns to about 150 MB per worker. This happened locally on my development machine and on Heroku.

Actual behavior

Memory usage starts to grow and never stops growing. Once the JMeter test finishes, memory never returns to around 150 MB; instead it stays at the last value reached during the test (i.e. about 2 GB per worker). This happened locally on my development machine and on Heroku.
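
To make the plateau-versus-growth distinction measurable, a minimal Ruby sketch along these lines could sample a worker's resident set size while the load test runs. The helper names (rss_mb, plateaued?), the window of 5 samples, and the 5% tolerance are illustrative assumptions, not part of Puma or of the original report.

```ruby
# Resident set size of a process in megabytes, read through `ps`
# (the rss column is reported in kilobytes on Linux and macOS).
def rss_mb(pid = Process.pid)
  kb = `ps -o rss= -p #{Integer(pid)}`.strip.to_i
  kb / 1024.0
end

# Hypothetical heuristic: memory has "plateaued" when the last `window`
# samples stay within `tolerance` (a fraction) of the largest of them.
def plateaued?(samples, window: 5, tolerance: 0.05)
  return false if samples.size < window

  tail = samples.last(window)
  (tail.max - tail.min) <= tolerance * tail.max
end
```

Sampling rss_mb(worker_pid) every few seconds and passing the collected values to plateaued? would give a rough yes/no answer that can be compared between Puma and Passenger runs.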

System configuration

Ruby version: 2.4.0p0 (2016-12-24 revision 57164) [x86_64-darwin16]
Rails version: 5.1.6
Puma version: 3.11.4

njirap commented Jul 2, 2018

We have also observed this with almost similar versions but on Ubuntu 16.04. OSX does not have this.
Puma version: 3.11.4
Rails version: 5.0.3
Ruby version: 2.4.1

QuantamHD commented Jul 31, 2018

I had this issue as well. The Google Protobuf project had a similar issue with Ruby 2.2 that was supposedly fixed in 2.3, but it may have persisted. In their case (protocolbuffers/protobuf#474), the issue seemed to be related to the Ruby C API function rb_str_cat and, by extension, rb_str_cat2. In the Puma code base these two calls show up only twice in total, in puma_http11.c:211-212:

void http_field(puma_parser* hp, const char *field, size_t flen,
                                 const char *value, size_t vlen)
{
  VALUE f = Qnil;
  VALUE v;

  VALIDATE_MAX_LENGTH(flen, FIELD_NAME);
  VALIDATE_MAX_LENGTH(vlen, FIELD_VALUE);

  f = find_common_field_value(field, flen);

  if (f == Qnil) {
    /*
     * We got a strange header that we don't have a memoized value for.
     * Fallback to creating a new string to use as a hash key.
     */

    size_t new_size = HTTP_PREFIX_LEN + flen;
    assert(new_size < BUFFER_LEN);

    memcpy(hp->buf, HTTP_PREFIX, HTTP_PREFIX_LEN);
    memcpy(hp->buf + HTTP_PREFIX_LEN, field, flen);

    f = rb_str_new(hp->buf, new_size);
  }

  /* check for duplicate header */
  v = rb_hash_aref(hp->request, f);

  if (v == Qnil) {
      v = rb_str_new(value, vlen);
      rb_hash_aset(hp->request, f, v);
  } else {
      /* if duplicate header, normalize to comma-separated values */
      rb_str_cat2(v, ", ");
      rb_str_cat(v, value, vlen);
  }
}
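
For readers less familiar with the C API, the control flow of http_field can be sketched in Ruby as follows. This is an illustration only: it assumes the parser has already normalized the field name (upper case, underscores), and it skips the memoized lookup done by find_common_field_value. The function name http_field_sketch is made up. The point is that the rb_str_cat2/rb_str_cat branch is reached only when the same header appears more than once in a request.

```ruby
HTTP_PREFIX = "HTTP_"

# Ruby-level analogue of the C function: store the first value of a header,
# and append later values to it as a comma-separated list.
def http_field_sketch(request, field, value)
  key = HTTP_PREFIX + field
  existing = request[key]        # like rb_hash_aref: nil when the header is new
  if existing.nil?
    request[key] = value.dup     # first occurrence: store a fresh string
  else
    existing << ", " << value    # duplicate header: append in place
  end
  request
end
```

A load test that never repeats a header would therefore not exercise the suspected code path at all.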

The Protobuf team created an alternative implementation that does not leak memory, shown below:

// This function is equivalent to rb_str_cat(), but unlike the real
// rb_str_cat(), it doesn't leak memory in some versions of Ruby.
// For more information, see:
//   https://bugs.ruby-lang.org/issues/11328
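VALUE noleak_rb_str_cat(VALUE rb_str, const char *str, long len) {
  size_t oldlen = RSTRING_LEN(rb_str);
  // Make the string writable and ensure room for len more bytes.
  rb_str_modify_expand(rb_str, len);
  // Fetch the pointer only after expanding, since the buffer may have moved.
  char *p = RSTRING_PTR(rb_str);
  memcpy(p + oldlen, str, len);
  rb_str_set_len(rb_str, oldlen + len);
  // The function is declared to return VALUE, so return the string.
  return rb_str;
}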

Puma version: 3.12.0
Rails: 5.1.6
Ruby: 2.3.7

Author

shirkavand commented Sep 18, 2018

@QuantamHD thanks for the reply. Can you give me more guidance on the next steps to follow based on your answer? Do I have to patch the code with Protobuf's version somehow?

1c7 commented Jan 25, 2019

@shirkavand have you solved your problem?
It's been 4 months now (Sep 2018 to Jan 2019).

Author

shirkavand commented Feb 26, 2019

@1c7 no, in the end we switched, sadly, to Passenger. We found no way to avoid this memory leak. I know some new versions of Puma have been released since then, but we have not tested those yet.

Do you have any insight of whether these new versions solve the problem or not?

1c7 commented Feb 27, 2019

@shirkavand
I am on the subway now. I will reply later.

Contributor

wjordan commented Feb 27, 2019

I tried and failed to reproduce the original issue as described. I ran a local load-test against a rails new project with a simple head :ok controller-action, running a local-development Puma server (bin/rails s), testing with wrk (wrk http://0.0.0.0:3000/test/test -d 60 -t 50 -c 50) on Ubuntu 18.04, Ruby 2.4.1, Puma 3.11.4, and Rails 5.1.6. RSS remained constant around ~64MB throughout the length of the load-test.

What's missing is a minimal, complete and verifiable example that reliably reproduces the issue across a broader range of environments. Until more specific steps can be found, it could very well be any number of other issues, unrelated to Puma, that could be causing abnormal memory growth in specific applications/environments.
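
One way to narrow things down, offered as a sketch rather than a confirmed reproduction, is to take Rails out of the picture and serve a bare Rack app with the same Puma version. If memory still grows, the application code is ruled out; if it does not, the cause is more likely in the app or its dependencies. The constant name HEAD_OK is hypothetical.

```ruby
# A bare Rack application: answers every request with an empty 200 response,
# roughly what a Rails action doing `head :ok` returns.
HEAD_OK = lambda do |_env|
  [200, { "content-type" => "text/plain" }, []]
end

# In a config.ru this would be wired up with:
#   run HEAD_OK
```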

1c7 commented Feb 27, 2019

Hi @shirkavand

Do you have any insight of whether these new versions solve the problem or not?

I am using Rails 5.2.2 and Ruby 2.5.1.
I haven't tried to reproduce what you describe, as @wjordan did.

But I have a story for you about Docker. It may help; I'm not sure.

After typing it all out, I realized it may be too long.

If you don't want to read all of it, the point is: just use Docker.

Zero downtime, memory limits, and automatic restart when a crash happens

Detail

You can write config in docker-compose.yml to limit memory usage,
and add a healthcheck to restart the app automatically if it somehow crashes.
You can think of Docker as a poor man's Heroku, haha,
even though Docker is much, much more than that.

Docker is free, and even if you are only one person with one production machine, you should still use it.
Docker is not big or difficult to learn. (Kubernetes is difficult, but Docker and Docker Swarm are not.)

The story begins:

A couple of months ago I was trying to do zero-downtime deployment
(or blue/green deployment, same thing).
By the way, I use mina, not cap.

1c7 commented Feb 27, 2019

Try 1: Puma Phased Restart 😞

  • Machine A: works perfectly.
    It restarts without downtime in 5-15 seconds.

  • Machine B: fails. (Machine B is the production server.)
    Each time Puma restarts, it takes 100% CPU for 5-7 minutes.

Both machines have the same config:

  • Ubuntu 18.04 (or 16.04, I can't remember; it's been quite a while)
  • 4 GB memory
  • 2 virtual CPUs

Cloud provider: UCloud

(not AWS, Azure, or Google Cloud Platform)

Sure, it still starts up successfully after 5-7 minutes.
But I just don't like that it takes this long and uses 100% CPU.

I tried to work out the difference between A and B.

  • Environment variables? LANG=en_US.utf8
  • Puma config file (Did I configure the number of workers right? preload_app? What's that?)

I tried everything I could think of and couldn't solve it.
I just got tired of it. I didn't care anymore; I just wanted it up and running.

(Insert memes here)

Mission Impossible 6 meme: "Why do you have to make things so fucking complicated?"
(Henry Cavill yelling to that bad guy)
(Disclaimer: I am still thankful to the people behind the Puma open source project. I am just mad at the problem itself, not at the people behind the project.)

I asked for help; it didn't work out.

GitHub issue here

So I just ran 2 copies of the Rails 5 app on the production server:

  • One on port 9292
  • One on port 9696

That didn't solve the 5-7 minute (100% CPU) start-up time,
but I just didn't care; I had to move on.

And I used Nginx as a load balancer. The config looks like this:

upstream app_witt {
  server 127.0.0.1:9292;
  # server 127.0.0.1:9696;
}

Note the # here:
I just switch between 9292 and 9696 back and forth,
manually changing the Nginx config and applying it with sudo nginx -s reload
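
The manual switch could be scripted. As a rough sketch (the function name toggle_upstream is made up for illustration, and it assumes the upstream block contains only simple one-line server entries like the two in the config shown earlier), this Ruby snippet flips which server line is commented out:

```ruby
# Swap the active and the commented-out `server` lines of an Nginx
# upstream block. Works on the config text only; it does not touch Nginx.
def toggle_upstream(config)
  config.lines.map do |line|
    case line
    when /\A(\s*)#\s*(server\s+\S+;.*)\z/m
      "#{$1}#{$2}"     # commented-out server becomes active
    when /\A(\s*)(server\s+\S+;.*)\z/m
      "#{$1}# #{$2}"   # active server gets commented out
    else
      line             # everything else is left untouched
    end
  end.join
end
```

The result would still have to be written back to the config file and applied with sudo nginx -s reload.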

Conclusion

Mina + Nginx + Puma

It worked OK for a while, until I got tired of it.

It's just a stupid, temporary solution;
that 5-7 minute start-up time is still there.

Every time I make a change to the codebase and want to deploy, I have to:

  • ssh to the server
  • kill -9 [PID] and restart the other Rails app
  • Change the Nginx config to point to that Rails app, then apply the Nginx config

It is very manual and costs me at least 5 minutes.
After doing that 100+ times, I am just sick of it.

1c7 commented Feb 27, 2019

Try 2: Passenger 😢

Passenger works OK. It starts up fast, in about 5 to 10 seconds.

Good, they have zero-downtime restarts.

(screenshot)

But zero-downtime restarts require "Passenger Enterprise".

(screenshot)

OK, so... how much is "Passenger Enterprise"?

(pricing screenshots)

Now I get it.

"Passenger Enterprise" means letting them handle everything.
But I have my own cloud server! I don't want to pay that much money. I want to use my own machine!

So I don't use the zero-downtime feature provided by Passenger.
Back to the old way: Nginx as a load balancer.

Conclusion:

Mina + Nginx + Passenger

Problems

  1. It still doesn't solve the real issue: deployment is too manual and very annoying.
  2. Not a generic solution: it depends on Passenger.
  3. Expensive for a solo dev working on a side project: I would rather use Heroku at that price point.
  4. Not enough time: I'm not interested in learning the free edition of Passenger. I don't care about this config.

What's next?

1c7 commented Feb 27, 2019

Try 3: Docker 😄

  1. Write a Dockerfile to pack my Rails app into a Docker image.
  2. Write docker-compose.yml for local dev.
    docker-compose.yml uses the Dockerfile from step 1 to run Rails 5, and it spins up Redis and Sidekiq.
  3. Write docker-stack.yml for deployment.

Zero downtime for FREE!

(You need to write a "healthcheck" to get this working.)
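
As an illustration of what such a healthcheck might run, here is a small Ruby sketch. The helper name healthy?, the /health path, and port 3000 in the usage comment are assumptions for illustration only. Docker treats exit status 0 as healthy and 1 as unhealthy.

```ruby
require "net/http"
require "uri"

# Returns true when the URL answers with a 2xx status within the timeout,
# false on any other status, connection failure, or timeout.
def healthy?(url, timeout: 2)
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             open_timeout: timeout, read_timeout: timeout) do |http|
    http.get(uri.request_uri)
  end
  response.is_a?(Net::HTTPSuccess)
rescue StandardError
  false
end

# A healthcheck script would end with something like:
#   exit(healthy?("http://localhost:3000/health") ? 0 : 1)
```

Rescuing StandardError keeps the script from crashing while the app is still booting, so a slow start simply counts as "not healthy yet".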

Container Orchestration

I am using Docker Swarm.
For a solo developer like me, Kubernetes is just too much at this point.
I don't have 2-4 weeks to spend on learning Kubernetes.

Note

I didn't provide examples of the Dockerfile, docker-compose.yml, and docker-stack.yml,
because this is a high-level view, not implementation detail.

Of course, there are still some problems:

Logging, storage, secrets, config, and metrics monitoring.

But

I am much happier now. Docker is a generic solution. It isn't tied to Ruby, Rails, or Puma/Passenger.

Reverse Proxy

I tried Traefik for a bit,
but gave up (see this GitHub issue),
so I still use Nginx.
I freaking hate that TOML syntax.

Conclusion:

Nginx + Docker Swarm

1c7 commented Feb 27, 2019

I hope what I wrote provides some value 💰

and is not seen as some guy on the internet writing a long rant.
I spent 10:00 AM to 11:29 AM writing this.
I don't know why it took so long.

I should put this on my blog, 1c7.me, for better SEO, haha.

tahb added a commit to dxw/teacher-vacancy-service that referenced this issue Feb 28, 2019

(Test) does memory usage improve on Ruby 2.6?
* Ruby protobuffs was mentioned: puma/puma#1600 - while we aren’t on the same versions as mentioned, it could be worth a try.

Member

nateberkopec commented Feb 28, 2019

99% of memory "leak" issues in Ruby are:

  1. Fragmentation.
  2. An old C extension in your dependencies.

There are already several things in this thread that make me think that many of you are seeing thread-related fragmentation:

For testing purposes, i replaced Puma and installed Passenger.

You replaced a multi-threaded application server with a single-threaded one (I'm assuming you would have said something specifically if you were using Passenger Enterprise, which allows threads). This eliminates the majority of memory fragmentation in Ruby.

We have also observed this with almost similar versions but on Ubuntu 16.04. OSX does not have this.

The fragmentation issue linked to above seems to be much worse on Linux than on Mac, possibly due to OSX shipping older versions of malloc or possibly just better memory management in the Mach kernel. Many BSD-based operating systems also do not report fragmentation issues w/Ruby.
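
When chasing this kind of fragmentation, it helps to know which allocator settings a process is actually running with. A small Ruby sketch, under stated assumptions: MALLOC_ARENA_MAX is the glibc environment variable that caps the number of malloc arenas, and a Ruby built against jemalloc usually lists it in RbConfig::CONFIG["MAINLIBS"] (or "LIBS" on older builds). The helper name allocator_report is hypothetical.

```ruby
require "rbconfig"

# Collect the allocator-related settings visible to this Ruby process.
def allocator_report(env = ENV)
  libs = [RbConfig::CONFIG["MAINLIBS"], RbConfig::CONFIG["LIBS"]].compact.join(" ")
  {
    malloc_arena_max: env["MALLOC_ARENA_MAX"],   # nil means the glibc default is in effect
    linked_jemalloc: libs.include?("jemalloc"),  # true only if Ruby was built with jemalloc
    ld_preload: env["LD_PRELOAD"]                # jemalloc is sometimes injected this way instead
  }
end
```

Printing allocator_report from inside a Puma worker on the affected machine and on an unaffected one gives a quick first comparison.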

I must close this issue, as without a minimal verifiable example, we can't do anything about it. I've fixed memory "leak" issues on dozens of Puma applications, and they all come down to the two issues I mentioned at the beginning.

tahb added a commit to dxw/teacher-vacancy-service that referenced this issue Feb 28, 2019

(Test) does memory usage improve on Ruby 2.6?
* Ruby protobuffs was mentioned: puma/puma#1600 - while we aren’t on the same versions as mentioned, it could be worth a try.
* Bump Rubocop to support the new ruby version which includes 2 types of autocorrection, new lines below guard clauses and replacing a deprecated ActiveRecord association method

tahb added a commit to dxw/teacher-vacancy-service that referenced this issue Feb 28, 2019

(chore) Update Ruby to 2.6.1
* During the high memory investigation a similar issue was flagged around memory leaks in Ruby protobuffs: puma/puma#1600 - while we aren’t on the same versions as mentioned, I updated, tested and checked the application can run. It's hard to tell if this change has had the desired improvements to memory usage as I can only do so much on Edge to replicate the production usage.
* Rubocop also required a change and autocorrect
