Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed 100% CPU problem under jruby #165

Closed
wants to merge 1 commit into from
Closed

Conversation

meesern
Copy link

@meesern meesern commented Nov 6, 2013

This change is to address a problem with Bunny locking up using 100% of a processor under jruby on ubuntu 12.04 i64 and ubuntu 10.04 arm after running for some hours.

The problem appears to be an operating system one where an EAGAIN exception is repeatedly raised on read_nonblock. Since the rescue and retry on retry exception classes exception handling exactly replicates the behaviour of the readpartial() method it seems reasonable to replace the misbehaving read_nonblock() with readpartial().

This fix is proving stable over time in the above long running systems.

@michaelklishin
Copy link
Member

@meesern thank you so much! I will take a look later today.

@michaelklishin
Copy link
Member

@bernd this is relevant to your recent report.

@michaelklishin
Copy link
Member

Please ignore the CI failure. Travis still hasn't upgraded to RabbitMQ 3.2 which has exchange.delete and queue.delete idempotency changes. The tests reflect the 3.2 behavior.

@meesern
Copy link
Author

meesern commented Nov 6, 2013

You are welcome - I believe it works well under jruby and 1.9.3 (which I don't think showed a problem). I can't see any reason why it shouldn't be stable on all platforms. Interestingly using the same jruby (1.7.3) the problem showed up more rapidly on 10.04 arm than 13.04 i64.

@bernd
Copy link
Contributor

bernd commented Nov 6, 2013

Awesome! I cannot reproduce the problem anymore with JRuby 1.7.6 on Ubuntu 12.04 amd64.

@bernd
Copy link
Contributor

bernd commented Nov 6, 2013

Is it okay that #readpartial blocks if no data is available?

@michaelklishin
Copy link
Member

@bernd yes. So does Socket#read. Socket#read_nonblock instead raises an exception which we immediately handle and IO.select (which also waits) until some data becomes available.

I will run our benchmarks to see if this has any effect on CRuby.

@bernd
Copy link
Contributor

bernd commented Nov 6, 2013

@michaelklishin I see. So you are going to remove the IO.select call and the timeout parameter then?

@michaelklishin
Copy link
Member

We will see. There is a separate per-operation timeout that will raise. The IO.select one is of limited use.

@bernd
Copy link
Contributor

bernd commented Nov 6, 2013

Thanks for the explanation! :)

@michaelklishin
Copy link
Member

This change has a substantial impact on publishing throughput on CRuby (~7.85 kghz vs 12 kghz with read_nonblock). Otherwise I've discovered no issues. I will apply the change manually (it needs to be applied
to SSLSocket as well) and we will special-case JRuby.

@michaelklishin
Copy link
Member

@meesern can you confirm that full test suite passes for you on JRuby? I cannot get the suite to finish.

@meesern
Copy link
Author

meesern commented Nov 6, 2013

No, I forgot to say rspec failed in quite a few ways before I started and didn't seem to be worse after. It looked like an incompatibility with my version of rabbitmq so I'm afraid I didn't pursue it.

michaelklishin pushed a commit that referenced this pull request Nov 6, 2013
… and JRuby

Avoids CPU burn on JRuby because of excessive EWOULDBLOCK exceptions being raised.

References #165, #166.
@michaelklishin
Copy link
Member

OK, I have the suite passing now. One exception is jruby/jruby#1055, finishing a work around for it.

@michaelklishin
Copy link
Member

Fixed in master. CRuby still uses the original implementation. @meesern @bernd please give master a try.

michaelklishin pushed a commit that referenced this pull request Nov 6, 2013
References #165, #166.
michaelklishin pushed a commit that referenced this pull request Nov 6, 2013
… and JRuby

Avoids CPU burn on JRuby because of excessive EWOULDBLOCK exceptions being raised.

References #165, #166.
@michaelklishin
Copy link
Member

1.0.1 is out, too.

@iconara
Copy link
Member

iconara commented Nov 6, 2013

Sorry for being late to this discussion. I wonder if this implementation would solve the problem:

def read_fully(count, timeout = nil)
  value = ''
  started_at = Time.now
  while IO.select([self], nil, nil, timeout)
    value << read_nonblock(count - value.bytesize)
    break if value.bytesize >= count || (timeout && (Time.now - started_at) > timeout)
  end
  value
end

(I've not included the Rubinious workaround, I don't really know what it worked around).

The important bit is to do IO.select before the call to #read_nonblock, which should avoid all cases of EAGAIN, and other solutions that use exceptions for flow control. Exceptions are really expensive in code that is executed hundreds or thousands of times per second.

@michaelklishin
Copy link
Member

@iconara interesting idea. We will try it out and run our benchmarks to see if avoiding EAGAIN makes much of a difference.

@michaelklishin
Copy link
Member

There was a mistake in the 1.0.1 release process, 1.0.2 should include the proper fix.

@bernd
Copy link
Contributor

bernd commented Nov 6, 2013

I can confirm that 1.0.2 works for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants