Patch: replace method instance of strings and regexp with singleton instances #89


bdurand commented Dec 7, 2010

This patch provides a performance boost for simple applications that need very high throughput. It replaces hard-coded string and regular expression literals inside methods with frozen singleton instances, so there are fewer objects for the garbage collector to reclaim.

As an added advantage, it provides a spot to document what each of the environment variables is.
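For illustration, the idea looks roughly like this (the class, constant, and method names here are hypothetical, not the identifiers used in the actual patch):

```ruby
class ExampleMiddleware
  # Hypothetical name for illustration; the patch's own constants differ.
  # The string is allocated once at load time and frozen, so every
  # request shares the same object instead of allocating a new one.
  CONTENT_TYPE = "Content-Type".freeze

  # Before: evaluating the string literal allocates a new String
  # object on every call, which the GC later has to reclaim.
  def lookup_before(headers)
    headers["Content-Type"]
  end

  # After: the frozen singleton is reused, so no per-call allocation.
  def lookup_after(headers)
    headers[CONTENT_TYPE]
  end
end
```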

Brian Durand: Replace method instances of strings and regular expressions with singleton instances so each request won't keep allocating the same objects over and over again. (45ee32b)
Contributor

josh commented Dec 8, 2010

I tried this once 6ae0a10

It's up to raggi to pull; I don't have much hard evidence for it.

bdurand commented Dec 8, 2010

My tests show a 4-5% increase in throughput on a simple test:

use Rack::Head
use Rack::ConditionalGet
use Rack::Sendfile
use Rack::ContentType
use Rack::ContentLength

map "/test" do
  run lambda{|env|
    request = Rack::Request.new(env)
    response = Rack::Response.new
    response.write("Hello #{request.params['q']}")
    response.finish
  }
end

Most of the gain comes from reducing the number of objects needing to be reclaimed by the garbage collector.

At first I didn't see the sort of performance gain I expected; then I realized the garbage collector was only cleaning up an unrealistically small heap. The gain only became apparent once I mimicked the heap of a larger, more complex application by creating a large number of objects and retaining references to many of them.
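The per-call allocation difference itself is easy to observe on a modern MRI. This sketch assumes MRI 2.1 or later, where GC.stat exposes the :total_allocated_objects key (the key names are MRI-specific and vary by version):

```ruby
# Count how many objects a block allocates (MRI 2.1+ only;
# :total_allocated_objects is an MRI-specific GC.stat key).
def allocations
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

HEADER = "Content-Type".freeze

# A string literal allocates a fresh String each time it is evaluated;
# a frozen constant is the same object on every lookup.
literal_allocs  = allocations { 10_000.times { "Content-Type" } }
constant_allocs = allocations { 10_000.times { HEADER } }
puts "literal: #{literal_allocs}, constant: #{constant_allocs}"
```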

Owner

raggi commented Dec 9, 2010

On which interpreter?

Have you tested on the others?

bdurand commented Dec 9, 2010

I've tested on Ruby Enterprise Edition 1.8.7-2009.10 and Ruby 1.9.1p378 using thin 1.2.7. On Ruby 1.9 I'm seeing a 6-7% gain with the patch.

I've got the rack server running on a Xubuntu box with a Core 2 Duo and jmeter on Mac OS X on the same subnet. To generate a larger heap, I added this to my config.ru file, which gives me a 200+ MB heap:

refs = []
200000.times do
  refs << rand.to_s * 50
end
refs = refs[0, 100000]
GC.start

I'll admit the gain is pretty small and will only be noticeable on services that require very high throughput, but that is exactly the use case I had when I started working on this. In a handler called hundreds of times per second, I noticed a definite (but small) throughput gain by switching to singleton instances and eliminating a couple of dozen mallocs per request.

Owner

raggi commented Dec 9, 2010

So you're actually talking about high request rate, not high throughput. More, smaller responses.

I'm open to optimisations, but, for example, test this on 1.8.7 trunk, jruby and rbx, and you'll see different results.

Also, 1.9.2 is much preferred to 1.9.1: 1.9.1 has a bunch of leaks, so if you were using thin you'd see degraded GC performance there anyway.

Owner

raggi commented Dec 9, 2010

Also, heap filling is hard to use as a metric, because heap filling like this tends to mislead the GC. Under MRI in particular, different datasets can produce drastically different results. That's just a note that you need to aggregate results over a larger number of runs than you'd normally expect in order to find a real mean.

bdurand commented Dec 9, 2010

I'm not using the heap filling as a metric; it's just there to make sure the garbage collector has some work to do. My test case is an artificially simple application, since I only want to test the Rack changes. Allocating a bunch of objects is meant to replicate what the heap would look like in a real, more complex application. I did change the heap-filling code to only allocate ~150 MB of memory and randomly remove references; this didn't affect my findings.

refs = []
100000.times do
  refs << rand.to_s * 50
end
(refs.size / 2).times do
  refs.delete_at(rand(refs.size))
end
GC.start

I re-ran my test of 80,000 requests multiple times on:

  • MRI 1.8.7-p302
  • MRI 1.9.2-p0
  • REE 1.8.7-2009.10
  • Rubinius 1.1.1

In all cases requests per second increased, ranging from 2.3% on Rubinius to 5.8% on MRI 1.8.7.

Owner

raggi commented Dec 19, 2010

I can't actually replicate these results.

  • What server were you using?
  • What concurrency were you using in requests?
  • What results did you actually get?
  • Were you using the 'none' rackup environment?

Locally on MRI I tested several times with mongrel, thin and webrick, and saw a reduction in throughput using your above rackup file launched like so: ruby -rubygems -Ilib bin/rackup -E none -s <server> bench.ru

For explicit clarity, bench.ru was:

use Rack::Head
use Rack::ConditionalGet
use Rack::Sendfile
use Rack::ContentType
use Rack::ContentLength

map "/test" do
  run lambda{|env|
    request = Rack::Request.new(env)
    response = Rack::Response.new
    response.write("Hello #{request.params['q']}")
    response.finish
  }
end

And I benchmarked using ab(1), which can be somewhat unreliable; to compensate, I ran multiple runs like so:

ab -k -c 10 -n 10000 http://127.0.0.1:9292/test

Also note that because mongrel does not support keepalive, this can quickly exhaust available ports. To keep the test results stable, you must therefore ensure between runs that the TIME_WAIT sockets have expired:

netstat -an | grep TIME | wc -l

The results for thin, mongrel and webrick using your code, and then using the original 1.2.1 code are as follows:

    raggi@mbk: ~ % ab -k -c 10 -n 10000 http://127.0.0.1:9292/test
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking 127.0.0.1 (be patient)
    Completed 1000 requests
    Completed 2000 requests
    Completed 3000 requests
    Completed 4000 requests
    Completed 5000 requests
    Completed 6000 requests
    Completed 7000 requests
    Completed 8000 requests
    Completed 9000 requests
    Completed 10000 requests
    Finished 10000 requests


    Server Software:        thin
    Server Hostname:        127.0.0.1
    Server Port:            9292

    Document Path:          /test
    Document Length:        6 bytes

    Concurrency Level:      10
    Time taken for tests:   4.183 seconds
    Complete requests:      10000
    Failed requests:        0
    Write errors:           0
    Keep-Alive requests:    10000
    Total transferred:      1290000 bytes
    HTML transferred:       60000 bytes
    Requests per second:    2390.76 [#/sec] (mean)
    Time per request:       4.183 [ms] (mean)
    Time per request:       0.418 [ms] (mean, across all concurrent requests)
    Transfer rate:          301.18 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0   0.0      0       1
    Processing:     0    4   7.6      3      65
    Waiting:        0    4   7.6      3      65
    Total:          0    4   7.6      3      65

    Percentage of the requests served within a certain time (ms)
      50%      3
      66%      4
      75%      4
      80%      4
      90%      4
      95%      5
      98%      6
      99%     63
     100%     65 (longest request)
    raggi@mbk: ~ % ab -k -c 10 -n 10000 http://127.0.0.1:9292/test
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking 127.0.0.1 (be patient)
    Completed 1000 requests
    Completed 2000 requests
    Completed 3000 requests
    Completed 4000 requests
    Completed 5000 requests
    Completed 6000 requests
    Completed 7000 requests
    Completed 8000 requests
    Completed 9000 requests
    Completed 10000 requests
    Finished 10000 requests


    Server Software:        
    Server Hostname:        127.0.0.1
    Server Port:            9292

    Document Path:          /test
    Document Length:        6 bytes

    Concurrency Level:      10
    Time taken for tests:   8.875 seconds
    Complete requests:      10000
    Failed requests:        0
    Write errors:           0
    Keep-Alive requests:    0
    Total transferred:      1250000 bytes
    HTML transferred:       60000 bytes
    Requests per second:    1126.73 [#/sec] (mean)
    Time per request:       8.875 [ms] (mean)
    Time per request:       0.888 [ms] (mean, across all concurrent requests)
    Transfer rate:          137.54 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0   0.3      0       6
    Processing:     2    9   9.4      6      82
    Waiting:        1    6   7.9      5      73
    Total:          3    9   9.4      6      82

    Percentage of the requests served within a certain time (ms)
      50%      6
      66%      7
      75%      7
      80%      7
      90%     10
      95%     17
      98%     52
      99%     53
     100%     82 (longest request)
    raggi@mbk: ~ % ab -k -c 10 -n 10000 http://127.0.0.1:9292/test
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking 127.0.0.1 (be patient)
    Completed 1000 requests
    Completed 2000 requests
    Completed 3000 requests
    Completed 4000 requests
    Completed 5000 requests
    Completed 6000 requests
    Completed 7000 requests
    Completed 8000 requests
    Completed 9000 requests
    Completed 10000 requests
    Finished 10000 requests


    Server Software:        WEBrick/1.3.1
    Server Hostname:        127.0.0.1
    Server Port:            9292

    Document Path:          /test
    Document Length:        6 bytes

    Concurrency Level:      10
    Time taken for tests:   28.378 seconds
    Complete requests:      10000
    Failed requests:        0
    Write errors:           0
    Keep-Alive requests:    10000
    Total transferred:      1780344 bytes
    HTML transferred:       60000 bytes
    Requests per second:    352.39 [#/sec] (mean)
    Time per request:       28.378 [ms] (mean)
    Time per request:       2.838 [ms] (mean, across all concurrent requests)
    Transfer rate:          61.27 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0  20.3      0     910
    Processing:     4   28  20.6     22     197
    Waiting:        4   25  19.4     20     194
    Total:          4   28  28.9     22     943

    Percentage of the requests served within a certain time (ms)
      50%     22
      66%     22
      75%     23
      80%     23
      90%     36
      95%     91
      98%     93
      99%     97
     100%    943 (longest request)



    raggi@mbk: ~ % ab -k -c 10 -n 10000 http://127.0.0.1:9292/test
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking 127.0.0.1 (be patient)
    Completed 1000 requests
    Completed 2000 requests
    Completed 3000 requests
    Completed 4000 requests
    Completed 5000 requests
    Completed 6000 requests
    Completed 7000 requests
    Completed 8000 requests
    Completed 9000 requests
    Completed 10000 requests
    Finished 10000 requests


    Server Software:        thin
    Server Hostname:        127.0.0.1
    Server Port:            9292

    Document Path:          /test
    Document Length:        6 bytes

    Concurrency Level:      10
    Time taken for tests:   4.159 seconds
    Complete requests:      10000
    Failed requests:        0
    Write errors:           0
    Keep-Alive requests:    10000
    Total transferred:      1290129 bytes
    HTML transferred:       60006 bytes
    Requests per second:    2404.24 [#/sec] (mean)
    Time per request:       4.159 [ms] (mean)
    Time per request:       0.416 [ms] (mean, across all concurrent requests)
    Transfer rate:          302.91 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0   0.0      0       1
    Processing:     0    4   7.8      3      77
    Waiting:        0    4   7.8      3      77
    Total:          0    4   7.8      3      77

    Percentage of the requests served within a certain time (ms)
      50%      3
      66%      4
      75%      4
      80%      4
      90%      4
      95%      4
      98%      6
      99%     62
     100%     77 (longest request)
    raggi@mbk: ~ % ab -k -c 10 -n 10000 http://127.0.0.1:9292/test
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking 127.0.0.1 (be patient)
    Completed 1000 requests
    Completed 2000 requests
    Completed 3000 requests
    Completed 4000 requests
    Completed 5000 requests
    Completed 6000 requests
    Completed 7000 requests
    Completed 8000 requests
    Completed 9000 requests
    Completed 10000 requests
    Finished 10000 requests


    Server Software:        
    Server Hostname:        127.0.0.1
    Server Port:            9292

    Document Path:          /test
    Document Length:        6 bytes

    Concurrency Level:      10
    Time taken for tests:   8.736 seconds
    Complete requests:      10000
    Failed requests:        0
    Write errors:           0
    Keep-Alive requests:    0
    Total transferred:      1250000 bytes
    HTML transferred:       60000 bytes
    Requests per second:    1144.68 [#/sec] (mean)
    Time per request:       8.736 [ms] (mean)
    Time per request:       0.874 [ms] (mean, across all concurrent requests)
    Transfer rate:          139.73 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0   0.2      0      12
    Processing:     2    9   9.4      6      66
    Waiting:        1    6   8.0      5      64
    Total:          3    9   9.5      6      66

    Percentage of the requests served within a certain time (ms)
      50%      6
      66%      7
      75%      7
      80%      7
      90%      8
      95%     14
      98%     53
      99%     56
     100%     66 (longest request)
    raggi@mbk: ~ % ab -k -c 10 -n 10000 http://127.0.0.1:9292/test
    This is ApacheBench, Version 2.3 <$Revision: 655654 $>
    Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
    Licensed to The Apache Software Foundation, http://www.apache.org/

    Benchmarking 127.0.0.1 (be patient)
    Completed 1000 requests
    Completed 2000 requests
    Completed 3000 requests
    Completed 4000 requests
    Completed 5000 requests
    Completed 6000 requests
    Completed 7000 requests
    Completed 8000 requests
    Completed 9000 requests
    Completed 10000 requests
    Finished 10000 requests


    Server Software:        WEBrick/1.3.1
    Server Hostname:        127.0.0.1
    Server Port:            9292

    Document Path:          /test
    Document Length:        6 bytes

    Concurrency Level:      10
    Time taken for tests:   29.129 seconds
    Complete requests:      10000
    Failed requests:        0
    Write errors:           0
    Keep-Alive requests:    10000
    Total transferred:      1780172 bytes
    HTML transferred:       60000 bytes
    Requests per second:    343.31 [#/sec] (mean)
    Time per request:       29.129 [ms] (mean)
    Time per request:       2.913 [ms] (mean, across all concurrent requests)
    Transfer rate:          59.68 [Kbytes/sec] received

    Connection Times (ms)
                  min  mean[+/-sd] median   max
    Connect:        0    0  21.7      0     970
    Processing:     4   29  19.9     23     129
    Waiting:        4   26  19.1     20     127
    Total:          4   29  30.6     23    1075

    Percentage of the requests served within a certain time (ms)
      50%     23
      66%     23
      75%     24
      80%     24
      90%     39
      95%     91
      98%     93
      99%     94
     100%   1075 (longest request)

It is clear from these tests (and further repeated runs, which are not worth including here) that the statistical variation in server and application performance is significantly greater than any gain I can measure. In essence, I can see no repeatable statistical evidence that making these items constants yields any gain.

Whilst MRI's GC might be weak, constant lookups are also not free. I might believe there could be some replicable difference due to fragmentation in an application under significant GC stress on one of the mark-and-sweep implementations, but this is purely a conceptual hypothesis, and I would rather see numbers before we introduce such a change.
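raggi's caveat about constant lookups can be sanity-checked with a micro-benchmark. The numbers vary widely across interpreters and versions, so this is only a sketch of the methodology, not evidence either way:

```ruby
require "benchmark"

HEADER_CONST = "Content-Type".freeze  # frozen singleton, fetched via constant lookup
N = 1_000_000

# Wall-clock time for N evaluations of each form. A literal pays an
# allocation per iteration; a constant pays a (cheap, but not free) lookup.
literal_time  = Benchmark.realtime { N.times { "Content-Type" } }
constant_time = Benchmark.realtime { N.times { HEADER_CONST } }

puts format("literal:  %.3fs", literal_time)
puts format("constant: %.3fs", constant_time)
```

On most MRI versions the constant form wins, but the margin is small enough that server and GC noise can swamp it, which is consistent with the mixed results above.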

On a final note, I also had to correct an error in the patch, and I suspect there may be others. I would appreciate a careful pass over all features and changes, as well as whitespace fixes:

    raggi@mbk: ~/dev/ext/rack % g d HEAD^
    diff --git a/lib/rack/handler.rb b/lib/rack/handler.rb
    index 5d843e2..f2a58bd 100644
    --- a/lib/rack/handler.rb
    +++ b/lib/rack/handler.rb
    @@ -24,13 +24,13 @@ module Rack

         def self.default(options = {})
           # Guess.
    -      if ENV.include?(VARS::ENV::PHP_FCGI_CHILDREN)
    +      if ENV.include?(CGI_VARIABLE::PHP_FCGI_CHILDREN)
             # We already speak FastCGI
             options.delete :File
             options.delete :Port

             Rack::Handler::FastCGI
    -      elsif ENV.include?(VARS::ENV::REQUEST_METHOD)
    +      elsif ENV.include?(CGI_VARIABLE::REQUEST_METHOD)
             Rack::Handler::CGI
           else
             begin

We are grateful for your contribution; please don't be disheartened. I am not rejecting the idea, I merely want replicable evidence that it is worth including. Thank you for your work so far!

raggi closed this May 3, 2011

rkh referenced this pull request Oct 6, 2014

josh: Revert "Add common HTTP strings to Rack::Const" (71030b9). This reverts commit 6ae0a10.