Performance tweaks #22

Closed
wants to merge 5 commits into from

Conversation

alex-yam

The table.insert function re-hashes the table on every insert; using the # operator to find the next integer key eliminates this overhead and results in a 1.5x to 17x performance boost across different benchmarks.

This:
t[#t+1] = value
is a lot faster than this (even with a localized table.insert under LuaJIT):
table.insert(t, value)

Some published benchmark results:
http://blog.jgc.org/2013/04/performance-of-array-creation-in-lua.html
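The difference can be measured with a minimal micro-benchmark along these lines (a sketch only; absolute timings vary by interpreter and version):

```lua
-- minimal micro-benchmark sketch comparing the two append styles;
-- absolute timings vary by interpreter (PUC Lua vs. LuaJIT) and version
local N = 1000000

local t1 = {}
local start = os.clock()
for i = 1, N do
    table.insert(t1, i)
end
local insert_time = os.clock() - start

local t2 = {}
start = os.clock()
for i = 1, N do
    t2[#t2 + 1] = i
end
local length_time = os.clock() - start

print(string.format("table.insert: %.3fs  t[#t+1]: %.3fs",
                    insert_time, length_time))
```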

The only drawback is that the # operator doesn't actually count the number of items in the table (according to http://lua-users.org/wiki/TablesTutorial), but that isn't needed anyway when items are inserted sequentially.
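The caveat can be seen in a small sketch: # is only well defined for sequences without nil holes, which sequential inserts never create:

```lua
-- sketch illustrating why # is only reliable for hole-free sequences
local t = { "a", "b", "c" }
assert(#t == 3)   -- contiguous array part: # is well defined

t[5] = "e"        -- index 4 is left nil, creating a hole
-- #t is now any "border": Lua may report 3 or 5 for this table,
-- so appends via t[#t+1] must never leave holes behind
print(#t)
```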

We have been using the # operator in production for months without any issues; in fact, we no longer have "local table.insert" anywhere in our code.

We highly recommend replacing every table.insert across the Lua code with t[#t+1] = x to gain the extra performance.

Thanks for the hard work agentzh, keep it up!

@agentzh
Member

agentzh commented Sep 18, 2013

Thank you for the catch!

But use of the # operator in the loop iterations is also a little more expensive than tracking the table length with a local Lua variable ourselves, according to the on-CPU Flame Graphs, because it requires calling the lj_tab_len primitive in the LuaJIT VM on almost every iteration. Could you please update your patch to use the following paradigm instead?

local vals = {}
local nvals = 0
for i = 1, n do
    ...
    nvals = nvals + 1
    vals[nvals] = ...
end

More performance tweaks recommended by agentzh
@alex-yam
Author

Wow, so there is room for even more performance? That's great news!

I've made the modifications as suggested (except line 248, which isn't part of a loop). If you have any more ideas, please let us know; we don't mind tweaking until every drop of performance has been squeezed out :-)

@alex-yam
Author

Just studied the Redis protocol again; it appears most bulk replies actually begin with "$" and not "*". I somehow read the protocol the other way last time, must have been low on caffeine :-).

@alex-yam
Author

By the way, in our production box we've replaced:

local dummy, err = sock:receive(2) -- ignore CRLF
if not dummy then
return nil, err
end

with just:

sock:receive(2) -- ignore CRLF

The idea was to save us from doing millions of "if not dummy then" checks in huge bulk replies. Since the only code after that is a "return data", if there is a network error the next _read_reply() should be able to catch it straight away.

What is your opinion on this?

agentzh added a commit that referenced this pull request Sep 19, 2013
… + 1] = val". thanks alex-yam for the patch in #22.
@agentzh
Member

agentzh commented Sep 19, 2013

@alex-yam To be honest, I don't really like your later micro-optimizations :)

Let's focus on true bottlenecks in the on-CPU Flame Graphs generated by the ngx-sample-bt tool for real-world apps:

https://github.com/agentzh/nginx-systemtap-toolkit/#ngx-sample-bt

It'd be great if you could share flame graphs from your real-world apps using lua-resty-redis :)

@agentzh
Member

agentzh commented Sep 19, 2013

@alex-yam Also it's a good idea to try out the latest low-overhead builtin profiler in LuaJIT's v2.1 git branch:

http://comments.gmane.org/gmane.comp.lang.lua.luajit/2906

BTW, we can further reduce calls into the lj_tab_newkey primitive inside the LuaJIT VM for lua-resty-redis (and other libraries) once Mike Pall implements the new table.new(narr, nrec) and string.buffer APIs in LuaJIT 2.1.

These new Lua APIs will appear in the v2.1 branch by this Christmas I think :)

Another thing that may give a noticeable performance gain is to JIT-compile this Lua library. Right now, even when you're using LuaJIT, most of the Lua code still runs in LuaJIT's interpreter. We'll need a better lua-resty-core library and a better LuaJIT 2.1 for this :) See https://github.com/agentzh/lua-resty-core
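Once table.new(narr, nrec) lands, the usual pattern is to preallocate the array part so the VM avoids resizing on insert. A sketch, guarded so it also runs on plain Lua or older LuaJIT where the extension is missing:

```lua
-- sketch: preallocate the array part when LuaJIT 2.1's table.new
-- extension is available, fall back to a plain table otherwise
local ok, new_tab = pcall(require, "table.new")
if not ok then
    new_tab = function (narr, nrec) return {} end
end

local n = 100
local vals = new_tab(n, 0)  -- room for n array slots, no hash slots
for i = 1, n do
    vals[i] = i * 2          -- fills preallocated slots when available
end
```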

agentzh added a commit that referenced this pull request Sep 19, 2013
…concat() call because it is already the default. thanks alex-yam for the patch in #22.
agentzh added a commit that referenced this pull request Sep 19, 2013
@alex-yam
Author

@agentzh

No problem, 2012 was a good year, who needs 2013 ;)

Jokes aside, my main focus has always been on the application-development side, so I was pretty sure some of the modifications didn't take care of many of the deeper-level Lua requirements.

Thanks for applying what worked for you; the flame graphs look like they can be very useful, I'll take a look.

You may not have realized it, but what you have developed is a game-changer on many levels around the world. Keep up the great work!

agentzh added a commit that referenced this pull request Sep 19, 2013
…rbyfloat, incrbyfloat, migrate, pexpire, pexpireat, psetex, pubsub, pttl, restore, and time. thanks alex-yam for the patch in #22.
@antonheryanto

I did a simple benchmark comparing the old version with the updated one, but found a performance drop:

old: 11282 req/s
new: 5984 req/s

@agentzh
Member

agentzh commented Sep 20, 2013

@antonheryanto Could you provide the full benchmark example that you're using (and the steps you use to run the benchmark)?

I've tried the following trivial example on my side and the new version is consistently 1.45% faster:

location /t {
    content_by_lua '
        local redis = require "resty.redis"
        local red = redis:new()

        red:set_timeout(1000) -- 1 sec

        local ok, err = red:connect("127.0.0.1", 6379)
        if not ok then
            ngx.log(ngx.ERR, "failed to connect: ", err)
            return ngx.exit(500)
        end

        local res, err = red:set("foo", 32)
        if err then
            ngx.log(ngx.ERR, "failed to set dog: ", err)
            return ngx.exit(500)
        end

        local res, err = red:get("foo")

        if err then
            ngx.log(ngx.ERR, "failed to get dog: ", err)
            return ngx.exit(500)
        end

        if res == ngx.null then
            ngx.status = 404
            ngx.say("key not found.")
            ngx.exit(404)
        end

        ngx.say("res: ", res)

        local ok, err = red:set_keepalive(0, 1000)
        if not ok then
            ngx.say("failed to set keepalive: ", err)
            return
        end
    ';
}

I'm using the command ab -k -c10 -n40000 localhost:1984/t and getting on average 20408 req/sec with the old version and on average 20703 req/sec with the new version.

Since your results with the new version drop so much, there could be a configuration or benchmarking mistake somewhere. Always keep an eye on your nginx error log file. And you can use the on-CPU and off-CPU flame graph tools to find the bottlenecks.

@antonheryanto

@agentzh
I used one of my web app modules, but instead of ab I'm using wrk:

wrk -c5 -t5 -d5s http://localhost/area

Below is the Lua code:

-- redis
local cjson = require "cjson"
local ngx = ngx
local tonumber = tonumber
local redis = require "resty.redis"
local r = redis:new()
r:set_timeout(1000)
local ok, error = r:connect("127.0.0.1", 6379)
if not ok then
  ngx.log(ngx.ERR, "failed connect to redis at : ", error)
  return
end

local name = "area"

local function get(id)
  local a = r:hgetall(name ..":".. id)
  local v = r:array_to_hash(a)
  v.id = id
  return v
end

local function all()
  local m = r:smembers(name)
  local result = {}
  for i=1,#m do
    result[i] = get(m[i])
  end
  return result
end

local id = tonumber(ngx.var.arg_id)
local output = id and get(id) or all()
ngx.say(cjson.encode(output))

local ok,error = r:set_keepalive(0,1024)
if not ok then ngx.log(ngx.ERR,"failed to keepalive", error) end

I also tried benchmarking with your trivial example above, and got:
old: 10227.86 req/s
new: 8275.91 req/s

my nginx.conf

worker_processes 8;
events { worker_connections  1024; }
error_log logs/error.log warn;

on a Linode 1GB running Ubuntu 13.04 (raring), 32-bit

You may be right, I could have made a benchmarking or configuration mistake, since I'm not really an expert on this matter yet... still learning :D

@antonheryanto

@agentzh

I just ran the trivial example on my Linode again, using ab instead:

ab -k -c10 -n40000 localhost:8001/t

Results:
old: 13090.77 req/s
new: 6562.37 req/s

BTW, I ran the benchmark 3 times and took the average.

@agentzh
Member

agentzh commented Sep 21, 2013

@antonheryanto which version of LuaJIT are you using? This result looks weird. I've tried my trivial code example (pasted previously) with both LuaJIT v2.1 branch and LuaJIT 2.0.2 release on Linux x86_64, both giving consistent speed up in lua-resty-redis's new version:

LuaJIT v2.1 branch:
old lua-resty-redis: 21984 req/sec (average)
new lua-resty-redis: 23063 req/sec (average)
speedup: 4.9%

LuaJIT 2.0.2 release:
old lua-resty-redis: 19555 req/sec (average)
new lua-resty-redis: 19918 req/sec (average)
speedup: 1.9%

Also, we can see that the LuaJIT v2.1 branch gives a further 15.8% speedup as compared to LuaJIT 2.0.2.

BTW, could you generate an on-CPU flamegraph for both of these setups?

Also, your code example using hgetall and smembers is incomplete because you don't provide the sample data in your Redis, so I cannot try it out on my side.

Seeing that you're using 32-bit systems, I'll try it out on a 32-bit box later.

@alex-yam
Author

@antonheryanto

That was certainly unexpected since the code didn't change much; I mean, 1+1 can't really be slower than table.insert.

Perhaps not using the old module(...) style is somehow creating errors in your nginx setup. Please check the error logs; maybe nginx was writing to the error log on every request, causing req/s to drop by half.

Or perhaps there were some network issues. Can you run a test with this script and see what happens with the network effect reduced? (It merges the 'smembers' + multiple 'get' requests into a single Redis request.)

-- redis
local cjson = require "cjson"
local ngx = ngx
local tonumber = tonumber
local redis = require "resty.redis"
local r = redis:new()
r:set_timeout(1000)
local ok, error = r:connect("127.0.0.1", 6379)
if not ok then
  ngx.log(ngx.ERR, "failed connect to redis at : ", error)
  return
end

local name = "area"

-- original code
--[[
local function get(id)
  local a = r:hgetall(name ..":".. id)
  local v = r:array_to_hash(a)
  v.id = id
  return v
end

local function all()
  local m = r:smembers(name)
  local result = {}
  for i=1,#m do
    result[i] = get(m[i])
  end
  return result
end

local id = tonumber(ngx.var.arg_id)


    local output = id and get(id) or all()
]]
-- end original code

-- test code
    local id = tonumber(ngx.var.arg_id)
    local redis_lua_script = "if KEYS[2] then return redis.pcall('hgetall',table.concat({KEYS[1],':',KEYS[2]})) else local p = redis.pcall return p('mget',unpack(p('smembers',KEYS[1]))) end"
    local output = {}
    if id then
        output = r:array_to_hash(r:eval(redis_lua_script,2,name,id))
        output.id = id
    else
        output = r:eval(redis_lua_script,1,name)
    end
-- end test code

ngx.say(cjson.encode(output))

local ok,error = r:set_keepalive(0,1024)
if not ok then ngx.log(ngx.ERR,"failed to keepalive", error) end

@agentzh
Member

agentzh commented Sep 27, 2013

I'm closing this :)

@agentzh agentzh closed this Sep 27, 2013
@antonheryanto

Sorry, I didn't have time to re-investigate, but since both of you have tested and proved it's improved, I think the problem is on my side. Thanks for the response.

@aCayF aCayF mentioned this pull request Feb 20, 2014