Performance tweaks #22
The table.insert function re-hashes the table on every insert; using the # operator to find the last integer key of the table eliminates this overhead, and results in a 1.5x to 17x performance boost across different benchmarks.

This:

t[#t+1] = value

is a lot faster than this (even with table.insert localized in LuaJIT):

table.insert(t, value)

A published benchmark result: http://blog.jgc.org/2013/04/performance-of-array-creation-in-lua.html

The only drawback is that the # operator doesn't actually count the number of items in the table (see http://lua-users.org/wiki/TablesTutorial), but that isn't needed anyway when items are inserted sequentially. We have been using the # operator in production for months without any issue; in fact we no longer have "local table.insert" anywhere in our code.

We highly recommend replacing every table.insert across all the Lua code with t[#t+1] = x for the extra performance. Thanks for the hard work agentzh, keep it up!
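The difference above can be sketched as follows (the timing harness is omitted and the iteration count is illustrative, not from the original benchmark):

```lua
local N = 100000

-- Idiom 1: table.insert goes through a library call on every append.
local t1 = {}
for i = 1, N do
    table.insert(t1, i)
end

-- Idiom 2: the length operator appends directly at the next integer key.
local t2 = {}
for i = 1, N do
    t2[#t2 + 1] = i
end

assert(#t1 == N and #t2 == N)
```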
Thank you for the catch! But using the # operator in loop iterations is also a little more expensive than tracking the table length in a local Lua variable ourselves, according to the on-CPU Flame Graphs, because it requires calling the lj_tab_len primitive in the LuaJIT VM on almost every iteration. Could you please update your patch to use the following paradigm instead?
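A minimal sketch of that paradigm, assuming the intent is to track the length in a local counter (the thread does not include the actual snippet, so the function name here is illustrative):

```lua
-- Track the array length in a local variable instead of calling #
-- (which hits lj_tab_len in the LuaJIT VM) on every iteration.
local function collect(iter)
    local t = {}
    local n = 0            -- current length, kept in a local
    for val in iter do
        n = n + 1
        t[n] = val         -- append without a per-iteration length lookup
    end
    return t, n
end

-- usage with a simple closure-based iterator
local i = 0
local t, n = collect(function()
    i = i + 1
    if i <= 3 then return i * 10 end
end)
assert(n == 3 and t[3] == 30)
```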
More performance tweaks recommended by agentzh
Wow, so there is room for even more performance? That's great news! I've made the modifications as suggested (except line 248, which isn't part of a loop). If you have any more ideas please let us know, we don't mind tweaking it until every drop of performance has been squeezed out :-)
Just studied the Redis protocol again, and it appears most bulk replies actually begin with "$" and not "*". I read the protocol the other way last time somehow, must have been low on caffeine :-)
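For context, a hedged sketch of the Redis reply-type prefixes being discussed (an illustrative helper, not the library's actual parsing code):

```lua
-- First byte of a Redis protocol reply line determines its type:
--   "+" status, "-" error, ":" integer,
--   "$" bulk reply (byte length follows),
--   "*" multi-bulk reply (element count follows)
local function reply_kind(line)
    local prefix = string.sub(line, 1, 1)
    if prefix == "$" then
        return "bulk", tonumber(string.sub(line, 2))
    elseif prefix == "*" then
        return "multibulk", tonumber(string.sub(line, 2))
    elseif prefix == ":" then
        return "integer", tonumber(string.sub(line, 2))
    elseif prefix == "+" then
        return "status", string.sub(line, 2)
    elseif prefix == "-" then
        return "error", string.sub(line, 2)
    end
    return "unknown"
end

assert(reply_kind("$5") == "bulk")       -- bulk reply of 5 bytes
assert(reply_kind("*2") == "multibulk")  -- array of 2 elements
```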
By the way, in our production box we've replaced:
with just:
The idea was to save us from doing millions of "if not dummy then" checks in huge bulk replies, since the only code after that is a "return data". If there is a network error, the next _read_reply() should be able to catch it straight away. What is your opinion on this?
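A hedged sketch of the change being described (the function and variable names are illustrative, not lua-resty-redis's actual code):

```lua
-- After reading a bulk reply body, the trailing CRLF is consumed
-- without checking for errors, on the assumption that a network
-- failure will surface in the next read anyway.
local function read_bulk(sock, size)
    local data, err = sock:receive(size)
    if not data then
        return nil, err
    end

    -- before: local dummy, err2 = sock:receive(2)
    --         if not dummy then return nil, err2 end
    -- after:  discard the trailing CRLF unconditionally
    sock:receive(2)

    return data
end

-- usage with a tiny in-memory stand-in for a cosocket
local fake = {
    buf = "hello\r\n",
    receive = function(self, n)
        local chunk = string.sub(self.buf, 1, n)
        self.buf = string.sub(self.buf, n + 1)
        return chunk
    end,
}
assert(read_bulk(fake, 5) == "hello")
```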
… + 1] = val". thanks alex-yam for the patch in #22.
@alex-yam To be honest, I don't really like your later micro-optimizations :) Let's focus on the true bottlenecks shown in the on-CPU Flame Graphs generated by the ngx-sample-bt tool for real-world apps: https://github.com/agentzh/nginx-systemtap-toolkit/#ngx-sample-bt It'd be great if you could share flame graphs for your real-world apps using lua-resty-redis :)
@alex-yam Also, it's a good idea to try out the latest low-overhead builtin profiler in LuaJIT's v2.1 git branch: http://comments.gmane.org/gmane.comp.lang.lua.luajit/2906 BTW, we can further optimize away the lj_tab_newkey primitive inside the LuaJIT VM in lua-resty-redis (and other libraries) when Mike Pall implements the new APIs. These new Lua APIs should appear in the v2.1 branch by this Christmas, I think :) Another thing that may give a noticeable performance gain is to JIT compile this Lua library; right now, even when you're using LuaJIT, most of this Lua code is still interpreted by LuaJIT's interpreter. We'll need a better lua-resty-core library and a better LuaJIT 2.1 for this :) See https://github.com/agentzh/lua-resty-core
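For reference, LuaJIT v2.1 did later ship a table.new() preallocation API, which addresses exactly this kind of repeated table growth; a small sketch, guarded so it also runs on runtimes without the module:

```lua
-- table.new(narray, nhash) preallocates array and hash slots, so filling
-- the table afterwards avoids incremental reallocation inside the VM.
-- Only available in LuaJIT 2.1+; fall back to a plain table elsewhere.
local ok, new_tab = pcall(require, "table.new")
if not ok then
    new_tab = function(narr, nrec) return {} end
end

local t = new_tab(100, 0)  -- room for 100 array slots, 0 hash slots
for i = 1, 100 do
    t[i] = i * 2           -- appends without growing the table
end
assert(#t == 100)
```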
…concat() call because it is already the default. thanks alex-yam for the patch in #22.
No problem, 2012 was a good year, who needs 2013 ;) Jokes aside, my main focus has always been on the application-development side, so I was pretty sure some modifications didn't take care of the deeper-level Lua requirements. Thanks for applying what worked for you; the flame graphs look like they can be very useful, I'll take a look. You may not have realized it, but what you have developed is a game changer on many levels around the world. Keep up the great work!
…rbyfloat, incrbyfloat, migrate, pexpire, pexpireat, psetex, pubsub, pttl, restore, and time. thanks alex-yam for the patch in #22.
I did a simple benchmark comparing the old with the updated version, but found a performance drop. old: 11282 req/s
@antonheryanto Could you provide the full benchmark example that you're using (and the steps you ran)? I've tried the following trivial example on my side and the new version is consistently 1.45% faster:
I'm using the command
Because your testing results with the new version drop so much, there could be some configuration or benchmarking mistakes out there. Always keep an eye on your nginx error log file. And you can use the on-CPU and off-CPU flame graph tools to find the bottlenecks.
@agentzh wrk -c5 -t5 -d5s http://localhost/area. Below is the Lua code:
-- redis
local cjson = require "cjson"
local ngx = ngx
local tonumber = tonumber
local redis = require "resty.redis"
local r = redis:new()
r:set_timeout(1000)
local ok, error = r:connect("127.0.0.1", 6379)
if not ok then
ngx.log(ngx.ERR, "failed connect to redis at : ", error)
return
end
local name = "area"
local function get(id)
local a = r:hgetall(name ..":".. id)
local v = r:array_to_hash(a)
v.id = id
return v
end
local function all()
local m = r:smembers(name)
local result = {}
for i=1,#m do
result[i] = get(m[i])
end
return result
end
local id = tonumber(ngx.var.arg_id)
local output = id and get(id) or all()
ngx.say(cjson.encode(output))
local ok,error = r:set_keepalive(0,1024)
if not ok then ngx.log(ngx.ERR, "failed to keepalive: ", error) end
And I tried to benchmark using your trivial example above with my nginx.conf
on a Linode 1GB running Ubuntu 13.04 (raring), 32-bit. You may be right, I could have a benchmark or configuration mistake, as I'm not really an expert on this matter yet.. still learning :D
I just ran the trivial example on my Linode again, using ab instead: ab -k -c10 -n40000 localhost:8001/t, and got this result: (btw, I ran the benchmark 3 times and took the average)
@antonheryanto Which version of LuaJIT are you using? This result looks weird. I've tried my trivial code example (pasted previously) with both the LuaJIT v2.1 branch and the LuaJIT 2.0.2 release on Linux x86_64, both giving a consistent speedup in lua-resty-redis's new version: LuaJIT v2.1 branch: LuaJIT 2.0.2 release: Also, we can see that the LuaJIT v2.1 branch gives a further 15.8% speedup compared to LuaJIT 2.0.2. BTW, could you generate an on-CPU flame graph for both of these setups? Also, your code example using
Seeing that you're using a 32-bit system, I'll try it out on a 32-bit box later.
That was certainly unexpected, since the code didn't change much; I mean, 1+1 can't really be slower than table.insert. Perhaps not using the old module(...) style is somehow creating errors for your nginx setup. Please check the error logs: maybe nginx was writing to the error log on every request, which caused req/s to drop by half. Or perhaps there were some network issues. Can you run a test with this script and see what happens with the network effect reduced? (It merges the 'smembers' + multiple 'get' requests into a single redis request.)
-- redis
local cjson = require "cjson"
local ngx = ngx
local tonumber = tonumber
local redis = require "resty.redis"
local r = redis:new()
r:set_timeout(1000)
local ok, error = r:connect("127.0.0.1", 6379)
if not ok then
ngx.log(ngx.ERR, "failed connect to redis at : ", error)
return
end
local name = "area"
-- original code
--[[
local function get(id)
local a = r:hgetall(name ..":".. id)
local v = r:array_to_hash(a)
v.id = id
return v
end
local function all()
local m = r:smembers(name)
local result = {}
for i=1,#m do
result[i] = get(m[i])
end
return result
end
local id = tonumber(ngx.var.arg_id)
local output = id and get(id) or all()
]]
-- end original code
-- test code
local id = tonumber(ngx.var.arg_id)
local redis_lua_script = "if KEYS[2] then return redis.pcall('hgetall',table.concat({KEYS[1],':',KEYS[2]})) else local p = redis.pcall return p('mget',unpack(p('smembers',KEYS[1]))) end"
local output = {}
if id then
output = r:array_to_hash(r:eval(redis_lua_script,2,name,id))
output.id = id
else
output = r:eval(redis_lua_script,1,name)
end
-- end test code
ngx.say(cjson.encode(output))
local ok,error = r:set_keepalive(0,1024)
if not ok then ngx.log(ngx.ERR, "failed to keepalive: ", error) end
I'm closing this :)
Sorry, I didn't have time to re-investigate, but since both of you have tested it and shown it's improved, I think the problem is on my side.. thanks for the response..