It seems the socket upstream pointer changed while processing concurrent requests. #150

Closed
bigplum opened this issue Aug 9, 2012 · 9 comments



bigplum commented Aug 9, 2012

I added a debug line to print the upstream pointer in ngx_http_lua_socket_tcp_send(lua_State *L):

    if (u == NULL || u->peer.connection == NULL || u->ft_type || u->eof) {
        ngx_log_error(NGX_LOG_ERR, r->connection->log, 0,
                      "attempt to send data on a closed socket: u:%p, c:%p, "
                      "ft:%ui eof:%ud",
                      u, u ? u->peer.connection : NULL, u ? u->ft_type : 0,
                      u ? u->eof : 0);

        lua_pushnil(L);
        lua_pushliteral(L, "closed");
        return 2;
    }
    /* added debug line: print this request's upstream pointer */
    ngx_log_debug1(NGX_LOG_DEBUG_HTTP, r->connection->log, 0,
                   "lua upstream: %p", u);

    if (u->waiting) {
        lua_pushnil(L);
        lua_pushliteral(L, "socket busy");
        return 2;
    }

Then I ran an ab test: ab -c 2 -n 2 localhost/mongo, where /mongo does a MongoDB insert and find. Request *1 got the upstream pointer B71D5FF8, which belongs to request *2, so the Lua script got a "socket busy" exception.

This issue is only reproduced by the first ab run after nginx starts; the second ab run gets the right result. After restarting nginx, the issue reproduces again.

root@simon-1:/usr/local/openresty/nginx/logs# grep "lua upstream" error.log
2012/08/09 10:22:42 [debug] 3505#0: *1 lua upstream: B64A2168
2012/08/09 10:22:42 [debug] 3505#0: *2 lua upstream: B71D5FF8
2012/08/09 10:22:42 [debug] 3505#0: *1 lua upstream: B64A2168
2012/08/09 10:22:42 [debug] 3505#0: *2 lua upstream: B71D5FF8
2012/08/09 10:22:42 [debug] 3505#0: *1 lua upstream: B64A2168
2012/08/09 10:22:42 [debug] 3505#0: *1 lua upstream: B64A2168
2012/08/09 10:22:42 [debug] 3505#0: *2 lua upstream: B71D5FF8
2012/08/09 10:22:42 [debug] 3505#0: *2 lua upstream: B71D5FF8
2012/08/09 10:22:42 [debug] 3505#0: *1 lua upstream: B71D5FF8   <=== 
2012/08/09 10:22:42 [debug] 3505#0: *2 lua upstream: B71D5FF8
2012/08/09 10:29:23 [debug] 3505#0: *5 lua upstream: B64B0A20
2012/08/09 10:29:23 [debug] 3505#0: *6 lua upstream: B64B0FA8
2012/08/09 10:29:23 [debug] 3505#0: *5 lua upstream: B64B0A20
2012/08/09 10:29:23 [debug] 3505#0: *6 lua upstream: B64B0FA8
2012/08/09 10:29:23 [debug] 3505#0: *5 lua upstream: B64B0A20
2012/08/09 10:29:23 [debug] 3505#0: *5 lua upstream: B64B0A20
2012/08/09 10:29:23 [debug] 3505#0: *6 lua upstream: B64B0FA8
2012/08/09 10:29:23 [debug] 3505#0: *6 lua upstream: B64B0FA8
2012/08/09 10:29:23 [debug] 3505#0: *5 lua upstream: B64B0A20    <===
2012/08/09 10:29:23 [debug] 3505#0: *6 lua upstream: B64B0FA8

nginx version: ngx_openresty/1.2.1.11
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-debug --with-cc-opt=-O0 --add-module=../ngx_devel_kit-0.2.17 --add-module=../echo-nginx-module-0.40 --add-module=../xss-nginx-module-0.03rc9 --add-module=../ngx_coolkit-0.2rc1 --add-module=../set-misc-nginx-module-0.22rc8 --add-module=../form-input-nginx-module-0.07rc5 --add-module=../encrypted-session-nginx-module-0.02 --add-module=../ngx_lua-0.5.12 --add-module=../headers-more-nginx-module-0.18 --add-module=../srcache-nginx-module-0.14 --add-module=../array-var-nginx-module-0.03rc1 --add-module=../memc-nginx-module-0.13rc3 --add-module=../redis2-nginx-module-0.08rc4 --add-module=../redis-nginx-module-0.3.6 --add-module=../auth-request-nginx-module-0.2 --add-module=../rds-json-nginx-module-0.12rc10 --add-module=../rds-csv-nginx-module-0.05rc2 --with-ld-opt=-Wl,-rpath,/usr/local/openresty/luajit/lib --with-http_ssl_module

agentzh commented Aug 9, 2012

Hello!

Are you sharing cosocket objects among all requests by hooking them onto Lua module-level variables? (See here:
http://wiki.nginx.org/HttpLuaModule#Data_Sharing_within_an_Nginx_Worker
)

You'll get the "socket busy" exception when concurrent requests try to
use the same cosocket object and this is by design. So make sure you
create individual cosocket objects for each request to prevent such
access conflicts.

Best regards,
-agentzh
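
To illustrate the pattern agentzh warns about, here is a minimal sketch of a Lua module; the name my_module and its two functions are hypothetical, not part of lua-resty-mongol:

    -- my_module.lua: a cosocket hooked onto a module-level variable is
    -- shared by every request this worker handles, so concurrent use
    -- fails by design.
    local _M = {}

    local shared_sock  -- module-level: persists across requests (bad for cosockets)

    function _M.query_shared()
        if not shared_sock then
            shared_sock = ngx.socket.tcp()
        end
        -- two concurrent requests reaching here trigger "socket busy"
        return shared_sock:connect("127.0.0.1", 27017)
    end

    function _M.query_per_request()
        local sock = ngx.socket.tcp()  -- fresh cosocket, private to this request
        return sock:connect("127.0.0.1", 27017)
    end

    return _M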


bigplum commented Aug 9, 2012

I'm not sure: is this a shared cosocket object?

content_by_lua:

    local mongo = require "resty.mongol"
    conn = mongo:new()

init.lua:

    local socket = ngx.socket.tcp

    function new(self)
        return setmetatable({
            sock = socket();
            host = "localhost";
            port = 27017;
        }, connmt)
    end

And if it were a shared object, wouldn't all requests use the same upstream pointer? The requests use different pointers in the debug log.
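
For what it's worth, new() above does create a fresh cosocket on each call; the sharing risk is the undeclared conn, sketched here under the assumption (per the data-sharing doc linked above) that in ngx_lua 0.5.x, writes to undeclared variables land in the worker-wide global table when lua_code_cache is on:

    local mongo = require "resty.mongol"

    conn = mongo:new()         -- undeclared: worker-global, so another request
                               -- can overwrite it while this one still uses it
    local conn2 = mongo:new()  -- declared local: private to this handler run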


agentzh commented Aug 9, 2012

Hello!

On Wed, Aug 8, 2012 at 8:52 PM, Simon notifications@github.com wrote:

> I'm not sure: is this a shared cosocket object?

It seems not.

Are you turning lua_code_cache off? Disabling the Lua code cache makes ngx_lua incapable of handling concurrent requests.

And also, could you provide a minimized but complete example that can
reproduce this issue?

Thanks!
-agentzh


bigplum commented Aug 9, 2012

On 2012-8-9 12:02, agentzh (章亦春) wrote:

> Are you turning lua_code_cache off? Disabling the Lua code cache makes ngx_lua incapable of handling concurrent requests.

lua_code_cache is on. With it turned off, the issue is not reproduced.

> And also, could you provide a minimized but complete example that can reproduce this issue?

I'll try.


bigplum commented Aug 9, 2012

It's about a global variable. I think col is global, so its value is changed by another request. It's weird that the issue only occurred on the first two requests.

Can a global variable have this side effect?

    local mongo = require "resty.mongol"
    conn = mongo:new()
    ok, err = conn:connect("10.6.2.51", 27000)
    if not ok then
        ngx.say("connect failed: " .. err)
    end

    local db = conn:new_db_handle("test")

    col = db:get_col("test")      ---> col is a global variable; it should be local
    if not col then ngx.say("get col failed") end

    ngx.say(col.col)              ---> prints "test"

    r, err = col:insert({{name="dog", n=20}}, false, true)
    if not r then ngx.say("insert failed: " .. err) end

    ngx.say(col.col)              ---> prints "$cmd", yet col is never changed in col:insert;
                                  ---> col.col is invalid on the first request but valid on the second

    r = col:find_one({name="dog"})
    ngx.say(r["name"])

I can't reproduce this with other test cases that do simple socket operations, so could you download MongoDB and lua-resty-mongol and test it?


agentzh commented Aug 9, 2012

Hello!

Forgot to ask: are you actually able to observe the "socket busy" exception in this setup?

The reason I'm asking is that the upstream struct is allocated by the Lua GC (in the form of a userdata), so it's possible that this piece of memory is already freed for request A and then immediately reused by request B. Could you also track the lifetime of each request's "upstream" userdata? That is, also track the __gc metamethod call (ngx_http_lua_socket_tcp_upstream_destroy) along the way?

Thanks!
-agentzh


bigplum commented Aug 10, 2012

Yes, here is the "socket busy":

2012/08/10 10:01:29 [debug] 3218#0: *1 lua upstream: B7230860
2012/08/10 10:01:29 [debug] 3218#0: *1 lua resume returned 2
2012/08/10 10:01:29 [error] 3218#0: *1 lua handler aborted: runtime error: /usr/local/openresty/lualib/resty/mongol/colmt.lua:60: socket busy
stack traceback:
    [C]: in function 'assert'
    /usr/local/openresty/lualib/resty/mongol/colmt.lua:60: in function 'docmd'
    /usr/local/openresty/lualib/resty/mongol/colmt.lua:240: in function 'query'
    /usr/local/openresty/lualib/resty/mongol/colmt.lua:281: in function 'find_one'
    /usr/local/openresty/nginx/conf/mongo.lua:24: in function </usr/local/openresty/nginx/conf/mongo.lua:1>, client: 127.0.0.1, server: localhost, request: "GET /mongo HTTP/1.0", host: "localhost"

It seems to have nothing to do with GC. See the last two lines: *1 used upstream B7230860, which belongs to *2, and then *2 used this upstream too. So B7230860 had not been garbage-collected yet.

2012/08/10 10:01:28 [debug] 3218#0: *1 lua upstream: B650B890
2012/08/10 10:01:28 [debug] 3218#0: *1 lua upstream: B650B890
2012/08/10 10:01:28 [debug] 3218#0: *2 lua upstream: B7230860
2012/08/10 10:01:28 [debug] 3218#0: *2 lua upstream: B7230860
2012/08/10 10:01:29 [debug] 3218#0: *1 lua upstream: B7230860
2012/08/10 10:01:29 [debug] 3218#0: *2 lua upstream: B7230860


agentzh commented Aug 10, 2012

Hello!


I still believe there are accidental global variables in your lua-resty-mongol library, and I've already seen a few in resty.mongol.dbmt's cmd method. I suggest you apply the tricks in strict.lua (http://metalua.luaforge.net/src/lib/strict.lua.html) to your modules to catch the places where you forgot to declare variables as "local".
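
A condensed sketch of the strict.lua idea (the real strict.lua also tracks declared names and allows whitelisting; this stripped-down version simply rejects every undeclared global write):

    -- Trap writes to undeclared globals so accidental ones fail loudly
    -- instead of silently leaking state between requests.
    local mt = getmetatable(_G)
    if mt == nil then
        mt = {}
        setmetatable(_G, mt)
    end

    mt.__newindex = function (t, k, v)
        error('attempt to write to undeclared variable "'
              .. tostring(k) .. '"', 2)
    end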

To quote the related sections in ngx_lua's documentation:

"It is recommended to always place the following piece of code at the
end of Lua modules that use the ngx.location.capture or
ngx.location.capture_multi directives to prevent casual use of
module-level global variables that are shared among all requests:

getmetatable(foo.bar).__newindex = function (table, key, val)
    error('Attempt to write to undeclared variable "' .. key .. '": '
            .. debug.traceback())
end

Assuming the current Lua module is named foo.bar, this will guarantee
that local variables in module foo.bar functions have been declared as
"local". It prevents undesirable race conditions while accessing such
variables. See Data Sharing within an Nginx Worker for the reasons
behind this."

Best regards,
-agentzh


bigplum commented Aug 13, 2012

It's related with the undefined variable in function block of lua modules.
So this issue could be closed.

Thanks!

bigplum closed this as completed Aug 13, 2012