don't check size for streaming #1

Closed · fcicq opened this issue Jun 6, 2016 · 23 comments

@fcicq commented Jun 6, 2016

The size checking is not actually required, as long as no changes are applied to the data in _upl() and _dwn().

I don't currently have a working OpenResty environment, but I'd guess the core loop should look something like this:

while not self.exit_flag do
  local buf, err, partial = ssock:receive(4096)
  if not buf then
    -- forward whatever partial data arrived before the error
    if partial and partial ~= "" then
      rsock:send(partial)
    end
    break
  end
  local bytes = rsock:send(buf)
  if not bytes then
    break
  end
end
@fcicq (Author) commented Jun 6, 2016

And I think self.exit_flag is unnecessary: "while true do" would behave the same and still break the loop on error.

@fffonion (Owner) commented Jun 6, 2016

There will be a problem if the client reuses the same connection to issue multiple requests and the requests are not continuous. Imagine the following scenario between a client and our server (the _upl thread):

|-------------------4096 bytes----------------------|-------------------4096 bytes----------------------|
|----req1----|----req2----|----req3----|--req4--|-----req5-----|----------------------------req6------------------|

Suppose the client finishes several HTTP requests ending with req5 and then stops sending data, but wants to reuse the connection later, so the TCP connection is not closed. In this situation our server will not finish reading req5 and proxy it to the remote until the client sends req6; that is, req5 cannot complete before req6 arrives. The situation is similar on the response path between the upstream server and our server (the _dwn thread).

This problem does not exist if the connection is only used once or the requests are issued continuously.
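
For reference, a minimal sketch of the read-exact framing the forwarding code relies on (assumptions for illustration: a TLS-record-style 5-byte header carrying a big-endian length in bytes 4 and 5; read_frame is an illustrative name):

-- sketch only: receive(n) does not return until exactly n bytes have
-- arrived, so a frame is forwarded only once it is complete
local bit = require "bit"

local function read_frame(sock)
    local hd, err = sock:receive(5)  -- blocks until the whole header arrives
    if not hd then
        return nil, err
    end
    -- big-endian length in bytes 4 and 5 of the header
    local len = bit.lshift(string.byte(hd, 4), 8) + string.byte(hd, 5)
    local body, berr = sock:receive(len)  -- blocks until the full payload arrives
    if not body then
        return nil, berr
    end
    return hd .. body
end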

The self.exit_flag part makes sense and I'll leave this issue open to discuss the buffer size part.

fffonion added a commit that referenced this issue Jun 6, 2016
@fcicq (Author) commented Jun 6, 2016

You are misunderstanding the semantics of recv().
As man 2 recv says: "The receive calls normally return any data available, up to the requested amount, rather than waiting for receipt of the full amount requested."

I'm not sure how it would even be possible to reuse the connection in forwarding mode. Please point me to the relevant RFC(s), if any exist.

@fffonion (Owner) commented Jun 6, 2016

The receive() API of the ngx cosocket is wrapped around nginx's own event loop and does not work like the standard recv() call.

According to the documentation, "This method will not return until it reads exactly this size of data or an error occurs."

I've set up an echo server with the following code:

server {
        listen 18444;
        content_by_lua_block {
                local s = ngx.req.socket()
                local b = s:receive(10)  -- does not return until exactly 10 bytes are read
                s:send(b)
        }
}

You can try it with telnet 172.245.255.135 18444 and type 10 characters. The server will not answer until you type the 10th byte.

@fcicq (Author) commented Jun 6, 2016

@fffonion It is shocking that an echo server cannot be implemented in an easy way; the current receive() behavior will largely limit the usefulness of stream-lua-nginx-module. :(

I'm sorry I didn't find that in the first place. So it is now clear that this issue depends on an API change in the upstream project.

@fcicq (Author) commented Jun 7, 2016

A "partial" argument can be added to the receive() call, like read_bytes() in the tornado.iostream module.

The current receive() function waits until (bytes >= (ssize_t) u->rest) in ngx_stream_lua_socket_read_chunk:

            u->input_filter = ngx_stream_lua_socket_read_chunk;
            u->length = (size_t) bytes;
            u->rest = u->length;

A new input_filter function that does the following should work (u->rest should be 0):

        u->buf_in->buf->last += bytes;  /* accept however many bytes arrived */
        b->pos += bytes;
        return NGX_OK;                  /* return immediately, no u->rest countdown */
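
The Lua-land call would then look something like this (a sketch of the intended API; the exact pattern string is settled in the patch below):

-- hypothetical partial read: returns as soon as any data is buffered,
-- instead of waiting for an exact byte count
local data, err, partial = sock:receive("*p")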

@fcicq (Author) commented Jun 7, 2016

Please test this patch for stream-lua-nginx-module, which introduces receive('*p'):
https://gist.github.com/fcicq/82e1c6d0c85cbc2d3f8e9f1523bfd1d1

@fffonion (Owner) commented Jun 7, 2016

I don't think that would fit the way coroutines are written here. I suppose an echo server would be implemented like:

while true do
    local buf, err = sock:receive("*p")
    if buf then
        -- do something or yield some data
    else
        break
    end
end

The code above will block an nginx worker process if the client is not sending any data. And ngx_lua has no callback mechanism (and shouldn't have one) like tornado does.

@fcicq (Author) commented Jun 7, 2016

It should block in the same way line reading (*l) does, but return data as soon as it arrives. Do you mean *l will block the process?

@fffonion (Owner) commented Jun 7, 2016

I misread your code before. That makes sense to me.
It would be nice if you could send a PR to the upstream 😄

@fcicq (Author) commented Jun 7, 2016

Hmm, that patch makes writing echo servers and other kinds of proxy servers in OpenResty possible.
I wonder why nobody else has used OpenResty this way.

Actually the same API issue originates in luasocket (buffer.c) and was later carried over into the OpenResty project. The same method can be applied to lua-nginx-module with a little modification as well. A careful discussion with the luasocket author(s) may be required to get the change merged; OpenResty should then follow.

Anyway, thanks for your project for revealing these issues in the upstream projects, so some future contributions can be expected. Please do take advantage of the patch to build more great projects.

@fffonion (Owner) commented Jun 7, 2016

Just some results I found after doing several tests:
The bottleneck of this proxy at present, or specifically of the stream-forwarding part, is not in len = lshift(byte(hd, 4), 8) + byte(hd, 5) but in Lua buffer copying.
See the flamegraph: flame.zip

This can be confirmed by comparing CPU time between the current receive(len) method and the receive("*p") method (8.35s versus 9.3s with the default lua_socket_buffer_size of 4k), since receive("*p") will not read more than 4k of data at a time. In my test, a download from dl-ssl.google.com has a max TLS packet of 16408 bytes, roughly 16k of payload; by default the TLS payload has a max length of 16k, so a 4k cosocket buffer splits each record into about four reads and four Lua string copies. After changing lua_socket_buffer_size to 16k or 32k, we saw a decrease in the CPU time of the receive("*p") approach.

This leads to a problem: if the server or client has a slow connection to our proxy, receive("*p") will be outperformed by receive(len), since more time is wasted copying smaller buffers. What is the general solution for handling such situations?
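
For reference, the tuning above is the standard ngx_lua directive, set in the nginx configuration alongside the server block shown earlier (16k shown here to match the max TLS record payload):

    lua_socket_buffer_size 16k;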

@fcicq (Author) commented Jun 7, 2016

Hmm... the ultimate solution would be a native forwarding loop, like ngx_stream_proxy_process in ngx_stream_proxy_module.c,
but I'm not sure whether the socket can be detached from the Lua object or not.

I can accept the performance numbers, but the semantics are different.
The latency is quite different with the "read size, then forward" method: a larger length means a larger buffer, and what if the sender is slow? The chunk will not be forwarded until it is fully received.
And I'm happy with the possibility that encryption or other transforms can be applied later without knowing the length first, i.e. streaming.
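
To make the semantic difference concrete, here is a minimal sketch of the streaming forward loop (assuming the patched receive("*p") keeps the usual nil, err, partial return convention):

-- streaming forward: ship whatever has arrived, without needing
-- to know the frame length first
while true do
    local buf, err, partial = ssock:receive("*p")
    if not buf then
        if partial and partial ~= "" then
            rsock:send(partial)  -- flush any trailing partial data
        end
        break
    end
    local bytes = rsock:send(buf)
    if not bytes then
        break
    end
end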

@fffonion (Owner) commented Jun 7, 2016

A native module will definitely be a better solution.

While the implementation stays in Lua land, choosing between stream forwarding and reading a length seems to be a tradeoff 🌚.

@fcicq (Author) commented Jun 8, 2016

I think that only amounts to a difference in the number of string copy (or build?) operations.

@fffonion (Owner) commented Jun 8, 2016

Yes, and that leads to a dilemma between "more performance" and "less system resources". In the current state I think both implementations are acceptable 😄

@fcicq (Author) commented Jun 9, 2016

I don't have enough time to implement a forwarder that bypasses Lua strings.

@fffonion (Owner) commented Jun 9, 2016

Well, I'm not asking for anything; it's just a discussion.
I'll close this issue if you're OK with it.

@fcicq (Author) commented Jun 13, 2016

Please leave this open. The CPU usage issue comes from the luaL_addlstring implementation in LuaJIT (lib_aux.c): it calls luaL_addchar once per byte, i.e. length() times.

The luaL_Buffer struct in LuaJIT can only store strings up to LUAL_BUFFERSIZE, while Lua 5.3 can use luaL_prepbuffsize to prepare a larger buffer, so its luaL_addlstring uses memcpy directly.
Hopefully there will be a way to reduce the running complexity of luaL_addlstring.

@fcicq (Author) commented Jun 13, 2016

As defined in src/lauxlib.h, we can memcpy up to the lesser of the remaining length and bufffree(B), then do an emptybuffer(B) plus adjuststack(B) and repeat. It still allocates once every LUAL_BUFFERSIZE bytes, but I do think it will be faster thanks to memcpy(). The current per-byte path is:

#define luaL_addchar(B,c) \
  ((void)((B)->p < ((B)->buffer+LUAL_BUFFERSIZE) || luaL_prepbuffer(B)), \
   (*(B)->p++ = (char)(c)))


@fcicq (Author) commented Jun 15, 2016

Please attach a new flamegraph for your workload. I'm also considering a new method similar to select(); it should match raw forwarding speed if implemented.

fcicq closed this as completed Jun 23, 2016
@fffonion (Owner) commented
Thanks for your work. It would be even more helpful if you could send a PR to LuaJIT. 😄
