
Question about chunk sizes when combined with ngx_http_proxy_module with buffering enabled #817

Closed
deiwin opened this issue Jul 5, 2016 · 4 comments



deiwin commented Jul 5, 2016

I'm using body_filter_by_lua to do regex search and replace on files proxied with proxy_pass. These regexes will fail to find a match if the matching part of the file happens to be split across chunks. Because of this, it is important for me to understand exactly how the chunks provided to body_filter_by_lua are chosen.

In a simple case where proxy buffering is disabled with proxy_buffering off;, I've observed that the chunks are the actual response chunks from upstream. This is also somewhat documented in the third example of the body_filter_by_lua documentation.

But if proxy buffering is enabled (which it is by default), it says here that

A response is stored in the internal buffers and is not sent to the client until the whole response is received.

Does this mean that the chunk provided to body_filter_by_lua is guaranteed to be the entire response body when proxy buffering is enabled?


agentzh commented Jul 5, 2016

@deiwin No, there is no such guarantee, especially for relatively large response bodies, even when buffering is enabled in the content handler modules (like ngx_proxy). Always be prepared to receive only a partial data chunk of the whole body.

To solve such problems in general, we need a regex engine that supports streaming processing. The ngx.re API is based on PCRE, a backtracking regex engine, which won't ever fly for streaming. One such attempt is my sregex engine, which is used by the ngx_replace_filter module to do streaming regex substitution:

https://github.com/openresty/replace-filter-nginx-module
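
For context, a minimal configuration sketch of that module, with directive names as documented in its README; the location, upstream, pattern, and replacement below are hypothetical examples, not from this thread:

```nginx
# Sketch only: assumes replace-filter-nginx-module is compiled into nginx.
location /proxied {
    proxy_pass http://backend;          # hypothetical upstream

    # Stream-replace every occurrence of "foo" with "bar" in the
    # response body, without buffering the whole response first.
    replace_filter 'foo' 'bar' 'g';
    replace_filter_types text/html text/plain;
}
```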

And yeah, sregex still needs a lot of optimizations to really beat PCRE, as the experiments here show:

http://openresty.org/misc/re/bench/


agentzh commented Jul 5, 2016

@deiwin BTW, this place is for bug reports and development discussions only. For general questions and discussions, please join the openresty-en mailing list instead: https://openresty.org/en/community.html

This is specified in the default github issue template. Please read it more carefully. Thank you!


deiwin commented Jul 6, 2016

Thank you for the clarification. Your sregex engine is very intriguing! In the particular case I'm currently working on, however, simple buffering is sufficient. As there are no Lua bindings for sregex, it would have been a lot of work to change the current system, so I've borrowed from an example you provided here and implemented a buffering solution instead.
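
The buffering approach can be sketched roughly like this (a minimal sketch, not the exact code from the thread; the upstream, pattern, replacement, and regex options are hypothetical placeholders):

```nginx
location / {
    proxy_pass http://backend;  # hypothetical upstream

    header_filter_by_lua_block {
        -- the rewritten body may change length, so drop Content-Length
        ngx.header.content_length = nil
    }

    body_filter_by_lua_block {
        local chunk, eof = ngx.arg[1], ngx.arg[2]
        local buf = ngx.ctx.buf or {}

        if chunk ~= "" then
            buf[#buf + 1] = chunk
            ngx.ctx.buf = buf
        end

        if eof then
            -- the whole body is now available; run the regex once
            ngx.arg[1] = ngx.re.gsub(table.concat(buf), "foo", "bar", "jo")
        else
            -- swallow this chunk; emit nothing until eof arrives
            ngx.arg[1] = nil
        end
    }
}
```

This trades memory for correctness: the full response is held in ngx.ctx until the eof flag arrives, so matches can never be split across chunk boundaries.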

This place is for bug reports and development discussions only. For general questions and discussions, please join the openresty-en mailing list instead: https://openresty.org/en/community.html

Sorry about that. I did read the note in the issue template, but misunderstood what was meant by development discussions. If I have any further/other questions, I'll be sure to join the mailing list. Thanks for answering regardless!

P.S.

And yeah, sregex still needs a lot of optimizations to really beat PCRE, as experimented here:
http://openresty.org/misc/re/bench/

Am I reading the results wrong or is sregex actually winning in a lot of the cases?

deiwin closed this as completed Jul 6, 2016

agentzh commented Jul 6, 2016

@deiwin Yeah, the benchmark page I gave you shows the performance of the next-generation sregex DFA engine, which is generally faster than both PCRE JIT and RE2 even in non-streaming mode (well, the latter two do not support streaming processing anyway).

The current version of sregex in git master is still the slow version and is not used in that benchmark page.
