Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support chunked request bodies #1262

Merged
merged 3 commits into from
Jun 21, 2024
Merged

Conversation

hongzhidao
Copy link
Contributor

@hongzhidao hongzhidao commented May 14, 2024

A draft patch for the MVP of this feature.

Internally, I transferred the chunked request to content-length request since the application process can't handle chunked request now.

Test

conf.json

{
    "listeners": {
        "*:8080": {
            "pass": "routes"
        }
    },
    "routes": [
        {
            "action": {
                "pass": "applications/app"
            }
        }
    ],
    "applications": {
        "app": {
            "type": "python",
            "path": "/www",
            "module": "wsgi"
        }
    }
}

/www/wsgi.py

def application(environ, start_response):
    content_length = int(environ.get('CONTENT_LENGTH', 0))
    body = bytes(environ['wsgi.input'].read(content_length))
    start_response(
        '200', [],
    )
    return [body]
> curl -X POST -d 'hello' -H "Transfer-Encoding: chunked" http://127.0.0.1:8080 -v
Note: Unnecessary use of -X or --request, POST is already inferred.
* Rebuilt URL to: http://127.0.0.1:8080/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> POST / HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.61.1
> Accept: */*
> Transfer-Encoding: chunked
> Content-Type: application/x-www-form-urlencoded
>
> 5
* upload completely sent off: 12 out of 5 bytes
< HTTP/1.1 200 OK
< Server: Unit/1.33.0
< Date: Wed, 29 May 2024 09:04:27 GMT
< Transfer-Encoding: chunked
<
* Connection #0 to host 127.0.0.1 left intact
hello

Copy link
Contributor

@avahahn avahahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question:
Does this result in the response being buffered in NGINX Unit, or are the chunks immediately streamed to the peer?


nxt_buf_t *out = nxt_http_chunk_parse(task, &h1p->chunked_parse, in);

if (h1p->chunked_parse.chunk_error ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this conditional meant to be formatted like this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it looks simply like a snafu...

size = nxt_buf_mem_used_size(&in->mem);
if (size == 0) {
status = NXT_HTTP_NOT_IMPLEMENTED;
goto error;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice in nxt_h1p_peer_header_read_done that the case where nxt_buf_mem_used_size returns non-zero goes to the following logic:

        r->state->ready_handler(task, r, peer);
        return;

Does that mean we should goto ready instead?

Comment on lines 903 to 908
if (h1p->chunked_parse.last) {
out->next = nxt_buf_sync_alloc(r->mem_pool, NXT_BUF_SYNC_LAST);
out = nxt_h1p_chunk_create(task, r, out);
r->body = out;
r->state->ready_handler(task, r,
NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... same as here ...

But this is clearly just a PoC/work in progress...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, just for PoC.

@avahahn
Copy link
Contributor

avahahn commented May 20, 2024

fixes: #445

@avahahn
Copy link
Contributor

avahahn commented May 20, 2024

Here's a patch that fixes the formatting nits:

diff --git a/src/nxt_h1proto.c b/src/nxt_h1proto.c
index 6550efb9..16f6dcaa 100644
--- a/src/nxt_h1proto.c
+++ b/src/nxt_h1proto.c
@@ -893,10 +893,8 @@ nxt_h1p_request_body_read(nxt_task_t *task, nxt_http_request_t *r)
 
         nxt_buf_t *out = nxt_http_chunk_parse(task, &h1p->chunked_parse, in);
 
-        if (h1p->chunked_parse.chunk_error ||
- h1p->chunked_parse.error) {
-            status = NXT_HTTP_NOT_IMPLEMENTED
-;
+        if (h1p->chunked_parse.chunk_error || h1p->chunked_parse.error) {
+            status = NXT_HTTP_NOT_IMPLEMENTED;
             goto error;
         }
 
@@ -904,8 +902,7 @@ nxt_h1p_request_body_read(nxt_task_t *task, nxt_http_request_t *r)
             out->next = nxt_buf_sync_alloc(r->mem_pool, NXT_BUF_SYNC_LAST);
             out = nxt_h1p_chunk_create(task, r, out);
             r->body = out;
-            r->state->ready_handler(task, r,
-NULL);
+            r->state->ready_handler(task, r, NULL);
             return;
         }
     }

Can we get this PR converted to a non-draft PR? I think we should push this in as a solution for chunked transfer encoded requests.

@hongzhidao
Copy link
Contributor Author

Can we get this PR converted to a non-draft PR?

Does non-draft PR mean it's ready to review? And it will run with GH actions for tests?

@ac000
Copy link
Member

ac000 commented May 21, 2024

Can we get this PR converted to a non-draft PR?

Does non-draft PR mean it's ready to review? And it will run with GH actions for tests?

I generally regard a non-draft PR as reviewable (whether I've been assigned to review or not), sometimes I'll do a rough review of a draft PR depending on circumstances...

Draft and non-draft PR's both get run through the GH actions, of which you can generally ignore the Ruby 3.2 failures....

@hongzhidao
Copy link
Contributor Author

Good to know, thanks.
Btw, the way to support the chunked body should be different from the patch. The final solution is not clear yet for me.

@callahad
Copy link
Collaborator

Btw, the way to support the chunked body should be different from the patch. The final solution is not clear yet for me.

Can you say more about how this approach doesn't feel quite right to you?

We need to land something ASAP, and should act as though the deadline to have a mergeable / testable branch is the end of this week.

Is the current approach good enough for that, or do you think this approach is so flawed that you would block merging?

@ac000
Copy link
Member

ac000 commented May 21, 2024

We need to land something ASAP, and should act as though the deadline

Actually, I don't think we do, not in the main Unit repository anyway...

to have a mergeable / testable branch is the end of this week.

There is nothing preventing you from having a branch in whoevers
personal clone for testing purposes or even for actual use until a
correct solution is merged upstream...

@callahad
Copy link
Collaborator

We have an internal opportunity with a constraint that requires landing functional (but not perfect) support for Transfer-Encoding: chunked in master this calendar month.

Rightly or wrongly, missing that will close the window on that particular opportunity.

@hongzhidao
Copy link
Contributor Author

Can you say more about how this approach doesn't feel quite right to you?

Sure. Let me list some of the challenges.

  1. About the pipeline process.
    Unit supports pipeline requests. For any request, it doesn't handle the request after it. Since for content-length body, we know how much to read based on the content-length size from the request header. But for chunked body, the size isn't known until reading the data in the request body and parsed successfully for each chunk.
    So to solve the problem, we need to check if we need to change the design of limiting not to read or process the next request for the current request. I'd prefer to keep the design, but it looks like chunked body might break it.

  2. About the body for size that is larger than the size of the buffer.
    In Unit, if the request body is larger than the buffer which is 8k size, the body will be dumped into a temporary file. For content-length request, after reading request header, we've known if the body will be saved into a memory buffer or a file buffer. But for chunked body, we have to check it while reading and parsing the body.
    To write a patch to be ready to review, I feel we need to change the implementation of content-length body reading and make it general. NGINX does it this way.

  3. Ideally, we should send the chunked body to application process for the chunked request.
    Unfortunately, the application process only process content-length body. It's not a small change to support it. The imperfect way is to transfer the chunked body to content-length body before sending to application process, and it means we need to receive the entire body in this way.

  4. The buffer type for the request body.
    For the size that is less than buffer size, Unit uses a memory buffer to save it. But if the size is larger than buffer size, Unit uses a file buffer instead. To note this point since this design has a lot to do with the challenge.

Is the current approach good enough for that, or do you think this approach is so flawed that you would block merging?

The patch I posted last week is for the demo asked Thursday, and it didn't solve the above challenges.

We have an internal opportunity with a constraint that requires landing functional (but not perfect) support for Transfer-Encoding: chunked in master this calendar month.

I'll try my best to provide a patch that can work with application processes, but I'm afraid it's not ready to be reviewed or merged into upstream. It needs to take more development time.

@ac000
Copy link
Member

ac000 commented May 29, 2024

Hi @hongzhidao good job!

I see this is still marked as a draft, but I'll do a quick once over your latest patch while I notice some minor things...

One small other nit, perhaps you could reword the subject to something like

Support chunked request bodies

@hongzhidao hongzhidao changed the title HTTP: request chunked body support. Support chunked request body. May 29, 2024
@hongzhidao
Copy link
Contributor Author

One small other nit, perhaps you could reword the subject to something like
Support chunked request bodies

Good suggestion, let me know if body needs to be replaced with bodies.

I see this is still marked as a draft

Yep, it's for the demo.

We need to land something ASAP, and should act as though the deadline to have a mergeable / testable branch is the end of this week.

Internally, I transferred the chunked request to content-length request since the application process can't handle chunked request now.

The design should be rewritten to make more sense. But it's not clear yet. Will do it after the demo.

src/nxt_h1proto.c Outdated Show resolved Hide resolved
src/nxt_h1proto.c Outdated Show resolved Hide resolved
src/nxt_http_request.c Outdated Show resolved Hide resolved
@ac000
Copy link
Member

ac000 commented May 29, 2024

One small other nit, perhaps you could reword the subject to something like
Support chunked request bodies

Good suggestion, let me know if body needs to be replaced with bodies.

Yes, I think the plural bodes makes more sense here...

I see this is still marked as a draft

Yep, it's for the demo.

We need to land something ASAP, and should act as though the deadline to have a mergeable / testable branch is the end of this week.

Internally, I transferred the chunked request to content-length request since the application process can't handle chunked request now.

The design should be rewritten to make more sense. But it's not clear yet. Will do it after the demo.

Sure...

@hongzhidao hongzhidao changed the title Support chunked request body. Support chunked request bodies May 30, 2024
@hongzhidao
Copy link
Contributor Author

Hi,

Ideally, we should send the chunked body to application process for the chunked request.
Unfortunately, the application process only process content-length body. It's not a small change to support it. The imperfect way is to transfer the chunked body to content-length body before sending to application process, and it means we need to receive the entire body in this way.

I'm trying to get the application process to parse chunked request body.

@callahad
Copy link
Collaborator

callahad commented Jun 4, 2024

@hongzhidao For the first release of this, let's ignore sending chunked data to application processes. It is acceptable to buffer and transform from chunked bodies to content-length for now.

Copy link
Contributor

@andrey-zelenkov andrey-zelenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @hongzhidao, some error handling improvements suggested. Full diff:

diff --git a/src/nxt_h1proto.c b/src/nxt_h1proto.c
--- a/src/nxt_h1proto.c
+++ b/src/nxt_h1proto.c
@@ -829,16 +829,27 @@ nxt_h1p_websocket_version(void *ctx, nxt
 static nxt_int_t
 nxt_h1p_transfer_encoding(void *ctx, nxt_http_field_t *field, uintptr_t data)
 {
+    nxt_h1proto_t       *h1p;
     nxt_http_te_t       te;
     nxt_http_request_t  *r;
 
     r = ctx;
+    h1p = r->proto.h1;
+
+    if (nxt_slow_path(r->content_length != NULL) || !nxt_h1p_is_http11(h1p)) {
+        return NXT_HTTP_BAD_REQUEST;
+    }
+
     field->skip = 1;
     field->hopbyhop = 1;
 
     if (field->value_length == 7
         && memcmp(field->value, "chunked", 7) == 0)
     {
+        if (nxt_slow_path(r->proto.h1->transfer_encoding == NXT_HTTP_TE_CHUNKED)) {
+            return NXT_HTTP_BAD_REQUEST;
+        }
+
         te = NXT_HTTP_TE_CHUNKED;
 
     } else {
@@ -856,12 +867,13 @@ nxt_h1p_transfer_encoding(void *ctx, nxt
 static void
 nxt_h1p_request_body_read(nxt_task_t *task, nxt_http_request_t *r)
 {
-    size_t         size, body_buffer_size, body_rest;
-    ssize_t        res, body_length;
-    nxt_str_t      *tmp_path, tmp_name;
-    nxt_buf_t      *in, *b, *out, *chunk;
-    nxt_conn_t     *c;
-    nxt_h1proto_t  *h1p;
+    size_t             size, body_buffer_size, body_rest;
+    ssize_t            res, body_length;
+    nxt_str_t          *tmp_path, tmp_name;
+    nxt_buf_t          *in, *b, *out, *chunk;
+    nxt_conn_t         *c;
+    nxt_h1proto_t      *h1p;
+    nxt_http_status_t  status;
 
     static const nxt_str_t tmp_name_pattern = nxt_string("/req-XXXXXXXX");
 
@@ -878,9 +890,8 @@ nxt_h1p_request_body_read(nxt_task_t *ta
         break;
 
     case NXT_HTTP_TE_UNSUPPORTED:
-        h1p->keepalive = 0;
-        nxt_http_request_error(task, r, NXT_HTTP_NOT_IMPLEMENTED);
-        return;
+        status = NXT_HTTP_NOT_IMPLEMENTED;
+        goto error;
 
     default:
     case NXT_HTTP_TE_NONE:
@@ -903,6 +914,7 @@ nxt_h1p_request_body_read(nxt_task_t *ta
                                body_buffer_size + sizeof(nxt_file_t)
                                + tmp_name.length + 1, 0);
         if (nxt_slow_path(b == NULL)) {
+            status = NXT_HTTP_INTERNAL_SERVER_ERROR;
             goto error;
         }
 
@@ -926,6 +938,7 @@ nxt_h1p_request_body_read(nxt_task_t *ta
         if (nxt_slow_path(b->file->fd == -1)) {
             nxt_alert(task, "mkstemp(%s) failed %E", tmp_name.start, nxt_errno);
 
+            status = NXT_HTTP_INTERNAL_SERVER_ERROR;
             goto error;
         }
 
@@ -941,6 +954,7 @@ nxt_h1p_request_body_read(nxt_task_t *ta
 
         b = nxt_buf_mem_alloc(r->mem_pool, body_buffer_size, 0);
         if (nxt_slow_path(b == NULL)) {
+            status = NXT_HTTP_INTERNAL_SERVER_ERROR;
             goto error;
         }
     }
@@ -965,6 +979,9 @@ nxt_h1p_request_body_read(nxt_task_t *ta
                 out = nxt_http_chunk_parse(task, &h1p->chunked_parse, in);
 
                 if (h1p->chunked_parse.chunk_error || h1p->chunked_parse.error) {
+                    nxt_debug(task, "h1p request body chunk parse error");
+
+                    status = NXT_HTTP_BAD_REQUEST;
                     goto error;
                 }
 
@@ -972,6 +989,7 @@ nxt_h1p_request_body_read(nxt_task_t *ta
                     size = nxt_buf_mem_used_size(&chunk->mem);
                     res = nxt_fd_write(b->file->fd, chunk->mem.pos, size);
                     if (nxt_slow_path(res < (ssize_t) size)) {
+                        status = NXT_HTTP_INTERNAL_SERVER_ERROR;
                         goto error;
                     }
 
@@ -985,6 +1003,7 @@ nxt_h1p_request_body_read(nxt_task_t *ta
             } else {
                 res = nxt_fd_write(b->file->fd, in->mem.pos, size);
                 if (nxt_slow_path(res < (ssize_t) size)) {
+                    status = NXT_HTTP_INTERNAL_SERVER_ERROR;
                     goto error;
                 }
 
@@ -1035,7 +1054,7 @@ error:
 
     h1p->keepalive = 0;
 
-    nxt_http_request_error(task, r, NXT_HTTP_INTERNAL_SERVER_ERROR);
+    nxt_http_request_error(task, r, status);
 }
 
 
diff --git a/src/nxt_http_request.c b/src/nxt_http_request.c
--- a/src/nxt_http_request.c
+++ b/src/nxt_http_request.c
@@ -220,6 +220,10 @@ nxt_http_request_content_length(void *ct
 
     r = ctx;
 
+    if (nxt_slow_path(r->transfer_encoding != NULL)) {
+        return NXT_HTTP_BAD_REQUEST;
+    }
+
     if (nxt_fast_path(r->content_length == NULL)) {
         r->content_length = field;
 
 

Feel free to edit/reject parts or whole diff, as you choose.

@@ -1007,7 +1035,7 @@ nxt_h1p_request_body_read(nxt_task_t *task, nxt_http_request_t *r)

h1p->keepalive = 0;

nxt_http_request_error(task, r, status);
nxt_http_request_error(task, r, NXT_HTTP_INTERNAL_SERVER_ERROR);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend to revert this logic back since invalid chunks are actually NXT_HTTP_BAD_REQUEST so we need status to customise logic here. I got following diff:

@@ -856,12 +860,13 @@ nxt_h1p_transfer_encoding(void *ctx, nxt
 static void
 nxt_h1p_request_body_read(nxt_task_t *task, nxt_http_request_t *r)
 {
-    size_t         size, body_buffer_size, body_rest;
-    ssize_t        res, body_length;
-    nxt_str_t      *tmp_path, tmp_name;
-    nxt_buf_t      *in, *b, *out, *chunk;
-    nxt_conn_t     *c;
-    nxt_h1proto_t  *h1p;
+    size_t             size, body_buffer_size, body_rest;
+    ssize_t            res, body_length;
+    nxt_str_t          *tmp_path, tmp_name;
+    nxt_buf_t          *in, *b, *out, *chunk;
+    nxt_conn_t         *c;
+    nxt_h1proto_t      *h1p;
+    nxt_http_status_t  status;
 
     static const nxt_str_t tmp_name_pattern = nxt_string("/req-XXXXXXXX");
 
@@ -878,9 +883,8 @@ nxt_h1p_request_body_read(nxt_task_t *ta
         break;
 
     case NXT_HTTP_TE_UNSUPPORTED:
-        h1p->keepalive = 0;
-        nxt_http_request_error(task, r, NXT_HTTP_NOT_IMPLEMENTED);
-        return;
+        status = NXT_HTTP_NOT_IMPLEMENTED;
+        goto error;
 
     default:
     case NXT_HTTP_TE_NONE:
@@ -903,6 +907,7 @@ nxt_h1p_request_body_read(nxt_task_t *ta
                                body_buffer_size + sizeof(nxt_file_t)
                                + tmp_name.length + 1, 0);
         if (nxt_slow_path(b == NULL)) {
+            status = NXT_HTTP_INTERNAL_SERVER_ERROR;
             goto error;
         }
 
@@ -926,6 +931,7 @@ nxt_h1p_request_body_read(nxt_task_t *ta
         if (nxt_slow_path(b->file->fd == -1)) {
             nxt_alert(task, "mkstemp(%s) failed %E", tmp_name.start, nxt_errno);
 
+            status = NXT_HTTP_INTERNAL_SERVER_ERROR;
             goto error;
         }
 
@@ -941,6 +947,7 @@ nxt_h1p_request_body_read(nxt_task_t *ta
 
         b = nxt_buf_mem_alloc(r->mem_pool, body_buffer_size, 0);
         if (nxt_slow_path(b == NULL)) {
+            status = NXT_HTTP_INTERNAL_SERVER_ERROR;
             goto error;
         }
     }
@@ -965,6 +972,9 @@ nxt_h1p_request_body_read(nxt_task_t *ta
                 out = nxt_http_chunk_parse(task, &h1p->chunked_parse, in);
 
                 if (h1p->chunked_parse.chunk_error || h1p->chunked_parse.error) {
+                    nxt_debug(task, "h1p request body chunk parse error");
+
+                    status = NXT_HTTP_BAD_REQUEST;
                     goto error;
                 }
 
@@ -972,6 +982,7 @@ nxt_h1p_request_body_read(nxt_task_t *ta
                     size = nxt_buf_mem_used_size(&chunk->mem);
                     res = nxt_fd_write(b->file->fd, chunk->mem.pos, size);
                     if (nxt_slow_path(res < (ssize_t) size)) {
+                        status = NXT_HTTP_INTERNAL_SERVER_ERROR;
                         goto error;
                     }
 
@@ -985,6 +996,7 @@ nxt_h1p_request_body_read(nxt_task_t *ta
             } else {
                 res = nxt_fd_write(b->file->fd, in->mem.pos, size);
                 if (nxt_slow_path(res < (ssize_t) size)) {
+                    status = NXT_HTTP_INTERNAL_SERVER_ERROR;
                     goto error;
                 }
 
@@ -1035,7 +1047,7 @@ error:
 
     h1p->keepalive = 0;
 
-    nxt_http_request_error(task, r, NXT_HTTP_INTERNAL_SERVER_ERROR);
+    nxt_http_request_error(task, r, status);
 }
 
 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.


r->body = b;

if (body_length > body_buffer_size) {
tmp_name.start = nxt_pointer_to(b->mem.start, sizeof(nxt_file_t));

memcpy(tmp_name.start, tmp_path->start, tmp_path->length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not part of this PR but usually nxt_memcpy wrapper is used instead of memcpy. Could you please check if it worth to fix?

Copy link
Contributor Author

@hongzhidao hongzhidao Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't touch it, but I'm fine with using nxt_memcpy instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's worth changing it for changes sake... it's a straight wrapper for memcpy(3)

Comment on lines 850 to 851
r->transfer_encoding = field;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having following quotes from RFC:

The Transfer-Encoding header field lists the transfer coding names corresponding to the sequence of transfer codings that have been (or will be) applied to the content in order to form the message body.

A sender MUST NOT apply the chunked transfer coding more than once to a message body (i.e., chunking an already chunked message is not allowed).

means that it's not possible to have 2 chunked values within single request. So it worth to add following check (need some styling):

diff --git a/src/nxt_h1proto.c b/src/nxt_h1proto.c
--- a/src/nxt_h1proto.c
+++ b/src/nxt_h1proto.c
@@ -839,6 +839,10 @@ nxt_h1p_transfer_encoding(void *ctx, nxt
     if (field->value_length == 7
         && memcmp(field->value, "chunked", 7) == 0)
     {
+        if (nxt_slow_path(r->proto.h1->transfer_encoding == NXT_HTTP_TE_CHUNKED)) {
+            return NXT_HTTP_BAD_REQUEST;
+        }
+
         te = NXT_HTTP_TE_CHUNKED;
 
     } else {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transfer-Encoding and Content-Length should not be sent at the same time:

diff --git a/src/nxt_h1proto.c b/src/nxt_h1proto.c
--- a/src/nxt_h1proto.c
+++ b/src/nxt_h1proto.c
@@ -833,6 +833,11 @@ nxt_h1p_transfer_encoding(void *ctx, nxt
     nxt_http_request_t  *r;
 
     r = ctx;
+
+    if (nxt_slow_path(r->content_length != NULL)) {
+        return NXT_HTTP_BAD_REQUEST;
+    }
+
     field->skip = 1;
     field->hopbyhop = 1;
 
diff --git a/src/nxt_http_request.c b/src/nxt_http_request.c
--- a/src/nxt_http_request.c
+++ b/src/nxt_http_request.c
@@ -220,6 +220,10 @@ nxt_http_request_content_length(void *ct
 
     r = ctx;
 
+    if (nxt_slow_path(r->transfer_encoding != NULL)) {
+        return NXT_HTTP_BAD_REQUEST;
+    }
+
     if (nxt_fast_path(r->content_length == NULL)) {
         r->content_length = field;
 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transfer-Encoding should work for HTTP/1.1 only:

diff --git a/src/nxt_h1proto.c b/src/nxt_h1proto.c
--- a/src/nxt_h1proto.c
+++ b/src/nxt_h1proto.c
@@ -829,12 +829,14 @@ nxt_h1p_websocket_version(void *ctx, nxt
 static nxt_int_t
 nxt_h1p_transfer_encoding(void *ctx, nxt_http_field_t *field, uintptr_t data)
 {
+    nxt_h1proto_t       *h1p;
     nxt_http_te_t       te;
     nxt_http_request_t  *r;
 
     r = ctx;
-
-    if (nxt_slow_path(r->content_length != NULL)) {
+    h1p = r->proto.h1;
+
+    if (nxt_slow_path(r->content_length != NULL) || !nxt_h1p_is_http11(h1p)) {
         return NXT_HTTP_BAD_REQUEST;
     }
 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I reworked the checking about transfer encoding stuff.

@hongzhidao
Copy link
Contributor Author

Hi all,
welcome to review, but the max_body_size check for the chunked body is not supported yet.
I'd suggest checking with @andrey-zelenkov's tests to understand how it is used.
#1307

@ac000
Copy link
Member

ac000 commented Jun 11, 2024

Ignore the Fedora Rawhide failures, Rawhide seems broken at the moment...

Comment on lines -101 to -103

uint64_t chunk_size;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would normally suggest such a change (removing the blank lines) be done in a separate commit, but in this case I think it's OK...

@ac000
Copy link
Member

ac000 commented Jun 11, 2024

Hi @hongzhidao

You can add my

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>

to

http: Move chunked buffer pos pointer while parsing

Comment on lines 939 to 949
tmp_path = NULL;
tmp_name.length = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, there was comment that went along with this

-        /* This initialization required for CentOS 6, gcc 4.4.7. */
-        tmp_path = NULL;
-        tmp_name.length = 0;

Is this still required?

The oldest version of GCC we may currently support is 4.7.x for C11/GNU11

4.6.x did have some preliminary support, but

A fourth version of the C standard, known as C1X, is under development; GCC has limited preliminary support for parts of this standard, enabled with -std=c1x.

not via -std=c11/gnu11

4.7.x was the first to introduce -std=c11/gnu11

A fourth version of the C standard, known as C11, was published in 2011 as ISO/IEC 9899:2011. GCC has limited incomplete support for parts of this standard, enabled with -std=c11 or -std=iso9899:2011. (While in development, drafts of this standard version were referred to as C1X.)

Having said all that, I think the oldest version of GCC we actively use/used is 4.8.5 from CentOS 7 which is essentially EOL now...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still required?

I removed it. I feel it's needed in the past because of the logic like this:

if (a) {
    b = alloc1();
} else {
    /* This initialization required for CentOS 6, gcc 4.4.7. */
    b = alloc2()
}

if (b == NULL) {
    return NULL;
}

if (a) {
    use b;
} else {
    use b;
}

I merged the two condition branches, something like this:

if (a) {
    b = alloc1();
    if (b == NULL) {
        return NULL;
    }

    /* more logic will be added in the subsequent patch.
    use b;

} else {
    b = alloc2();
    if (b == NULL) {
        return NULL;
    }
}

But unfortunately, I can't reproduce the warning in the current code.
I feel it's impossible to cause such a warning with the new change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For interest...

Current GCC, 14.x, at least doesn't complain if you remove those dummy initialisations, so it got smarter...

Also just for clarity, the problem was

if (body_length > body_buffer_size) {
    tmp_path = ...
    tmp_name.length = ...
    b = ...
} else {
    b = ...
}

if (body_length > body_buffer_size) {
    tmp_name.start = ...
    ...
    b->mem.start += sizeof(nxt_file_t) + tmp_name.length + 1;
    ...
}

I guess older GCC at least couldn't match those two if () statements and so couldn't tell that those two variables had to be already initialised by the time we enter the second if () statement.

After your refactoring this is a complete non-issue anyway as we no longer make any reference to those two variables after the initital if () statement.

In fact those two variable should be declared in the if () statement

diff --git ./src/nxt_h1proto.c ./src/nxt_h1proto.c
index bb8022d5..40e865af 100644
--- ./src/nxt_h1proto.c
+++ ./src/nxt_h1proto.c
@@ -861,7 +861,6 @@ nxt_h1p_request_body_read(nxt_task_t *task, nxt_http_request_t *r)
 {
     size_t             size, body_length, body_buffer_size, body_rest;
     ssize_t            res;
-    nxt_str_t          *tmp_path, tmp_name;
     nxt_buf_t          *in, *b, *out, *chunk;
     nxt_conn_t         *c;
     nxt_h1proto_t      *h1p;
@@ -903,6 +902,8 @@ nxt_h1p_request_body_read(nxt_task_t *task, nxt_http_request_t *r)
                                body_length);
 
     if (r->chunked || (body_length > body_buffer_size)) {
+        nxt_str_t  *tmp_path, tmp_name;
+
         tmp_path = &r->conf->socket_conf->body_temp_path;
 
         tmp_name.length = tmp_path->length + tmp_name_pattern.length;
@@ -945,9 +946,6 @@ nxt_h1p_request_body_read(nxt_task_t *task, nxt_http_request_t *r)
         unlink((char *) tmp_name.start);
 
     } else {
-        tmp_path = NULL;
-        tmp_name.length = 0;
-
         b = nxt_buf_mem_alloc(r->mem_pool, body_buffer_size, 0);
         if (nxt_slow_path(b == NULL)) {
             status = NXT_HTTP_INTERNAL_SERVER_ERROR;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, thanks.

In fact those two variable should be declared in the if () statement

I'm fine with both the two ways, I'll keep it the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think leaving it as is makes sense...

Previously those dummy initialisations were to stop old versions of GCC from complaining [-Wuninitialized].

There was also a comment explaining this.

Your code re-jigs things, keeping the (now totally unnecessary) dummy initialisations but removing the comment explaining why they are there.

So you can see we are now in an undesirable situation.

The options in increasing order of correctness...

  1. Put the comment back.

However the comment will need updating as it no longer makes sense...

a) It talks about a compiler version that simply cannot be used to build Unit now. You would need to check if it compiles on CentOS 7 with gcc 4.8.5 with those dummy initialisations removed without complaining or not to know whether the comment is even required.

b) However (a) above is a completely pointless exercise as it won't complain now in any case because of the next point.

  1. Simply remove those dummy initialisations. They serve absolutely no purpose. Those two variables are only used in that first if () branch.

  2. Do (2) but also reduce the scope of those two variables to that if () branch as I showed in the above patch.

So really the options are (2) or (3) with (3) being the obviously correct one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Updated with the way (3).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good, though I'm not sure why you did it in the last commit ("http: Support chunked request bodies") and not this one ("http: Refactored nxt_h1p_request_body_read()")?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I did it in the wrong commit.
I have updated it, thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, that patch looks much better now! :)

You can add my

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>

to that as well.

@hongzhidao
Copy link
Contributor Author

hongzhidao commented Jun 12, 2024

Hi @hongzhidao

You can add my

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>

to

http: Move chunked buffer pos pointer while parsing

Added, and I wonder if this means the review on the commit has been done when the Reviewed-by tag is asked to be added?
If yes, I'll add the same tag on the other commits after they are reviewed.

@ac000
Copy link
Member

ac000 commented Jun 12, 2024

Why remove the link to the chunked issue?\

EDIT: Hmm, I guess because this won't be enabled by default...

@ac000
Copy link
Member

ac000 commented Jun 12, 2024

Hi @hongzhidao
You can add my

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>

to

http: Move chunked buffer pos pointer while parsing

Added, and I wonder if this means the review on the commit has been done when the Reviewed-by tag is asked to be added? If yes, I'll add the same tag on the other commits after they are reviewed.

It means I have reviewed that particular patch. I'll notify you the same for the others...

@hongzhidao
Copy link
Contributor Author

hongzhidao commented Jun 13, 2024

Why remove the link to the chunked issue?\

EDIT: Hmm, I guess because this won't be enabled by default...

I have no idea why it's removed, but I switched the PR status from draft to open.

@hongzhidao hongzhidao force-pushed the req-chunked branch 4 times, most recently from 2493b29 to 63015aa Compare June 17, 2024 07:45
Comment on lines 999 to 1000
if ((size_t) b->file_end
> r->conf->socket_conf->max_body_size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the > should be aligned to be under the second ( (i.e pull it back one character) as it's really a continuation of the opening (.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen anything like this before, I'm fine with it.
But it reminds me to go the other way. Recheck it, please.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply meant this

+                    if ((size_t) b->file_end
+                        > r->conf->socket_conf->max_body_size)

as if the cast wasn't there...

But you've changed it and it doesn't need splitting over two lines now anyway...

Comment on lines +1012 to +1014
size = nxt_min(size, body_length);
res = nxt_fd_write(b->file->fd, in->mem.pos, size);
if (nxt_slow_path(res < (ssize_t) size)) {
status = NXT_HTTP_INTERNAL_SERVER_ERROR;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is what the previous code was doing, but this seems a little fragile, unless the write is happening with suspended signals? (same with the write above...)

If this is happening with signals blocked then the following can be ignored.

To quote the write(2) man-page

If a write() is interrupted by a signal handler before any bytes are
written, then the call fails with the error EINTR; if it is interrupted
after at least one byte has been written, the call succeeds, and re‐
turns the number of bytes written.

The usual thing is to do the write(2) in a loop and just continue the write again on EINTR (or until all bytes are written barring other errors)..

But this is not really in the scope of this patch and would likely be done as an enhancement to nxt_fd_write()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, "safe-write" can be an enhancement to nxt_fd_write().

size = nxt_min(size, body_length);
size = nxt_min(body_buffer_size, size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I had a double take at this!

The only thing I might suggest is to nest them, i.e

size = nxt_min(size, nxt_min(body_length, body_buffer_size));

I think that better represents the intention and looks less like a mistake!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this can be like this:

size = nxt_min(size, body_length);
size = nxt_min(body_buffer_size, size);

=>

size = nxt_min(body_buffer_size, size);

Since body_buffer_size can't be greater than body_length.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better!

@ac000
Copy link
Member

ac000 commented Jun 19, 2024

OK @hongzhidao good stuff!

I think you can add my

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>

to that last commit now...

Previously, Unit didn't move the buffer pointer when parsing chunked
data because the buffer was not used after sent. It's used for upstream
response. With the new requirement to support request chunked body,
the buffer might be used for pipeline request, so it's necessary to move
the pos pointer while parsing.

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>
It's prepared for the subsequent patch.

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>
This is a temporary support for chunked request bodies by converting
to Content-Length. This allows for processing of such requests until
a more permanent solution is developed.

A new configuration option "chunked_transform" has been added to enable
this feature. The option can be set as follows:
  {
    "settings": {
      "chunked_transform": true
    }
  }
By default, this option is set to false, which retains the current
behaviour of rejecting chunked requests with a '411 Length Required'
status code.

Please note that this is an experimental implementation.

Reviewed-by: Andrew Clayton <a.clayton@nginx.com>
@hongzhidao hongzhidao merged commit 64f4c78 into nginx:master Jun 21, 2024
21 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants