
Enhance ext_proc filter to support MXN streaming #34942

Merged
merged 78 commits into from
Nov 8, 2024

Conversation

yanjunxiang-google
Contributor

@yanjunxiang-google yanjunxiang-google commented Jun 26, 2024

This PR is for issue #32090. One of the use cases is compression by the external processing server.

This lets the ext_proc server buffer M request body chunks from Envoy, process them, and then send N chunks back to Envoy in STREAMED mode. It also lets the server buffer the entire message, i.e., headers, body, and trailers, before sending back any response.

The ext_proc MXN streaming works this way:

  1. Enable MXN streaming by configuring the body mode to BIDIRECTIONAL_STREAMED in the ext_proc filter config.
  2. Configure the trailer mode to SEND in the ext_proc filter config.

With the above config, Envoy sends body chunks to the ext_proc server as they arrive. The server can buffer all or part of the body (M chunks), then stream the mutated body (possibly split into N chunks) back to Envoy.
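As a rough sketch of what the M:N re-chunking amounts to (a self-contained illustration, not Envoy or ext_proc server code; `mxnRechunk` is an invented helper name):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative sketch only: in M:N streaming mode, the side stream server
// may buffer M request body chunks, mutate the accumulated body, and stream
// it back re-split into up to n_out response chunks.
std::vector<std::string> mxnRechunk(const std::vector<std::string>& m_chunks,
                                    size_t n_out) {
  // Buffer the M chunks into one body (a real server could also mutate it,
  // e.g. compress it, before responding).
  std::string body;
  for (const auto& chunk : m_chunks) {
    body += chunk;
  }
  std::vector<std::string> out;
  if (n_out == 0 || body.empty()) {
    return out;
  }
  // Re-split into up to n_out roughly equal response chunks.
  const size_t chunk_size = (body.size() + n_out - 1) / n_out;
  for (size_t pos = 0; pos < body.size(); pos += chunk_size) {
    out.push_back(body.substr(pos, chunk_size));
  }
  return out;
}
```

The point is only that M and N are independent: Envoy streams chunks to the server as they arrive, and the server decides how many response chunks to send back, and when.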

Signed-off-by: Yanjun Xiang <yanjunxiang@google.com>

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @mattklein123
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).


Caused by: #34942 was opened by yanjunxiang-google.


@yanjunxiang-google yanjunxiang-google changed the title Streamed more chunks Enhance ext_proc filter to support MxN streaming Jul 10, 2024
@yanjunxiang-google yanjunxiang-google changed the title Enhance ext_proc filter to support MxN streaming Enhance ext_proc filter to support M:N streaming Jul 10, 2024
@yanjunxiang-google yanjunxiang-google marked this pull request as ready for review July 11, 2024 02:07
@yanjunxiang-google
Contributor Author

/assign @gbrail @htuch @jmarantz @tyxia @yanavlasov


@gbrail cannot be assigned to this issue.


Caused by: a #34942 (comment) was created by @yanjunxiang-google.


@KBaichoo
Contributor

/assign @tyxia

As codeowner for first pass.

@jmarantz
Contributor

jmarantz commented Nov 5, 2024

/wait

@jmarantz
Contributor

jmarantz commented Nov 6, 2024

/wait (for #36999)

@yanjunxiang-google
Contributor Author

Kind Ping!

Contributor

@jmarantz jmarantz left a comment


flushing comments.

Contributor

@jmarantz jmarantz left a comment


I haven't read the tests yet. Are we hitting all the corner cases?

To answer this I'd look at a coverage map generated by CI, which you can do with a few (non-obvious) clicks.

if (!message_timer_) {
message_timer_ = filter_callbacks_->dispatcher().createTimer(cb);

if (bodyMode() != ProcessingMode::FULL_DUPLEX_STREAMED) {
Contributor


Please comment on how the next processing step occurs in full duplex mode.

Contributor


I think that comment reflects what the code does (which is already clear from reading the code), but not what the plan is for how the next step will occur.

source/extensions/filters/http/ext_proc/processor_state.cc (outdated; resolved)
return ProcessorState::CallbackState::TrailersCallback;
}
}

return ProcessorState::CallbackState::Idle;
}

Contributor


handleHeadersResponse is too big to comprehend. It will be hard to know whether the change you made has the desired effect, and no undesired ones.

WDYT of breaking this one up also?

Contributor Author


There are unit tests and integration tests added specifically for the code change in this function.
Sure, due to historical reasons there is quite a bit of technical debt in the ext_proc filter state machine code. I added a TODO here. Let's take care of this technical debt in separate follow-up PRs.

Contributor


By "technical debt", do you mean lack of test coverage?

I think it would be better to get the test coverage really solid before adding a lot of complexity. This stuff is really complicated to read, and it would help a lot if we at least had confidence that all the code is covered by tests.

Contributor Author

@yanjunxiang-google yanjunxiang-google Nov 8, 2024


For test coverage, we should be good. These are the test cases we added for this new body mode:

Integration tests:

  1. Server buffers headers and the whole body before sending a response.
  2. Server buffers headers, the whole body, and trailers before sending a response.
  3. Server buffers headers and a certain amount of body, then sends a body response without waiting for the end of the body. Meanwhile, new body data keeps arriving, and the server continues this buffer-process-respond cycle for a while. Eventually trailers arrive; the server then sends the last body-chunk response and the trailer response.

Unit tests:

  1. Client sends headers and body. Server sends the header response as soon as it receives the header request, i.e., without waiting for the body.
  2. Client sends headers and trailers, no body. Server sends the header response after receiving the trailers.
  3. For one HTTP stream, the server does MxN processing for some chunks, then 1x1 (i.e., sends one response per request immediately) for some chunks, then MxN again.
  4. A couple of server-misbehavior test cases.
  5. A couple of Envoy-misconfiguration test cases.

These tests cover different scenarios: client requests may or may not have a body, and may or may not have trailers; the server may or may not wait for the body or trailers before sending the header response, may or may not buffer, etc.
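A minimal sketch of the flow in integration test 3 above (a self-contained illustration, not the actual test code; `simulateBufferedFlush` and `m` are invented names): the server flushes a combined response chunk after every m buffered inputs, without waiting for the end of the stream, and flushes the remainder when trailers arrive.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative only: the server buffers incoming body chunks and emits one
// combined response chunk for every `m` inputs (without waiting for the end
// of the body), then flushes whatever is left when trailers arrive.
std::vector<std::string> simulateBufferedFlush(
    const std::vector<std::string>& in_chunks, size_t m) {
  std::vector<std::string> responses;
  std::string buffer;
  size_t held = 0;
  for (const auto& chunk : in_chunks) {
    buffer += chunk;
    if (++held == m) {
      responses.push_back(buffer);  // respond mid-stream, before end of body
      buffer.clear();
      held = 0;
    }
  }
  if (!buffer.empty()) {
    responses.push_back(buffer);  // last body chunk, sent when trailers arrive
  }
  return responses;
}
```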

Contributor


Are we hitting all the lines in the coverage report?

Contributor Author


In terms of test coverage for the ext_proc code as a whole, I recall it meets the Envoy criteria (>96.3%), and the ext_proc fuzzer coverage is >70%.

Contributor


That's good, but I'd still like you to look at the line coverage and see if we are hitting all your new code.


source/extensions/filters/http/ext_proc/processor_state.cc (outdated; resolved)
@@ -39,6 +39,7 @@ envoy_extension_cc_test(
}),
extension_names = ["envoy.filters.http.ext_proc"],
rbe_pool = "2core",
shard_count = 8,
Contributor


I really want to know why this test is so slow that you need to use multiple cores and 8 shards. I think I asked this before but didn't see the answer.

Contributor Author


I responded to this here: #34942 (comment)

Contributor


Followed up on that thread but copying here:

I don't have an objection to calling it large if it's large.

I'm just surprised it takes this long; it feels like we must have some sleeps or something more complex than a unit test usually has, beyond just having a number of test cases.

Contributor Author


I quickly checked the trace log timestamps. It looks like the tests themselves are very fast, i.e., <5ms in a local setup. The test initialization consumes most of the time (>100ms in the same local setup), and this is the case for existing tests as well, like the very basic SimplestPost test:

TEST_F(HttpFilterTest, SimplestPost) {

Contributor

@adisuissa adisuissa left a comment


/lgtm api

@repokitteh-read-only repokitteh-read-only bot removed the api label Nov 8, 2024
Member

@tyxia tyxia left a comment


A few more comments regarding how to handle server responses/behavior. Thanks for your patience.

if (!message_timer_) {
message_timer_ = filter_callbacks_->dispatcher().createTimer(cb);

// Skip starting the timer for FULL_DUPLEX_STREAMED body mode.
Member

@tyxia tyxia Nov 8, 2024


Without the timer, how are we going to handle the timeout situation where the side stream server doesn't respond?

Contributor Author


If the side stream server does not respond, the router filter idle timeout will kick in and destroy the ext_proc filter. This is the same as a backend server not responding to a client request.

ENVOY_LOG(debug, "Applying body response to chunk of data. Size = {}", chunk->length);
MutationUtils::applyBodyMutations(common_response.body_mutation(), chunk_data);
}
bool should_continue = chunk->end_stream;
Member


What if the side stream server doesn't (or just forgets to) respond with end_stream = true? It looks like this is not handled and we will be stuck there.

Contributor Author


Yes, if the side stream server does not respond or never sends end_of_stream = true, the ext_proc filter will keep waiting for the response, and eventually the router filter timeout should kick in and destroy the ext_proc filter. This is the same as a backend server misbehaving.

Contributor Author


I created issue #37065 to track the work to add an integration test for the case where the server fails to send a response in time and the router filter times out.

Member

@tyxia tyxia Nov 8, 2024


This theory doesn't sound solid to me, and it seems to make every existing filter's own error handling pointless.

Besides, tightly coupling side stream errors with router/upstream errors will hurt observability and the customer experience, as they are two different errors.

I think we should improve the error handling in this design (maybe ext_proc can have its own timeout).

Contributor Author


Oh, I thought we agreed that there would be no ext_proc timer for FULL_DUPLEX_STREAMED mode, and that the generic Envoy router idle timer (default 15s) would take care of the case where the server is not responding.
This lets the server buffer more data, possibly all the way until end_of_stream is received. Adding an ext_proc-specific timer would limit this capability.

@yanjunxiang-google
Contributor Author

@adisuissa I did an upstream merge. It needs your API approval again. Thanks!

Contributor

@adisuissa adisuissa left a comment


/lgtm api

@repokitteh-read-only repokitteh-read-only bot removed the api label Nov 8, 2024
@tyxia
Member

tyxia commented Nov 8, 2024

LGTM, as a WIP/good start.

I think the open comments above need to be addressed (along with some more tests/load tests) to complete this feature.

@yanavlasov yanavlasov merged commit 72a2067 into envoyproxy:main Nov 8, 2024
21 checks passed
@yanjunxiang-google yanjunxiang-google deleted the streamed_more_chunks branch November 11, 2024 14:50