http2: remove exceptions from H/2 codec #11575

yanavlasov · 2020-06-12T20:49:43Z

Commit Message:
Remove all throw statements from H/2 codec

Additional Description:
This change removes all uses of C++ exceptions from the H/2 codec.
There are two cases to consider in this change: processing incoming data and building outgoing data.
Incoming data is processed by the dispatch() method. dispatch() had a try/catch block that handled every error thrown as an exception during processing of incoming data by aborting the connection.
Outgoing data is built in several different contexts, such as sending health check requests, sending early responses, or processing upstream responses. Only one of these contexts, the processing of upstream responses, had a try/catch block. In all other contexts a thrown exception would have caused abnormal process termination. The call sites in these cases are the encode...() methods (plus a few others), which then call sendPendingFrames() to build outbound frames.

This change replaces every throw statement with a return of the corresponding error Status and adds plumbing to return that status to codec callers.
The dispatch() method returns the encountered error to its caller, which handles it accordingly.

Calls to sendPendingFrames() that are NOT made from the dispatch() method will RELEASE_ASSERT if an error status is returned. This does not alter the existing behavior of abnormally terminating the process, only the method of termination: RELEASE_ASSERT instead of an uncaught exception.

Risk Level: High (Codec changes)
Testing: Unit Tests
Docs Changes: N/A
Release Notes: N/A

Signed-off-by: Yan Avlasov <yavlasov@google.com>
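
For readers skimming the description, the throw-to-Status conversion can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Status` class, the `RETURN_IF_ERROR` macro body, the integer parameters and the error message are all hypothetical stand-ins chosen to keep the example self-contained.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal stand-in for the Status type used by the codec.
class [[nodiscard]] Status {
public:
  Status() = default; // OK status
  explicit Status(std::string message) : ok_(false), message_(std::move(message)) {}
  bool ok() const { return ok_; }
  const std::string& message() const { return message_; }

private:
  bool ok_{true};
  std::string message_;
};

// Hypothetical equivalent of a RETURN_IF_ERROR-style helper macro.
#define RETURN_IF_ERROR(expr)                                                  \
  do {                                                                         \
    Status status_ = (expr);                                                   \
    if (!status_.ok()) {                                                       \
      return status_;                                                          \
    }                                                                          \
  } while (false)

// Before: a limit violation was reported with `throw`.
// After: the same condition is reported as an error Status.
Status checkOutboundQueueLimits(int queued_frames, int limit) {
  if (queued_frames > limit) {
    return Status("Too many frames in the outbound queue.");
  }
  return Status();
}

// Builds outbound frames; any error is passed up instead of thrown.
Status sendPendingFrames(int queued_frames, int limit) {
  RETURN_IF_ERROR(checkOutboundQueueLimits(queued_frames, limit));
  // ... outbound frames would be built here ...
  return Status();
}

// dispatch() hands the error to its caller instead of relying on try/catch.
Status dispatch(int queued_frames, int limit) {
  RETURN_IF_ERROR(sendPendingFrames(queued_frames, limit));
  return Status();
}
```

A caller outside of dispatch() would, under this PR, check the returned status with RELEASE_ASSERT rather than propagate it further.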

asraa · 2020-06-15T16:59:30Z

source/common/http/http2/codec_impl.cc

@@ -208,7 +209,8 @@ void ConnectionImpl::StreamImpl::encodeTrailersBase(const HeaderMap& trailers) {
    pending_trailers_to_encode_ = createHeaderMap<HeaderMapImpl>(trailers);
  } else {
    submitTrailers(trailers);
-    parent_.sendPendingFrames();
+    auto status = parent_.sendPendingFrames();
+    RELEASE_ASSERT(status.ok(), "sendPendingFrames() failure in non dispatching context");


Can we ASSERT that dispatching_ != true in these methods?

Sadly I think this might not always be true. It's definitely possible to encode in the context of dispatch (for example, an immediate local reply). I think ultimately we are going to have to pivot on whether we are dispatching to decide how and when to handle errors in these functions.

Yes, some of these methods, such as encodeHeadersBase() can be called when dispatching data. Note that the sendPendingFrames() is a noop when dispatching_ != true and always returns OK status.

Note that the sendPendingFrames() is a noop when dispatching_ != true and always returns OK status.

You mean == true, right? Since in that case we send pending frames at the end of dispatch. The problem is when we are not in dispatch. That is when I think we have to latch all errors and save them for the next dispatch.

Oh, yes, it was a typo. dispatching_ == true of course.
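
The behavior agreed on in this exchange (sendPendingFrames() does nothing and reports success while dispatching, and the real flush happens at the end of dispatch) can be sketched as below. This is a toy model under stated assumptions: the `Connection` class, the `bool` return type and the frame strings are hypothetical and only mirror the `dispatching_` flag mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified model of the connection (not Envoy's class).
class Connection {
public:
  // Queues a frame for sending; stands in for the nghttp2 submit calls.
  void submit(std::string frame) { pending_.push_back(std::move(frame)); }

  // Returns true on success. While dispatching this is a no-op that reports
  // success, because dispatch() flushes everything once at the end.
  bool sendPendingFrames() {
    if (dispatching_) {
      return true;
    }
    return flush();
  }

  // Processes one incoming request and replies immediately (a local reply).
  bool dispatch(const std::string& request) {
    dispatching_ = true;
    submit("reply-to-" + request);
    const bool nested_ok = sendPendingFrames(); // no-op: dispatching_ is true
    written_mid_dispatch_ = wire_.size();
    dispatching_ = false;
    return nested_ok && sendPendingFrames(); // the real flush
  }

  std::size_t writtenMidDispatch() const { return written_mid_dispatch_; }
  std::size_t writtenTotal() const { return wire_.size(); }

private:
  // Moves every pending frame to the simulated wire.
  bool flush() {
    for (std::string& frame : pending_) {
      wire_.push_back(std::move(frame));
    }
    pending_.clear();
    return true;
  }

  bool dispatching_{false};
  std::vector<std::string> pending_;
  std::vector<std::string> wire_;
  std::size_t written_mid_dispatch_{0};
};
```

In this model an error can only surface from the flush at the end of dispatch() or from a call made outside of dispatch, which is the case the RELEASE_ASSERT covers.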

alyssawilk · 2020-06-16T13:59:24Z

source/common/http/http2/codec_impl.cc

@@ -146,7 +146,8 @@ void ConnectionImpl::StreamImpl::encodeHeadersBase(const std::vector<nghttp2_nv>

  local_end_stream_ = end_stream;
  submitHeaders(final_headers, end_stream ? nullptr : &provider);
-  parent_.sendPendingFrames();
+  auto status = parent_.sendPendingFrames();
+  RELEASE_ASSERT(status.ok(), "sendPendingFrames() failure in non dispatching context");


I'm pretty concerned about adding a bunch of release asserts to the codec path.
Can we find another solution? I'm not against a TODO here if we have a plan. Options: return the errors up and handle them; change sendPendingFrames so it, say, latches the error and handles it later; or have two sendPendingFrames functions, one called from the dispatching path and one called from the other paths.

#11503
Ideally would be replaced with ENVOY_BUGs

I think maybe separating dispatching context might actually be a good thing. It would ensure coverage over both paths, which I think has caused an issue before.

See https://github.com/envoyproxy/envoy-setec/pull/181 for other context on this. This whole situation is a total mess on many accounts. I actually think with the current code the only sane thing to do is to crash here, as in the current code this would be an uncaught exception. I think using ENVOY_BUG here would lead to undefined behavior.

As I mention in the other PR/issue, I think the only sane way to handle errors in non-dispatch context is to latch them as mentioned, and then handle them on the next dispatch. Another option (depending on context) would be to close the underlying connection and let the close event do its job.

I think the question is whether we want to try to fix ^ in this PR, or do it iteratively. I would be in favor of doing it iteratively.

I think iteratively is fine, we should add some clear TODOs explaining this.

I have looked through (I think) all call stacks to make sure that this RELEASE_ASSERT does not modify the existing behavior. It should just replace abnormal termination due to an unhandled exception. Yes all this will be cleaned up as part of hardening against untrusted upstreams.
The current plan of record (PoR) is to latch the error for later handling, but this is a much bigger change.
Also, this change will be flag protected, so anyone hitting one of these RELEASE_ASSERTs can flip back to the old codecs with exceptions until the problem is resolved.

Yes all this will be cleaned up as part of hardening against untrusted upstreams.

Pedantically (and I know you know this), this is not just about untrusted upstreams. In the current code it's possible (I think) to hit flood exceptions in the out of dispatch encode cases. We need to protect against this also.

But yes I think we are all on the same page about latching.

Actually I want to come back to this because I'm still uncomfortable with the release assert. Let's say there's some corner case where sendPendingFrames fails unexpectedly. Would we rather have a write failure trigger a crash (and potentially cause an outage - very fast to debug but potentially expensive) or log an error and increment a stat and fail (no outage, but maybe have to spend a bunch of time figuring out why traffic for RandomClient wasn't working as expected). I know Envoy culture has tended towards the fast crash but I'd really prefer we err in the direction of stat and log. Or maybe we could rewrite sendPendingFrames to take an argument "am I being error checked" now and do exceptions once we've fixed whatever bug we think there might be?

I have no problem replacing RELEASE_ASSERT with the new macro we added recently that logs and increments a stat. My gut feeling is that it might not be much of an improvement, since something else will very likely crash soon afterwards: we would be proceeding with processing data where before we would not. It is possible, however, that in some cases it would not crash, and it is worth reducing the likelihood of the worst outcome.
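
As a rough illustration of the latching idea discussed in this thread (which this PR does not implement), the sketch below records an error hit outside of dispatch and surfaces it on the next dispatch instead of crashing. Everything here is hypothetical: the `LatchingConnection` class, the use of `std::optional<std::string>` as an error type, and the choice to keep the first error are assumptions made for the example.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model: errors hit while encoding outside of dispatch() are
// latched instead of crashing the process.
class LatchingConnection {
public:
  // Called from encode paths outside of dispatch. `failure` simulates an
  // error coming back from sendPendingFrames(); empty means success.
  void encode(std::optional<std::string> failure) {
    // Assumption: the first error wins and later ones are dropped.
    if (failure.has_value() && !latched_error_.has_value()) {
      latched_error_ = std::move(failure);
    }
  }

  // Returns an empty optional on success, or the latched error so that the
  // caller can handle it the same way as any other dispatch failure.
  std::optional<std::string> dispatch() const { return latched_error_; }

private:
  std::optional<std::string> latched_error_;
};
```

The trade-off raised above still applies: between the failed encode and the next dispatch the connection keeps running in a state that previously would have terminated the process.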

htuch

General approach looks good, just curious if there is any additional testing needed or if this is a pure refactor on all accounts.

mattklein123

Thanks in general this looks like the right approach. My suggestion per the comment thread is to do the minimal amount of changes necessary to match the current behavior, even though we know it's broken in various ways, then we can go back and fix the broken behavior.

/wait

yanavlasov · 2020-06-19T23:31:22Z

There should be no changes to unit tests as the change is purely internal to the codec and does not change the public API.

yanavlasov · 2020-06-22T15:58:52Z

The clang-tidy error is not related to my change; it comes from a problem in LinkedObject::moveIntoList.

stale · 2020-07-03T04:24:18Z

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

stale · 2020-07-11T06:39:15Z

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

yanavlasov · 2020-07-21T13:59:47Z

clang tidy errors are pre-existing

asraa · 2020-07-21T14:07:28Z

source/common/http/http2/codec_impl.cc

@@ -1070,25 +1107,40 @@ void ConnectionImpl::sendSettings(
  }
 }

+int ConnectionImpl::setAndCheckNghttp2CallbackStatus(Status&& status) {
+  nghttp2_callback_status_ = std::move(status);


ASSERT that the status was ok in case of overwriting

I'm unsure we can safely ASSERT this. I'm thinking that it might be possible that sendPendingFrames is called twice, i.e. by filters. I think the question is whether we want to keep the first error status or the last.
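
One way to picture the "first error or last error" question is the sketch below, which keeps the first error and ignores later statuses. This is only an assumption for illustration and not necessarily what the PR does: the `Status` struct, the `CallbackStatusHolder` class and the `kCallbackFailure` value are hypothetical stand-ins (real code would return the nghttp2 callback failure constant).

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical minimal status type for this example.
struct Status {
  bool ok = true;
  std::string message;
};

constexpr int kCallbackOk = 0;
constexpr int kCallbackFailure = -1; // stand-in for the library's failure code

class CallbackStatusHolder {
public:
  // Stores the status and returns an int suitable for a C library callback.
  // Assumption: once an error is stored it is never overwritten, so the
  // holder stays failed even if a later status is OK.
  int setAndCheck(Status&& status) {
    if (callback_status_.ok) {
      callback_status_ = std::move(status);
    }
    return callback_status_.ok ? kCallbackOk : kCallbackFailure;
  }

  const Status& status() const { return callback_status_; }

private:
  Status callback_status_;
};
```

Keeping the last error instead would only require dropping the `if` guard; which of the two is preferable is exactly the open question in this thread.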

alyssawilk · 2020-07-23T18:23:01Z

source/common/http/http2/codec_impl.h

+  // Returns Ok Status on success or error if outbound queue limits were exceeded.
+  Status addOutboundFrameFragment(Buffer::OwnedImpl& output, const uint8_t* data, size_t length);
+  virtual Status checkOutboundQueueLimits() PURE;
+  Status incrementOutboundFrameCount(bool is_outbound_flood_monitored_control_frame);


What do we think of ABSL_MUST_USE_RESULT for these to make it clear (and compiler checked) that we're doing the right thing?

The whole Status class is already marked this way, so any method that returns Status must have its result checked.
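
The effect of marking the type rather than each function can be sketched with the standard C++17 `[[nodiscard]]` attribute, used here as an analogue of ABSL_MUST_USE_RESULT. The `Status` class and the simplified `incrementOutboundFrameCount` signature below are hypothetical and exist only to keep the example self-contained.

```cpp
#include <cassert>

// Marking the class means every function returning it by value is covered;
// no per-function annotation is needed.
class [[nodiscard]] Status {
public:
  explicit Status(bool ok) : ok_(ok) {}
  bool ok() const { return ok_; }

private:
  bool ok_;
};

// Hypothetical simplified signature; the real method takes other arguments.
Status incrementOutboundFrameCount(int& outbound_frames, int limit) {
  ++outbound_frames;
  return Status(outbound_frames <= limit);
}

// Usage: every returned Status has to be consumed.
bool queueTwoFrames(int limit) {
  int outbound_frames = 0;
  // Writing `incrementOutboundFrameCount(outbound_frames, limit);` on its own
  // line would make the compiler warn about the discarded result.
  if (!incrementOutboundFrameCount(outbound_frames, limit).ok()) {
    return false;
  }
  return incrementOutboundFrameCount(outbound_frames, limit).ok();
}
```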

mattklein123

LGTM w/ two small comments/questions.

My personal view is that we should use RELEASE_ASSERT vs. ENVOY_BUG in this case, as crashing is better than undefined behavior (and consistent w/ what we have today), but will defer to all of you.

mattklein123 · 2020-07-24T18:12:13Z

test/common/http/http2/codec_impl_test.cc

@@ -1749,7 +1749,12 @@ TEST_P(Http2CodecImplTest, PingFlood) {
        buffer.move(frame);
      }));

-  EXPECT_THROW(client_->sendPendingFrames(), ServerCodecError);
+  // Legacy codec does not propagate error details and uses generic error message
+  EXPECT_THROW_WITH_MESSAGE(


nit: it seems odd to me that in the tests we still use exceptions. Would these be better written as a helper which pivots what it does based on new vs. old and then we can just clean up old later? I don't feel strongly about it.

This is an artifact of how the test is constructed: it uses the codec to test the codec. When dispatch of the codec under test fails, the stack has to be unwound through the client codec that was used to build the test wire bytes. The test really needs to be redesigned to avoid doing this. Note that these exceptions are specific to the test; they are not codec exceptions.

mattklein123 · 2020-07-24T18:13:36Z

source/common/http/http2/codec_impl.cc

+    // internal mechanisms. Most flood protection is done by Envoy's codec and this error
+    // should never be returned. However it is handled here in case nghttp2 has some flood
+    // protections that Envoy's codec does not have.
+    if (rc == NGHTTP2_ERR_FLOODED) {


Does this path have coverage? If not can you add a TODO and try to add it later? In general do we have good coverage on all of the new error handling?

No, we do not have test coverage for this case. We try to avoid tripping this case in the integration tests:

envoy/test/integration/http2_integration_test.cc

Line 1502 in dbbcc69

// nghttp2 library has its own internal mitigation for outbound control frames (see

It is a good question what should be done with the internal nghttp2 flood protection. You could not disable it in the past (maybe you can now), so we just set the flood detection limits lower in Envoy. I could construct a test for this specifically. Probably not an integration test, as it depends on being able to do a big read in one go, which is harder to control in an integration test.

No, we do not have any tests for the error cases that would now end in RELEASE_ASSERT. These would have to be added as the RELEASE_ASSERTs are removed.
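
For context, translating a library return code such as NGHTTP2_ERR_FLOODED into an error status, as discussed in this thread, might look roughly like the sketch below. It is illustrative only: the enum values, the `Status` struct, the function name and the messages are hypothetical stand-ins, and real code would compare against the constants from the nghttp2 header.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the library's return codes.
enum LibraryError : int { kOk = 0, kFlooded = -1, kOtherFatal = -2 };

// Hypothetical minimal status type for this example.
struct Status {
  bool ok = true;
  std::string message;
};

// Maps the return code of a send call to a status the caller can propagate.
Status statusFromSendResult(int rc) {
  if (rc == kOk) {
    return Status{};
  }
  if (rc == kFlooded) {
    // The library's own flood protection tripped before the codec's did.
    return Status{false, "flood detected by the HTTP/2 library"};
  }
  return Status{false, "fatal HTTP/2 library error"};
}
```

A unit test for the flooded branch could call such a helper directly with the flood code, which avoids having to provoke the library's internal protection in an integration test.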

alyssawilk · 2020-07-29T19:29:14Z

FWIW I'm fine with the release asserts as long as it's off by default, so feel free to land this with me in absentia if we don't sort out clang tidy by then :-)

yanavlasov · 2020-07-30T19:55:11Z

Unfortunately I have no idea how to fix the clang-tidy error. I checked the clang-tidy command lines and they appear to be correct and include the path to source/common/common/statusor.h. The clang-tidy command lines for codec_impl.cc and codec_impl.h seem to be the same, yet clang-tidy fails with this error for the .h file but passes for the .cc file. Searching did not help.
Also, since this is a compiler error, you cannot // NOLINT it.
I'm unsure what to do at this point.

Remove all throw statements from H/2 codec and replace error plumbing with Status and StatusOr objects. Signed-off-by: Yan Avlasov <yavlasov@google.com>

Remove all throw statements from H/2 codec and replace error plumbing with Status and StatusOr objects. Signed-off-by: Yan Avlasov <yavlasov@google.com> Signed-off-by: chaoqinli <chaoqinli@google.com>

Commit Message: Remove all throw statements from H/1 codec

This change removed all uses of C++ exceptions from H/1 codec. I modeled the flow after Yan's H/2 work (#11575). Codec statuses are set in uniform helper methods. This is the only change from the previous PR (#11101), besides merging newer exceptions.

This change replaces all throw statements with a return of corresponding error Status and adds plumbing to return the status to codec callers. The dispatch() method returns the encountered error to the caller, which will be handled accordingly.

The calls to the RequestEncoder::encodeHeaders() NOT called from dispatch() method will RELEASE_ASSERT if an error code is returned. This does not alter the existing behavior of abnormally terminating the process, just the method of termination: RELEASE_ASSERT vs uncaught exception.

Risk Level: High (Codec changes)

Signed-off-by: Asra Ali <asraa@google.com>

yanavlasov added 6 commits June 11, 2020 15:36

H/2 codec: remove FrameFloodException

f4e28cd

Signed-off-by: Yan Avlasov <yavlasov@google.com>

Remove trailers without end stream error

08d90c4

Signed-off-by: Yan Avlasov <yavlasov@google.com>

Remove exception from sendPendingFrames

bdeccf1

Signed-off-by: Yan Avlasov <yavlasov@google.com>

Remove exception from the innerDispatch

713f094

Signed-off-by: Yan Avlasov <yavlasov@google.com>

Cleanup

07cfb26

Signed-off-by: Yan Avlasov <yavlasov@google.com>

Cleanup

8c96d58

Signed-off-by: Yan Avlasov <yavlasov@google.com>

yanavlasov assigned asraa, mattklein123, htuch and alyssawilk Jun 12, 2020

asraa reviewed Jun 15, 2020

alyssawilk reviewed Jun 16, 2020

htuch reviewed Jun 16, 2020

htuch reviewed Jun 16, 2020

mattklein123 reviewed Jun 17, 2020

repokitteh-read-only bot added the waiting label Jun 17, 2020

Address comments

b05fb3c

Signed-off-by: Yan Avlasov <yavlasov@google.com>

repokitteh-read-only bot removed the waiting label Jun 19, 2020

Address comments

b54c35c

Signed-off-by: Yan Avlasov <yavlasov@google.com>

mattklein123 added the waiting label Jun 22, 2020

asraa mentioned this pull request Jun 26, 2020

http1: remove exceptions from H/1 codec #11778

Merged

stale bot added the stale label Jul 3, 2020

stale bot closed this Jul 11, 2020

yanavlasov reopened this Jul 20, 2020

stale bot removed the stale label Jul 20, 2020

Merge branch 'master' into exceptions-1

b91be2a

Signed-off-by: Yan Avlasov <yavlasov@google.com>

Fix tests

a2f868e

Signed-off-by: Yan Avlasov <yavlasov@google.com>

repokitteh-read-only bot removed the waiting label Jul 21, 2020

yanavlasov changed the title ~~WiP http2: remove exceptions from H/2 codec~~ http2: remove exceptions from H/2 codec Jul 21, 2020

asraa reviewed Jul 21, 2020

yanavlasov added 2 commits July 22, 2020 12:31

Address comments

19039bc

Signed-off-by: Yan Avlasov <yavlasov@google.com>

Merge branch 'master' into exceptions-1

a0a4192

Signed-off-by: Yan Avlasov <yavlasov@google.com>

alyssawilk reviewed Jul 23, 2020

yanavlasov added 2 commits July 23, 2020 21:28

Use RETURN_IF_ERROR

7ac6e33

Signed-off-by: Yan Avlasov <yavlasov@google.com>

Replace RELEASE_ASSERT with ENVOY_BUG

74bf197

Signed-off-by: Yan Avlasov <yavlasov@google.com>

mattklein123 reviewed Jul 24, 2020

yanavlasov added 2 commits July 28, 2020 12:37

Revert "Replace RELEASE_ASSERT with ENVOY_BUG"

9d26d43

This reverts commit 74bf197. Signed-off-by: Yan Avlasov <yavlasov@google.com>

Merge branch 'master' into exceptions-1

8b9e5e7

Signed-off-by: Yan Avlasov <yavlasov@google.com>

Merge branch 'master' into exceptions-1

a252cd2

Signed-off-by: Yan Avlasov <yavlasov@google.com>

alyssawilk removed their assignment Jul 30, 2020

mattklein123 approved these changes Jul 30, 2020

yanavlasov merged commit a2ebf09 into envoyproxy:master Jul 31, 2020

yanavlasov deleted the exceptions-1 branch August 19, 2020 14:05
