Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

listener: remove the peek from the listener filters #17395

Merged
merged 177 commits into from Mar 29, 2022

Conversation

soulxu
Copy link
Member

@soulxu soulxu commented Jul 18, 2021

Commit Message: listener: remove the peek from the listener filters
Additional Description:
To avoid the listener filters to peek the data from the socket directly, allow the ActiveTcpSocket to peek the data
into the buffer. The ActiveTcpSocket will get the max data expected from the filters through the
filter interface size_t maxReadBytes(). Then ActiveTcpSocket will listen on the file event on the socket.
And peek the data into the buffer. And calling the listener filter callback onData(Buffer::Instance&).

If the filter doesn't need any data from the connection, it should return 0 on the interface 'maxReadBytes()'.
The ActiveTcpSocket won't listen on the file event if there is no filter expecting any data.

The buffer will be shared across the listener filter. no duplicated peek need anymore.

The buffer is implemented by ListenerFilterBufferImpl. It will store the data peeked from the socket, and provide
the const pointer to the listener filter to access the data. It also provides the interface for listener filter to drain the data,
and the data will be drained on the socket in the end.

Risk Level: high
Testing: unit tests and integration tests
Docs Changes: add document for the new stats
Release Notes: add release note for new stats
Fixes: #17229

Signed-off-by: He Jie Xu hejie.xu@intel.com

@soulxu
Copy link
Member Author

soulxu commented Jul 18, 2021

cc @mattklein123 @ggreenway @antoniovicente @lambdai

I created this PoC for showing the interface design, so I didn't work out the unittest/functional test since I thought there will be some discussion on the interface design. But if you guys think the test should be work out, I will work on it.

@soulxu
Copy link
Member Author

soulxu commented Jul 18, 2021

/wait

@ggreenway
Copy link
Contributor

I like the direction this is going. I'd like to take it even further, and do a read, not a peek. Then the listener filter can drain data from the buffer if it wants to consume the data (proxy proto listener filter needs to do this), or just look at the contents of the buffer if it only wants to inspect. At the end of the listener filters, pass the buffer (probably non-empty) along with already-read data.

If we went that route, then callback name just becomes onData() like in a network filter.

@ggreenway ggreenway self-assigned this Jul 20, 2021
@soulxu
Copy link
Member Author

soulxu commented Jul 20, 2021

I like the direction this is going. I'd like to take it even further, and do a read, not a peek. Then the listener filter can drain data from the buffer if it wants to consume the data (proxy proto listener filter needs to do this), or just look at the contents of the buffer if it only wants to inspect. At the end of the listener filters, pass the buffer (probably non-empty) along with already-read data.

If we went that route, then callback name just becomes onData() like in a network filter.

thanks for your review! l will deep into the way of read instead of peek

@lambdai
Copy link
Contributor

lambdai commented Jul 21, 2021

I like the direction this is going. I'd like to take it even further, and do a read, not a peek. Then the listener filter can drain data from the buffer if it wants to consume the data (proxy proto listener filter needs to do this), or just look at the contents of the buffer if it only wants to inspect. At the end of the listener filters, pass the buffer (probably non-empty) along with already-read data.

If we went that route, then callback name just becomes onData() like in a network filter.

+1

The current duplicate peek includes independent peek among listener filters which you are addressing.
Another potential huge part is the duplicated peek attempt within each listener filter.
The latter one even introduce the risk of 1 extra byte per peek. Not an vulnerability yet but it's can be mitigated with the buffer shared by listener filter attempt.

@soulxu
Copy link
Member Author

soulxu commented Aug 6, 2021

@ggreenway @lambdai I update the PoC, using read instead of peek. But it is the same as previous the buffer kept in the ActiveTcpSocket. In the end, the buffer will transfer to the ConnectionImpl.

Copy link
Contributor

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach a lot! This will make the listener filter interface so much nicer.

The biggest thing left to figure out is getting the data that is read for listener filters to go as input to the Transport Socket, not directly into the ConnectionImpl read buffer.

/wait

envoy/event/dispatcher.h Outdated Show resolved Hide resolved
envoy/network/filter.h Outdated Show resolved Hide resolved
source/common/event/dispatcher_impl.h Outdated Show resolved Hide resolved
source/common/network/connection_impl.cc Outdated Show resolved Hide resolved
source/server/active_tcp_socket.cc Outdated Show resolved Hide resolved
@soulxu
Copy link
Member Author

soulxu commented Aug 6, 2021

I like this approach a lot! This will make the listener filter interface so much nicer.

The biggest thing left to figure out is getting the data that is read for listener filters to go as input to the Transport Socket, not directly into the ConnectionImpl read buffer.

/wait

Thanks for the feedback! I will dig into the transport socket.

@soulxu
Copy link
Member Author

soulxu commented Aug 13, 2021

I like this approach a lot! This will make the listener filter interface so much nicer.

The biggest thing left to figure out is getting the data that is read for listener filters to go as input to the Transport Socket, not directly into the ConnectionImpl read buffer.

I update the PoC. I found we have a custom BIO for the SSL socket. And I think all the transport sockets are using ioHandle to read the data.

BIO* bio = BIO_new_io_handle(&callbacks_->ioHandle());
SSL_set_bio(rawSsl(), bio, bio);

So I inject the data into the ioHandle. When read the data from ioHandle, it will return the injected data first, then ready the data from the real socket.

Then the data read by the listener filter will be as an input to all the transport socket.

@soulxu
Copy link
Member Author

soulxu commented Aug 17, 2021

@mattklein123 @ggreenway I find out a way to move the inspect data into transport, looking for your feedback when you have time, thanks in advance!

Copy link
Contributor

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay on this; I've been thinking through these changes.

I think putting the data into the IoHandle makes sense, because then none of the transport sockets need to change. But I'm not thrilled with any IoHandle being able to have data injected to it at any time. Maybe just documenting the usage and adding ASSERTs is good enough. Another idea (not sure if this would work or not) is to have an IoHandle that wraps the real IoHandle, stores the listener-filter data, returns that data from read operations, and passes all other interface calls to the real/wrapped IoHandle (including reads when the buffer is empty). I'm not sure if that will actually be an improvement or not, but it would at least mean that all of these changes aren't present for outgoing/upstream connections.

@mattklein123 what are your thoughts on these interfaces?

There are some other comments I'll have regarding the implementation, but I want to settle the interfaces somewhat before spending time on that.

/wait-any

envoy/network/connection.h Outdated Show resolved Hide resolved
@mattklein123
Copy link
Member

mattklein123 commented Aug 19, 2021

Thanks for working on this. I think this will be a great improvement. Here are my high level thoughts, riffing a bit on some of @ggreenway ideas. I think the way I would think about this at a high level would look something like:

  1. I think I would modify the virtual ConnectionSocket& socket() PURE; ListenerFilterCallbacks function to return something other than ConnectionSocket, which is really the root of the problem (we are giving listener filters way too much power). I think I would create a new class called something like ListenerFilterSocket with a very limited set of functions to access the underlying socket/connection.
  2. I like your idea around having each listener filter say how much data it needs to look at, but I would go a step further and have each listener filter say whether it only needs to peek the data, or if it needs to modify it. Most listener filters only need to peek the data. In these cases, you don't actually have to read the data out of the socket and do the injection dance. You can just do the peeking outside the listener filter iteration and provide a const buffer of the peeked data up to the limit, much like each filter is doing today.
  3. In the cases where the listener filter asks for data to modify up to a limit, we would read out into a buffer, but I think we can do this entirely outside of the IoHandle() interface, and then provide this data to listener filters in their read functions (potentially both a const and non-const version depending on what they ask for).

It's possible that we can greatly simplify this if we really only have 2 cases which are:

  1. Peeking only.
  2. Removal only (proxy proto)

Do we have any cases of straight up modification? If not I think we can punt this and at that point we don't need to inject at all. We either peek only read and remove only.

If we do have to modify I think we could then have the logic to send the modified data + potential peeked data to further filters, and then finally inject any extra data directly into the connection when it is created.

^ is not fully formed but this is how I would think about it personally. @ggreenway wdyt?

@ggreenway
Copy link
Contributor

I'm not sure we can make removing data work without passing some read data to the next filters, because we need to read the data to know exactly how many bytes we need to consume.

But I suppose a hybrid of these approaches is to always pass data to the listener filter via peek, but give the listener filter a callback to drain some number of bytes from the connection. The implementation of that would be a read() call, but discarding the data (since it has already been inspected).

I've never really liked using PEEK because if the data arrives in multiple segments, we have to re-read data multiple times. For something like a large TLS clienthello, sent pathologically (1 byte at a time), this can add up to a substantial amount of wasted effort. But it does avoid having to muck with IoHandle at all, which is nice. I think the most important thing at this point is to get the interface to the ListenerFilter correct; the interface shouldn't depend on our underlying implementation of how we read from sockets.

@soulxu
Copy link
Member Author

soulxu commented Aug 20, 2021

I think putting the data into the IoHandle makes sense, because then none of the transport sockets need to change. But I'm not thrilled with any IoHandle being able to have data injected to it at any time. Maybe just documenting the usage and adding ASSERTs is good enough. Another idea (not sure if this would work or not) is to have an IoHandle that wraps the real IoHandle, stores the listener-filter data, returns that data from read operations, and passes all other interface calls to the real/wrapped IoHandle (including reads when the buffer is empty). I'm not sure if that will actually be an improvement or not, but it would at least mean that all of these changes aren't present for outgoing/upstream connections.

Yes, totally agree with you. It is bad we have injection but we don't use it for some place. It is a great idea to have wrapped IoHandle for listener. If we are going to the road to use read, I will try the way you suggested.

@soulxu
Copy link
Member Author

soulxu commented Aug 20, 2021

  1. I think I would modify the virtual ConnectionSocket& socket() PURE; ListenerFilterCallbacks function to return something other than ConnectionSocket, which is really the root of the problem (we are giving listener filters way too much power). I think I would create a new class called something like ListenerFilterSocket with a very limited set of functions to access the underlying socket/connection.

+1, totally agree that we shouldn't expose the underlayer to the listner filter

  1. I like your idea around having each listener filter say how much data it needs to look at, but I would go a step further and have each listener filter say whether it only needs to peek the data, or if it needs to modify it. Most listener filters only need to peek the data. In these cases, you don't actually have to read the data out of the socket and do the injection dance. You can just do the peeking outside the listener filter iteration and provide a const buffer of the peeked data up to the limit, much like each filter is doing today.
  2. In the cases where the listener filter asks for data to modify up to a limit, we would read out into a buffer, but I think we can do this entirely outside of the IoHandle() interface, and then provide this data to listener filters in their read functions (potentially both a const and non-const version depending on what they ask for).

If the listener filter ask for data modifying, then we still need the data injection dance, right?

It's possible that we can greatly simplify this if we really only have 2 cases which are:

  1. Peeking only.
  2. Removal only (proxy proto)

Do we have any cases of straight up modification? If not I think we can punt this and at that point we don't need to inject at all. We either peek only read and remove only.

@lambdai mentioned the data modification in the issue, maybe he has any usecase.

If no data modification usecase, I guess you mean it back to the initial version of this PoC, peek the data, and an interface for listener to tell how much data they want to remove from the socket. Then using the read to drain the data from the socket.

I kind of like the current interface onData(Buffer::Instance), since it feels simple and clear for the listener filter writer. We needn't another interface for how much the data we want to remove. The filter only operates on the Buffer::Instance, then that is what will get for network filters, I feel it is straight forward for the listener filter developer.

If we do have to modify I think we could then have the logic to send the modified data + potential peeked data to further filters, and then finally inject any extra data directly into the connection when it is created.

emm...so we still need the inject data for modifying. I'm not sure about the benefit of hybrid the peek and read here, it feels like a more complex interface and implementation. maybe I missed some points here.

@soulxu
Copy link
Member Author

soulxu commented Mar 25, 2022

@soulxu, @ggreenway is out this week. We will get this merged next week. Thank you for your patience.

got it, thanks!

@adisuissa
Copy link
Contributor

@ggreenway @yanavlasov gentle ping :)

@soulxu DCO seems to have some issue, can you PTAL?

@soulxu
Copy link
Member Author

soulxu commented Mar 29, 2022

Seems stuck at somewhere

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Check envoy-presubmit didn't fail.

🐱

Caused by: a #17395 (comment) was created by @soulxu.

see: more, trace.

@ggreenway
Copy link
Contributor

I don't think you can re-trigger DCO-bot. Can you push an empty commit, or merge main and push, to re-trigger DCO-bot?

Copy link
Contributor

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks for the all the work you've put into this!

@mattklein123
Copy link
Member

I can just force merge as all the commits have sign off. Thank you!

@mattklein123 mattklein123 merged commit 93b0a4e into envoyproxy:main Mar 29, 2022
@soulxu
Copy link
Member Author

soulxu commented Mar 30, 2022

This looks great! Thanks for the all the work you've put into this!

@ggreenway @mattklein123 thanks for the guidance again!

@moderation
Copy link
Contributor

I thought it might be due to my upgrade to Bazel 5.1.0 but I'm now getting a MacOS M1 aarch compilation error on a file changed by this PR

ERROR: /Users/moderation/Applications/envoyproxy/envoy/source/extensions/filters/listener/tls_inspector/BUILD:15:17: Compiling source/extensions/filters/listener/tls_inspector/tls_inspector.cc failed: (Aborted): wrapped_clang_pp failed: error executing command external/local_config_cc/wrapped_clang_pp '-D_FORTIFY_SOURCE=1' -fstack-protector -fcolor-diagnostics -Wall -Wthread-safety -Wself-assign -fno-omit-frame-pointer -g0 -O2 -DNDEBUG ... (remaining 141 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox
In file included from source/extensions/filters/listener/tls_inspector/tls_inspector.cc:1:
./source/extensions/filters/listener/tls_inspector/tls_inspector.h:83:12: error: virtual function 'maxReadBytes' has a different return type ('uint64_t' (aka 'unsigned long long')) than the function it overrides (which has return type 'size_t' (aka 'unsigned long'))
  uint64_t maxReadBytes() const override { return config_->maxClientHelloSize(); }
  ~~~~~~~~ ^
./envoy/network/filter.h:347:18: note: overridden virtual function is here
  virtual size_t maxReadBytes() const PURE;
          ~~~~~~ ^
1 error generated.
Error in child process '/usr/bin/xcrun'. 1
Target //source/exe:envoy-static.stripped failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 407.055s, Critical Path: 50.81s
INFO: 4027 processes: 1165 internal, 2862 darwin-sandbox.
FAILED: Build did NOT complete successfully

@soulxu
Copy link
Member Author

soulxu commented Mar 30, 2022

source/extensions/filters/listener/tls_inspector/tls_inspector.cc

Yea, it should be the size_t, let me fix it. Thanks for pointing this out!

@kyessenov
Copy link
Contributor

I suspect this is causing issues with Istio import. The following minimal bootstrap is now not performing TCP proxying:

node:
  id: test
static_resources:
  listeners:
    name: client
    traffic_direction: OUTBOUND
    address:
      socket_address:
        address: 127.0.0.1
        port_value: 9090
    listener_filters:
    - name: "envoy.filters.listener.tls_inspector"
    - name: "envoy.filters.listener.http_inspector"
    filter_chains:
    - filters:
      - name: tcp_proxy
        typed_config:
          "@type": type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          value:
            stat_prefix: outbound
            cluster: outbound
  clusters:
  - name: outbound
    type: STATIC
    load_assignment:
      cluster_name: outbound
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 9091

Every connection is terminated. Seems like some bug got introduced into HTTP inspector.

@kyessenov kyessenov mentioned this pull request Apr 7, 2022
@mattklein123
Copy link
Member

@kyessenov not surprised there would be a regression here. Feel free to revert until we sort it out. Thank you. cc @soulxu

@kyessenov
Copy link
Contributor

I looked at the PR, and it didn't change cb.socket().close() statement after unsuccessful parse so I'm puzzled how this worked before. Will try to compare more.

@ggreenway
Copy link
Contributor

Given this is something that istio depends on, can you create an integration test that captures the failure? Both to help diagnose this issue, and to prevent future regressions. It may be a failure when there are two listener filters; i don't think we have much coverage for that situation at all.

@kyessenov
Copy link
Contributor

I can confirm it's this PR for the test case above. It's just one filter I think - before the socket close wasn't doing anything somehow.

@kyessenov
Copy link
Contributor

I think I understand what happened - prior version was signaling ParseError on read errors. The refactor conflated the actual parse error and continued closing the socket.

@mattklein123
Copy link
Member

OK if you want to fix forward we can do that or revert for now. Up to you.

@kyessenov
Copy link
Contributor

Although this looks like a one liner fix to remove socket.close() and return Continue, let's revert, and an integration test for HTTP (which is missing) and then validate again.

@soulxu
Copy link
Member Author

soulxu commented Apr 7, 2022

@kyessenov Thanks for diagnosing this, let me fix it.

soulxu added a commit to soulxu/envoy that referenced this pull request Apr 8, 2022
Signed-off-by: He Jie Xu <hejie.xu@intel.com>
// As per discussion in https://github.com/envoyproxy/envoy/issues/7864
// we don't add new enum in FilterStatus so we have to signal the caller
// the new condition.
cb.socket().close();
cb_->socket().close();
return Network::FilterStatus::StopIteration;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be continue, and not close the socket.

EXPECT_CALL(socket_, close());
file_event_callback_(Event::FileReadyType::Read);
auto status = filter_->onData(*buffer_);
EXPECT_EQ(status, Network::FilterStatus::StopIteration);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong, I changed the original test behavior.

ravenblackx pushed a commit to ravenblackx/envoy that referenced this pull request Jun 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet