-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add access logging to tcp_proxy #1953
Conversation
fixes envoyproxy#1159 Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
…ges. Signed-off-by: Greg Greenway <ggreenway@apple.com>
This still needs tests and doc changes, but I wanted to make sure this is roughly the way we want to implement this. @mattklein123 |
77e6475
to
b96200a
Compare
Also not present is moving source/common/http/access_log/* to source/common/access_log/*, and re-namespacing all of those things out of Http. But I'll do that as a separate PR so as to not mix real changes and code movement. |
Signed-off-by: Greg Greenway <ggreenway@apple.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we do the config changes in data plane api ?
source/common/json/config_schemas.cc
Outdated
}, | ||
"required" : ["path"], | ||
"additionalProperties" : false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one suggestion. Can we just do this in the v2 api protos, so that it starts becoming the one place where we add stuff to configs? I just synced the whole thing and I am hoping to not diverge anymore.
We could just copy the access log stuff in filter/http/ into filter/ or something along those lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, tcp_proxy is configured via json, not via v2 APIs. Until that is converted, this is the only way to do it, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sucks! Atleast could you send a pr to data plane api to add the corresponding configuration bits? It’s fine if it’s duplicated for the moment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll do that. I wanted to get buy-in that this is the right approach. Once I have that, it should be a one-line change in the api (adding one config option to tcp_proxy, which uses the existing http-access-log type/message).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, making TCP proxy configurable via v2 is probably a relatively easy PR, comparable to #1952. The proto in data-plane-api
is already there.
case Protocol::Http2: | ||
return Http2String; | ||
const std::string& AccessLogFormatUtils::protocolToString(Optional<Protocol> protocol) { | ||
if (protocol.valid()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to make this a bit more generic like protocol is TCP/mongo/redis/etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was tricky because there's one filter that's using the request_info to figure out whether it is http1 or http2. I'd like to make this better, but I didn't want to have the scope of this increase too much. I mostly wanted to get the core of tcp access logging in, and then we can enable logging more and more as we go.
source/common/filter/tcp_proxy.h
Outdated
@@ -79,6 +82,7 @@ class TcpProxyConfig { | |||
|
|||
std::vector<Route> routes_; | |||
const TcpProxyStats stats_; | |||
std::vector<Http::AccessLog::InstanceSharedPtr> access_logs_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this for the access log system to collect the access logs? Or is this a handle to the access log handler? IF so, might be good to rename it to something appropriate or add a comment to that effect. On first read, it looks like its storing access log per connection, in a vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied this from http_connection_manager.h, and there's no comment there either. I think readers just need to look up the definition of the type to figure out what it is (as is the case with many things).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a handle to the access log. Feel free to rename / add comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks good to me. My main question is whether we should try to jam everything to the same interface. I haven't gone through all the fields in RequestInfo to have a real opinion here so would like to hear your opinion.
I agree with @rshriram that we should do the data-plane-api PR in parallel. Feel free to do that and then push a PR which uses a branch/fork there. It's often helpful to review both at the same time so we can all agree on things.
Also per previous discussion please do any namespace changes in their own PR. Thank you. :)
include/envoy/http/access_log.h
Outdated
@@ -99,7 +99,7 @@ class RequestInfo { | |||
/** | |||
* @return the protocol of the request. | |||
*/ | |||
virtual Protocol protocol() const PURE; | |||
virtual Optional<Protocol> protocol() const PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some of these things it's going to be awkward to have it all in 1 interface/class. WDYT about having a "BaseAccessLog" interface and then "HttpAccessLog", "TcpAccessLog", etc.? I don't have a good sense of what will happen but I do think it might be strange to try to mash it all together.
source/common/filter/tcp_proxy.h
Outdated
@@ -79,6 +82,7 @@ class TcpProxyConfig { | |||
|
|||
std::vector<Route> routes_; | |||
const TcpProxyStats stats_; | |||
std::vector<Http::AccessLog::InstanceSharedPtr> access_logs_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a handle to the access log. Feel free to rename / add comments.
That was my biggest question as I was doing this. Pros of jamming together:
Cons of jamming together:
I guess the other way to do it would be to have some downcasts in the field extractors (probably dynamic_cast for safety), but that doesn't feel like it's any better than this. Also, if we change our minds later, there's nothing preventing us from breaking it up later. Having looked at the code, I'm in favor of RequestInfo being the big union, and jamming it all together. It saves a large amount of code, both in the implementation, and in the tests, and in the v2 api. |
Yeah I think potentially it could be factored in such a way as to have a base set of extractors, etc. and then allow derived implementations to add others, but I don't feel that strongly about it. Let's just start with this and see how it goes. If it gets too awkward we can split it up later as you say. |
Related API PR: envoyproxy/data-plane-api#214 |
source/common/json/config_schemas.cc
Outdated
}, | ||
"required" : ["path"], | ||
"additionalProperties" : false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, making TCP proxy configurable via v2 is probably a relatively easy PR, comparable to #1952. The proto in data-plane-api
is already there.
return std::to_string(std::chrono::duration_cast<std::chrono::milliseconds>( | ||
request_info.responseReceivedDuration()) | ||
.count()); | ||
Optional<std::chrono::microseconds> duration = request_info.responseReceivedDuration(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When looking at this, I wondered about the relationship to https://github.com/envoyproxy/data-plane-api/blob/master/api/accesslog.proto and proto logging. It feels like RequestInfo
is similar to EnvoyAccessLog
in terms of what it represents at log generation time (although it's more stateful, and keeps track of stuff during requests).
Would it makes sense to be building up a proto suitable for sending to gRPC access logs during HTTP/TCP requests, and then do a final pass to convert it to text if needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this does make sense to see where we can consolidate. I can also take a look at this in the context of the streaming gRPC log handler work that I need to do.
@ggreenway, @goaway has some concerns about a unified access log type for HTTP vs. TCP vs. whatever (Redis/Mongo/etc.) at least at the proto level. He can share his concerns here and perhaps you two can quickly chat offline. I'm happy to split the AccessLogEntry proto into a base proto as well as an HTTP proto, but now is about the last time we can easily make this change. |
Thanks for looping me in, @mattklein123. To expand on the above summary, we've internally struggled for a while with managing datasets that have inconsistently-populated content (generally due to many special-cased fields appropriate only for a single context). It's hard to work on these datasets without having a lot of domain knowledge, and over time people make bad assumptions about what will be present and what won't, due to a lack of visibility. (Can go into more depth about this and other issues we've encountered when going down the "unified" route, if you're interested.) In general, I think one of the major advantages to structured logging is consistency, and we start lose that as we add an increasing number of "optional" fields. My feeling at this point is that it might make more sense to have some sort of Base access log struct to capture generalizable context, and then have protocol-specific exchange structs that capture case-specific data. HTTP would be one, TCP could be another, and other special cases could expose their own useful information. These structs would then drive case-specific access logs with internally-consistent data. |
I think there are two intersecting issues here:
I hadn't thought much about (2) because I'm most interested in (1) for my current needs. There are also 3 changes (that I'm aware of) that affect logging right now: this one, upstream logging, and defining a protobuf logging format. I agree it's important to figure out the correct format for protobuf logging right now. Do we need to defer work on this PR until that is sorted out? |
I don't think we need to block work on this PR, but we do need to sort out the proto situation ASAP. I think I'm in favor of splitting the access log protos into a base, HTTP, and TCP (which might just be base). Does that work for you @ggreenway ? |
@mattklein123 that's fine with me. |
Sounds good to me, too - thanks @ggreenway , @mattklein123 . |
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@mattklein123 @htuch This is now ready for further review. The only known-work from this that's missing is the bulk move and renamespacing of source/common/http/access_log, but as discussed earlier, I'll put that into it's own PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ggreenway at a very high level this LGTM. Can we do the move change and then merge that into this? Then I can do the full review.
source/common/json/config_schemas.cc
Outdated
@@ -464,6 +464,62 @@ const std::string Json::Schema::TCP_PROXY_NETWORK_FILTER_SCHEMA(R"EOF( | |||
{ | |||
"$schema": "http://json-schema.org/schema#", | |||
"type" : "object", | |||
"definitions": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer to not duplicate the schema. Can we carve this out into a new schema object and validate at a lower level?
@ggreenway P.S., thanks for all the work on the docs. This is great. |
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@mattklein123 the "big move" is merged into this now, and schema validation is done in a common place. Please take a look. |
Signed-off-by: Greg Greenway <ggreenway@apple.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff. This is going to be a really useful feature for us at Lyft. Just a few nits.
docs/configuration/access_log.rst
Outdated
Total duration in milliseconds of the request from the start time to the last byte out. | ||
|
||
TCP | ||
Total duration of the downstream connection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unit? milliseconds?
docs/configuration/access_log.rst
Outdated
* **DI**: The request processing was delayed for a period specified via :ref:`fault injection <config_http_filters_fault_injection>`. | ||
* **FI**: The request was aborted with a response code specified via :ref:`fault injection <config_http_filters_fault_injection>`. | ||
* **RL**: The request was ratelimited locally by the :ref:`HTTP rate limit filter <config_http_filters_rate_limit>` in addition to 429 response code. | ||
Additional details about the response or connection, if any. For TCP connections, the specified response code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "For TCP connections, the specified response code does not apply" is a bit confusing. I think you mean to say something like all codes don't apply, etc.
Access logging | ||
=================== | ||
|
||
The HTTP connection manager and tcp proxy supports extensible access logging with the following |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: x-link to filters.
@@ -118,7 +118,7 @@ class RequestInfo { | |||
/** | |||
* @return the protocol of the request. | |||
*/ | |||
virtual Http::Protocol protocol() const PURE; | |||
virtual Optional<Http::Protocol> protocol() const PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultimately I think we will need this to be an enum specific to access logs if we don't split this apart later. On this topic, should this potentially print "TCP" ? (We could have it print "UDP" for example once we support UDP proxy). NBD, just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we'll want that eventually. I wanted to limit the scope of this PR where I could.
@@ -88,7 +88,7 @@ class RequestInfo { | |||
* @return duration from request start to when the entire request was received from the | |||
* downstream client in microseconds. Note: if unset, will return 0 microseconds. | |||
*/ | |||
virtual std::chrono::microseconds requestReceivedDuration() const PURE; | |||
virtual Optional<std::chrono::microseconds> requestReceivedDuration() const PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use const Optional<...>&
for all of the ones you changed.
@@ -53,7 +53,7 @@ class AccessLogFormatParser { | |||
class AccessLogFormatUtils { | |||
public: | |||
static FormatterPtr defaultAccessLogFormatter(); | |||
static const std::string& protocolToString(Http::Protocol protocol); | |||
static const std::string& protocolToString(Optional<Http::Protocol> protocol); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const Optional&
@@ -28,7 +28,9 @@ Http::FilterHeadersStatus Http1BridgeFilter::decodeHeaders(Http::HeaderMap& head | |||
setupStatTracking(headers); | |||
} | |||
|
|||
if (decoder_callbacks_->requestInfo().protocol() != Http::Protocol::Http2 && grpc_request) { | |||
Optional<Http::Protocol> protocol = decoder_callbacks_->requestInfo().protocol(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const Optional&
test/common/filter/tcp_proxy_test.cc
Outdated
class TcpProxyTest : public testing::Test { | ||
public: | ||
TcpProxyTest() { | ||
ON_CALL(*factory_context_.access_log_manager_.file_, write(_)) | ||
.WillByDefault(SaveArg<0>(&access_log_data_)); | ||
reconfigureAccessLogging(std::string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we just call below setup() and have every test call it?
@ggreenway one other random thought: I think for TCP logging there is probably value in having a log line at the beginning of a connection as well as the end, since it may be common to have very long lived connections. I might throw a TODO to that effect in the TcpProxy filter destructor. |
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@mattklein123 I think logging at the beginning of a connection should probably be filed as a new GH issue. A few thoughts:
|
Sure sounds good. I do think that IP address is useful for long lived. I need to see for our use case whether that becomes an issue or not. |
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Description: this PR uses config templating to introduce fallback DNS servers for c-ares in Android. Subsequent PR will add EngineBuilder configurability. Risk Level: low -- the current config is a no-op. Testing: e2e testing with the Lyft App Signed-off-by: Jose Nino <jnino@lyft.com> Signed-off-by: JP Simard <jp@jpsim.com>
Description: this PR uses config templating to introduce fallback DNS servers for c-ares in Android. Subsequent PR will add EngineBuilder configurability. Risk Level: low -- the current config is a no-op. Testing: e2e testing with the Lyft App Signed-off-by: Jose Nino <jnino@lyft.com> Signed-off-by: JP Simard <jp@jpsim.com>
fixes #1159