Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add access logging to tcp_proxy #1953

Merged
merged 15 commits into from
Nov 10, 2017

Conversation

ggreenway
Copy link
Contributor

fixes #1159

fixes envoyproxy#1159

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
…ges.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Copy link
Contributor Author

This still needs tests and doc changes, but I wanted to make sure this is roughly the way we want to implement this. @mattklein123

@ggreenway ggreenway force-pushed the tcp-access-log branch 2 times, most recently from 77e6475 to b96200a Compare October 26, 2017 22:59
@ggreenway
Copy link
Contributor Author

Also not present is moving source/common/http/access_log/* to source/common/access_log/*, and re-namespacing all of those things out of Http. But I'll do that as a separate PR so as to not mix real changes and code movement.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Copy link
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do the config changes in data plane api ?

},
"required" : ["path"],
"additionalProperties" : false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one suggestion. Can we just do this in the v2 api protos, so that it starts becoming the one place where we add stuff to configs? I just synced the whole thing and I am hoping to not diverge anymore.

We could just copy the access log stuff in filter/http/ into filter/ or something along those lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, tcp_proxy is configured via json, not via v2 APIs. Until that is converted, this is the only way to do it, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sucks! Atleast could you send a pr to data plane api to add the corresponding configuration bits? It’s fine if it’s duplicated for the moment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll do that. I wanted to get buy-in that this is the right approach. Once I have that, it should be a one-line change in the api (adding one config option to tcp_proxy, which uses the existing http-access-log type/message).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, making TCP proxy configurable via v2 is probably a relatively easy PR, comparable to #1952. The proto in data-plane-api is already there.

case Protocol::Http2:
return Http2String;
const std::string& AccessLogFormatUtils::protocolToString(Optional<Protocol> protocol) {
if (protocol.valid()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to make this a bit more generic like protocol is TCP/mongo/redis/etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was tricky because there's one filter that's using the request_info to figure out whether it is http1 or http2. I'd like to make this better, but I didn't want to have the scope of this increase too much. I mostly wanted to get the core of tcp access logging in, and then we can enable logging more and more as we go.

@@ -79,6 +82,7 @@ class TcpProxyConfig {

std::vector<Route> routes_;
const TcpProxyStats stats_;
std::vector<Http::AccessLog::InstanceSharedPtr> access_logs_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this for the access log system to collect the access logs? Or is this a handle to the access log handler? IF so, might be good to rename it to something appropriate or add a comment to that effect. On first read, it looks like its storing access log per connection, in a vector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this from http_connection_manager.h, and there's no comment there either. I think readers just need to look up the definition of the type to figure out what it is (as is the case with many things).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a handle to the access log. Feel free to rename / add comments.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks good to me. My main question is whether we should try to jam everything to the same interface. I haven't gone through all the fields in RequestInfo to have a real opinion here so would like to hear your opinion.

I agree with @rshriram that we should do the data-plane-api PR in parallel. Feel free to do that and then push a PR which uses a branch/fork there. It's often helpful to review both at the same time so we can all agree on things.

Also per previous discussion please do any namespace changes in their own PR. Thank you. :)

@@ -99,7 +99,7 @@ class RequestInfo {
/**
* @return the protocol of the request.
*/
virtual Protocol protocol() const PURE;
virtual Optional<Protocol> protocol() const PURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some of these things it's going to be awkward to have it all in 1 interface/class. WDYT about having a "BaseAccessLog" interface and then "HttpAccessLog", "TcpAccessLog", etc.? I don't have a good sense of what will happen but I do think it might be strange to try to mash it all together.

@@ -79,6 +82,7 @@ class TcpProxyConfig {

std::vector<Route> routes_;
const TcpProxyStats stats_;
std::vector<Http::AccessLog::InstanceSharedPtr> access_logs_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a handle to the access log. Feel free to rename / add comments.

@ggreenway
Copy link
Contributor Author

My main question is whether we should try to jam everything to the same interface.

That was my biggest question as I was doing this.

Pros of jamming together:

  • Maximum code reuse. We're reusing all the code for parsing the format string, building up the structure to emit the log lines (and extracting the needed data from RequestInfo), config parsing and validation, etc.
  • Could use the exact same format string for http and tcp access logs if you wanted to (with missing fields getting printed as "-")

Cons of jamming together:

  • RequestInfo is the union of all fields needed for all types of access logging, which is kind of clunky.
  • tcp access log format strings can contain fields that will never have data (like http response code). This could be overcome by having some additional validation by the instantiator of the access logs, probably a whitelist of fields that can be used.

I guess the other way to do it would be to have some downcasts in the field extractors (probably dynamic_cast for safety), but that doesn't feel like it's any better than this.

Also, if we change our minds later, there's nothing preventing us from breaking it up later.

Having looked at the code, I'm in favor of RequestInfo being the big union, and jamming it all together. It saves a large amount of code, both in the implementation, and in the tests, and in the v2 api.

@mattklein123
Copy link
Member

Also, if we change our minds later, there's nothing preventing us from breaking it up later.

Yeah I think potentially it could be factored in such a way as to have a base set of extractors, etc. and then allow derived implementations to add others, but I don't feel that strongly about it. Let's just start with this and see how it goes. If it gets too awkward we can split it up later as you say.

@ggreenway
Copy link
Contributor Author

Related API PR: envoyproxy/data-plane-api#214

},
"required" : ["path"],
"additionalProperties" : false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, making TCP proxy configurable via v2 is probably a relatively easy PR, comparable to #1952. The proto in data-plane-api is already there.

return std::to_string(std::chrono::duration_cast<std::chrono::milliseconds>(
request_info.responseReceivedDuration())
.count());
Optional<std::chrono::microseconds> duration = request_info.responseReceivedDuration();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When looking at this, I wondered about the relationship to https://github.com/envoyproxy/data-plane-api/blob/master/api/accesslog.proto and proto logging. It feels like RequestInfo is similar to EnvoyAccessLog in terms of what it represents at log generation time (although it's more stateful, and keeps track of stuff during requests).

Would it makes sense to be building up a proto suitable for sending to gRPC access logs during HTTP/TCP requests, and then do a final pass to convert it to text if needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this does make sense to see where we can consolidate. I can also take a look at this in the context of the streaming gRPC log handler work that I need to do.

@mattklein123
Copy link
Member

@ggreenway, @goaway has some concerns about a unified access log type for HTTP vs. TCP vs. whatever (Redis/Mongo/etc.) at least at the proto level. He can share his concerns here and perhaps you two can quickly chat offline. I'm happy to split the AccessLogEntry proto into a base proto as well as an HTTP proto, but now is about the last time we can easily make this change.

@goaway
Copy link
Contributor

goaway commented Nov 1, 2017

Thanks for looping me in, @mattklein123. To expand on the above summary, we've internally struggled for a while with managing datasets that have inconsistently-populated content (generally due to many special-cased fields appropriate only for a single context). It's hard to work on these datasets without having a lot of domain knowledge, and over time people make bad assumptions about what will be present and what won't, due to a lack of visibility. (Can go into more depth about this and other issues we've encountered when going down the "unified" route, if you're interested.) In general, I think one of the major advantages to structured logging is consistency, and we start lose that as we add an increasing number of "optional" fields.

My feeling at this point is that it might make more sense to have some sort of Base access log struct to capture generalizable context, and then have protocol-specific exchange structs that capture case-specific data. HTTP would be one, TCP could be another, and other special cases could expose their own useful information. These structs would then drive case-specific access logs with internally-consistent data.

@ggreenway
Copy link
Contributor Author

I think there are two intersecting issues here:

  1. Text-file logging. For this, the final format is supplied in the configuration. Users are free to only define the fields that make sense in their logging format, so making the set of available fields be the union of all logging types is harmless.

  2. Protobuf/structured logging. For this, we're defining the final output format. Having fields present that don't make sense in the context makes it harder to deal with.

I hadn't thought much about (2) because I'm most interested in (1) for my current needs. There are also 3 changes (that I'm aware of) that affect logging right now: this one, upstream logging, and defining a protobuf logging format. I agree it's important to figure out the correct format for protobuf logging right now. Do we need to defer work on this PR until that is sorted out?

@mattklein123
Copy link
Member

I don't think we need to block work on this PR, but we do need to sort out the proto situation ASAP. I think I'm in favor of splitting the access log protos into a base, HTTP, and TCP (which might just be base). Does that work for you @ggreenway ?

@ggreenway
Copy link
Contributor Author

@mattklein123 that's fine with me.

@goaway
Copy link
Contributor

goaway commented Nov 1, 2017

Sounds good to me, too - thanks @ggreenway , @mattklein123 .

Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway ggreenway changed the title WIP/RFC: Add access logging to tcp_proxy WIP: Add access logging to tcp_proxy Nov 6, 2017
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway ggreenway changed the title WIP: Add access logging to tcp_proxy Add access logging to tcp_proxy Nov 6, 2017
@ggreenway
Copy link
Contributor Author

@mattklein123 @htuch This is now ready for further review.

The only known-work from this that's missing is the bulk move and renamespacing of source/common/http/access_log, but as discussed earlier, I'll put that into it's own PR.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggreenway at a very high level this LGTM. Can we do the move change and then merge that into this? Then I can do the full review.

@@ -464,6 +464,62 @@ const std::string Json::Schema::TCP_PROXY_NETWORK_FILTER_SCHEMA(R"EOF(
{
"$schema": "http://json-schema.org/schema#",
"type" : "object",
"definitions": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer to not duplicate the schema. Can we carve this out into a new schema object and validate at a lower level?

@mattklein123
Copy link
Member

@ggreenway P.S., thanks for all the work on the docs. This is great.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Copy link
Contributor Author

@mattklein123 the "big move" is merged into this now, and schema validation is done in a common place. Please take a look.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff. This is going to be a really useful feature for us at Lyft. Just a few nits.

Total duration in milliseconds of the request from the start time to the last byte out.

TCP
Total duration of the downstream connection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unit? milliseconds?

* **DI**: The request processing was delayed for a period specified via :ref:`fault injection <config_http_filters_fault_injection>`.
* **FI**: The request was aborted with a response code specified via :ref:`fault injection <config_http_filters_fault_injection>`.
* **RL**: The request was ratelimited locally by the :ref:`HTTP rate limit filter <config_http_filters_rate_limit>` in addition to 429 response code.
Additional details about the response or connection, if any. For TCP connections, the specified response code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "For TCP connections, the specified response code does not apply" is a bit confusing. I think you mean to say something like all codes don't apply, etc.

Access logging
===================

The HTTP connection manager and tcp proxy supports extensible access logging with the following
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: x-link to filters.

@@ -118,7 +118,7 @@ class RequestInfo {
/**
* @return the protocol of the request.
*/
virtual Http::Protocol protocol() const PURE;
virtual Optional<Http::Protocol> protocol() const PURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately I think we will need this to be an enum specific to access logs if we don't split this apart later. On this topic, should this potentially print "TCP" ? (We could have it print "UDP" for example once we support UDP proxy). NBD, just a thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we'll want that eventually. I wanted to limit the scope of this PR where I could.

@@ -88,7 +88,7 @@ class RequestInfo {
* @return duration from request start to when the entire request was received from the
* downstream client in microseconds. Note: if unset, will return 0 microseconds.
*/
virtual std::chrono::microseconds requestReceivedDuration() const PURE;
virtual Optional<std::chrono::microseconds> requestReceivedDuration() const PURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use const Optional<...>& for all of the ones you changed.

@@ -53,7 +53,7 @@ class AccessLogFormatParser {
class AccessLogFormatUtils {
public:
static FormatterPtr defaultAccessLogFormatter();
static const std::string& protocolToString(Http::Protocol protocol);
static const std::string& protocolToString(Optional<Http::Protocol> protocol);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const Optional&

@@ -28,7 +28,9 @@ Http::FilterHeadersStatus Http1BridgeFilter::decodeHeaders(Http::HeaderMap& head
setupStatTracking(headers);
}

if (decoder_callbacks_->requestInfo().protocol() != Http::Protocol::Http2 && grpc_request) {
Optional<Http::Protocol> protocol = decoder_callbacks_->requestInfo().protocol();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const Optional&

class TcpProxyTest : public testing::Test {
public:
TcpProxyTest() {
ON_CALL(*factory_context_.access_log_manager_.file_, write(_))
.WillByDefault(SaveArg<0>(&access_log_data_));
reconfigureAccessLogging(std::string());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we just call below setup() and have every test call it?

@mattklein123
Copy link
Member

@ggreenway one other random thought: I think for TCP logging there is probably value in having a log line at the beginning of a connection as well as the end, since it may be common to have very long lived connections. I might throw a TODO to that effect in the TcpProxy filter destructor.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
@ggreenway
Copy link
Contributor Author

@mattklein123 I think logging at the beginning of a connection should probably be filed as a new GH issue. A few thoughts:

  1. http connections can also be long-lived, so this would probably apply there too
  2. there isn't much information to log at the beginning of the connection, especially for tcp connections. Pretty much just when, and the upstream/downstream addresses involved.

@mattklein123
Copy link
Member

@mattklein123 I think logging at the beginning of a connection should probably be filed as a new GH issue. A few thoughts:

Sure sounds good. I do think that IP address is useful for long lived. I need to see for our use case whether that becomes an issue or not.

Signed-off-by: Greg Greenway <ggreenway@apple.com>
@mattklein123 mattklein123 merged commit d668797 into envoyproxy:master Nov 10, 2017
@ggreenway ggreenway deleted the tcp-access-log branch November 10, 2017 17:17
jpsim pushed a commit that referenced this pull request Nov 28, 2022
Description: this PR uses config templating to introduce fallback DNS servers for c-ares in Android. Subsequent PR will add EngineBuilder configurability.
Risk Level: low -- the current config is a no-op.
Testing: e2e testing with the Lyft App

Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
Description: this PR uses config templating to introduce fallback DNS servers for c-ares in Android. Subsequent PR will add EngineBuilder configurability.
Risk Level: low -- the current config is a no-op.
Testing: e2e testing with the Lyft App

Signed-off-by: Jose Nino <jnino@lyft.com>
Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tcp proxy: add access log
5 participants