Add access logging to tcp_proxy #1953

ggreenway · 2017-10-26T22:57:06Z

fixes envoyproxy#1159 Signed-off-by: Greg Greenway <ggreenway@apple.com>

Signed-off-by: Greg Greenway <ggreenway@apple.com>

…ges. Signed-off-by: Greg Greenway <ggreenway@apple.com>

ggreenway · 2017-10-26T22:57:36Z

This still needs tests and doc changes, but I wanted to make sure this is roughly the way we want to implement this. @mattklein123

ggreenway · 2017-10-26T23:49:11Z

Also not present is moving source/common/http/access_log/* to source/common/access_log/*, and re-namespacing all of those things out of Http. But I'll do that as a separate PR so as to not mix real changes and code movement.

Signed-off-by: Greg Greenway <ggreenway@apple.com>

rshriram

can we do the config changes in data plane api ?

rshriram · 2017-10-27T15:44:35Z

source/common/json/config_schemas.cc

+            },
+            "required" : ["path"],
+            "additionalProperties" : false
+          }


one suggestion. Can we just do this in the v2 api protos, so that it starts becoming the one place where we add stuff to configs? I just synced the whole thing and I am hoping to not diverge anymore.

We could just copy the access log stuff in filter/http/ into filter/ or something along those lines.

Right now, tcp_proxy is configured via json, not via v2 APIs. Until that is converted, this is the only way to do it, I think.

That sucks! Atleast could you send a pr to data plane api to add the corresponding configuration bits? It’s fine if it’s duplicated for the moment

Yes, I'll do that. I wanted to get buy-in that this is the right approach. Once I have that, it should be a one-line change in the api (adding one config option to tcp_proxy, which uses the existing http-access-log type/message).

FWIW, making TCP proxy configurable via v2 is probably a relatively easy PR, comparable to #1952. The proto in data-plane-api is already there.

rshriram · 2017-10-27T15:51:15Z

source/common/http/access_log/access_log_formatter.cc

-  case Protocol::Http2:
-    return Http2String;
+const std::string& AccessLogFormatUtils::protocolToString(Optional<Protocol> protocol) {
+  if (protocol.valid()) {


is it possible to make this a bit more generic like protocol is TCP/mongo/redis/etc?

This was tricky because there's one filter that's using the request_info to figure out whether it is http1 or http2. I'd like to make this better, but I didn't want to have the scope of this increase too much. I mostly wanted to get the core of tcp access logging in, and then we can enable logging more and more as we go.

rshriram · 2017-10-27T15:56:07Z

source/common/filter/tcp_proxy.h

@@ -79,6 +82,7 @@ class TcpProxyConfig {

  std::vector<Route> routes_;
  const TcpProxyStats stats_;
+  std::vector<Http::AccessLog::InstanceSharedPtr> access_logs_;


is this for the access log system to collect the access logs? Or is this a handle to the access log handler? IF so, might be good to rename it to something appropriate or add a comment to that effect. On first read, it looks like its storing access log per connection, in a vector.

I copied this from http_connection_manager.h, and there's no comment there either. I think readers just need to look up the definition of the type to figure out what it is (as is the case with many things).

It's a handle to the access log. Feel free to rename / add comments.

mattklein123

In general looks good to me. My main question is whether we should try to jam everything to the same interface. I haven't gone through all the fields in RequestInfo to have a real opinion here so would like to hear your opinion.

I agree with @rshriram that we should do the data-plane-api PR in parallel. Feel free to do that and then push a PR which uses a branch/fork there. It's often helpful to review both at the same time so we can all agree on things.

Also per previous discussion please do any namespace changes in their own PR. Thank you. :)

mattklein123 · 2017-10-27T21:25:31Z

include/envoy/http/access_log.h

@@ -99,7 +99,7 @@ class RequestInfo {
  /**
   * @return the protocol of the request.
   */
-  virtual Protocol protocol() const PURE;
+  virtual Optional<Protocol> protocol() const PURE;


For some of these things it's going to be awkward to have it all in 1 interface/class. WDYT about having a "BaseAccessLog" interface and then "HttpAccessLog", "TcpAccessLog", etc.? I don't have a good sense of what will happen but I do think it might be strange to try to mash it all together.

mattklein123 · 2017-10-27T21:27:40Z

source/common/filter/tcp_proxy.h

@@ -79,6 +82,7 @@ class TcpProxyConfig {

  std::vector<Route> routes_;
  const TcpProxyStats stats_;
+  std::vector<Http::AccessLog::InstanceSharedPtr> access_logs_;


It's a handle to the access log. Feel free to rename / add comments.

ggreenway · 2017-10-27T21:50:38Z

My main question is whether we should try to jam everything to the same interface.

That was my biggest question as I was doing this.

Pros of jamming together:

Maximum code reuse. We're reusing all the code for parsing the format string, building up the structure to emit the log lines (and extracting the needed data from RequestInfo), config parsing and validation, etc.
Could use the exact same format string for http and tcp access logs if you wanted to (with missing fields getting printed as "-")

Cons of jamming together:

RequestInfo is the union of all fields needed for all types of access logging, which is kind of clunky.
tcp access log format strings can contain fields that will never have data (like http response code). This could be overcome by having some additional validation by the instantiator of the access logs, probably a whitelist of fields that can be used.

I guess the other way to do it would be to have some downcasts in the field extractors (probably dynamic_cast for safety), but that doesn't feel like it's any better than this.

Also, if we change our minds later, there's nothing preventing us from breaking it up later.

Having looked at the code, I'm in favor of RequestInfo being the big union, and jamming it all together. It saves a large amount of code, both in the implementation, and in the tests, and in the v2 api.

mattklein123 · 2017-10-27T22:53:11Z

Also, if we change our minds later, there's nothing preventing us from breaking it up later.

Yeah I think potentially it could be factored in such a way as to have a base set of extractors, etc. and then allow derived implementations to add others, but I don't feel that strongly about it. Let's just start with this and see how it goes. If it gets too awkward we can split it up later as you say.

ggreenway · 2017-10-27T23:20:55Z

Related API PR: envoyproxy/data-plane-api#214

htuch · 2017-10-29T18:17:58Z

source/common/json/config_schemas.cc

+            },
+            "required" : ["path"],
+            "additionalProperties" : false
+          }


FWIW, making TCP proxy configurable via v2 is probably a relatively easy PR, comparable to #1952. The proto in data-plane-api is already there.

htuch · 2017-10-29T18:23:06Z

source/common/http/access_log/access_log_formatter.cc

-      return std::to_string(std::chrono::duration_cast<std::chrono::milliseconds>(
-                                request_info.responseReceivedDuration())
-                                .count());
+      Optional<std::chrono::microseconds> duration = request_info.responseReceivedDuration();


When looking at this, I wondered about the relationship to https://github.com/envoyproxy/data-plane-api/blob/master/api/accesslog.proto and proto logging. It feels like RequestInfo is similar to EnvoyAccessLog in terms of what it represents at log generation time (although it's more stateful, and keeps track of stuff during requests).

Would it makes sense to be building up a proto suitable for sending to gRPC access logs during HTTP/TCP requests, and then do a final pass to convert it to text if needed?

I think this does make sense to see where we can consolidate. I can also take a look at this in the context of the streaming gRPC log handler work that I need to do.

mattklein123 · 2017-11-01T18:23:40Z

@ggreenway, @goaway has some concerns about a unified access log type for HTTP vs. TCP vs. whatever (Redis/Mongo/etc.) at least at the proto level. He can share his concerns here and perhaps you two can quickly chat offline. I'm happy to split the AccessLogEntry proto into a base proto as well as an HTTP proto, but now is about the last time we can easily make this change.

goaway · 2017-11-01T19:03:22Z

Thanks for looping me in, @mattklein123. To expand on the above summary, we've internally struggled for a while with managing datasets that have inconsistently-populated content (generally due to many special-cased fields appropriate only for a single context). It's hard to work on these datasets without having a lot of domain knowledge, and over time people make bad assumptions about what will be present and what won't, due to a lack of visibility. (Can go into more depth about this and other issues we've encountered when going down the "unified" route, if you're interested.) In general, I think one of the major advantages to structured logging is consistency, and we start lose that as we add an increasing number of "optional" fields.

My feeling at this point is that it might make more sense to have some sort of Base access log struct to capture generalizable context, and then have protocol-specific exchange structs that capture case-specific data. HTTP would be one, TCP could be another, and other special cases could expose their own useful information. These structs would then drive case-specific access logs with internally-consistent data.

ggreenway · 2017-11-01T19:14:32Z

I think there are two intersecting issues here:

Text-file logging. For this, the final format is supplied in the configuration. Users are free to only define the fields that make sense in their logging format, so making the set of available fields be the union of all logging types is harmless.
Protobuf/structured logging. For this, we're defining the final output format. Having fields present that don't make sense in the context makes it harder to deal with.

I hadn't thought much about (2) because I'm most interested in (1) for my current needs. There are also 3 changes (that I'm aware of) that affect logging right now: this one, upstream logging, and defining a protobuf logging format. I agree it's important to figure out the correct format for protobuf logging right now. Do we need to defer work on this PR until that is sorted out?

mattklein123 · 2017-11-01T19:57:37Z

I don't think we need to block work on this PR, but we do need to sort out the proto situation ASAP. I think I'm in favor of splitting the access log protos into a base, HTTP, and TCP (which might just be base). Does that work for you @ggreenway ?

ggreenway · 2017-11-01T20:26:20Z

@mattklein123 that's fine with me.

goaway · 2017-11-01T20:55:22Z

Sounds good to me, too - thanks @ggreenway , @mattklein123 .

Signed-off-by: Greg Greenway <ggreenway@apple.com>

ggreenway · 2017-11-06T23:30:05Z

@mattklein123 @htuch This is now ready for further review.

The only known-work from this that's missing is the bulk move and renamespacing of source/common/http/access_log, but as discussed earlier, I'll put that into it's own PR.

mattklein123

@ggreenway at a very high level this LGTM. Can we do the move change and then merge that into this? Then I can do the full review.

mattklein123 · 2017-11-07T18:27:21Z

source/common/json/config_schemas.cc

@@ -464,6 +464,62 @@ const std::string Json::Schema::TCP_PROXY_NETWORK_FILTER_SCHEMA(R"EOF(
  {
      "$schema": "http://json-schema.org/schema#",
      "type" : "object",
+      "definitions": {


Would prefer to not duplicate the schema. Can we carve this out into a new schema object and validate at a lower level?

mattklein123 · 2017-11-07T18:28:02Z

@ggreenway P.S., thanks for all the work on the docs. This is great.

Signed-off-by: Greg Greenway <ggreenway@apple.com>

ggreenway · 2017-11-08T23:38:35Z

@mattklein123 the "big move" is merged into this now, and schema validation is done in a common place. Please take a look.

Signed-off-by: Greg Greenway <ggreenway@apple.com>

mattklein123

Great stuff. This is going to be a really useful feature for us at Lyft. Just a few nits.

mattklein123 · 2017-11-09T00:22:12Z

docs/configuration/access_log.rst

+    Total duration in milliseconds of the request from the start time to the last byte out.
+
+  TCP
+    Total duration of the downstream connection.


nit: unit? milliseconds?

mattklein123 · 2017-11-09T00:23:03Z

docs/configuration/access_log.rst

-  * **DI**: The request processing was delayed for a period specified via :ref:`fault injection <config_http_filters_fault_injection>`.
-  * **FI**: The request was aborted with a response code specified via :ref:`fault injection <config_http_filters_fault_injection>`.
-  * **RL**: The request was ratelimited locally by the :ref:`HTTP rate limit filter <config_http_filters_rate_limit>` in addition to 429 response code.
+  Additional details about the response or connection, if any. For TCP connections, the specified response code


nit: "For TCP connections, the specified response code does not apply" is a bit confusing. I think you mean to say something like all codes don't apply, etc.

mattklein123 · 2017-11-09T00:23:35Z

docs/intro/arch_overview/access_logging.rst

+Access logging
+===================
+
+The HTTP connection manager and tcp proxy supports extensible access logging with the following


nit: x-link to filters.

mattklein123 · 2017-11-09T00:25:15Z

include/envoy/access_log/access_log.h

@@ -118,7 +118,7 @@ class RequestInfo {
  /**
   * @return the protocol of the request.
   */
-  virtual Http::Protocol protocol() const PURE;
+  virtual Optional<Http::Protocol> protocol() const PURE;


Ultimately I think we will need this to be an enum specific to access logs if we don't split this apart later. On this topic, should this potentially print "TCP" ? (We could have it print "UDP" for example once we support UDP proxy). NBD, just a thought.

Yeah, I think we'll want that eventually. I wanted to limit the scope of this PR where I could.

mattklein123 · 2017-11-09T00:27:43Z

include/envoy/access_log/access_log.h

@@ -88,7 +88,7 @@ class RequestInfo {
   * @return duration from request start to when the entire request was received from the
   * downstream client in microseconds. Note: if unset, will return 0 microseconds.
   */
-  virtual std::chrono::microseconds requestReceivedDuration() const PURE;
+  virtual Optional<std::chrono::microseconds> requestReceivedDuration() const PURE;


I think you can use const Optional<...>& for all of the ones you changed.

mattklein123 · 2017-11-09T00:28:02Z

source/common/access_log/access_log_formatter.h

@@ -53,7 +53,7 @@ class AccessLogFormatParser {
 class AccessLogFormatUtils {
 public:
  static FormatterPtr defaultAccessLogFormatter();
-  static const std::string& protocolToString(Http::Protocol protocol);
+  static const std::string& protocolToString(Optional<Http::Protocol> protocol);


nit: const Optional&

mattklein123 · 2017-11-09T00:31:07Z

source/common/grpc/http1_bridge_filter.cc

@@ -28,7 +28,9 @@ Http::FilterHeadersStatus Http1BridgeFilter::decodeHeaders(Http::HeaderMap& head
    setupStatTracking(headers);
  }

-  if (decoder_callbacks_->requestInfo().protocol() != Http::Protocol::Http2 && grpc_request) {
+  Optional<Http::Protocol> protocol = decoder_callbacks_->requestInfo().protocol();


const Optional&

mattklein123 · 2017-11-09T00:33:23Z

test/common/filter/tcp_proxy_test.cc

 class TcpProxyTest : public testing::Test {
 public:
  TcpProxyTest() {
+    ON_CALL(*factory_context_.access_log_manager_.file_, write(_))
+        .WillByDefault(SaveArg<0>(&access_log_data_));
+    reconfigureAccessLogging(std::string());


nit: can we just call below setup() and have every test call it?

mattklein123 · 2017-11-09T20:49:43Z

@ggreenway one other random thought: I think for TCP logging there is probably value in having a log line at the beginning of a connection as well as the end, since it may be common to have very long lived connections. I might throw a TODO to that effect in the TcpProxy filter destructor.

Signed-off-by: Greg Greenway <ggreenway@apple.com>

ggreenway · 2017-11-09T23:55:09Z

@mattklein123 I think logging at the beginning of a connection should probably be filed as a new GH issue. A few thoughts:

http connections can also be long-lived, so this would probably apply there too
there isn't much information to log at the beginning of the connection, especially for tcp connections. Pretty much just when, and the upstream/downstream addresses involved.

mattklein123 · 2017-11-09T23:57:32Z

@mattklein123 I think logging at the beginning of a connection should probably be filed as a new GH issue. A few thoughts:

Sure sounds good. I do think that IP address is useful for long lived. I need to see for our use case whether that becomes an issue or not.

Signed-off-by: Greg Greenway <ggreenway@apple.com>

Description: this PR uses config templating to introduce fallback DNS servers for c-ares in Android. Subsequent PR will add EngineBuilder configurability. Risk Level: low -- the current config is a no-op. Testing: e2e testing with the Lyft App Signed-off-by: Jose Nino <jnino@lyft.com> Signed-off-by: JP Simard <jp@jpsim.com>

ggreenway added 4 commits October 25, 2017 22:41

Add access logging to tcp_proxy

b96200a

fixes envoyproxy#1159 Signed-off-by: Greg Greenway <ggreenway@apple.com>

Fix release build failure

6061b5d

Signed-off-by: Greg Greenway <ggreenway@apple.com>

Merge branch 'master' into tcp-access-log

297a108

Signed-off-by: Greg Greenway <ggreenway@apple.com>

Fix build failures from merge conflicts, and missed file of test chan…

334f48c

…ges. Signed-off-by: Greg Greenway <ggreenway@apple.com>

ggreenway force-pushed the tcp-access-log branch 2 times, most recently from 77e6475 to b96200a Compare October 26, 2017 22:59

Merge branch 'master' into tcp-access-log

71c3717

Signed-off-by: Greg Greenway <ggreenway@apple.com>

rshriram suggested changes Oct 27, 2017

View reviewed changes

mattklein123 reviewed Oct 27, 2017

View reviewed changes

htuch reviewed Oct 29, 2017

View reviewed changes

Merge branch 'master' into tcp-access-log

45967ae

ggreenway added 2 commits November 6, 2017 10:38

Fix merge conflict

a8b2b4c

Signed-off-by: Greg Greenway <ggreenway@apple.com>

Merge branch 'master' into tcp-access-log

4b09518

ggreenway changed the title ~~WIP/RFC: Add access logging to tcp_proxy~~ WIP: Add access logging to tcp_proxy Nov 6, 2017

ggreenway added 2 commits November 6, 2017 14:17

Add tests, and fix a few bugs

2cc065f

Signed-off-by: Greg Greenway <ggreenway@apple.com>

Add docs

359666e

Signed-off-by: Greg Greenway <ggreenway@apple.com>

ggreenway changed the title ~~WIP: Add access logging to tcp_proxy~~ Add access logging to tcp_proxy Nov 6, 2017

mattklein123 reviewed Nov 7, 2017

View reviewed changes

ggreenway mentioned this pull request Nov 8, 2017

Move AccessLog from namespace Envoy::Http to namespace Envoy #2023

Merged

ggreenway added 2 commits November 8, 2017 14:31

Merge branch 'master' into tcp-access-log

9c64eb5

Signed-off-by: Greg Greenway <ggreenway@apple.com>

Move access log schema validation so that it isn't duplicated.

9fc8128

Signed-off-by: Greg Greenway <ggreenway@apple.com>

Remove file that got accidentally re-added in merge.

a8e807e

Signed-off-by: Greg Greenway <ggreenway@apple.com>

mattklein123 reviewed Nov 9, 2017

View reviewed changes

Fix code review issues

63cf792

Signed-off-by: Greg Greenway <ggreenway@apple.com>

One more issue that I missed

0a37b4a

Signed-off-by: Greg Greenway <ggreenway@apple.com>

mattklein123 approved these changes Nov 10, 2017

View reviewed changes

mattklein123 merged commit d668797 into envoyproxy:master Nov 10, 2017

ggreenway deleted the tcp-access-log branch November 10, 2017 17:17

rgs1 mentioned this pull request Aug 27, 2021

thrift proxy: add access logger support #17898

Closed

Add access logging to tcp_proxy #1953

Add access logging to tcp_proxy #1953

Conversation

ggreenway commented Oct 26, 2017

ggreenway commented Oct 26, 2017

ggreenway commented Oct 26, 2017

rshriram left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattklein123 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ggreenway commented Oct 27, 2017

mattklein123 commented Oct 27, 2017

ggreenway commented Oct 27, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattklein123 commented Nov 1, 2017

goaway commented Nov 1, 2017

ggreenway commented Nov 1, 2017

mattklein123 commented Nov 1, 2017

ggreenway commented Nov 1, 2017

goaway commented Nov 1, 2017

ggreenway commented Nov 6, 2017

mattklein123 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattklein123 commented Nov 7, 2017

ggreenway commented Nov 8, 2017

mattklein123 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattklein123 commented Nov 9, 2017

ggreenway commented Nov 9, 2017

mattklein123 commented Nov 9, 2017