Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Channel Tracing #10259

Closed
wants to merge 29 commits into from
Closed

Channel Tracing #10259

wants to merge 29 commits into from

Conversation

ncteisen
Copy link
Contributor

@ncteisen ncteisen commented Mar 22, 2017

First PR for the Channel Tracing project. See the gRFC here.

This step includes the new channel_tracing object, and the surface through which it is used. There is a new unit test and a new end2end test.

The tracer is used only once in the channel code. It logs channel created as a proof of concept. The next series of PRs will plumb the tracing code through all of the relevant paths.


This change is Reviewable

Copy link
Member

@ctiller ctiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some fairly minor questions/suggestions


// One node of tracing data
struct grpc_trace_node {
const char* data;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we go for a grpc_slice immediately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that the data would always be a string literal. But I guess if that were the case then a slice would still be just as fast, and if would additionally provide the opportunity for custom strings. I will make that change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

new_trace_node->subchannel = subchannel;

if (subchannel) {
GRPC_CHANNEL_TRACER_REF(subchannel);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a decision here: is it safe to ref a null node (remove this if) or it's unsafe (remove the one in ref)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll choose the former. I added the if (!tracer) return to any code that might be called in channel code, so the channel doesn't have to worry about if tracing is enabled or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// TODO(ncteisen): pull this function into a helper location
static grpc_json* link_json(grpc_json* child, grpc_json* brother,
grpc_json* parent) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) What's the advantage of creating a json tree here (versus just generating a string)?

(2) Given we're already thinking about a proto format for this, is it worth pulling the presentation of traces (the json generation) into a utility file so that we don't accidentally link presentation with runtime storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could potentially share a utility file with the error string generation. They accomplish similar goals. We can discuss this one more offline.

@markdroth
Copy link
Member

Overall, this looks pretty good. Most of my comments are cosmetic, but a few are structural issues. Please let me know if you'd like to discuss anything further. Thanks!


Reviewed 5 of 36 files at r1, 28 of 37 files at r2, 5 of 7 files at r3, 7 of 7 files at r4.
Review status: all files reviewed at latest revision, 42 unresolved discussions, some commit checks failed.


include/grpc/impl/codegen/grpc_types.h, line 245 at r4 (raw file):

/** Toggles channel tracing behavior. 0 means off. Non zero sets max nodes

  • tracked per tracing object */
    #define GRPC_ARG_CHANNEL_TRACING "grpc.channel_tracing"

If the non-zero value is actually the number of nodes to track, then I suggest calling this something like GRPC_ARG_CHANNEL_TRACING_MAX_NODES, and document the fact that setting the value to 0 disables tracing completely.

Also, please document what the default value is.


src/core/lib/channel/channel_tracer.c, line 198 at r3 (raw file):

Previously, ncteisen (Noah Eisen) wrote…

This could potentially share a utility file with the error string generation. They accomplish similar goals. We can discuss this one more offline.

I think I agree that it makes more sense to return a string. We don't want to expose the grpc_json type through the C-core public API, so we need to convert the parsed tree to a string either here or in the surface channel code. It seems a bit cleaner to do it here.


src/core/lib/channel/channel_tracer.c, line 56 at r4 (raw file):

// The tracer object that owns this trace node. This is used to ref and
// unref the tracing object as nodes are added or overwritten
grpc_channel_tracer* subchannel;

The comment seems incorrect here -- this isn't the tracer that owns this node, it's the optional tracer for the subchannel to which this trace node refers.


src/core/lib/channel/channel_tracer.c, line 58 at r4 (raw file):

  // The tracer object that owns this trace node. This is used to ref and
  // unref the tracing object as nodes are added or overwritten
  grpc_channel_tracer* subchannel;

I suggest generalizing this to be a reference to some other related tracer, without explicitly mentioning subchannels. That way, we can eliminate the is_parent parameter in several places below and simply treat this as a generic reference, which could work just as well for arbitrarily deep nesting (as long as there are no circular references).


src/core/lib/channel/channel_tracer.c, line 61 at r4 (raw file):

};

struct grpc_trace_node_list {

Any reason this needs to be a separate struct, as opposed to putting these fields directly into grpc_channel_tracer?


src/core/lib/channel/channel_tracer.c, line 72 at r4 (raw file):

  gpr_refcount refs;
  gpr_mu tracer_mu;
  int64_t num_nodes_logged;

Could use an unsigned type here.


src/core/lib/channel/channel_tracer.c, line 88 at r4 (raw file):

  gpr_ref_init(&tracer->refs, 1);
#ifdef GRPC_CHANNEL_TRACER_REFCOUNT_DEBUG
  gpr_log(GPR_DEBUG, "%p create [%s:%d %s]", tracer, file, line, func);

Could put this line up inside the preceding #ifdef GRPC_CHANNEL_TRACER_REFCOUNT_DEBUG block, so you don't need to duplicate the block.


src/core/lib/channel/channel_tracer.c, line 115 at r4 (raw file):

static void free_node(grpc_trace_node* node) {
  // no need to free string, since they are always static

I don't understand this comment.


src/core/lib/channel/channel_tracer.c, line 118 at r4 (raw file):

  GRPC_ERROR_UNREF(node->error);
  GRPC_CHANNEL_TRACER_UNREF(node->subchannel);
  grpc_exec_ctx exec_ctx = GRPC_EXEC_CTX_INIT;

How do we know it's safe to create our own exec_ctx here? (Hint: It probably isn't. :) )

I suspect that we actually need to pass the exec_ctx as a parameter to grpc_channel_tracer_unref(). And yes, this means we probably also need to add that parameter to grpc_channel_tracer_add_trace(), since it has to garbage-collect old nodes when the tracer is full.


src/core/lib/channel/channel_tracer.c, line 124 at r4 (raw file):

}

void grpc_channel_tracer_destroy(grpc_channel_tracer* tracer) {

static


src/core/lib/channel/channel_tracer.c, line 126 at r4 (raw file):

void grpc_channel_tracer_destroy(grpc_channel_tracer* tracer) {
  grpc_trace_node* it = tracer->node_list.head_trace;
  while (it) {

Instead of using a boolean evaluation for pointers, please explicitly compare the value to NULL, so that it's clear to the reader that this is a pointer and not a bool.

Same thing throughout.


src/core/lib/channel/channel_tracer.c, line 154 at r4 (raw file):

#endif

static void add_trace(grpc_trace_node_list* list, grpc_slice trace,

Why make this a separate function, rather than putting this code directly into grpc_channel_tracer_add_trace()?


src/core/lib/channel/channel_tracer.c, line 164 at r4 (raw file):

new_trace_node->subchannel = subchannel;
 
GRPC_CHANNEL_TRACER_REF(subchannel);

Can combine these into one line:

new_trace_node->subchannel = GRPC_CHANNEL_TRACER_REF(subchannel);


src/core/lib/channel/channel_tracer.c, line 167 at r4 (raw file):

  GRPC_CHANNEL_TRACER_REF(subchannel);

Nit: Please avoid unnecessary blank lines within functions. I generally find that whenever you could use a blank line to split up the logical pieces of code, you're better off using a single-line comment to explain what the next part of the code is doing.

Same thing throughout.


src/core/lib/channel/channel_tracer.c, line 168 at r4 (raw file):

  GRPC_CHANNEL_TRACER_REF(subchannel);

  // first node in case

s/in //


src/core/lib/channel/channel_tracer.c, line 177 at r4 (raw file):

    list->tail_trace = list->tail_trace->next;
  }
  list->size++;

Nit: I prefer using preincrement over postincrement if we're not actually using the original value. (Yes, I know the compiler will avoid the inefficiency here for us, but I tend to be a bit pedantic about such things.)

Same thing throughout.


src/core/lib/channel/channel_tracer.c, line 197 at r4 (raw file):

}

// TODO(ncteisen): pull this function into a helper location

I think these two methods could go right into the JSON library.


src/core/lib/channel/channel_tracer.c, line 223 at r4 (raw file):

  child = create_child(child, json, "data", grpc_slice_to_c_string(node->data),
                       GRPC_JSON_STRING, true);
  child = create_child(child, json, "error",

If the error is GRPC_ERROR_NONE, we should probably skip adding this field.


src/core/lib/channel/channel_tracer.c, line 227 at r4 (raw file):

                       GRPC_JSON_STRING, true);
  char* time_str;
  gpr_asprintf(&time_str, "%" PRId64 ".%09d", node->time_created.tv_sec,

Isn't this field supposed to be in RFC-3339 format?


src/core/lib/channel/channel_tracer.c, line 233 at r4 (raw file):

                       grpc_connectivity_state_name(node->connectivity_state),
                       GRPC_JSON_STRING, false);
  if (is_parent && node->subchannel) {

I think it's sufficient to check if node->subchannel != NULL. There's probably no need for the is_parent parameter.


src/core/lib/channel/channel_tracer.c, line 258 at r4 (raw file):

  if (!is_parent) {
    char* subchannel_addr_str;
    gpr_asprintf(&subchannel_addr_str, "%p", tracer);

It might be useful for the subchannel ID to include something more informative than just the address of its tracer. How about including the server address to which the subchannel is connected (which we can get from GRPC_ARG_SUBCHANNEL_ADDRESS, although we may want to grab it before the proxy mapper optionally overwrites it)?

Structurally, we could add a char *id field to grpc_channel_tracer, which could be set from the constructor. In the parent, we could set it to the value of GRPC_ARG_SERVER_URI, which would eliminate the need for the is_parent parameter.


src/core/lib/channel/channel_tracer.c, line 269 at r4 (raw file):

  char* time_str;
  gpr_asprintf(&time_str, "%" PRId64 ".%09d", tracer->time_created.tv_sec,

Isn't this field supposed to be in RFC-3339 format?


src/core/lib/channel/channel_tracer.c, line 294 at r4 (raw file):

  grpc_trace_node* it = list->head_trace;
  while (it) {

Instead of doing a separate iteration here, I suggest collecting the list of referenced tracers as we go through the nodes of the parent tracer. We can then continue adding to the list as we go through the referenced tracers and keep going until we have listed everything that's been referenced anywhere in the tree.


src/core/lib/channel/channel_tracer.h, line 41 at r4 (raw file):

#ifdef __cplusplus
extern "C" {
#endif

Is this actually going to be used from C++ somewhere? I would think it would be purely internal to C-core.


src/core/lib/channel/channel_tracer.h, line 48 at r4 (raw file):

typedef struct grpc_trace_node grpc_trace_node;
typedef struct grpc_trace_node_list grpc_trace_node_list;

I think these two can move to the .c file, since they're not used outside of this module.


src/core/lib/channel/channel_tracer.h, line 51 at r4 (raw file):

/* Adds a new trace node to the tracing object */
void grpc_channel_tracer_add_trace(grpc_channel_tracer* tracer,

Nit about the order of declarations here: I suggest declaring grpc_channel_tracer_create() first, followed by the ref and unref methods. That way, you have the equivalent of the constructor and destructor at the top, and all of the methods for acting on the object after that.


src/core/lib/channel/channel_tracer.h, line 52 at r4 (raw file):

/* Adds a new trace node to the tracing object */
void grpc_channel_tracer_add_trace(grpc_channel_tracer* tracer,
                                   grpc_slice trace, struct grpc_error* error,

s/struct grpc_error/grpc_error/


src/core/lib/channel/channel_tracer.h, line 52 at r4 (raw file):

/* Adds a new trace node to the tracing object */
void grpc_channel_tracer_add_trace(grpc_channel_tracer* tracer,
                                   grpc_slice trace, struct grpc_error* error,

Suggest renaming the trace parameter to something like msg or data.


src/core/lib/channel/channel_tracer.h, line 74 at r4 (raw file):

#endif

/* Dumps all of the trace to stderr */

Might be more accurate to just say "Logs all trace data via gpr_log()." There are cases where the log goes to something other than stderr (or is inhibited completely).


src/core/lib/channel/channel_tracer.h, line 75 at r4 (raw file):

/* Dumps all of the trace to stderr */
void grpc_channel_tracer_log_trace(grpc_channel_tracer* tracer);

What's the use-case for this method? Is it just for debugging? Would it make sense to use a preprocessor macro to conditionally enable it?


src/core/lib/channel/channel_tracer.h, line 77 at r4 (raw file):

/* Returns the tracing data in the form of a grpc json string.
The string is owned by the caller and must be freed. /
grpc_json
grpc_channel_tracer_get_trace(grpc_channel_tracer* tracer);

The comment is misleading here -- the return value is a parsed JSON tree, not a string.


src/core/lib/channel/channel_tracer.h, line 82 at r4 (raw file):

/* Initializes the tracing object with gpr_malloc. The caller has
ownership over the returned tracing object */

Suggest the following slightly clearer wording:

Creates a new tracer. The caller owns a reference to the returned tracer.


src/core/lib/surface/channel.c, line 77 at r4 (raw file):

  registered_call *registered_calls;

  grpc_channel_tracer *tracer;

Seeing this code made me wonder whether it makes sense to put the tracer here in the surface channel object or whether it would be better to put it in the client_channel filter. But after a bit of consideration, I decided that this is the right place, because it avoids the need for a new method to be added to the filter API to return the tracer.

Note that this does require us to pass the tracer down into the filter stack (presumably via channel args) so that the client_channel filter can access it (and so that the subchannel stack can pass it down to the transport). But we can do that in a separate PR.


src/core/lib/surface/channel.c, line 177 at r4 (raw file):

    } else if (0 == strcmp(args->args[i].key, GRPC_ARG_CHANNEL_TRACING)) {
      channel->tracer =
          GRPC_CHANNEL_TRACER_CREATE((uint32_t)args->args[i].value.integer);

Let's use grpc_channel_arg_get_integer() here, to provide input validation.


src/core/lib/surface/channel.h, line 78 at r4 (raw file):

void grpc_channel_update_call_size_estimate(grpc_channel *channel, size_t size);

grpc_json *grpc_channel_get_trace(grpc_channel *channel);

See my comment in the gRFC about returning the result as a string instead of a parsed JSON tree.


test/core/channel/channel_tracing_test.c, line 52 at r4 (raw file):

}

static void validate_tracer(grpc_channel_tracer* tracer, size_t expected,

Suggest calling this parameter expected_num_nodes_logged.


test/core/util/channel_tracing_utils.c, line 43 at r4 (raw file):

grpc_json* child = parent->child;
while (child) {

Quoted 4 lines of code… > if (child->key && !strcmp(child->key, key)) { > return child; > } > child = child->next;
> }

This would be a bit more concise as a for loop:

for (grpc_json* child = parent->child; child != NULL; child = child->next) {
  if (child->key != NULL && strcmp(child->key, key) == 0) return child;
}

test/core/util/channel_tracing_utils.c, line 56 at r4 (raw file):

grpc_json* child = arr->child;
while (child) {
count++;
child = child->next;
}

Same here:

for (grpc_json* child = arr->child; child != NULL; child = child->next) {
  ++count;
}

test/core/util/channel_tracing_utils.c, line 75 at r4 (raw file):

  GPR_ASSERT(num_nodes_logged == num_nodes_logged_golden);

  grpc_json* nodes = get_json_child(channel_data, "nodes");

It would be good to also validate that each node has all of the expected fields. At minimum, we can at least check that the nodes have the right set of keys. But ideally, we should actually have the caller pass in an array of structs containing the values we expect to see for each node (minus things that we can't easily validate, like the timestamps), and then verify that the actual values match the expected values.


test/core/util/channel_tracing_utils.h, line 39 at r4 (raw file):

#include "src/core/lib/channel/channel_tracer.h"

void validate_channel_data(grpc_json* json, size_t num_nodes_logged_golden,

Suggest using "expected" instead of "golden" for these two parameters.


Comments from Reviewable

@ctiller
Copy link
Member

ctiller commented Mar 24, 2017

Review status: all files reviewed at latest revision, 42 unresolved discussions, some commit checks failed.


src/core/lib/surface/channel.c, line 77 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Seeing this code made me wonder whether it makes sense to put the tracer here in the surface channel object or whether it would be better to put it in the client_channel filter. But after a bit of consideration, I decided that this is the right place, because it avoids the need for a new method to be added to the filter API to return the tracer.

Note that this does require us to pass the tracer down into the filter stack (presumably via channel args) so that the client_channel filter can access it (and so that the subchannel stack can pass it down to the transport). But we can do that in a separate PR.

At the channel layer is definitely the right placement for this, but it might not be insane to have it owned by the channel stack (since that's common between subchannel and channel).


Comments from Reviewable

@ncteisen
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 42 unresolved discussions, some commit checks failed.


include/grpc/impl/codegen/grpc_types.h, line 245 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

/** Toggles channel tracing behavior. 0 means off. Non zero sets max nodes

  • tracked per tracing object */
    #define GRPC_ARG_CHANNEL_TRACING "grpc.channel_tracing"

If the non-zero value is actually the number of nodes to track, then I suggest calling this something like GRPC_ARG_CHANNEL_TRACING_MAX_NODES, and document the fact that setting the value to 0 disables tracing completely.

Also, please document what the default value is.

Done.


src/core/lib/channel/channel_tracer.c, line 198 at r3 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I think I agree that it makes more sense to return a string. We don't want to expose the grpc_json type through the C-core public API, so we need to convert the parsed tree to a string either here or in the surface channel code. It seems a bit cleaner to do it here.

Done.


src/core/lib/channel/channel_tracer.c, line 56 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

// The tracer object that owns this trace node. This is used to ref and
// unref the tracing object as nodes are added or overwritten
grpc_channel_tracer* subchannel;

The comment seems incorrect here -- this isn't the tracer that owns this node, it's the optional tracer for the subchannel to which this trace node refers.

Done.


src/core/lib/channel/channel_tracer.c, line 58 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I suggest generalizing this to be a reference to some other related tracer, without explicitly mentioning subchannels. That way, we can eliminate the is_parent parameter in several places below and simply treat this as a generic reference, which could work just as well for arbitrarily deep nesting (as long as there are no circular references).

Done.


src/core/lib/channel/channel_tracer.c, line 61 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Any reason this needs to be a separate struct, as opposed to putting these fields directly into grpc_channel_tracer?

Done.


src/core/lib/channel/channel_tracer.c, line 72 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Could use an unsigned type here.

Done.


src/core/lib/channel/channel_tracer.c, line 88 at r4 (raw file):

GRPC_ARG_CHANNEL_TRACING_MAX_NODES
I wanted to log the pointer value, so I think we need two separate blocks here, even though it's not very clean.


src/core/lib/channel/channel_tracer.c, line 115 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I don't understand this comment.

Done.


src/core/lib/channel/channel_tracer.c, line 118 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

How do we know it's safe to create our own exec_ctx here? (Hint: It probably isn't. :) )

I suspect that we actually need to pass the exec_ctx as a parameter to grpc_channel_tracer_unref(). And yes, this means we probably also need to add that parameter to grpc_channel_tracer_add_trace(), since it has to garbage-collect old nodes when the tracer is full.

Done.


src/core/lib/channel/channel_tracer.c, line 124 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

static

Done.


src/core/lib/channel/channel_tracer.c, line 126 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Instead of using a boolean evaluation for pointers, please explicitly compare the value to NULL, so that it's clear to the reader that this is a pointer and not a bool.

Same thing throughout.

Done (though I have always preferred the other way to avoid ==/= typos).


src/core/lib/channel/channel_tracer.c, line 154 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Why make this a separate function, rather than putting this code directly into grpc_channel_tracer_add_trace()?

Done.


src/core/lib/channel/channel_tracer.c, line 164 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

new_trace_node->subchannel = subchannel;
 
GRPC_CHANNEL_TRACER_REF(subchannel);

Can combine these into one line:

new_trace_node->subchannel = GRPC_CHANNEL_TRACER_REF(subchannel);

Done.


src/core/lib/channel/channel_tracer.c, line 167 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Nit: Please avoid unnecessary blank lines within functions. I generally find that whenever you could use a blank line to split up the logical pieces of code, you're better off using a single-line comment to explain what the next part of the code is doing.

Same thing throughout.

Done.


src/core/lib/channel/channel_tracer.c, line 168 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

s/in //

Done.


src/core/lib/channel/channel_tracer.c, line 177 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Nit: I prefer using preincrement over postincrement if we're not actually using the original value. (Yes, I know the compiler will avoid the inefficiency here for us, but I tend to be a bit pedantic about such things.)

Same thing throughout.

Done.


src/core/lib/channel/channel_tracer.c, line 197 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I think these two methods could go right into the JSON library.

Done.


src/core/lib/channel/channel_tracer.c, line 223 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

If the error is GRPC_ERROR_NONE, we should probably skip adding this field.

Done.


src/core/lib/channel/channel_tracer.c, line 227 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Isn't this field supposed to be in RFC-3339 format?

Done.


src/core/lib/channel/channel_tracer.c, line 233 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I think it's sufficient to check if node->subchannel != NULL. There's probably no need for the is_parent parameter.

Done.


src/core/lib/channel/channel_tracer.c, line 258 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

It might be useful for the subchannel ID to include something more informative than just the address of its tracer. How about including the server address to which the subchannel is connected (which we can get from GRPC_ARG_SUBCHANNEL_ADDRESS, although we may want to grab it before the proxy mapper optionally overwrites it)?

Structurally, we could add a char *id field to grpc_channel_tracer, which could be set from the constructor. In the parent, we could set it to the value of GRPC_ARG_SERVER_URI, which would eliminate the need for the is_parent parameter.

Done.


src/core/lib/channel/channel_tracer.c, line 269 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Isn't this field supposed to be in RFC-3339 format?

Done.


src/core/lib/channel/channel_tracer.c, line 294 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Instead of doing a separate iteration here, I suggest collecting the list of referenced tracers as we go through the nodes of the parent tracer. We can then continue adding to the list as we go through the referenced tracers and keep going until we have listed everything that's been referenced anywhere in the tree.

Done.


src/core/lib/channel/channel_tracer.h, line 41 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

#ifdef __cplusplus
extern "C" {
#endif

Is this actually going to be used from C++ somewhere? I would think it would be purely internal to C-core.

Done.


src/core/lib/channel/channel_tracer.h, line 48 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

typedef struct grpc_trace_node grpc_trace_node;
typedef struct grpc_trace_node_list grpc_trace_node_list;

I think these two can move to the .c file, since they're not used outside of this module.

Done.


src/core/lib/channel/channel_tracer.h, line 51 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Nit about the order of declarations here: I suggest declaring grpc_channel_tracer_create() first, followed by the ref and unref methods. That way, you have the equivalent of the constructor and destructor at the top, and all of the methods for acting on the object after that.

Done.


src/core/lib/channel/channel_tracer.h, line 52 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

s/struct grpc_error/grpc_error/

Done.


src/core/lib/channel/channel_tracer.h, line 52 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Suggest renaming the trace parameter to something like msg or data.

Done.


src/core/lib/channel/channel_tracer.h, line 74 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Might be more accurate to just say "Logs all trace data via gpr_log()." There are cases where the log goes to something other than stderr (or is inhibited completely).

Done.


src/core/lib/channel/channel_tracer.h, line 75 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

What's the use-case for this method? Is it just for debugging? Would it make sense to use a preprocessor macro to conditionally enable it?

Done.


src/core/lib/channel/channel_tracer.h, line 77 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

/* Returns the tracing data in the form of a grpc json string.
The string is owned by the caller and must be freed. /
grpc_json
grpc_channel_tracer_get_trace(grpc_channel_tracer* tracer);

The comment is misleading here -- the return value is a parsed JSON tree, not a string.

Done.


src/core/lib/channel/channel_tracer.h, line 82 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

/* Initializes the tracing object with gpr_malloc. The caller has
ownership over the returned tracing object */

Suggest the following slightly clearer wording:

Creates a new tracer. The caller owns a reference to the returned tracer.

Done.


src/core/lib/surface/channel.c, line 177 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Let's use grpc_channel_arg_get_integer() here, to provide input validation.

Done.


src/core/lib/surface/channel.h, line 78 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

See my comment in the gRFC about returning the result as a string instead of a parsed JSON tree.

Done.


test/core/channel/channel_tracing_test.c, line 52 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Suggest calling this parameter expected_num_nodes_logged.

Done.


test/core/util/channel_tracing_utils.c, line 43 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

grpc_json* child = parent->child;
while (child) {
if (child->key && !strcmp(child->key, key)) {
return child;
}
child = child->next;
}

This would be a bit more concise as a for loop:

for (grpc_json* child = parent->child; child != NULL; child = child->next) {
  if (child->key != NULL && strcmp(child->key, key) == 0) return child;
}

Done.


test/core/util/channel_tracing_utils.c, line 56 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

grpc_json* child = arr->child;
while (child) {
count++;
child = child->next;
}

Same here:

for (grpc_json* child = arr->child; child != NULL; child = child->next) {
  ++count;
}

Done.


test/core/util/channel_tracing_utils.h, line 39 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Suggest using "expected" instead of "golden" for these two parameters.

Done.


Comments from Reviewable

@ncteisen
Copy link
Contributor Author

I restructured the JSON formation so that the structure is now completely recursive. All tracers can hold onto a list of referenced children, as long as there are no loops (i.e. the subchannel cannot refer back to the parent). This might be useful if we want to expand the project to track channel -> subchannel -> connection.

I need to make the tests a bit better, but wanted to get eyes on these changes ASAP.

@markdroth
Copy link
Member

This is moving in the right direction. As usual, most of my comments are minor things, but there are a few significant issues. Please let me know if you'd like to discuss anything further. Thanks!


Reviewed 28 of 31 files at r5, 3 of 3 files at r6.
Review status: all files reviewed at latest revision, 35 unresolved discussions, some commit checks failed.


include/grpc/impl/codegen/grpc_types.h, line 246 at r6 (raw file):

/** If set, channel tracing will be turned on. The integer value will set the
maximum trace nodes each tracer will track. The default is 10. */

The "If set, channel tracing will be turned on" implies that the default is 0, which is a bit unclear. I suggest the following alternative wording:

The maximum number of trace nodes to keep in the tracer for each channel or subchannel. The default is 10. If set to 0, channel tracing is disabled.


src/core/lib/channel/channel_tracer.c, line 48 at r6 (raw file):

typedef struct grpc_trace_node grpc_trace_node;
 
// One node of tracing data
struct grpc_trace_node {

Can combine these two statements into one:

typedef struct {
  // ...fields here...
} grpc_trace_node;

src/core/lib/channel/channel_tracer.c, line 192 at r6 (raw file):

}

static char* fmt_time(gpr_timespec tm) {

Please add a comment indicating that this generates a string in RFC-3339 format.


src/core/lib/channel/channel_tracer.c, line 193 at r6 (raw file):

static char* fmt_time(gpr_timespec tm) {
  char buffer[100];

It looks like 20 characters should be enough, including the trailing NUL.


src/core/lib/channel/channel_tracer.c, line 194 at r6 (raw file):

struct tm* tm_info;
tm_info = localtime((const time_t*)&tm.tv_sec);

Can combine these two lines into one.


src/core/lib/channel/channel_tracer.c, line 202 at r6 (raw file):

typedef struct tracer_tracker tracer_tracker;
struct tracer_tracker {

typedef struct {
  // ...fields...
} tracer_tracker;

src/core/lib/channel/channel_tracer.c, line 202 at r6 (raw file):

}

typedef struct tracer_tracker tracer_tracker;

Suggest calling this seen_tracers.


src/core/lib/channel/channel_tracer.c, line 209 at r6 (raw file):

};

static void track_tracer(tracer_tracker* tracker, grpc_channel_tracer* tracer) {

Should we check to see that the tracer is not already in the list before we add it?


src/core/lib/channel/channel_tracer.c, line 209 at r6 (raw file):

};

static void track_tracer(tracer_tracker* tracker, grpc_channel_tracer* tracer) {

Suggest calling this seen_tracers_add().


src/core/lib/channel/channel_tracer.c, line 217 at r6 (raw file):

}

static bool check_tracer_tracked(tracer_tracker* tracker,

Suggest calling this seen_tracers_check().


src/core/lib/channel/channel_tracer.c, line 225 at r6 (raw file):

}

static grpc_json* create_child_find_brother(grpc_json* parent, const char* key,

Instead of having a separate function to do this, how about changing grpc_json_link_child() to find the sibling itself if the parameter is NULL? In other words, the argument to grpc_json_link_child() would basically become a hint for optimization purposes, as opposed to being a required parameter.

Or, alternatively, we could add a function to the JSON library that returns the last child of a given node, and use that to pass in the sibling parameter to grpc_json_link_child().

Yet a third alternative would be to add a last_child field to the grpc_json struct, so that we always track the pointer to the last child. But I'm not sure if that would increase memory usage for any existing callers in an undesirable way.


src/core/lib/channel/channel_tracer.c, line 310 at r6 (raw file):

  grpc_json* channel_data = grpc_json_create_child(
      NULL, json, "channelData", NULL, GRPC_JSON_OBJECT, false);
  grpc_json* children = grpc_json_create_child(channel_data, json, "children",

The gRFC doesn't say anything about a node called "children", so it's not quite clear to me how this is supposed to look.

Note that my suggestion from my last review about making the references more generic instead of being specific to subchannels wasn't necessarily intended to mean that we needed to change the output format -- it is possible for the code here to be more general than the output format supports. However, I'm fine with making the output more generic if you'd like to go that route.

If we do want to make the output format more generic, then I think it should be a flatter format, since our data structure doesn't necessarily guarantee that the tracers will be organized in a strict tree structure. For example, if two tracers A and B refer to the same tracer C, then C is a child of both A and B, so we would need to choose between listing it twice or simply choosing one of the two parents, neither of which seems like a very clear representation. A better alternative would be a flatter structure, where all tracers are provided in a single list, and the references between them are indicated by the "id" fields.

Anyway, there are a lot of options here, but I think we should document the desired format before we start trying to implement code that generates it.


src/core/lib/channel/channel_tracer.c, line 322 at r6 (raw file):

  memset(&tracker, 0, sizeof(tracker));

  recursively_populate_json(tracer, &tracker, json);

Instead of using recursion here, I think we can use a simple iteration approach. That would look something like this:

seen_tracers_add(&tracers_seen, tracer);
for (size_t i = 0; i < tracers_seen.size; ++i) {
  dump_tracer(tracer, &tracers_seen, json);
}

The idea is that dump_tracer() may add additional tracers to tracers_seen as we go, in which case we'll wind up doing additional iterations of this loop.


src/core/lib/channel/channel_tracer.c, line 326 at r6 (raw file):

  gpr_free(tracker.tracers);

  char* json_str = grpc_json_dump_to_string(json, 0);

Why use indent of 0 here? Seems like it might be better to use 1, so that it's a bit more human readable.


src/core/lib/channel/channel_tracer.c, line 333 at r6 (raw file):

char* json_str = grpc_channel_tracer_get_trace(tracer);
grpc_json* json = grpc_json_parse_string(json_str);
char* fmt_json_str = grpc_json_dump_to_string(json, 1);

Why are we parsing the string just to dump it again? Is this just because of the indentation difference? If so, hopefully we can eliminate this by using the same indentation in both places.


src/core/lib/channel/channel_tracer.h, line 51 at r4 (raw file):

Previously, ncteisen (Noah Eisen) wrote…

Done.

Looks like grpc_channel_tracer_add_trace() is still declared before the ctor, ref, and unref methods.

Also, please make sure to use the same declaration order in the .c and .h files.


src/core/lib/channel/channel_tracer.h, line 88 at r6 (raw file):

#endif

#ifdef GRPC_CHANNEL_TRACER_REFCOUNT_DEBUG

This logging function isn't actually related to refcounting. Would it make sense to use a separate preprocessor definition for this, or do you expect that this will only be used when debugging refcounting issues?


src/core/lib/json/json.c, line 70 at r6 (raw file):

grpc_json* grpc_json_link_child(grpc_json* child, grpc_json* brother,
                                grpc_json* parent) {
  if (brother != NULL) brother->next = child;

Should we assert that brother->next == NULL before we set this value?


src/core/lib/json/json.h, line 52 at r6 (raw file):

  /* if set, destructor will free value */
  bool owns_value;

Suggest moving this down below the value field, since it's basically a modifier for how that field is handled.


src/core/lib/json/json.h, line 97 at r6 (raw file):

 * brother parameter.
 */
grpc_json* grpc_json_link_child(grpc_json* child, grpc_json* brother,

Sexism nit: How about using the term "sibling" instead of "brother"?

Same thing throughout.


src/core/lib/json/json.h, line 97 at r6 (raw file):

 * brother parameter.
 */
grpc_json* grpc_json_link_child(grpc_json* child, grpc_json* brother,

I think the parameters would make more sense in the order parent, child, sibling. The parent is the one we're adding to, so it goes first. The child is the one we're adding. And the sibling is a helper for linking purposes.


src/core/lib/json/json.h, line 100 at r6 (raw file):

                                grpc_json* parent);

/* Creates and links a new child json object. */

Please document the parameters here.


src/core/lib/json/json.h, line 101 at r6 (raw file):

/* Creates and links a new child json object. */
grpc_json* grpc_json_create_child(grpc_json* brother, grpc_json* parent,

Instead of having a single function that both creates the child and links it into the parent, how about just having a function that creates the child, and then having the caller pass the result of that function to grpc_json_link_child()? That approach seems like it offers more easily composable building blocks.


src/core/lib/json/json.h, line 101 at r6 (raw file):

/* Creates and links a new child json object. */
grpc_json* grpc_json_create_child(grpc_json* brother, grpc_json* parent,

I think the parameters would make more sense in the order type, key, value, owns_value, so that they match the order in the struct.


src/core/lib/json/json_string.c, line 313 at r6 (raw file):

  if ((status != GRPC_JSON_DONE) && json) {
    gpr_log(GPR_DEBUG, "%d, %d", GRPC_JSON_DONE, status);

Is this something you just added for debugging and forgot to remove, or did you intend to keep it here?


src/core/lib/surface/channel.c, line 77 at r4 (raw file):

Previously, ctiller (Craig Tiller) wrote…

At the channel layer is definitely the right placement for this, but it might not be insane to have it owned by the channel stack (since that's common between subchannel and channel).

It would probably not buy us much to have it owned by the channel stack. The memory usage would be the same, and we'd need an additional access method on the channel stack object. So I think this is fine the way it is.


src/core/lib/surface/channel.c, line 178 at r6 (raw file):

               strcmp(args->args[i].key, GRPC_ARG_CHANNEL_TRACING_MAX_NODES)) {
      // max_nodes defaults to 10, clamped between 0 and 100.
      grpc_integer_options options = {10, 0, 100};

const


src/core/lib/surface/channel.c, line 179 at r6 (raw file):

      // max_nodes defaults to 10, clamped between 0 and 100.
      grpc_integer_options options = {10, 0, 100};
      channel->tracer = GRPC_CHANNEL_TRACER_CREATE(

The current channel args design allows the same arg to be specified more than once. So let's GPR_ASSERT that channel->tracer == NULL before we reassign it.

Also, if the channel arg is not specified at all (i.e., the caller just wants the default), then we will never execute this block, and no tracer will ever be created.


src/core/lib/surface/channel.c, line 180 at r6 (raw file):

      grpc_integer_options options = {10, 0, 100};
      channel->tracer = GRPC_CHANNEL_TRACER_CREATE(
          (size_t)grpc_channel_arg_get_integer(&args->args[i], options),

If this returns 0, shouldn't we skip creating the tracer?


src/core/lib/surface/channel.c, line 181 at r6 (raw file):

      channel->tracer = GRPC_CHANNEL_TRACER_CREATE(
          (size_t)grpc_channel_arg_get_integer(&args->args[i], options),
          target);

Might want to include the address of the channel object (pointer) as well, since there can be multiple channels open to the same target.


test/core/util/channel_tracing_utils.c, line 43 at r6 (raw file):

static grpc_json* get_json_child(grpc_json* parent, const char* key) {
  GPR_ASSERT(parent && parent->child);

Please use != NULL for both operands.

Actually, do we really need to check parent->child != NULL? It seems like we'd just return NULL in that case.


Comments from Reviewable

@ncteisen
Copy link
Contributor Author

Medium sized update to the design after our discussions earlier this week. The tracer API now supports two types of data retrieval. First, there is the fully recursive version, which, given a channel, will return all of its data, as well as all of the data for its referenced subchannels. This is what we had initially discussed.

I have added a version that simply returns the trace for the parent channel. Contained within this trace is nodes that refer to subchannels. Though the full data for the subchannel will not be available, the trace will contain the subchannel's uuid, and those uuids can be used with the new tracer API to query any specific subchannel's data tree. This fits into the ongoing channelz design.

This addition depends on my PR to add the object registry, #10346.


Review status: 43 of 47 files reviewed at latest revision, 35 unresolved discussions, some commit checks failed.


include/grpc/impl/codegen/grpc_types.h, line 246 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

/** If set, channel tracing will be turned on. The integer value will set the
maximum trace nodes each tracer will track. The default is 10. */

The "If set, channel tracing will be turned on" implies that the default is 0, which is a bit unclear. I suggest the following alternative wording:

The maximum number of trace nodes to keep in the tracer for each channel or subchannel. The default is 10. If set to 0, channel tracing is disabled.

Done.


src/core/lib/channel/channel_tracer.c, line 48 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

typedef struct grpc_trace_node grpc_trace_node;
 
// One node of tracing data
struct grpc_trace_node {

Can combine these two statements into one:

typedef struct {
  // ...fields here...
} grpc_trace_node;

Done.


src/core/lib/channel/channel_tracer.c, line 192 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please add a comment indicating that this generates a string in RFC-3339 format.

Done.


src/core/lib/channel/channel_tracer.c, line 193 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

It looks like 20 characters should be enough, including the trailing NUL.

By my count, I think we need 31 with the null char (if we want to keep the same precision we have in timespec):

2017-03-27T14:11:06.234642000Z


src/core/lib/channel/channel_tracer.c, line 194 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

struct tm* tm_info;
tm_info = localtime((const time_t*)&tm.tv_sec);

Can combine these two lines into one.

Done.


src/core/lib/channel/channel_tracer.c, line 202 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Suggest calling this seen_tracers.

Done.


src/core/lib/channel/channel_tracer.c, line 202 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

typedef struct tracer_tracker tracer_tracker;
struct tracer_tracker {

typedef struct {
  // ...fields...
} tracer_tracker;

Done.


src/core/lib/channel/channel_tracer.c, line 209 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Suggest calling this seen_tracers_add().

Done.


src/core/lib/channel/channel_tracer.c, line 209 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Should we check to see that the tracer is not already in the list before we add it?

As of now I enforce that a tracer will never be doubly added in the code that calls seen_tracers_add().


src/core/lib/channel/channel_tracer.c, line 217 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Suggest calling this seen_tracers_check().

Done.


src/core/lib/channel/channel_tracer.c, line 225 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Instead of having a separate function to do this, how about changing grpc_json_link_child() to find the sibling itself if the parameter is NULL? In other words, the argument to grpc_json_link_child() would basically become a hint for optimization purposes, as opposed to being a required parameter.

Or, alternatively, we could add a function to the JSON library that returns the last child of a given node, and use that to pass in the sibling parameter to grpc_json_link_child().

Yet a third alternative would be to add a last_child field to the grpc_json struct, so that we always track the pointer to the last child. But I'm not sure if that would increase memory usage for any existing callers in an undesirable way.

Done.


src/core/lib/channel/channel_tracer.c, line 310 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

The gRFC doesn't say anything about a node called "children", so it's not quite clear to me how this is supposed to look.

Note that my suggestion from my last review about making the references more generic instead of being specific to subchannels wasn't necessarily intended to mean that we needed to change the output format -- it is possible for the code here to be more general than the output format supports. However, I'm fine with making the output more generic if you'd like to go that route.

If we do want to make the output format more generic, then I think it should be a flatter format, since our data structure doesn't necessarily guarantee that the tracers will be organized in a strict tree structure. For example, if two tracers A and B refer to the same tracer C, then C is a child of both A and B, so we would need to choose between listing it twice or simply choosing one of the two parents, neither of which seems like a very clear representation. A better alternative would be a flatter structure, where all tracers are provided in a single list, and the references between them are indicated by the "id" fields.

Anyway, there are a lot of options here, but I think we should document the desired format before we start trying to implement code that generates it.

As discussed, the generalized "children" design fits more closely with the channelz design, which is why I switched to it. It also fully supports our original, 2-level design as well. More on this topic is discussed in the overall review discussion.


src/core/lib/channel/channel_tracer.c, line 322 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Instead of using recursion here, I think we can use a simple iteration approach. That would look something like this:

seen_tracers_add(&tracers_seen, tracer);
for (size_t i = 0; i < tracers_seen.size; ++i) {
  dump_tracer(tracer, &tracers_seen, json);
}

The idea is that dump_tracer() may add additional tracers to tracers_seen as we go, in which case we'll wind up doing additional iterations of this loop.

This structure would lose a nice feature in that a channel data's children will only contain the subtracing information that the parent cares about. For example:

{
  nodes: {"sc 1 created", "sc 5 created", "sc 1 destroyed"}
  children: [
    "sc1":{...}, 
    "sc5":{...}
  ]
}

We can discuss these two structures more offline.


src/core/lib/channel/channel_tracer.c, line 326 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Why use indent of 0 here? Seems like it might be better to use 1, so that it's a bit more human readable.

Done.


src/core/lib/channel/channel_tracer.c, line 333 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

char* json_str = grpc_channel_tracer_get_trace(tracer);
grpc_json* json = grpc_json_parse_string(json_str);
char* fmt_json_str = grpc_json_dump_to_string(json, 1);

Why are we parsing the string just to dump it again? Is this just because of the indentation difference? If so, hopefully we can eliminate this by using the same indentation in both places.

This function has been removed


src/core/lib/channel/channel_tracer.h, line 51 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Looks like grpc_channel_tracer_add_trace() is still declared before the ctor, ref, and unref methods.

Also, please make sure to use the same declaration order in the .c and .h files.

Done.


src/core/lib/channel/channel_tracer.h, line 88 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

This logging function isn't actually related to refcounting. Would it make sense to use a separate preprocessor definition for this, or do you expect that this will only be used when debugging refcounting issues?

removed


src/core/lib/json/json.c, line 70 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Should we assert that brother->next == NULL before we set this value?

restructured the logic, and this was accomplished.


src/core/lib/json/json.h, line 52 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Suggest moving this down below the value field, since it's basically a modifier for how that field is handled.

Done.


src/core/lib/json/json.h, line 97 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Sexism nit: How about using the term "sibling" instead of "brother"?

Same thing throughout.

Done.


src/core/lib/json/json.h, line 97 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I think the parameters would make more sense in the order parent, child, sibling. The parent is the one we're adding to, so it goes first. The child is the one we're adding. And the sibling is a helper for linking purposes.

Done.


src/core/lib/json/json.h, line 100 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please document the parameters here.

Done.


src/core/lib/json/json.h, line 101 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Instead of having a single function that both creates the child and links it into the parent, how about just having a function that creates the child, and then having the caller pass the result of that function to grpc_json_link_child()? That approach seems like it offers more easily composable building blocks.

This function is used in src/core/lib/security/credentials/jwt/json_token.c. I'd like to keep the interface the same for now, and then make this change to the json lib in a future PR.

However, if you think this is the right time to make this change, I can do it.


src/core/lib/json/json.h, line 101 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

I think the parameters would make more sense in the order type, key, value, owns_value, so that they match the order in the struct.

Same answer as previous comment. Happy to enact this change, but then I will have to change a file that is unrelated to this project.


src/core/lib/json/json_string.c, line 313 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Is this something you just added for debugging and forgot to remove, or did you intend to keep it here?

Done.


src/core/lib/surface/channel.c, line 178 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

const

Done.


src/core/lib/surface/channel.c, line 179 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

The current channel args design allows the same arg to be specified more than once. So let's GPR_ASSERT that channel->tracer == NULL before we reassign it.

Also, if the channel arg is not specified at all (i.e., the caller just wants the default), then we will never execute this block, and no tracer will ever be created.

Done.


src/core/lib/surface/channel.c, line 180 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

If this returns 0, shouldn't we skip creating the tracer?

Done.


src/core/lib/surface/channel.c, line 181 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Might want to include the address of the channel object (pointer) as well, since there can be multiple channels open to the same target.

Changed to the uuid design we discussed


test/core/util/channel_tracing_utils.c, line 75 at r4 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

It would be good to also validate that each node has all of the expected fields. At minimum, we can at least check that the nodes have the right set of keys. But ideally, we should actually have the caller pass in an array of structs containing the values we expect to see for each node (minus things that we can't easily validate, like the timestamps), and then verify that the actual values match the expected values.

Done. All keys are checked for.


test/core/util/channel_tracing_utils.c, line 43 at r6 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Please use != NULL for both operands.

Actually, do we really need to check parent->child != NULL? It seems like we'd just return NULL in that case.

done


Comments from Reviewable

@ncteisen
Copy link
Contributor Author

Oops, I responded to the comments but forgot to push the commit...

@markdroth
Copy link
Member

Reviewed 5 of 53 files at r8, 50 of 50 files at r10.
Review status: all files reviewed at latest revision, 18 unresolved discussions, some commit checks failed.


src/core/lib/channel/channel_tracer.c, line 66 at r6 (raw file):

  gpr_refcount refs;
  gpr_mu tracer_mu;
  const char* id;

Even with a uuid for each object, I think we're still going to want a human-readable name for it, so that people can understand the data.


src/core/lib/channel/channel_tracer.c, line 193 at r6 (raw file):

Previously, ncteisen (Noah Eisen) wrote…

By my count, I think we need 31 with the null char (if we want to keep the same precision we have in timespec):

2017-03-27T14:11:06.234642000Z

Ah, right -- I'd somehow forgotten to include the fractional seconds.


src/core/lib/channel/channel_tracer.c, line 209 at r6 (raw file):

Previously, ncteisen (Noah Eisen) wrote…

As of now I enforce that a tracer will never be doubly added in the code that calls seen_tracers_add().

I think this is tied in with the discussion below about whether we use a recursive or iterative approach here.


src/core/lib/channel/channel_tracer.c, line 310 at r6 (raw file):

Previously, ncteisen (Noah Eisen) wrote…

As discussed, the generalized "children" design fits more closely with the channelz design, which is why I switched to it. It also fully supports our original, 2-level design as well. More on this topic is discussed in the overall review discussion.

I don't see any updates to the design gRFC that cover this.


src/core/lib/channel/channel_tracer.c, line 322 at r6 (raw file):

Previously, ncteisen (Noah Eisen) wrote…

This structure would lose a nice feature in that a channel data's children will only contain the subtracing information that the parent cares about. For example:

{
  nodes: {"sc 1 created", "sc 5 created", "sc 1 destroyed"}
  children: [
    "sc1":{...}, 
    "sc5":{...}
  ]
}

We can discuss these two structures more offline.

I don't understand this. Why would the iterative approach generate a different set of data than the recursive approach?

Let's definitely chat about this.


src/core/lib/channel/channel_tracer.c, line 201 at r10 (raw file):

}

typedef struct seen_tracers {

No need for seen_tracers on this line.


src/core/lib/channel/channel_tracer.c, line 251 at r10 (raw file):

    child = grpc_json_create_child(child, json, "uuid", uuid_str,
                                   GRPC_JSON_NUMBER, true);
    if (children && !seen_tracers_check(tracker, node->referenced_tracer)) {

children != NULL


src/core/lib/channel/channel_tracer.h, line 93 at r10 (raw file):

char* grpc_channel_tracer_render_trace(grpc_channel_tracer* tracer,
                                       bool recursive);
/* util functions that perform the uuid -> tracer step for you, and then

This description is a little unclear. What is "the uuid -> tracer step"? What is the return value? Does the caller take ownership of the result?


src/core/lib/json/json.h, line 101 at r6 (raw file):

Previously, ncteisen (Noah Eisen) wrote…

This function is used in src/core/lib/security/credentials/jwt/json_token.c. I'd like to keep the interface the same for now, and then make this change to the json lib in a future PR.

However, if you think this is the right time to make this change, I can do it.

I think we can do both: Provide the two composable functions, and then provide a wrapper that does the composition for you.


src/core/lib/json/json.h, line 101 at r6 (raw file):

Previously, ncteisen (Noah Eisen) wrote…

Same answer as previous comment. Happy to enact this change, but then I will have to change a file that is unrelated to this project.

I think it's fine to change the other file as part of this PR.


src/core/lib/json/json.h, line 99 at r10 (raw file):

grpc_json* grpc_json_link_child(grpc_json* parent, grpc_json* child, grpc_json* sibling);

/* Creates a child json object into the parent's json tree then links it in

What does it mean to "create into "?


src/core/lib/surface/channel.c, line 179 at r6 (raw file):

Previously, ncteisen (Noah Eisen) wrote…

Done.

It looks like we're still not creating the tracer if the channel arg is not specified at all.


test/core/channel/channel_tracing_test.c, line 54 at r10 (raw file):

}

// checks for the existence of all the required members of the tracer.

It doesn't look like the code actually does this. I think it currently just checks the values of num_nodes_logged and num_nodes.


test/core/util/channel_tracing_utils.c, line 75 at r4 (raw file):

Previously, ncteisen (Noah Eisen) wrote…

Done. All keys are checked for.

Let's at least add a TODO about checking the values.


Comments from Reviewable

@ncteisen
Copy link
Contributor Author

After attempting a few merges and even a rebase, I think the easiest course is to close this and start anew. There have been too many large changes and renaming to do a merge.

@ncteisen ncteisen closed this Dec 28, 2017
ncteisen added a commit to ncteisen/grpc that referenced this pull request Dec 28, 2017
Picks up work from grpc#10259.
A merge was impossible due to the many sweeping changed that
have occured since I last touched that PR (c++-ization, exec_ctx,
reorganitation of filters, etc).
@lock lock bot locked as resolved and limited conversation to collaborators Jan 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants