Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support XRay features (Sql, Cause, aws) in Encoder #59

Merged
merged 12 commits into from
Dec 4, 2017

Conversation

cemo
Copy link
Contributor

@cemo cemo commented Nov 4, 2017

No description provided.

@cemo
Copy link
Contributor Author

cemo commented Nov 4, 2017

Please don't merge yet. I am still testing some cases since span.remoteServiceName() might be null in some cases.

@cemo cemo force-pushed the b-xray-subsegment-naming branch 2 times, most recently from 144e823 to 493f549 Compare November 6, 2017 15:09
@cemo cemo force-pushed the b-xray-subsegment-naming branch 3 times, most recently from cd6c570 to 97c3223 Compare November 7, 2017 18:11
@cemo cemo force-pushed the b-xray-subsegment-naming branch 2 times, most recently from 27dfb33 to c3090d9 Compare November 8, 2017 07:20
Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like good work. We are accumulating some tech-debt. Can you try to add a unit test? for example, in the test instantiate a zipkin2.Span with the input data and test the output json on string equality (ex assertThat(json).isEqualTo("{...}").

@@ -55,8 +57,13 @@
|| span.kind() != Span.Kind.SERVER && span.kind() != Span.Kind.CONSUMER) {
writer.name("type").value("subsegment");
if (span.kind() != null) writer.name("namespace").value("remote");
writer.name("name").value(span.remoteServiceName() == null ? "" : span.remoteServiceName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to write empty name? or is leaving out name tolerable? If possible, I'd prefer to not write placeholder values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will test it and let you know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when it is null or empty string, it ignores span so it does not appear at all. There is two possibility when it is null or empty:

  1. name it "unknown"
  2. ignore span

Which one seems better for you @adriancole ?

}
writer.name("name").value(span.localServiceName());
// override with the user remote tag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do it like this. we are setting the value of the xray field "namespace" to the value of a zipkin key "remote".

per amazon "Any calls to a remote service/broker should be namespace as ‘remote’. See ‘Optional Subsegment Fields’ under http://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-subsegments."

So, we should automatically default "namespace" to "remote" when it is a downstream PRODUCER or CLIENT span. If you look in the docs, the other supported choice is "aws". We don't support that in brave, yet, but we could.. in that case, I'd recommend using the zipkin key "xray.namespace" as a way to conditionally override.

make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xray.namespace sound good but all other variables are using something similar to JSON path dot notation.

I will change it as you suggest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I instrument a method (It might not be available at the time of this writing), what kind of span it will be? Since it is not connecting something else, I have some doubts on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im referring about a tag key ex span.tag("xray.namespace", "aws"). Reporter only sends complete spans.. how would we have a problem in routine use? For example there are only two values remote (we know) and aws (solved as soon as we write aws sdk instrumentation or integrate with it). Users will not do this manually in other words

import java.io.IOException;
import okio.Buffer;

public final class XRayFormatter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docs to how you came up with the format.. also make the type not public unless required to be. If required to be, mention why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a utility class, basically brave.SpanCustomizer#tag is using <String, String> we can not pass an exception to it. So I had to prepare JSON to be passed to subsegments. Here is the place which is used https://github.com/openzipkin/zipkin-aws/pull/59/files#diff-5669c8fcacad45292f4bc1d457640d24R243. This is the dirtiest place since it is not using JsonWriter. Could you review this part again please? If you are agree with this solution I will come up some documents regarding your notes before you stated.

Here is the relevant docs on XRay as well: The structure is deeply nested and there is not an easy solution with the current state.
http://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-errors
I had also decompiled XRay source codes and found that place overly complicated.

That formatter class is not something very nice but it gives some information about exception and also let the user to use another logic.

@cemo
Copy link
Contributor Author

cemo commented Nov 8, 2017

I will add more test regarding Encoder.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Nov 9, 2017 via email

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Nov 9, 2017 via email

*
* Formating an exception to be consumed by XRay by Spans annotations
*
* @param exceptionId an unique exception id @see brave.internal.Platform#randomLong()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an ID? Sometimes I have seen hashes for this .. ex hash of the stack trace. Either way i would hide this type, using internally in aws specific http and sql parsers. Hard to change public types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc: id – A 64-bit identifier for the exception, unique among segments in the same trace, in 16 hexadecimal digits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably can reuse span ID as zipkin instrumentation only record one error tag per span

* Formating an exception to be consumed by XRay by Spans annotations
*
* @param exceptionId an unique exception id @see brave.internal.Platform#randomLong()
* @param isRemote Any calls to a remote service/broker should be true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presume applies to client and server side?

Copy link
Contributor Author

@cemo cemo Nov 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc: remote – boolean indicating that the exception was caused by an error returned by a downstream service.

*
* @param exceptionId an unique exception id @see brave.internal.Platform#randomLong()
* @param isRemote Any calls to a remote service/broker should be true
* @param maxStackTraceElement XRay has a limit of 60KB per segment. Choose a wise number
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be global configurable as ex transports have limits that can be much smaller

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I configure this number since the method is static? Should I declare another static variable can be changeable? Or should I make it static class and force users to create an instance of it?

.shared(false)
.build();

byte[] bytes = UDPMessageEncoder.doEncode(span);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit i would extract a method that only encodes span json as that lets us in future do json asserts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got your point. I made it JSON assertions friendly.

Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to defer exception mapping as it is complex to map and includes state assumptions.. it is better imho to get the easier stuff done and landed before harder stuff. What do you think? Punt the stack trace parsing to a separate issue (which we can follow up on)

@@ -121,11 +201,56 @@
writer.endObject();
}

Integer errorStatus = httpResponseStatus;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have standard http tag for this. http.status_code I believe. For input tags, reuse zipkin ones and namespace (ex xxxx.foo) where exist

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an http error and available http.status_code, I am using it. But there might be some other errors as well. For example a DB exception or something else. For that part our instrumentation codes must provide error_status.

Copy link
Contributor Author

@cemo cemo Nov 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For input tags, reuse zipkin ones and namespace (ex xxxx.foo) where exist

I could not get this, can you expand please

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly it is getting cognitively difficult to follow things. Let's try not to move ahead of the ball, as there's an advantage in x-ray which is that it is model based. Instead of creating error_status which isn't defined anywhere in zipkin, for possible correlated use between http and sql, perhaps define them separately, in namespaces such as http.status_code (and whatever it is for sql, ex sql.status_code). Eventhough I have more time than others I'm having trouble following this, so lets try and keep it organized as possible and using zipkin conventions whereever, and without support for hypothetical things (ex only code things you have a test case and actual brave instrumentation to support)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be other services can cause status_code. How can I add status_code for a redis call or a method instrumentation. If you believe that it is confusing. Let's
change error_status to xray.error_status since it is a top level property of Subsegments.

@codefromthecrypt
Copy link
Member

By the way, it is indeed easier to review seeing things applied. I think we were discussing making a brave-instrumentation-xray module which would produce the tags consumed here. Seeing together makes it easier to tell style impact. How do you feel about creating that as a maven module? Would you need a hand? Cc @devinsba

@codefromthecrypt
Copy link
Member

in zipkin, a generic error is the tag "error" and the UI pays attention to it. for redis, it would likely be redis.error_status if there is something more specific. Lets' please stick to things concrete, it will be more smooth for both of us to only focus on things we can test

@cemo
Copy link
Contributor Author

cemo commented Nov 9, 2017

OK, I removed it as well. It is better to address it in a separated issue.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Nov 9, 2017 via email

@cemo
Copy link
Contributor Author

cemo commented Nov 9, 2017

No problem @adriancole. It was my pleasure. You are leading this community, the things we are doing nothing when it is compared with your work. I am just new to this land and had some difficulties to grab context. :)

@cemo cemo changed the title Fixed xray subsegment naming Support XRay features in Encoder Nov 10, 2017
@cemo cemo changed the title Support XRay features in Encoder Support XRay features (Sql, Cause, aws) in Encoder Nov 10, 2017
@cemo
Copy link
Contributor Author

cemo commented Nov 10, 2017

@adriancole is there anything else I can provide for this pr?

Copy link
Member

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One test nit. Please add http server and client tests based on data brave produces.

Also remove the error section as it isnt tested or referenced yet. Reintroduce it later.

Good stuff will merge after above

.newBuilder()
.kind(Span.Kind.CLIENT)
.name("test-cemo")
.remoteEndpoint(Endpoint.newBuilder().build())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty endpoint is not normal. Better left unset

}

@Test
public void doEncodeUnkown() throws Exception {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe encode_nameAsUnknownWhenRemoteServiceNameNull

@codefromthecrypt
Copy link
Member

hi, @cemo sorry if I've asked a lot. Let me know if you need a hand wrapping up last things I mentioned.

@cemo
Copy link
Contributor Author

cemo commented Nov 20, 2017

No problem but I could not find time to complete. I would be glad if you polish and deliver it.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Nov 20, 2017 via email

@codefromthecrypt codefromthecrypt merged commit d282aca into openzipkin:master Dec 4, 2017
@codefromthecrypt
Copy link
Member

I will do some polishing post-merge.. promise

// using "unknown" subsegment name will help to detect missing names
writer.name("name").value(span.remoteServiceName() == null ? "unknown" : span.remoteServiceName());
}else{
writer.name("name").value(span.localServiceName());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey, I am now getting all my traces labelled as "unkown", previously they used to obey the localServiceName(). How is remoteServiceName supposed to be set? I usually set the localServiceName through Tracing.newBuilder().localServiceName, but there is no similar method for remote service name.

@codefromthecrypt
Copy link
Member

codefromthecrypt commented Dec 5, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants