-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Makes Lens more tolerant of missing localEndpoint.serviceName or .name #2834
Conversation
Should we add the offending trace to the folder of sample data? |
horrible-trace.json? :D |
2d46a79
to
a209cea
Compare
PS it is tempting to make the span-cleaner fill empty names and such, but it actually would spread debt into other things like clock skew and dependency linker which looks for presence of service names. Instead, I made it a goal to make the white screen go away. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bungle @jeremyjpj0916 the comments on this part of the review are some hints about data problems for Kong/kong-plugin-zipkin#49 Hope they help.
"parentId": "d65495313f944801", | ||
"id": "d5cb70daead478dc", | ||
"name": "kong.header_filter", | ||
"timestamp": 1570005203864000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should have been an annotation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you expand on this a little bit? The Zipkin docs make it very clear that the timestamp
field exists, and also say:
The bottom-line is that choosing not to record Span.timestamp and duration will result in less accurate data and less functionality. Since it is very easy to record these authoritatively before reporting, all Zipkin instrumentation should do it or ask someone to help them do it.
Wouldn't removing the timestamp
field making Zipkin less accurate?
Also: can I name an annotation timestamp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what's missing here is that a span can include annotations.. annotations have a required timestamp. So instead of making a sub-span that only includes a name and a timestamp, make an annotation with that name and timestamp.
ex
{
"traceId": "6cf5624e3452d1bc202a57dad35921c0",
"parentId": "ed395cb866d02665",
"id": "d65495313f944801",
"kind": "CLIENT",
"name": "get /users/{userId}",
"timestamp": 1570005203829000,
"duration": 35000,
--snip--
"annotations": [
{"name": "kong.header_filter", "timestamp": 1570005203864000}
]
"parentId": "d65495313f944801", | ||
"id": "80b8efd52d66af27", | ||
"name": "kong.body_filter", | ||
"timestamp": 1570005203864000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should have been an annotation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are spans in Kong; I recall that when they execute in under a millisecond then they appear as a 0-length span in zipkin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minimum duration is 1us, so prefer to use that vs rounding down to zero, especially if precision is millis (1000us) on the kong side. Zero implies incomplete.
The main thing is to differentiate annotations (timestamped events on a timeline) from spans (some wrapped thing with a duration).
"parentId": "ed395cb866d02665", | ||
"id": "2a49858b1f1d506b", | ||
"name": "kong.rewrite", | ||
"timestamp": 1570005203829000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should have been an annotation
"traceId": "6cf5624e3452d1bc202a57dad35921c0", | ||
"parentId": "d65495313f944801", | ||
"id": "3e53779bdde7c111", | ||
"name": "kong.access", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is missing localEndpoint.serviceName or this should be an annotation, and the duration moved to where it is missing..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looking at the examples in Zipkin's source code, I think this means that the span should look like so:
{
"traceId": "6cf5624e3452d1bc202a57dad35921c0",
"parentId": "d65495313f944801",
"id": "3e53779bdde7c111",
"name": "kong.access",
"localEndpoint": {
"serviceName": "<name of the Kong Service matched for this Request>",
},
"timestamp": 1570005203829000,
"duration": 16000
},
I think I can try to add that. But I have questions:
- Is
localEndpoint.serviceName
also needed in the other spans, or just on this one? - What does "the duration moved to where it is missing" mean? That the other spans should also have a duration? That the
duration
field on this particular span should be put in a different place on this span (where)? Or is it both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the client span should be complete, that's the most important one, but all spans should be complete. The subspans, some of them have duration. It was unclear to me why these are modeled as spans vs events.
all spans should have the exact same localEndpoint value if coming from the same node. Most of the time folks just take the link local IP, the advertised port and service name and treat that as a constant.
"tags": { | ||
"kong.route": "aaa20d62-813a-428f-aa40-447324cd30af", | ||
"kong.service": "872eae11-4571-490b-86d2-59834b93f870", | ||
"peer.hostname": "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't add empty hostname
"traceId": "6cf5624e3452d1bc202a57dad35921c0", | ||
"id": "ed395cb866d02665", | ||
"kind": "SERVER", | ||
"name": "kong.request", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bad span name. prefer route-based ex get /users/{userId}
or even get
as kong is already tagged
"parentId": "ed395cb866d02665", | ||
"id": "d65495313f944801", | ||
"kind": "CLIENT", | ||
"name": "kong.proxy", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bad span name. prefer route-based ex get /users/{userId} or even get as kong isn't helpful
what you are describing here, seems to be better as a tag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this apply to the other span names as well? (kong.access
, kong.header_filter
, kong.body_filter
, kong.rewrite
, kong.access
) or is it just kong.proxy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the client span should be semantically correct for http. It is better to have a coherent client span than any of the other spans mentioned. As you noticed, they didn't render and actually made the diagram more confusing vs if they didn't exist at all. My advice is probably use one client span and render the rest with annotations if it is commonly the case you cannot record a duration of precision less than 1ms.
"component": "kong", | ||
"http.method": "GET", | ||
"http.status_code": "200", | ||
"http.url": "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't add empty url
"port": 3771 | ||
}, | ||
"tags": { | ||
"component": "kong", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conventionally zipkin uses "lc" for "local component"
"parentId": "d65495313f944801", | ||
"id": "862d0f978033cc53", | ||
"name": "kong.balancer", | ||
"timestamp": 1570005203845000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing duration. probably better as an annotation named kong.balancer.try1
as that should be bounded, unless there's a chance of infinite tries. That or make this the client span if this is the actual wire request.. hard to tell what's going on (might need a wiki)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @adriancole, I am trying to fix the issues reported on the Kong plugin. Thanks for reporting the problems to us. I left some questions on this PR, do you think you could answer those?
"parentId": "d65495313f944801", | ||
"id": "d5cb70daead478dc", | ||
"name": "kong.header_filter", | ||
"timestamp": 1570005203864000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you expand on this a little bit? The Zipkin docs make it very clear that the timestamp
field exists, and also say:
The bottom-line is that choosing not to record Span.timestamp and duration will result in less accurate data and less functionality. Since it is very easy to record these authoritatively before reporting, all Zipkin instrumentation should do it or ask someone to help them do it.
Wouldn't removing the timestamp
field making Zipkin less accurate?
Also: can I name an annotation timestamp
?
"traceId": "6cf5624e3452d1bc202a57dad35921c0", | ||
"parentId": "d65495313f944801", | ||
"id": "3e53779bdde7c111", | ||
"name": "kong.access", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looking at the examples in Zipkin's source code, I think this means that the span should look like so:
{
"traceId": "6cf5624e3452d1bc202a57dad35921c0",
"parentId": "d65495313f944801",
"id": "3e53779bdde7c111",
"name": "kong.access",
"localEndpoint": {
"serviceName": "<name of the Kong Service matched for this Request>",
},
"timestamp": 1570005203829000,
"duration": 16000
},
I think I can try to add that. But I have questions:
- Is
localEndpoint.serviceName
also needed in the other spans, or just on this one? - What does "the duration moved to where it is missing" mean? That the other spans should also have a duration? That the
duration
field on this particular span should be put in a different place on this span (where)? Or is it both?
"parentId": "ed395cb866d02665", | ||
"id": "d65495313f944801", | ||
"kind": "CLIENT", | ||
"name": "kong.proxy", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this apply to the other span names as well? (kong.access
, kong.header_filter
, kong.body_filter
, kong.rewrite
, kong.access
) or is it just kong.proxy
?
See #2829