Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Add tracing instrumentation #289

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

slashmo
Copy link

@slashmo slashmo commented Aug 4, 2020

@swift-server-bot
Copy link

Can one of the admins verify this patch?

3 similar comments
@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@swift-server-bot
Copy link

Can one of the admins verify this patch?

@slashmo
Copy link
Author

slashmo commented Aug 4, 2020

FYI, this is how a request currently looks like in my example project:

InstrumentationSystem.bootstrap(FakeTracer())

let client = HTTPClient(eventLoopGroupProvider: .createNew)
var context = BaggageContext()
// Extracted from a prior request
context.fakeTraceID = "123456789"

var request = try HTTPClient.Request(url: "https://httpbin.org/anything", method: .POST, body: .string(#"{"hello":"World"}"#))
request.headers.add(name: "Content-Type", value: "application/json")

let response = try client
    .execute(request: request, context: context)
    .wait()

In the background, FakeTracer was automatically used to create (and end) a Span, with the following attributes being set:

[
  "http.request_content_length": TracingInstrumentation.SpanAttribute.int(17),
  "http.status_code": TracingInstrumentation.SpanAttribute.int(200),
  "http.target": TracingInstrumentation.SpanAttribute.string("/anything"), 
  "http.host": TracingInstrumentation.SpanAttribute.string("httpbin.org"), 
  "http.status_text": TracingInstrumentation.SpanAttribute.string("OK"),
  "http.scheme": TracingInstrumentation.SpanAttribute.string("https"),
  "http.response_content_length": TracingInstrumentation.SpanAttribute.int(433),
  "http.method": TracingInstrumentation.SpanAttribute.string("POST")
]

It also added a trace header to the request containing the fakeTraceID.

Comment on lines +235 to +236
public func get(url: String, context: BaggageContext, deadline: NIODeadline? = nil) -> EventLoopFuture<Response> {
return self.execute(.GET, url: url, context: context, deadline: deadline)
Copy link

@pokryfka pokryfka Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for backward compatibility, the context should be optional or have default (static?) value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, contexts should never be defaulted to nothing/empty - that leads to fragile APIs where dropping the traces becomes the default operation, rather than doing the right thing -- keep passing along contexts. We want our APIs to help people do the right thing. Note in a same way, Go's context values must never be defaulted to empty, we're in the same boat.

It's another discussion how and when this PR will be merged though; I do strongly believe contexts must not be optional, as that leads to breaking traces accidentally (been there, done that). See discussions in https://forums.swift.org/t/server-distributed-tracing/37464/10 and the guidelines in https://github.com/slashmo/gsoc-swift-baggage-context#argument-namingpositioning

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. This will, however, affect integration in a big way.

Please see my comment about an integration branch below.

Copy link
Contributor

@ktoso ktoso Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure it will, it breaks API, we're aware.

My wording here was not very nice, sorry about that.

What I mean that this is a PoC branch and it's breaking here. An adoption plan will be laid forward, see https://forums.swift.org/t/the-context-passing-problem/39162/27 (AHC plan incomming there)

deadline: NIODeadline? = nil,
logger originalLogger: Logger?) -> Task<Delegate.Response> {
let logger = (originalLogger ?? HTTPClient.loggingDisabled).attachingRequestInformation(request, requestID: globalRequestID.add(1))
// TODO: net.peer.ip / Not required, but recommended
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktoso @slashmo could this is be propagated in baggage shared by tracers (??)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that its typically propagated in X-Forwarded-For header

@pokryfka
Copy link

pokryfka commented Aug 4, 2020

Awesome 👍

@pokryfka
Copy link

pokryfka commented Aug 4, 2020

@ktoso

do you think its possible and makes sense to create feature/instrumentation integration branch in async-http-client?

@ktoso
Copy link
Contributor

ktoso commented Aug 4, 2020

do you think its possible and makes sense to create feature/instrumentation integration branch in async-http-client?

Maybe an experimental/tracing branch could be a good idea indeed... so people can more earily try things out as we work on this, and before it actually fully lands hm (esp since it has API impact).

WDYT @Lukasa @artemredkin @tomerd ?

public func delete(url: String, deadline: NIODeadline? = nil, logger: Logger) -> EventLoopFuture<Response> {
return self.execute(.DELETE, url: url, deadline: deadline, logger: logger)
public func delete(url: String, context: BaggageContext, deadline: NIODeadline? = nil) -> EventLoopFuture<Response> {
return self.execute(.DELETE, url: url, context: context, deadline: deadline)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's an interesting question here: so... deadline was defaulted in the public API, but on execute it's not...
Technically our rules say "context last, before defaulted" so... Technically this would mean self.execute(.DELETE, url: url, deadline: deadline, context: context) I wonder if it matters / if it is worth it to be strict about this hm...

span.attributes.http.host = request.host
if let requestContentLength = request.body?.length {
span.attributes.http.requestContentLength = requestContentLength
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is http.url attribute in OTel example

http.url "https://example.com:8080/webshop/articles/4?s=1"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just reread the Otel definition for HTTP client spans. Your correct, the combination of host/scheme/target is not their recommended way for client spans: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/http.md#http-client

One of the following sets of attributes is required (in order of usual preference unless for a particular web client/framework it is known that some other set is preferable for some reason; all strings must be non-empty):

  • http.url
  • http.scheme, http.host, http.target
  • http.scheme, net.peer.name, net.peer.port, http.target
  • http.scheme, net.peer.ip, net.peer.port, http.target

Therefore I'll be replacing scheme/host/target with the full url. Thanks for pointing this out 👍

@Lukasa
Copy link
Collaborator

Lukasa commented Aug 5, 2020

Maybe an experimental/tracing branch could be a good idea indeed... so people can more earily try things out as we work on this, and before it actually fully lands hm (esp since it has API impact).

I'm fine with that. By the way, right now this patch doesn't just have API impact, it's API breaking. I'd like to find a way to avoid that if we can.

@slashmo
Copy link
Author

slashmo commented Aug 6, 2020

I'd like to find a way to avoid that if we can.
@Lukasa

We briefly chatted about the reasoning of having it API breaking here: slashmo/gsoc-swift-baggage-context#8

@Lukasa
Copy link
Collaborator

Lukasa commented Aug 6, 2020

I don't think that conversation does a good job of making the case.

@ktoso
Copy link
Contributor

ktoso commented Aug 7, 2020

The topic is a bit deeper than "this breaks API so it's bad", it is what is the end goal we want to head towards, and if we agree on this, we can talk more how we get there. This was touched upon in various places but as Tanner just opened up https://forums.swift.org/t/the-context-passing-problem/39162 which also relates to this, perhaps we use that forums thread to discuss this again, and then decide how to roll changes into the client here -- sounds good?

I do not think the end goal should be "context is optional" that causes all kinds of headaches in "where did I drop my trace / context" that are hard to debug (been there, it's a pain), and confirmed by others in: https://forums.swift.org/t/server-distributed-tracing/37464/28?u=ktoso (last replies, and also lessons learnt from how the Go context works and is used IMO); and here where we're in agreement with Johannes that those APIs will change in the future #227 (comment)

Long story short: Yes, context parameter replaces logger parameter, because of the insistence of the "pass a logger" yet context is the actual object that has tracing metadata so so it is the actual thing that needs passing around more than a logger even, but since we also want to pass loggers, we need e.g. a composite context protocol, like designed here: slashmo/gsoc-swift-baggage-context#9 - so we can pass one parameter, and not logger + context all the time. I believe it was very much known up front with Johannes's work and merging the logger parameter that there would have to be an API breaking change once tracing/logging is "done", I talked with Johannes about this at length back then.

As mentioned above, let's first continue in the thread that Tanner kicked off, agree there and loop back here, sounds good?

@ktoso ktoso changed the title Add tracing instrumentation [DO NOT MERGE] Add tracing instrumentation Aug 7, 2020
@ktoso
Copy link
Contributor

ktoso commented Aug 7, 2020

I added a do-not-merge marker, perhaps that was partially the confusion here as well? This PR is not intended to be merged as is.

@@ -64,7 +64,8 @@ extension HTTPClientTests {
("testNoResponseWithIgnoreErrorForSSLUncleanShutdown", testNoResponseWithIgnoreErrorForSSLUncleanShutdown),
("testWrongContentLengthForSSLUncleanShutdown", testWrongContentLengthForSSLUncleanShutdown),
("testWrongContentLengthWithIgnoreErrorForSSLUncleanShutdown", testWrongContentLengthWithIgnoreErrorForSSLUncleanShutdown),
("testEventLoopArgument", testEventLoopArgument),
// TODO: Comment back in once failure was resolved
// ("testEventLoopArgument", testEventLoopArgument),
Copy link
Author

@slashmo slashmo Aug 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs investigation as it currently leads to a failing precondition. Commented out, for now, to be able to run the full test-suite.

@@ -88,7 +89,8 @@ extension HTTPClientTests {
("testUncleanShutdownCancelsTasks", testUncleanShutdownCancelsTasks),
("testDoubleShutdown", testDoubleShutdown),
("testTaskFailsWhenClientIsShutdown", testTaskFailsWhenClientIsShutdown),
("testRaceNewRequestsVsShutdown", testRaceNewRequestsVsShutdown),
// TODO: Comment back in once failure was resolved
// ("testRaceNewRequestsVsShutdown", testRaceNewRequestsVsShutdown),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs investigation as it currently leads to a failing assertion. Commented out, for now, to be able to run the full test-suite.

Comment on lines +152 to +157
extension SpanStatus {
/// Map status code to canonical code according to OTel spec
///
/// - SeeAlso: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/http.md#status
init(_ responseStatus: HTTPResponseStatus) {
switch responseStatus.code {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think mapping of HTTP status code (UInt) to SpanStatus is very much reusable and as such should be provided in TracingInstrumentation, sth like

SpanStatus(code: UInt, message: String?)

otherwise each library making HTTP calls and not using AHC will need to map it on its own

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, this reminded me I had an not submitted review here, yeah this seems like a good candidate to move up into OpenTelemetryInstrumentationSupport/OpenTelemetrySemanticConventions 👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or not such a good candidate after all 😁👉 slashmo/gsoc-swift-tracing#134

Copy link
Contributor

@ktoso ktoso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(seems I forgot to hit submit on a review here I did a while ago, submitted)

@@ -88,7 +88,7 @@ extension HTTPClientTests {
("testUncleanShutdownCancelsTasks", testUncleanShutdownCancelsTasks),
("testDoubleShutdown", testDoubleShutdown),
("testTaskFailsWhenClientIsShutdown", testTaskFailsWhenClientIsShutdown),
("testRaceNewRequestsVsShutdown", testRaceNewRequestsVsShutdown),
// ("testRaceNewRequestsVsShutdown", testRaceNewRequestsVsShutdown),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

self = SpanStatus(canonicalCode: .unknown, message: responseStatus.reasonPhrase)
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this should move to OpenTelemetryInstrumentationSupport?

Btw, been thinking if that should be called OpenTelemetrySemanticConventions?

@ktoso ktoso changed the base branch from master to main August 20, 2020 01:30
@ktoso
Copy link
Contributor

ktoso commented Aug 20, 2020

Just as a heads up, the main development branch has been changed to main, following the Swift policy on this.

This PR has been re-targeted to main and should just work. However when performing rebases etc please keep this in mind -- you may want to fetch the main branch and rebase onto the main branch from now on since master is not up-to-date anymore and is going to be deleted shortly.

Comment on lines 2647 to 2725
let span = TestSpan(operationName: operationName,
kind: kind,
startTimestamp: timestamp,
context: context.baggage)
recordedSpans.append(span)
return span
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the context should be updated, see slashmo/gsoc-swift-tracing#126 (comment)

var request = request
InstrumentationSystem.instrument.inject(span.context.baggage, into: &request.headers, using: HTTPHeadersInjector())

let logger = context.logger.attachingRequestInformation(request, requestID: globalRequestID.add(1))
Copy link

@pokryfka pokryfka Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktoso @slashmo

this updates an instance of logger,
should it not set requestId in baggage (which then would be used to setup logger)?

otherwise the requestId metadata will be lost in the logger passed via context

Copy link

@pokryfka pokryfka Aug 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(copied from soto-project/soto-core#348 (comment))

how about sth like:

    private struct RequestMetadata: CustomStringConvertible {
        var requestId: Int
        var service: String
        var operation: String

        var description: String {
            "aws-request-id=\(requestId) aws-service=\(service) aws-operation=\(operation)"
        }
    }
    
    private enum RequestKey: BaggageContextKey {
        typealias Value = RequestMetadata
        var name: String { "AWSRequest" }
    }

then updating the baggage like:

context.baggage[RequestKey.self] = RequestMetadata(requestId: Self.globalRequestID.add(1),
    service: serviceConfig.service,
    operation: "signURL")

instead of updating the logger directly (current):

context.logger = context.logger.attachingRequestId(Self.globalRequestID.add(1), operation: "signURL", service: serviceConfig.service)

@slashmo slashmo force-pushed the feature/instrumentation branch 3 times, most recently from 45e6adf to 577f205 Compare October 1, 2020 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Record HTTP requests made by AsyncHTTPClient Make BaggageContext part of AsyncHTTPClient API
5 participants