
tls instrumentation in @opentelemetry/instrumentation-net tries to operate on closed span #1775

Closed
Zirak opened this issue Nov 6, 2023 · 4 comments · Fixed by #1819
Labels
bug Something isn't working pkg:instrumentation-net priority:p4 Bugs and spec inconsistencies which do not fall into a higher prioritization

Comments

@Zirak
Contributor

Zirak commented Nov 6, 2023

What version of OpenTelemetry are you using?

  • @opentelemetry/auto-instrumentations-node@0.39.4
  • @opentelemetry/sdk-node@0.44.0

What version of Node are you using?

Reproduced on:

  • 18.18.2
  • 20.9.0

What did you do?

When instrumented code uses fetch in a non-idiomatic manner, the tls instrumentation attempts to perform an operation on an already-ended tls.connect span.

For example, instrumenting the following code:

fetch('https://example.com').then(() => {
    // A
    console.log('got a response');
});

// let the fetch timeout, hitting B
setTimeout(() => {}, 10_000);

The fetch times out since the response is not properly handled.

Point A is called when the CONNECT event is emitted, handled inside otel here:

const otelTlsSpanListener = () => {
  const peerCertificate = socket.getPeerCertificate(true);
  const cipher = socket.getCipher();
  const protocol = socket.getProtocol();
  const attributes = {
    [TLSAttributes.PROTOCOL]: String(protocol),
    [TLSAttributes.AUTHORIZED]: String(socket.authorized),
    [TLSAttributes.CIPHER_NAME]: cipher.name,
    [TLSAttributes.CIPHER_VERSION]: cipher.version,
    [TLSAttributes.CERTIFICATE_FINGERPRINT]: peerCertificate.fingerprint,
    [TLSAttributes.CERTIFICATE_SERIAL_NUMBER]: peerCertificate.serialNumber,
    [TLSAttributes.CERTIFICATE_VALID_FROM]: peerCertificate.valid_from,
    [TLSAttributes.CERTIFICATE_VALID_TO]: peerCertificate.valid_to,
    [TLSAttributes.ALPN_PROTOCOL]: '',
  };
  if (socket.alpnProtocol) {
    attributes[TLSAttributes.ALPN_PROTOCOL] = socket.alpnProtocol;
  }
  tlsSpan.setAttributes(attributes);
  tlsSpan.end();
};

Point B is called when the ERROR event is emitted, handled here:

const otelTlsErrorListener = (e: Error) => {
  tlsSpan.setStatus({
    code: SpanStatusCode.ERROR,
    message: e.message,
  });
  tlsSpan.end();
};

First A is emitted, setting some attributes and ending the span. Then B is hit, which attempts to set a status and close the span. That's invalid, since the span has already ended. With OTEL_LOG_LEVEL set to info, it prints something like the following:

Can not execute the operation on ended Span {traceId: 05b3d14d94bff06cd612b28e3df51afe, spanId: 0474f505a765a41f}
Can not execute the operation on ended Span {traceId: 05b3d14d94bff06cd612b28e3df51afe, spanId: 0474f505a765a41f}
tls.connect 05b3d14d94bff06cd612b28e3df51afe-0474f505a765a41f - You can only call end() on a span once.
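The end-once rule behind those log lines can be demonstrated in isolation with a minimal mock span (a hypothetical class for illustration, not the real OpenTelemetry SDK implementation):

```javascript
// Minimal mock span illustrating the end-once rule (hypothetical class,
// not the actual OpenTelemetry SDK span).
class MockSpan {
  constructor() {
    this.ended = false;
    this.warnings = [];
  }
  setStatus(status) {
    if (this.ended) {
      this.warnings.push('Can not execute the operation on ended Span');
      return this;
    }
    this.status = status;
    return this;
  }
  end() {
    if (this.ended) {
      this.warnings.push('You can only call end() on a span once.');
      return;
    }
    this.ended = true;
  }
}

const span = new MockSpan();
span.end();                   // A: the connect handler ends the span
span.setStatus({ code: 2 });  // B: the error handler then tries to set a status...
span.end();                   // ...and to end the span a second time
console.log(span.warnings);   // both B operations were rejected
```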

What did you expect to see?

The tls instrumentation handles these scenarios without stepping on its own toes.

What did you see instead?

When a fetch times out, the tls instrumentation tries to perform operations on an ended span.

Possible solutions

A couple of options (of course, more are possible):

  1. Clear the event listeners inside the connect handler.
  2. Wrap the entire tls connection in a span of its own, with tls.connect as a child span; subsequent errors are recorded on the longer-lived parent span.

I'm more than willing to create a follow-up PR for this.

@Zirak Zirak added the bug Something isn't working label Nov 6, 2023
@pichlermarc pichlermarc added pkg:instrumentation-net priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect priority:p4 Bugs and spec inconsistencies which do not fall into a higher prioritization and removed priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect labels Nov 8, 2023
@pichlermarc
Member

Thanks for reporting this! I'm not deeply familiar with this instrumentation, but if option 1 can be done without impacting how the currently exported telemetry looks, I'd prefer it. 🙂

Assigning p4 as it's primarily log spam; am I right with that assumption, or are you also seeing that the exported span is incorrect? That would bump it to p2.

@Zirak
Contributor Author

Zirak commented Nov 10, 2023

it looks like if option 1 can be done without it impacting the way currently exported telemetry looks, I'd prefer it.

Yes, that's what it looks like; since the span is already ended, the existing behaviour couldn't have changed it anyway. Bigger-picture, it may be preferable to have the error recorded on some span; I don't currently recall whether that's already the case.

I'll open a PR soon to implement this behaviour; it's a simple one-liner.

Assigning p4 as it's primarily logspam; am I right with that assumption?

That's correct, it's slightly annoying when reading logs, but nothing bigger than that.

Thanks!

@ccoqueiro

Hello guys, has this fix made it into the latest releases? Thanks.

@JamieDanielson
Member

Hello guys, has this fix made it into the latest releases? Thanks.

@ccoqueiro It should be available now in v0.32.4. In case it helps, I found this by clicking through to the PR and then to the commit that merged into main; that view shows the tags containing the commit. Contrast that with a commit from a more recent PR, which will only show main and no release tags. Hope this helps!
