Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(tracing): make spans resilient to performance clock drift #3434

Merged
merged 31 commits into from
Jan 11, 2023
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
7083983
fix(tracing): make spans resilient to performance clock drift
dyladan Nov 21, 2022
5794fbc
Fix changelog
dyladan Nov 21, 2022
7890388
Do not export getTimeOrigin
dyladan Nov 21, 2022
a764b2b
Apply shift to shim spans
dyladan Nov 22, 2022
e991a6c
Lint
dyladan Nov 22, 2022
4405fc0
Lint
dyladan Nov 22, 2022
ee10895
Merge branch 'main' into simple-fix-timestamps
dyladan Nov 22, 2022
27d72df
Merge branch 'main' into simple-fix-timestamps
dyladan Nov 22, 2022
44d9dc5
Remove unused imports
dyladan Nov 22, 2022
2a5fc2f
Merge branch 'simple-fix-timestamps' of github.com:dyladan/openteleme…
dyladan Nov 22, 2022
0149f56
Fix drift calculation
dyladan Nov 30, 2022
7c540fe
Merge remote-tracking branch 'origin/main' into simple-fix-timestamps
dyladan Dec 14, 2022
7626ca2
Remove bad import
dyladan Dec 14, 2022
26a27f0
Merge remote-tracking branch 'origin/main' into simple-fix-timestamps
dyladan Dec 29, 2022
488b5da
Merge remote-tracking branch 'origin/main' into simple-fix-timestamps
dyladan Jan 3, 2023
9cc92bf
Use Date.now for fetch span end
dyladan Jan 3, 2023
8bddc96
Fetch changelog
dyladan Jan 3, 2023
91cfd37
Merge branch 'main' into simple-fix-timestamps
dyladan Jan 6, 2023
63f3b2c
Test addHrTimes
dyladan Jan 6, 2023
e66b75f
lint
dyladan Jan 6, 2023
b848140
lint
dyladan Jan 6, 2023
f4c1be0
lint
dyladan Jan 6, 2023
8c48a34
Fix flaky test
dyladan Jan 9, 2023
6790d17
lint
dyladan Jan 9, 2023
3b61190
Use perf timer for perf API
dyladan Jan 9, 2023
5614bcb
Apply date fix to xhr
dyladan Jan 9, 2023
f6d88aa
Changelog
dyladan Jan 9, 2023
e763118
lint
dyladan Jan 9, 2023
03c96c0
Revert to previous addHrTimes impl
dyladan Jan 10, 2023
fd2398d
Review comments
dyladan Jan 10, 2023
830c799
Fix flaky test
dyladan Jan 10, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ For experimental package changes, see the [experimental CHANGELOG](experimental/
* `telemetry.sdk.name`
* `telemetry.sdk.language`
* `telemetry.sdk.version`
* fix(sdk-trace): make spans resilient to clock drift [#3434](https://github.com/open-telemetry/opentelemetry-js/pull/3434) @dyladan

### :books: (Refine Doc)

Expand Down
32 changes: 17 additions & 15 deletions packages/opentelemetry-core/src/common/time.ts
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ const SECOND_TO_NANOSECONDS = Math.pow(10, NANOSECOND_DIGITS);
* This is represented in HrTime format as [1609504210, 150000000].
* @param epochMillis
*/
function numberToHrtime(epochMillis: number): api.HrTime {
export function epochMillisToHrTime(epochMillis: number): api.HrTime {
blumamir marked this conversation as resolved.
Show resolved Hide resolved
const epochSeconds = epochMillis / 1000;
// Decimals only.
const seconds = Math.trunc(epochSeconds);
Expand All @@ -58,21 +58,12 @@ function getTimeOrigin(): number {
* @param performanceNow
*/
export function hrTime(performanceNow?: number): api.HrTime {
const timeOrigin = numberToHrtime(getTimeOrigin());
const now = numberToHrtime(
const timeOrigin = epochMillisToHrTime(getTimeOrigin());
const now = epochMillisToHrTime(
typeof performanceNow === 'number' ? performanceNow : performance.now()
);

let seconds = timeOrigin[0] + now[0];
let nanos = timeOrigin[1] + now[1];

// Nanoseconds
if (nanos > SECOND_TO_NANOSECONDS) {
nanos -= SECOND_TO_NANOSECONDS;
seconds += 1;
}

return [seconds, nanos];
return addHrTimes(timeOrigin, now);
}

/**
Expand All @@ -90,10 +81,10 @@ export function timeInputToHrTime(time: api.TimeInput): api.HrTime {
return hrTime(time);
} else {
// epoch milliseconds or performance.timeOrigin
return numberToHrtime(time);
return epochMillisToHrTime(time);
}
} else if (time instanceof Date) {
return numberToHrtime(time.getTime());
return epochMillisToHrTime(time.getTime());
} else {
throw TypeError('Invalid input type');
}
Expand Down Expand Up @@ -181,3 +172,14 @@ export function isTimeInput(value: unknown): value is api.HrTime | number | Date
value instanceof Date
);
}

export function addHrTimes(time1: api.HrTime, time2: api.HrTime): api.HrTime {
const out: api.HrTime = [time1[0] + time2[0], time1[1] + time2[1]];

if(out[1] > SECOND_TO_NANOSECONDS) {
dyladan marked this conversation as resolved.
Show resolved Hide resolved
out[0] = out[0] + Math.floor(out[1] / SECOND_TO_NANOSECONDS);
out[1] = out[1] % SECOND_TO_NANOSECONDS;
}
dyladan marked this conversation as resolved.
Show resolved Hide resolved

return out;
}
151 changes: 102 additions & 49 deletions packages/opentelemetry-sdk-trace-base/src/Span.ts
Original file line number Diff line number Diff line change
Expand Up @@ -14,17 +14,32 @@
* limitations under the License.
*/

import * as api from '@opentelemetry/api';
import { Context, HrTime, SpanAttributeValue } from '@opentelemetry/api';
import {
Clock,
Context,
diag,
Exception,
HrTime,
Link,
Span as APISpan,
SpanAttributes,
SpanAttributeValue,
SpanContext,
SpanKind,
SpanStatus,
SpanStatusCode,
TimeInput
} from '@opentelemetry/api';
import {
addHrTimes,
epochMillisToHrTime,
hrTime,
hrTimeDuration,
InstrumentationLibrary,
isAttributeValue,
isTimeInput,
isTimeInputHrTime,
otperformance,
sanitizeAttributes,
timeInputToHrTime
sanitizeAttributes
} from '@opentelemetry/core';
import { Resource } from '@opentelemetry/resources';
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';
Expand All @@ -38,29 +53,33 @@ import { SpanLimits } from './types';
/**
* This class represents a span.
*/
export class Span implements api.Span, ReadableSpan {
export class Span implements APISpan, ReadableSpan {
// Below properties are included to implement ReadableSpan for export
// purposes but are not intended to be written-to directly.
private readonly _spanContext: api.SpanContext;
readonly kind: api.SpanKind;
private readonly _spanContext: SpanContext;
readonly kind: SpanKind;
readonly parentSpanId?: string;
readonly attributes: api.SpanAttributes = {};
readonly links: api.Link[] = [];
readonly attributes: SpanAttributes = {};
readonly links: Link[] = [];
readonly events: TimedEvent[] = [];
readonly startTime: api.HrTime;
readonly startTime: HrTime;
readonly resource: Resource;
readonly instrumentationLibrary: InstrumentationLibrary;
name: string;
status: api.SpanStatus = {
code: api.SpanStatusCode.UNSET,
status: SpanStatus = {
code: SpanStatusCode.UNSET,
};
endTime: api.HrTime = [0, 0];
endTime: HrTime = [0, 0];
private _ended = false;
private _duration: api.HrTime = [-1, -1];
private _duration: HrTime = [-1, -1];
private readonly _spanProcessor: SpanProcessor;
private readonly _spanLimits: SpanLimits;
private readonly _attributeValueLengthLimit: number;
private readonly _clock: Clock;

private readonly _performanceStartTime: number;
private readonly _performanceOffset: number;
private readonly _startTimeProvided: boolean;


/**
* Constructs a new Span instance.
Expand All @@ -71,20 +90,27 @@ export class Span implements api.Span, ReadableSpan {
parentTracer: Tracer,
context: Context,
spanName: string,
spanContext: api.SpanContext,
kind: api.SpanKind,
spanContext: SpanContext,
kind: SpanKind,
parentSpanId?: string,
links: api.Link[] = [],
startTime?: api.TimeInput,
clock: Clock = otperformance,
links: Link[] = [],
startTime?: TimeInput,
_deprecatedClock?: unknown, // keeping this argument even though it is unused to ensure backwards compatibility
) {
this._clock = clock;
this.name = spanName;
this._spanContext = spanContext;
this.parentSpanId = parentSpanId;
this.kind = kind;
this.links = links;
this.startTime = timeInputToHrTime(startTime ?? clock.now());

const now = Date.now();
this._performanceStartTime = otperformance.now();
this._performanceOffset = now - this._performanceStartTime;
dyladan marked this conversation as resolved.
Show resolved Hide resolved
this._startTimeProvided = startTime != null;


this.startTime = this._getTime(startTime ?? now);

this.resource = parentTracer.resource;
this.instrumentationLibrary = parentTracer.instrumentationLibrary;
this._spanLimits = parentTracer.getSpanLimits();
Expand All @@ -93,19 +119,19 @@ export class Span implements api.Span, ReadableSpan {
this._attributeValueLengthLimit = this._spanLimits.attributeValueLengthLimit || 0;
}

spanContext(): api.SpanContext {
spanContext(): SpanContext {
return this._spanContext;
}

setAttribute(key: string, value?: SpanAttributeValue): this;
setAttribute(key: string, value: unknown): this {
if (value == null || this._isSpanEnded()) return this;
if (key.length === 0) {
api.diag.warn(`Invalid attribute key: ${key}`);
diag.warn(`Invalid attribute key: ${key}`);
return this;
}
if (!isAttributeValue(value)) {
api.diag.warn(`Invalid attribute value set for key: ${key}`);
diag.warn(`Invalid attribute value set for key: ${key}`);
return this;
}

Expand All @@ -120,7 +146,7 @@ export class Span implements api.Span, ReadableSpan {
return this;
}

setAttributes(attributes: api.SpanAttributes): this {
setAttributes(attributes: SpanAttributes): this {
for (const [k, v] of Object.entries(attributes)) {
this.setAttribute(k, v);
}
Expand All @@ -132,42 +158,40 @@ export class Span implements api.Span, ReadableSpan {
* @param name Span Name
* @param [attributesOrStartTime] Span attributes or start time
* if type is {@type TimeInput} and 3rd param is undefined
* @param [startTime] Specified start time for the event
* @param [timeStamp] Specified time stamp for the event
*/
addEvent(
name: string,
attributesOrStartTime?: api.SpanAttributes | api.TimeInput,
startTime?: api.TimeInput
attributesOrStartTime?: SpanAttributes | TimeInput,
timeStamp?: TimeInput
): this {
if (this._isSpanEnded()) return this;
if (this._spanLimits.eventCountLimit === 0) {
api.diag.warn('No events allowed.');
diag.warn('No events allowed.');
return this;
}
if (this.events.length >= this._spanLimits.eventCountLimit!) {
api.diag.warn('Dropping extra events.');
diag.warn('Dropping extra events.');
this.events.shift();
}

if (isTimeInput(attributesOrStartTime)) {
if (typeof startTime === 'undefined') {
startTime = attributesOrStartTime as api.TimeInput;
if (!isTimeInput(timeStamp)) {
timeStamp = attributesOrStartTime;
}
attributesOrStartTime = undefined;
}
if (typeof startTime === 'undefined') {
startTime = this._clock.now();
}

const attributes = sanitizeAttributes(attributesOrStartTime);
this.events.push({
name,
attributes,
time: timeInputToHrTime(startTime),
time: this._getTime(timeStamp),
});
return this;
}

setStatus(status: api.SpanStatus): this {
setStatus(status: SpanStatus): this {
if (this._isSpanEnded()) return this;
this.status = status;
return this;
Expand All @@ -179,18 +203,18 @@ export class Span implements api.Span, ReadableSpan {
return this;
}

end(endTime?: api.TimeInput): void {
end(endTime?: TimeInput): void {
if (this._isSpanEnded()) {
api.diag.error('You can only call end() on a span once.');
diag.error('You can only call end() on a span once.');
return;
}
this._ended = true;

this.endTime = timeInputToHrTime(endTime ?? this._clock.now());
this.endTime = this._getTime(endTime);
this._duration = hrTimeDuration(this.startTime, this.endTime);

if (this._duration[0] < 0) {
api.diag.warn(
diag.warn(
'Inconsistent start and end time, startTime > endTime. Setting span duration to 0ms.',
this.startTime,
this.endTime
Expand All @@ -202,12 +226,41 @@ export class Span implements api.Span, ReadableSpan {
this._spanProcessor.onEnd(this);
}

private _getTime(inp?: TimeInput): HrTime {
if (typeof inp === 'number' && inp < otperformance.now()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in core we have similar logic where we use origin for this check, and not now

    // Must be a performance.now() if it's smaller than process start time.
    if (time < getTimeOrigin()) {
      return hrTime(time);

I believe they both work fine, but better to have consistency in the project, even maybe extract this logic into a function isTimeInputPerformanceNow(time: TimeInput) similar to the existing function isTimeInputHrTime, this is also self documenting and will make the comment below redundant

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is actually very important to the functioning of the PR. I think the function in core should be deprecated.

The reason to use performance.now is that we can be much more sure that a given number is a performance timestamp. A number before timeOrigin may be ingested from historical logs or come from an inconsistent time source which has been corrected and timeOrigin is wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks for the response.
My point is that we are doing the same computation in 2 different places in 2 different ways. This comment is a suggestion to unify it (and maybe extract the logic into a function while we are at it). Could be done in this PR, later one, or not at all

// must be a performance timestamp
// apply correction and convert to hrtime
return hrTime(inp + this._performanceOffset);
}

if (typeof inp === 'number') {
return epochMillisToHrTime(inp);
}

if (inp instanceof Date) {
return epochMillisToHrTime(inp.valueOf());
blumamir marked this conversation as resolved.
Show resolved Hide resolved
}

if (isTimeInputHrTime(inp)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version seems to be susceptible to drift in fetch instrumentation (undefined start + hrTime end):

image

Because hrTime() does getTimeOrigin() + performance.now() it would generate drifted HrTime

Copy link
Member Author

@dyladan dyladan Nov 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because an HrTime formatted timestamp is not corrected in any case. Currently, the start time also uses the performance timing API indirectly because a start time is not provided and the performance clock is used. This means that in the current state the whole span is shifted anyway.

My recommendation would be to:

  1. Change core.hrTime to return an HrTime generated using Date.now
  2. Update fetch instrumentation to not pass a time manually

Either of these changes would fix the issue here, but I actually recommend we do both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative to (2) would be to update fetch instrumentation to get start and end times from the performance timing API

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t2t2 does the updated version address your concern?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch and xhr have both been updated. I think this is resolved but I'll wait for @t2t2 to confirm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

return inp;
}

if (this._startTimeProvided) {
// if user provided a time for the start manually
// we can't use duration to calculate event/end times
return epochMillisToHrTime(Date.now());
}

const msDuration = otperformance.now() - this._performanceStartTime;
return addHrTimes(this.startTime, epochMillisToHrTime(msDuration));
}

isRecording(): boolean {
return this._ended === false;
}

recordException(exception: api.Exception, time: api.TimeInput = this._clock.now()): void {
const attributes: api.SpanAttributes = {};
recordException(exception: Exception, time?: TimeInput): void {
const attributes: SpanAttributes = {};
if (typeof exception === 'string') {
attributes[SemanticAttributes.EXCEPTION_MESSAGE] = exception;
} else if (exception) {
Expand All @@ -233,11 +286,11 @@ export class Span implements api.Span, ReadableSpan {
) {
this.addEvent(ExceptionEventName, attributes, time);
} else {
api.diag.warn(`Failed to record an exception ${exception}`);
diag.warn(`Failed to record an exception ${exception}`);
}
}

get duration(): api.HrTime {
get duration(): HrTime {
return this._duration;
}

Expand All @@ -247,7 +300,7 @@ export class Span implements api.Span, ReadableSpan {

private _isSpanEnded(): boolean {
if (this._ended) {
api.diag.warn(`Can not execute the operation on ended Span {traceId: ${this._spanContext.traceId}, spanId: ${this._spanContext.spanId}}`);
diag.warn(`Can not execute the operation on ended Span {traceId: ${this._spanContext.traceId}, spanId: ${this._spanContext.spanId}}`);
}
return this._ended;
}
Expand Down Expand Up @@ -279,7 +332,7 @@ export class Span implements api.Span, ReadableSpan {
// Check limit
if (limit <= 0) {
// Negative values are invalid, so do not truncate
api.diag.warn(`Attribute value limit must be positive, got ${limit}`);
diag.warn(`Attribute value limit must be positive, got ${limit}`);
return value;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -136,6 +136,8 @@ describe('Span', () => {
spanContext,
SpanKind.SERVER
);
// @ts-expect-error writing readonly property. performance time origin is mocked to return ms value of [1,1]
span['_performanceOffset'] = 0;
span.end(hrTimeToMilliseconds(span.startTime) - 1);
assert.ok(hrTimeToNanoseconds(span.duration) >= 0);
});
Expand Down Expand Up @@ -1088,6 +1090,8 @@ describe('Span', () => {
spanContext,
SpanKind.CLIENT
);
// @ts-expect-error writing readonly property. performance time origin is mocked to return ms value of [1,1]
span['_performanceOffset'] = 0;
assert.strictEqual(span.events.length, 0);
span.recordException('boom', [0, 123]);
const event = span.events[0];
Expand Down
Loading