Proposal: Reduce clock-skew issues in mobile and other client-side trace sources #154

Open
bryce-b opened this issue May 11, 2021 · 5 comments

Comments

@bryce-b
Member

bryce-b commented May 11, 2021

I'm creating this ticket per discussion in the OpenTelemetry maintainers' meeting on 05/10/2021.

Clock-skew will always be a problem with distributed tracing, but the degree of skew that occurs on unmanaged devices (by 'unmanaged' I mean devices outside of the software provider's control) is untenable.

[Screenshot: Screen Shot 2021-04-26 at 10 44 06 AM]

This screenshot shows the degree of clock skew between a mobile device and a backend server while tracing a synchronous request. The mobile device is using an automatically synced system clock, but the skew could be much, much worse, as the clock can be set at the whim of the mobile phone's owner (think days, months, or years of skew).

I'd like to brainstorm some solutions to this problem.
Some possible solutions could be:

  • client-side monitors could record times as offsets that are later set relative to some time authority (collector?)
  • the client side could sync to a non-system time authority
  • distributed traces could be processed to re-align spans based on their relationship (if an HTTP request is made to a backend service, the two spans should probably overlap to some degree)
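To make the last idea concrete, here is a minimal sketch in Python of re-aligning a client span around the server span it caused. Everything in it is an assumption for illustration: the `Span` type, the `realign_client` helper, the choice to treat the server clock as the reference, and the premise that network latency is roughly symmetric. It is not taken from any OpenTelemetry component.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    start_ms: int  # start time on the emitting device's own clock
    end_ms: int    # end time on the emitting device's own clock


def realign_client(client: Span, server: Span) -> Span:
    """Shift `client` so that `server` sits centered inside it.

    Assumptions (hypothetical, for illustration): the server span really
    happened inside the client request span, the server clock is the
    trusted one, and network latency was roughly symmetric.
    Only the client span's position changes; its duration is preserved.
    """
    client_duration = client.end_ms - client.start_ms
    server_duration = server.end_ms - server.start_ms
    if server_duration > client_duration:
        # The server span cannot fit inside the client span, so the
        # assumption does not hold; leave the span untouched.
        return client
    if client.start_ms <= server.start_ms and server.end_ms <= client.end_ms:
        # The spans already nest; no adjustment is needed.
        return client
    one_way_latency = (client_duration - server_duration) // 2
    new_start = server.start_ms - one_way_latency
    return Span(client.name, new_start, new_start + client_duration)


# The device clock is far ahead of the server clock.
client = Span("HTTP GET /items", start_ms=90_000, end_ms=90_400)
server = Span("handle /items", start_ms=1_000, end_ms=1_300)
aligned = realign_client(client, server)
print(aligned)
```

This only restores a plausible relative position. It cannot recover the true one-way latency, and it does nothing for client spans that have no server-side counterpart.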
@Oberon00
Member

(Clocks are hard. Except for Linux's clock_gettime(CLOCK_BOOTTIME), which may not be available in the target runtime/language, I do not know of any other clock implementation that runs in lockstep with epoch time. Especially on client systems, the typical monotonic clocks stop when the CPU is suspended (e.g. with a closed notebook lid, but I imagine it happens even more often on battery-driven mobile devices). The realtime clock, on the other hand, can be changed by the user on a whim.)
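As a rough illustration of the trade-off described above, the following Python sketch reads the realtime clock once and derives all later timestamps from the monotonic clock. `SessionClock` is a hypothetical helper, not part of any SDK. It protects against the user changing the system clock mid-session, but it inherits the suspend problem: if the monotonic clock stops while the device sleeps, the derived timestamps fall behind.

```python
import time


class SessionClock:
    """Hypothetical helper: wall-clock anchor plus monotonic elapsed time."""

    def __init__(self) -> None:
        # Read both clocks once, as close together as possible.
        self._anchor_wall_ns = time.time_ns()
        self._anchor_mono_ns = time.monotonic_ns()

    def now_ns(self) -> int:
        # Only the monotonic clock is read after construction, so later
        # changes to the system clock do not affect the returned value.
        elapsed_ns = time.monotonic_ns() - self._anchor_mono_ns
        return self._anchor_wall_ns + elapsed_ns


clock = SessionClock()
start_ns = clock.now_ns()
time.sleep(0.01)
end_ns = clock.now_ns()
print(f"span duration: {(end_ns - start_ns) / 1e6:.1f} ms")
```

On Linux, Python also exposes `time.CLOCK_BOOTTIME` for use with `time.clock_gettime`, which could replace the monotonic reads where it is available. If the initial wall-clock reading is itself wrong, every timestamp is still shifted by the same amount, so this addresses drift within a session rather than absolute skew.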

@Oberon00
Member

Oberon00 commented May 11, 2021

Without having delved deeper into the topic, I don't think it is feasible to get sub-second synchronization across distributed systems with anything short of full-fledged NTP (which also takes a few minutes to sync precisely). For a precision on the order of a few seconds, it may be enough to send the "current" time with each request, so the receiver can calculate the offset between the sender's current time and its own current time.
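A minimal sketch of that offset calculation, in Python. The function names and the millisecond timestamps are made up for illustration, and network transit time is ignored, so the estimate is off by roughly the one-way latency of the request.

```python
def estimate_offset_ms(sender_now_ms: int, receiver_now_ms: int) -> int:
    """Offset to add to sender timestamps to express them on the receiver's clock.

    Assumption: the request arrived instantly. In practice the result
    includes the one-way network latency as an error term.
    """
    return receiver_now_ms - sender_now_ms


def to_receiver_time(sender_timestamp_ms: int, offset_ms: int) -> int:
    """Translate a timestamp taken on the sender's clock to the receiver's clock."""
    return sender_timestamp_ms + offset_ms


# Hypothetical values: the sender's clock runs two hours ahead, and the
# request takes 40 ms to arrive.
true_send_time_ms = 1_620_727_200_000
sender_now_ms = true_send_time_ms + 7_200_000   # sent along with the request
receiver_now_ms = true_send_time_ms + 40        # read when the request arrives

offset_ms = estimate_offset_ms(sender_now_ms, receiver_now_ms)

# A span that started 500 ms before the request was sent, on the sender's clock.
span_start_on_sender_ms = sender_now_ms - 500
corrected_ms = to_receiver_time(span_start_on_sender_ms, offset_ms)
print(offset_ms, corrected_ms)
```

The corrected start time lands 40 ms later than the true one, which is the latency error mentioned above. That is consistent with the "few seconds" precision the comment has in mind, not with sub-second synchronization.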

@iNikem
Contributor

iNikem commented May 13, 2021

it may be enough to send the "current" time with each request, so the receiver can calculate the offset between the current time of the sender and its own current time.

This is more or less what we did in Plumbr

@ivomagi

ivomagi commented May 24, 2021

There is a blog post describing, at a conceptual level, how clock skew was handled back in the day: https://plumbr.io/blog/monitoring/time-in-distributed-systems

@t2t2

t2t2 commented Apr 30, 2024

In case this issue becomes active again, here is an archive.org link for the blog post above:
https://web.archive.org/web/20210123103641/https://plumbr.io/blog/monitoring/time-in-distributed-systems
