Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AS::Notifications measuring with Time.now vs CPU clock #34271

Closed
kirs opened this issue Oct 21, 2018 · 6 comments
Closed

AS::Notifications measuring with Time.now vs CPU clock #34271

kirs opened this issue Oct 21, 2018 · 6 comments

Comments

@kirs
Copy link
Member

kirs commented Oct 21, 2018

At Shopify, we measure and log how much time a request has spent talking to all kinds of external resources: Redis, Memcache, ActiveRecord. Most of that is implemented by wrapping the request into AS::Notification subscription and reporting the delta between event's start and end (which is a Time object).

At the same time we have other middlewares that report total request time, which is using a delta between CPU clock (Process.clock_gettime(Process::CLOCK_MONOTONIC)).

We've started noticing some weird data in out logs:

total=0.457ms
memcache=14.587ms
db=6.930158ms	

How come that total request time is 30 times less than time spent within the request?

The difference comes from total measured using CPU clock while memcache and db measured with AS::Notification using deltas of Time.now. That's what gives us such inaccurate results when comparing two things measured in different ways.

I noticed this has been recently fixed by @eileencodes in #33449.

Given that the PR was targeted for 6.0, can we consider this a bug and back port it to 5.2? 🙏

@kirs
Copy link
Member Author

kirs commented Oct 21, 2018

@eileencodes
Copy link
Member

I think we should backport this to 5.2 since we have identified that the current implementation is buggy. If we don't want to include any features we could consider just backporting a portion of the PR that swaps from Time to Process.clock_gettime.

@jeremy
Copy link
Member

jeremy commented Oct 22, 2018

  • Backport! Agree it's a bugfix that monotonic clock fixes.

rafaelfranca added a commit that referenced this issue Oct 22, 2018
This will avoid issues with the machine chaging time.

Closes #34271.
@rafaelfranca
Copy link
Member

I backported in 7c45421 by only applying the change in the clock.

@kirs
Copy link
Member Author

kirs commented Oct 23, 2018

❤️ ❤️ ❤️

@bdmac
Copy link
Contributor

bdmac commented Mar 19, 2019

So now how do you now get the wall clock time for when an event started to send to a system like e.g. Influx? I suppose we could just instrument everything in our subscribers using Time.now for Influx but to be more accurate we were using event.time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants