Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AS::Notifications measuring with Time.now vs CPU clock #34271

Closed
kirs opened this issue Oct 21, 2018 · 6 comments

Comments

Projects
None yet
5 participants
@kirs
Copy link
Member

commented Oct 21, 2018

At Shopify, we measure and log how much time a request has spent talking to all kinds of external resources: Redis, Memcache, ActiveRecord. Most of that is implemented by wrapping the request into AS::Notification subscription and reporting the delta between event's start and end (which is a Time object).

At the same time we have other middlewares that report total request time, which is using a delta between CPU clock (Process.clock_gettime(Process::CLOCK_MONOTONIC)).

We've started noticing some weird data in out logs:

total=0.457ms
memcache=14.587ms
db=6.930158ms	

How come that total request time is 30 times less than time spent within the request?

The difference comes from total measured using CPU clock while memcache and db measured with AS::Notification using deltas of Time.now. That's what gives us such inaccurate results when comparing two things measured in different ways.

I noticed this has been recently fixed by @eileencodes in #33449.

Given that the PR was targeted for 6.0, can we consider this a bug and back port it to 5.2? 🙏

@kirs

This comment has been minimized.

Copy link
Member Author

commented Oct 21, 2018

@eileencodes

This comment has been minimized.

Copy link
Member

commented Oct 22, 2018

I think we should backport this to 5.2 since we have identified that the current implementation is buggy. If we don't want to include any features we could consider just backporting a portion of the PR that swaps from Time to Process.clock_gettime.

@jeremy

This comment has been minimized.

Copy link
Member

commented Oct 22, 2018

  • Backport! Agree it's a bugfix that monotonic clock fixes.

rafaelfranca added a commit that referenced this issue Oct 22, 2018

Use the process clock in the subscriber events
This will avoid issues with the machine chaging time.

Closes #34271.
@rafaelfranca

This comment has been minimized.

Copy link
Member

commented Oct 22, 2018

I backported in 7c45421 by only applying the change in the clock.

@kirs

This comment has been minimized.

Copy link
Member Author

commented Oct 23, 2018

❤️ ❤️ ❤️

@bdmac

This comment has been minimized.

Copy link
Contributor

commented Mar 19, 2019

So now how do you now get the wall clock time for when an event started to send to a system like e.g. Influx? I suppose we could just instrument everything in our subscribers using Time.now for Influx but to be more accurate we were using event.time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.