-
Notifications
You must be signed in to change notification settings - Fork 856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add instrumentation of the GVL #1442
Conversation
9ff25e2 to
ee7d28e
Compare
| io_percent = (timer.idle_duration / total_time * 100.0).round(1) | ||
| Rails.logger.info( | ||
| "[GVL] io_percent=#{io_percent}, run_ms=#{run_ms}, idle_ms=#{idle_ms}, stall_ms=#{stall_ms}, " \ | ||
| "gc_ms=#{gc_ms}, path=#{env["PATH_INFO"]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if PATH_INFO is the best, as it will have a fairly high cardinality. Perhaps controller/action would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want controller/action. Spiders regularly map all of the site and we have a few hundred thousand URLs. You’d gather hit frequency, but basically everything is log normal on recency like you’d assume.
Maybe you want http verb too? It’s an excellent proxy for reads vs writes. Because of the invite system we’re even a bit more read-skewed than the rule of thumb.
Which reminds me, it’s worth highlighting that that’s after nginx. We still do full-page caching with actionpack-page_caching so nginx absorbs a ton of traffic that’d otherwise land on rails. I haven’t measured a percentage in years, but I could check logs if it matters to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you want http verb too?
Good call. Even though it can be inferred from the action, it's easier that way.
but I could check logs if it matters to you.
Nah, it's fine. I'm interested in requests that are actually processed by Ruby.
03b43b1 to
a0c0404
Compare
|
Lobsters logs to files. I’d appreciate it if this logged to its own file rather than the main log; lots of things get handled out of it with grep and jq. |
e751c41 to
01a97e1
Compare
In response to https://byroot.github.io/ruby/performance/2025/01/23/the-mythical-io-bound-rails-app.html @pushcx kindly offered to instrument the actual IO ration on lobster production environment. This PR does exactly that using the `gvl_timing` gem, inside a custom middleware mounted about as high as possible in the stack. On my local environment it produces the following logs: ``` [GVL] io_percent=9.1, run_ms=30.57, idle_ms=2.95, stall_ms=0.03, gc_ms=0.0, path=/active ```
01a97e1 to
81ec213
Compare
| @@ -102,6 +102,9 @@ | |||
| # why help timing attacks? | |||
| config.middleware.delete(Rack::Runtime) | |||
|
|
|||
| require File.expand_path("../../extras/gvl_instrumentation", __FILE__) | |||
| config.middleware.insert 0, GvlInstrumentation, File.open(Rails.root.join("/srv/lobste.rs/log/production-gvl.log"), "a+") | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the log to its own file. But I don't know if the app has the permission to create it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'll be fine.
|
I'm deploying now. Will a couple days of logs work for your purposes? |
|
Yes, even a few hours are probably plenty. It's really up to you. |
|
OK. It's collecting in prod now, here's the last couple lines off it: |
|
Nice! |
|
It ran in production for a day, so I’ve now reverted the logger and emailed byroot a link to the log. |
|
Unfortunately the link was a 404 😢 |
|
Whoops, I put them in the cache dir without thinking, so the link stopped working after 2 minutes. Good now. |
|
I got it! If you have no objections I'll share that data in a google sheet or something in case others want to look at it. |
|
I don't mind at all, I sent the link privately because I don't want to support it indefinitely. |
I used `gvl_timing` to instrument lobste.rs: lobsters/lobsters#1442 One piece of data I wish was included was how many time the GVL was released by the thread. Because when you see a high stall time, you can't really tell if it was due to very high contention, or to moderate contention but the thread had to acquire the GVL many times.
In response to https://byroot.github.io/ruby/performance/2025/01/23/the-mythical-io-bound-rails-app.html @pushcx kindly offered to instrument the actual IO ration on lobster production environment.
This PR does exactly that using the
gvl_timinggem, inside a custom middleware mounted about as high as possible in the stack.On my local environment it produces the following logs:
Perhaps you'd prefer if these were integrated in the main logs? I'm not familiar with your current log infra, but if you point me to it, I can probably hook them off.