Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add instrumentation of the GVL #1442

Merged
merged 1 commit into from
Jan 24, 2025
Merged

Conversation

byroot
Copy link
Contributor

@byroot byroot commented Jan 23, 2025

In response to https://byroot.github.io/ruby/performance/2025/01/23/the-mythical-io-bound-rails-app.html @pushcx kindly offered to instrument the actual IO ration on lobster production environment.

This PR does exactly that using the gvl_timing gem, inside a custom middleware mounted about as high as possible in the stack.

On my local environment it produces the following logs:

[GVL] io_percent=9.1, run_ms=30.57, idle_ms=2.95, stall_ms=0.03, gc_ms=0.0, path=/active

Perhaps you'd prefer if these were integrated in the main logs? I'm not familiar with your current log infra, but if you point me to it, I can probably hook them off.

@byroot byroot force-pushed the gvl-instrumentation branch from 9ff25e2 to ee7d28e Compare January 23, 2025 17:04
io_percent = (timer.idle_duration / total_time * 100.0).round(1)
Rails.logger.info(
"[GVL] io_percent=#{io_percent}, run_ms=#{run_ms}, idle_ms=#{idle_ms}, stall_ms=#{stall_ms}, " \
"gc_ms=#{gc_ms}, path=#{env["PATH_INFO"]}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if PATH_INFO is the best, as it will have a fairly high cardinality. Perhaps controller/action would be better?

Copy link
Member

@pushcx pushcx Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want controller/action. Spiders regularly map all of the site and we have a few hundred thousand URLs. You’d gather hit frequency, but basically everything is log normal on recency like you’d assume.

Maybe you want http verb too? It’s an excellent proxy for reads vs writes. Because of the invite system we’re even a bit more read-skewed than the rule of thumb.

Which reminds me, it’s worth highlighting that that’s after nginx. We still do full-page caching with actionpack-page_caching so nginx absorbs a ton of traffic that’d otherwise land on rails. I haven’t measured a percentage in years, but I could check logs if it matters to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you want http verb too?

Good call. Even though it can be inferred from the action, it's easier that way.

but I could check logs if it matters to you.

Nah, it's fine. I'm interested in requests that are actually processed by Ruby.

@byroot byroot force-pushed the gvl-instrumentation branch 3 times, most recently from 03b43b1 to a0c0404 Compare January 23, 2025 17:11
@pushcx
Copy link
Member

pushcx commented Jan 24, 2025

Lobsters logs to files. I’d appreciate it if this logged to its own file rather than the main log; lots of things get handled out of it with grep and jq.

@byroot byroot force-pushed the gvl-instrumentation branch 2 times, most recently from e751c41 to 01a97e1 Compare January 24, 2025 09:36
In response to https://byroot.github.io/ruby/performance/2025/01/23/the-mythical-io-bound-rails-app.html
@pushcx kindly offered to instrument the actual IO ration on lobster production
environment.

This PR does exactly that using the `gvl_timing` gem, inside a custom middleware
mounted about as high as possible in the stack.

On my local environment it produces the following logs:

```
[GVL] io_percent=9.1, run_ms=30.57, idle_ms=2.95, stall_ms=0.03, gc_ms=0.0, path=/active
```
@byroot byroot force-pushed the gvl-instrumentation branch from 01a97e1 to 81ec213 Compare January 24, 2025 09:37
@@ -102,6 +102,9 @@
# why help timing attacks?
config.middleware.delete(Rack::Runtime)

require File.expand_path("../../extras/gvl_instrumentation", __FILE__)
config.middleware.insert 0, GvlInstrumentation, File.open(Rails.root.join("/srv/lobste.rs/log/production-gvl.log"), "a+")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the log to its own file. But I don't know if the app has the permission to create it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'll be fine.

@pushcx pushcx merged commit a6073d5 into lobsters:master Jan 24, 2025
4 checks passed
@byroot byroot deleted the gvl-instrumentation branch January 24, 2025 14:01
@pushcx
Copy link
Member

pushcx commented Jan 24, 2025

I'm deploying now. Will a couple days of logs work for your purposes?

@byroot
Copy link
Contributor Author

byroot commented Jan 24, 2025

Yes, even a few hours are probably plenty. It's really up to you.

@pushcx
Copy link
Member

pushcx commented Jan 24, 2025

OK. It's collecting in prod now, here's the last couple lines off it:

{"gc_ms":13.79,"run_ms":32.6,"idle_ms":2.28,"stall_ms":0.01,"io_percent":6.6,"method":"GET","action":"home#index"}
{"gc_ms":0.0,"run_ms":79.67,"idle_ms":3.29,"stall_ms":0.05,"io_percent":3.2,"method":"GET","action":"home#index"}
{"gc_ms":0.0,"run_ms":101.26,"idle_ms":12.3,"stall_ms":0.02,"io_percent":10.6,"method":"GET","action":"stories#show"}
{"gc_ms":0.0,"run_ms":25.99,"idle_ms":0.04,"stall_ms":0.4,"io_percent":0.1,"method":"HEAD"}

@byroot
Copy link
Contributor Author

byroot commented Jan 24, 2025

Nice!

pushcx added a commit that referenced this pull request Jan 25, 2025
@pushcx
Copy link
Member

pushcx commented Jan 25, 2025

It ran in production for a day, so I’ve now reverted the logger and emailed byroot a link to the log.

@byroot
Copy link
Contributor Author

byroot commented Jan 25, 2025

Unfortunately the link was a 404 😢

@pushcx
Copy link
Member

pushcx commented Jan 25, 2025

Whoops, I put them in the cache dir without thinking, so the link stopped working after 2 minutes. Good now.

@byroot
Copy link
Contributor Author

byroot commented Jan 25, 2025

I got it!

If you have no objections I'll share that data in a google sheet or something in case others want to look at it.

@pushcx
Copy link
Member

pushcx commented Jan 26, 2025

I don't mind at all, I sent the link privately because I don't want to support it indefinitely.

byroot added a commit to byroot/gvl_timing that referenced this pull request Jan 28, 2025
I used `gvl_timing` to instrument lobste.rs: lobsters/lobsters#1442

One piece of data I wish was included was how many time the GVL was released
by the thread. Because when you see a high stall time, you can't really
tell if it was due to very high contention, or to moderate contention
but the thread had to acquire the GVL many times.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants