New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat] Ruby 3.2.0 structs #61
Comments
d859c58 looks like a good exemplar, I'll see if i can come up with something by muddling my way through |
@manuelfelipe and I both took a crack at this and came up with identical patches based on copy/pasta and not really knowing what we are doing, here is his 4056905 but mine looked like it as well. The error that we get is:
The line in question appears to be https://github.com/javierhonduco/rbperf/blob/main/src/rbperf.rs#L210 so clearly we have done something wrong |
I think that https://github.com/javierhonduco/rbperf/blob/main/src/rbperf.rs#L134 was somehow failing, because i reverted afc7ee7 and deleted the older ruby versions from the list:
and it allowed me to get past this error. Errno 7 i think is Now i have a new one, probably due to how we poorly copied the ruby structs as it says the error should be impossible (lol):
|
Found what was the issue with the first listed panic. manuelfelipe@48f05ec addressed it in my fork. Looks like there is a static limit set for the numbers entries allowed in the bpf map used for ruby version matching. Now, I'm at the sage stage you are @dalehamel and have the same panic after the record session is done. |
Hi!! The documentation is really lacking – sorry about this. Thanks for bringing this up, will prioritise the documentation on adding support for a new Ruby version.
That fix is totally correct! |
Seems that in Ruby 3.2.0 the internal structures that back Strings have changed, see Variable Width Allocation: Strings, so the stack unwinder in BPF will have to be adapted. Without these changes, the stacks will be dropped if we can detect that they don't look right, or will be garbled otherwise. |
@manuelfelipe took the liberty to create a PR with your changes + what was needed to read the correct string (#64). The tests pass, but please do let me know if things aren't working properly and I'll be happy to take a look |
Thanks @javierhonduco, just gave it a try in a random worker around and now failing with:
tho on random occasions, it does not panic and I'm able to get some output from it. eg:
|
Thanks for the logs, this is super valuable to understand what's going on! Will look into the panic. About the incomplete stacks, I'm assuming you are tracing a large Rails application, right? They tend to have incredibly deep stacks, and the current value is rather low. For some reason, even though I've done testing in mid to large Rails applications I never increased the total number of frames that can be captured. Right now, up to 150 frames can be unwound. Just opened #65 to increase this to 750 frames. Let me know if this doesn't fix the truncated (incomplete) stacks you are seeing. |
Nice, thanks for looking into the incomplete stacks. Yeah, I was trying this out in worker from a really large Rails app 🙈. After #65 I no longer see the incomplete stack msgs showing up.
|
Awesome! Thanks for giving it another try. I've just cut another PR to fix the Glad you are mentioning this last error. I've just cut #67 because when I was going through the code I realised that the error handling can definitely be improved. These samples should be ignored, this is not a fatal error by any means |
edit: #68 should remediate this problem |
nice. Thanks again @javierhonduco! definitely much better now. I'm no longer seeing the panics and getting more consistent results ❤️ the one thing I notice, is that n ocations, the output is full of erorr like:
|
Happy to help! Fixing this problem properly will take a little while, but there's a stop-gap you can perhaps try meanwhile. The underlying issue is that some of the stack frames in the Lines 88 to 89 in 3810e65
(note the Then we try to access the overwritten stack that doesn't exist anymore. This is the error we see in userspace. There are a couple of things that have to be done to fix this problem properly. The WIP I had some months back that I didn't publish because I didn't make much progress was to have a worker thread that will poll from the events, read the Ruby stack off the BPF maps and produce the Ruby stack we want. By doing this concurrently with the BPF program producing samples, it should heavily reduce the chance of collisions. To unblock you, a possibility will be to bump these two maps where we store the stacks (from https://github.com/javierhonduco/rbperf/blob/main/src/bpf/rbperf.bpf.c#L36-L48 The other issue that the code has (will put a fix some time this week) is to stop relying on the random number generator. Not only this burns more cycles, but it can also produce the same values that we already have in the maps, making this problem worse. Hope this makes sense |
thanks for the clarifications. Making more sense now. For now, will give it a try as you are suggesting with a different value for the ruby frame map. |
Closing this issue as Ruby 3.2.x support has already landed in #64. Let's keep track of the other bugs in their own issues. Thanks for contributing! |
Howdy @javierhonduco ! Have been kicking the tires on this and would love to try out some of the features in production.
Unfortunately, as we are perennially on the newest ruby, it is difficult to keep up with the struct offsets / version compatibility concerns.
I notice that it looks like this project pulls the struct info from rbspy.
Perhaps if you could add some documentation / a general explanation on your process for how to bump the struct offsets, I or someone on my team could take a crack at bumping the ruby structs? It would be very instructive to give it a go either way.
The text was updated successfully, but these errors were encountered: