-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ObjectSpace.dump: Include string coderange #6076
Conversation
dump_append(dc, "\""); | ||
|
||
if (RB_ENC_CODERANGE(obj) == RUBY_ENC_CODERANGE_BROKEN) | ||
dump_append(dc, ", \"broken\":true"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how you're looking to consume this data, but this doesn't appear to give you any more information than looking at the generated "coderange": "broken"
value. If you do need or want it, is it customary to leave off the false
case? I've never looked at objspace_dump.c before, so I'm legitimately curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a bit extra, but it was there before, so removing it would be somewhat of a breaking change.
is it customary to leave off the false case?
Yes, several other attributes are like that. Heap dumps can be huge, so it's actually a good thing not to include redundant information.
b771385
to
d067a6f
Compare
Urgh: $ grep --color '"coderange":"' ~/Downloads/shopify-production-boot-heap-2022-06-30.dump | wc -l
2071368
$ grep --color '"coderange":"unknown"' ~/Downloads/shopify-production-boot-heap-2022-06-30.dump | wc -l
1213428 That's like 58% of our strings with an unknown coderange. Meaning if they get scanned post fork, the entire page will be invalidated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some tests for this in test/objspace/test_objspace.rb
?
I suspect that some shared pages are invalidated because some static string don't have their coderange set eagerly. So the first time they are scanned, the entire memory page is invalidated. Being able to see the coderange in `ObjectSpace` would help debug this. And in addition `dump` currently call `is_broken_string()` and `is_ascii_string()` which both end up scanning the string and assigning coderange. I think it's undesirable as `dump` should be read only.
@peterzhu2118 good point. Done. |
d067a6f
to
4f7d628
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
I suspect that some shared pages are invalidated because some static string don't have their coderange set eagerly.
So the first time they are scanned, the entire memory page is invalidated.
Being able to see the coderange in
ObjectSpace
would help debug this.And in addition
dump
currently callis_broken_string()
andis_ascii_string()
which both end up scanning the string and assigning coderange. I think it's undesirable asdump
should be read only.