ObjectSpace.dump: Include string coderange #6076

Merged: 1 commit into ruby:master, Jul 4, 2022

casperisfine
Contributor

I suspect that some shared pages are invalidated because some static strings don't have their coderange set eagerly.

So the first time they are scanned, the coderange is written into the string object and the entire memory page is invalidated.

Being able to see the coderange in ObjectSpace would help debug this.

In addition, dump currently calls is_broken_string() and is_ascii_string(), which both end up scanning the string and assigning its coderange. I think that's undesirable, as dump should be read-only.
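
For illustration, here is a minimal Ruby sketch of reading the field this PR proposes. It assumes a Ruby build that includes this change; on builds without it the `coderange` key is simply absent and the helper returns nil. `dumped_coderange` is a hypothetical helper name, and the exact value you see before the scan depends on how the string was created, so no expected output is claimed.

```ruby
require "objspace"
require "json"

# Hypothetical helper: returns the "coderange" field that ObjectSpace.dump
# reports for a string, or nil when the running Ruby does not emit it.
def dumped_coderange(str)
  JSON.parse(ObjectSpace.dump(str))["coderange"]
end

# force_encoding clears any cached coderange on the string.
s = "caf\xC3\xA9".b.force_encoding("UTF-8")

before = dumped_coderange(s) # dump is meant to be read-only here
s.valid_encoding?            # forces a scan, which caches the coderange
after = dumped_coderange(s)

puts "before scan: #{before.inspect}"
puts "after scan:  #{after.inspect}"
```

If dump is read-only as intended, the first call should not change what the second one reports; only the explicit `valid_encoding?` scan should.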

ext/objspace/objspace_dump.c

    dump_append(dc, "\"");

    if (RB_ENC_CODERANGE(obj) == RUBY_ENC_CODERANGE_BROKEN)
        dump_append(dc, ", \"broken\":true");
Contributor

I don't know how you're looking to consume this data, but this doesn't appear to give you any more information than looking at the generated "coderange": "broken" value. If you do need or want it, is it customary to leave off the false case? I've never looked at objspace_dump.c before, so I'm legitimately curious.

Contributor Author

Yeah, it's a bit extra, but it was there before, so removing it would be somewhat of a breaking change.

> is it customary to leave off the false case?

Yes, several other attributes are like that. Heap dumps can be huge, so it's actually a good thing not to include redundant information.
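
For anyone consuming the dump, that convention means a missing boolean key should be read as false. A small sketch, assuming each dump line is parsed as JSON; `flag?` is a hypothetical helper and the two sample entries are made up for the example:

```ruby
require "json"

# Hypothetical helper: reads a boolean attribute that the dump omits when false.
def flag?(entry, name)
  entry.fetch(name, false) == true
end

# Made-up sample entries, trimmed to the relevant keys.
broken = JSON.parse('{"type":"STRING","coderange":"broken","broken":true}')
fine   = JSON.parse('{"type":"STRING","coderange":"7bit"}')

puts flag?(broken, "broken") # true
puts flag?(fine, "broken")   # false
```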

@casperisfine
Contributor Author

Urgh:

$ grep --color '"coderange":"' ~/Downloads/shopify-production-boot-heap-2022-06-30.dump | wc -l
 2071368
$ grep --color '"coderange":"unknown"' ~/Downloads/shopify-production-boot-heap-2022-06-30.dump | wc -l
 1213428

That's about 58% of our strings with an unknown coderange, meaning that if they get scanned post-fork, the entire page will be invalidated.
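
The same count can be sketched in Ruby, which also gives the breakdown per coderange value in one pass. This assumes the dump has one JSON document per line; `coderange_tally` and `unknown_share` are hypothetical helper names, and the sample lines are made up:

```ruby
require "json"

# Hypothetical helper: tallies the "coderange" field across dump lines.
# Accepts anything that yields one JSON document per line.
def coderange_tally(lines)
  tally = Hash.new(0)
  lines.each do |line|
    next unless line.include?('"coderange":"') # cheap pre-filter, like the grep
    coderange = JSON.parse(line)["coderange"]
    tally[coderange] += 1 if coderange
  end
  tally
end

# Hypothetical helper: percentage of tallied strings with an unknown coderange.
def unknown_share(tally)
  total = tally.values.sum
  return 0.0 if total.zero?
  (100.0 * tally.fetch("unknown", 0) / total).round(1)
end

# Made-up sample lines, trimmed to the relevant keys.
sample = [
  '{"type":"STRING","coderange":"unknown"}',
  '{"type":"STRING","coderange":"7bit"}',
  '{"type":"STRING","coderange":"unknown"}',
  '{"type":"OBJECT"}',
]

tally = coderange_tally(sample)
puts tally.inspect
puts "#{unknown_share(tally)}% unknown"
```

For a real dump file, passing `File.foreach(path)` instead of the sample array should stream the file line by line rather than loading it whole.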

Member

peterzhu2118 left a comment

Could you add some tests for this in test/objspace/test_objspace.rb?

@casperisfine
Contributor Author

@peterzhu2118 good point. Done.

Member

peterzhu2118 left a comment

Looks good!

byroot merged commit 890df5f into ruby:master on Jul 4, 2022