New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deduplicate TypeName#name and Field#name string #4897
Conversation
Looking at deduplicated string reports from out application, GraphQL field and type are at the top: ``` Retained String Report ----------------------------------- 203.20 kB 5080 "String" 4902 graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73 172 graphql-2.2.13/lib/graphql/language/lexer.rb:80 60.77 kB 1485 "ID" 1271 graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73 166 graphql-2.2.13/lib/graphql/language/lexer.rb:80 19 graphql-client-ee19269df0ee/lib/graphql/client/schema/enum_type.rb:65 35.32 kB 883 "id" 518 graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73 353 graphql-2.2.13/lib/graphql/language/lexer.rb:80 34.24 kB 856 "Int" 800 graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73 51 graphql-2.2.13/lib/graphql/language/lexer.rb:80 33.08 kB 827 "Boolean" 811 graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73 10 graphql-2.2.13/lib/graphql/language/lexer.rb:80 ``` I'm very unfamilair with the gem, but looking at which objects retain them I think I managed to locate where best to deduplicate them, but perhaps there is a better way: ``` root path to 0x7da15ef231e0: ROOT (vm) ... 0x7da15ef02bc0 (OBJECT: GraphQL::Language::Nodes::VariableDefinition) 0x7da15ef6d718 (OBJECT: GraphQL::Language::Nodes::ListType) 0x7da15ef6d768 (OBJECT: GraphQL::Language::Nodes::NonNullType) 0x7da15ef6d7b8 (OBJECT: GraphQL::Language::Nodes::TypeName) 0x7da15ef231e0 (STRING: "String") harb> rootpath 0x7da15da32f60 root path to 0x7da15da32f60: ROOT (vm) ... 0x7da15d9bd7b0 (OBJECT: GraphQL::Schema::Argument) 0x7da15da28218 (OBJECT: GraphQL::Language::Nodes::InputValueDefinition) 0x7da15d9ff340 (OBJECT: GraphQL::Language::Nodes::NonNullType) 0x7da15d9ff390 (OBJECT: GraphQL::Language::Nodes::TypeName) 0x7da15da32f60 (STRING: "String") harb> rootpath 0x7da160835c28 root path to 0x7da160835c28: ROOT (vm) ... 0x7da16050c1c8 (OBJECT: GraphQL::Schema::Argument) 0x7da1606e0aa8 (OBJECT: GraphQL::Language::Nodes::InputValueDefinition) 0x7da160751d70 (OBJECT: GraphQL::Language::Nodes::NonNullType) 0x7da160751dc0 (OBJECT: GraphQL::Language::Nodes::TypeName) 0x7da160835c28 (STRING: "ID") harb> rootpath 0x7da15f36ea10 root path to 0x7da15f36ea10: ROOT (vm) 0x7da3f6ddfd20 (CLASS: Object) ... 0x7da15f132f80 (OBJECT: GraphQL::Schema::Field) 0x7da15f36ea10 (STRING: "id") harb> rootpath 0x7da16016beb0 root path to 0x7da16016beb0: ROOT (vm) ... 0x7da15ff16408 (OBJECT: GraphQL::Schema::Field) 0x7da16016beb0 (STRING: "clientMutationId") ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thanks for sharing this detailed profile. I bet we can work these out! I shared a few comments above.
@@ -228,6 +228,9 @@ def initialize(type: nil, name: nil, owner: nil, null: nil, description: NOT_CON | |||
raise ArgumentError, "missing second `type` argument or keyword `type:`" | |||
end | |||
end | |||
|
|||
name = -name if name.is_a?(String) | |||
|
|||
@original_name = name | |||
name_s = -name.to_s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line was supposed to de-duplicate names in memory... but maybe it wasn't working as intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it's too late because we stored name
in @original_name
.
String#-@
returns a copy (unless the string is already frozen).
@@ -608,7 +608,7 @@ def parse_enum_name | |||
end | |||
|
|||
def parse_type_name | |||
TypeName.new(pos: pos, name: parse_name, filename: @filename, source_string: @graphql_str) | |||
TypeName.new(pos: pos, name: -parse_name, filename: @filename, source_string: @graphql_str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your trace as graphql/c_parser
in it, which has its own parser and lexer. So I doubt this would affect the outcome of it. My guess would be here, where the C lexer makes tokens out of identifiers in the string:
DYNAMIC_VALUE_TOKEN(IDENTIFIER) |
graphql-ruby/graphql-c_parser/ext/graphql_c_parser_ext/lexer.rl
Lines 205 to 209 in ec1ae00
#define DYNAMIC_VALUE_TOKEN(token_type) \ | |
case token_type: \ | |
token_sym = ID2SYM(rb_intern(#token_type)); \ | |
token_content = rb_utf8_str_new(ts, te - ts); \ | |
break; |
Specifically the rb_utf8_str_new(...)
call. Maybe it would benefit from a str_uminus
(src) call too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that this one is also a "repeat offender" in your profile:
@string.byteslice(@scanner.pos - @scanner.matched_size, @scanner.matched_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm unfamiliar with the GraphQL parser. Deduping strings there is an option, but deduping a string that has a low chance to be duplicated, is counter productive. Long story short String#-@
can be seen as:
class String
INTERNED_STRINGS = {}
def -@
INTERNED_STRINGS[self] ||= frozen? ? self : dup.freeze
end
end
So if you attempt to dedup all the strings ever, you will bloat that interal hash size, and waste a lot of time looking for strings into it.
Now if the parser is aware of the semantic of what it's trying to parse (e.g. type name, identifier, etc) then yes it can selectively dedup.
Also for the C parser specifically, there are dedicated functions to get an interned String directly from a char *
:
VALUE rb_enc_interned_str(const char *ptr, long len, rb_encoding *enc);
VALUE rb_enc_interned_str_cstr(const char *ptr, rb_encoding *enc);
This directly lookup in the interned string table before allocating a string, which on hit allow to get a string back with 0 allocations.
So if you think there's a better fix than mine, and would like to do it, please do, I think you have a much better overview of that codebase than me.
I added a benchmark to the project and started an improvement on it here: #4899 I tried cherry-picking your commit here but I found that it didn't improve the retained strings. Did it fix your benchmark? (I think it wasn't improved because the originally-parsed strings were retained as part of the AST nodes, and the AST nodes were retained with the GraphQL schema definition objects.) Do you know if Ruby Strings created with |
Nah, I must admit I didn't retest with the patch, it wasa bit of a drive by PR, sorry :/
Yes they aren't immortal. It's just that if it's almost certain no-one else will need the same, an interned string uses a small bit more memory than a regular one. So it's only "amortized" if it actually dedup at least another instance. |
I just released graphql-c_parser 1.1.0 with the fix from #4899 -- but it will need GraphQL-Ruby 2.3.1 in order to take effect by default. If you want to use it with 2.3.0, you have to manually pass MySchema.to_definition(parser: GraphQL::CParser::SchemaParser) |
Looking at deduplicated string reports from out application, GraphQL field and type are at the top:
I'm very unfamilair with the gem, but looking at which objects retain them I think I managed to locate where best to deduplicate them, but perhaps there is a better way: