Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deduplicate TypeName#name and Field#name string #4897

Closed

Conversation

casperisfine
Copy link
Contributor

Looking at deduplicated string reports from out application, GraphQL field and type are at the top:

Retained String Report
-----------------------------------
 203.20 kB    5080  "String"
              4902  graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73
               172  graphql-2.2.13/lib/graphql/language/lexer.rb:80
  60.77 kB    1485  "ID"
              1271  graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73
               166  graphql-2.2.13/lib/graphql/language/lexer.rb:80
                19  graphql-client-ee19269df0ee/lib/graphql/client/schema/enum_type.rb:65
  35.32 kB     883  "id"
               518  graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73
               353  graphql-2.2.13/lib/graphql/language/lexer.rb:80
  34.24 kB     856  "Int"
               800  graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73
                51  graphql-2.2.13/lib/graphql/language/lexer.rb:80
  33.08 kB     827  "Boolean"
               811  graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73
                10  graphql-2.2.13/lib/graphql/language/lexer.rb:80

I'm very unfamilair with the gem, but looking at which objects retain them I think I managed to locate where best to deduplicate them, but perhaps there is a better way:

root path to 0x7da15ef231e0:
                      ROOT (vm)
...
                      0x7da15ef02bc0 (OBJECT: GraphQL::Language::Nodes::VariableDefinition)
                      0x7da15ef6d718 (OBJECT: GraphQL::Language::Nodes::ListType)
                      0x7da15ef6d768 (OBJECT: GraphQL::Language::Nodes::NonNullType)
                      0x7da15ef6d7b8 (OBJECT: GraphQL::Language::Nodes::TypeName)
                      0x7da15ef231e0 (STRING: "String")

harb> rootpath 0x7da15da32f60
root path to 0x7da15da32f60:
                      ROOT (vm)
...
                      0x7da15d9bd7b0 (OBJECT: GraphQL::Schema::Argument)
                      0x7da15da28218 (OBJECT: GraphQL::Language::Nodes::InputValueDefinition)
                      0x7da15d9ff340 (OBJECT: GraphQL::Language::Nodes::NonNullType)
                      0x7da15d9ff390 (OBJECT: GraphQL::Language::Nodes::TypeName)
                      0x7da15da32f60 (STRING: "String")

harb> rootpath 0x7da160835c28
root path to 0x7da160835c28:
                      ROOT (vm)
...
                      0x7da16050c1c8 (OBJECT: GraphQL::Schema::Argument)
                      0x7da1606e0aa8 (OBJECT: GraphQL::Language::Nodes::InputValueDefinition)
                      0x7da160751d70 (OBJECT: GraphQL::Language::Nodes::NonNullType)
                      0x7da160751dc0 (OBJECT: GraphQL::Language::Nodes::TypeName)
                      0x7da160835c28 (STRING: "ID")

harb> rootpath 0x7da15f36ea10
root path to 0x7da15f36ea10:
                      ROOT (vm)
                      0x7da3f6ddfd20 (CLASS: Object)
...
                      0x7da15f132f80 (OBJECT: GraphQL::Schema::Field)
                      0x7da15f36ea10 (STRING: "id")

harb> rootpath 0x7da16016beb0
root path to 0x7da16016beb0:
                      ROOT (vm)
...
                      0x7da15ff16408 (OBJECT: GraphQL::Schema::Field)
                      0x7da16016beb0 (STRING: "clientMutationId")

Looking at deduplicated string reports from out application,
GraphQL field and type are at the top:

```
Retained String Report
-----------------------------------
 203.20 kB    5080  "String"
              4902  graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73
               172  graphql-2.2.13/lib/graphql/language/lexer.rb:80
  60.77 kB    1485  "ID"
              1271  graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73
               166  graphql-2.2.13/lib/graphql/language/lexer.rb:80
                19  graphql-client-ee19269df0ee/lib/graphql/client/schema/enum_type.rb:65
  35.32 kB     883  "id"
               518  graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73
               353  graphql-2.2.13/lib/graphql/language/lexer.rb:80
  34.24 kB     856  "Int"
               800  graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73
                51  graphql-2.2.13/lib/graphql/language/lexer.rb:80
  33.08 kB     827  "Boolean"
               811  graphql-c_parser-1.0.8/lib/graphql/c_parser.rb:73
                10  graphql-2.2.13/lib/graphql/language/lexer.rb:80
```

I'm very unfamilair with the gem, but looking at which objects retain them
I think I managed to locate where best to deduplicate them, but perhaps there
is a better way:

```
root path to 0x7da15ef231e0:
                      ROOT (vm)
...
                      0x7da15ef02bc0 (OBJECT: GraphQL::Language::Nodes::VariableDefinition)
                      0x7da15ef6d718 (OBJECT: GraphQL::Language::Nodes::ListType)
                      0x7da15ef6d768 (OBJECT: GraphQL::Language::Nodes::NonNullType)
                      0x7da15ef6d7b8 (OBJECT: GraphQL::Language::Nodes::TypeName)
                      0x7da15ef231e0 (STRING: "String")

harb> rootpath 0x7da15da32f60
root path to 0x7da15da32f60:
                      ROOT (vm)
...
                      0x7da15d9bd7b0 (OBJECT: GraphQL::Schema::Argument)
                      0x7da15da28218 (OBJECT: GraphQL::Language::Nodes::InputValueDefinition)
                      0x7da15d9ff340 (OBJECT: GraphQL::Language::Nodes::NonNullType)
                      0x7da15d9ff390 (OBJECT: GraphQL::Language::Nodes::TypeName)
                      0x7da15da32f60 (STRING: "String")

harb> rootpath 0x7da160835c28
root path to 0x7da160835c28:
                      ROOT (vm)
...
                      0x7da16050c1c8 (OBJECT: GraphQL::Schema::Argument)
                      0x7da1606e0aa8 (OBJECT: GraphQL::Language::Nodes::InputValueDefinition)
                      0x7da160751d70 (OBJECT: GraphQL::Language::Nodes::NonNullType)
                      0x7da160751dc0 (OBJECT: GraphQL::Language::Nodes::TypeName)
                      0x7da160835c28 (STRING: "ID")

harb> rootpath 0x7da15f36ea10
root path to 0x7da15f36ea10:
                      ROOT (vm)
                      0x7da3f6ddfd20 (CLASS: Object)
...
                      0x7da15f132f80 (OBJECT: GraphQL::Schema::Field)
                      0x7da15f36ea10 (STRING: "id")

harb> rootpath 0x7da16016beb0
root path to 0x7da16016beb0:
                      ROOT (vm)
...
                      0x7da15ff16408 (OBJECT: GraphQL::Schema::Field)
                      0x7da16016beb0 (STRING: "clientMutationId")
```
Copy link
Owner

@rmosolgo rmosolgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for sharing this detailed profile. I bet we can work these out! I shared a few comments above.

@@ -228,6 +228,9 @@ def initialize(type: nil, name: nil, owner: nil, null: nil, description: NOT_CON
raise ArgumentError, "missing second `type` argument or keyword `type:`"
end
end

name = -name if name.is_a?(String)

@original_name = name
name_s = -name.to_s
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line was supposed to de-duplicate names in memory... but maybe it wasn't working as intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it's too late because we stored name in @original_name.

String#-@ returns a copy (unless the string is already frozen).

@@ -608,7 +608,7 @@ def parse_enum_name
end

def parse_type_name
TypeName.new(pos: pos, name: parse_name, filename: @filename, source_string: @graphql_str)
TypeName.new(pos: pos, name: -parse_name, filename: @filename, source_string: @graphql_str)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your trace as graphql/c_parser in it, which has its own parser and lexer. So I doubt this would affect the outcome of it. My guess would be here, where the C lexer makes tokens out of identifiers in the string:

DYNAMIC_VALUE_TOKEN(IDENTIFIER)

#define DYNAMIC_VALUE_TOKEN(token_type) \
case token_type: \
token_sym = ID2SYM(rb_intern(#token_type)); \
token_content = rb_utf8_str_new(ts, te - ts); \
break;

Specifically the rb_utf8_str_new(...) call. Maybe it would benefit from a str_uminus (src) call too?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this one is also a "repeat offender" in your profile:

@string.byteslice(@scanner.pos - @scanner.matched_size, @scanner.matched_size)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm unfamiliar with the GraphQL parser. Deduping strings there is an option, but deduping a string that has a low chance to be duplicated, is counter productive. Long story short String#-@ can be seen as:

class String
  INTERNED_STRINGS = {}

  def -@
    INTERNED_STRINGS[self] ||= frozen? ? self : dup.freeze
  end
end

So if you attempt to dedup all the strings ever, you will bloat that interal hash size, and waste a lot of time looking for strings into it.

Now if the parser is aware of the semantic of what it's trying to parse (e.g. type name, identifier, etc) then yes it can selectively dedup.

Also for the C parser specifically, there are dedicated functions to get an interned String directly from a char *:

VALUE rb_enc_interned_str(const char *ptr, long len, rb_encoding *enc);
VALUE rb_enc_interned_str_cstr(const char *ptr, rb_encoding *enc);

This directly lookup in the interned string table before allocating a string, which on hit allow to get a string back with 0 allocations.

So if you think there's a better fix than mine, and would like to do it, please do, I think you have a much better overview of that codebase than me.

@rmosolgo
Copy link
Owner

rmosolgo commented Apr 4, 2024

I added a benchmark to the project and started an improvement on it here: #4899

I tried cherry-picking your commit here but I found that it didn't improve the retained strings. Did it fix your benchmark? (I think it wasn't improved because the originally-parsed strings were retained as part of the AST nodes, and the AST nodes were retained with the GraphQL schema definition objects.)

Do you know if Ruby Strings created with rb_enc_interned_str are GC'ed?

@casperisfine
Copy link
Contributor Author

Did it fix your benchmark?

Nah, I must admit I didn't retest with the patch, it wasa bit of a drive by PR, sorry :/

Do you know if Ruby Strings created with rb_enc_interned_str are GC'ed?

Yes they aren't immortal. It's just that if it's almost certain no-one else will need the same, an interned string uses a small bit more memory than a regular one. So it's only "amortized" if it actually dedup at least another instance.

@rmosolgo
Copy link
Owner

I just released graphql-c_parser 1.1.0 with the fix from #4899 -- but it will need GraphQL-Ruby 2.3.1 in order to take effect by default. If you want to use it with 2.3.0, you have to manually pass GraphQL::CParser::SchemaParser:

MySchema.to_definition(parser: GraphQL::CParser::SchemaParser)

@rmosolgo rmosolgo closed this Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants