Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Allow keys for tags to be symbols or strings #35

Closed
mwear opened this issue Jun 27, 2018 · 10 comments
Closed

Proposal: Allow keys for tags to be symbols or strings #35

mwear opened this issue Jun 27, 2018 · 10 comments

Comments

@mwear
Copy link
Member

mwear commented Jun 27, 2018

Currently Span#set_tag accepts strings for tag key names. There are a few reasons why this is undesirable. String keys, where the strings are not frozen, result in additional object allocations and more garbage generation. When tracing an application, we want to minimize the impact of the tracer on application performance. We could work around this problem by having a set of frozen string constants defined in the OpenTracing API, but that also has some downsides. For example, we would only have constants for tags defined in the semantic conventions document (not for custom tags) and we'd have to release a new version of the API when new tags are added.

Using symbols for key names would reduce garbage generation and obviate the need to ship frozen string constants for instrumentation authors to use, they would facilitate custom tag names, and would be more natural and idiomatic than frozen string contants for Ruby developers. However, making tag keys symbols exclusively, would be a backwards incompatible change, and would likely be unpopular. Perhaps it would be a reasonable compromise to update the API to allow for keys to be strings or symbols, and the tracer implementation can normalize them as desired.

I would be interested in having a discussion to see what people's opinions are on this proposal.

cc: @indrekj @itsderek23

@pglombardo
Copy link
Contributor

👍 I like the flexibility (and backwards compatibility) of accepting both Strings and Symbols. With that, Tracer implementations should always use to_sym to use only symbols internally and as a result, release object references to any passed in Strings.

to_sym is supported on both Strings and Symbols for anyone not familiar. Symbol.to_sym just returns self.

Then the burden of reducing object allocation is on the API user. e.g. don't pass in strings

@indrekj
Copy link
Contributor

indrekj commented Jun 28, 2018

Tracer implementations should always use to_sym to use only symbols internally and as a result, release object references to any passed in Strings.

I'm not sure if I agree with this. E.g. zipkin API expects tag names and values to be strings so I don't think there's a benefit of using symbols everywhere https://github.com/salemove/zipkin-ruby-opentracing/blob/master/lib/zipkin/collector.rb#L42 / https://github.com/salemove/jaeger-client-ruby/blob/master/lib/jaeger/client/collector.rb#L50 Although encoding to json already changes the keys to strings so maybe it's fine.

I think accepting both Symbols and Strings is fine. I'd let the tracer implementation then handle how it keeps them internally.

jaeger-client-ruby and zipkin-ruby-opentracing tracers already can handle symbol keys IIRC (they are converted to strings internally).

EDIT: clients mentioned above actually are not fully compatible with symbols yet, but I can change them.

@pglombardo
Copy link
Contributor

I'm not sure if I agree with this. E.g. zipkin API expects tag names and values to be strings

Yeah I understand there may be cases where it may not be possible but in Ruby, the benefits in terms of memory usage and reduced object allocations are big. Hence the word "should" for tracer implementations (although knowing there are exceptions).

In a previous Ruby instrumentation project I worked on (non OpenTracing), I found that switching to Symbols reduced gem memory usage by ~36% and object allocations by ~52%. It was definitely worth the effort and is a good general rule to follow going forward when possible.

In any case, I'm also 👍 this PR.

@mwear
Copy link
Member Author

mwear commented Jun 30, 2018

Thanks to everyone for the comments so far. I'll leave this open a little bit longer to see if anyone has opinions. I'd also like to add a few thoughts myself.

If instrumentation uses primarily symbols for tag names, and I think we should heavily encourage this, there are opportunities for tracer implementations to save string allocations. Even when serializing to JSON, string allocations can be spared by maintaining a symbol to frozen string cache for keys and looking those up during serialization. Also, some tracer implementations may choose to sample, in which case, not all spans will be serialized and the use of symbols will avoid additional allocations.

@indrekj
Copy link
Contributor

indrekj commented Aug 4, 2018

It's been a month. What's the plan here?

@mwear
Copy link
Member Author

mwear commented Aug 8, 2018

All of the comments are positive regarding this change, and while I like some aspects of it, I do have some apprehensions.

  • For a tracer to be able to take full advantage of this, instrumentation should primarily use symbols. That is not currently the case, and I'm unsure if it ever will be.
  • This change would be unnecessary if we can rely on users and instrumentation using frozen strings for tags. Currently that is the case in opentracing-contrib, and perhaps we could encourage this practice by adding frozen string constants for the tags in the semantic conventions.
  • Ultimately, having tags uniformly be frozen strings, or uniformly symbols opens up a lot of possibilities for tracer implementations. Having to deal with a mix of strings, frozen strings, and symbols makes tracer-side optimizations quite a bit harder. I do worry that adding this flexibility might make things more problematic for tracer implementations going forward.

I'd be interested in anyone has feedback on any of my concerns. Right now I am pretty solidly on the fence as to whether is a good idea or not.

@pglombardo
Copy link
Contributor

For a tracer to be able to take full advantage of this, instrumentation should primarily use symbols.

This proposal provides more flexibility to implementations but in the end it's ultimately up to them.

That is not currently the case, and I'm unsure if it ever will be.

I'm not sure who you are referring to but Ruby being a dynamic language - this proposal is right in line with that at no extra cost or confusion.

if we can rely on users and instrumentation using frozen strings for tags

I prefer to provide more options instead of limiting the API and "suggesting" how users should write their code. Especially so when the subject is something like Symbols vs Strings - potential religious entrenchments. In this case, the option of either is the best path forward IMO.

@indrekj
Copy link
Contributor

indrekj commented Aug 9, 2018

the option of either is the best path forward IMO

This means the tracing client implementation must handle both cases. The implementation can expect strings and do #to_s to be sure but if the user is sending symbols then this is a new string allocation each time. There's option to build in memory cache for symbol to string coercion to avoid extra memory allocation but that's extra work on the tracing client implementation side.

In my opinion there should be string keys or symbol keys but not both.

@mwear
Copy link
Member Author

mwear commented Aug 9, 2018

Tracer implementations will have to choose to represent tags as strings or symbols internally.

If a tracer chooses to use represent tags as symbols internally, but instrumentation passes in frozen strings, the tracer will incur additional object allocations when serializing the span that could have been avoided if it used strings internally.

If a tracer chooses to represent tags internally as strings, then it will have to, as @indrekj mentioned previously, maintain a symbol to frozen string cache. This implementation would spare any additional object allocations during serialization.

The third option, which would require compliance from users and instrumentation authors, would be to always use frozen strings as keys. This would result in the least number of object allocations, and save tracer implementations from having to worry about keys in multiple formats.

All of these approaches have pros and cons. I'm unsure what path is the best one forward, although I'm open to all of them.

@mwear
Copy link
Member Author

mwear commented Sep 12, 2018

I am going to close this issue for the time being. All the approaches discussed seem to have their tradeoffs and it's not clear that any of them are a clear winner. I appreciate everyone's input on this issue and I'm happy to revisit it if anyone is passionate about it or has further insights.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants