Header propagation problems #97

Closed
ajvondrak opened this issue Jun 13, 2020 · 1 comment
ajvondrak (Contributor) commented:

Background

I have two services instrumented in production: a Ruby service that talks to a Python service. I just spent a bunch of time debugging two issues I was getting with my distributed traces (which were actually kind of one issue), and it's a bit of a long story. For want of a better place to share the story and the questions left in its wake, I'm opening the issue here. So, strap in kids, we're going for a ride...

The whole crux of my bug is that I add a trace-level field whose value happened to be a URI object.

uri = URI.parse('http://honeycomb.io')
Honeycomb.add_field_to_trace('uri', uri)

How this breaks is pretty circuitous.

When I only had the Ruby service instrumented, I didn't notice this was a URI instead of a string. In the Honeycomb UI it showed up just fine, but that's because Libhoney::Cleaner calls URI#to_s automatically before sending the event: https://github.com/honeycombio/libhoney-rb/blob/4457c71b311f340967cc138f6e3afcf8d7f9c0f2/lib/libhoney/cleaner.rb#L41-L46
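The effect of that fallback is easy to see in isolation. This is a minimal sketch of the behavior, not the cleaner's actual code path:

```ruby
require 'uri'

uri = URI.parse('http://honeycomb.io')

# Libhoney's cleaner falls back to #to_s for values it doesn't
# otherwise recognize, so the event field ends up as the plain string:
uri.to_s #=> "http://honeycomb.io"
```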

When I added the Python service, I saw two types of errors:

  1. When the Python traces were present, they weren't nested in the Ruby spans as expected.
  2. Sometimes, the request from the Ruby service to the Python service would fail with an HTTP 400.

Both of these smelled like trace header issues. So I added the X-Honeycomb-Trace header value and the HTTP 400 response body as fields on the Ruby spans.

Turns out the HTTP 400s were from nginx rejecting the requests with

<html>
<head><title>400 Request Header Or Cookie Too Large</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<center>Request Header Or Cookie Too Large</center>
</body>
</html>

Okay, I have a healthy number of trace fields, but I didn't think I had that many. So I grab the header value from the span and go about trying to parse it. Just how much data is getting sent?

The beeline-ruby problem

I'm more familiar with the Ruby beeline, so I chose to use it to parse the header in my local console. Using a constructed example for the sake of this write-up (now that I know what the actual issue is), let's suppose the trace header looked like this:

1;dataset=dataset,trace_id=trace_id,parent_id=parent_id,context=eyJ1cmkiOnsic2NoZW1lIjoiaHR0cCIsInVzZXIiOm51bGwsInBhc3N3b3JkIjpudWxsLCJob3N0IjoiaG9uZXljb21iLmlvIiwicG9ydCI6ODAsInBhdGgiOiIiLCJxdWVyeSI6bnVsbCwib3BhcXVlIjpudWxsLCJmcmFnbWVudCI6bnVsbCwicGFyc2VyIjp7InJlZ2V4cCI6eyJTQ0hFTUUiOiIoPy1taXg6XFxBW0EtWmEtel1bQS1aYS16MC05K1xcLS5dKlxceikiLCJVU0VSSU5GTyI6Iig_LW1peDpcXEEoPzolXFxoXFxofFshJCYtLjAtOz1BLVpfYS16fl0pKlxceikiLCJIT1NUIjoiKD8tbWl4OlxcQSg_Oig_PElQLWxpdGVyYWw-XFxbKD86KD88SVB2NmFkZHJlc3M-KD86XFxoezEsNH06KXs2fSg_PGxzMzI-XFxoezEsNH06XFxoezEsNH18KD88SVB2NGFkZHJlc3M-KD88ZGVjLW9jdGV0PlsxLTldXFxkfDFcXGR7Mn18MlswLTRdXFxkfDI1WzAtNV18XFxkKVxcLlxcZzxkZWMtb2N0ZXQ-XFwuXFxnPGRlYy1vY3RldD5cXC5cXGc8ZGVjLW9jdGV0PikpfDo6KD86XFxoezEsNH06KXs1fVxcZzxsczMyPnxcXGh7LDR9OjooPzpcXGh7MSw0fTopezR9XFxnPGxzMzI-fCg_Oig_OlxcaHsxLDR9Oik_XFxoezEsNH0pPzo6KD86XFxoezEsNH06KXszfVxcZzxsczMyPnwoPzooPzpcXGh7MSw0fTopeywyfVxcaHsxLDR9KT86Oig_OlxcaHsxLDR9Oil7Mn1cXGc8bHMzMj58KD86KD86XFxoezEsNH06KXssM31cXGh7MSw0fSk_OjpcXGh7MSw0fTpcXGc8bHMzMj58KD86KD86XFxoezEsNH06KXssNH1cXGh7MSw0fSk_OjpcXGc8bHMzMj58KD86KD86XFxoezEsNH06KXssNX1cXGh7MSw0fSk_OjpcXGh7MSw0fXwoPzooPzpcXGh7MSw0fTopeyw2fVxcaHsxLDR9KT86Oil8KD88SVB2RnV0dXJlPnZcXGgrXFwuWyEkJi0uMC07PUEtWl9hLXp-XSspKVxcXSl8XFxnPElQdjRhZGRyZXNzPnwoPzxyZWctbmFtZT4oPzolXFxoXFxofFshJCYtLjAtOTs9QS1aX2Eten5dKSopKVxceikiLCJBQlNfUEFUSCI6Iig_LW1peDpcXEFcXC8oPzolXFxoXFxofFshJCYtLjAtOz1ALVpfYS16fl0pKig_OlxcLyg_OiVcXGhcXGh8WyEkJi0uMC07PUAtWl9hLXp-XSkqKSpcXHopIiwiUkVMX1BBVEgiOiIoPy1taXg6XFxBKD86JVxcaFxcaHxbISQmLS4wLTs9QC1aX2Eten5dKSsoPzpcXC8oPzolXFxoXFxofFshJCYtLjAtOz1ALVpfYS16fl0pKikqXFx6KSIsIlFVRVJZIjoiKD8tbWl4OlxcQSg_OiVcXGhcXGh8WyEkJi0uMC07PUAtWl9hLXp-XFwvP10pKlxceikiLCJGUkFHTUVOVCI6Iig_LW1peDpcXEEoPzolXFxoXFxofFshJCYtLjAtOz1ALVpfYS16flxcLz9dKSpcXHopIiwiT1BBUVVFIjoiKD8tbWl4OlxcQSg_OlteXFwvXS4qKT9cXHopIiwiUE9SVCI6Iig_LW1peDpcXEFbXFx4MDlcXHgwYVxceDBjXFx4MGQgXSpcXGQqW1xceDA5XFx4MGFcXHgwY1xceDBkIF0qXFx6KSJ9fX19
require 'honeycomb-beeline'
include Honeycomb::PropagationParser
parse(trace) # `trace` holds the header string above
#=> ["trace_id", "parent_id", {}, "dataset"]

The context parsing apparently fails in this block:

Base64.decode64(value).tap do |json|
  begin
    trace_fields = JSON.parse json
  rescue JSON::ParserError
    trace_fields = {}
  end
end

That being the case, what does the base 64 decoded value look like?

Base64.decode64(context)
#=> "{\"uri\":{\"scheme\":\"http\",\"user\":null,\"password\":null,\"host\":\"honeycomb.io\",\"port\":80,\"path\":\"\",\"query\":null,\"opaque\":null,\"fragment\":null,\"parser\":{\"regexp\":{\"SCHEME\":\"(?-mix:\\\\A[A-Za-z][A-Za-z0-9+\\\\-.]*\\\\z)\",\"USERINFO\":\"(\v[Z^\x0E\x97\x17\x10J\x0F\xCE\x89W\x17\x1A\x17\x17\x1A\x1F\x16\xC8I\t\x8BK\x8C\vN\xCFPKV\x97\xD8K^\x9F\x97JJ\x97\x17\x1E\x8AH\x8B\b\x92\x13\xD4\xD5\b\x8E\x88\x8A\x0F\xCB[Z^\x0E\x97\x17\x10J\x03\xA2\x80\xF1%@\xB5\xB1\xA5\xD1\x95\xC9\x85\xB0\\\\[(?:(?<IPv6address\n\x0F\xCE\x97\x17\x1A\x1E\xCCK\r\x1FN\x8A^\xCD\x9FJ\x03\xC6\xC733!qq\xA1\xEC\xC4\xB0\xD1\xF4\xE9qq\xA1\xEC\xC4\xB0\xD1\xF5\xF0\xA0\xFC\xF1%A\xD8\xD1\x85\x91\x91\xC9\x95\xCD\xCC(?<dec-octet>[1-9]\\\\d|1\\\\d{2}|2[0-4]\\\\d|25[0-5]|\\\\d)\\\\.\\\\g<dec-octet\x17\x17\v\x97\x17\x19\xCF\x19\x19X\xCB[\xD8\xDD\x19]\x0F\x97\x17\v\x97\x17\x19\xCF\x19\x19X\xCB[\xD8\xDD\x19]\x0F\x8AJ_\x0E\x8E\x8A\x0F\xCE\x97\x17\x1A\x1E\xCCK\r\x1FN\x8A^\xCD_W\x17\x19\xCF\e\x1C\xCC\xCC\x8F\x9F\x17\x17\x1A\x1E\xCB\r\x1FN\x8E\x8A\x0F\xCE\x97\x17\x1A\x1E\xCCK\r\x1FN\x8A^\xCD\x1FW\x17\x19\xCF\e\x1C\xCC\xCC\x87\xC2\x80\xE8\xA0:\\\\h{1,4}:)\x17\x17\x1A\x1E\xCCK\r\x1FJO\xCE\x8E\x8A\x0F\xCE\x97\x17\x1A\x1E\xCCK\r\x1FN\x8A^\xCC\xDFW\x17\x19\xCF\e\x1C\xCC\xCC\x8F\x9F\n\x0F\xCE\x8A\x0F\xCE\x97\x17\x1A\x1E\xCCK\r\x1FN\x8A^\xCB\f\x9FW\x17\x1A\x1E\xCCK\r\x1FJO\xCE\x8E\x8A\x03\xA5\xC5\xC6\x87\xB3\x12\xC3G\xD3\xA2\x97\xB3'\xD5\xC5\xC6s\xC6\xC733#\xE7\xC2\x83\xF3\xA2\x83\xF3\xA5\xC5\xC6\x87\xB3\x12\xC3G\xD3\xA2\x97\xB2\xC37\xD5\xC5\xC6\x87\xB3\x12\xC3G\xD2\x90\xE8\xE9qq\xA1\xEC\xC4\xB0\xD1\xF4\xE9qq\x9C\xF1\xB1\xCC\xCC\xC8\xF9\xF0\xA0\xFC\xE8\xA0\xFC\xE9qq\xA1\xEC\xC4\xB0\xD1\xF4\xE8\xA5\xEC\xB0\xD1\xF5qq\xA1\xEC\xC4\xB0\xD1\xF4\xA4::\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,5}\\\\h{1,4})\x0E\x8E\x97\x17\x1A\x1E\xCCK\r\x1F_\n\x0F\xCE\x8A\x0F\xCE\x97\x17\x1A\x1E\xCCK\r\x1FN\x8A^\xCB\r\x9FW\x17\x1A\x1E\xCCK\r\x1FJO\xCE\x8E\x8A_\n\x0F\xCF\x12T\x1D\x91\x9D]\x1D\\\x99O\x9D\x97\x17\x1A\n\xD7\x17\v\x96\xC8
I\t\x8BK\x8C\vN\xCFPKV\x97\xD8K^\x95\xD2\xB2\x92\x95\xC5\xC5\xD2\x97\xC5\xC5\xC6s\xC4\x95\acF\x16FG&W73\xE7\xC2\x83\xF3\xC7&Vr\xD6\xE6\x16\xD6S\xE2\x83\xF3\xA2U\xC5\xC6\x85\xC5\xC6\x87\xC5\xB2\x12Bb\xD2\xE3\x02\xD3\x93\xB3\xD4\x12\xD5\xA5\xF6\x12\xD7\xA7\xE5\xD2\x92\xA2\x92\x95\xC5\xC7\xA2\x92\"\xC2$\x14%5\xF5\x04\x15D\x82#\xA2\"\x80\xB5\xB5\xA5\xE0\xE9qq\x05qp\xBC\xA0\xFC\xE8\x95qq\xA1qq\xA1\xF1l\x84\x90\x98\xB4\xB8\xC0\xB4\xEC\xF5\x00\xB5i}\x84\xB5\xE9\xF9t\xA4\xA8\xA0:\\\\/(\x0E\x89W\x17\x1A\x17\x17\x1A\x1F\x16\xC8I\t\x8BK\x8C\vN\xCFP\vV\x97\xD8K^\x95\xD2\x92\xA2\x92\xA5\xC5\xC7\xA2\x92\"\xC2%$T\xC5\xF5\x04\x15D\x82#\xA2\"\x83\xF2\xD6\xD6\x97\x83\xA5\xC5\xC4\x12\x83\xF3\xA2U\xC5\xC6\x85\xC5\xC6\x87\xC5\xB2\x12Bb\xD2\xE3\x02\xD3\xB3\xD4\x02\xD5\xA5\xF6\x12\xD7\xA7\xE5\xD2\x92\xB2\x83\xF3\xA5\xC5\xC2\xF2\x83\xF3\xA2U\xC5\xC6\x85\xC5\xC6\x87\xC5\xB2\x12Bb\xD2\xE3\x02\xD3\xB3\xD4\x02\xD5\xA5\xF6\x12\xD7\xA7\xE5\xD2\x92\xA2\x92\xA5\xC5\xC7\xA2\x92\"\xC2%\x15TU%\x92#\xA2\"\x83\xF2\xD6\xD6\x97\x83\xA5\xC5\xC4\x12\x80\xE8\x95qq\xA1qq\xA1\xF1l\x84\x90\x98\xB4\xB8\xC0\xB4\xEC\xF5\x00\xB5i}\x84\xB5\xE9\\\\/?])*\\\\z)\",\"FRAGMENT\":\"(\v[Z^\x0E\x97\x17\x10J\x0F\xCE\x89W\x17\x1A\x17\x17\x1A\x1F\x16\xC8I\t\x8BK\x8C\vN\xCFP\vV\x97\xD8K^\x9F\x97\x17\v\xCF\xD7JJ\x97\x17\x1E\x8AH\x8B\b\x93\xD4\x10TUQH\x8E\x88\x8A\x0F\xCB[Z^\x0E\x97\x17\x10J\x03\xA5\xB5\xE5\xC5\xC2\xF5\xD2\xE2\xA2\x93\xF5\xC5\xC7\xA2\x92\"\xC2%\x04\xF5%B#\xA2\"\x80\xB5\xB5\xA5\xE0\xE9qq\x05mqq\xE0\xC0\xE5qq\xE0\xC1\x85qq\xE0\xC1\x8Dqq\xE0\xC1\x90\x81t\xA9qq\x90\xA9mqq\xE0\xC0\xE5qq\xE0\xC1\x85qq\xE0\xC1\x8Dqq\xE0\xC1\x90\x81t\xA9qq\xE8\xA4\x89\xF5\xF5\xF5"

Yikes! No wonder JSON parsing fails. Why all the binary data?

Because while the header is generated with URL-safe base64 encoding here:

context = Base64.urlsafe_encode64(JSON.generate(trace.fields)).strip

it's not decoded with the URL-safe variant here:

Base64.decode64(value).tap do |json|

What happens when we decode in a URL-safe way?

Base64.urlsafe_decode64(context)
#=> "{\"uri\":{\"scheme\":\"http\",\"user\":null,\"password\":null,\"host\":\"honeycomb.io\",\"port\":80,\"path\":\"\",\"query\":null,\"opaque\":null,\"fragment\":null,\"parser\":{\"regexp\":{\"SCHEME\":\"(?-mix:\\\\A[A-Za-z][A-Za-z0-9+\\\\-.]*\\\\z)\",\"USERINFO\":\"(?-mix:\\\\A(?:%\\\\h\\\\h|[!$&-.0-;=A-Z_a-z~])*\\\\z)\",\"HOST\":\"(?-mix:\\\\A(?:(?<IP-literal>\\\\[(?:(?<IPv6address>(?:\\\\h{1,4}:){6}(?<ls32>\\\\h{1,4}:\\\\h{1,4}|(?<IPv4address>(?<dec-octet>[1-9]\\\\d|1\\\\d{2}|2[0-4]\\\\d|25[0-5]|\\\\d)\\\\.\\\\g<dec-octet>\\\\.\\\\g<dec-octet>\\\\.\\\\g<dec-octet>))|::(?:\\\\h{1,4}:){5}\\\\g<ls32>|\\\\h{,4}::(?:\\\\h{1,4}:){4}\\\\g<ls32>|(?:(?:\\\\h{1,4}:)?\\\\h{1,4})?::(?:\\\\h{1,4}:){3}\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,2}\\\\h{1,4})?::(?:\\\\h{1,4}:){2}\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,3}\\\\h{1,4})?::\\\\h{1,4}:\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,4}\\\\h{1,4})?::\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,5}\\\\h{1,4})?::\\\\h{1,4}|(?:(?:\\\\h{1,4}:){,6}\\\\h{1,4})?::)|(?<IPvFuture>v\\\\h+\\\\.[!$&-.0-;=A-Z_a-z~]+))\\\\])|\\\\g<IPv4address>|(?<reg-name>(?:%\\\\h\\\\h|[!$&-.0-9;=A-Z_a-z~])*))\\\\z)\",\"ABS_PATH\":\"(?-mix:\\\\A\\\\/(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~])*(?:\\\\/(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~])*)*\\\\z)\",\"REL_PATH\":\"(?-mix:\\\\A(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~])+(?:\\\\/(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~])*)*\\\\z)\",\"QUERY\":\"(?-mix:\\\\A(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~\\\\/?])*\\\\z)\",\"FRAGMENT\":\"(?-mix:\\\\A(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~\\\\/?])*\\\\z)\",\"OPAQUE\":\"(?-mix:\\\\A(?:[^\\\\/].*)?\\\\z)\",\"PORT\":\"(?-mix:\\\\A[\\\\x09\\\\x0a\\\\x0c\\\\x0d ]*\\\\d*[\\\\x09\\\\x0a\\\\x0c\\\\x0d ]*\\\\z)\"}}}}"

There we go! That looks like it'll parse as JSON.

🔴 Problem: The Ruby beeline is inconsistent about URL-safe encoding.
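The mismatch is easy to reproduce in isolation. In this sketch, '?' bytes force a '/' into the standard-alphabet encoding (a '_' in the URL-safe one), and Ruby's lenient Base64.decode64 silently skips the character it doesn't recognize instead of raising:

```ruby
require 'base64'

# "???" encodes to "Pz8/" in standard base64, "Pz8_" URL-safe.
context = Base64.urlsafe_encode64('???')  #=> "Pz8_"

# The non-URL-safe decoder drops the unrecognized '_', silently
# truncating the payload:
Base64.decode64(context)          #=> "??"

# The matching decoder round-trips cleanly:
Base64.urlsafe_decode64(context)  #=> "???"
```

So on the Ruby side the failure mode isn't an exception at all; the JSON simply comes back corrupted and the rescue clause quietly returns `{}`.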

The JSON problem

Indeed, it does parse as JSON. But as you can already tell, the value isn't the simple URI#to_s string:

{
  "uri" => {
    "scheme" => "http",
    "user" => nil,
    "password" => nil,
    "host" => "honeycomb.io",
    "port" => 80,
    "path" => "",
    "query" => nil,
    "opaque" => nil,
    "fragment" => nil,
    "parser" => {
      "regexp" => {
        "SCHEME" => "(?-mix:\\A[A-Za-z][A-Za-z0-9+\\-.]*\\z)",
        "USERINFO" => "(?-mix:\\A(?:%\\h\\h|[!$&-.0-;=A-Z_a-z~])*\\z)",
        "HOST" => "(?-mix:\\A(?:(?<IP-literal>\\[(?:(?<IPv6address>(?:\\h{1,4}:){6}(?<ls32>\\h{1,4}:\\h{1,4}|(?<IPv4address>(?<dec-octet>[1-9]\\d|1\\d{2}|2[0-4]\\d|25[0-5]|\\d)\\.\\g<dec-octet>\\.\\g<dec-octet>\\.\\g<dec-octet>))|::(?:\\h{1,4}:){5}\\g<ls32>|\\h{,4}::(?:\\h{1,4}:){4}\\g<ls32>|(?:(?:\\h{1,4}:)?\\h{1,4})?::(?:\\h{1,4}:){3}\\g<ls32>|(?:(?:\\h{1,4}:){,2}\\h{1,4})?::(?:\\h{1,4}:){2}\\g<ls32>|(?:(?:\\h{1,4}:){,3}\\h{1,4})?::\\h{1,4}:\\g<ls32>|(?:(?:\\h{1,4}:){,4}\\h{1,4})?::\\g<ls32>|(?:(?:\\h{1,4}:){,5}\\h{1,4})?::\\h{1,4}|(?:(?:\\h{1,4}:){,6}\\h{1,4})?::)|(?<IPvFuture>v\\h+\\.[!$&-.0-;=A-Z_a-z~]+))\\])|\\g<IPv4address>|(?<reg-name>(?:%\\h\\h|[!$&-.0-9;=A-Z_a-z~])*))\\z)",
        "ABS_PATH" => "(?-mix:\\A\\/(?:%\\h\\h|[!$&-.0-;=@-Z_a-z~])*(?:\\/(?:%\\h\\h|[!$&-.0-;=@-Z_a-z~])*)*\\z)",
        "REL_PATH" => "(?-mix:\\A(?:%\\h\\h|[!$&-.0-;=@-Z_a-z~])+(?:\\/(?:%\\h\\h|[!$&-.0-;=@-Z_a-z~])*)*\\z)",
        "QUERY" => "(?-mix:\\A(?:%\\h\\h|[!$&-.0-;=@-Z_a-z~\\/?])*\\z)",
        "FRAGMENT" => "(?-mix:\\A(?:%\\h\\h|[!$&-.0-;=@-Z_a-z~\\/?])*\\z)",
        "OPAQUE" => "(?-mix:\\A(?:[^\\/].*)?\\z)",
        "PORT" => "(?-mix:\\A[\\x09\\x0a\\x0c\\x0d ]*\\d*[\\x09\\x0a\\x0c\\x0d ]*\\z)"
      }
    }
  }
}

So how was I getting this structure in my traces?

If you just load up the stock Ruby JSON library, this isn't a problem.

require 'json'
JSON.generate(URI.parse('http://honeycomb.io'))
#=> "\"http://honeycomb.io\""

In my project, though, I was using both active_support and yajl.

Individually, either works as expected.

require 'yajl/json_gem'
JSON.generate(URI.parse('http://honeycomb.io'))
#=> "\"http://honeycomb.io\""
require 'active_support/json'
JSON.generate(URI.parse('http://honeycomb.io'))
#=> "\"http://honeycomb.io\""

But together, their interactions can lead to something... quite different. What's more, it's load-order dependent!

If you load yajl before active_support, the behavior is OK:

require 'yajl/json_gem'
require 'active_support/json'
JSON.generate(URI.parse('http://honeycomb.io'))
#=> "\"http://honeycomb.io\""

But when active_support is loaded before yajl, we get to the cause of the gnarly URI expansion:

require 'active_support/json'
require 'yajl/json_gem'
JSON.generate(URI.parse('http://honeycomb.io'))
#=> "{\"scheme\":\"http\",\"user\":null,\"password\":null,\"host\":\"honeycomb.io\",\"port\":80,\"path\":\"\",\"query\":null,\"opaque\":null,\"fragment\":null,\"parser\":{\"regexp\":{\"SCHEME\":\"(?-mix:\\\\A[A-Za-z][A-Za-z0-9+\\\\-.]*\\\\z)\",\"USERINFO\":\"(?-mix:\\\\A(?:%\\\\h\\\\h|[!$&-.0-;=A-Z_a-z~])*\\\\z)\",\"HOST\":\"(?-mix:\\\\A(?:(?<IP-literal>\\\\[(?:(?<IPv6address>(?:\\\\h{1,4}:){6}(?<ls32>\\\\h{1,4}:\\\\h{1,4}|(?<IPv4address>(?<dec-octet>[1-9]\\\\d|1\\\\d{2}|2[0-4]\\\\d|25[0-5]|\\\\d)\\\\.\\\\g<dec-octet>\\\\.\\\\g<dec-octet>\\\\.\\\\g<dec-octet>))|::(?:\\\\h{1,4}:){5}\\\\g<ls32>|\\\\h{,4}::(?:\\\\h{1,4}:){4}\\\\g<ls32>|(?:(?:\\\\h{1,4}:)?\\\\h{1,4})?::(?:\\\\h{1,4}:){3}\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,2}\\\\h{1,4})?::(?:\\\\h{1,4}:){2}\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,3}\\\\h{1,4})?::\\\\h{1,4}:\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,4}\\\\h{1,4})?::\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,5}\\\\h{1,4})?::\\\\h{1,4}|(?:(?:\\\\h{1,4}:){,6}\\\\h{1,4})?::)|(?<IPvFuture>v\\\\h+\\\\.[!$&-.0-;=A-Z_a-z~]+))\\\\])|\\\\g<IPv4address>|(?<reg-name>(?:%\\\\h\\\\h|[!$&-.0-9;=A-Z_a-z~])*))\\\\z)\",\"ABS_PATH\":\"(?-mix:\\\\A\\\\/(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~])*(?:\\\\/(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~])*)*\\\\z)\",\"REL_PATH\":\"(?-mix:\\\\A(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~])+(?:\\\\/(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~])*)*\\\\z)\",\"QUERY\":\"(?-mix:\\\\A(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~\\\\/?])*\\\\z)\",\"FRAGMENT\":\"(?-mix:\\\\A(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~\\\\/?])*\\\\z)\",\"OPAQUE\":\"(?-mix:\\\\A(?:[^\\\\/].*)?\\\\z)\",\"PORT\":\"(?-mix:\\\\A[\\\\x09\\\\x0a\\\\x0c\\\\x0d ]*\\\\d*[\\\\x09\\\\x0a\\\\x0c\\\\x0d ]*\\\\z)\"}}}"

🔴 Problem: Unrelated to Honeycomb, yajl's interaction with active_support leads to differences in JSON encoding that may cause the trace fields serialization to balloon in size.

The beeline-python problem

So what about when the header isn't too big? Why weren't the Python traces showing up nested?

Well, much like my local testing in the Ruby beeline, the URL-safe encoding breaks the URL-unsafe decoding on the Python end. The beeline parses the context on this line: https://github.com/honeycombio/beeline-python/blob/43a3f9b4f4e2844e7030b33f62695567332c73df/beeline/trace.py#L352

Feeding in the context from above in Python 2.7 (the old version I'm using in my service):

import base64
base64.b64decode(context.encode())

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/base64.py", line 78, in b64decode
    raise TypeError(msg)
TypeError: Incorrect padding

This error persists even in Python 3.x: the input isn't valid standard base64, so the failure is by design rather than a bug.

This gets swallowed by the beeline, resulting in a brand new trace: https://github.com/honeycombio/beeline-python/blob/43a3f9b4f4e2844e7030b33f62695567332c73df/beeline/middleware/flask/__init__.py#L19-L22

If we were to use the URL-safe variant, the context parsing would work:

base64.urlsafe_b64decode(context.encode())
#=> b'{"uri":{"scheme":"http","user":null,"password":null,"host":"honeycomb.io","port":80,"path":"","query":null,"opaque":null,"fragment":null,"parser":{"regexp":{"SCHEME":"(?-mix:\\\\A[A-Za-z][A-Za-z0-9+\\\\-.]*\\\\z)","USERINFO":"(?-mix:\\\\A(?:%\\\\h\\\\h|[!$&-.0-;=A-Z_a-z~])*\\\\z)","HOST":"(?-mix:\\\\A(?:(?<IP-literal>\\\\[(?:(?<IPv6address>(?:\\\\h{1,4}:){6}(?<ls32>\\\\h{1,4}:\\\\h{1,4}|(?<IPv4address>(?<dec-octet>[1-9]\\\\d|1\\\\d{2}|2[0-4]\\\\d|25[0-5]|\\\\d)\\\\.\\\\g<dec-octet>\\\\.\\\\g<dec-octet>\\\\.\\\\g<dec-octet>))|::(?:\\\\h{1,4}:){5}\\\\g<ls32>|\\\\h{,4}::(?:\\\\h{1,4}:){4}\\\\g<ls32>|(?:(?:\\\\h{1,4}:)?\\\\h{1,4})?::(?:\\\\h{1,4}:){3}\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,2}\\\\h{1,4})?::(?:\\\\h{1,4}:){2}\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,3}\\\\h{1,4})?::\\\\h{1,4}:\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,4}\\\\h{1,4})?::\\\\g<ls32>|(?:(?:\\\\h{1,4}:){,5}\\\\h{1,4})?::\\\\h{1,4}|(?:(?:\\\\h{1,4}:){,6}\\\\h{1,4})?::)|(?<IPvFuture>v\\\\h+\\\\.[!$&-.0-;=A-Z_a-z~]+))\\\\])|\\\\g<IPv4address>|(?<reg-name>(?:%\\\\h\\\\h|[!$&-.0-9;=A-Z_a-z~])*))\\\\z)","ABS_PATH":"(?-mix:\\\\A\\\\/(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~])*(?:\\\\/(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~])*)*\\\\z)","REL_PATH":"(?-mix:\\\\A(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~])+(?:\\\\/(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~])*)*\\\\z)","QUERY":"(?-mix:\\\\A(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~\\\\/?])*\\\\z)","FRAGMENT":"(?-mix:\\\\A(?:%\\\\h\\\\h|[!$&-.0-;=@-Z_a-z~\\\\/?])*\\\\z)","OPAQUE":"(?-mix:\\\\A(?:[^\\\\/].*)?\\\\z)","PORT":"(?-mix:\\\\A[\\\\x09\\\\x0a\\\\x0c\\\\x0d ]*\\\\d*[\\\\x09\\\\x0a\\\\x0c\\\\x0d ]*\\\\z)"}}}}'

🔴 Problem: The Honeycomb beelines are sometimes inconsistent when it comes to interoperability. E.g., honeycombio/beeline-python#113. This makes distributed tracing kind of fraught.

Outro

This was quite an issue to hunt down! For now, I can skirt it by simply URI#to_s-ing the trace-level field before adding it. But I thought it touched on a lot of important considerations for the beelines.
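Concretely, the workaround looks like this (guarded with `defined?` so the sketch runs even without the beeline loaded):

```ruby
require 'uri'

uri = URI.parse('http://honeycomb.io')

# Stringify before adding the trace-level field, so only the plain URL
# is serialized into the X-Honeycomb-Trace context instead of the
# whole URI object graph.
Honeycomb.add_field_to_trace('uri', uri.to_s) if defined?(Honeycomb)
```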

vreynolds (Contributor) commented:

Hello,

We will be closing this issue as it is a low priority for us. It is unlikely that we'll ever get to it, and so we'd like to set expectations accordingly.

As we enter 2022 Q1, we are trimming our OSS backlog. This is so that we can focus better on areas that are more aligned with the OpenTelemetry-focused direction of telemetry ingest for Honeycomb.

If this issue is important to you, please feel free to ping here and we can discuss/re-open.
