We will be closing this issue as it is a low priority for us. It is unlikely that we'll ever get to it, and so we'd like to set expectations accordingly.
As we enter 2022 Q1, we are trimming our OSS backlog. This is so that we can focus better on areas that are more aligned with the OpenTelemetry-focused direction of telemetry ingest for Honeycomb.
If this issue is important to you, please feel free to ping here and we can discuss/re-open.
Background
I have two services instrumented in production: a Ruby service that talks to a Python service. I just spent a bunch of time debugging two issues I was getting with my distributed traces (which were actually kind of one issue), and it's a bit of a long story. For want of a better place to share the story and the questions left in its wake, I'm opening the issue here. So, strap in kids, we're going for a ride...
The whole crux of my bug is that I added a trace-level field whose value happened to be a `URI` object. How this breaks is pretty circuitous.
When I only had the Ruby service instrumented, I didn't notice this was a `URI` instead of a string. In the Honeycomb UI it showed up just fine, but that's because `Libhoney::Cleaner` calls `URI#to_s` automatically before sending the event: https://github.com/honeycombio/libhoney-rb/blob/4457c71b311f340967cc138f6e3afcf8d7f9c0f2/lib/libhoney/cleaner.rb#L41-L46

When I added the Python service, I saw two types of errors:

- When the Python traces were present, they weren't nested in the Ruby spans as expected.
- Sometimes, the request from the Ruby service to the Python service would fail with an HTTP 400.
Both of these smelled like trace header issues. So I added the `X-Honeycomb-Trace` header value and the HTTP 400 response body to the Ruby spans.

Turns out the HTTP 400s were from nginx rejecting the requests with:

<html><head><title>400 Request Header Or Cookie Too Large</title></head><body bgcolor="white"><center><h1>400 Bad Request</h1></center><center>Request Header Or Cookie Too Large</center></body></html>
Okay, I have a healthy amount of trace fields, but I didn't think I had that many. So I grab the header value from the span and go about trying to parse it. Just how much data is getting sent?
The beeline-ruby problem
I'm more familiar with the Ruby beeline, so I chose to use it to parse the header in my local console. Using a constructed example for the sake of this write-up (now that I know what the actual issue is), let's suppose the trace header looked like this:
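For illustration, here's a sketch of how such a v1 header is shaped: a version prefix, comma-separated `key=value` pairs, and a `context` segment holding base64-encoded JSON of the trace-level fields. The ids and field values below are entirely made up.

```ruby
require "base64"
require "json"

# Made-up trace fields, mirroring the kind of data I was propagating
# (including the URL-valued field at the heart of this issue).
fields  = { "app.request_uri" => "https://example.com/widgets?page=2" }
context = Base64.urlsafe_encode64(JSON.generate(fields))

# The v1 header shape: version, then comma-separated key=value pairs.
# The trace and parent ids here are invented for the example.
header = "1;trace_id=44f2ea39,parent_id=c12a45e7,context=#{context}"
puts header
```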
The context parsing apparently fails to parse the JSON in this block: `beeline-ruby/lib/honeycomb/propagation.rb`, lines 38-44 at `45587a1`.
That being the case, what does the base 64 decoded value look like?
Yikes! No wonder JSON parsing fails. Why all the binary data?
Because while the header is generated with URL-safe base 64 encoding here: `beeline-ruby/lib/honeycomb/propagation.rb`, line 55 at `45587a1`.

It's not decoded URL-safe here: `beeline-ruby/lib/honeycomb/propagation.rb`, line 38 at `45587a1`.
What happens when we decode in a URL-safe way?
There we go! That looks like it'll parse as JSON.
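The mismatch is easy to reproduce with plain `Base64`, outside the beeline entirely. The field names and values below are made up; the run of `?` characters just guarantees the encoded output contains `_`, a character only the URL-safe alphabet uses:

```ruby
require "base64"
require "json"

# Made-up trace fields; the "?" run forces a "_" into the base64 output.
fields  = { "app.marker" => "????????", "app.url" => "https://example.com/x?q=1" }
context = JSON.generate(fields)
encoded = Base64.urlsafe_encode64(context)

# Base64.decode64 is lenient: it silently skips "-" and "_", shifting
# every bit that follows -- hence the binary garbage.
garbled = Base64.decode64(encoded)
puts garbled == context  # => false
begin
  JSON.parse(garbled)
rescue StandardError => e
  puts "not JSON: #{e.class}"
end

# Decoding with the matching URL-safe variant round-trips cleanly.
puts JSON.parse(Base64.urlsafe_decode64(encoded)) == fields  # => true
```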
🔴 Problem: The Ruby beeline is inconsistent about URL-safe encoding.
The JSON problem
Indeed, it does parse as JSON. But as you can already tell, it's not as simple as the `URI#to_s`. So how was I getting this structure in my traces?
If you just load up the stock Ruby JSON library, this isn't a problem.
In my project, though, I was using both active_support and yajl.
Individually, either works as expected.
But together, their interactions can lead to something...quite different. What's more, it's load order dependent!
If you load yajl before active_support, the behavior is OK:
But when active_support is loaded before yajl, we get to the cause of the gnarly `URI` expansion:

🔴 Problem: Unrelated to Honeycomb, yajl's interaction with active_support leads to differences in JSON encoding that can cause the serialized trace fields to balloon in size.
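ActiveSupport's generic `Object#as_json` fallback serializes objects it doesn't otherwise recognize as a hash of their instance variables, and that's what a `URI` hits. A stdlib-only sketch of that mechanism (the `as_json_fallback` helper below is my simplified stand-in, not ActiveSupport's actual code):

```ruby
require "json"
require "uri"

# Simplified stand-in for ActiveSupport's generic Object#as_json
# fallback: dump the object's instance variables as a hash.
def as_json_fallback(obj)
  obj.instance_variables.each_with_object({}) do |ivar, hash|
    hash[ivar.to_s.delete("@")] = obj.instance_variable_get(ivar)
  end
end

uri = URI("https://example.com/some/path?q=1")

# Stock JSON behavior: the URI collapses to its string form.
compact = JSON.generate("app.url" => uri.to_s)
puts compact  # => {"app.url":"https://example.com/some/path?q=1"}

# The instance-variable expansion: scheme, host, port, path, query,
# and more, each as its own key -- much larger than the plain string.
expanded = JSON.generate("app.url" => as_json_fallback(uri))
puts expanded.bytesize > compact.bytesize  # => true
```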
The beeline-python problem
So what about when the header isn't too big? Why weren't the Python traces showing up nested?
Well, much like my local testing in the Ruby beeline, the URL-safe encoding breaks the URL-unsafe decoding on the Python end. The beeline parses the context on this line: https://github.com/honeycombio/beeline-python/blob/43a3f9b4f4e2844e7030b33f62695567332c73df/beeline/trace.py#L352
Feeding in the context from above in Python 2.7 (the old version I'm using in my service):
This error remains even in Python 3.x, because it's a semantic error (not a bug).
This gets swallowed by the beeline, resulting in a brand new trace: https://github.com/honeycombio/beeline-python/blob/43a3f9b4f4e2844e7030b33f62695567332c73df/beeline/middleware/flask/__init__.py#L19-L22
If we were to use the URL-safe variant, the context parsing would work:
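Both decodes can be sketched with made-up context fields; as in the Ruby example, the run of `?` characters just forces a `_` into the encoded form:

```python
import base64
import json

# Made-up trace fields; the "?" run guarantees the encoded output
# contains "_", which only the URL-safe base64 alphabet uses.
context = json.dumps({"app.marker": "????????"}).encode("utf-8")
encoded = base64.urlsafe_b64encode(context)

# Standard b64decode silently discards "-" and "_" (validate=False is
# the default), shifting every bit after them -- garbage out.
try:
    json.loads(base64.b64decode(encoded))
    print("unexpectedly parsed")
except (ValueError, UnicodeDecodeError) as err:
    print("standard decode failed:", type(err).__name__)

# The matching URL-safe variant round-trips cleanly.
print(json.loads(base64.urlsafe_b64decode(encoded)))
# => {'app.marker': '????????'}
```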
🔴 Problem: The Honeycomb beelines are sometimes inconsistent when it comes to interoperability. E.g., honeycombio/beeline-python#113. This makes distributed tracing kind of fraught.
Outro
This was quite an issue to hunt down! For now, I can skirt it by simply `URI#to_s`-ing the trace-level field before adding it. But I thought it touched on a lot of important considerations for the beelines.