Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is there a maximum of sub-spans? #109

Closed
maiermic opened this issue Sep 21, 2021 · 16 comments
Closed

Is there a maximum of sub-spans? #109

maiermic opened this issue Sep 21, 2021 · 16 comments

Comments

@maiermic
Copy link

maiermic commented Sep 21, 2021

Is there a (intended) maximum of sub-spans? I'm creating a larger trace of a request with several SQL statements (a span for each). It works up to a total of 59 spans (depth 4), where one span has 50 sub-spans (children). If I add more than 50 sub-spans, the trace is not found in Jaeger UI and the log of jaegertracing/all-in-one:1.26 shows two errors

{
  "level": "error",
  "ts": 1632257043.8104687,
  "caller": "processors/thrift_processor.go:123",
  "msg": "Processor failed",
  "error": "*jaeger.Batch error reading struct: *jaeger.Span error reading struct: *jaeger.Tag error reading struct: *jaeger.Tag field 0 read error: EOF",
  "stacktrace": "github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).processBuffer\n\tgithub.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:123\ngithub.com/jaegertracing/jaeger/cmd/agent/app/processors.NewThriftProcessor.func2\n\tgithub.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:87"
}
{
  "level": "error",
  "ts": 1632257043.824714,
  "caller": "processors/thrift_processor.go:123",
  "msg": "Processor failed",
  "error": "EOF",
  "stacktrace": "github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).processBuffer\n\tgithub.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:123\ngithub.com/jaegertracing/jaeger/cmd/agent/app/processors.NewThriftProcessor.func2\n\tgithub.com/jaegertracing/jaeger/cmd/agent/app/processors/thrift_processor.go:87"
}

jonahgeorge/jaeger-client-php version 1.2.4. My configuration:

$config = new \Jaeger\Config(
  [
    'sampler' => [
      'type' => \Jaeger\SAMPLER_TYPE_CONST,
      'param' => true,
    ],
    'logging' => true,
    "tags" => [
      "foo" => "bar",
    ],
    "local_agent" => [
      "reporting_host" => "jaeger",
    ],
    'dispatch_mode' => \Jaeger\Config::JAEGER_OVER_BINARY_UDP,
  ],
  'my-app',
);
@sergeyklay
Copy link
Collaborator

Hmm, this is very interesting ... AFAIK, there is no such limitation. At least we use hundreds of spans in our projects.

@maiermic
Copy link
Author

It seems to be caused by the size of the trace. This example causes the issue:

<?php

// https://github.com/jonahgeorge/jaeger-client-php/issues/109

const NESTED_SPAN_COUNT = 100;
const NESTED_SPAN_TAG_VALUE_BYTE_COUNT = 400;

require_once __DIR__.'/../lib/vendor/autoload.php';

use OpenTracing\GlobalTracer;

try {
  $config = new \Jaeger\Config(
    [
      'sampler' => [
        'type' => \Jaeger\SAMPLER_TYPE_CONST,
        'param' => true,
      ],
      'logging' => true,
      "tags" => [
        "foo" => "bar",
      ],
      "local_agent" => [
        "reporting_host" => "jaeger",
      ],
      'dispatch_mode' => \Jaeger\Config::JAEGER_OVER_BINARY_UDP,
    ],
    'my-app',
  );  $config->initializeTracer();
} catch (Throwable $e) {
  error_log("ERROR could not initialize tracer: $e");
}


$tracer = GlobalTracer::get();

$scope = $tracer->startActiveSpan('main', []);

for ($i = 0; $i < NESTED_SPAN_COUNT; $i++) {
  $nestedSpanScope = $tracer->startActiveSpan("nested");
  $nestedSpanScope->getSpan()
    ->setTag('key', bin2hex(random_bytes(NESTED_SPAN_TAG_VALUE_BYTE_COUNT)));
  usleep(1000);
  $nestedSpanScope->close();
}

$scope->close();
$tracer->flush();

$spanContext = $scope->getSpan()->getContext();
if (\is_a($spanContext, \Jaeger\SpanContext::class)) {
  $traceId = dechex($spanContext->getTraceId());
  error_log("TRACE_ID={$traceId}");
}

I don't get this issue if I either set

  • NESTED_SPAN_COUNT = 10 or
  • NESTED_SPAN_TAG_VALUE_BYTE_COUNT = 200

I guess, the UDP package size may be the issue. Any idea for a workaround?

@pekhota
Copy link
Contributor

pekhota commented Sep 22, 2021

Hi! Yes, you are right about UDP package size. To be sure about that try to use some logger here.
Example:

class StdErrLogger implements \Psr\Log\LoggerInterface
{
    use LoggerTrait;

    public function log($level, $message, array $context = array())
    {
        if ($level === "debug") {
            return ;
        }
        fwrite(STDERR, sprintf("Level: %s, Msg: %s", $level, $message));
    }
}

Then use it as 3rd parameter to Config constructor:

$config = new Config(
    [
        'sampler' => [
            'type' => Jaeger\SAMPLER_TYPE_CONST,
            'param' => true,
        ],
        'logging' => true,
    ],
    'your-app-name',
    new StdErrLogger() // <---------
);

Then you will see a message like this: Level: warning, Msg: socket_write failed: [code - 40] Message too long.

The solution can be to call $tracer->flush(); somewhere inside you loop to reduce the size of udp package. Example:

$scope1 = $tracer->startActiveSpan('TestSpan', []);

for ($j = 0; $j < 4; $j++) {
    $scope2 = $tracer->startActiveSpan(sprintf("Child %d; level 1", $j));
    for ($i = 0; $i < 50; $i++) {
        $scope3 = $tracer->startActiveSpan(sprintf("Child %d level 2", $i));
        $scope3->close();

        if ($i % 10 === 0) {
            $tracer->flush(); // <------- 
        }
    }
    $scope2->close();

}

$scope1->close();

$tracer->flush();

We will think about the way on how to optimise this.

@maiermic
Copy link
Author

Then you will see a message like this: Level: warning, Msg: socket_write failed: [code - 40] Message too long.

@pekhota Thanks, I do not get such a message (without the flushing in between). Is this a bug?

If I add the flush as you described, it works. Do you think it is possible to automatically flush, before the maximum UDP package size is exceeded?

@sergeyklay
Copy link
Collaborator

sergeyklay commented Sep 22, 2021

It seems the key point is to call $tracer->flush(); right after $scope->close(); iif the maximum UDP package size is exceeded (or close to be). So I don't think this is hard to implement. @maiermic Would you like to send a patch?

@maiermic
Copy link
Author

Would you like to send a patch?

What should the patch contain? Do you mean a PR with the auto-flush logic?

@maiermic
Copy link
Author

Do you think it is possible to automatically flush, before the maximum UDP package size is exceeded?

Shouldn't UdpSender::send already do this using UdpSender:: chunkSplit?

@sergeyklay
Copy link
Collaborator

Would you like to send a patch?

What should the patch contain? Do you mean a PR with the auto-flush logic?

Yeah, I meant exactly this.

@pekhota
Copy link
Contributor

pekhota commented Sep 22, 2021

I'll send a patch in a few days.

@pekhota
Copy link
Contributor

pekhota commented Sep 26, 2021

@maiermic try this fix #112. If everything good we will release new version.

@maiermic
Copy link
Author

@pekhota How can I install this version in my project?

@sergeyklay
Copy link
Collaborator

sergeyklay commented Sep 27, 2021

@maiermic A fix for this issue has been merged into the master branch. Could you please check master branch, so we can release a new version?

@maiermic
Copy link
Author

Does this mean, I can install this version using composer require jonahgeorge/jaeger-client-php:dev-master?

@pekhota
Copy link
Contributor

pekhota commented Sep 27, 2021

yes

@maiermic
Copy link
Author

Thanks 👍   I can confirm that it works 🚀

@sergeyklay
Copy link
Collaborator

Just released 1.2.5 version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants