-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distributed tracing: time inversion for duration calculation #164
Distributed tracing: time inversion for duration calculation #164
Conversation
Can one of the admins verify this patch? |
f253977
to
d313a20
Compare
Hi @b-viguier , thank you for the PR! We always appreciate you taking the time to dig in to the code!! Looks like it's generating a few errors in the time tests. |
Hi @zsistla 👋 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤞
9d4b39a
to
7e5a287
Compare
Apologies for the delay on this @b-viguier. Review in progress and should be done shortly. |
ok jenkins |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great and the internal Jenkins tests passed as well. Thanks for catching this error @b-viguier and for the excellent fix! This should be included in our next release (Agent version 9.18).
No worries for the delay @Fahmy-Mohammed 😄 |
@@ -378,7 +378,16 @@ nrtime_t nr_distributed_trace_inbound_get_timestamp_delta( | |||
return 0; | |||
} | |||
|
|||
return nr_time_duration(txn_start, dt->inbound.timestamp); | |||
return nr_time_duration(dt->inbound.timestamp, txn_start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @b-viguier :)
Overall looks good, and thanks again for your engagement (and patience).
What was the rationale for switching the parameter signature?
There are quite a few places expecting with this order that don't account for the proposed new signature.
axiom/nr_exclusive_time.c:4
axiom/nr_txn.h:2
axiom/nr_distributed_trace.c:1
axiom/util_time.h:1
axiom/nr_segment_external.c:2
axiom/nr_app.c:1
axiom/nr_segment.c:4
axiom/nr_segment_datastore.c:2
axiom/nr_txn.c:3
axiom/nr_agent.c:1
axiom/tests/test_segment.c:1
axiom/tests/test_distributed_trace.c:3
axiom/tests/test_segment_tree.c:2
axiom/tests/test_time.c:3
axiom/tests/test_txn.c:3
agent/php_explain.c:2
agent/php_execute.c:2
agent/php_explain_mysqli.c:2
agent/lib_predis.c:1
agent/php_api_internal.c:1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nr_time_duration
is fine and well used in most cases. Parameters still have the same meaning: nr_time_duration(start, stop)
will return stop - start
if start < stop
(else 0
).
But I'm pretty sure that for this call, parameters are in the wrong order. This is certainly because it seems right to use txn_start
as the start
parameter of nr_time_duration
, but it cannot! Here, we are computing the transport duration. Transport finishes when transaction starts. Chronologically, dt->inbound.timestamp
is before txn_start
, because it is the time where the transport started (timestamp included in HTTP headers).
So, to compute the right duration, we need to compute txn_start - dt->inbound.timestamp
, i.e. nr_time_duration(dt->inbound.timestamp, txn_start)
. I didn't noticed any similar misuse in other calls.
@zsistla is this clearer now? 😄
* fix(distributed tracing): time inversion for duration calculation * fix(distributed tracing): incorrect w3c timestamp parsing
--- subject: PHP agent releaseDate: '2021-08-23' version: 9.18.1.303 downloadLink: 'https://download.newrelic.com/php_agent/archive/9.18.1.303' --- ## New Relic PHP Agent v9.18.1 ## ### New Features ### * [Added](newrelic/newrelic-php-agent#162) a docker development environment. It's now possible for contributors to both develop and test (unit tests and integration tests) without setting up a specific environment on their own system. Please see our [documentation](https://github.com/newrelic/newrelic-php-agent/blob/main/docs/dev_environment.md) for more information. * [Route caching in `Laravel 7.x` is now supported!](newrelic/newrelic-php-agent#174). Transaction naming now works with routes cached via `php artisan route:cache`. @stockalexander, thanks for your contribution! * `Redis::mget` and `Redis::mset` [functions are now supported](newrelic/newrelic-php-agent#156). @b-viguier, thanks for your contribution! ### Bug Fixes ### * [Fixed](newrelic/newrelic-php-agent#161) instances where a memory leak was occurring with our `curl multi` instrumentation. * [Fixed](newrelic/newrelic-php-agent#176) an issue where a supportability metric used to track an edge case was causing a segfault. * [Fixed](newrelic/newrelic-php-agent#87) an issue where PHP versions with an unknown API version were incorrectly handled during Debian package install. * [Fixed](newrelic/newrelic-php-agent#164) instances where `parent.transportDuration` values are `0` for transactions between two PHP applications instrumented through distributed tracing. @b-viguier, thanks for your contribution! * [Fixed](newrelic/newrelic-php-agent#158) an issue where the `newrelic.ini` configuration file was incorrectly installed. @b-viguier, thanks for your contribution! ### Support Statement ### * New Relic recommends that you upgrade the agent regularly and at a minimum every 3 months. As of this release, the oldest supported version is [8.6.0](/docs/release-notes/agent-release-notes/php-release-notes/php-agent-860238/).
--- subject: PHP agent releaseDate: '2021-08-23' version: 9.18.1.303 downloadLink: 'https://download.newrelic.com/php_agent/archive/9.18.1.303' --- ## New Relic PHP Agent v9.18.1 ## ### New Features ### * [Added](newrelic/newrelic-php-agent#162) a docker development environment. It's now possible for contributors to both develop and test (unit tests and integration tests) without setting up a specific environment on their own system. Please see our [documentation](https://github.com/newrelic/newrelic-php-agent/blob/main/docs/dev_environment.md) for more information. * [Route caching in `Laravel 7.x` is now supported!](newrelic/newrelic-php-agent#174). Transaction naming now works with routes cached via `php artisan route:cache`. @stockalexander, thanks for your contribution! * `Redis::mget` and `Redis::mset` [functions are now supported](newrelic/newrelic-php-agent#156). @b-viguier, thanks for your contribution! ### Bug Fixes ### * [Fixed](newrelic/newrelic-php-agent#161) instances where a memory leak was occurring with our `curl multi` instrumentation. * [Fixed](newrelic/newrelic-php-agent#176) an issue where a supportability metric used to track an edge case was causing a segfault. * [Fixed](newrelic/newrelic-php-agent#87) an issue where PHP versions with an unknown API version were incorrectly handled during Debian package install. * [Fixed](newrelic/newrelic-php-agent#164) instances where `parent.transportDuration` values are `0` for transactions between two PHP applications instrumented through distributed tracing. @b-viguier, thanks for your contribution! * [Fixed](newrelic/newrelic-php-agent#158) an issue where the `newrelic.ini` configuration file was incorrectly installed. @b-viguier, thanks for your contribution! ### Support Statement ### * New Relic recommends that you upgrade the agent regularly and at a minimum every 3 months. As of this release, the oldest supported version is [8.6.0](/docs/release-notes/agent-release-notes/php-release-notes/php-agent-860238/).
Context: I'm facing a strange issue in my NewRelic data, all
parent.transportDuration
values are0
for transactions between 2 Php applications instrumented by NewRelic. I opened a ticket and they suggest that servers are not synchronized (but I know they are 😉 ). So I dig into the code…Explanations
The
nr_time_duration
function compute the time elapsed between astart
timestamp and astop
timestamp. Ifstart > stop
, it returns0
.The
nr_distributed_trace_inbound_get_timestamp_delta
function intends to return the transport duration, i.e. the time elapsed during the timestamp stored in distributed trace header and the start time of the current transaction, by callingnr_time_duration
.Hypothesis: parameters are passed in the wrong order, then all transport duration are reported to
0
😨Example: When
parent.transportDuration
is set.if(nr_distributed_trace_inbound_get_timestamp_delta(dt, 0))
(to check if distributed trace has a non-zero timestamp as far as I understand), it seems to work because it relies on this wrong behavior. But this function cannot work in both case, in pseudo-code we cannot haveduration(dt, 0) > 0
ANDduration(dt, now) > 0
if0 < dt < now
.ℹ️ Tests use the same function to compute the expected value, then a wrong value always equals the same wrong value.
What I did
nr_time_duration
call insidenr_distributed_trace_inbound_get_timestamp_delta
0
(a wrong value in this case). We could also hard-code the expected value, to be sure to not rely on internal implementation of the tested function.nr_distributed_trace_inbound_get_timestamp_delta(dt, 0)
calls to a new explicit functionnr_distributed_trace_inbound_has_timestamp
(I think it's the original intent). I created some tests, but I don't understand thedefault value
case ofnr_distributed_trace_inbound_get_timestamp_delta
function… IMHO there is some trouble here, since it returns0
because the transaction have a0
start time 🤔nr_distributed_trace_convert_w3c_headers_tracestate
, the timestamp string value was converted to an integer, causing a capacity overflow onx86
arch, I changed to convert tolong long unsigned
And now…
I made this PR to run tests on CI, and to share my thoughts with you. What do you think? Am I totally wrong or is there something to investigate? Did I misunderstood something?
Thanks for your attention, I'm looking forward for your feedback 🙇♂️