Skip to content
This repository has been archived by the owner on Aug 30, 2022. It is now read-only.

Backport crossdock fixes #7

Closed

Conversation

isaachier
Copy link
Contributor

No description provided.

Signed-off-by: Isaac Hier <isaachier@gmail.com>
Signed-off-by: Isaac Hier <isaachier@gmail.com>
Signed-off-by: Isaac Hier <isaachier@gmail.com>
Signed-off-by: Isaac Hier <isaachier@gmail.com>
Signed-off-by: Isaac Hier <isaachier@gmail.com>
}

void getSamplingStrategy(SamplingStrategyResponse& result,
const std::string& serviceName) override
{
if (_serverAddr == net::IPAddress()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, not following. _serverAddr is set to net::IPAddress() if the client couldn't reach agent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. net::IPAddress() is the default value for an empty IP and the address is only set upon successful connection. The reason for this is to avoid contacting the server if we couldn't connect on construction. I guess the real solution is not to start the looping thread at all, but that is another class that ends up calling this periodically.

/build
/crossdock/crossdock
/crossdock/jaeger-docker-compose.yml

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this a temporary thing? don't think we should be ignoring this completely

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crossdock/crossdock is the executable binary itself. Actually changed this in my recent crossdock development. The jaeger-docker-compose.yml is a downloaded file to launch the other jaeger clients.

Isaac Hier added 2 commits November 14, 2017 15:31
Signed-off-by: Isaac Hier <ihier@uber.com>
Signed-off-by: Isaac Hier <ihier@uber.com>
@isaachier
Copy link
Contributor Author

I'm closing this in favor of real crossdock pull request

@isaachier isaachier closed this Nov 17, 2017
@isaachier isaachier deleted the backport-crossdock-fixes branch December 10, 2017 14:50
tudor added a commit to rockset/jaeger-client-cpp that referenced this pull request Sep 15, 2018
`Span::isFinished()` returns true iff the span's duration is zero.
So, if `steady_clock::now()` returns the same value at span creation as
at `Finish()` (unlikely, but possible, especially in virtualized
environments), the following scenario can happen:

- `Span::FinishWithOptions` sets `_duration` to zero
- and then calls `reportSpan`, which calls `RemoteReporter::report`
- which acquires the reporter mutex and
- adds the span to the reporter's `_queue`

Later:
- `RemoteReporter::sweepQueue` acquires the reporter mutex
- `RemoteReporter::sweepQueue` makes a copy of the span
(https://github.com/jaegertracing/jaeger-client-cpp/blob/7e9a135b362c47f7e4d42e0693b05b91a70e6888/src/jaegertracing/reporters/RemoteReporter.cpp#L95)
- The span is serialized in `RemoteReporter::sendSpan`
- The span (the copy made above) needs to be destroyed because the block in `sweepQueue` ends.
- The span's destructor is called, which calls `FinishWithOptions` again
- ... which sees that `_duration` is zero, so `isFinished()` returns
false:
https://github.com/jaegertracing/jaeger-client-cpp/blob/7e9a135b362c47f7e4d42e0693b05b91a70e6888/src/jaegertracing/Span.cpp#L94
- so it tries to call `reportSpan` again
- which tries to acquire the reporter's mutex
- but we're holding the lock in `sweepQueue`, so
- deadlock.

Relevant stack trace:
```
(gdb) bt
(gdb) bt
 #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 jaegertracing#1  0x00007fb09c722023 in __GI___pthread_mutex_lock (mutex=0x7fb088b1c1c0) at ../nptl/pthread_mutex_lock.c:78
 jaegertracing#2  0x00007fb08860987b in jaegertracing::reporters::RemoteReporter::report(jaegertracing::Span const&) () from /usr/local/lib/libjaegertracing_plugin.so
 jaegertracing#3  0x00007fb0885b6506 in jaegertracing::Span::FinishWithOptions(opentracing::v2::FinishSpanOptions const&) [clone .localalias.361] () from /usr/local/lib/libjaegertracing_plugin.so
 jaegertracing#4  0x00007fb08860849a in jaegertracing::Span::~Span() [clone .constprop.253] () from /usr/local/lib/libjaegertracing_plugin.so
 jaegertracing#5  0x00007fb08860a8fa in jaegertracing::reporters::RemoteReporter::sweepQueue() () from /usr/local/lib/libjaegertracing_plugin.so
 jaegertracing#6  0x00007fb0886f2f5f in execute_native_thread_routine () from /usr/local/lib/libjaegertracing_plugin.so
 jaegertracing#7  0x00007fb09c71f6db in start_thread (arg=0x7fb087574700) at pthread_create.c:463
 jaegertracing#8  0x00007fb0989ae88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Fixed by enforcing a minimum span duration of 1 `SteadyClock` tick.

Signed-off-by: Tudor Bosman <tudor@rockset.com>
isaachier pushed a commit that referenced this pull request Sep 18, 2018
* Fix deadlock if steady_clock::now() returns the same value twice

`Span::isFinished()` returns true iff the span's duration is zero.
So, if `steady_clock::now()` returns the same value at span creation as
at `Finish()` (unlikely, but possible, especially in virtualized
environments), the following scenario can happen:

- `Span::FinishWithOptions` sets `_duration` to zero
- and then calls `reportSpan`, which calls `RemoteReporter::report`
- which acquires the reporter mutex and
- adds the span to the reporter's `_queue`

Later:
- `RemoteReporter::sweepQueue` acquires the reporter mutex
- `RemoteReporter::sweepQueue` makes a copy of the span
(https://github.com/jaegertracing/jaeger-client-cpp/blob/7e9a135b362c47f7e4d42e0693b05b91a70e6888/src/jaegertracing/reporters/RemoteReporter.cpp#L95)
- The span is serialized in `RemoteReporter::sendSpan`
- The span (the copy made above) needs to be destroyed because the block in `sweepQueue` ends.
- The span's destructor is called, which calls `FinishWithOptions` again
- ... which sees that `_duration` is zero, so `isFinished()` returns
false:
https://github.com/jaegertracing/jaeger-client-cpp/blob/7e9a135b362c47f7e4d42e0693b05b91a70e6888/src/jaegertracing/Span.cpp#L94
- so it tries to call `reportSpan` again
- which tries to acquire the reporter's mutex
- but we're holding the lock in `sweepQueue`, so
- deadlock.

Relevant stack trace:
```
(gdb) bt
(gdb) bt
 #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
 #1  0x00007fb09c722023 in __GI___pthread_mutex_lock (mutex=0x7fb088b1c1c0) at ../nptl/pthread_mutex_lock.c:78
 #2  0x00007fb08860987b in jaegertracing::reporters::RemoteReporter::report(jaegertracing::Span const&) () from /usr/local/lib/libjaegertracing_plugin.so
 #3  0x00007fb0885b6506 in jaegertracing::Span::FinishWithOptions(opentracing::v2::FinishSpanOptions const&) [clone .localalias.361] () from /usr/local/lib/libjaegertracing_plugin.so
 #4  0x00007fb08860849a in jaegertracing::Span::~Span() [clone .constprop.253] () from /usr/local/lib/libjaegertracing_plugin.so
 #5  0x00007fb08860a8fa in jaegertracing::reporters::RemoteReporter::sweepQueue() () from /usr/local/lib/libjaegertracing_plugin.so
 #6  0x00007fb0886f2f5f in execute_native_thread_routine () from /usr/local/lib/libjaegertracing_plugin.so
 #7  0x00007fb09c71f6db in start_thread (arg=0x7fb087574700) at pthread_create.c:463
 #8  0x00007fb0989ae88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

Fixed by enforcing a minimum span duration of 1 `SteadyClock` tick.

Signed-off-by: Tudor Bosman <tudor@rockset.com>

* fix typo

Signed-off-by: Tudor Bosman <tudor@rockset.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants