How to egress directly to a Zipkin/Jaeger instance? #39

Open
jkaye2012 opened this issue Nov 15, 2020 · 10 comments

Comments

@jkaye2012

Hello,

I've been evaluating the possibility of using this library. Things seem to work when manually sending the log file to a Jaeger instance, but is it possible to configure the library to automatically stream the event log to a remote Jaeger instance? I think this would be required for any kind of serious use, and should probably be documented. I do see the ZipkinExporter in opentelemetry-extra, but it's unclear to me how one would use that.

If this is possible and we are able to get it working, I'd be happy to open a PR to document the necessary steps.

Thanks,
Jordan

@ethercrow
Owner

Hi Jordan!

I agree that this mode of operation is very important to production use. I did some experiments and was able to get it working with some caveats.

The operating principle of this library is: the instrumented application writes to the eventlog; another application reads the eventlog, does some processing, and uploads the tracing data wherever you need it.

An interesting point here is that the GHC runtime can write the eventlog into a pipe instead of a file. This way the restreaming application can start reading from that pipe immediately, while the instrumented application is still running. So in a typical enterprise setting you would have a docker image where both processes run simultaneously, one producing the eventlog and one consuming it and sending data to Zipkin, for example.
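
A minimal sketch of the consuming side, under some assumptions not stated above: the pipe is a named pipe created beforehand (e.g. with `mkfifo`), the instrumented application is started with something like `+RTS -l -ol<path-to-pipe>`, and the restreamer receives the pipe on its standard input (`restreamer < <path-to-pipe>`). The names `streamChunks` and `restreamStdin` are hypothetical, and decoding the bytes into events (for instance with an incremental decoder such as the one in `ghc-events`) is left out.

```haskell
import qualified Data.ByteString as BS
import System.IO

-- | Read raw eventlog bytes from a handle in chunks and pass each chunk
-- to a handler, stopping when the writer closes its end (EOF).
streamChunks :: Handle -> (BS.ByteString -> IO ()) -> IO ()
streamChunks h handleChunk = loop
  where
    loop = do
      -- hGetSome blocks until some data is available and returns an
      -- empty string only at end of input.
      chunk <- BS.hGetSome h 4096
      if BS.null chunk
        then pure ()
        else handleChunk chunk >> loop

-- | Hypothetical entry point: consume the eventlog from stdin, so the
-- shell takes care of opening the named pipe.
restreamStdin :: (BS.ByteString -> IO ()) -> IO ()
restreamStdin handleChunk = do
  hSetBinaryMode stdin True
  streamChunks stdin handleChunk
```

Reading from stdin rather than opening the pipe path inside the program sidesteps questions about how the pipe is opened when no writer is attached yet; that is a design choice of this sketch, not something the library prescribes.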

Now to the caveats:

  1. The GHC runtime is not very eager to flush the eventlog, so there is no guarantee on the delay between "event occurred" and "Zipkin was notified about the event".

  2. The GHC runtime has an API for flushing the eventlog explicitly, which I wanted to use as a workaround for caveat 1, but it doesn't work. There is an MR by the great @bgamari to fix it here: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/3073

  3. When the instrumented application is using several cores (via -N or -N<explicit-number-of-cores-greater-than-1>), the eventlog portions from different cores are flushed independently and possibly at different points in time. As a result, the restreaming application can observe events out of order: e.g. end span foo can come before begin span foo, because the work corresponding to span foo was started on one core and finished on another, but the second one flushed its eventlog first. This caveat can be mitigated by running the instrumented application with +RTS -N1.
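
To illustrate caveat 3, here is a rough sketch of how a consumer could tolerate an end event arriving before its begin event. The `SpanEvent` and `Span` types and the `pairSpans` function are hypothetical simplifications, not the library's actual eventlog format, and the sketch assumes each span has a unique id.

```haskell
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- Hypothetical, simplified events: a span id plus a timestamp.
data SpanEvent
  = BeginSpan String Word64
  | EndSpan String Word64
  deriving (Eq, Show)

data Span = Span
  { spanId :: String
  , startedAt :: Word64
  , endedAt :: Word64
  }
  deriving (Eq, Show)

-- The half of a span that has been seen so far.
data Pending = SeenBegin Word64 | SeenEnd Word64

-- | Consume one event; emit a completed span once both halves have been
-- seen, regardless of which half arrived first.
step :: Map.Map String Pending -> SpanEvent -> (Map.Map String Pending, Maybe Span)
step pending ev =
  case (ev, Map.lookup sid pending) of
    (BeginSpan _ t0, Just (SeenEnd t1)) -> (Map.delete sid pending, Just (Span sid t0 t1))
    (EndSpan _ t1, Just (SeenBegin t0)) -> (Map.delete sid pending, Just (Span sid t0 t1))
    (BeginSpan _ t0, _) -> (Map.insert sid (SeenBegin t0) pending, Nothing)
    (EndSpan _ t1, _) -> (Map.insert sid (SeenEnd t1) pending, Nothing)
  where
    sid = case ev of
      BeginSpan s _ -> s
      EndSpan s _ -> s

-- | Pair begin/end events from a possibly out-of-order stream.
-- Spans with only one half seen are not emitted.
pairSpans :: [SpanEvent] -> [Span]
pairSpans = go Map.empty
  where
    go _ [] = []
    go pending (e : es) =
      let (pending', done) = step pending e
       in maybe id (:) done (go pending' es)
```

For example, `pairSpans [EndSpan "foo" 20, BeginSpan "foo" 10]` yields a single span for `foo` starting at 10 and ending at 20.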

So if these inconveniences are acceptable for you, please give it a try! If not, I'll probably wait a bit longer to see if GHC 9.0 fixes the flushing issue. If it does not, I'll pivot and add another mode of operation where the instrumented application sends trace data directly to a collector service. My previous library, lightstep-haskell, already works like that, but is only compatible with one collector service, namely Lightstep.

Please let me know what you think and whether waiting for GHC 9.0 would be a viable option or you are locked into some particular version of GHC for the foreseeable future.

@domenkozar

@ethercrow will GHC 9 also address caveat (3)?

@ethercrow
Owner

@domenkozar not directly, but if we gain the ability to flush the eventlog regularly then it would be feasible for the restreaming application to merge the eventlog portions on the fly, effectively having a sorted stream of events. Does this make sense?
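
A small sketch of the merging idea, assuming each core's eventlog portion is already sorted by timestamp on its own. The `Event` type and the names `merge2` and `mergeCaps` are hypothetical; a live restreamer would additionally need to know how far each core has flushed before emitting, which is where regular flushing comes in.

```haskell
import Data.Word (Word64)

-- Hypothetical, simplified event: timestamp, originating core, payload.
data Event = Event
  { evTime :: Word64
  , evCap :: Int
  , evMsg :: String
  }
  deriving (Eq, Show)

-- | Merge two timestamp-sorted streams into one sorted stream.
-- On equal timestamps the event from the left stream comes first.
merge2 :: [Event] -> [Event] -> [Event]
merge2 [] ys = ys
merge2 xs [] = xs
merge2 (x : xs) (y : ys)
  | evTime x <= evTime y = x : merge2 xs (y : ys)
  | otherwise = y : merge2 (x : xs) ys

-- | Merge the per-core streams (each assumed sorted) into one sorted stream.
mergeCaps :: [[Event]] -> [Event]
mergeCaps = foldr merge2 []
```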

@jkaye2012
Author

Sorry for the delayed response.

This does make sense. I think I'll probably wait to try things out until you're able to give this a shot with the newer GHC version.

One question I do have is about the docker setup that you mentioned - my understanding is that it's generally not recommended to run multiple processes within a single docker container (see the first paragraph here: https://docs.docker.com/config/containers/multi-service_container/). It feels like it would be better to somehow wire two (or more) containers together rather than run the "log reader" within the application's container. Thoughts on that?

Thanks,
Jordan

@ethercrow
Owner

In my experience having some auxiliary processes in a docker container has always been fine. I read this guidance as "avoid putting unrelated services into one container", not as a rule to only use one process.

@dustin

dustin commented Oct 17, 2021

Is the restreamer able to compensate for out-of-order events? I've had to reorder eventlog events to make sense of them before, so it's at least possible on a whole stream.

At the very least, the "end before start" thing should be somewhat straightforward, though it would still be confusing to see events attached to spans that have already ended.

@dustin

dustin commented Oct 19, 2021

An alternative would be writing code to process the events internally without having to coordinate processes. e.g.:

https://github.com/bgamari/ghc-eventlog-socket and https://github.com/mpickering/eventlog-live

A bit of FFI is required to get a handle on the log, but it would otherwise comfortably fit into the library as is by having the user spawn an event processor thread along with an output processor. It may seem slightly weird to process the events in something that's also creating events, but it should work just fine and be considerably easier to manage.

@shlevy

shlevy commented Aug 25, 2022

Are we using flushEventLog here now?
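
For context, a minimal sketch of what periodic flushing from inside the instrumented application could look like. It assumes `Debug.Trace.flushEventLog` is available, which to my knowledge requires a recent `base` (GHC 9.0 or later); `startPeriodicFlush` and the interval are hypothetical, and this is not a claim about what the library currently does.

```haskell
import Control.Concurrent (ThreadId, forkIO, threadDelay)
import Control.Monad (forever)
import Debug.Trace (flushEventLog)

-- | Fork a background thread that asks the RTS to flush the eventlog
-- every @intervalMicros@ microseconds. Returns the thread's id so the
-- caller can stop it later.
startPeriodicFlush :: Int -> IO ThreadId
startPeriodicFlush intervalMicros =
  forkIO . forever $ do
    threadDelay intervalMicros
    flushEventLog
```

Calling something like `startPeriodicFlush 1000000` early in `main` would request a flush roughly once per second, bounding the delay described in caveat 1 if the flush works as intended.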

@shlevy

shlevy commented Aug 25, 2022

@m1-s

m1-s commented Apr 12, 2023

What's the status here? Does anyone have a minimal example of how this works with GHC 9?
