[ADDED] Distributed Message Tracing #5014

kozlovic · 2024-01-29T21:39:27Z

When an incoming message contains the header Nats-Trace-Dest set to a valid subject, the server will send there messages representing events happening in each server where the message is processed.

The events will give details about ingress, subject mapping, stream export and service imports, jetstream and egress to subscriptions and/or routes, leafnodes and gateways. Except for a standalone server it is expected to receive multiple trace messages for a given inbound message.

The header Nats-Trace-Only, if set to true, will produce the same tracing events than without that header, except that the message will not be delivered to subscriptions or inserted in the JetStream stream. This is usefull to see the path that a message would take but without affecting running applications.

Note that in a mixed environment where not all servers would be running at 2.11+, when a message is supposed to be traced only and the remote does not understand message tracing, the message will not be forwared and the Egress event for that remote will indicate the reason.

Signed-off-by: Ivan Kozlovic ivan@synadia.com

ripienaar · 2024-01-30T12:42:01Z

@kozlovic When tracing over an account import/export is it expected that I wont get trace messages once the message cross the account boundary?

Client on account "one" in london -> imports weather.> from account weather -> client on account weather connected across a gateway

In this case I get only 1 trace event with

  "events": [
    {
      "type": "in",
      "kind": 0,
      "cid": 859,
      "name": "NATS CLI Version development",
      "acc": "one",
      "subj": "weather.lon"
    },
    {
      "type": "si",
      "acc": "weather",
      "from": "weather.lon",
      "to": "weather.lon"
    }
  ]

In the same account I get events from otherside of the gateway also

I guess it could be a security concern to trace it through but otoh when debugging would be great to have

kozlovic · 2024-01-30T14:09:20Z

@ripienaar The decision was to go through service import only if "share: true" is specified in the service import. This was for the reason you just mentioned: security concern. If you have sharing enabled in the service import, then there may be an issue. I have test for service import with sharing/not sharing across super cluster and it passes, but there may still be an issue.

ripienaar · 2024-01-30T14:10:37Z

@ripienaar The decision was to go through service import only if "share: true" is specified in the service import. This was for the reason you just mentioned: security concern. If you have sharing enabled in the service import, then there may be an issue. I have test for service import with sharing/not sharing across super cluster and it passes, but there may still be an issue.

I didnt have share on, will try

kozlovic · 2024-01-30T16:32:09Z

@ripienaar I am thinking of adding something to the MsgTraceEgress structure to be able to "link" MsgTraceEvent's from a network topology together. Something like this:

diff --git a/server/msgtrace.go b/server/msgtrace.go
index 88c35d5f..601cf16d 100644
--- a/server/msgtrace.go
+++ b/server/msgtrace.go
@@ -110,6 +110,11 @@ type MsgTraceEgress struct {
        Subscription string `json:"sub,omitempty"`
        Queue        string `json:"queue,omitempty"`
        Error        string `json:"error,omitempty"`
+
+       // This is for applications that unmarshal the trace events
+       // and want to link an egress to route/leaf/gateway with
+       // the MsgTraceEvent from that server.
+       Link *MsgTraceEvent `json:"-"`
 }
 
 // -------------------------------------------------------------

The idea is that the library that assembles message trace events would then return only one event (the one that had a CLIENT as an Ingress) and then by going through the Events list, when getting an Egress that is not a CLIENT kind, there would be the Link field that would not be nil and you could descend the graph. If the Link is nil, it would be an indication that this trace from that server (as given by the Egress's info) is missing.

Any thoughts? Would you think that it should be done differently?

ripienaar · 2024-01-30T16:38:38Z

Any thoughts? Would you think that it should be done differently?

Yeah anything you can do to make assembling the flows easier would help (been trying to hack up a sorter/linker for them and its difficult!)

kozlovic · 2024-01-30T16:40:02Z

@ripienaar I will have the library repo ready today.

kozlovic · 2024-01-30T17:46:25Z

@ripienaar I am coming to the realization that I don't think we need the "Hop" because the Ingress/Egress has already the name of the server the message comes from/sent to, so that should be enough to link them. As for the Hops (the count at the MsgTraceEvent level), again, it is nice, but it is basically the number of Egress that are not a CLIENT kind, so again, not strictly necessary.

ripienaar · 2024-01-30T17:53:03Z

@ripienaar I am coming to the realization that I don't think we need the "Hop" because the Ingress/Egress has already the name of the server the message comes from/sent to, so that should be enough to link them. As for the Hops (the count at the MsgTraceEvent level), again, it is nice, but it is basically the number of Egress that are not a CLIENT kind, so again, not strictly necessary.

Yeah sounds reasonable :)

Kind of this is why I wanted a kind of draft client to assemble them, we'll discover a lot in doing that

MauriceVanVeen · 2024-01-30T18:40:15Z

Really cool! Have been trying it out for a bit. 🤩

Not sure if there's also an intention to add tracing between stream => consumer as well as mirroring/sourcing?
Am thinking of a use case where someone publishes into stream 1, a consumer consumes from that stream and publishes 1 or more events into stream 2.
Would that also become available with this tracing, or would that be an extension to this?

Thinking along the lines of a full graph, from the initial message all the way down to how it was processed. (From an OpenTelemetry perspective that could be one 'trace' consisting of multiple 'spans')

kozlovic · 2024-01-31T03:12:33Z

Would that also become available with this tracing, or would that be an extension to this?

@MauriceVanVeen No, currently the only JetStream related tracing is to know if a message would have been inserted to a stream, with which subject, and if not what caused the message to not be inserted (say presence of a header that causes server checks to reject it).

hwinkel · 2024-01-31T05:48:34Z

How are the traces exported? Towards "standard" OTEL collectors like jaeger, tempo etc. ? Does the export happens on a per serves bases or centrally like surveyor does ?

ripienaar · 2024-01-31T09:01:56Z

How are the traces exported? Towards "standard" OTEL collectors like jaeger, tempo etc. ? Does the export happens on a per serves bases or centrally like surveyor does ?

Now the messages are published as JSON (see msgtrace.go for the types) to keep the server light weight, I imagine you'd run a collector process next to your nats servers and the purpose of that would be to receive these, assemble them and send them to otel etc

For now otel/jaeger etc headers do not trigger a trace (though our service latency tracing does support this see https://github.com/nats-io/nats-architecture-and-design/blob/main/adr/ADR-3.md)

The focus of this is primarily debugging though and not, for example, tracing 100% of message routing

bruth · 2024-01-31T14:47:29Z

@hwinkel You can think of this as the lower layer of the output. It is published to the designated trace subject which then the subscribed client would do the work to map/export to otel (or anything else). That is the next bit of work to write the exporter.

bruth · 2024-01-31T14:48:37Z

And second what @ripienaar said. The scope was initially for trace debugging, albeit, depending on throughput/load of a system, you could opt to trace whatever you want.

hwinkel · 2024-01-31T15:49:54Z

@hwinkel You can think of this as the lower layer of the output. It is published to the designated trace subject which then the subscribed client would do the work to map/export to otel (or anything else). That is the next bit of work to write the exporter.

I like to have it on a lower, NATS native layer, hence publish to a subject. The OTEL "Formatter and Exporter" could be modelled in the same way as Surveyor already does for the metrics. IMHO a much cleaner NATS native architecture which is location independent for Trace Data.

ripienaar · 2024-01-31T21:32:56Z

For community who are following these are some renders of traces in various cases

https://gist.github.com/ripienaar/78666080a4296bc5a0e317d1c9c08df8

The cluster is a 3 cluster super cluster each with 3 nodes and a single leafnode hanging off the lon cluster.

kozlovic · 2024-01-31T22:49:18Z

@ripienaar Just a little comment, I think you should use %q for the strings so they are enclosed with double quotes, it would help for the reading.

davedotdev · 2024-02-01T15:00:44Z

I spent some (not negligible) time on this in the networking space working for a large vendor. Multiple approaches were taken, some out-of-band and some in-band, but the outcomes were the same; the desire to have tracing within an OAM domain, for targeted packets or a sample set. A networked system has some configuration to make it part of an OAM domain and each system in that domain must be able to understand the requirements of being part of a tracing capable system, or pass it on without modification exactly like you've done here with the header pass-thru.

What I wanted to raise for debate, was the ability to turn have tracing domains for NATS servers, so that one or more server or group of servers could ignore the header. For instance, this could be useful in secure environments, where some groups can trace and others cannot, or in situations where visibility needs to be reduced. Initial suggestion would be to have a server tag like "OAMDomain": "[X, Y, Z]". Is it something that's been thought about? Would any of the commenters here find it useful? I can see a requirement for it and if it changes any design decisions being made, it's worth raising for future considerations.

A good use-case would be hub and spoke, where the spokes are under your control, but the hubs are not. Traffic would traverse the hub in either case, but the traces would show the hub ingress and egress points and the resulting trace would show the delay, but not reveal anything about the hub.

Nice work Ivan - this really is great (and I'm not griping - just thinking about wider use cases!!).

ripienaar · 2024-02-01T19:16:18Z

A good use-case would be hub and spoke, where the spokes are under your control, but the hubs are not. Traffic would traverse the hub in either case, but the traces would show the hub ingress and egress points and the resulting trace would show the delay, but not reveal anything about the hub.

for now our only control here is when crossing account barriers if you dont set sharing true you wont get traces from the other side. One can imagine spoke/leafnode on an account and access to it over exports/imports would control visibility

I dont want to over complicate things from the get-go so but this is a good point for future iterations. I imagine we'd want to eventually support disabling tracing on an account level for sensitive cases. Much can be learned from seeing which mappings apply etc

ripienaar · 2024-02-01T19:59:31Z

For me I am happy with where this is to merge it. We might iterate and once it’s in a bit look at how it would be to trigger it from headers from other systems like we support for service traces

for now in the interest of getting it in peoples hands via nightly builds this works well.

To be clear I didn’t code review I looked at functionality only

bruth · 2024-02-02T03:11:12Z

I agree with getting it merged, knowing it may need some iteration. What are your thoughts @Jarema?

Once merged, I want to tag a preview release with this feature so people can access a build to try it out.

derekcollison

In general LGTM - Some comments and questions.

server/client.go

derekcollison · 2024-02-03T19:13:11Z

server/client.go

+	if mt != nil {
+		mt.addServiceImportEvent(siAcc.GetName(), string(pacopy.subject), to)
+		// If we are not sharing and doing trace only, we stop at this level.
+		if !share {


As noted I messed this up, let's switch to allow_trace for service export and stream import.

Will do in a separate commit.

server/client.go

server/events.go

server/jetstream_cluster.go

server/msgtrace.go

server/stream.go

Jarema · 2024-02-05T11:05:19Z

Yes, agree with merging and doing a preview release @bruth .
This will allow us to get more community feedback which could be invaluable for feature like this one.
The only concern is conflicts when cherry picking for patch releases, but that is unavoidable, as anyway this branch would need constant rebases and updates which is similar cost.
Let's merge and iterate.

ripienaar · 2024-02-05T11:18:31Z

We're working on a fix about import sharing, will look to merge after

When an incoming message contains the header `Nats-Trace-Dest` set to a valid subject, the server will send there messages representing events happening in each server where the message is processed. The events will give details about ingress, subject mapping, stream export and service imports, jetstream and egress to subscriptions and/or routes, leafnodes and gateways. Except for a standalone server it is expected to receive multiple trace messages for a given inbound message. The header `Nats-Trace-Only`, if set to `true`, will produce the same tracing events than without that header, except that the message will not be delivered to subscriptions or inserted in the JetStream stream. This is usefull to see the path that a message would take but without affecting running applications. Note that in a mixed environment where not all servers would be running at 2.11+, when a message is supposed to be traced only and the remote does not understand message tracing, the message will not be forwared and the Egress event for that remote will indicate the reason. Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

kozlovic · 2024-02-06T17:34:49Z

I'll just add a log line and sleep :)

Not ideal, but this is a more general issue of interest propagation with gateways, unfortunately.

ripienaar · 2024-02-06T17:42:48Z

thanks @kozlovic

ripienaar · 2024-02-06T17:43:05Z

I am good to merge this @derekcollison @kozlovic @bruth

derekcollison · 2024-02-06T23:36:04Z

ok so ready for final review?

Will try to get to this tomorrow.

kozlovic · 2024-02-07T15:52:30Z

@derekcollison Actually, could you let me check if I could replace addition of boolean in server's INFO/CONNECT in order to know if the remote supports message tracing, with the use of the Proto field. I will update the PR with a note when I am done (either with a new commit or saying that I am not changing and it is ok to review).

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

kozlovic · 2024-02-07T19:52:37Z

@derekcollison Ok, I have made the changes. So you are good to go to review the last 2 commits, which were added since you have last reviewed. Thanks!

derekcollison

LGTM - two minor comments.

derekcollison · 2024-02-07T23:06:08Z

server/accounts.go

@@ -2478,7 +2507,11 @@ func (a *Account) AddMappedStreamImportWithClaim(account *Account, from, to stri
 		a.mu.Unlock()
 		return ErrStreamImportDuplicate
 	}
-	a.imports.streams = append(a.imports.streams, &streamImport{account, from, to, tr, nil, imClaim, usePub, false})
+	// TODO(ik): When AllowTrace is added to JWT, uncomment those lines:
+	// if imClaim != nil {


Maybe coordinate with @aricart to get it added.

There would be also some code to add in updateAccountClaimsWithRefresh() to possibly call the function to set the service export allow trace property. And I would like to add a test with JWT of course. I would prefer to wait and add the JWT support before merging, but that would depend on how long it would take to get the JWT library updated. It can of course all be added after the merge. Your call Derek.
@aricart Would you mind adding a AllowTrace boolean property to a ServiceExport and StreamImport? If you are using common structures, validation should make it an error if this property is set on a ServiceImport or a StreamExport (the opposite of where it is valid).

server/accounts.go

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

When an incoming message contains the header `Nats-Trace-Dest` set to a valid subject, the server will send there messages representing events happening in each server where the message is processed. The events will give details about ingress, subject mapping, stream export and service imports, jetstream and egress to subscriptions and/or routes, leafnodes and gateways. Except for a standalone server it is expected to receive multiple trace messages for a given inbound message. The header `Nats-Trace-Only`, if set to `true`, will produce the same tracing events than without that header, except that the message will not be delivered to subscriptions or inserted in the JetStream stream. This is usefull to see the path that a message would take but without affecting running applications. Note that in a mixed environment where not all servers would be running at 2.11+, when a message is supposed to be traced only and the remote does not understand message tracing, the message will not be forwared and the Egress event for that remote will indicate the reason. Signed-off-by: Ivan Kozlovic <ivan@synadia.com> --------- Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

If the `Nats-Trace-Dest` header is not present, but `Traceparent` is and its last token is `01`, then message tracing is triggered. This also requires that the account be defined with a `trace_dest` subject so that traces can be sent there. Note that `Nats-Trace-Only` is not applicable for `Traceparent`. Addition to PR #5014 Resolves #5052 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

Related to #5014 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

If an account has a trace destination and an incoming message has the `traceparent` header with the proper sampled flag, a trace message would be triggered. The `sampling` field allows to trace a certain percentage of the traffic. The field `trace_dest` or now `msg_trace` can be a simple string representing the destination, and in this case sampling is 100% or it can be a structure with the `dest` and `sampling` fields. Sampling values that are negative or above 100 will trigger an error on configuration parsing. A value of 0 is converted to 100. If the sampling is specified without an account trace destination, it is set to 0 and a warning is issued when parsing configuration. There is similar support for the property set in a JWT account claim. Relates to #5014 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

MikaelElkiaer · 2024-04-30T06:26:47Z

After looking through this PR and related ones, it is still unclear to me whether spans can be exported.

Based on this reply I guess it is not yet possible to export OTEL spans to a service such as OTEL Collector or Grafana Tempo.
If not, is this in the pipeline? I would love to export spans to OTEL Collector to link NATS spans to my application spans in case of erroneous requests.

ripienaar · 2024-04-30T06:43:04Z

We do not export spans exactly - we have no otel dependencies etc - but a receiver for these messages can be written that turns them into spans and sends them to a collector.

We have some early reviewers who have written such receivers and reported success with that approach

MikaelElkiaer · 2024-05-01T05:53:29Z

We do not export spans exactly - we have no otel dependencies etc - but a receiver for these messages can be written that turns them into spans and sends them to a collector.

We have some early reviewers who have written such receivers and reported success with that approach

Thanks for clarifying.
I had assumed it was something written to stdout at first, but it makes sense that it is a message subject.
Does it need to be enabled? Is there any docs on this?

ripienaar · 2024-05-01T06:06:17Z

It is available in the preview releases and main of the CLI

see https://github.com/nats-io/nats-architecture-and-design/blob/main/adr/ADR-41.md

kozlovic requested a review from a team as a code owner January 29, 2024 21:39

kozlovic requested review from ripienaar and derekcollison January 29, 2024 21:39

derekcollison requested review from neilalexander and Jarema January 30, 2024 00:38

derekcollison reviewed Feb 3, 2024

View reviewed changes

Use protocol version for server connection msg tracing capability

33a5155

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

derekcollison self-requested a review February 7, 2024 23:03

derekcollison approved these changes Feb 7, 2024

View reviewed changes

Update copyright

b2778aa

Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

derekcollison merged commit cca23cd into main Feb 8, 2024
4 checks passed

derekcollison deleted the msg_tracing branch February 8, 2024 18:59

ripienaar mentioned this pull request Feb 9, 2024

Otel activation for message traces #5052

Closed

kozlovic mentioned this pull request Feb 11, 2024

Added OpenTelemetry support #5057

Merged

kozlovic added a commit that referenced this pull request Feb 14, 2024

Made updates to message tracing code to support JWT updates

3608bcf

Related to #5014 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

kozlovic added a commit that referenced this pull request Feb 15, 2024

Made updates to message tracing code to support JWT updates

b542edc

Related to #5014 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

kozlovic mentioned this pull request Feb 15, 2024

Made updates to message tracing code to support JWT updates #5093

Merged

kozlovic added a commit that referenced this pull request Feb 15, 2024

Made updates to message tracing code to support JWT updates

d858f62

Related to #5014 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

derekcollison added a commit that referenced this pull request Feb 15, 2024

Made updates to message tracing code to support JWT updates (#5093)

9cf42f6

Related to #5014 Signed-off-by: Ivan Kozlovic <ivan@synadia.com>

kozlovic mentioned this pull request Feb 22, 2024

Added sampling for account's trace destination #5121

Merged

[ADDED] Distributed Message Tracing #5014

[ADDED] Distributed Message Tracing #5014

Conversation

kozlovic commented Jan 29, 2024

ripienaar commented Jan 30, 2024

kozlovic commented Jan 30, 2024

ripienaar commented Jan 30, 2024

kozlovic commented Jan 30, 2024

ripienaar commented Jan 30, 2024

kozlovic commented Jan 30, 2024

kozlovic commented Jan 30, 2024

ripienaar commented Jan 30, 2024

MauriceVanVeen commented Jan 30, 2024

kozlovic commented Jan 31, 2024

hwinkel commented Jan 31, 2024

ripienaar commented Jan 31, 2024

bruth commented Jan 31, 2024

bruth commented Jan 31, 2024

hwinkel commented Jan 31, 2024

ripienaar commented Jan 31, 2024

kozlovic commented Jan 31, 2024

davedotdev commented Feb 1, 2024

ripienaar commented Feb 1, 2024

ripienaar commented Feb 1, 2024

bruth commented Feb 2, 2024

derekcollison left a comment

Choose a reason for hiding this comment

derekcollison Feb 3, 2024

Choose a reason for hiding this comment

kozlovic Feb 5, 2024

Choose a reason for hiding this comment

Jarema commented Feb 5, 2024

ripienaar commented Feb 5, 2024

kozlovic commented Feb 6, 2024

ripienaar commented Feb 6, 2024

ripienaar commented Feb 6, 2024

derekcollison commented Feb 6, 2024

kozlovic commented Feb 7, 2024

kozlovic commented Feb 7, 2024

derekcollison left a comment

Choose a reason for hiding this comment

derekcollison Feb 7, 2024

Choose a reason for hiding this comment

kozlovic Feb 8, 2024

Choose a reason for hiding this comment

MikaelElkiaer commented Apr 30, 2024

ripienaar commented Apr 30, 2024

MikaelElkiaer commented May 1, 2024

ripienaar commented May 1, 2024