
Signal Translation Proposal #2336

Closed
djaglowski opened this issue Jan 6, 2021 · 20 comments

Comments

@djaglowski
Member

djaglowski commented Jan 6, 2021

Is your feature request related to a problem? Please describe.
It is sometimes necessary to convert signals from one signal type to another. Signal translations may be direct one-to-one operations, or may include derivative signals such as counts, aggregations, summarizations, etc. A specific example can be seen in contrib #403.

The OpenTelemetry Collector does not currently have a standard mechanism to support such use cases.

Describe the solution you'd like

Introduce a new component type, tentatively called connectors, which would serve the following purposes:

  • Act as a formal point of translation from one signal type to another
  • Avoid introducing to the pipeline any additional fan-out points
  • Formalize the linking together of multiple pipelines

[diagram omitted]

Rationale

First, consider that exporters may be thought of as a natural point of connection between pipelines, particularly when it comes to translation from one signal type to another. After all, exporters are fundamentally concerned with translating OTLP formatted signals to any arbitrary output format. This output format could just as well be another OTLP signal type, which could then be sent to an OTLP receiver.

Next, consider the following example scenario, simplified from the suggestion here.

  • tracing data, processed to derive metrics and exported
  • tracing data then needs to be further processed before being exported

A minimal solution to this use case could depend on the addition of a new exporter, which would derive metrics from traces and emit them like the OTLP exporter. This solution would use the fanout stage of one pipeline to replicate traces, and then concurrently send traces to two additional pipelines.

receivers:
  some/tracingreceiver:
  otlp/tracingreceiver:
    endpoint: "0.0.0.0:1111"
  otlp/metricsreceiver:
    endpoint: "0.0.0.0:2222"
  
processors:
  some/tracingprocessor:
  some/metricsprocessor:
  
exporters:
  some/tracingexporter:
  some/metricsexporter:
  otlp/tracingexporter:
    endpoint: "0.0.0.0:1111"
  tracestometrics: # perform some derivation and then behave as otlp/metricsexporter
    endpoint: "0.0.0.0:2222"

service:
  pipelines:
    a:
      receivers: [some/tracingreceiver]
      exporters: [otlp/tracingexporter, tracestometrics]
    b:
      receivers: [some/tracingreceiver]
      processors: [some/tracingprocessor]
      exporters: [some/tracingexporter]
    c:
      receivers: [some/metricsreceiver]
      processors: [some/metricsprocessor]
      exporters: [some/metricsexporter]

[diagram omitted]

Problems with Loosely Connected Pipelines

The above describes a workable near-term strategy for approaching signal translation, but it leaves a few things to be desired:

  • Usability is somewhat poor
    • Users must reason about both the logical flow of signals and the implementation of linking pipelines
    • Users may have to allocate/manage ports for local signal transmission
  • If the collector wishes to enforce constraints upon pipeline dependencies, then it must first infer how pipelines are connected
    • if pipeline(“a”).hasExporterOnPort(1111) && pipeline(“b”).hasReceiverOnPort(1111) { ... }
  • Transmission of signals between pipelines should not necessarily require network protocols

All of the above issues can be resolved by formalizing the OTLP exporter -> OTLP receiver pattern into a new component type.

[diagram omitted]

Configuration

In the configuration file, connectors could be specified in a new section alongside receivers, processors, exporters, and extensions. Then, when including connectors in a pipeline, they would be used in place of one exporter and one receiver. Solving for the same example as above, the config could be defined as follows:

receivers:
  some/tracingreceiver:
  
processors:
  some/tracingprocessor:
  some/metricsprocessor:
  
exporters:
  some/tracingexporter:
  some/metricsexporter:

connectors: # must be used as both exporter and receiver
  tracestotraces:
  tracestometrics:

service:
  pipelines:
    a:
      receivers: [some/tracingreceiver]
      exporters: [tracestotraces, tracestometrics]
    b:
      receivers: [tracestotraces]
      processors: [some/tracingprocessor]
      exporters: [some/tracingexporter]
    c:
      receivers: [tracestometrics]
      processors: [some/metricsprocessor]
      exporters: [some/metricsexporter]

Additional Suggestions

Rules for Connectors

  • Connectors should implement
    • One of: TracesExporter, MetricsExporter, LogsExporter
    • And also one of: TracesReceiver, MetricsReceiver, LogsReceiver
  • A connector, when used in a pipeline, must be used as both an exporter and a receiver
  • Exporters may be used in the same pipeline where a connector is acting as an exporter
    • Shown in Pipeline A below
  • Receivers may be used in the same pipeline where a connector is acting as a receiver
    • Shown in Pipeline B below

[diagram omitted]
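To make the rules above concrete, here is a rough sketch of a traces-to-metrics connector. The interfaces and type names are simplified stand-ins invented for illustration, not the collector's actual APIs:

```go
package main

import "fmt"

// Illustrative stand-ins for the collector's consumer interfaces. In the
// real collector, the payloads would be pdata.Traces and pdata.Metrics.
type TracesConsumer interface {
	ConsumeTraces(td string) error
}

type MetricsConsumer interface {
	ConsumeMetrics(md string) error
}

// tracesToMetricsConnector is used as an exporter in a traces pipeline and
// as a receiver in a metrics pipeline: it consumes traces and forwards
// derived metrics to the next consumer.
type tracesToMetricsConnector struct {
	next MetricsConsumer // first consumer of the downstream metrics pipeline
}

func (c *tracesToMetricsConnector) ConsumeTraces(td string) error {
	md := "metrics derived from: " + td // placeholder for real derivation logic
	return c.next.ConsumeMetrics(md)
}

// sink records the last metrics batch it receives.
type sink struct{ last string }

func (s *sink) ConsumeMetrics(md string) error {
	s.last = md
	return nil
}

func main() {
	s := &sink{}
	conn := &tracesToMetricsConnector{next: s}
	if err := conn.ConsumeTraces("span batch"); err != nil {
		panic(err)
	}
	fmt.Println(s.last)
}
```

A connector of this shape satisfies both rules: it implements an exporter-side interface for one signal type and feeds a receiver-side consumer for another.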

Implementation

Initially, this design could rely on the OTLP exporter and OTLP receiver, with individual connectors needing to implement only an additional method that processes signals from one type to another. In the longer term, functionality could be refactored under the hood as necessary.

If it seems necessary to formalize the underlying translation types (traces-to-metrics, metrics-to-logs, traces-to-traces, etc), then some base functionality could be encapsulated in a connectorhelper package that resembles exporterhelper and receiverhelper. At a minimum, some enforcement of the input and output types may be useful.

The new connector type also makes it much easier to validate inter-pipeline linkages. This means that any rules that enforced a DAG (or tree) upon pipelines are much simpler to implement.
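As a sketch of how that validation could work, assuming inter-pipeline links are modeled as a map from each pipeline to the pipelines its connectors feed (an invented representation for illustration), Kahn's algorithm yields a start order and rejects cycles:

```go
package main

import (
	"errors"
	"fmt"
)

// topoSort sketches Kahn's algorithm over pipeline links: deps maps each
// pipeline to the pipelines its connectors feed. It returns a valid start
// order, or an error if the links form a cycle.
func topoSort(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}
	for n, targets := range deps {
		if _, ok := indegree[n]; !ok {
			indegree[n] = 0
		}
		for _, m := range targets {
			indegree[m]++
		}
	}
	var queue []string
	for n, d := range indegree {
		if d == 0 {
			queue = append(queue, n)
		}
	}
	var order []string
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		order = append(order, n)
		for _, m := range deps[n] {
			indegree[m]--
			if indegree[m] == 0 {
				queue = append(queue, m)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, errors.New("pipelines contain a cycle")
	}
	return order, nil // start in this order; stop in reverse
}

func main() {
	// pipeline a feeds pipelines b and c through connectors
	order, err := topoSort(map[string][]string{
		"pipeline/a": {"pipeline/b", "pipeline/c"},
		"pipeline/b": {},
		"pipeline/c": {},
	})
	fmt.Println(order, err)
}
```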

@bogdandrutu
Member

bogdandrutu commented Jan 20, 2021

I really like this proposal. I would like to think a couple more days about the name of the new component, but it sounds promising.

Some questions/thoughts also:

  1. I expect that after a connector the new pipeline also has a fan-out, correct?
  2. Are we going to provide standard connectors like tracetotrace? I am a bit confused about the tracetotrace use case; can you elaborate on that?
  3. The tracetometrics is actually the spanmetrics processor that was started in contrib, correct?
  4. We need to be careful when shutting down the pipelines because with the connectors we have a dependency between pipelines now.

@djaglowski
Member Author

djaglowski commented Jan 21, 2021

@bogdandrutu, I have to admit that since writing up this proposal, I have learned something about pipelines which I think undermines much of the utility of the proposed functionality. I do believe there may still be something to it, so I'll explain the misunderstanding, what I think it means for the original design, and try to address the points you've raised along the way.

What I misunderstood
My previous understanding was that fan-out only occurs immediately before exporters. I think this is actually correct when considering a single pipeline. However, I missed the fact that receivers can themselves fan-out to multiple pipelines. This isn't mentioned in the main design doc, but is found in source comments here.

What I think the misunderstanding takes away from the proposal
Making use of the receiver fan-out capability, I believe signal translation is better solved in processors, such as the spanmetrics processor. The use case shown in detail above can be solved with just a processor that converts the signal.

[diagram omitted]

What I think is left in the proposal
This proposal would still allow for the chaining together of pipelines, which would solve use cases such as the following:

  • process signals and then
    1. export at high fidelity
    2. summarize, and export at lower fidelity

[diagram omitted]

This functionality may be useful, but I'm not sure it directly relates to signal translation as much as it does to the general idea of pipeline chaining. With that in mind, this issue should perhaps be closed and I can open a new one pertaining specifically to that topic, if it appears to be worth considering further.

The final point you noted would still be a problem to solve, but I believe this could be handled by determining a topological ordering among all pipeline components, and then starting/stopping according to that order.

@bogdandrutu
Member

Making use of the receiver fan-out capability, I believe signal translation is better solved in processors, such as the spanmetrics processor. The use case shown in detail above can be solved with just a processor that converts the signal.

The image you added is impossible to create right now, because a pipeline cannot start with a signal and end with a different signal. The signal type for a pipeline is the same for receivers and exporters, and is defined in the name of the pipeline.

I think the connector is still useful, unless we remove that restriction and use the pipeline signal as just the input and determine the output dynamically, which may be possible but I think will not be trivial.

@djaglowski
Member Author

a pipeline cannot start with a signal and end with a different signal

Given this, I'll try to address your questions more directly.

I expect that after a connector the new pipeline also has a fan-out, correct?

I think this should be the default design, and that we can exclude the fan-out if there's a particular reason. My thinking here is that the component, as proposed, would be a receiver (as well as an exporter). Since receivers currently have a built-in fan-out, I would assume by default that this would be the case here as well.
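A minimal sketch of that default fan-out behavior, again using simplified stand-in interfaces invented for illustration rather than the collector's real ones:

```go
package main

import "fmt"

// Illustrative stand-in for the collector's trace consumer interface.
type TracesConsumer interface {
	ConsumeTraces(td string) error
}

// fanOut replicates each batch to every downstream consumer, mirroring the
// built-in receiver fan-out described above.
type fanOut struct {
	nexts []TracesConsumer
}

func (f *fanOut) ConsumeTraces(td string) error {
	for _, next := range f.nexts {
		if err := next.ConsumeTraces(td); err != nil {
			return err
		}
	}
	return nil
}

// counter counts how many batches it receives.
type counter struct{ n int }

func (c *counter) ConsumeTraces(td string) error {
	c.n++
	return nil
}

func main() {
	a, b := &counter{}, &counter{}
	f := &fanOut{nexts: []TracesConsumer{a, b}}
	if err := f.ConsumeTraces("batch-1"); err != nil {
		panic(err)
	}
	fmt.Println(a.n, b.n) // each downstream consumer received the batch once
}
```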

I am a bit confused about the tracetotrace use case; can you elaborate on that?

The general idea is to enable use cases where it is necessary to partially process a signal and then process it further in differing ways. For example, perhaps the following is intuitive:
[diagram omitted]

Are we going to provide standard connectors like tracetotrace?

If use cases such as the one immediately above are thought to be worth supporting, then I think the answer is yes.

To identify some standard connectors, we can consider the input -> output possibilities:

  1. A 1:N connector (1 trace -> 1 trace, 1 metric -> 1 metric, 1 log -> 1 log) would be functionally similar to the FanOutConnector. The purpose of this would be to support replication on demand.
  2. 1:1 signal convertors would convert a single signal to a different type. We'd want to consider each conversion case (1 trace -> 1 log, 1 log -> 1 trace, etc.) carefully and determine whether or not there is an agreeable "standard" conversion. If so, then a connector would likely be worth providing for that case.
  3. N:1 convertors would involve operations such as aggregations or summarizations, and could fill in gaps where 1:1 conversions may not make sense. For example, an N logs : 1 metric connector could summarize logs by emitting metric counts of observed log levels. We should consider proposals on a case-by-case basis, but I imagine there are some broadly useful scenarios for which we'd want to provide standard connectors.
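The log-level example from item 3 reduces, in spirit, to a small aggregation. The following sketch uses an invented logRecord type as a stand-in for the collector's log data model:

```go
package main

import "fmt"

// logRecord is a stand-in for the collector's log data model; only the
// fields needed for this sketch are included.
type logRecord struct {
	Level string
	Body  string
}

// countByLevel sketches an N-logs-to-1-metric summarization: it collapses a
// batch of log records into per-level counts that a connector could emit as
// metric data points.
func countByLevel(batch []logRecord) map[string]int {
	counts := map[string]int{}
	for _, r := range batch {
		counts[r.Level]++
	}
	return counts
}

func main() {
	batch := []logRecord{
		{Level: "error", Body: "disk full"},
		{Level: "info", Body: "started"},
		{Level: "error", Body: "timeout"},
	}
	fmt.Println(countByLevel(batch)) // map[error:2 info:1]
}
```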

The tracetometrics is actually the spanmetrics processor that was started in contrib, correct?

My specific intention was only to convey that the connector is any arbitrary implementation that converts from traces to metrics. This could be 1 trace : 1 metric or N traces : 1 metric.

We need to be careful when shutting down the pipelines because with the connectors we have a dependency between pipelines now.

I believe that we could process all pipeline components into a DAG, sort the components topologically, and then start/stop according to that ordering. This is the approach used by stanza, and it has worked quite well for us. Certainly, this would be a non-trivial change, but it seems as though it could be applied here.

@bogdandrutu
Member

Everything sounds good and aligned with my understanding. I think we should start doing this.

@gramidt
Member

gramidt commented Mar 23, 2021

@djaglowski - Are you still interested in working on this? I remember reading this when you first submitted it, but it wasn't until a few days ago that I needed it.

See:

https://cloud-native.slack.com/archives/C01N6P7KR6W/p1616472939047100

https://cloud-native.slack.com/archives/C01N6P7KR6W/p1616473342050300

I'll be creating a more naive implementation right away to solve my immediate use case (custom receiver or custom exporter loopback), but I would be interested in working with you on this proposal.

@bogdandrutu - Does this proposal still align with the collector? Are there new strategies that need to be considered?

@djaglowski
Member Author

@gramidt Unfortunately, I don't have enough capacity to take this on in the near future.

If you want to take it, I'm happy to be involved in review, design discussions, etc.

@gramidt
Member

gramidt commented Mar 23, 2021

Thank you for the prompt response, @djaglowski! No worries at all.

Based on our products, I feel this would be a much-needed enabler. I'm going to review this with some other teams to gauge additional interest in helping and to determine potential timelines for when resources would be available.

@bogdandrutu
Member

@gramidt any update on this?

@gramidt
Member

gramidt commented Sep 16, 2021

@bogdandrutu - Sadly, no updates. The primary focus is on open-telemetry/oteps#171 right now.

@gramidt
Member

gramidt commented Jul 11, 2022

Happy Monday, OTel Community!

Do any other companies have an interest in this proposal progressing?

@gfonseca-tc

I would love to see that feature in place! Mostly for connecting pipelines, rather than translating signals.

@gfonseca-tc

Maybe it would make sense to separate the features: the first being connecting pipelines of the same signal type, and the second translating between different signal types.

@djaglowski
Member Author

Yes, I think so. Although the translation use case is what motivated this proposal, it ultimately is a proposal for connectors. Actual signal translation would be implemented afterwards for specific translation use cases.

@gbbr
Member

gbbr commented Aug 16, 2022

Hello everyone and @djaglowski!

Thanks for putting together this obviously well thought out design. I am interested in the solution here because I am trying to solve a similar problem at Datadog relating very much to metrics computed on the basis of traces. The way that the collector is currently designed, it is possible for us to automatically detect the existing exporter using the experimental GetExporters method from the Host type, just like the spanmetricsprocessor does. Generally, this works smoothly given that the exporter is already part of a pipeline. When it's not, or a dedicated exporter is needed, then the dummy pipelines (and receivers) come in and the design fails, signalling the need for a better solution...

I have a few questions:

  1. Besides the pain point I've described in the previous paragraph, where dummy pipelines could potentially be needed, resulting in a hacky setup - what other problems would you be solving with this design? Would it be the fact that the new pipeline that you are connecting to would allow applying additional processors on top of the exported signals, rather than exporting them directly? Or that GetExporters is being removed, since it is marked as experimental? If so, both of these are good arguments which you should add to the reasoning for this proposal.
  2. Rather than introducing a new concept ("connectors"), have you considered slightly changing the syntax of the exporter, in such a way as to do something like:
services:
  pipelines:
    a:
      receiver: [tracereceiver]
      processors: [batch]
      exporters: [pipeline:b] # pipeline "b" will be handling the exporting
    b:
      # requires no receiver, or we can come up with a predefined one
      processors: [...]
      exporters: [...]

The above (or similar) would be easier to follow in my opinion, without the need to add new concepts.

  3. If your design is implemented, what would the config look like for the spanmetricsprocessor and what would be the benefits?

Apologies if these things are obvious to you, but I think they are worth pointing out and underlining here, specifically because you are referring to the spanmetricsprocessor. It would be helpful if you could update your design to better illustrate and underline the problem being solved and the benefits gained. I'm fairly new here, so I may be missing context; please consider that 🙂

Thanks!

@djaglowski
Member Author

Would it be the fact that the new pipeline that you are connecting to would allow applying additional processors on top of the exported signals

There are probably too many use cases to articulate, but anything that involves signal replication/translation, or differentiated processing/routing/export could use this feature.

exporters: [pipeline:b] # pipeline "b" will be handling the exporting

I think the connectors concept is cleaner because it does not blur the line between "components" and "group of components" (pipeline). Sending data "to a pipeline" feels ambiguous until you mentally step into the pipeline and consider its components, so as a user myself I would want to just think directly about how the components are connected to each other. This correlates directly to the notion of a (directed acyclic) graph.

what would the config look like for the spanmetricsprocessor and what would be the benefits?

Probably the processor would be a connector instead, and the metrics_exporter field would be removed. The primary benefit would be that we have a data flow model that is fully independent of individual components. In other words, we do not end up with lots of unique implementations of how to route data between pipelines.

@gbbr
Member

gbbr commented Aug 18, 2022

There are probably too many use cases to articulate [...]

That's great! It means that this design has the potential to solve a lot of problems! I'm happy to help rubber-duck and brainstorm to find a good solution if you like. For me to be useful, it would help if you could articulate a few use cases (even one or two). Having none described concerns me, because I may be thinking of a different use case than you; we would then be talking about two different things without knowing it, and we'd be stalling progress.

I think the connectors concept is cleaner because it does not blur the line between "components" and "group of components" (pipeline). Sending data "to a pipeline" feels ambiguous until you mentally step into the pipeline and consider its components, so as a user myself I would want to just think directly about how the components are connected to each other. This correlates directly to the notion of a (directed acyclic) graph.

Thanks for answering. I'll try to explain the parts that are slightly holding me back, maybe you can help align me:

  • The exporters field of the pipeline can now have two component types: connector or exporter, whereas previously it allowed only one type: exporter, which was intuitively named the same as the field itself (exporters). When looking at a configuration file, it will be hard to tell by reading a list of exporters which is an exporter and which is a connector.
  • The receivers field can now also have two types: connector or receiver. Same story as the previous point.
  • Isn't it ambiguous if the data goes to a connector which may show up in multiple pipelines as a receiver? That's not immediately obvious from reading the config; you have to search around for the definition of the connector. By specifying the target pipeline directly, it becomes immediately obvious.

Because of this, I am wondering if a notation of some sort would enhance the readability of a config file because adding a non-exporter to the exporters array or a non-receiver to the receivers array is counter-intuitive. Perhaps we can keep the connector concept but add a prefix (which is not a great approach), or add a separate entry connectors (in addition to exporters, processors and receivers) where we'd list connectors. I think that should be fine, given that the order of exporters vs connectors is irrelevant.

Given your explanation about how the connector would be used with the span metrics processor, I take it that it would also allow configuration. In that sense, is there any functional difference between a connector and a processor? They both process signals and pass them further down the line. Because of that, should we not reconsider connecting to a pipeline directly? A processor within that pipeline could easily do everything that the connector would. If we don't like the suggested prefixed syntax (which is not great, I admit), we could instead add another field, perhaps also called connectors, which points to a neighbouring pipeline instead of a new component. For example:

services:
  pipelines:
    a:
      receiver: [tracereceiver]
      processors: [batch]
      exporters: [otlp] # "exporters" could be omitted if "connectors" is present
      connectors: [b]   # takes a list of pipelines
    b:
      # doesn't need a receiver, making its usage clear (it's being connected to from elsewhere)
      processors: [...]
      exporters: [...]

I want to make sure we are making the right decision here. The current architecture of the Collector is quite simple and easy to wrap your head around. We should try our best to keep this otherwise simple design.

Probably the processor would be a connector instead [...]

Nice! I didn't realise until now that this is how you were planning to use it. I think that'd definitely be sweeter than what it is now.

For exactly this reason, it is essential to illustrate use cases. The word "probably" indicates an unknown, so it is not clear how, or whether, the connector proposal will solve the existing problem. Please also note that you are considering the connector and processor interchangeable in this case, further underlining my previous point about them having the same functional purpose.

@lpegoraro
Contributor

One more cheer for this. Let me know if I can do anything to help; I have a use case that would benefit greatly from this.

@lpegoraro
Contributor

lpegoraro commented Aug 29, 2022

Hi folks, would this suit the following scenario?

I would like to have something like this setup:

[diagram: Overview]

But listen for changes from an external service (Extension):

[diagrams: External-1, External-2]

@djaglowski
Member Author

Closed by #7045
