Need a supported way to instrument parent/child span relationship in the workflow engine scenario #4766

@vkruglik-aka

Description

Is your feature request related to a problem? - YES

CONTEXT:
A workflow in a workflow engine consists of multiple steps that are executed in a time-dispersed fashion across multiple host machines.

I have read many discussions on this subject, yet no credible solutions have been offered.

THE PROBLEM:
The problem is that, using the supported/documented API, one ends up with Child Spans that outlive their parent/Root Span. The web is full of stories of how graphing tools get confused and report errors/warnings, or just don't display correctly, when child spans outlive their parent.

While I can start the Root Span at the beginning of the workflow, I am not finding a supported way to reconstitute the same Root Span at a later time, likely on a different host and in a different process, in such a way that I would be able to end the Root Span. And no, TraceContextTextMapPropagator().extract() doesn't work for this, because it creates an instance of NonRecordingSpan, where NonRecordingSpan.end() is a no-op.

WHAT I TRIED:
Below is what I tried, and where I ran into the OpenTelemetry API deficiency that prevented me from succeeding:

  1. Start a Root Span at the beginning of the workflow (Tracer.start_span() probably)
  2. Serialize the Root Span and save it in a database to make it available to individual workflow steps (TraceContextTextMapPropagator().inject() probably)
  3. In each step, fetch the saved Root Span carrier from the database, reconstitute it into a context (TraceContextTextMapPropagator().extract()) and use it as the parent while creating a Child Span (Tracer.start_as_current_span())
  4. Below is where it gets dicey
  5. After all the workflow steps complete, I need to reconstitute the original Root Span (from step 1) and formally end it. Unfortunately, TraceContextTextMapPropagator().extract() creates an instance of NonRecordingSpan, where NonRecordingSpan.end() is a no-op, so I can't properly record the end of the Root Span.

Describe the solution you'd like

A way to deserialize the Root Span into a normal Span (not NonRecordingSpan) at the end of the workflow so that I can properly end the Root Span via its end() method.

Describe alternatives you've considered

I have searched the OpenTelemetry code and the web and have not found any viable alternatives.

Additional Context

No response

Would you like to implement a fix?

None
