[Core feature] Integrate open telemetry into Flyte components #3304

Open
2 tasks done
hamersaw opened this issue Feb 1, 2023 · 0 comments
Labels
backlogged For internal use. Reserved for contributor team workflow. enhancement New feature or request exo

hamersaw commented Feb 1, 2023

Motivation: Why do you think this is important?

OpenTelemetry is an observability framework with distributed tracing support, designed to ease performance analysis in distributed systems. In line with our performance observability push, integrating it would give users a clearer understanding of Flyte performance. It would also help debug performance issues and serve as a benchmarking utility for new features.

Goal: What should the final outcome look like, ideally?

OpenTelemetry offers many opportunities for instrumentation. We hope to add support for:

  • gRPC connections (e.g. FlyteAdmin, datacatalog, FlytePropeller)
  • blobstore I/O
  • k8s API server operations
  • many more
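
To make the idea concrete, here is a minimal, standard-library-only sketch of span-style instrumentation around a gRPC handler and a blobstore read. It is not the OpenTelemetry API and not Flyte code: `Tracer`, `Span`, `read_blob`, `handle_request`, the span names, and the `s3://bucket/inputs.pb` path are all hypothetical stand-ins. In the real OpenTelemetry Python API the analogous call would be `tracer.start_as_current_span(...)`.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    """One timed operation; a stand-in for an OpenTelemetry span."""
    name: str
    span_id: str
    parent_id: Optional[str]
    attributes: Dict[str, str] = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0

    @property
    def duration(self) -> float:
        return self.end - self.start


class Tracer:
    """Collects finished spans in memory and tracks the active span for nesting."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str, **attributes: str) -> Iterator[Span]:
        # The innermost active span, if any, becomes the parent.
        parent_id = self._stack[-1].span_id if self._stack else None
        current = Span(name, uuid.uuid4().hex[:16], parent_id, dict(attributes))
        current.start = time.perf_counter()
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.perf_counter()
            self._stack.pop()
            self.finished.append(current)


tracer = Tracer()


def read_blob(path: str) -> bytes:
    # Hypothetical blobstore read; the sleep stands in for network I/O.
    with tracer.span("blobstore.read", path=path):
        time.sleep(0.01)
        return b"payload"


def handle_request(workflow: str) -> bytes:
    # Hypothetical gRPC handler; its span is the parent of the I/O span.
    with tracer.span("grpc.GetData", workflow=workflow):
        return read_blob("s3://bucket/inputs.pb")


handle_request("wf-1")
for s in tracer.finished:
    print(s.name, s.parent_id, f"{s.duration * 1000:.1f} ms")
```

The parent/child link is what lets a trace viewer show how much of a request's latency was spent in blobstore I/O rather than elsewhere.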

Describe alternatives you've considered

We have considered two main options:
(1) Leaving things as they are: The current state may leave users (or developers) frustrated about system performance with no real explanation.
(2) Enhancing Prometheus metrics: Flyte currently exposes many metrics through Prometheus; however, these metrics are often aggregations, so fine-grained analysis at the workflow, node, or task level is unavailable.
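
A small sketch of why aggregation is limiting, using made-up numbers: the durations and the workflow / node / task identifiers below are hypothetical, not measurements from Flyte. The aggregate looks moderate, while per-task records (the kind of data a trace carries) point at the one slow task.

```python
from statistics import mean

# Hypothetical task durations in seconds, keyed by (workflow, node, task).
durations = {
    ("wf-a", "n0", "t0"): 1.0,
    ("wf-a", "n1", "t1"): 1.0,
    ("wf-b", "n0", "t2"): 1.0,
    ("wf-b", "n1", "t3"): 9.0,
}

# An aggregated metric collapses everything into one number.
aggregate = mean(durations.values())

# Per-task records keep the identifiers, so the outlier can be located.
slowest = max(durations, key=durations.get)

print(f"mean duration: {aggregate:.1f}s")
print(f"slowest task: {slowest} at {durations[slowest]:.1f}s")
```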

Propose: Link/Inline OR Additional context

This work is described as "orchestration metrics" in the performance observability RFC.

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@hamersaw hamersaw added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers and removed untriaged This issues has not yet been looked at by the Maintainers labels Feb 1, 2023
@hamersaw hamersaw self-assigned this Feb 1, 2023
@hamersaw hamersaw added this to the 1.4.0 milestone Feb 1, 2023
@cosmicBboy cosmicBboy modified the milestones: 1.4.0, 1.5.0 Mar 6, 2023
@cosmicBboy cosmicBboy modified the milestones: 1.5.0, 1.6.0 Apr 20, 2023
@hamersaw hamersaw added exo backlogged For internal use. Reserved for contributor team workflow. labels Nov 8, 2023