Emit a metric and an alert if a task gets stuck in Initializing for "long" #588
Comments
Signed-off-by: eduardo apolinario <eapolinario@users.noreply.github.com> Co-authored-by: eduardo apolinario <eapolinario@users.noreply.github.com>
Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏
Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏
Motivation: Why do you think this is important?
Jobs often get stuck in the Initializing phase. This delays downstream jobs and puts SLAs at risk. Typically the workflow owner isn't aware that the job is not running until a downstream issue is triggered.
Goal: What should the final outcome look like, ideally?
A metric for the time a workflow spends in Initializing, plus an alert when that time gets "long".
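To make the goal concrete, here is a minimal sketch of the bookkeeping such a metric might need. It is not Flyte code and uses no Flyte API: `InitializingTracker`, `on_phase_change`, `stuck_tasks`, and the `threshold_seconds` value are all hypothetical names chosen for illustration, and the phase is assumed to arrive as a plain string.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class InitializingTracker:
    """Hypothetical helper: tracks how long each task has been in Initializing."""

    threshold_seconds: float                     # assumed, configurable meaning of "long"
    clock: Callable[[], float] = time.monotonic  # injectable so tests can control time
    _entered: Dict[str, float] = field(default_factory=dict)

    def on_phase_change(self, task_id: str, phase: str) -> Optional[float]:
        """Record a phase transition.

        Returns the seconds spent in Initializing when the task leaves that
        phase (the value one would emit as a metric), otherwise None.
        """
        now = self.clock()
        if phase == "Initializing":
            # Keep the first entry time if the phase is reported repeatedly.
            self._entered.setdefault(task_id, now)
            return None
        started = self._entered.pop(task_id, None)
        return None if started is None else now - started

    def stuck_tasks(self) -> List[str]:
        """Return IDs of tasks that have been Initializing longer than the threshold."""
        now = self.clock()
        return sorted(
            task_id
            for task_id, started in self._entered.items()
            if now - started > self.threshold_seconds
        )
```

In this sketch, the value returned by `on_phase_change` would feed a duration metric, and a periodic call to `stuck_tasks` would drive the alert. What threshold counts as "long" is left open, as in the issue title.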
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Flyte component
[Optional] Propose: Link/Inline
If you have ideas about the implementation please propose the change. If inline keep it short, if larger then you link to an external document.
Additional context
Add any other context or screenshots about the feature request here.
Is this a blocker for you to adopt Flyte
Please let us know if this makes it impossible to adopt Flyte