This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Fast fail if task resource requests exceed k8s resource limits #488

Merged

hamersaw merged 9 commits into master from feature/fast-fail-on-k8s-resource-limits on May 5, 2023


hamersaw (Contributor) commented Sep 30, 2022

TL;DR

When we encounter a "ResourceExceedsLimits" error from k8s, we validate that the task resource requests and limits do not exceed the k8s resource quota. If they do, the task can never be scheduled; rather than hanging until FlytePropeller terminates it based on node-active-duration, it now fails fast.
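A minimal sketch of the eligibility check, written in Go since that is flytepropeller's language. It is not the code added in this PR: `ResourceList` and `ExceedsQuota` are hypothetical names, and quantities are assumed to be pre-normalized to int64 canonical units (millicores for cpu, bytes for memory) instead of parsed Kubernetes quantities. Quota keys follow the Kubernetes ResourceQuota convention (`requests.cpu`, `limits.memory`, and bare `cpu`/`memory`, which quota treats as request limits).

```go
package main

import (
	"fmt"
	"sort"
)

// ResourceList is a hypothetical stand-in for a k8s resource list.
// Assumption: values are already normalized to int64 canonical units
// (millicores for cpu, bytes for memory); parsing "500m" or "2Gi" is omitted.
type ResourceList map[string]int64

// ExceedsQuota (hypothetical) returns the quota keys that a single task
// would violate on its own. A resource without a hard quota entry is
// treated as unconstrained.
func ExceedsQuota(requests, limits, hard ResourceList) []string {
	var violations []string
	for name, qty := range requests {
		// ResourceQuota treats a bare "cpu"/"memory" key like "requests.<name>".
		for _, key := range []string{"requests." + name, name} {
			if ceiling, ok := hard[key]; ok && qty > ceiling {
				violations = append(violations, key)
			}
		}
	}
	for name, qty := range limits {
		key := "limits." + name
		if ceiling, ok := hard[key]; ok && qty > ceiling {
			violations = append(violations, key)
		}
	}
	sort.Strings(violations) // deterministic order for messages and tests
	return violations
}

func main() {
	hard := ResourceList{"requests.cpu": 4000, "limits.memory": 8 << 30}
	requests := ResourceList{"cpu": 8000, "memory": 1 << 30}
	limits := ResourceList{"cpu": 8000, "memory": 16 << 30}

	if v := ExceedsQuota(requests, limits, hard); len(v) > 0 {
		// The task alone exceeds the quota ceiling: it can never be admitted,
		// so report a permanent failure instead of waiting.
		fmt.Println("permanent failure, exceeds quota:", v)
	} else {
		// The task fits under the ceiling: the quota error is transient.
		fmt.Println("waiting for resources")
	}
}
```

Comparing one task against the quota's hard ceiling, rather than against the remaining headroom, is what separates a permanent failure from a transient one: a task that fits under the ceiling may still be rejected while other pods hold quota, so waiting and retrying is reasonable in that case.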

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

Complete description

^^^

Tracking Issue

fixes flyteorg/flyte#2933

Follow-up issue

NA

Signed-off-by: Daniel Rammer <daniel@union.ai>
codecov bot commented Sep 30, 2022

Codecov Report

Merging #488 (08b6349) into master (5b50d88) will increase coverage by 0.38%.
The diff coverage is 53.57%.

❗ Current head 08b6349 differs from pull request most recent head d4326e5. Consider uploading reports for the commit d4326e5 to get more accurate results

hamersaw marked this pull request as ready for review on September 30, 2022 20:10
flixr (Contributor) commented Jan 10, 2023

This would be helpful! Anything missing here?

hamersaw merged commit f4cadb0 into master on May 5, 2023
14 checks passed
hamersaw deleted the feature/fast-fail-on-k8s-resource-limits branch on May 5, 2023 21:40
eapolinario pushed a commit to eapolinario/flytepropeller that referenced this pull request Aug 9, 2023
…org#488)

* checking if task resource requests exceed k8s limits

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added better message to task failure

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added request checks

Signed-off-by: Daniel Rammer <daniel@union.ai>

* added tests for checking resource eligibility

Signed-off-by: Daniel Rammer <daniel@union.ai>

* fixed lint issues

Signed-off-by: Daniel Rammer <daniel@union.ai>

* updated comment

Signed-off-by: Daniel Rammer <daniel@union.ai>

---------

Signed-off-by: Daniel Rammer <daniel@union.ai>
3 participants