Inference Graph error response handling #3039

Merged: 58 commits merged into kserve:master on Sep 10, 2023

rachitchauhan43 (Contributor) commented Jul 19, 2023

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2484

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and a summary of the relevant results. Provide instructions so they can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

  • Test B

  • Logs

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:

Add a dependency field (Soft/Hard) to the inference graph steps.
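
As a rough illustration of what a Soft/Hard dependency on a step could mean for the graph's response code, here is a minimal sketch. It is not the router's code (that lives under cmd/router and is written in Go). The names `Step` and `run_sequence`, and the rule that a failed Hard step ends the graph with that step's status code while a failed Soft step is skipped, are assumptions inferred from the release note and the linked issue title only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


# Hypothetical model of a graph step; this is not the KServe API.
@dataclass
class Step:
    name: str
    call: Callable[[str], Tuple[int, str]]  # returns (status_code, body)
    dependency: str = "Soft"  # "Soft" or "Hard" (assumed semantics)


def run_sequence(steps: List[Step], payload: str) -> Tuple[int, str]:
    """Run steps in order, feeding each successful body into the next step."""
    status, body = 200, payload
    for step in steps:
        step_status, step_body = step.call(body)
        if step_status != 200:
            if step.dependency == "Hard":
                # Assumed rule: a failed Hard step ends the graph, and its
                # status code and body become the graph's response.
                return step_status, step_body
            # Assumed rule: a failed Soft step is skipped, and the previous
            # body is passed on unchanged.
            continue
        status, body = step_status, step_body
    return status, body


# Two stand-in "models": one that succeeds and one that always returns 404.
def ok(body: str) -> Tuple[int, str]:
    return 200, body + "+ok"


def broken(body: str) -> Tuple[int, str]:
    return 404, "model not found"


soft_result = run_sequence(
    [Step("a", ok), Step("b", broken, "Soft"), Step("c", ok)], "in"
)
hard_result = run_sequence(
    [Step("a", ok), Step("b", broken, "Hard"), Step("c", ok)], "in"
)
```

Under these assumptions, `soft_result` still carries a 200 because step `b` is tolerated, while `hard_result` surfaces the 404 from step `b` and step `c` never runs.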

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>
kserve-oss-bot (Collaborator) commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rachitchauhan43
To complete the pull request process, please assign njhill after the PR has been reviewed.
You can assign the PR to them by writing /assign @njhill in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rachitchauhan43 changed the title from "First commit" to "IG response code." on Jul 19, 2023
rachitchauhan43 (Contributor Author) commented:

/test ?

kserve-oss-bot (Collaborator) commented:

@rachitchauhan43: No presubmit jobs available for kserve/kserve@master

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

kserve-oss-bot (Collaborator) commented:

@rachitchauhan43: you cannot LGTM your own PR.

In response to this:

/lgtm


Review comments:
cmd/router/main_test.go (outdated, resolved)
cmd/router/main.go (resolved)
cmd/router/main.go (outdated, resolved)
cmd/router/main.go (outdated, resolved)
rachitchauhan43 and others added 5 commits August 30, 2023 12:49
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Rachit Chauhan <rachitchauhan43@gmail.com>
Review comment: cmd/router/utils.go (outdated, resolved)
yuzisun (Member) commented Sep 7, 2023

@rachitchauhan43 There seem to be resource issues, possibly after changing test parallelism to 2?

isvc-sklearn-runtime-predictor-00001" failed with message: 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod...

https://github.com/kserve/kserve/actions/runs/6097857892/job/16547655921?pr=3039

rachitchauhan43 (Contributor Author) commented:

> @rachitchauhan43 There seem to be resource issues, possibly after changing test parallelism to 2?
>
> isvc-sklearn-runtime-predictor-00001" failed with message: 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod...
>
> https://github.com/kserve/kserve/actions/runs/6097857892/job/16547655921?pr=3039

OK. Removed parallelism and bumped the timeout to 120 mins to avoid the job timeout issue.

yuzisun (Member) commented Sep 10, 2023

Awesome work @rachitchauhan43 !

/lgtm

yuzisun merged commit a73c949 into kserve:master on Sep 10, 2023
57 of 58 checks passed
yuzisun (Member) commented Sep 10, 2023

/approve

yuzisun (Member) commented Sep 10, 2023

@rachitchauhan43 Created a doc issue for you kserve/website#289

israel-hdez added a commit to israel-hdez/kserve that referenced this pull request Sep 19, 2023
Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>
Iamlovingit pushed a commit to Iamlovingit/kserve that referenced this pull request Oct 1, 2023
* First commit

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Removing unused code

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Added handling for ensemble use case

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Bumping up resources for ISVCs to come up quickly

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Using latest version of controller-gen

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Adding in testing resources

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Reverting res req and limits

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Enabling live logging for pytest so that logs print to the CLI for easier debugging

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Adding images build step in e2e tests.

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Fixing test to use service_name

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Adding build of router image to use the latest router image every time for tests

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Adding router entry for kserve deps setup to update router for IGs

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Fixed linting errors

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Asserting in pythonic way

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Updating kserve version to 0.11.0

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Adding more e2e tests

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Some refactoring to run tests locally easily

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Adding some test deps and adding more tests

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Fixing a bug and Adding tests for switch use cases

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Fixing a bug in e2e-test script

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Added e2e test cases for ensemble.

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Reducing parallelism to avoid pressure on minikube and timeouts

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Bump up timeout interval for the action.

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Bump up timeout interval for test_triton_runtime_with_transformer, as this ISVC has both a predictor and a transformer that need to come up, and it sometimes fails with an ISVC Timeout error.

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Running path-based routing tests as their own job

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Moved path-based-routing test to its own job

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Reducing parallelism to 2

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Reducing parallelism to 2 for test-fast job and increasing the job timeout as well, since new tests have been added

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

* Update test/e2e/graph/test-resources/ig_test_ensemble_scenario_7.yaml

Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: Rachit Chauhan <rachitchauhan43@gmail.com>

* Removing parallelism and bumping up timeout to 120 mins to avoid resource contention issues.

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>

---------

Signed-off-by: rachitchauhan43 <rachitchauhan43@gmail.com>
Signed-off-by: Rachit Chauhan <rachitchauhan43@gmail.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>
Signed-off-by: iamlovingit <freecode666@gmail.com>
yuzisun changed the title from "IG response code." to "Inference Graph error response handling" on Oct 8, 2023
Development

Successfully merging this pull request may close these issues.

Inference Graph’s response code should be influenced by node’s response code
4 participants