Return okay if get 404 when trying to delete an Instance #420

Merged
merged 3 commits into project-akri:main from catch-404-error on Nov 15, 2021

kate-goldenring
Contributor

What this PR does / why we need it:
For shared devices, one Agent may try to delete an Instance after another Agent on a different node has already deleted it. Because of a bug in how this case is handled, Agents sometimes panic, as in the following log:

[2021-11-15T16:05:25Z TRACE akri_shared::akri::instance] delete_instance enter
[2021-11-15T16:05:25Z TRACE akri_shared::akri::instance] delete_instance instances_client.delete(name, &instance_delete_params).await?
[2021-11-15T16:05:25Z TRACE akri_shared::akri::instance] delete_instance kube_client.request returned kube error: ErrorResponse { status: "Failure", message: "instances.akri.sh \"akri-debug-echo-18f8cc\" not found", reason: "NotFound", code: 404 }
[2021-11-15T16:05:25Z TRACE akri_shared::akri::instance] find_instance enter
[2021-11-15T16:05:25Z TRACE akri_shared::akri::instance] find_instance getting instance with name akri-debug-echo-18f8cc
[2021-11-15T16:05:25Z TRACE akri_shared::akri::instance] find_instance kube_client.request returned kube error: ErrorResponse { status: "Failure", message: "instances.akri.sh \"akri-debug-echo-18f8cc\" not found", reason: "NotFound", code: 404 }
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: instances.akri.sh "akri-debug-echo-18f8cc" not found: NotFound', agent/src/util/config_action.rs:82:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: JoinError::Panic(...)', agent/src/main.rs:88:14
Error: JoinError::Panic(...)

This happens because try_delete_instance was incorrectly parsing the error returned by delete_instance. In addition, the follow-up find_instance check is unnecessary if try_delete_instance correctly checks for a 404 error, which signifies that the instance was already deleted.
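
The idea of treating a 404 on delete as success can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: ErrorResponse here is a hypothetical stand-in that mirrors the fields shown in the log above, and DeleteError and treat_not_found_as_deleted are names invented for the example.

```rust
// Hypothetical stand-in mirroring the fields of the error response seen in
// the log above. The real type comes from the kube client.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct ErrorResponse {
    status: String,
    message: String,
    reason: String,
    code: u16,
}

// Hypothetical error type for this sketch: either an API error response or
// any other failure (for example, a transport problem).
#[derive(Debug, PartialEq)]
enum DeleteError {
    Api(ErrorResponse),
    Other(String),
}

// Hypothetical helper: takes the result of a delete call and treats a 404
// as success, since it means the instance is already gone (for example,
// deleted by another Agent). All other errors are passed through unchanged.
fn treat_not_found_as_deleted(result: Result<(), DeleteError>) -> Result<(), DeleteError> {
    match result {
        Ok(()) => Ok(()),
        Err(DeleteError::Api(ref e)) if e.code == 404 => Ok(()),
        Err(e) => Err(e),
    }
}
```

With this shape, no separate lookup is needed after a failed delete: the 404 itself carries the information that the instance no longer exists, and the caller has no NotFound error left to unwrap.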

Special notes for your reviewer:

If applicable:

  • this PR has an associated PR with documentation in akri-docs
  • this PR contains unit tests
  • added code adheres to standard Rust formatting (cargo fmt)
  • code builds properly (cargo build)
  • code is free of common mistakes (cargo clippy)
  • all Akri tests succeed (cargo test)
  • inline documentation builds (cargo doc)
  • version has been updated appropriately (./version.sh)
  • all commits pass the DCO bot check by being signed off -- see the failing DCO check for instructions on how to retroactively sign commits

@kate-goldenring kate-goldenring added the bug Something isn't working label Nov 15, 2021
Signed-off-by: Kate Goldenring <kate.goldenring@microsoft.com>
Signed-off-by: Kate Goldenring <kate.goldenring@microsoft.com>
Signed-off-by: Kate Goldenring <kate.goldenring@microsoft.com>
@kate-goldenring kate-goldenring merged commit 21bc8ef into project-akri:main Nov 15, 2021
@kate-goldenring kate-goldenring deleted the catch-404-error branch November 15, 2021 18:26
vincepnguyen pushed a commit that referenced this pull request Nov 23, 2021
Signed-off-by: vincepnguyen <70007233+vincepnguyen@users.noreply.github.com>