Skip to content
This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

[Error Spec] fix typo #3793

Merged
merged 2 commits into from Nov 6, 2019
Merged

[Error Spec] fix typo #3793

merged 2 commits into from Nov 6, 2019

Conversation

Binyang2014
Copy link
Contributor

No description provided.

Copy link
Member

@yqwang-ms yqwang-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

Copy link
Member

@yqwang-ms yqwang-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since 248 is already unknown, why it is known to be PAIRuntimeInitContainerUnknownError?
Suggest to remove it and leverage 256 is enough.

@Binyang2014
Copy link
Contributor Author

Since 248 is already unknown, why it is known to be PAIRuntimeInitContainerUnknownError?
Suggest to remove it and leverage 256 is enough.

Checked the mail-chain, this exitCode is added by this PR #3695. The reason to add this code is we want to make exit reason more specific, since we can know it's a init container bug.

Without this change, we just know it's a runtime bug, and need to check log to figure out. From PR #3695, we know sometimes pod may contains previous logs. It make us a little confuse about which part cause such error. Add this error type will make debugging much easier.

Anyway, this type of error just provide extra stage info: initializing. If we can get such info from somewhere, it's fine to remove it.

@yqwang-ms
Copy link
Member

It is easy to figure out which container's error. If we only have init container log, it must be init container error, otherwise, it must be app container error.

@Binyang2014
Copy link
Contributor Author

It is easy to figure out which container's error. If we only have init container log, it must be init container error, otherwise, it must be app container error.

Currently, that's true. But I think report the error stage to user/dev is important too. (less confusing and easy to know why). Currently, the error spec format is:

- code: 256
  phrase: PAIRuntimeExitAbnormally
  issuer: PAI_RUNTIME
  causer: PAI_RUNTIME
  type: PLATFORM_FAILURE
  stage: UNKNOWN
  behavior: UNKNOWN
  reaction: RETRY_TO_MAX
  reason: "PAI Runtime exit abnormally with undefined exitcode, it may have bugs"
  repro:
  - "PAI Runtime exits with exitcode 1"
  solution:
  - "Contact PAI Dev to fix PAI Runtime bugs"

How about changing the stage to a dynamic type and let runtime report current stage, such as initializing, running..., (if it could report the stage)?
Just a proposal, will not do this in this PR

@yqwang-ms
Copy link
Member

Spec only contains static fields, soI think you can report "which container" in a new field inside the runtime Dynamic ExitInfo, such as caughtSite?

@yqwang-ms
Copy link
Member

I suggest we first remove the code, because PAIRuntimeInitContainerUnknownError is too general and more confusing than PAIRuntimeExitAbnormally

leverage code 256 for PAIRuntimeExitAbnormally, remove this erorr type
@Binyang2014
Copy link
Contributor Author

I suggest we first remove the code, because PAIRuntimeInitContainerUnknownError is too general and more confusing than PAIRuntimeExitAbnormally

Fine updated

@Binyang2014 Binyang2014 merged commit d77fd62 into master Nov 6, 2019
@Binyang2014 Binyang2014 deleted the binyli/typo branch November 11, 2019 14:09
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants