Conversation
Since 248 is already unknown, why is it known to be PAIRuntimeInitContainerUnknownError?
I suggest removing it; leveraging 256 is enough.
I checked the mail chain: this exit code was added by PR #3695. The reason for adding it is that we want to make the exit reason more specific, since we then know it is an init container bug. Without this change, we only know it is a runtime bug and need to check the logs to figure out more. From PR #3695, we know that a pod may sometimes contain previous logs, which makes it a little confusing which part caused the error. Adding this error type makes debugging much easier. Anyway, this type of error just provides extra stage info: |
It is easy to figure out which container caused the error. If we only have the init container log, it must be an init container error; otherwise, it must be an app container error. |
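A minimal sketch of the log-based rule described above, assuming the two logs are already available as strings. The function name `infer_failed_container` and its return values are hypothetical and are not part of PAI Runtime. Note that the rule breaks down if a pod carries logs from a previous run, which is the concern raised earlier in this thread.

```python
def infer_failed_container(init_log: str, app_log: str) -> str:
    """Guess which container failed from which logs exist.

    Hypothetical helper: "init" means only the init container produced a log,
    anything else is attributed to the app container.
    """
    has_init_log = bool(init_log.strip())
    has_app_log = bool(app_log.strip())
    if has_init_log and not has_app_log:
        return "init"
    return "app"
```

For example, `infer_failed_container("init failed", "")` is attributed to the init container, while any non-empty app log shifts the blame to the app container.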
Currently, that's true. But I think reporting the error stage to the user/dev is important too (less confusing, and easier to see why). Currently, the error spec format is:
- code: 256
  phrase: PAIRuntimeExitAbnormally
  issuer: PAI_RUNTIME
  causer: PAI_RUNTIME
  type: PLATFORM_FAILURE
  stage: UNKNOWN
  behavior: UNKNOWN
  reaction: RETRY_TO_MAX
  reason: "PAI Runtime exit abnormally with undefined exitcode, it may have bugs"
  repro:
    - "PAI Runtime exits with exitcode 1"
  solution:
    - "Contact PAI Dev to fix PAI Runtime bugs"
How about changing the stage to a dynamic type and letting the runtime report the current stage, such as initializing, running, ... (if it can report the stage)? |
The spec only contains static fields, so I think you can report "which container" in a new field inside the runtime Dynamic ExitInfo, such as caughtSite? |
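A minimal sketch of what reporting the container in a dynamic field could look like. Everything here is an assumption for illustration: the `DynamicExitInfo` class, the `CaughtSite` values, and the key names `exitCode`, `caughtSite`, and `message` are hypothetical and do not reflect the actual runtime ExitInfo schema. Only the code 256 comes from the spec quoted in this thread.

```python
import json
from dataclasses import dataclass
from enum import Enum


class CaughtSite(Enum):
    # Hypothetical values naming where the runtime caught the failure.
    INIT_CONTAINER = "INIT_CONTAINER"
    APP_CONTAINER = "APP_CONTAINER"


@dataclass
class DynamicExitInfo:
    # Hypothetical dynamic counterpart to the static error spec.
    exit_code: int
    caught_site: CaughtSite
    message: str = ""

    def to_dict(self) -> dict:
        # Key names are assumptions, chosen to mirror the suggested field name.
        return {
            "exitCode": self.exit_code,
            "caughtSite": self.caught_site.value,
            "message": self.message,
        }


def report_abnormal_exit(site: CaughtSite, message: str) -> DynamicExitInfo:
    # Reuse the generic code 256 (PAIRuntimeExitAbnormally) and carry the
    # container information in the dynamic part instead of a new exit code.
    return DynamicExitInfo(exit_code=256, caught_site=site, message=message)


if __name__ == "__main__":
    info = report_abnormal_exit(CaughtSite.INIT_CONTAINER, "init step failed")
    print(json.dumps(info.to_dict(), sort_keys=True))
```

With this shape, the static spec keeps a single generic entry, and the container that failed travels with each individual exit report.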
I suggest we first remove the code, because PAIRuntimeInitContainerUnknownError is too general and more confusing than PAIRuntimeExitAbnormally. |
Leverage code 256 for PAIRuntimeExitAbnormally and remove this error type.
Fine, updated. |