Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Controller] Set function status to unhealthy instead of error during fast-fail check #3165

Merged
merged 11 commits into from
Feb 21, 2024

Conversation

rokatyy
Copy link
Contributor

@rokatyy rokatyy commented Feb 12, 2024

This PR addresses pretty old bug where we set function to error state within the fast-fail controller's mechanism because of either CrashLoop or scheduling issues. So, we could end up with working and healthy function stacked in error state forever, because our approach to Nuclio states assumes that error state is a permanent state which can not be changed per time. While unhealthy state is a state that we monitor with FunctionMonitor component, so it is changeable. In light of this, we decided to set function to unhealthy state in case of any of above described errors occurred.

Also, resolveFailFast function returns not only error, but also a status to which controller should set the function in case of error. For now, we only set function to unhealthy state here, but we plan to add more checks, so this change provides a flexibly for future changes.

Jira - https://jira.iguazeng.com/browse/NUC-26

[SOLVED - Have decided to change a warn log message to show the difference between error and unhealthy states]
Question: it is crucial to say that if function is unhealthy, we will anyway (with current logic) set the status.Logs so that user can understand what the problem is. Taking into account the fact that unhealthy status is monitored by FunctionMonitor we should consider what will happen with the function if it is fixed and is set to ready state. Maybe it worth cleaning up status.logs on that case?

Example:

  1. When unhealthy

image

  1. After finding image (function is ready, but logs are still there)

image

…t is in CrashLoopBack or has scheduling issues (not enough resources)
@rokatyy rokatyy changed the title NUC-26: Set function status to unhealthy status instead of error NUC-26: Set function status to unhealthy status instead of error during fast-fail check Feb 12, 2024
@rokatyy rokatyy changed the title NUC-26: Set function status to unhealthy status instead of error during fast-fail check [Controller] Set function status to unhealthy status instead of error during fast-fail check Feb 12, 2024
@rokatyy rokatyy changed the title [Controller] Set function status to unhealthy status instead of error during fast-fail check [WIP][Controller] Set function status to unhealthy status instead of error during fast-fail check Feb 12, 2024
@TomerShor TomerShor added the 1.13 label Feb 12, 2024
@github-actions github-actions bot added the docs label Feb 12, 2024
@rokatyy rokatyy changed the title [WIP][Controller] Set function status to unhealthy status instead of error during fast-fail check [Controller] Set function status to unhealthy status instead of error during fast-fail check Feb 19, 2024
@rokatyy rokatyy marked this pull request as ready for review February 19, 2024 12:36
Copy link
Contributor

@TomerShor TomerShor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last thing, that was maybe misunderstood from my previous review

Comment on lines 264 to 266
// low severity to not over log in the warning
createFunctionOptions.Logger.DebugWithCtx(ctx, "Function creation failed, brief error message extracted",
"briefErrorsMessage", briefErrorsMessage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave this log, with the brief error message (will only be visible in the pod logs)

@@ -290,6 +283,14 @@ func (p *Platform) CreateFunction(ctx context.Context, createFunctionOptions *pl
}
}

if functionStatus.State == functionconfig.FunctionStateUnhealthy {
createFunctionOptions.Logger.WarnWithCtx(ctx, "Function deployment failed, setting state to unhealthy. The issue might be transient or require manual redeployment",
"briefErrorsMessage", briefErrorsMessage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"briefErrorsMessage", briefErrorsMessage)
"err", errorStack.String())

Copy link
Contributor

@TomerShor TomerShor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 🚀

@TomerShor TomerShor changed the title [Controller] Set function status to unhealthy status instead of error during fast-fail check [Controller] Set function status to unhealthy instead of error during fast-fail check Feb 21, 2024
@TomerShor TomerShor merged commit 913e711 into nuclio:development Feb 21, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants