[BUG] Minor typo in an lhv YAML stops the entire cluster from working #2423
Comments
It finally passed the election, but then crashed quickly with:
The engine image was deployed earlier, but since no volume uses it anymore, it got cleaned up and removed for some reason. |
Problem tracked down: one lhv has a recurring backup configured, and it seems there is a typo in it. This minor typo shut down the entire cluster! That is unacceptable for a distributed system, I would say. |
Thanks for raising this issue. Could you help update the steps to reproduce? Also, please provide the support bundle to help us quickly identify the cause. Thanks. |
Steps to reproduce:
You might need two engine images in the cluster, since it is this line that stops longhorn-manager from starting up. |
Can you share the yaml you've used for |
I did a kubectl edit lhv, and as far as I remember, it was something like the above. |
@c3y1huang I don't think the main issue here is reproducing it, but reviewing the overall architecture. An error in an individual volume should not prevent the entire cluster from working. What do you say @joshimoo @yasker ? |
The error comes from this line, which prevents the Longhorn manager pods from starting. To prevent users from accidentally updating Longhorn CRs with invalid values and thus bringing down the Longhorn system, we can use schema validation for the Longhorn CRDs: |
Referring to #604, we need to improve the CRDs to have structural schemas and a ValidatingAdmissionWebhook. |
Just leaving a note here, we were planning to look at schema validation as part of the API refactor issue: |
We have addressed this issue with a CRD structural schema. |
@innobead @longhorn/qa |
Agreed. Let's see if any validation-related issues come up later. |
Describe the bug
Longhorn manager failed to start up.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Log
Environment:
Additional context