
Inconsistencies possible between levels of assisted-install in hub and agent in discovery iso deployments #4932

Closed
gmarcy opened this issue Jan 24, 2023 · 6 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

gmarcy (Contributor) commented Jan 24, 2023

I was trying to install OCP 4.11.20 SNO with the assisted installer, and the cluster node was stuck with this message repeating in the agent container log:

time="2023-01-22T21:43:46Z" level=error msg="Failed to update node nuc10 installation status" func="github.com/openshift/assisted-installer/src/assisted_installer_controller.(*controller).waitAndUpdateNodesStatus" file="/go/src/github.com/openshift/assisted-installer/src/assisted_installer_controller/assisted_installer_controller.go:255" error="response status code does not match any response statuses defined for this endpoint in the swagger spec (status 409): {}" request_id=b86c5c8c-e90c-4aab-b6c5-aeb7893d0931

and on the server, the assisted-service container log had this repeating:

time="2023-01-22T21:43:46Z" level=error msg="failed to update host d001adcc-f6cc-f1cb-7b10-bb5011add461 progress" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).V2UpdateHostInstallProgressInternal" file="/assisted-service/internal/bminventory/inventory.go:5301" error="Stages Joined isn't available for host role master bootstrap true" go-id=430202 host_id=d001adcc-f6cc-f1cb-7b10-bb5011add461 infra_env_id=f092fd1f-732a-49f6-a4b2-bc1c8fbd8952 pkg=Inventory request_id=7d8a05d4-dd03-46a8-acf9-17eb8ca71937
time="2023-01-22T21:43:46Z" level=info msg="Update host d001adcc-f6cc-f1cb-7b10-bb5011add461 install progress" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).V2UpdateHostInstallProgressInternal" file="/assisted-service/internal/bminventory/inventory.go:5283" go-id=430202 host_id=d001adcc-f6cc-f1cb-7b10-bb5011add461 infra_env_id=f092fd1f-732a-49f6-a4b2-bc1c8fbd8952 pkg=Inventory request_id=7d8a05d4-dd03-46a8-acf9-17eb8ca71937

I was able to reproduce this on other hardware. My theory is that an incompatibility arose between the hub, which had been running an older release for several weeks, and the cluster node, which booted with a newer release of the agent.

I would like to suggest that it might be better to reference a SHA digest of a compatible agent in the generated discovery ISO instead of using the latest tag, i.e. choose a better default for AgentDockerImg than quay.io/edge-infrastructure/assisted-installer-agent:latest.
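For illustration, one way to do this would be to resolve the digest behind the tag at deploy time and point the service at the immutable reference. This is a sketch only; `AGENT_DOCKER_IMAGE` is an assumed environment-variable name mirroring the AgentDockerImg field, not a verified setting:

```sh
# Sketch: resolve the digest behind the floating tag once, then pin to it.
# skopeo inspect prints image metadata as JSON, including the manifest digest.
digest=$(skopeo inspect docker://quay.io/edge-infrastructure/assisted-installer-agent:latest | jq -r '.Digest')

# Point the service at the immutable reference instead of :latest.
# AGENT_DOCKER_IMAGE is an assumed env-var name mirroring the AgentDockerImg field.
export AGENT_DOCKER_IMAGE="quay.io/edge-infrastructure/assisted-installer-agent@${digest}"
```

A digest reference survives later pushes to the tag, so an agent booted weeks later from the same discovery ISO would match what the hub was deployed with.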

filanov (Contributor) commented Feb 1, 2023

We recently introduced an agent upgrade mechanism that should resolve those issues. I think local deployments that use latest by default will still hit them, since those are usually used for local testing and are not supposed to live for long.
One option to resolve this issue is to use explicit tags rather than latest: for example, you can set all images to use the same tag, as in the sketch below, and this issue will be resolved.
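A minimal sketch of that approach, assuming environment-variable names modeled on the assisted-service configuration (both the names and the tag value here are assumptions, not verified against the deployment templates):

```sh
# Sketch: deploy all assisted-installer components from one explicit release tag
# instead of :latest, so the hub and the agents it boots stay in lockstep.
# Env-var names are assumptions modeled on the assisted-service config;
# v2.x.y is a placeholder tag, not a real release.
TAG="v2.x.y"
export AGENT_DOCKER_IMAGE="quay.io/edge-infrastructure/assisted-installer-agent:${TAG}"
export INSTALLER_IMAGE="quay.io/edge-infrastructure/assisted-installer:${TAG}"
export CONTROLLER_IMAGE="quay.io/edge-infrastructure/assisted-installer-controller:${TAG}"
```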

gmarcy (Contributor, Author) commented Feb 1, 2023

I am not sure what you mean by local deployments, or why you characterize their behavior as somehow different. This is a long-term deployment of a release that you provided from this repo. Those releases should be using tags, or, as I suggest, SHA digests, so that they are usable for long-term deployments. To suggest this is my problem to solve says to me that you don't support your own releases.

openshift-bot commented:

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale label May 3, 2023
openshift-bot commented:

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Jun 2, 2023
openshift-bot commented:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this as completed Jul 3, 2023
openshift-ci bot commented Jul 3, 2023

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
