forked from metal3-io/cluster-api-provider-baremetal
No indication that a Machine can't acquire a Host #103
zaneb added a commit to zaneb/cluster-api-provider-baremetal that referenced this issue on Sep 12, 2020:
If there are no Hosts available to provision a Machine, we should flag an InsufficientResourcesMachineError, which "generally refers to ... running out of physical machines in an on-premise[s] environment". We continue to retry looking for an available Host at intervals, but in the meantime the Machine will be in an error state. This means that the MachineSet will target the Machine for deletion on scale down, in preference to other Machines that might be happily provisioned. Fixes openshift#103
zaneb added a commit to zaneb/cluster-api-provider-baremetal that referenced this issue on Sep 12, 2020:
If there are no Hosts available to provision a Machine, we should flag an InsufficientResourcesMachineError, which "generally refers to ... running out of physical machines in an on-premise[s] environment". We continue to retry looking for an available Host at intervals, but in the meantime the Machine will be in an error state. As soon as the Machine is able to acquire a Host for provisioning, clear the error. This means that the MachineSet will target the Machine for deletion on scale down, in preference to other Machines that might be happily provisioned. Fixes openshift#103
zaneb added a commit to zaneb/cluster-api-provider-baremetal that referenced this issue on Sep 14, 2020:
If there are no Hosts available to provision a Machine, we should flag an InsufficientResourcesMachineError, which "generally refers to ... running out of physical machines in an on-premise[s] environment". We continue to retry looking for an available Host at intervals, but in the meantime the Machine will be in an error state. Once the Machine is able to acquire a Host for provisioning, clear the error if the Machine remains in the Provisioning phase (upon reaching the Provisioned phase, the machine controller will automatically clear it). The presence of an error means that the MachineSet will target the Machine for deletion on scale down, in preference to other Machines that might be happily provisioned. Fixes openshift#103
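A minimal Go sketch of the flag-and-clear behavior described in the commit message above, under a simplified model: `Machine`, `MachineStatusError`, `chooseHost`, and `reconcileHost` are hypothetical stand-ins, not the actual actuator or machine-api types, and the constant's string value is illustrative. The clearing on reaching the Provisioned phase is left to the machine controller, as the commit message says, so the sketch only clears the error while the Machine is still Provisioning.

```go
package main

import "fmt"

// MachineStatusError is a hypothetical stand-in for a string-typed error reason.
type MachineStatusError string

// InsufficientResourcesMachineError: the value here is illustrative only.
const InsufficientResourcesMachineError MachineStatusError = "InsufficientResources"

// Machine is a hypothetical, trimmed-down model of a Machine object.
type Machine struct {
	Name         string
	Phase        string // e.g. "Provisioning", "Provisioned"
	ErrorReason  *MachineStatusError
	ErrorMessage *string
}

// chooseHost (hypothetical) returns the first free Host, or "" if none is free.
func chooseHost(available []string) string {
	if len(available) == 0 {
		return ""
	}
	return available[0]
}

// reconcileHost flags InsufficientResources while no Host is available and
// asks to be requeued; once a Host is acquired it clears that error, but only
// if the Machine is still in the Provisioning phase.
func reconcileHost(m *Machine, available []string) (host string, requeue bool) {
	host = chooseHost(available)
	if host == "" {
		reason := InsufficientResourcesMachineError
		msg := "no available Host found"
		m.ErrorReason = &reason
		m.ErrorMessage = &msg
		return "", true // keep retrying at intervals
	}
	if m.Phase == "Provisioning" && m.ErrorReason != nil &&
		*m.ErrorReason == InsufficientResourcesMachineError {
		m.ErrorReason = nil
		m.ErrorMessage = nil
	}
	return host, false
}

func main() {
	m := &Machine{Name: "worker-0", Phase: "Provisioning"}
	_, requeue := reconcileHost(m, nil)
	fmt.Println(requeue, *m.ErrorReason) // true InsufficientResources
	host, requeue := reconcileHost(m, []string{"host-a"})
	fmt.Println(host, requeue, m.ErrorReason == nil) // host-a false true
}
```

While the error is set, the Machine is visibly distinguishable from Machines that are happily provisioning, which is what the scale-down logic keys on.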
zaneb changed the title on Sep 15, 2020
from: Machines that can't find a Host appear in "Provisioning" phase
to: No indication that a Machine can't find a Host
zaneb changed the title on Sep 15, 2020
from: No indication that a Machine can't find a Host
to: No indication that a Machine can't acquire a Host
honza pushed a commit to honza/cluster-api-provider-baremetal that referenced this issue on Feb 7, 2022:
🏃 Update v1a3 CRDs to add image checksum type and disk format
If we create a Machine and there are no Hosts available to provision, the actuator just leaves the Machine in the "Provisioning" phase and keeps looking for a Host. Currently nothing indicates that no Host is available, and for as long as none becomes available the Machine never exits the Provisioning phase.
This means that if a user, e.g., accidentally scales their MachineSet up to more Machines than there are actual Hosts, scaling back down doesn't just delete the orphaned Machines; by default it deletes Machines at random, including provisioned Machines running real workloads. If you scale up by a large enough amount, this will with high probability end up deleting all of your workloads.
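To make the "high probability" claim concrete, here is a small Go calculation. It assumes, as the paragraph above describes, that the k Machines removed on scale-down are chosen uniformly at random from all n+k Machines, where n are provisioned and k are orphans; the function names are hypothetical. The number of provisioned Machines deleted then follows a hypergeometric distribution, and the chance that all n are deleted is C(k, n) / C(n+k, n).

```go
package main

import "fmt"

// probAllProvisionedDeleted returns the probability that scaling down from
// n+k to n Machines, picking the k victims uniformly at random, deletes all
// n provisioned Machines: C(k, n) / C(n+k, n) = prod_{i=0}^{n-1} (k-i)/(n+k-i).
func probAllProvisionedDeleted(n, k int) float64 {
	if k < n {
		return 0 // fewer deletions than provisioned Machines
	}
	p := 1.0
	for i := 0; i < n; i++ {
		p *= float64(k-i) / float64(n+k-i)
	}
	return p
}

// expectedProvisionedDeleted is the hypergeometric mean: k draws from n+k
// Machines, n of which are provisioned.
func expectedProvisionedDeleted(n, k int) float64 {
	return float64(k) * float64(n) / float64(n+k)
}

func main() {
	n, k := 3, 100 // 3 real Hosts, MachineSet accidentally scaled up by 100
	fmt.Printf("P(all %d provisioned deleted) = %.3f\n", n, probAllProvisionedDeleted(n, k))
	fmt.Printf("expected provisioned deletions = %.2f\n", expectedProvisionedDeleted(n, k))
	// Prints 0.914 and 2.91.
}
```

With only 3 Hosts and an accidental scale-up of 100, a random scale-down removes every provisioned Machine about 91% of the time.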
Instead, we should flag an error on the Machine and clear it once a suitable Host is found. This will put the Machine among the first candidates for deletion when scaling down.