Skip to content

Commit

Permalink
VM Instance Initialization: Driver.InitializeMachine (#898)
Browse files Browse the repository at this point in the history
* New provider Op: Driver.InitializeMachine

* fixed godoc

* fix: handle NotFound properly in fake

* addressed second round of review-changes for code

* updated error code docs for InitializeMachine

* add  error code row for GetMachineStatus

* remmoved Lastknownstate

* removed accident check-in

* fixed accidental tyo

* fixed desc for codes.Internal for InitMachine
  • Loading branch information
elankath committed Apr 3, 2024
1 parent 10806c2 commit 0e30203
Show file tree
Hide file tree
Showing 9 changed files with 260 additions and 6 deletions.
2 changes: 1 addition & 1 deletion docs/development/cp_support_new.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ The contract between the Machine Controller Manager (MCM) and the Machine Contro
1. Fill in the methods described at `pkg/provider/core.go` to manage VMs on your cloud provider. Comments are provided above each method to help you fill them up with desired `REQUEST` and `RESPONSE` parameters.
- A sample provider implementation for these methods can be found [here](https://github.com/gardener/machine-controller-manager-provider-aws/blob/master/pkg/aws/core.go).
- Fill in the required methods `CreateMachine()`, and `DeleteMachine()` methods.
- Optionally fill in methods like `GetMachineStatus()`, `ListMachines()`, and `GetVolumeIDs()`. You may choose to fill these once the working of the required methods seems to be working.
- Optionally fill in methods like `GetMachineStatus()`, `InitializeMachine`, `ListMachines()`, and `GetVolumeIDs()`. You may choose to fill these once the working of the required methods seems to be working.
- `GetVolumeIDs()` expects VolumeIDs to be decoded from the volumeSpec based on the cloud provider.
- There is also an OPTIONAL method `GenerateMachineClassForMigration()` that helps in migration of `{ProviderSpecific}MachineClass` to `MachineClass` CR (custom resource). This only makes sense if you have an existing implementation (in-tree) acting on different CRD types. You would like to migrate this. If not, you MUST return an error (machine error UNIMPLEMENTED) to avoid processing this step.
1. Perform validation of APIs that you have described and make it a part of your methods as required at each request.
Expand Down
59 changes: 59 additions & 0 deletions docs/development/machine_error_codes.md
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,64 @@ The MCM MUST implement the specified error recovery behavior when it encounters
The status `message` MUST contain a human readable description of error, if the status `code` is not `OK`.
This string MAY be surfaced by MCM to end users.

#### `InitializeMachine`

Provider can OPTIONALLY implement this driver call. Else should return a `UNIMPLEMENTED` status in error.
This interface method will be called by the MCM to initialize a new VM just after creation. This can be used to configure network configuration etc.

- This call requests the provider to initialize a newly created VM backing the machine-object.
- The `InitializeMachineResponse` returned by this method is expected to return
- `ProviderID` that uniquely identifys the VM at the provider. This is expected to match with the `node.Spec.ProviderID` on the node object.
- `NodeName` that is the expected name of the machine when it joins the cluster. It must match with the node name.

```protobuf
// InitializeMachine call is responsible for VM initialization on the provider.
InitializeMachine(context.Context, *InitializeMachineRequest) (*InitializeMachineResponse, error)
// InitializeMachineRequest encapsulates params for the VM Initialization operation (Driver.InitializeMachine).
type InitializeMachineRequest struct {
// Machine object representing VM that must be initialized
Machine *v1alpha1.Machine
// MachineClass backing the machine object
MachineClass *v1alpha1.MachineClass
// Secret backing the machineClass object
Secret *corev1.Secret
}
// InitializeMachineResponse is the response for VM instance initialization (Driver.InitializeMachine).
type InitializeMachineResponse struct {
// ProviderID is the unique identification of the VM at the cloud provider.
// ProviderID typically matches with the node.Spec.ProviderID on the node object.
// Eg: gce://project-name/region/vm-ID
ProviderID string
// NodeName is the name of the node-object registered to kubernetes.
NodeName string
}
```

##### InitializeMachine Errors

If the provider is unable to complete the `InitializeMachine` call successfully, it MUST return a non-ok machine code in the machine status.

If the conditions defined below are encountered, the provider MUST return the specified machine error code.
The MCM MUST implement the specified error recovery behavior when it encounters the machine error code.

| machine Code | Condition | Description | Recovery Behavior | Auto Retry Required |
|-----------|-----------|-------------|-------------------|------------|
| 0 OK | Successful | The call was successful in initializing a VM that matches supplied initialization request. The `InitializeMachineResponse` is returned with desired values | | N |
| 5 NOT_FOUND | Timeout | VM Instance for Machine isn't found at provider | Skip Initialization and Continue | N |
| 12 UNIMPLEMENTED | Not implemented | Unimplemented indicates operation is not implemented or not supported/enabled in this service. | Skip Initialization and continue | N |
| 13 INTERNAL | Major error | Means some invariants expected by underlying system has been broken. | Needs investigation and possible intervention to fix this | Y |
| 17 UNINITIALIZED | Failed Initialization| VM Instance could not be initializaed | Initialization is reattempted in next reconcile cycle | Y |

The status `message` MUST contain a human readable description of error, if the status `code` is not `OK`.
This string MAY be surfaced by MCM to end users.


#### `DeleteMachine`

A Provider is REQUIRED to implement this driver call.
Expand Down Expand Up @@ -267,6 +325,7 @@ If the conditions defined below are encountered, the provider MUST return the sp
| 13 INTERNAL | Major error | Means some invariants expected by underlying system has been broken. If you see one of these errors, something is very broken. | Needs manual intervension to fix this | N |
| 14 UNAVAILABLE | Not Available | Unavailable indicates the service is currently unavailable. | Retry operation after sometime | Y |
| 16 UNAUTHENTICATED | Missing provider credentials | Request does not have valid authentication credentials for the operation | Fix the provider credentials | N |
| 17 UNINITIALIZED | Failed Initialization| VM Instance could not be initializaed | Initialization is reattempted in next reconcile cycle | N |

The status `message` MUST contain a human readable description of error, if the status `code` is not `OK`.
This string MAY be surfaced by MCM to end users.
Expand Down
34 changes: 33 additions & 1 deletion pkg/util/provider/driver/driver.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,14 +9,23 @@ package driver
import (
"context"

"github.com/gardener/machine-controller-manager/pkg/apis/machine/v1alpha1"
corev1 "k8s.io/api/core/v1"

"github.com/gardener/machine-controller-manager/pkg/apis/machine/v1alpha1"
)

// Driver is the common interface for creation/deletion of the VMs over different cloud-providers.
type Driver interface {
// CreateMachine call is responsible for VM creation on the provider
CreateMachine(context.Context, *CreateMachineRequest) (*CreateMachineResponse, error)
// InitializeMachine call is responsible for VM initialization on the provider.
// This method should only be invoked as a post VM creation initialization to configure network configuration etc.
//
// In case of an error, this operation should return an error with one of the following status codes
// - codes.Unimplemented if the provider does not support VM instance initialization.
// - codes.Uninitialized initialization of VM instance failed due to errors
// - codes.NotFound if VM instance was not found.
InitializeMachine(context.Context, *InitializeMachineRequest) (*InitializeMachineResponse, error)
// DeleteMachine call is responsible for VM deletion/termination on the provider
DeleteMachine(context.Context, *DeleteMachineRequest) (*DeleteMachineResponse, error)
// GetMachineStatus call get's the status of the VM backing the machine object on the provider
Expand Down Expand Up @@ -53,6 +62,29 @@ type CreateMachineResponse struct {
LastKnownState string
}

// InitializeMachineRequest encapsulates params for the VM Initialization operation (Driver.InitializeMachine).
type InitializeMachineRequest struct {
// Machine object representing VM that must be initialized
Machine *v1alpha1.Machine

// MachineClass backing the machine object
MachineClass *v1alpha1.MachineClass

// Secret backing the machineClass object
Secret *corev1.Secret
}

// InitializeMachineResponse is the response for VM instance initialization (Driver.InitializeMachine).
type InitializeMachineResponse struct {
// ProviderID is the unique identification of the VM at the cloud provider.
// ProviderID typically matches with the node.Spec.ProviderID on the node object.
// Eg: gce://project-name/region/vm-ID
ProviderID string

// NodeName is the name of the node-object registered to kubernetes.
NodeName string
}

// DeleteMachineRequest is the delete request for VM deletion
type DeleteMachineRequest struct {
// Machine object from whom VM is to be deleted
Expand Down
22 changes: 21 additions & 1 deletion pkg/util/provider/driver/fake.go
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,27 @@ func (d *FakeDriver) CreateMachine(ctx context.Context, createMachineRequest *Cr
return nil, d.Err
}

// InitializeMachine makes a call to the driver to initialize the VM instance of machine.
func (d *FakeDriver) InitializeMachine(ctx context.Context, initMachineRequest *InitializeMachineRequest) (*InitializeMachineResponse, error) {
sErr, ok := status.FromError(d.Err)
if ok && sErr != nil {
switch sErr.Code() {
case codes.NotFound:
d.VMExists = false
return nil, d.Err
case codes.Unimplemented:
break
default:
return nil, d.Err
}
}
d.VMExists = true
return &InitializeMachineResponse{
ProviderID: d.ProviderID,
NodeName: d.NodeName,
}, d.Err
}

// DeleteMachine make a call to the driver to delete the machine.
func (d *FakeDriver) DeleteMachine(ctx context.Context, deleteMachineRequest *DeleteMachineRequest) (*DeleteMachineResponse, error) {
d.VMExists = false
Expand All @@ -81,7 +102,6 @@ func (d *FakeDriver) GetMachineStatus(ctx context.Context, getMachineStatusReque
case d.Err != nil:
return nil, d.Err
}

return &GetMachineStatusResponse{
ProviderID: d.ProviderID,
NodeName: d.NodeName,
Expand Down
2 changes: 2 additions & 0 deletions pkg/util/provider/machinecodes/codes/code_string.go
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,8 @@ func (c Code) String() string {
return "DataLoss"
case Unauthenticated:
return "Unauthenticated"
case Uninitialized:
return "Uninitialized"
default:
return "Code(" + strconv.FormatInt(int64(c), 10) + ")"
}
Expand Down
6 changes: 6 additions & 0 deletions pkg/util/provider/machinecodes/codes/codes.go
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,11 @@ const (
// Unauthenticated indicates the request does not have valid
// authentication credentials for the operation.
Unauthenticated Code = 16

// Uninitialized indicates that the VM instance initialization was not performed.
// This is meant to be used by providers in implementation of
// [github.com/gardener/machine-controller-manager/pkg/util/provider/driver.Driver.GetMachineStatus]
Uninitialized Code = 17
)

var strToCode = map[string]Code{
Expand All @@ -164,6 +169,7 @@ var strToCode = map[string]Code{
"Unavailable": Unavailable,
"DataLoss": DataLoss,
"Unauthenticated": Unauthenticated,
"Uninitialized": Uninitialized,
}

// StringToCode coverts string into the Code.
Expand Down
71 changes: 68 additions & 3 deletions pkg/util/provider/machinecontroller/machine.go
Original file line number Diff line number Diff line change
Expand Up @@ -349,8 +349,9 @@ func (c *controller) triggerCreationFlow(ctx context.Context, createMachineReque
nodeName, providerID string

// Initializations
machine = createMachineRequest.Machine
machineName = createMachineRequest.Machine.Name
machine = createMachineRequest.Machine
machineName = createMachineRequest.Machine.Name
uninitializedMachine = false
)

// we should avoid mutating Secret, since it goes all the way into the Informer's store
Expand Down Expand Up @@ -379,6 +380,7 @@ func (c *controller) triggerCreationFlow(ctx context.Context, createMachineReque
klog.Errorf("Error occurred while decoding machine error for machine %q: %s", machine.Name, err)
return machineutils.MediumRetry, err
}
klog.Warningf("For machine %q, obtained VM error status as: %s", machineErr)

// Decoding machine error code
switch machineErr.Code() {
Expand All @@ -392,7 +394,7 @@ func (c *controller) triggerCreationFlow(ctx context.Context, createMachineReque
klog.V(2).Infof("The machine creation is triggered with timeout of %s", c.getEffectiveCreationTimeout(createMachineRequest.Machine).Duration)
createMachineResponse, err := c.driver.CreateMachine(ctx, createMachineRequest)
if err != nil {
// Create call returned an error.
// Create call returned an error
klog.Errorf("Error while creating machine %s: %s", machine.Name, err.Error())
return c.machineCreateErrorHandler(ctx, machine, createMachineResponse, err)
}
Expand Down Expand Up @@ -456,6 +458,7 @@ func (c *controller) triggerCreationFlow(ctx context.Context, createMachineReque
klog.V(2).Infof("Machine %q marked Failed as VM was referring to a stale node object", machine.Name)
return machineutils.ShortRetry, err
}
uninitializedMachine = true
} else {
// if node label present that means there must be a backing VM ,without need of GetMachineStatus() call
nodeName = machine.Labels[v1alpha1.NodeLabelKey]
Expand Down Expand Up @@ -485,6 +488,10 @@ func (c *controller) triggerCreationFlow(ctx context.Context, createMachineReque

return machineutils.ShortRetry, err

case codes.Uninitialized:
uninitializedMachine = true
klog.Infof("VM instance associated with machine %s was created but not initialized.", machine.Name)

default:
updateRetryPeriod, updateErr := c.machineStatusUpdate(
ctx,
Expand Down Expand Up @@ -515,6 +522,13 @@ func (c *controller) triggerCreationFlow(ctx context.Context, createMachineReque
providerID = getMachineStatusResponse.ProviderID
}

if uninitializedMachine {
retryPeriod, err := c.initializeMachine(ctx, createMachineRequest.Machine, createMachineRequest.MachineClass, createMachineRequest.Secret)
if err != nil {
return retryPeriod, err
}
}

_, machineNodeLabelPresent := createMachineRequest.Machine.Labels[v1alpha1.NodeLabelKey]
_, machinePriorityAnnotationPresent := createMachineRequest.Machine.Annotations[machineutils.MachinePriority]

Expand Down Expand Up @@ -574,6 +588,57 @@ func (c *controller) triggerCreationFlow(ctx context.Context, createMachineReque
return machineutils.LongRetry, nil
}

func (c *controller) initializeMachine(ctx context.Context, machine *v1alpha1.Machine, machineClass *v1alpha1.MachineClass, secret *corev1.Secret) (machineutils.RetryPeriod, error) {
req := &driver.InitializeMachineRequest{
Machine: machine,
MachineClass: machineClass,
Secret: secret,
}
klog.V(3).Infof("Initializing VM instance for Machine %q", machine.Name)
resp, err := c.driver.InitializeMachine(ctx, req)
if err != nil {
errStatus, ok := status.FromError(err)
if !ok {
klog.Errorf("Cannot decode Driver error for machine %q: %s. Unexpected behaviour as Driver errors are expected to be of type status.Status", machine.Name, err)
return machineutils.LongRetry, err
}
klog.Errorf("Error occurred while initializing VM instance for machine %q: %s", machine.Name, err)
switch errStatus.Code() {
case codes.Unimplemented:
klog.V(3).Infof("Provider does not support Driver.InitializeMachine - skipping VM instance initialization for %q.", machine.Name)
return 0, nil
case codes.NotFound:
klog.Warningf("No VM instance found for machine %q. Skipping VM instance initialization.", machine.Name)
return 0, nil
}
updateRetryPeriod, updateErr := c.machineStatusUpdate(
ctx,
machine,
v1alpha1.LastOperation{
Description: fmt.Sprintf("Provider error: %s. %s", err.Error(), machineutils.InstanceInitialization),
ErrorCode: errStatus.Code().String(),
State: v1alpha1.MachineStateFailed,
Type: v1alpha1.MachineOperationCreate,
LastUpdateTime: metav1.Now(),
},
v1alpha1.CurrentStatus{
Phase: c.getCreateFailurePhase(machine),
LastUpdateTime: metav1.Now(),
},
machine.Status.LastKnownState,
)

if updateErr != nil {
return updateRetryPeriod, updateErr
}

return machineutils.ShortRetry, err
}

klog.V(3).Infof("VM instance %q for machine %q was initialized", resp.ProviderID, machine.Name)
return 0, nil
}

func (c *controller) triggerDeletionFlow(ctx context.Context, deleteMachineRequest *driver.DeleteMachineRequest) (machineutils.RetryPeriod, error) {
var (
machine = deleteMachineRequest.Machine
Expand Down

0 comments on commit 0e30203

Please sign in to comment.