Skip to content

Uplift azure#72

Merged
luthermonson merged 2 commits intorancher:masterfrom
aiyengar2:uplift_azure
Apr 17, 2020
Merged

Uplift azure#72
luthermonson merged 2 commits intorancher:masterfrom
aiyengar2:uplift_azure

Conversation

@aiyengar2
Copy link
Copy Markdown

@aiyengar2 aiyengar2 commented Mar 4, 2020

This PR updates github.com/Azure/azure-sdk-for-go to 39.1.0 and github.com/Azure/go-autorest to 13.3.3.

Major updates:

  • Redefines AuthenticateDeviceFlow and AuthenticateServicePrincipal (now renamed AuthenticateClientCredentials) as functions that return Authorizer objects after running ValidateAuthorizer instead of tokens
  • Simplifies / eliminates several helper functions to use newer autorest libraries for Azure authentication
  • Using contexts to make calls via the Azure client and evaluating the status of async requests using Azure futures (which support WaitForCompletionRef and future.Result), which removes redundant Get calls
  • Node driver now implements Kill() as Azure now supports a force kill of an instance
  • Remove isSwarm logic as docker swarm is no longer supported

Misc. Updates:

  • rename Azure Driver.ctx to Driver.deploymentContext to avoid ambiguity with Go contexts
  • Registered Microsoft.Subscription and Microsoft.Resources as resource providers
  • Added some more necessary client logic to clients.go for future commits
  • moved some Azure fmts to naming.go to centralize naming logic in one file
  • Added more structs for different types of Azure resources to cleanup.go to create one common way of cleaning up resources instead of having redundant code
  • Changes supportedEnvironments from a map to a list and adds azure.EnvironmentFromName to get the corresponding environment instead.
  • Renamed HasAttachedResources to CanBeDeleted and flipped boolean logic accordingly

Related Issue: rancher/rancher#25342

@aiyengar2 aiyengar2 force-pushed the uplift_azure branch 4 times, most recently from 4687480 to eee5869 Compare March 4, 2020 19:55
@aiyengar2 aiyengar2 requested a review from a team March 4, 2020 19:59
@aiyengar2
Copy link
Copy Markdown
Author

aiyengar2 commented Mar 4, 2020

Apologize for the large PR; I split this change into 4 separate commits with verbose commit messages to make it easier to read, but a broad majority of the file changes are due to changes in vendor/ from dependency upgrades

Comment thread drivers/azure/azureutil/customimage.go Outdated
ltrimmed[i] = prefixRemoved[1]
} else if i < numExpectedRequiredValues {
// failed to parse required values
log.Info("image parsed", l)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably dont need

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed log in latest commit

@aiyengar2
Copy link
Copy Markdown
Author

Squashed 4 commits to 2 which separate vendor changes from code changes

Copy link
Copy Markdown

@superseb superseb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does creating a node with the current driver have the same outcome as creating/replacing a node with this driver? Asking for existing clusters and adding a node after upgrading to a version with this driver for instance.

@aiyengar2
Copy link
Copy Markdown
Author

@superseb yes with the following exceptions:

  • Calling Kill() in the previous version of the driver would log Azure does not implement kill. Calling Stop instead. and proceed to stop a virtual machine via an Azure VM PowerOff API call with no flag for a force stop. In the new driver, Kill() directly calls the Azure VM PowerOff API and passes in a flag for a force stop to Azure.
  • The old driver still responds to opening up Docker Swarm ports based on d.BaseDriver.SwarmMaster here, which the new driver does not support:
    // Open swarm port if configured
    if d.BaseDriver.SwarmMaster {
    swarmHost := d.BaseDriver.SwarmHost
    log.Debugf("Swarm host is configured as %q", swarmHost)
    u, err := url.Parse(swarmHost)
    if err != nil {
    return nil, fmt.Errorf("Cannot parse URL %q: %v", swarmHost, err)
    }
    _, swarmPort, err := net.SplitHostPort(u.Host)
    if err != nil {
    return nil, fmt.Errorf("Could not parse swarm port in %q: %v", u.Host, err)
    }
    rl = append(rl, mkRule(500, "DockerSwarmAllowAny", "Allow swarm manager access (TLS-protected)", "*", swarmPort, network.TCP))
    } else {
    log.Debug("Swarm host is not configured.")
    }
  • If a user to submitted an invalid FaultDomainCount value (i.e. the number of availability zones that you want fault domains in for your VM) or an OS disk name that references an object that exists in another region, the old driver would fail on launch; the new driver fails on PreCreate.
  • The old driver failed if a user supplied an image name that was not in the format publisher:offer:sku:version; the new driver allows both the original format as well as a custom image format (i.e. any string that contains an ARM resource identifier for the object)

Other than the above exceptions, all changes are just updating the underlying Azure APIs, so the behavior for creating a node with the current driver should have the same outcome as creating/replacing a node with this driver.

Comment thread drivers/azure/azureutil/customimage.go Outdated
} else if i < numExpectedRequiredValues {
// failed to parse required values
if i == 3 {
log.Warn(versionNotSuppliedWarnString)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is required when using gallery, can we make it part of the error? The error should contain enough to root cause the issue, and while the generic error probably covers it, we do know what the error is but now its only logged in a different level and on a separate line. If it's not required/breaking then this can be the additional warning

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Should be fixed in the latest commit

Comment thread drivers/azure/azure.go Outdated
if resp.Err != nil {
return err
} else if d.FaultCount > resp.MaximumFaultDomainCount {
return fmt.Errorf("location %s has MaximumPlatformFaultDomainCount=%d, recieved FaultDomainCount=%d",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo received

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! Fixed in the latest commit.

@superseb
Copy link
Copy Markdown

superseb commented Apr 9, 2020

Can we double check there are no fixes we want in later releases of vendored libraries? And can we also check the API versions used are generally available

@aiyengar2
Copy link
Copy Markdown
Author

FYI in my latest commit, I ported a couple of changes / fixes relevant to this PR from #77 since it seems like they make more sense as part of this PR (i.e. uplifting Azure) rather than the NSG PR:

  • Added arm.go: this file will be used to centralize all ARM resource identifier logic as the same logic is used for NSGs in the commit mentioned above as well as for custom images in this commit.
  • Modified customimages.go: now uses the logic within arm.go
  • Modified cleanup.go: Renamed HasAttachedResources to CanBeDeleted and flipped boolean logic accordingly in all cleanup functions
  • Modified azureutil.go: misc. bugs / rewords in RegisterResourceProviders + OSDiskExists + virtualNetworkExists + CreateVirtualMachine + CreateAvailabilitySetIfNotExists
  • Modified sku.go: misc. bug in GetMaximumFaultDomainCount

@aiyengar2
Copy link
Copy Markdown
Author

Can we double check there are no fixes we want in later releases of vendored libraries? And can we also check the API versions used are generally available

Good call! As I was double checking I discovered that github.com/Azure/azure-sdk-for-go/storage will be deprecated soon and realized that we could just use the other storage API now to make all the relevant calls. I've updated the PR accordingly in the latest commit.

Also I just checked / updated the following vendored libraries:

  1. github.com/Azure/azure-sdk-for-go/services/compute/mgmt/2019-12-01/compute
  2. github.com/Azure/azure-sdk-for-go/services/network/mgmt/2019-12-01/network
  3. github.com/Azure/azure-sdk-for-go/services/resources/mgmt/2019-05-01/resources
  4. github.com/Azure/azure-sdk-for-go/services/resources/mgmt/2019-11-01/subscriptions
  5. github.com/Azure/azure-sdk-for-go/services/storage/mgmt/2019-06-01/storage

Afaik there are no more pending fixes that we require from these libraries for machine as they should be in their latest versions. Azure puts any service libraries that are on public preview under github.com/Azure/azure-sdk-for-go/services/preview/<pkg>, so the versioned APIs should also be GA.

@aiyengar2
Copy link
Copy Markdown
Author

This PR will resolve the following issues:

@superseb superseb self-requested a review April 14, 2020 20:21
superseb
superseb previously approved these changes Apr 14, 2020
Comment thread drivers/azure/azureutil/timeouts.go Outdated
@@ -0,0 +1,25 @@
package azureutil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeouts should be moved to where they are needed, one is even exported as it's used in a different package. It also looks like some of these are not used. If we don't need them they can be deleted.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And can these be set somehow? If they can't be they should be changed to a const.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes sense. Moving them to their respective files in the latest commit.

They shouldn't be set so I'll change them to a const too.

Comment thread drivers/azure/azure.go Outdated
log.Warn("Timed out trying to parse the MaximumFaultDomainCount. Continuing execution...")
case resp := <-r:
if resp.Err != nil {
return err
Copy link
Copy Markdown

@luthermonson luthermonson Apr 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldnt this return resp.Err? err at this point is going to be from the new client call above.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good catch. fixed in the latest commit!

@luthermonson luthermonson self-requested a review April 14, 2020 23:02
@aiyengar2 aiyengar2 dismissed stale reviews from superseb and luthermonson via b315426 April 15, 2020 17:42
@aiyengar2 aiyengar2 requested a review from dramich April 15, 2020 17:43
@aiyengar2 aiyengar2 force-pushed the uplift_azure branch 3 times, most recently from 8020420 to 8acc0d5 Compare April 15, 2020 22:03
luthermonson
luthermonson previously approved these changes Apr 16, 2020
@superseb superseb self-requested a review April 17, 2020 04:34
superseb
superseb previously approved these changes Apr 17, 2020
@luthermonson luthermonson removed the request for review from dramich April 17, 2020 05:17
This commit updates github.com/Azure/azure-sdk-for-go to 39.1.0 and github.com/Azure/go-autorest to 13.3.3.

Major updates:
- Redefines AuthenticateDeviceFlow and AuthenticateServicePrincipal (now renamed AuthenticateClientCredentials) as functions that return Authorizer objects after running ValidateAuthorizer instead of tokens
- Simplifies / eliminates several helper functions to use newer autorest libraries for Azure authentication
- Introduced arm.go + customimage.go to create a unified way of referring to resources that
are parsed from ARM resource identifiers
- Using contexts to make calls via the Azure client and evaluating the status of async requests using Azure futures (which support WaitForCompletionRef and future.Result), which removes redundant Get calls
- Node driver now implements Kill() as Azure now supports a force kill of an instance
- Remove isSwarm logic as docker swarm is no longer supported

Misc. Updates:
- rename Azure Driver.ctx to Driver.deploymentContext to avoid ambiguity with Go contexts
- Registered Microsoft.Subscription and Microsoft.Resources as resource providers
- Added some more necessary client logic to clients.go for future commits
- moved some Azure fmts to naming.go to centralize naming logic in one file
- Added more structs for different types of Azure resources to cleanup.go to create one common way of cleaning up resources instead of having redundant code
- Changes supportedEnvironments from a map to a list and adds azure.EnvironmentFromName to get the corresponding environment instead.
- Renamed HasAttachedResources to CanBeDeleted and flipped boolean logic accordingly
@luthermonson luthermonson merged commit c963cca into rancher:master Apr 17, 2020
aiyengar2 pushed a commit to aiyengar2/machine that referenced this pull request May 28, 2020
Reverts a change added as part of rancher#72 to fail on PreCreateCheck if the OS disk already exists.

Without this check, provisioning will either still fail on create if a machine tries to attach itself to a managed disk whose settings differ from those listed in `getOSDisk` here: https://github.com/rancher/machine/blob/2fbd7ec65dea41983a8222141df4ce58df7eaeac/drivers/azure/azureutil/azureutil.go#L633-L656. However, currently this also prevents a VM from using a previously created OS Disk (i.e. one that was left over from a previously failed provisioning process if it is a managed disk) that is configured correctly.
aiyengar2 pushed a commit to aiyengar2/machine that referenced this pull request May 28, 2020
Reverts a change added as part of rancher#72 to fail on PreCreateCheck if the OS disk already exists.

Without this check, provisioning will either still fail on create if a machine tries to attach itself to a managed disk.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants