Skip to content
This repository has been archived by the owner on Jul 25, 2020. It is now read-only.

Fix/azure resource removal #1240

Closed
wants to merge 2 commits into from

Conversation

nakomis
Copy link
Contributor

@nakomis nakomis commented Sep 12, 2018

Fixes an issue where Azure ARM resource groups are not deleted when a node is destroyed. This fix resolves the following Jiras

https://issues.apache.org/jira/browse/JCLOUDS-1330
https://issues.apache.org/jira/browse/JCLOUDS-1331

Copy link
Member

@nacx nacx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @nakomis! And many thanks for taking care of adding new tests. It's highly appreciated!

@@ -98,7 +109,8 @@ protected VirtualMachineInStatePredicateFactory provideVirtualMachineRunningPred
@Named(TIMEOUT_RESOURCE_DELETED)
protected Predicate<URI> provideResourceDeletedPredicate(final AzureComputeApi api,
final ComputeServiceConstants.Timeouts timeouts, final PollPeriod pollPeriod) {
return retry(new ActionDonePredicate(api), timeouts.nodeTerminated, pollPeriod.pollInitialPeriod,
long timeout = timeouts.nodeTerminated;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undo this tiny change.

}
deleted &= ipDeleted;
} catch (Exception ex) {
logger.warn(">> Error deleting ip %s", ip);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark deleted = false too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: log the exception?

@@ -110,6 +122,27 @@ protected VirtualMachineInStatePredicateFactory provideNodeSuspendedPredicate(fi
timeouts.nodeTerminated, pollPeriod.pollInitialPeriod, pollPeriod.pollMaxPeriod);
}

@Provides
@Named(TIMEOUT_RESOURCE_REMOVED)
protected Predicate<IdReference> provideResourceRemovedPredicate(final AzureComputeApi api, final ComputeServiceConstants.Timeouts timeouts,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this method is quite confusing. I'd suggest renaming to InResourceGroup or similar (even if you have to change its semantics and return presence instead of absence).

logger.warn(">> disks not deleted: %s", Joiner.on(',').join(nonDeleted));
}

Set<IdReference> disksNotInRg = Sets.filter(deleteDisks, resourceRemoved);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This predicate can be expensive and is just used to print a log. Put these lines inside a log.IsWarnEnabed guard to save these calls if logging is disabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nacx The call to the predicate causes the method to block until the resource has been removed from the resource group (in addition to being deleted), so it is always necessary to call it

@@ -171,9 +216,12 @@ public boolean cleanupSecurityGroupIfOrphaned(String resourceGroup, String group
logger.debug(">> deleting orphaned security group %s from %s...", name, resourceGroup);
try {
deleted = resourceDeleted.apply(sgapi.delete(name));
resourceRemoved.apply(IdReference.create(securityGroup.id()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to wait for this? Getting all resources in an RG looks like a expensive call? Is there any better call we could make instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, yes, we do need to wait. The issue that this PR is addressing is that even through a resource is deleted and (in this case) the NetworkSecuityGroup API is showing that group has been deleted, it is still showing up in the ResourceGroup. This appears to be an "eventual consistency" issue, so we need to wait until the resource no longer shows as a member of the resource group

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly I can confirm this is the case. Even Azure Portal seems to have this type of eventual inconsistencies causing random failures

Copy link
Member

@nacx nacx Nov 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, please add a comment in places where we are using this, to make explicit this is done to deal with eventual consistency issues. Otherwise, we may be tempted to accidentally change this in the future.

} catch (IllegalArgumentException e) {
if (e.getMessage().contains("InUseSubnetCannotBeDeleted")) {
// Can be ignored as virtualNetwork is in use
logger.debug("Cannot delete virtual network %s as it is in use", virtualNetwork);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better change to warn.

return resourceDeleted.apply(api.getVirtualMachineApi(group).delete(virtualMachine.name()));
URI uri = api.getVirtualMachineApi(group).delete(virtualMachine.name());
boolean deleted = uri == null || resourceDeleted.apply(uri);
resourceRemoved.apply(IdReference.create(virtualMachine.id()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Prefer the VM api get method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above about NetworkSecurityGroups / ResourceGroups

public void testResourceGroupDeletedOnDestroy() throws Exception {
template = buildTemplate(templateBuilder());
nodes = newTreeSet(client.createNodesInGroup(group, 1, template));
NodeMetadata node = nodes.first();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: NodeMetadata node = Iterables.getOnlyElement(client.createNodesInGroup(group, 1, template));

client.destroyNode(node.getId());
List<Resource> resources = view.unwrapApi(AzureComputeApi.class).getResourceGroupApi().resources(resourceGroupName);
if (!resources.isEmpty()) {
Assert.fail("Resource group was not empty, contained " + resources);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: static import


// Verify deleted resource group
verify(api, diskApi);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you're adding this (thanks!) add a method to verify that the virtual networks are deleted too.

Also add a test to verify that the resources that don't have the jclouds tags (availability sets, networks, etc), are not deleted.

@demobox
Copy link
Member

demobox commented Sep 19, 2018

Thanks, @nakomis! And many thanks for taking care of adding new tests. It's highly appreciated!

+1 to that - many thanks!

Optional<org.jclouds.azurecompute.arm.domain.Resource> resourceInGroup = Iterables.tryFind(attachedResources, new Predicate<org.jclouds.azurecompute.arm.domain.Resource>() {
@Override
public boolean apply(org.jclouds.azurecompute.arm.domain.Resource resource) {
return resource.id().equalsIgnoreCase(input.id());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about this - in which cases do we expect IDs to have different cases? Is there some documentation we could like to in a comment here to explain why we need a case-insensitive comparison here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should expect different cases.... everywhere :( Azure ARM is case insensitive and our provider should adapt to this.
There are currently lots of places where this could fail since we use equals or Map keys.
I had this trouble here: https://stackoverflow.com/questions/50068324/azure-arm-api-returns-locations-with-inconsistent-case/50977265#50977265 and it seems to happen to others too https://stackoverflow.com/questions/48561304/resource-group-name-case-insensitive-in-disk-ids
So in short, yes, we should always use equalsIgnoreCase for safety

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in short, yes, we should always use equalsIgnoreCase for safety

Thanks for explaining, @danielestevez! In that case, would it make sense to add a test for that as part of this PR?

resourceRemoved.apply(IdReference.create(virtualNetwork.id()));
} catch (IllegalArgumentException e) {
if (e.getMessage().contains("InUseSubnetCannotBeDeleted")) {
// Can be ignored as virtualNetwork is in use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about this comment - does it mean we can or we have to try not to delete it? Given that the intent here is to delete, as Ignasi points out, should the log message be a warn instead?

If so, we can probably remove the comment, as the log message says pretty much the same thing

URI uri = api.getResourceGroupApi().delete(group);
deleted = uri == null || resourceDeleted.apply(uri);
} else {
logger.debug(">> the resource group %s contains %s. Not deleting...", group, attachedResources);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment as above: should this be logger.warn?

import com.google.common.collect.Lists;

@Test(groups = "unit", testName = "CleanupResourcesTest")
public class CleanupResourcesTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the case-insensitive resource ID comparison logic is important, do we need a test for that?

@nakomis
Copy link
Contributor Author

nakomis commented Nov 21, 2018

Hi, thanks for the feedback. I've addressed a couple of the comments above, and pushed a couple of changes to address the others

@nacx
Copy link
Member

nacx commented Nov 23, 2018

Hi, thanks for the feedback. I've addressed a couple of the comments above, and pushed a couple of changes to address the others

Looks like the changes are note pushed?

@gaul
Copy link
Member

gaul commented Jan 28, 2019

@nakomis does this pull request need anything to move forward?

@gaul
Copy link
Member

gaul commented Jul 25, 2020

Please reopen against apache/jclouds if this is still relevant.

@gaul gaul closed this Jul 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
5 participants