Conversation
…tempting to delete group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @nakomis! And many thanks for taking care of adding new tests. It's highly appreciated!
@@ -98,7 +109,8 @@ protected VirtualMachineInStatePredicateFactory provideVirtualMachineRunningPred | |||
@Named(TIMEOUT_RESOURCE_DELETED) | |||
protected Predicate<URI> provideResourceDeletedPredicate(final AzureComputeApi api, | |||
final ComputeServiceConstants.Timeouts timeouts, final PollPeriod pollPeriod) { | |||
return retry(new ActionDonePredicate(api), timeouts.nodeTerminated, pollPeriod.pollInitialPeriod, | |||
long timeout = timeouts.nodeTerminated; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undo this tiny change.
} | ||
deleted &= ipDeleted; | ||
} catch (Exception ex) { | ||
logger.warn(">> Error deleting ip %s", ip); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mark deleted = false
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: log the exception?
@@ -110,6 +122,27 @@ protected VirtualMachineInStatePredicateFactory provideNodeSuspendedPredicate(fi | |||
timeouts.nodeTerminated, pollPeriod.pollInitialPeriod, pollPeriod.pollMaxPeriod); | |||
} | |||
|
|||
@Provides | |||
@Named(TIMEOUT_RESOURCE_REMOVED) | |||
protected Predicate<IdReference> provideResourceRemovedPredicate(final AzureComputeApi api, final ComputeServiceConstants.Timeouts timeouts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this method is quite confusing. I'd suggest renaming to InResourceGroup
or similar (even if you have to change its semantics and return presence instead of absence).
logger.warn(">> disks not deleted: %s", Joiner.on(',').join(nonDeleted)); | ||
} | ||
|
||
Set<IdReference> disksNotInRg = Sets.filter(deleteDisks, resourceRemoved); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This predicate can be expensive and is just used to print a log. Put these lines inside a log.IsWarnEnabed
guard to save these calls if logging is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @nacx The call to the predicate causes the method to block until the resource has been removed from the resource group (in addition to being deleted), so it is always necessary to call it
@@ -171,9 +216,12 @@ public boolean cleanupSecurityGroupIfOrphaned(String resourceGroup, String group | |||
logger.debug(">> deleting orphaned security group %s from %s...", name, resourceGroup); | |||
try { | |||
deleted = resourceDeleted.apply(sgapi.delete(name)); | |||
resourceRemoved.apply(IdReference.create(securityGroup.id())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to wait for this? Getting all resources in an RG looks like a expensive call? Is there any better call we could make instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, yes, we do need to wait. The issue that this PR is addressing is that even through a resource is deleted and (in this case) the NetworkSecuityGroup API is showing that group has been deleted, it is still showing up in the ResourceGroup. This appears to be an "eventual consistency" issue, so we need to wait until the resource no longer shows as a member of the resource group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly I can confirm this is the case. Even Azure Portal seems to have this type of eventual inconsistencies causing random failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, please add a comment in places where we are using this, to make explicit this is done to deal with eventual consistency issues. Otherwise, we may be tempted to accidentally change this in the future.
} catch (IllegalArgumentException e) { | ||
if (e.getMessage().contains("InUseSubnetCannotBeDeleted")) { | ||
// Can be ignored as virtualNetwork is in use | ||
logger.debug("Cannot delete virtual network %s as it is in use", virtualNetwork); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better change to warn
.
return resourceDeleted.apply(api.getVirtualMachineApi(group).delete(virtualMachine.name())); | ||
URI uri = api.getVirtualMachineApi(group).delete(virtualMachine.name()); | ||
boolean deleted = uri == null || resourceDeleted.apply(uri); | ||
resourceRemoved.apply(IdReference.create(virtualMachine.id())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Prefer the VM api get method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above about NetworkSecurityGroups / ResourceGroups
public void testResourceGroupDeletedOnDestroy() throws Exception { | ||
template = buildTemplate(templateBuilder()); | ||
nodes = newTreeSet(client.createNodesInGroup(group, 1, template)); | ||
NodeMetadata node = nodes.first(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: NodeMetadata node = Iterables.getOnlyElement(client.createNodesInGroup(group, 1, template));
client.destroyNode(node.getId()); | ||
List<Resource> resources = view.unwrapApi(AzureComputeApi.class).getResourceGroupApi().resources(resourceGroupName); | ||
if (!resources.isEmpty()) { | ||
Assert.fail("Resource group was not empty, contained " + resources); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: static import
|
||
// Verify deleted resource group | ||
verify(api, diskApi); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that you're adding this (thanks!) add a method to verify that the virtual networks are deleted too.
Also add a test to verify that the resources that don't have the jclouds tags (availability sets, networks, etc), are not deleted.
+1 to that - many thanks! |
Optional<org.jclouds.azurecompute.arm.domain.Resource> resourceInGroup = Iterables.tryFind(attachedResources, new Predicate<org.jclouds.azurecompute.arm.domain.Resource>() { | ||
@Override | ||
public boolean apply(org.jclouds.azurecompute.arm.domain.Resource resource) { | ||
return resource.id().equalsIgnoreCase(input.id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious about this - in which cases do we expect IDs to have different cases? Is there some documentation we could like to in a comment here to explain why we need a case-insensitive comparison here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should expect different cases.... everywhere :( Azure ARM is case insensitive and our provider should adapt to this.
There are currently lots of places where this could fail since we use equals or Map keys.
I had this trouble here: https://stackoverflow.com/questions/50068324/azure-arm-api-returns-locations-with-inconsistent-case/50977265#50977265 and it seems to happen to others too https://stackoverflow.com/questions/48561304/resource-group-name-case-insensitive-in-disk-ids
So in short, yes, we should always use equalsIgnoreCase for safety
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in short, yes, we should always use equalsIgnoreCase for safety
Thanks for explaining, @danielestevez! In that case, would it make sense to add a test for that as part of this PR?
resourceRemoved.apply(IdReference.create(virtualNetwork.id())); | ||
} catch (IllegalArgumentException e) { | ||
if (e.getMessage().contains("InUseSubnetCannotBeDeleted")) { | ||
// Can be ignored as virtualNetwork is in use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious about this comment - does it mean we can or we have to try not to delete it? Given that the intent here is to delete, as Ignasi points out, should the log message be a warn
instead?
If so, we can probably remove the comment, as the log message says pretty much the same thing
URI uri = api.getResourceGroupApi().delete(group); | ||
deleted = uri == null || resourceDeleted.apply(uri); | ||
} else { | ||
logger.debug(">> the resource group %s contains %s. Not deleting...", group, attachedResources); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment as above: should this be logger.warn
?
import com.google.common.collect.Lists; | ||
|
||
@Test(groups = "unit", testName = "CleanupResourcesTest") | ||
public class CleanupResourcesTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the case-insensitive resource ID comparison logic is important, do we need a test for that?
Hi, thanks for the feedback. I've addressed a couple of the comments above, and pushed a couple of changes to address the others |
Looks like the changes are note pushed? |
@nakomis does this pull request need anything to move forward? |
Please reopen against apache/jclouds if this is still relevant. |
Fixes an issue where Azure ARM resource groups are not deleted when a node is destroyed. This fix resolves the following Jiras
https://issues.apache.org/jira/browse/JCLOUDS-1330
https://issues.apache.org/jira/browse/JCLOUDS-1331