-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Empty preview step on tests is proposing changes that are not expected #199
Comments
Reopening. |
Another run of CI hit this same issue again in test Generally speaking, we're observing that the Note: The change in |
Full output logs with Note the changes in
Subnets Definition & Use:Note that The Diff is proposing dropping the last 2 elements in the array of 4 when it is rendered as a string for the CFTemplateBody. pulumi-eks/nodejs/eks/examples/tests/migrate-nodegroups/index.ts Lines 12 to 38 in 651c61c
Computing subnets & Using in CFTemplateBody:pulumi-eks/nodejs/eks/nodegroup.ts Lines 451 to 502 in 651c61c
|
async function computeWorkerSubnets(parent: pulumi.Resource, subnetIds: string[]): Promise<string[]> { | |
const publicSubnets: string[] = []; | |
const privateSubnets: string[] = []; | |
for (const subnetId of subnetIds) { | |
// Fetch the route table for this subnet. | |
const routeTable = await (async () => { | |
try { | |
// Attempt to get the explicit route table for this subnet. If there is no explicit rouute table for | |
// this subnet, this call will throw. | |
return await aws.ec2.getRouteTable({ subnetId: subnetId }, { parent: parent }); | |
} catch { | |
// If we reach this point, the subnet may not have an explicitly associated route table. In this case | |
// the subnet is associated with its VPC's main route table (see | |
// https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html#RouteTables for details). | |
const subnet = await aws.ec2.getSubnet({ id: subnetId }, { parent: parent }); | |
const mainRouteTableInfo = await aws.ec2.getRouteTables({ | |
vpcId: subnet.vpcId, | |
filters: [{ | |
name: "association.main", | |
values: ["true"], | |
}], | |
}, { parent: parent }); | |
return await aws.ec2.getRouteTable({ routeTableId: mainRouteTableInfo.ids[0] }, { parent: parent }); | |
} | |
})(); | |
// Once we have the route table, check its list of routes for a route to an internet gateway. | |
const hasInternetGatewayRoute = routeTable.routes | |
.find(r => !!r.gatewayId && !isPrivateCIDRBlock(r.cidrBlock)) !== undefined; | |
if (hasInternetGatewayRoute) { | |
publicSubnets.push(subnetId); | |
} else { | |
privateSubnets.push(subnetId); | |
} | |
} | |
return privateSubnets.length === 0 ? publicSubnets : privateSubnets; | |
} |
/cc @pgavlin @lukehoban thoughts?
@metral Have you been able to reproduce this locally outside of the test harness by following similar steps? This would be easier to investigate with a repro. I tried to take the test case in question I actually can't tell from the notes above whether this is the same steps that you believe are triggering this in CI, or whether there is a more involved set of steps that are leading to this. I'm also generally confused by many of the notes above. The test explicitly creates a VPC with only 3 public subnets and then deploys the cluster using 2 of the 3. I have no idea where the snippet you show above is ever coming up with 4 subnets - that doesn't even seem physically possible using the example in question here. |
I do not follow this - especially since (a) this example uses |
Oh - I guess I was confused as the examples in this issue shift between two different test cases. I think the references to 4 subnets are all related to |
Sorry for the lack of clarity. The 4 subnets was in ref to the
Also intended for the The repro is just running both I'm running a series of test on the most recent v0.18.10 master to see if this is even possible since I have not seen the error today, and will update with more clarity if necessary. |
Got it. I was unable to repro against |
Do you have copies of the checkpoints before and after this error? What exact set of stack operations are leading to this when it happens?
It should never have been possible to end up deploying only 1 subnet with this test. The key question is really how you managed to get a deployment with only 1 subnet in the first place. |
No, since I was running
Agreed, this is odd. I'll look into running more tests with checkpoints being saved in between each step. |
@lukehoban here is the command output for the envvars set:
|
Another set of output from a failed CI job yesterday for the This time, unlike the usual failed runs of this test, goes from using 1 subnet to 2 subnets, whereas other failed runs usually propose less subnets to use:
envvars set in CI:
|
I believe I know what's happening here. FWIW, this repro'd locally when I ran the Pulumi program manually (i.e. outside of the test harness). I applied the following patch to determine which subnet IDs were being passed to the cluster's constructor and the subnet groupings determined by diff --git a/nodejs/eks/examples/tests/replace-cluster-add-subnets/index.ts b/nodejs/eks/examples/tests/replace-cluster-add-subnets/index.ts
index 2feb420..da7d293 100644
--- a/nodejs/eks/examples/tests/replace-cluster-add-subnets/index.ts
+++ b/nodejs/eks/examples/tests/replace-cluster-add-subnets/index.ts
@@ -13,6 +13,8 @@ const publicSubnetIds = vpc.publicSubnetIds.sort();
// Remove this line after the init update to repro: https://git.io/fj8cn
publicSubnetIds.pop();
+pulumi.all(publicSubnetIds).apply(ids => console.log(`public subnets IDs: ${JSON.stringify(ids)}`));
+
const cluster = new eks.Cluster(projectName, {
vpcId: vpc.vpcId,
subnetIds: publicSubnetIds,
diff --git a/nodejs/eks/nodegroup.ts b/nodejs/eks/nodegroup.ts
index aa26340..e683383 100644
--- a/nodejs/eks/nodegroup.ts
+++ b/nodejs/eks/nodegroup.ts
@@ -551,6 +551,8 @@ async function computeWorkerSubnets(parent: pulumi.Resource, subnetIds: string[]
// this subnet, this call will throw.
return await aws.ec2.getRouteTable({ subnetId: subnetId }, { parent: parent });
} catch {
+ console.log("looking for main route table");
+
// If we reach this point, the subnet may not have an explicitly associated route table. In this case
// the subnet is associated with its VPC's main route table (see
// https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html#RouteTables for details).
@@ -575,6 +577,7 @@ async function computeWorkerSubnets(parent: pulumi.Resource, subnetIds: string[]
privateSubnets.push(subnetId);
}
}
+ console.log(`subnets: ${JSON.stringify({publicSubnets, privateSubnets})}`);
return privateSubnets.length === 0 ? publicSubnets : privateSubnets;
} Running the example produced the following output:
Note that
The worst-case scenario for this race is when some but not all In order to solve this issue, we must either Of these, I think that (b) is a more reliable approach, though it does put more of the onus on the user. In the case of this example, we can either have |
I tried this locally, and it doesn't work: simply adding the VPC as a dependency cannot block the diff --git a/nodejs/eks/examples/tests/replace-cluster-add-subnets/index.ts b/nodejs/eks/examples/tests/replace-cluster-add-subnets/index.ts
index 2feb420..3bbfeef 100644
--- a/nodejs/eks/examples/tests/replace-cluster-add-subnets/index.ts
+++ b/nodejs/eks/examples/tests/replace-cluster-add-subnets/index.ts
@@ -13,6 +13,8 @@ const publicSubnetIds = vpc.publicSubnetIds.sort();
// Remove this line after the init update to repro: https://git.io/fj8cn
publicSubnetIds.pop();
+publicSubnetIds = publicSubnetIds.map(id => pulumi.all([vpc.urn, id]));
+
const cluster = new eks.Cluster(projectName, {
vpcId: vpc.vpcId,
subnetIds: publicSubnetIds, |
cc @CyrusNajmabadi as an FYI as well - as this is an interesting case of something we may ultimately need to fix in |
The alternate approach I suggested above did not work either, presumably because |
We could also migrate that test to use |
I plan on moving it to See #199 (comment) and its definition here: pulumi-eks/nodejs/eks/examples/tests/migrate-nodegroups/index.ts Lines 14 to 22 in c938d30
|
`publicSubnetIds` does not properly incoporate a dependency upon each subnet's `RouteTableAssociation`, which causes a race with the code in `computeWorkerSubnets` that reads the subnet's route table in order to determine the sets of public and private subnets. Part of #199.
I can't speak to that test, as I haven't yet attempted a repro. My read of the |
It is possible that AWS may have an eventual consistency thing happening here on top of the dependency issues mentioned above. |
Per hashicorp/terraform#5335 (comment), it sounds like this class of eventual consistency can happen in AWS.
|
I believe the right answer here is going to be to add the option to pass We could also consider adding Either way - these are changes we should make in the EKS library - there does not appear to strictly speaking be a bug in any lower layers here (except for the overall eventual consistency in AWS itself). |
There is a bug in the deprecated |
@pgavlin Do you want to open an issue in |
After the initial update of the
replace-cluster-add-subnets
test, the empty preview step is being shown changes even though it should not:templateBody
of the ASG being planned for an update.e.g. here is a sample of the changes being proposed from a run of another test, the
migrate-nodegroups
, also hitting this change proposal fortemplateBody
:The text was updated successfully, but these errors were encountered: